Release of the updated tokenizer

June 8, 2015 / David Sriboonreuang

The tokenizer has been updated! Version 3.0 is now on GitHub. It introduces a training data component that learns from our annotators' most common tokenization and correction practices. The tokenizer takes Coptic text segmented into bound groups and breaks those groups into morphemes for analysis and annotation.

(Originally posted on copticscriptorium.org on 5/22/15.)
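To make the idea of "learning from annotators' most common practices" concrete, here is a minimal sketch. It is not the code released in version 3.0. It assumes the training data is a list of (bound group, annotator segmentation) pairs and simply picks the segmentation annotators chose most often for each bound group. The function names `learn_segmentations` and `tokenize` and the pass-through fallback are hypothetical; a real tokenizer would apply its own rules to unseen bound groups rather than leaving them whole.

```python
from collections import Counter, defaultdict


def learn_segmentations(training_pairs):
    """Map each bound group to the segmentation annotators chose most often.

    training_pairs: iterable of (bound_group, list_of_morphemes) tuples
    (a hypothetical training-data format, used here for illustration only).
    """
    counts = defaultdict(Counter)
    for bound_group, segmentation in training_pairs:
        # Tuples are hashable, so each distinct segmentation can be counted.
        counts[bound_group][tuple(segmentation)] += 1
    # most_common(1) returns [(segmentation, count)]; keep the segmentation.
    return {group: list(c.most_common(1)[0][0]) for group, c in counts.items()}


def tokenize(bound_groups, learned, fallback=lambda group: [group]):
    """Segment each bound group, preferring the learned segmentation.

    Groups not seen in training go to `fallback`, which by default leaves
    the group unsplit (a stand-in for rule-based tokenization).
    """
    return [learned[g] if g in learned else fallback(g) for g in bound_groups]


if __name__ == "__main__":
    # Toy training data: two annotators split the perfect form one way,
    # one annotator split it another way.
    training = [
        ("ⲁϥⲥⲱⲧⲙ", ["ⲁ", "ϥ", "ⲥⲱⲧⲙ"]),
        ("ⲁϥⲥⲱⲧⲙ", ["ⲁ", "ϥ", "ⲥⲱⲧⲙ"]),
        ("ⲁϥⲥⲱⲧⲙ", ["ⲁϥ", "ⲥⲱⲧⲙ"]),
    ]
    learned = learn_segmentations(training)
    print(tokenize(["ⲁϥⲥⲱⲧⲙ", "ⲛⲟⲩⲧⲉ"], learned))
```

Here the majority choice `ⲁ|ϥ|ⲥⲱⲧⲙ` wins, and the unseen group `ⲛⲟⲩⲧⲉ` passes through unsplit. Choosing the majority segmentation is one simple way to encode "most common practice"; weighting corrections more heavily than first-pass tokenizations would be an equally plausible variant.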


Coptic SCRIPTORIUM Blog
