The tokenizer has been updated! Version 3.0 is now on GitHub. It introduces a training data component that learns from our annotators’ most common tokenization and correction practices. The tokenizer takes Coptic text that has been segmented into bound groups and breaks it into morphemes for analysis and annotation.
(Originally posted on copticscriptorium.org on 5/22/15.)