Check out the new Universal Dependencies (UD) release V2.6! This is the twelfth release of the annotated treebanks at http://universaldependencies.org/. The project now covers syntactically annotated corpora in 92 languages, including Coptic. The size of the Coptic Treebank is now around 43,000 words, and growing. For the latest version of the Coptic data, see our development branch here: https://github.com/UniversalDependencies/UD_Coptic-Scriptorium/tree/dev. For documentation, see the UD Coptic annotation guidelines.
The inclusion of the Coptic Treebank in the UD dataset means that many standard parsers and other NLP tools trained on all well attested UD languages now support Coptic out-of-the-box, including Stanford NLP’s Stanza and UFAL’s UDPipe. Feel free to try out these libraries for your data! For optimal performance on open domain Coptic text, we still recommend our custom tool-chain, Coptic-NLP, which is highly optimized to Coptic and uses additional resources beyond the treebank. Or try it out online:
Leave a Reply