We are pleased to announce the release of version 6.2.0 of the Coptic Scriptorium data! Our corpora now total 2,375,875 words. This release provides significant new annotated data in both the Bohairic and Sahidic dialects:
- New parts of works in Bohairic, including:
  - The Life of Shenoute, Parts 2 & 3 (Part 1 was released in September)
  - The Lausiac History, Parts 2 & 3 (Part 1 was released in September)
- New corpora:
  - The Gospel of Thomas, edited from the manuscript by Paul Dilley
  - The Sahidic book of Jonah (with manual edits and corrections to NLP annotations by Stephan Claassen; the automatically processed Jonah is in the Coptic OT corpus)
- New documents in the following existing corpora:
  - Apophthegmata Patrum
  - Shenoute’s work known as Acephalous Work 22
- More Arabic and English translations for documents previously published
We are grateful to our collaborators and contributors who have made this release possible, particularly Caroline T. Schroeder, Amir Zeldes, Nicholas Wagner, and Paul Dilley, as well as Nina Speranskaja, Rebecca Krawiec, Christine Luckritz Marquis, Stephan Claassen, Philippe Zaher, and Safaa Mahfouz. We also want to thank Hany Takla and the St. Shenouda the Archimandrite Coptic Society for their collaboration and support. Additionally, we thank our donors for contributions that made much of the work on this release possible. Please consider supporting Coptic Scriptorium as we navigate the new funding environment in the USA.
As with all our releases, the raw machine-readable data for all corpora—including morphological and syntactic annotations, as well as named entity recognition—are available in our GitHub repository. Data can be downloaded in a variety of popular formats to suit your research needs.
You can read and browse entire documents in an online portal. Our corpora are also linked in entries on the Coptic Dictionary Online.
For searching, including advanced linguistic queries, you can explore the data using our ANNIS server. To help you get started, check out our tutorial with query tips and a convenient cheat sheet. Currently, the Arabic translations are available only in ANNIS.


