It is our great pleasure to announce the latest release of data from Coptic Scriptorium, version 4.0.0. This release contains both new Coptic material and extensive additions to our suite of tools and annotations, with a focus on support for entity annotation and named-entity linking across our new and old datasets. The new material, which draws on further digitized data courtesy of the Marcion project and other scholars, includes:
- John of Constantinople, on Penitence and Abstinence (annotations by Mitchell Abrams, Lance Martin and Amir Zeldes)
- Pseudo-Chrysostom: (Elizabeth Davidson, Mitchell Abrams, Lance Martin, Amir Zeldes)
- Pseudo-Basil of Caesarea, on the End of the World and the Temple of Solomon (Lance Martin and Amir Zeldes)
- Life of Pisentius, parts 1-2 (Lance Martin and Amir Zeldes)
- Expansions and improvements of existing corpora:
  - More Apophthegmata Patrum (Hayley Curtis, Elizabeth Davidson, Duncan Feiges, Elizabeth Platte, Caroline T. Schroeder, Amir Zeldes)
  - Further material from Shenoute’s works:
    - God Says Through Those Who Are His (including parallel witnesses and new material, data courtesy of David Brakke, annotations by Rebecca Krawiec, Lance Martin, Dana Robinson, Caroline T. Schroeder)
    - Some Kinds of People Sift Dirt (data courtesy of David Brakke, annotations by Christine Luckritz Marquis, Caroline T. Schroeder, Amir Zeldes)
With this new release, the semi-automatically annotated data (excluding automatically processed Bible materials) in the project covers close to 260,000 words of Sahidic Coptic annotated for entities, including 50,000 words of gold-standard treebanked data with manual syntactic analyses.
In addition to new texts, new tools and analyses have been added to the project:
- Complete entity annotation, classifying all non-pronominal references to people, places and other entities into 10 entity categories
- Entity linking:
  - Linking of all named entities which have corresponding Wikipedia articles to their respective Wikipedia entries, including geo-location information where available
  - A browsable index of people and places mentioned in the texts, also linked to Wikipedia and Google Maps and including both real and fictional entities
- Search and visualization:
  - Search by entity type and named entity in ANNIS
  - New configurable analytic visualization which displays nested entity types, highlights named entities and links them to Wikipedia
- Natural Language Processing:
  - Automatic entity recognition is now available (work by Amir Zeldes, Lance Martin and Sichang Tu)
  - A new neural parser adapted for Coptic, producing higher-accuracy syntactic analyses, which are deployed in ANNIS (work by Luke Gessler)
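For readers who want a concrete picture of what nested entity annotation and entity linking involve, here is a minimal sketch in Python. It is not the project's actual data model or file format: the `EntitySpan` class, its field names, and the `is_nested` and `build_index` helpers are all hypothetical, chosen only to show how typed spans can nest inside one another and how linked names can be gathered into an index.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntitySpan:
    """One entity mention. All field names are illustrative assumptions."""
    doc: str                    # document identifier (hypothetical)
    start: int                  # index of the first token in the span
    end: int                    # index one past the last token
    etype: str                  # entity category, e.g. "person" or "place"
    wiki: Optional[str] = None  # linked Wikipedia article title, if any


def is_nested(inner: EntitySpan, outer: EntitySpan) -> bool:
    """True if `inner` lies within `outer` (same document) and is not the same span."""
    return (
        inner.doc == outer.doc
        and outer.start <= inner.start
        and inner.end <= outer.end
        and (inner.start, inner.end) != (outer.start, outer.end)
    )


def build_index(spans):
    """Map each linked article title to the sorted list of documents mentioning it."""
    index = defaultdict(set)
    for span in spans:
        if span.wiki is not None:  # only named entities with a link are indexed
            index[span.wiki].add(span.doc)
    return {title: sorted(docs) for title, docs in index.items()}
```

As a usage illustration with made-up data, a phrase such as "the bishop of Alexandria" could be stored as a `person` span covering the whole phrase plus a nested `place` span for the city name, with only the latter carrying a link. `build_index` would then list every document in which that linked place occurs.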
This release represents a tremendous amount of work over the past few months by the entire Coptic Scriptorium team. We would also like to thank the individual contributors (whose names you can always find in the ‘annotation’ metadata for each document), the Marcion and PAThs projects, which shared their data with us, and the National Endowment for the Humanities for supporting us. We are continuing to work on more data, links to other resources and new kinds of annotations and tools. Please let us know if you have any feedback!