Uncategorized – Coptic SCRIPTORIUM Blog

Category: Uncategorized (page 1 of 3)

New Corpora Release 5.0.0

May 30, 2024 / Lydia Bremer-McCollum

We are pleased to announce release 5.0.0 of Coptic Scriptorium! Our data now includes over 1,288,229 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works.

This release also marks the introduction of Bohairic Coptic data to our corpus holdings: the repository now contains Bohairic Bible materials, covering Mark 1-16 and 1 Cor. 1-16, with manually reviewed segmentation for the entire corpus, and manual tagging and treebanking for chapters 1-5 in each book. Segmentation and tagging were reviewed in collaboration with Nicholas Wagner, and treebanking was done in collaboration with Nina Speranskaja. As a result of this work, we are in the process of compiling new NLP tools and guidelines specifically for Bohairic.

In addition, the release includes corrections and updates to existing corpora as well as the addition of several new Sahidic works and documents:

A. Sections of five works by Shenoute of Atripe:

B. New documents were added to existing works:

C. Newly added translation spans for Pistis Sophia, aligned by Randy Komforty

These join the newly treebanked and tagged Bohairic data, which can be found here:

We are very grateful to all of our collaborators and contributors, without whom this project could not function. We welcome Nicholas Wagner to the team and warmly thank Randy Komforty for his work on Pistis Sophia and Nina Speranskaja for her treebanking work.

As with all our releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking (currently only for Sahidic), is available in a variety of popular formats in this GitHub repository: https://github.com/CopticScriptorium/corpora
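If you download one of the exports and want a quick sanity check on its size, a few lines of Python are enough. The sketch below assumes a CoNLL-U export (check the repository for the formats it actually provides) in which bound groups are written as multiword-token range rows; the function name and the one-sentence sample are hypothetical, and the count it produces is not guaranteed to match the official token figures quoted in our release notes.

```python
def count_conllu_words(conllu_text: str) -> int:
    """Count word rows in CoNLL-U text.

    Skips comment lines (starting with "#"), blank sentence separators,
    multiword-token range rows (IDs such as "1-3") and empty nodes
    (IDs such as "4.1"), so each segmented word is counted once.
    """
    total = 0
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue
        total += 1
    return total


# Hypothetical one-sentence sample: the bound group ⲁϥⲥⲱⲧⲙ split into three words.
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "1-3\tⲁϥⲥⲱⲧⲙ\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tⲁ\tⲁ\tAUX\t_\t_\t3\taux\t_\t_",
    "2\tϥ\tⲛⲧⲟϥ\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "3\tⲥⲱⲧⲙ\tⲥⲱⲧⲙ\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])

print(count_conllu_words(SAMPLE))  # 3
```

For a downloaded file, pass its contents in, e.g. `count_conllu_words(open(path, encoding="utf-8").read())`, where `path` points to whichever `.conllu` file you fetched.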

You can also search for complex linguistic annotations in the data using our ANNIS server – please see our tutorial here to get started with some query tips and a helpful cheat sheet: https://copticscriptorium.org/ANNIS_tutorial

New Corpora Release 4.5.0

October 24, 2023 / Lydia Bremer-McCollum

Image of a search in ANNIS — Searching for “people” in Shenoute’s *God Who Alone is True*

We are pleased to announce release 4.5.0 of Coptic Scriptorium! Our data now includes over 1,278,500 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works (an increase of over 11,500 tokens from the previous release).

This release corrects a large number of consistency errors identified in our existing data, and also adds some new documents:

Revisions to five works of Besa:
Sections of three works by Shenoute of Atripe:
New documents were added to existing works:
- Acephalous Work 22
- Apophthegmata Patrum
Newly treebanked data with syntactic gold standard annotations for 1 Corinthians 7

We are very grateful to all of our collaborators and contributors, without whom this project could not function. We welcome Christine Ayad, Lydia Bremer-McCollum, Adeline Harrington, and Nina Speranskaja to the team.

As with all releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking, is available in a variety of popular formats in our GitHub repository:

https://github.com/copticscriptorium/corpora

You can also search for complex linguistic annotations in the data using our ANNIS server – please see our new tutorial here to get started with some query tips and a helpful cheat sheet:

https://copticscriptorium.org/ANNIS_tutorial

We hope this release will be useful and look forward to the next one as always!

Online Coptic Scriptorium Reading Group

October 4, 2023 / Lydia Bremer-McCollum

Description:

The online Coptic Scriptorium Reading Group consists of a series of short-term reading units, each focused on a particular Coptic text or corpus. The group will focus on reading, analyzing, and translating a set of Coptic texts found in the digital corpus of the Coptic Scriptorium project. We will also practice reading aloud. In addition, the reading units will introduce and demonstrate a variety of Coptic-focused digital humanities tools, especially the Coptic Dictionary Online and tools for linguistic and corpus analysis. Participants will leave with stronger Coptic reading facility, a grammar review, and an introduction to a range of open-access digital humanities tools for studying Coptic.

Audience:

This group is perfect for Coptologists, historians, linguists, heritage learners, and hobbyists alike. All are welcome! The reading group and text/corpus sequence are designed for introductory and lower-intermediate level reading ability (1-2 semesters of instruction or a roughly equivalent amount of self-directed learning). In other words, some previous grammar training and knowledge of the alphabet will be presumed.

Email the organizer (lcbm@ou.edu) if you would like to join or if you have questions about whether the group is a good fit for you! The Zoom link will be shared with the email list.

Schedule:

We meet via Zoom for one hour, three times per week.

Time: 10:00am EST | 9:00am CST | 7:00am PST | 4:00pm CEST | 3:00pm BST | 5:00pm EEST | 11:00pm JST

Note: Attendance at all sessions within a sequence is not required. The sequences can be taken in order, but this is neither assumed nor required. Feel free to drop in as your schedule and interest allow!

Fall/Winter 2023 Dates:

Sequence A:

October 23rd, 25th, 27th, 30th, November 1st, 3rd (6 hours total)

Reading Material: Apophthegmata Patrum Selections

Sequence B:

November 6th, 8th, 10th, 13th, 15th, 17th (6 hours total)

Reading Material: Ruth Selections

Sequence C:

December 4th, 6th, 8th, 11th, 13th, 15th, 18th, 20th (8 hours total)

Reading Material: Apophthegmata Patrum Selections

2024 Schedule TBD

Note: The time and weekly meeting frequency of Sequences B and C are subject to change based on group feedback.

Group Organizer:

Dr. Lydia Bremer-McCollum (she/hers) has more than ten years of experience reading and translating Coptic. She has taught introductory and intermediate courses at Harvard and Princeton Universities. As part of the NEH-funded Coptic Scriptorium project, she aims to create an online Coptic reading community focused on reading texts and using the project's rich set of open-access tools. Feel free to contact her with any questions at lcbm@ou.edu.

Coptic Scriptorium Awarded an NEH Grant to Expand Corpora and Add More Dialects

The Coptic Scriptorium team is honored to have been awarded an NEH Preservation and Access/Humanities Collections and Reference Resources Implementation Grant in the amount of $349,887. This award will fund a three-year project, Expanding Coptic Digital Online Collections. You can read the press release and list of awarded grants on the NEH site. This initiative will enable Coptic Scriptorium to improve the user experience, to expand our digital database of richly annotated texts in the Sahidic dialect, and to develop natural language tools and searchable, annotated, digitized corpora for additional dialects, including Bohairic. Caroline T. Schroeder (University of Oklahoma) is PI, and Amir Zeldes (Georgetown University) is co-PI. The team also includes Rebecca Krawiec (Canisius College), Christine Luckritz Marquis (Union Presbyterian Seminary), and Hany Takla (St. Shenouda Society), as well as a diverse advisory board. We thank our whole team past and present for the work that led to this stage, and we are grateful to the National Endowment for the Humanities for their ongoing support.

New Corpora Release 4.2.0

September 30, 2021 / Amir Zeldes

**Automatic linguistic analysis and Entity Linking from I Samuel 25**

It is our pleasure to announce the latest data release from Coptic Scriptorium, version 4.2.0. This release contains both new Coptic material and additions to older datasets, and it expands our entity annotations and named-entity linking to all of our data, including the semi-automatically annotated Old Testament. This also means automatic updates to all of our interfaces, such as the recently added example usage functionality in the Coptic Dictionary Online, which is linked to the corpora.

The new material, including more digitized data courtesy of the Marcion project as well as manually digitized and corrected OCR data from out-of-print editions, includes:

Encomium of Pseudo-Celestinus on Victor (annotations by Mitchell Abrams and Lance Martin)
Encomium of Pseudo-Flavianus on Demetrius, Archbishop of Alexandria (annotations by Mitchell Abrams, Lance Martin and Amir Zeldes)
Added works by Shenoute of Atripe:
- In the Night (Canons 9, annotations by Lance Martin, Caroline T. Schroeder and Amir Zeldes)
- Because of You Too O Prince of Evil (Discourses 4, annotations by Tamara Siuda, Lance Martin and Caroline T. Schroeder)
Expansions and improvements of existing corpora:
- More Apophthegmata Patrum (work by Christine Luckritz Marquis, So Miyagawa, Caroline T. Schroeder and Amir Zeldes)
- Further material from Shenoute’s works:
  - God Says Through Those Who Are His (including parallel witnesses and new material, data courtesy of David Brakke, annotations by Rebecca Krawiec, Lance Martin, Dana Robinson, Caroline T. Schroeder)
  - Acephalous Work 22 (data courtesy of David Brakke, annotations by Elizabeth Davidson, Rebecca Krawiec, Elizabeth Platte, Caroline T. Schroeder, Amir Zeldes)
- More syntactically annotated gold treebanked data in the Coptic Treebank
- Completely re-annotated Old Testament corpus, based on the base text courtesy of the Digital Edition of the Coptic Old Testament (CoptOT) project – with improved segmentation and parsing, now complete with semi-automatic entity recognition and linking to Wikipedia entries for people and places

With this new release, the semi-automatically annotated data (excluding automatically processed Bible materials) in the project covers close to 300,000 words of Sahidic Coptic annotated for entities.

This release represents a tremendous amount of work over the past few months by the Coptic Scriptorium team. We would also like to thank individual contributors (which you can always find in the ‘annotation’ metadata for each document), and specifically So Miyagawa for help with Coptic OCR models, as well as the Marcion and CoptOT projects for sharing their data with us, and the National Endowment for the Humanities for supporting us. We are continuing to work on more data, links to other resources and new kinds of annotations and tools. Please let us know if you have any feedback!

Winter 2021 Corpora Release 4.1.0

April 2, 2021 / Lance Martin

Amulet to protect place and animals. O. Crum ST 18 (KYP T344), side-by-side with its rendition on Coptic Scriptorium. Image source.

We are pleased to announce the latest release of data from Coptic Scriptorium, version 4.1.0. The release adds new Coptic texts and annotations, most notably named and non-named entity annotation for our New Testament corpus.

In total, we released approximately 40,000 tokens of manually edited text in 17 documents from new works, as well as adding material to already existing works. The new material, including more digitized data courtesy of the Marcion project, the Kyprianos Magical Text Database, and other scholars, includes:

Life of John the Kalybites, parts 1 and 2 (annotations by Lance Martin, Tamara Siuda, and Caroline T. Schroeder)
Mysteries of John the Evangelist, parts 1 and 2 (Mitchell Abrams, Lance Martin, Tamara Siuda, Caroline T. Schroeder)
Pseudo-Ephrem, The Asketicon of Apa Ephrem, parts 1 and 2 (Lance Martin and Caroline T. Schroeder)
Pseudo-Timothy of Alexandria Discourses, Discourse on Abbaton, parts 1 and 2 (Elizabeth Davidson, Lance Martin, Caroline T. Schroeder, and Amir Zeldes)

We are especially excited to announce the first release of several magical papyri and an ostracon on the Coptic Scriptorium platform in collaboration with the Kyprianos team at the University of Würzburg:

Magical Texts (Korshi Dosoo, Edward O. D. Love, Markéta Preininger, Lance Martin, Caroline T. Schroeder, and Amir Zeldes)

Expansions and Improvements of existing corpora:

Apa Johannes Canons (Diliana Atanassova, Caroline T. Schroeder, Lance Martin, and Amir Zeldes)
Apophthegmata Patrum (Marina Ghaly, Christine Luckritz Marquis, Caroline T. Schroeder)

We have extended our semi-automatic entity annotation coverage to encompass our New Testament material (over 248,000 tokens). Entity annotations, like our other annotations, were added to these specific corpora automatically and include:

The classification of all non-pronominal references to people, places and other entities into 10 entity categories
Entity linking:
- Linking of all named entities which have corresponding Wikipedia articles to their respective Wikipedia entries, including geo-location information where available

This addition complements the existing named and non-named entity annotations across our other Coptic corpora.

We would also like to thank individual contributors (which you can always find in the ‘annotation’ metadata for each document), each of whom put in a colossal amount of work, and the Marcion and Kyprianos projects who shared their data with us, as well as the National Endowment for the Humanities for supporting us. We are continuing to create more data and tools. Please let us know if you have any feedback!

Comprehensive Coptic Lexicon v1.2

October 1, 2020 / Amir Zeldes

The “Thesaurus Linguae Aegyptiae” project (“Strukturen und Transformationen des Wortschatzes der ägyptischen Sprache”, BBAW), the “Database and Dictionary of Greek Loanwords in Coptic” (DDGLC, Freie Universität Berlin), and “Coptic Scriptorium: Digital Research in Coptic Language and Literature” are pleased to announce the latest release of the “Comprehensive Coptic Lexicon”: Version 1.2. The raw data can be downloaded from:

D. Burns, F. Feder, K. John, M. Kupreyev, et al. 2020-07-24. Comprehensive Coptic Lexicon: Including Loanwords from Ancient Greek, Berlin: Freie Universität Berlin, http://dx.doi.org/10.17169/refubium-27566

The processed data has been published in the Coptic Dictionary Online, ed. Koptische/Coptic Electronic Language and Literature International Alliance (KELLIA): https://coptic-dictionary.org/

The major new features include:

Standardized use of parentheses “( )” in word forms.
Optimized data structure (e.g., the <sense/> element now contains a unique ID, facilitating ongoing work on linking the CCL to databases of semantic relations such as Coptic WordNet).
Correction of orthographic, grammatical and semantic information of the existing entries and addition of new entries.
Linking to Perseus Greek morphology tool via the Greek head words. DDGLC lemma IDs are now displayed in the entry view of Coptic Dictionary Online.
Improved usability of the section of Greek loanwords due to exclusion or change of a number of senses.
Link to attestation search for nouns filtered by entity-type (e.g., search for ⲟⲩⲟⲛ standing for a person, an animal, or an inanimate object) in Coptic Scriptorium.
Phrase network visualization of most common word sequences containing nouns, verbs and prepositions.
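To illustrate why a unique ID on every sense matters for linking, here is a small Python sketch that builds a lookup table from sense IDs to their entries. The XML below is a hypothetical, heavily simplified stand-in: the released lexicon uses TEI with its own element and attribute names, so the tag names, the `id` attribute and the ID values here are assumptions to adapt before running this on the real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry structure (not the lexicon's actual TEI markup).
SAMPLE = """<entries>
  <entry id="C1">
    <form>ⲙⲁ</form>
    <sense id="C1.1"><def>place</def></sense>
  </entry>
  <entry id="C2">
    <form>ϫⲁⲓⲉ</form>
    <sense id="C2.1"><def>desert</def></sense>
  </entry>
</entries>"""


def index_senses(xml_text: str) -> dict:
    """Map each sense ID to (entry ID, headword, definition), rejecting duplicate IDs."""
    root = ET.fromstring(xml_text)
    index = {}
    for entry in root.iter("entry"):
        headword = entry.findtext("form")
        for sense in entry.iter("sense"):
            sense_id = sense.get("id")
            if sense_id in index:
                raise ValueError(f"duplicate sense ID: {sense_id}")
            index[sense_id] = (entry.get("id"), headword, sense.findtext("def"))
    return index


senses = index_senses(SAMPLE)
# An external resource (e.g. a WordNet-style relation table) can now refer to senses by ID.
print(senses["C1.1"])  # ('C1', 'ⲙⲁ', 'place')
```

The duplicate check is the point of the design: a relation such as "C1.1 is a hypernym of C2.1" is only unambiguous if no two senses share an ID.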

For a full description of Version 1.2, please refer to the “Release Notes”: https://refubium.fu-berlin.de/bitstream/handle/fub188/27813/Comprehensive_Coptic_Lexicon_v1.2_Release_Notes.pdf?sequence=5&isAllowed=y

The Comprehensive Coptic Lexicon v1.2 now contains 11,263 entries and 31,847 forms across its Egyptian-Coptic and Greek-Coptic datasets. TLA, DDGLC and Coptic Scriptorium invite you to take a look at the new data and would welcome your feedback.

Summer 2020 Corpora Release 4.0.0

August 31, 2020 / Amir Zeldes

Place name index on data.copticscriptorium.org

It is our great pleasure to announce the latest release of data from Coptic Scriptorium, version 4.0.0. This release contains both new Coptic material and extensive additions to our suite of tools and annotations, focusing on the addition of support for entity annotation and named-entity linking across our new and old datasets. The new material, including more digitized data courtesy of the Marcion project and other scholars, includes:

John of Constantinople, on Penitence and Abstinence (annotations by Mitchell Abrams, Lance Martin and Amir Zeldes)
Pseudo-Chrysostom: (Elizabeth Davidson, Mitchell Abrams, Lance Martin, Amir Zeldes)
- On the Canaanite Woman
- On Susanna
Pseudo-Basil of Caesarea, on the End of the World and the Temple of Solomon (Lance Martin and Amir Zeldes)
Life of Pisentius, parts 1-2 (Lance Martin and Amir Zeldes)
Expansions and improvements of existing corpora:
- More Apophthegmata Patrum (Hayley Curtis, Elizabeth Davidson, Duncan Feiges, Elizabeth Platte, Caroline T. Schroeder, Amir Zeldes)
- Further material from Shenoute’s works:
  - God Says Through Those Who Are His (including parallel witnesses and new material, data courtesy of David Brakke, annotations by Rebecca Krawiec, Lance Martin, Dana Robinson, Caroline T. Schroeder)
  - Some Kinds of People Sift Dirt (data courtesy of David Brakke, annotations by Christine Luckritz Marquis, Caroline T. Schroeder, Amir Zeldes)

With this new release, the semi-automatically annotated data (excluding automatically processed Bible materials) in the project covers close to 260,000 words of Sahidic Coptic annotated for entities, including 50,000 words of gold-standard treebanked data with manual syntactic analyses.

In addition to new texts, new tools and analyses have been added to the project:

Complete entity annotation, classifying all non-pronominal references to people, places and other entities into 10 entity categories
Entity linking:
- Linking of all named entities which have corresponding Wikipedia articles to their respective Wikipedia entries, including geo-location information where available
- A browseable index of people and places mentioned in the texts, also linked to Wikipedia and Google Maps and including both real and fictional entities
Search and visualization:
- Search by entity type and named entity in ANNIS
- New configurable analytic visualization which displays nested entity types, highlights named entities and links them to Wikipedia
Natural Language Processing:
- Automatic entity recognition is now available (work by Amir Zeldes, Lance Martin and Sichang Tu)
- A new neural parser adapted for Coptic with higher accuracy syntactic analyses, which are deployed in ANNIS (work by Luke Gessler)

The new configurable Analytic Visualization with toggleable entity types and links

This release represents a tremendous amount of work over the past few months by the entire Coptic Scriptorium team. We would also like to thank individual contributors (which you can always find in the ‘annotation’ metadata for each document) and the Marcion and PAThs projects who shared their data with us, and the National Endowment for the Humanities for supporting us. We are continuing to work on more data, links to other resources and new kinds of annotations and tools. Please let us know if you have any feedback!

Digital Coptic 3 – program online!

June 25, 2020 / Amir Zeldes

The program for the third edition of Digital Coptic is now online. Check out the workshop website for the list of projects, talks and presenters.

Please join us for the workshop on July 12 and 13. Participants will receive a Zoom link and password for interactive presentations and discussion, and the workshop will also be streamed on YouTube for a wider audience, with recordings available for viewing afterward. We look forward to hearing all the talks!

A bird’s eye view of Coptic entities

June 15, 2020 / Lance Martin

Coptic Scriptorium recently annotated its Treebank for entities and will soon use automated tools to annotate all corpora. Entity recognition provides a window into what a text discusses, allowing readers to discover information about people and places of interest found throughout a large number of texts that they could not possibly read exhaustively. The Coptic Scriptorium team has developed a number of tools to visualize and search for entities, which you can browse here:

https://copticscriptorium.org/entities/breakdown.html

Already, we are seeing some interesting trends. Let’s take a closer look!

TreeMapping

Entities are divided into two broad categories: named and non-named. Named entities are headed by a proper noun, e.g., “Apa Pamoun” or “Scetis.” Non-named entities, which constitute the majority of annotations, are headed by a common noun, e.g., “the monk” or “the monastery.” All entities, whether named or non-named, have one of ten entity types, such as ‘person’, ‘place’ and more (see our previous post). In the image below, we see the TreeMap of non-named places. With a nested view of data such as this, one can easily see patterns that may be missed in other formats.
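The named/non-named distinction depends only on the part of speech of the entity's head. A minimal Python sketch of that rule follows; the `Entity` class is hypothetical, and the default proper-noun tag `NPROP` is an assumption about the tagset (pass a different tag if your data uses another label, such as the Universal Dependencies `PROPN`).

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str          # surface form of the whole entity span
    head_pos: str      # part-of-speech tag of the span's head token
    entity_type: str   # one of the ten entity types, e.g. "person" or "place"


def is_named(entity: Entity, proper_noun_tag: str = "NPROP") -> bool:
    """Named entities are headed by a proper noun; all others are non-named."""
    return entity.head_pos == proper_noun_tag


examples = [
    Entity("ϣⲓⲏⲧ", "NPROP", "place"),             # Scetis
    Entity("ⲡⲙⲟⲛⲁⲭⲟⲥ", "N", "person"),            # the monk
    Entity("ⲡⲙⲟⲛⲁⲥⲧⲏⲣⲓⲟⲛ", "N", "place"),          # the monastery
]
print([is_named(e) for e in examples])  # [True, False, False]
```

Note that namedness and entity type are independent: “Scetis” and “the monastery” are both places, but only the first is named.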

Let’s look at the TreeMap data for non-named place entities. The desert holds an unparalleled place (no pun intended) in Coptic literature, but what exactly do Coptic texts say about it? One click of the mouse would show all eighteen mentions of entities headed by ϫⲁⲓⲉ ‘desert’ (see image below). We can see every instance of the word on the same screen and are able to compare usages. Another search would do the same for all references headed by the Greek word ⲉⲣⲏⲙⲟⲥ ‘desert’ (i.e., excluding adjectival usages). If you want to continue this line of inquiry and read every single instance of ‘desert’ in its larger context, a search for these entities in ANNIS (this function is coming soon) would display every mention in the Coptic corpora, allowing you to quickly see the texts in which these words appear and how they are used.
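Under the hood, each tile of such a TreeMap is essentially a count of entities of one type and namedness, grouped by head lemma. The Python sketch below shows that grouping step on a handful of hypothetical entity records; the tuple layout and function name are illustrative assumptions, and the counts are invented, not taken from the corpora.

```python
from collections import Counter

# Hypothetical (entity type, is named, head lemma) records; not real corpus counts.
entities = [
    ("place", False, "ϫⲁⲓⲉ"),
    ("place", False, "ϫⲁⲓⲉ"),
    ("place", False, "ⲉⲣⲏⲙⲟⲥ"),
    ("place", True, "ϣⲓⲏⲧ"),
    ("person", False, "ⲙⲟⲛⲁⲭⲟⲥ"),
]


def treemap_leaves(records, entity_type, named):
    """Count entities of one type and namedness by head lemma (one tile per lemma)."""
    return Counter(
        lemma
        for etype, named_flag, lemma in records
        if etype == entity_type and named_flag == named
    )


leaves = treemap_leaves(entities, "place", named=False)
print(leaves.most_common())  # [('ϫⲁⲓⲉ', 2), ('ⲉⲣⲏⲙⲟⲥ', 1)]
```

Clicking a tile then amounts to filtering the same records by that lemma to list every mention.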

Entity Term Networking

Entity Term Networks provide a graphic visualization of an entity’s relationships with other words in its span. For example, let’s look at ⲙⲁ ‘place.’ From the outset, we see that ⲙⲁ is used with a wide variety of determiners and is followed by an even wider variety of constructions, but we simultaneously see that attributive adjectives, such as ⲙⲁ ⲛϣ(ⲱ)ⲱⲡⲉ ‘dwelling place, monk’s cell,’ are more commonly used with ⲙⲁ than relative or genitive constructions. The entity network for ⲙⲁ gives us a clearer idea of its potential semantic relationships: it is almost always followed by ⲛ ‘of’, continuing to nouns indicating purpose (place of dwelling, lavatory with ⲣⲙⲏ ‘urination’), events (ϣⲉⲗⲉⲉⲧ ‘wedding’), directions (ϣⲁ ‘East’) and more. Try pulling up the network for other Coptic nouns yourself! As with the TreeMap, the network presents a large amount of data in a small space, revealing patterns and their relative frequency more readily.
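One simple way to build a network like this is to count how often each word directly follows another inside entity spans containing the head noun. The Python sketch below does exactly that; it is an assumed construction for illustration, not necessarily how the project's visualization is computed, and the three segmented spans are hypothetical examples.

```python
from collections import Counter


def term_network(spans, head):
    """Count directed edges between adjacent words in entity spans containing `head`."""
    edges = Counter()
    for tokens in spans:
        if head not in tokens:
            continue
        for left, right in zip(tokens, tokens[1:]):
            edges[(left, right)] += 1
    return edges


# Hypothetical segmented entity spans headed by ⲙⲁ 'place'.
spans = [
    ["ⲡ", "ⲙⲁ", "ⲛ", "ϣⲱⲡⲉ"],
    ["ⲡ", "ⲙⲁ", "ⲛ", "ϣⲉⲗⲉⲉⲧ"],
    ["ⲟⲩ", "ⲙⲁ", "ⲛ", "ϣⲱⲡⲉ"],
]

for (left, right), n in term_network(spans, "ⲙⲁ").most_common():
    print(f"{left} -> {right}: {n}")
```

Edge weights then map naturally onto line thickness in a graph drawing: here the edge ⲙⲁ -> ⲛ has weight 3 because every span continues with ⲛ ‘of’.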

Lexical network of words in entities headed by ⲙⲁ ‘place’

Entity Type Proportions

Entity proportions compare entity types across the corpora, visualizing them as ratios. An average ratio is provided for all Coptic corpora and for a sample of English fiction, so viewers can see how far any given corpus departs from either baseline. The chart below sets the ratio of animals and people side by side. If you are interested in late-antique animals, you may be a little disappointed: they appear only sparsely in the corpora. Any other combination of entity types can be juxtaposed in the same way. Looking through the data, it is clear that the Coptic average has a consistently higher ratio of abstract entities than the English fiction sample, perhaps reflecting the monastic origin of much of the corpus.
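The arithmetic behind such a chart is straightforward: divide each type's count by the corpus's total number of entity mentions, then subtract a baseline's shares to see where the corpus departs from it. The Python sketch below uses invented counts and an invented baseline purely for illustration; the function names are hypothetical.

```python
from collections import Counter


def type_proportions(entity_types):
    """Share of each entity type among all entity mentions in one corpus."""
    counts = Counter(entity_types)
    total = sum(counts.values())
    return {etype: n / total for etype, n in counts.items()}


def departure_from_baseline(proportions, baseline):
    """Difference between a corpus's shares and a baseline's shares, per type."""
    types = set(proportions) | set(baseline)
    return {t: proportions.get(t, 0.0) - baseline.get(t, 0.0) for t in types}


# Hypothetical data, not real corpus figures.
corpus_mentions = ["person"] * 6 + ["abstract"] * 3 + ["animal"]
baseline = {"person": 0.5, "abstract": 0.2, "animal": 0.3}

props = type_proportions(corpus_mentions)
diff = departure_from_baseline(props, baseline)
for etype in sorted(diff):
    print(f"{etype}: {props.get(etype, 0.0):.2f} (baseline {baseline.get(etype, 0.0):.2f})")
```

A positive difference (here for ‘abstract’) means the corpus uses that type more than the baseline does; a negative one (here for ‘animal’) means less.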

Entity Type Proportion of Animals and People

Named/Non-Named by Corpus

The last visualization compares the ratio of named to non-named entities in each corpus. Once again, there is much variation between individual works, including those of the same genre (cf. The Life of Cyrus and The Life of Onnophrius), but differences in this ratio may indicate where differences in content lie, pointing the way toward further research: this surprising difference between saints’ lives may merit more attention.
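Computing the named share per corpus and ranking corpora by it is a one-line division per corpus. The sketch below uses two placeholder corpora with invented counts (not figures for the saints’ lives mentioned above); the function name is hypothetical.

```python
def named_share(named: int, non_named: int) -> float:
    """Fraction of a corpus's entity mentions that are named (0.0 for an empty corpus)."""
    total = named + non_named
    return named / total if total else 0.0


# Hypothetical (named, non-named) counts for two placeholder corpora; not real figures.
counts = {"corpus_a": (40, 160), "corpus_b": (90, 110)}
shares = {name: named_share(*c) for name, c in counts.items()}

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.2f} named")
```

Sorting by the share puts the most name-heavy works at the top, which is where a striking gap between two works of the same genre would become visible.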

What’s Next?

Entity annotation makes detailed philological, literary, and historical inquiries across a large number of documents possible by enabling analysis of texts based on the quantity, proportion and dispersion of entity types. It allows us to describe texts at the level of ‘who did what to whom’ and to abstract away from individual ways of phrasing references to people and places. We’re looking forward to releasing more tools and data for working with Coptic entities!
