New Corpora Release 6.0.0

Searching for Greek loanwords in Bohairic Habakkuk

We are pleased to announce the release of version 6.0.0 of Coptic Scriptorium! This release dramatically expands our corpus, which now exceeds 2.2 million tokens of searchable, linguistically annotated Coptic texts. Among the highlights of this update is the rapid growth of our Bohairic corpus, which now comprises approximately 750,000 words and features translated texts such as the Bohairic Bible (Old and New Testament), as well as original works such as the Life of Isaac. The release also brings substantial enhancements to our collections, including modern editions processed with Optical Character Recognition (OCR) technology alongside both new and updated Coptic texts.

New OCR Material and Automatic Tagging

This release adds OCR-based editions. For the first time, fully automated tagging has been applied to a selection of OCR datasets:

  1. Budge Texts
  2. Lacau, Acts of Pilate
  3. Lacau, Lament of Mary
  4. Sobhy, Martyrdom and Encomium of Helias

Version 6.0.0 also includes several newly curated corpora, reflecting a diversity of dialects, genres, and textual traditions:

  1. More selections now with parallel Arabic translations:
  2. Pseudo-Theophilus:
  3. Mercurius:
  4. Additions to Shenoute of Atripe’s Acephalous Work 22 (A22):
  5. Bohairic texts:
  6. Bohairic Bible selection manually segmented and tagged:

Collaborative Efforts and Future Directions

We are grateful to our collaborators and contributors who have made this release possible, particularly Caroline T. Schroeder and Amir Zeldes, as well as Randy Komforty, Lydia Bremer-McCollum, Lawrence Rafferty, Nina Speranskaja, and Nicholas Wagner. We also want to thank the National Endowment for the Humanities for their ongoing support. The integration of OCR materials and the expansion of our Bohairic collection reflect ongoing efforts to enhance accessibility and analytical tools for Coptic studies. These advances also pave the way for further development of NLP tools for our users.

Accessing the Data

As with all our releases, the raw machine-readable data for all corpora—including morphological and syntactic annotations, as well as named entity recognition—are available in our GitHub repository. Data can be downloaded in a variety of popular formats to suit your research needs.
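For readers who want to work with the downloaded files directly, here is a minimal Python sketch of counting part-of-speech tags. It assumes that one of the download formats is CoNLL-U (tab-separated, ten columns per word line); this post only mentions "a variety of popular formats", so please check the repository for what is actually provided. The sample text, its placeholder forms and tags, and the function name `count_upos` are invented for illustration and are not real corpus data.

```python
from collections import Counter

# Hypothetical two-word sample in CoNLL-U layout (tab-separated, 10 columns).
# The forms, lemmas, and tags are placeholders, not real corpus data.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tform1\tlemma1\tNOUN\tN\t_\t2\tnsubj\t_\t_\n"
    "2\tform2\tlemma2\tVERB\tV\t_\t0\troot\t_\t_\n"
    "\n"
)


def count_upos(conllu_text):
    """Count the universal POS tags (4th column) over CoNLL-U word lines."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        # Keep only plain word lines; this skips malformed lines,
        # multiword ranges such as "1-2", and empty nodes such as "1.1".
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        counts[cols[3]] += 1
    return counts


counts = count_upos(SAMPLE)
print(sum(counts.values()), dict(counts))
```

To try this on a real file, read its contents as UTF-8 text and pass the resulting string to `count_upos`.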

For advanced linguistic queries, you can explore the data using our ANNIS server. To help you get started, check out our tutorial with query tips and a convenient cheat sheet.

We invite you to explore this latest release and we look forward to your feedback!

New Webinar Video on Searching Our Database Now Online

Earlier today, the Coptic Scriptorium project hosted an online workshop/webinar on searching text and annotations in our database (ANNIS). The video is now on YouTube. The cheat-sheet with an online tutorial that Dr. Zeldes shows in this video is on our website.

Webinar/online workshop on how to search the Coptic Scriptorium database (ANNIS)

If you watch the video, we’d also appreciate your feedback in this brief survey.

We thank the National Endowment for the Humanities, the University of Oklahoma, and Georgetown University for supporting the project and this workshop.

New Corpora Release 5.0.0

We are pleased to announce release 5.0.0 of Coptic Scriptorium! Our data now includes over 1,288,229 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works.

This release also marks the introduction of Bohairic Coptic data to our corpus holdings: the repository now contains Bohairic Bible materials, covering Mark 1-16 and 1 Cor. 1-16, with manually reviewed segmentation for the entire corpus, and manual tagging and treebanking for chapters 1-5 in each book. Segmentation and tagging were reviewed in collaboration with Nicholas Wagner, and treebanking was done in collaboration with Nina Speranskaja. As a result of this work, we are in the process of compiling new NLP tools and guidelines specifically for Bohairic.

In addition, the release includes corrections and updates to existing corpora as well as the addition of several new Sahidic works and documents:

A. Sections of five works by Shenoute of Atripe:

B. New documents were added to existing works:

C. Newly added translation spans for Pistis Sophia, aligned by Randy Komforty

These join the newly treebanked and tagged Bohairic data, which can be found here:

We are very grateful to all of our collaborators and contributors, without whom this project could not function. We welcome Nicholas Wagner to the team and warmly thank Randy Komforty for his work on Pistis Sophia, and Nina Speranskaja for her treebanking work.

As with all our releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking (currently only for Sahidic), can be found in this GitHub repository in a variety of popular formats: https://github.com/CopticScriptorium/corpora

You can also search for complex linguistic annotations in the data using our ANNIS server – please see our tutorial here to get started with some query tips and a helpful cheat sheet: https://copticscriptorium.org/ANNIS_tutorial

New Corpora Release 4.5.0 

We are pleased to announce release 4.5.0 of Coptic Scriptorium! Our data now includes over 1,278,500 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works (an increase of over 11,500 tokens from the previous release). 

This release corrects a large number of consistency errors identified in our existing data, and also adds some new documents:

We are very grateful to all of our collaborators and contributors, without whom this project could not function. We welcome Christine Ayad, Lydia Bremer-McCollum, Adeline Harrington, and Nina Speranskaja.

As with all releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking, can be found in our GitHub repository in a variety of popular formats:

https://github.com/copticscriptorium/corpora

You can also search for complex linguistic annotations in the data using our ANNIS server – please see our new tutorial here to get started with some query tips and a helpful cheat sheet:

https://copticscriptorium.org/ANNIS_tutorial

We hope this release will be useful and look forward to the next one as always!

Online Coptic Scriptorium Reading Group

Description: 

The online Coptic Scriptorium Reading Group consists of a series of short-term reading units, each focused on a particular Coptic text or corpus. The group will focus on reading, analyzing, and translating a set of Coptic texts found in the digital corpus of the Coptic Scriptorium project. We will also practice reading out loud. In addition, the reading units will introduce and demonstrate a variety of Coptic-focused digital humanities tools. These include especially the Coptic Online Dictionary and tools for linguistic and corpus analysis. Participants will leave with stronger Coptic reading facility, grammar review, and an introduction to a range of open-access digital humanities tools for studying Coptic.

Audience:

This group is perfect for Coptologists, historians, linguists, heritage learners, and hobbyists alike. All are welcome! The reading group and text/corpus sequence are designed for introductory and lower-intermediate level reading ability (1-2 semesters of instruction or a roughly equivalent amount of self-directed learning). In other words, some previous grammar training and knowledge of the alphabet will be presumed.

Email the organizer (lcbm@ou.edu) if you would like to join or if you have questions about your fit for the group! The Zoom link will be shared with the email list.

Schedule: 

Meet via Zoom for 1 hour 3x per week. 

Time:  10:00am EST | 9:00am CST | 7:00am PST

           4:00pm CEST | 3:00pm BST | 5:00pm EEST | 11:00pm JST 

Note: Attendance at all sessions within a series is not required. Moreover, the series can be taken sequentially, but it is not assumed or required. Feel free to drop-in as your schedule and interest allow! 

Fall/Winter 2023 Dates: 

  Sequence A

    October 23rd, 25th, 27th, 30th; November 1st, 3rd (6 hours total)

      Reading Material: Apothegmata Patrum Selections

  Sequence B

    November 6th, 8th, 10th, 13th, 15th, 17th (6 hours total)

      Reading Material: Ruth Selections

  Sequence C

    December 4th, 6th, 8th, 11th, 13th, 15th, 18th, 20th (8 hours total)

      Reading Material: Apothegmata Patrum Selections

  2024 Schedule TBD 

Note: The time and weekly meeting frequency of Sequences B and C are subject to change based on group feedback.

Group Organizer: 

Dr. Lydia Bremer-McCollum (she/hers) has more than ten years of experience reading and translating Coptic. She has taught introductory and intermediate courses at Harvard and Princeton Universities. As part of the NEH-funded Coptic Scriptorium project, she aims to create an online Coptic reading community focused on reading and utilizing the rich set of open-access tools. Feel free to contact her with any questions at lcbm@ou.edu.

Learning and Teaching Coptic with Coptic Scriptorium’s Resources

Today I had the pleasure of giving a presentation at the annual St. Shenouda Society-UCLA Conference of Coptic Studies. My talk illustrates various ways the Coptic Scriptorium project’s texts and tools can help Coptic language learners become more proficient in Coptic. Whether you are a student in a course, an instructor, or someone wanting to learn or improve your knowledge of Coptic, there are many resources online to help you. I’ve posted the slides here for everyone to access, and they go through how to use advanced features of the online Dictionary, basic natural language processing tools to decipher confusing or complicated grammar, and ways to read digitized text to improve reading skills.

I hope it’s useful!

Hiring a DH Specialist in Coptic Studies

Coptic Scriptorium is hiring one more staff member, starting in September 2023. This position runs through at least August 2024 and can potentially be renewed through July 2026. It is part-time (on average 15 hours/week). Full details are in the attached ad!

Thank you to the National Endowment for the Humanities (Preservation and Access Division) and the University of Oklahoma Office of the Vice Provost for Research and Partnerships for the funding and support for this position!!

Hiring for a 1-year Postdoctoral Fellow

Coptic Scriptorium is hiring for a 1-year, full-time Postdoctoral Fellow to work with us on expanding the number of Coptic texts we have available and on annotating those texts.

Full details are in the job ad. This position is remote, with virtual meetings and occasional travel to the University of Oklahoma, Georgetown University, or other work sites (travel funding covered by OU).

If you are considering applying but have a dissertation defense/completion date firmly set during August (rather than prior to August 1), we can consider your application; please note in your letter your specific timetable, and if we decide to interview you we can discuss the timetable in the interview.

If you applied for the summer position previously advertised and would like to apply for this position as well, please send a new full application for this position.

We will begin reviewing applications next week and will conduct virtual interviews either at the end of this month or very early June. This position is funded jointly by a grant from the National Endowment for the Humanities and the University of Oklahoma Office of the Vice Provost for Research and Partnerships.

Hiring for a part-time summer position!

We are hiring for a part-time summer position! The full description is below, but the highlights are:

  • must know Coptic
  • 10-20 hours per week
  • remote work
  • supervised by Prof. Caroline T. Schroeder at the University of Oklahoma
  • position begins May 15 or as soon thereafter as the hiring paperwork etc. can be completed
  • send a letter, CV, and names/contact info for 2 references to WGS@ou.edu

In addition, because of the nature of the position, we can only hire someone who is in the US and eligible to work in the US.

We will begin reviewing applications May 8.

This is perfect for a grad student, recent PhD, or part-time academic looking for some extra income over the summer.

Busy this summer but wish you could apply? We will be hiring for one or two more positions to start in August or September. Watch this space!

Coptic Scriptorium Awarded an NEH Grant to Expand Corpora and Add More Dialects

The Coptic Scriptorium team is honored to have been awarded an NEH Preservation and Access/Humanities Collections and Reference Resources Implementation Grant in the amount of $349,887. This award will fund a 3-year project, Expanding Coptic Digital Online Collections. You can read the press release and list of awarded grants on the NEH site. This initiative will enable Coptic Scriptorium to improve the user experience, to expand our digital database of richly annotated texts in the Sahidic dialect, and to develop natural language tools and searchable, annotated, digitized corpora for additional dialects, including Bohairic. Caroline T. Schroeder (University of Oklahoma) is PI, and Amir Zeldes (Georgetown University) is co-PI. The team also includes Rebecca Krawiec (Canisius College), Christine Luckritz Marquis (Union Presbyterian Seminary), and Hany Takla (St. Shenouda Society), as well as a diverse advisory board. We thank our whole team past and present for the work that led to this stage, and we are grateful to the National Endowment for the Humanities for their ongoing support.
