New feature + texts in our corpora: Apophthegmata, I See Your Eagerness

We are very excited to release new versions of two of our corpora in time for the Coptic Congress.  And keep reading to learn about a new feature on our website.

As usual, we provide a diplomatic transcription of the texts’ manuscripts, normalized text for ease of reading, and an analytic visualization with the normalized text and part of speech tags in our web application.  You’ll also see buttons to search the corpora in our database or download our digital files.

Apophthegmata Patrum

The Apophthegmata Patrum now contains 36 published Sayings.  New ones include

This release also marks the first contributions of our newest editor, Dr. Dana Lampe.  Dana earned her Ph.D. at the Catholic University of America and is beginning a postdoc at Creighton in the fall.

I See Your Eagerness

We are also releasing a huge new chunk of Shenoute’s sermon, I See Your Eagerness.  These texts were transcribed and collated primarily by David Brakke (with some by Stephen Emmel).  We thank David for his generous donation of his transcriptions to the project!  Senior Editor Rebecca Krawiec has digitized and annotated these transcriptions.

Please begin your reading of I See Your Eagerness with the fragment from codex MONB.GL 9-10.  Or you can search it in our search & visualization tool ANNIS.

We now have over 9000 words of this text digitized and annotated!

New: “Next” & “Previous” Buttons on Document visualizations

We’ve got a new feature in our web application:  the “next” and “previous” buttons near the top of the text.

“Next” takes you to the next document for this work; if there is a lacuna, you’ll be taken to the next extant witness we’ve digitized.  If there are multiple, parallel witnesses, you’ll be taken to the witness we’ve identified as the best or clearest witness (typically based on the number of lacunae).

The same is true for the “Previous” button.

If you want to review the parallel witness(es), check out the metadatum field for each document called “witness.”  If a parallel witness exists, it will be listed; if we have digitized the witness, the URN for the witness will be listed.  You can enter the URN in the box at the top of our website to retrieve the document.
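
For readers curious how such navigation might work, here is a minimal Python sketch of the behavior described above. It is not the code behind our web application: the `Document` class, the `position` and `lacunae` fields, and the `example:` identifiers are all hypothetical, and "best witness" is simplified to "fewest lacunae."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    urn: str        # placeholder identifier, not a real Coptic SCRIPTORIUM URN
    position: int   # where this passage falls within the work
    lacunae: int    # number of lacunae in this witness


def best_witness(parallel: List[Document]) -> Document:
    # Assumption: "best" simply means the witness with the fewest lacunae.
    return min(parallel, key=lambda d: d.lacunae)


def neighbor(current: Document, digitized: List[Document], step: int) -> Optional[Document]:
    # step=+1 looks forward ("Next"); step=-1 looks backward ("Previous").
    candidates = [d for d in digitized if (d.position - current.position) * step > 0]
    if not candidates:
        return None
    # Skip over any gap to the nearest position that has a digitized witness.
    nearest = min(candidates, key=lambda d: abs(d.position - current.position)).position
    return best_witness([d for d in candidates if d.position == nearest])


docs = [
    Document("example:work.a", position=1, lacunae=2),
    Document("example:work.b", position=3, lacunae=5),  # two parallel witnesses
    Document("example:work.c", position=3, lacunae=1),  # for the same passage
    Document("example:work.d", position=4, lacunae=0),
]

# Position 2 is not digitized, so "Next" from position 1 lands on position 3.
print(neighbor(docs[0], docs, +1).urn)
print(neighbor(docs[3], docs, -1).urn)
```

In this toy data both calls land on the witness with a single lacuna, and asking for "Next" from the last document returns nothing.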

Coptic SCRIPTORIUM at the Coptic Congress

Much of the Coptic SCRIPTORIUM team is in Claremont this week for the Congress of the International Association of Coptic Studies.

We started out with a pre-conference, 2-day workshop with our KELLIA partners from Germany, where we worked on sharing data and technologies across digital Coptic projects.  Look here soon for an announcement about a really cool fruit of our labors.

Thursday there are two panels, and Friday there are two workshops.

Thursday 2-4 pm Coptic Digital Studies (Burkle 16)

David Brakke chair 

Prof. Dr. Caroline Schroeder, Coptic SCRIPTORIUM: A Digital Platform for Research in Coptic Language and Literature

Dr. Christine Luckritz Marquis, Reimagining the Apophthegmata Patrum in a Digital Culture

Prof. Amir Zeldes, A Quantitative Approach to Syntactic Alternations in Sahidic

Dr. Rebecca Krawiec, Charting Rhetorical Choices in Shenoute: Abraham our Father and I See Your Eagerness as case-studies

Thursday 4:30-6:30 Coptic Digital Humanities (Burkle 16)

Caroline T. Schroeder, Chair

Dr. Paul Dilley, Coptic Scriptorium beyond the Manuscript: Towards a Distant Reading of Coptic Texts

Mr. So Miyagawa and Dr. Marco Büchler, Computational Analysis of Text Reuse in Shenoute and Besa

Mr. Uwe Sikora, Text Encoding – Opportunities and Challenges

Ms. Eliese-Sophia Lincke, Optical Character Recognition (OCR) for Coptic. Testing Automated Digitization of Texts with OCRopy

 

Friday 11-12:30 Workshop on Coptic Fonts & Coptic Bible (AA)

Christian Askeland, Frank Feder

Friday 4:30-6 Digital Tools for Beginners (Workshop on Coptic SCRIPTORIUM)

Caroline T. Schroeder, Amir Zeldes, Rebecca S. Krawiec

New publication: Raiders of the Lost Corpus in DHQ

Amir Zeldes and Caroline T. Schroeder have recently published an article in Digital Humanities Quarterly about the need for digital tools and a digitized corpus for Coptic, and about the research questions that drive Coptic SCRIPTORIUM.

“Raiders of the Lost Corpus” is freely available on the DHQ website as part of a special issue on Digital Methods and Classical Studies edited by Neil Coffee and Neil W. Bernstein.  Schroeder presented an earlier version of this paper at the Digital Classics conference at the University at Buffalo in 2013.

 

Coptic Treebank Released

Yesterday we published the first public version of the Coptic Universal Dependency Treebank. This resource is the first syntactically annotated corpus of Coptic, containing complete analyses of each sentence in over 4,300 words of Coptic excerpts from Shenoute, the New Testament and the Apophthegmata Patrum.

To get an idea of the kind of analysis that Treebank data gives us, compare the following examples of an English and a Coptic dependency syntax tree. In the English tree below, the subject and object of the verb ‘depend’ on the verb for their grammatical function: the nominal subject (nsubj) is “I”, and the direct object (dobj) is “cat”.

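[Figure: dependency tree of an English sentence, with “I” as nominal subject (nsubj) and “cat” as direct object (dobj)]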

We can quickly find out what’s going on in a sentence or ‘who did what to whom’ by looking at the arrows emanating from each word. The same holds for this Coptic example, which uses the same Universal Dependencies annotation schema, allowing us to compare English and Coptic syntax.

He gave them to the poor

Treebanks are an essential component of linguistic research, but they also enable a variety of Natural Language Processing technologies to be used on a language. Beyond automatically parsing text to produce more analyzed data, we can use syntax trees for information extraction and entity recognition. For example, the first tree below shows us that “the Presbyter of Scetis” is a coherent entity (a subgraph, headed by a noun); the incorrect analysis following it would suggest Scetis is not part of the same unit as the Presbyter, meaning we could be dealing with a different person.

One time, the Presbyter of Scetis went...

One time, the Presbyter went from Scetis... (incorrect!)
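
To make the "coherent subgraph" idea concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the tokens are English glosses rather than the Coptic words, the head indices are written by hand to mirror the two analyses above, and the `subtree` and `phrase` helpers are hypothetical names, not part of the Treebank or our tools.

```python
def subtree(heads, root):
    """Return the sorted 1-based indices of `root` and everything that depends on it."""
    members = {root}
    changed = True
    while changed:
        changed = False
        for index, head in enumerate(heads, start=1):
            if head in members and index not in members:
                members.add(index)
                changed = True
    return sorted(members)


def phrase(tokens, heads, root):
    """Join the tokens of the subtree headed by `root` in sentence order."""
    return " ".join(tokens[i - 1] for i in subtree(heads, root))


# English glosses standing in for the Coptic tokens (an assumption for readability).
tokens = ["the", "Presbyter", "of", "Scetis", "went"]

# heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
correct_heads = [2, 5, 4, 2, 0]    # "Scetis" depends on "Presbyter"
incorrect_heads = [2, 5, 4, 5, 0]  # "Scetis" wrongly depends on "went"

print(phrase(tokens, correct_heads, 2))    # the entity headed by "Presbyter"
print(phrase(tokens, incorrect_heads, 2))
```

With the correct heads, the subtree under "Presbyter" covers the whole entity; with the incorrect heads it shrinks to "the Presbyter", and Scetis is lost from the entity.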

To find out more about this resource, check out the new Coptic Treebank webpage. And to read where the Presbyter of Scetis went, go to this URN: urn:cts:copticLit:ap.19.monbeg.

Full, machine-annotated New Testament Corpus updated

We’ve updated and re-released our fully machine-annotated New Testament corpus.  sahidica.nt V2.1.0 contains the Sahidica NT text from Warren Wells’s Sahidica online NT, with the following features:

  • Annotated with our latest NLP tools (part of speech tagger 1.9 and tokenizer 4.1.0; the language tagger and lemmatizer include lexical entries from the Database and Dictionary of Greek Loanwords in Coptic (DDGLC))
  • Now contains the morph layer (annotating compound words and Coptic morphs such as ⲣⲉϥ-, ⲙⲛⲧ-, ⲁⲧ-)
  • Visualizations for linguistic analysis

Please keep in mind that this fully machine-annotated corpus is more accurate than previous versions but will nonetheless contain more errors than a corpus manually corrected by a human.

Search and queries

For searches and queries using our ANNIS database to find specific terms, for this corpus we recommend searching the normalized words using regular expressions (to capture instances of the desired word that may still be embedded in a Coptic bound group, instances that our tokenizer may have missed):

Lemma searches are now also possible.  You may wish to search for the lemma using regular expressions, as well, in order to find lemmas of some compound words.  For example, the following search will find entries containing ⲥⲱⲧⲙ in the lemma:

The results include various forms of ⲥⲱⲧⲙ (including ⲥⲟⲧⲙ) lemmatized to the lexical entry “ⲥⲱⲧⲙ”, compound words lemmatized to ⲥⲱⲧⲙ or to a lexical entry containing ⲥⲱⲧⲙ, and some bound groups containing the word form ⲥⲱⲧⲙ, which our tokenizer did not catch:

Frequency table of normalized words lemmatized to ⲥⲱⲧⲙ or a lemma form containing ⲥⲱⲧⲙ (May 2016 Sahidica corpus)

As you can see, most of the hits are accurate (e.g., ⲥⲟⲧⲙ, ⲁⲧⲥⲱⲧⲙ, ⲣⲁⲧⲥⲱⲧⲙ, ⲣⲉϥⲥⲱⲧⲙ); some of the Coptic bound groups did not tokenize properly (e.g., ⲉⲡⲥⲱⲧⲙ, ⲙⲁⲣⲟⲩⲥⲱⲧⲙ).  We expect accuracy to increase as we incorporate more texts into our corpora that have been machine annotated and then manually edited.
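
If you would like to see why a regular expression catches more than an exact match, the following Python sketch imitates the idea on a handful of forms. It runs locally and does not query ANNIS. The (normalized word, lemma) pairs are illustrative assumptions rather than data exported from the corpus, and the sketch assumes that a pattern has to match the whole annotation value, which is why it is wrapped in `.*` on both sides.

```python
import re

# (normalized word, lemma) pairs. The lemma values are assumptions for illustration.
entries = [
    ("ⲥⲱⲧⲙ", "ⲥⲱⲧⲙ"),
    ("ⲥⲟⲧⲙ", "ⲥⲱⲧⲙ"),          # a different form lemmatized to ⲥⲱⲧⲙ
    ("ⲁⲧⲥⲱⲧⲙ", "ⲁⲧⲥⲱⲧⲙ"),      # compound whose lemma contains ⲥⲱⲧⲙ
    ("ⲣⲉϥⲥⲱⲧⲙ", "ⲣⲉϥⲥⲱⲧⲙ"),
    ("ⲉⲡⲥⲱⲧⲙ", "ⲉⲡⲥⲱⲧⲙ"),      # bound group the tokenizer did not split
    ("ⲛⲟⲩⲧⲉ", "ⲛⲟⲩⲧⲉ"),        # unrelated word, should never match
]


def exact_lemma(entries, lemma):
    """Normalized words whose lemma is exactly `lemma`."""
    return [norm for norm, lem in entries if lem == lemma]


def regex_lemma(entries, pattern):
    """Normalized words whose whole lemma matches the regular expression."""
    compiled = re.compile(pattern)
    return [norm for norm, lem in entries if compiled.fullmatch(lem)]


print(exact_lemma(entries, "ⲥⲱⲧⲙ"))      # only the simple lemma
print(regex_lemma(entries, ".*ⲥⲱⲧⲙ.*"))  # also compounds and unsplit bound groups
```

The exact search returns only the first two forms, while the regular expression also returns the compounds and the unsplit bound group. That is the trade-off described above: more hits, including a few that reflect tokenization errors.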

Reading by individual chapter

You can also read these documents and see the linguistic analysis visualizations at data.copticscriptorium.org/urn:cts:copticLit:nt.  The first documents you will see (Gospel of Mark, 1 Corinthians) are manually annotated.  Scroll down for “New Testament,” which is the full, machine-annotated Sahidica New Testament.  Click on “Chapter” to read each chapter as normalized Coptic (with English translation as a pop-up when you hover your cursor).  Click on “Analytic” for the normalized Coptic, part of speech analysis, and English translation for each chapter.  Please keep in mind the English translation provided is a free, open-access New Testament translation from the World English Bible; it is not a direct translation from the Coptic.

Note:  we know that our server is slow generating the documents for this corpus.  It may take several minutes to load; please be patient.  For faster access, use ANNIS.  Visualizations to read the chapters are available by clicking on the corpus and the icon for visualizations.

Accessing document visualizations of the Sahidica corpus via ANNIS

We hope this corpus is useful to researchers.

New Besa fragment published

We’ve published another small fragment of Besa on Coptic SCRIPTORIUM.  So Miyagawa has edited and translated the letter fragment known as On Lack of Food.  Read it online or search the letters of Besa we have published.

ANNIS embeds for websites and blogs

Starting this week, there’s a new feature in our ANNIS web interface: ANNIS embeds.

The ANNIS interface can now give you an HTML snippet that you can embed on your webpage, blog post and more.

Here’s an example of an embedded visualizer for a passage from Besa’s letter to Aphthonia, in which he recounts Aphthonia’s threat to go to another monastery:

(MONB.BA 47, urn:cts:copticLit:besa.aphthonia.monbba)

The code snippet for this visualization is as follows:

<iframe src="https://corpling.uis.georgetown.edu/annis/?id=31e2a273-426f-4aaf-922a-7fa0f0b311e1" width="100%" height="500"></iframe>

To get an embeddable snippet, click the share icon at the top left of your search result and choose the visualization you want. To share the entire set of results, use the share button at the top right of the results page. Additionally, if you want to share an ANNIS search result via e-mail, you can still copy and paste the URL as before, but now you can also get a specific shareable link for individual hits using the same share button.
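
If you generate pages or posts with a script, you could build the same kind of snippet yourself once ANNIS has given you a link. The Python sketch below is only a convenience example: `annis_embed` is a hypothetical helper name, and it assumes the link has the same `?id=` form as the example above. It writes the iframe with an explicit closing tag.

```python
from html import escape

# Base address taken from the example snippet above.
ANNIS_BASE = "https://corpling.uis.georgetown.edu/annis/"


def annis_embed(result_id: str, width: str = "100%", height: str = "500") -> str:
    """Build an iframe snippet for an ANNIS share ID (hypothetical helper)."""
    src = f"{ANNIS_BASE}?id={result_id}"
    # Escape attribute values so stray quotes cannot break the markup.
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{escape(width, quote=True)}" '
        f'height="{escape(height, quote=True)}"></iframe>'
    )


print(annis_embed("31e2a273-426f-4aaf-922a-7fa0f0b311e1"))
```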

Let us know if you have any feedback!

Annotation tools now include DDGLC Greek Loanword List

We are pleased to announce the release of our newest versions of some of our natural language processing tools for Coptic which incorporate the lemma list of loanwords developed by the Database and Dictionary of Greek Loanwords in Coptic (DDGLC).

The DDGLC is part of the KELLIA partnership between American and German digital Coptic projects funded by the NEH Office of Digital Humanities and the DFG.  The DDGLC, under the direction of Prof. Dr. Tonio Sebastian Richter, has been building a database of Greek loanwords in Coptic in order to facilitate the study of language contact, language borrowing, and multilingualism in Egypt.

We have integrated the Greek lemma list into our language of origin tagger, tokenizer and morphology analysis, and lemmatizer.

Our online natural language processing web service (which bundles together all of our NLP tools into one web application) also includes this new data from the DDGLC.

The Greek loanword list should greatly increase the accuracy of many of our tools.  If you use them, please let us know how it goes!

We at Coptic SCRIPTORIUM are grateful for this partnership and the generosity of the DDGLC team.

New born-digital edition of a Shenoute fragment

This winter we’ve released a new document we’ve been working on for a while.  It’s a born-digital publication, in the sense that, to our knowledge, this document has never been published before.  The edition and annotations here were produced by Elizabeth Platte (Reed College) and Rebecca S. Krawiec (Canisius College) directly from digital photographs of the manuscript for digital publication.

Read the manuscript transcription or the normalized text, or query it in our database.

It’s a section of one of Shenoute’s texts for monks in volume three of his monastic Canons.  This 14-page (seven-folio) fragment now resides in the Bibliothèque Nationale in Paris and originally derives from the White Monastery codex known by the siglum MONB.YB.  We’ve released text and annotations for pages 307-320, which equate to the BN call number Ms Copte 130/2 ff. 51-57.  Digital photos are now available online at Gallica.

We’ve transcribed the text from images of the manuscript and then annotated it for manuscript information.  We’ve also broken the text down into the Coptic phrases known as “bound groups,” words, and morphs.  Then we’ve annotated it all for part of speech, loan words (Greek, Latin, etc.), and lemmas.

By “we” I mean primarily Platte and Krawiec.  Schroeder and Zeldes provided editorial review, as per our policy of having every published digital document reviewed by at least one editor.

As far as we know, this fragment has never been published; nor has any translation ever been published.  We don’t have a translation yet, either.

As the first born-digital edition, this document is an experiment for us.  Everything else we’ve worked with has been published in an edition, and sometimes even has an English translation that another scholar has published.  Even though we digitize from the original manuscript, previous editions and translations make the transcription, annotation, and editing process much easier.  This document is an unknown quantity.

This means we expect to have errors and welcome feedback on the document.

We also have no translation as of yet.  Our goal is to translate the document and then edit the transcription and annotations again as we work.  We hope to publish an essay on how the digital annotation process affected the creation of an edition.

In the meantime, use it to practice your Coptic.  Let us know if you find errors.  We’ll credit you.

2015 Society of Biblical Literature (SBL) Annual Meeting

The 2015 SBL conference is the largest gathering in the world of scholars interested in the study of religion. The meeting showcases the latest research and fosters collegial contacts. The 2015 meeting is taking place in Atlanta, GA, and members of the Coptic SCRIPTORIUM team will be presiding and presenting at the conference.

Digital Humanities in Biblical, Early Jewish, and Christian Studies; Papyrology and Early Christian Backgrounds will be chaired by Caroline T. Schroeder. It takes place in International A (International Level) at the Marriott on 11/23/2015 from 9:00 AM to 11:30 AM.

Also,

“A Wild Patience Has Taken this Far”: Future Avenues of Feminist Scholarship will also have Caroline T. Schroeder presenting. It takes place in Room A602 (Atrium Level) at the Marriott from 4:00 PM to 6:30 PM.

and

Art and Religions of Antiquity; Violence and Representations of Violence among Jews and Christians will be chaired by Christine Luckritz Marquis. It takes place in Room 209 (Level 2) at the Hilton on 11/23/2015 from 9:00 AM to 11:30 AM.

We are looking forward to their presentations and hope others are as well!
