Full, machine-annotated New Testament Corpus updated

We’ve updated and re-released our fully machine-annotated New Testament corpus. sahidica.nt V2.1.0 contains the Sahidica NT text from Warren Wells’s Sahidica online NT, with the following features:

  • Annotated with our latest NLP tools (part-of-speech tagger 1.9 and tokenizer 4.1.0; the language tagger and lemmatizer now include lexical entries from the Database and Dictionary of Greek Loanwords in Coptic (DDGLC))
  • Now contains the morph layer (annotating compound words and Coptic morphs such as ⲣⲉϥ-, ⲙⲛⲧ-, and ⲁⲧ-)
  • Visualizations for linguistic analysis

Please keep in mind that although this fully machine-annotated corpus is more accurate than previous versions, it will still contain more errors than a corpus that a human has corrected manually.

Search and queries

When you use our ANNIS database to search for specific terms in this corpus, we recommend searching the normalized words with regular expressions. A regular expression also captures instances of the desired word that are still embedded in a Coptic bound group because our tokenizer missed them.

Lemma searches are now also possible. You may want to search for the lemma with a regular expression as well, so that you also find the lemmas of some compound words. For example, the following search finds entries containing ⲥⲱⲧⲙ in the lemma: lemma=/.*ⲥⲱⲧⲙ.*/.

The results include various forms of ⲥⲱⲧⲙ (including ⲥⲟⲧⲙ) lemmatized to the lexical entry “ⲥⲱⲧⲙ”, compound words lemmatized to ⲥⲱⲧⲙ or to a lexical entry containing ⲥⲱⲧⲙ, and some bound groups containing the word form ⲥⲱⲧⲙ that our tokenizer did not split:

Frequency table of normalized words lemmatized to ⲥⲱⲧⲙ or a lemma form containing ⲥⲱⲧⲙ (May 2016 Sahidica corpus)

As you can see, most of the hits are accurate (e.g., ⲥⲟⲧⲙ, ⲁⲧⲥⲱⲧⲙ, ⲣⲁⲧⲥⲱⲧⲙ, ⲣⲉϥⲥⲱⲧⲙ), but some Coptic bound groups were not tokenized properly (e.g., ⲉⲡⲥⲱⲧⲙ, ⲙⲁⲣⲟⲩⲥⲱⲧⲙ). We expect accuracy to increase as we add more texts to our corpora that have been machine-annotated and then manually corrected.
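ANNIS builds this frequency table for you, but the logic behind it may be easier to see in code. The following Python sketch is a minimal illustration, not how ANNIS is implemented. It assumes that each token carries a normalized form and a lemma (the field names "norm" and "lemma" and the toy tokens are hypothetical, not corpus data) and that a regular expression must match the whole annotation value, which is why the query is padded with .* on both sides. It shows why the lemma search finds ⲥⲟⲧⲙ while a search on normalized words alone does not.

```python
import re
from collections import Counter

# Hypothetical toy tokens as (norm, lemma) pairs; invented for illustration,
# NOT taken from the Sahidica corpus.
TOKENS = [
    ("ⲥⲱⲧⲙ", "ⲥⲱⲧⲙ"),
    ("ⲥⲟⲧⲙ", "ⲥⲱⲧⲙ"),        # variant form lemmatized to ⲥⲱⲧⲙ
    ("ⲥⲱⲧⲙ", "ⲥⲱⲧⲙ"),
    ("ⲣⲉϥⲥⲱⲧⲙ", "ⲣⲉϥⲥⲱⲧⲙ"),  # compound word
    ("ⲉⲡⲥⲱⲧⲙ", "ⲉⲡⲥⲱⲧⲙ"),    # bound group the tokenizer left unsplit
    ("ⲛⲟⲩⲧⲉ", "ⲛⲟⲩⲧⲉ"),      # unrelated word
]


def frequency_table(tokens, field, pattern):
    """Count normalized forms of tokens whose field ("norm" or "lemma") matches pattern."""
    if field not in ("norm", "lemma"):
        raise ValueError(f"unknown field: {field!r}")
    regex = re.compile(pattern)
    counts = Counter()
    for norm, lemma in tokens:
        value = norm if field == "norm" else lemma
        # fullmatch imitates an anchored match against the whole annotation value
        if regex.fullmatch(value):
            counts[norm] += 1
    return counts


lemma_hits = frequency_table(TOKENS, "lemma", ".*ⲥⲱⲧⲙ.*")
norm_hits = frequency_table(TOKENS, "norm", ".*ⲥⲱⲧⲙ.*")

# Print the lemma-based table, most frequent normalized form first
for form, count in lemma_hits.most_common():
    print(form, count)
```

In this toy data, ⲥⲟⲧⲙ shows up in lemma_hits but not in norm_hits, because only its lemma contains ⲥⲱⲧⲙ.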

Reading by individual chapter

You can also read these documents and view the linguistic analysis visualizations at data.copticscriptorium.org/urn:cts:copticLit:nt. The first documents listed (Gospel of Mark, 1 Corinthians) are manually annotated. Scroll down to “New Testament,” which is the full, machine-annotated Sahidica New Testament. Click “Chapter” to read each chapter as normalized Coptic (the English translation appears as a pop-up when you hover your cursor over the text). Click “Analytic” to see the normalized Coptic, the part-of-speech analysis, and the English translation for each chapter. Please keep in mind that the English translation is the World English Bible, a free, open-access New Testament translation; it was not translated directly from the Coptic.

Note: we know that our server is slow to generate the documents for this corpus. They may take several minutes to load, so please be patient. For faster access, use ANNIS: to read the chapters there, click on the corpus and then on the visualization icon.

Accessing document visualizations of the Sahidica corpus via ANNIS

We hope this corpus is useful to researchers.

ANNIS embeds for websites and blogs

Starting this week, there’s a new feature in our ANNIS web interface: ANNIS embeds.

The ANNIS interface can now give you an HTML snippet that you can embed in your webpage, blog post, and more.

Here’s an example of an embedded visualizer for a passage from Besa’s letter to Aphthonia, in which he recounts Aphthonia’s threat to go to another monastery:

(MONB.BA 47, urn:cts:copticLit:besa.aphthonia.monbba)

The code snippet for this visualization is as follows:

<iframe src="https://corpling.uis.georgetown.edu/annis/?id=31e2a273-426f-4aaf-922a-7fa0f0b311e1" width="100%" height="500"></iframe>

To get an embeddable snippet, click the share icon at the top left of a search result and choose the visualization you want. To share the entire set of results, use the share button at the top right of the results page. If you want to share an ANNIS search result via e-mail, you can still copy and paste the URL as before, but you can now also get a shareable link for an individual hit with the same share button.
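ANNIS generates this snippet for you. If you ever need to assemble one yourself, for example to embed several saved results on one page, the following Python sketch wraps a share URL in an iframe element. The function name annis_embed_snippet and its defaults (copied from the example above) are our own illustrative choices, not part of ANNIS.

```python
from html import escape


def annis_embed_snippet(share_url: str, width: str = "100%", height: int = 500) -> str:
    """Hypothetical helper: wrap an ANNIS share URL in an <iframe> for a web page or blog post."""
    # escape(..., quote=True) makes the values safe inside double-quoted attributes
    # (for example, "&" in a URL becomes "&amp;")
    return (
        f'<iframe src="{escape(share_url, quote=True)}" '
        f'width="{escape(width, quote=True)}" height="{height}"></iframe>'
    )


if __name__ == "__main__":
    url = "https://corpling.uis.georgetown.edu/annis/?id=31e2a273-426f-4aaf-922a-7fa0f0b311e1"
    print(annis_embed_snippet(url))
```

Note that an iframe needs an explicit closing tag in HTML; a self-closing form is not valid there.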

Let us know if you have any feedback!

New web application to read documents, cite data, and access data (BETA release)

We’re very excited to announce a new feature at Coptic SCRIPTORIUM. We’ve created a new online web application that we think will make it much easier for users to read and reference our material.

Users can read Coptic documents as HTML pages generated from the data visualizations. There are also direct links to our search tool ANNIS and to our GitHub repository for downloading files.

We also have a system of canonical URNs that provide persistent identifiers for documents, texts, authors, and text groups. This means you can cite our data in your scholarship, and your readers will be able to go back to our site and find the most recent versions of the documents you have cited.
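These URNs follow the general Canonical Text Services (CTS) pattern urn:cts:namespace:identifiers, where the dot-separated identifiers run from text group to work to version (and optionally exemplar), with an optional passage after a further colon. For example, urn:cts:copticLit:besa.aphthonia.monbba would split into text group besa, work aphthonia, and version monbba. The Python sketch below splits a URN into those parts and builds a reading link. It is a minimal illustration under these assumptions: the helper names and the http://data.copticscriptorium.org/<urn> link pattern (inferred from the examples in these posts) are ours, not an official API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CtsUrn:
    namespace: str
    textgroup: str
    work: Optional[str] = None
    version: Optional[str] = None
    exemplar: Optional[str] = None
    passage: Optional[str] = None


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN such as urn:cts:copticLit:besa.aphthonia.monbba into its parts."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    ids = parts[3].split(".")
    if not ids[0] or len(ids) > 4:
        raise ValueError(f"unexpected work identifier: {parts[3]!r}")
    # Pad to four slots: textgroup, work, version, exemplar
    textgroup, work, version, exemplar = (ids + [None] * 4)[:4]
    passage = parts[4] if len(parts) == 5 else None
    return CtsUrn(parts[2], textgroup, work, version, exemplar, passage)


def reading_url(urn: str) -> str:
    """Hypothetical helper: link to a URN on the data site (pattern inferred from these posts)."""
    parse_cts_urn(urn)  # reject malformed URNs before building the link
    return f"http://data.copticscriptorium.org/{urn}"


print(parse_cts_urn("urn:cts:copticLit:besa.aphthonia.monbba"))
print(reading_url("urn:cts:copticLit:nt"))
```

Because a citation stores the URN rather than a file location, a link built this way should keep pointing at the current version of a document even as the underlying files change.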

We’ve got a little video to introduce it, or dive right in at http://data.copticscriptorium.org.

This is a BETA release, which means you might see a few things that need to be ironed out. (For one thing, our small corpus of documentary papyri is not yet in the system. Stay tuned; in the meantime, you can still read and query it in ANNIS.) We are pretty pleased with how it’s turning out and look forward to future developments.

Many thanks to Bridget Almas of the Perseus Digital Library for helping us develop a canonical referencing system, and to Archimedes Digital for implementing the application.



Releasing a new translation of a section of Shenoute’s Acephalous Work 22

An English translation of part of Shenoute’s Acephalous Work 22 is now available. Anthony Alcock of the University of Kassel has contributed a translation of White Monastery Manuscript YA (MONB.YA), pages 421-28, which corresponds to Leipoldt’s vol. 4, pp. 124-29. The Coptic text, the English translation, and various annotations are available. Many thanks to Dr. Alcock for the contribution! We are in the process of making a major addition to our website’s functionality that will let you read and find these texts more easily. In the meantime, you can access the text via our ANNIS search and visualization tool. Click the little page icon next to the shenoute.a22 corpus listing to see the visualizations.

List of corpora in ANNIS

Read the English translation directly in the linguistic analysis view, or read it as a pop-up when you hover over the Coptic in the normalized view.

List of visualizations in ANNIS

Or search the English in ANNIS using a search string; to search for the word “work” in the English translations of Acephalous Work 22, use translation=/.*work.*/.
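Why the .* on both sides? Assuming ANNIS matches a regular expression against the entire annotation value (as the padded query suggests), a bare pattern like work would only find spans whose whole translation is exactly “work”. The Python sketch below imitates that behavior with re.fullmatch on a few hypothetical translation strings (not corpus data). It also shows that the padded pattern matches any substring, such as “homework”, and that it is case-sensitive unless you add a character class such as [Ww].

```python
import re

# Hypothetical translation values, one per annotated span (not corpus data).
TRANSLATIONS = [
    "we work together",
    "Work hard",
    "his works",
    "homework",
    "labor",
]


def anchored_search(pattern, values):
    """Return the values that pattern matches in full, imitating an anchored regex search."""
    regex = re.compile(pattern)
    return [v for v in values if regex.fullmatch(v)]


print(anchored_search("work", TRANSLATIONS))         # whole value must be exactly "work"
print(anchored_search(".*work.*", TRANSLATIONS))     # "work" anywhere, lowercase only
print(anchored_search(".*[Ww]ork.*", TRANSLATIONS))  # also matches a capitalized "Work"
```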

(Originally posted in March 2015)

Entire Sahidica New Testament now available

The entire Sahidica New Testament (machine-annotated) is now available. It was tokenized and tagged for part of speech fully automatically with our tools, with no manual editing or correction. Visit our corpora page for more information, or just jump in and search it in ANNIS.


(Originally posted in March 2015)

Corpora and how to use ANNIS

Coptic SCRIPTORIUM provides Coptic texts for reading, analysis, and complex searches. For a full list of our text corpora, see our corpora page. We have also added explanations of some of the people and terms you will encounter to our main site. A video tutorial by Amir Zeldes and Caroline T. Schroeder on how to search our database with the tool ANNIS is also available.


(Originally posted in December of 2014)

Introducing the project texts and data model, and how to use ANNIS

To learn more about Coptic SCRIPTORIUM’s corpora, data model, and features, here is a video on how to use the tool ANNIS to explore the world of Coptic. Thanks go to Caroline T. Schroeder for the video, from her YouTube channel.

(Originally posted on copticscriptorium.org)

