Coptic SCRIPTORIUM Blog

Coptic SCRIPTORIUM at ISAW Conference

Coptic SCRIPTORIUM’s Caroline T. Schroeder will be giving the keynote at the “Future Philologies” Conference at the Institute for the Study of the Ancient World (ISAW) in New York City today. We’re looking forward to conversations with colleagues old and new.

Geographic data now available via Pelagios

February 21, 2018 / Elizabeth Platte

Coptic SCRIPTORIUM has partnered with Pelagios Commons to make geographic data drawn from published Coptic SCRIPTORIUM texts available via Pelagios’ Peripleo search engine and API. Each entry links a geographic location, identified by its Pleiades resource number, to a query for that term in ANNIS, our search and visualization interface. Because each entry points to a query rather than to individual occurrences, each geographic entity in our data appears only once in the Pelagios data set, regardless of how many times it appears in our published texts. Queries cover corpora published as of April 2017 (release 2.3.1), including documents added to those corpora since then, but they do not include corpora new to our most recent release. The list of geographic entities in this data set also dates to April 2017 and does not include locations that appear only in more recent publications.

The Coptic SCRIPTORIUM data set as it appears in Peripleo

Find the full Coptic SCRIPTORIUM dataset on Peripleo at: http://peripleo.pelagios.org/ui#selected=http%3A%2F%2Fcorpling.uis.georgetown.edu%2Fannis%2Fscriptoriummy-dataset

Turtle files prepared for this partnership are publicly available on GitHub: https://github.com/CopticScriptorium/pelagios-dataset-summary.
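To make the one-entry-per-place rule concrete, here is a minimal Python sketch. It assumes the input is a list of (Pleiades ID, place-name term) pairs, one per mention in the texts. The function name, the record fields, and the shape of the ANNIS query it builds (an exact match on the `norm` layer, with alternative spellings joined by AQL's `|`) are hypothetical illustrations, not the pipeline that actually produced the Pelagios Turtle files.

```python
def build_place_entries(occurrences):
    """Collapse per-mention (pleiades_id, term) pairs into one entry per place.

    Hypothetical sketch: the record fields and the ANNIS query shape are
    illustrative assumptions, not the format of the published Turtle files.
    """
    # pleiades_id -> distinct terms, kept in first-seen order
    # (dicts preserve insertion order in Python 3.7+)
    terms_by_place = {}
    for pleiades_id, term in occurrences:
        terms = terms_by_place.setdefault(pleiades_id, [])
        if term not in terms:
            terms.append(term)

    entries = []
    for pleiades_id, terms in terms_by_place.items():
        # Assumed query shape: exact match on the normalized layer for each
        # attested spelling, joined with AQL's "|" (OR).
        query = " | ".join(f'norm="{term}"' for term in terms)
        entries.append({"pleiades_id": pleiades_id, "annis_query": query})
    return entries
```

However many times a place is mentioned, it yields a single entry whose query lets ANNIS find every occurrence, which is why the Pelagios data set lists each entity once.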

New corpora – release 2.4.0 is out!

November 23, 2017 / Amir Zeldes

We are pleased to announce release 2.4.0, which adds new corpora. Tagged and lemmatized corpora are available for reading and download at [1] and are fully searchable at [2]:

[1] http://data.copticscriptorium.org/

[2] https://corpling.uis.georgetown.edu/annis/scriptorium

This release contains new data contributed by Alin Suciu, David Brakke and Diliana Atanassova, as well as out-of-copyright edition material contributed by the Marcion project. New data in this release includes excerpts from:

The Martyrdom of Saint Victor the General (2033 tokens)
The Canons of Apa Johannes (438 tokens)
Pseudo-Theophilus On the Cross and The Thief (2814 tokens)
Shenoute, Some Kinds of People Sift Dirt (888 tokens)
11 additional Apophthegmata Patrum, bringing the total released to 63 apophthegms (7077 tokens)

All texts are also linked to the Coptic Dictionary Online (https://corpling.uis.georgetown.edu/coptic-dictionary/), which has been updated with frequency information that includes these texts. We would like to thank the annotators and translators of these data sets, several of whom are new to the project; without their work, the corpora would not be online:

Alexander Turtureanu, Alin Suciu, Amir Zeldes, Caroline T. Schroeder, Christine Luckritz Marquis, Dana Robinson, David Brakke, David Sriboonreuang, Diliana Atanassova, Elizabeth Davidson, Elizabeth Platte, Gianna Zipp, J. Gregory Given, Janet Timbie, Jennifer Quigley, Laura Slaughter, Lauren McDermott, Marina Ghaly, Mitchell Abrams, Paul Lufter, Rebecca Krawiec, Saskia Franck and Tobias Paul

We hope everyone will find this release useful and look forward to releasing more data in the coming year!

New release of the Coptic Treebank

August 26, 2017 / Amir Zeldes

Coptic Treebank release 2.1, now with three Letters of Besa!

We are pleased to announce the release of the latest version of the Coptic Treebank, now containing three Letters of Besa:

On Lack of Food
To Aphthonia
To Thieving Nuns

This brings the total corpus size up to 10,499 tokens, thanks to annotation work by Elizabeth Davidson and Amir Zeldes, building on earlier transcription and tagging work by Coptic Scriptorium and KELLIA partners. Special thanks are due to So Miyagawa for providing the transcription for On Lack of Food. The corpus will continue to grow as we work to annotate more data and improve the accuracy of our automatic syntax parser for Coptic. You can search the current version of the corpus in ANNIS here:

https://corpling.uis.georgetown.edu/annis/scriptorium

Or download the latest raw annotated data from GitHub here:

https://github.com/universalDependencies/UD_Coptic/tree/dev

Please let us know if you find any errors or have any feedback on the treebank!

Old Testament corpus release

June 12, 2017 / Amir Zeldes

We are happy to announce the release of the automatically annotated Sahidic Old Testament corpus (corpus identifier: sahidic.ot), based on the texts kindly provided by the CrossWire Bible Society’s SWORD Project, thanks to work by Christian Askeland, Matthias Schulz and Troy Griffitts.

Like the Sahidica New Testament corpus, the corpus is available for search in ANNIS and includes word segmentation, morphological analysis, language of origin for loanwords, part-of-speech tagging and automatically aligned verse translations (except for parts of Jeremiah). Because the analysis is fully automatic, please expect some errors. The aligned translation is taken from the World English Bible. Here is an example search for the word ‘soul’:

norm="ⲯⲩⲭⲏ"

You can also read entire chapters in ANNIS or at our repository, which look like this:

urn:cts:copticLit:ot.gen.crosswire:09
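The chapter identifier above has the shape of a CTS URN: `urn:cts:`, then a namespace, a dot-separated work identifier, and a passage. As a hedged illustration, the Python sketch below splits such an identifier into its parts, assuming the common CTS convention that the work identifier breaks down into text group, work, version and (optionally) exemplar, and that the last component is the passage (here, presumably, chapter 9 of Genesis). The class, function and field names are ours, for illustration only.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    exemplar: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN such as 'urn:cts:copticLit:ot.gen.crosswire:09'."""
    parts = urn.split(":")
    # Expect urn:cts:<namespace>:<work id>[:<passage>]
    if len(parts) not in (4, 5) or parts[0].lower() != "urn" or parts[1].lower() != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_id = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 and parts[4] else None

    # Dot-separated work identifier: text group, work, version, exemplar.
    pieces = work_id.split(".")
    if len(pieces) > 4:
        raise ValueError(f"too many work components in {urn!r}")
    pieces += [None] * (4 - len(pieces))  # missing levels become None
    textgroup, work, version, exemplar = pieces
    return CtsUrn(namespace, textgroup, work, version, exemplar, passage)
```

For the identifier above, this yields namespace `copticLit`, text group `ot`, work `gen`, version `crosswire` and passage `09`.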

We hope that this resource will be helpful to Coptic scholars – please let us know if you have any questions or comments!

New Tutorials & Recent Workshop Wrap-up

June 8, 2017 / ctschroeder

Coptic Scriptorium team members Caroline T. Schroeder and Rebecca Krawiec recently led a workshop on Digital Corpora and Digital Editions at the North American Patristics Society annual meeting. We created detailed tutorials for both beginners and more advanced users and posted them on our GitHub site. These tutorials cover:

an introduction to digital editions and corpora
working with the online Coptic Dictionary
simple and complex searches of Coptic literature in our ANNIS database
creating a digital corpus with EpiDoc TEI XML annotations and natural language processing

We invite everyone to use these tutorials on their own; they’re designed for self-paced work.

We were pleased to participate in the pre-conference Digital Humanities workshops, which also included a session on mapping led by Sarah Bond and Jennifer Barry. Our attendees came from four countries and ranged from graduate students to senior professors. Thanks to NAPS for hosting these workshops, and to the NEH and the DFG for making our work possible.

New Release of Corpora

April 25, 2017 / ctschroeder

We’re pleased to announce that we’ve released more texts in our corpora.

The Sayings of the Desert Fathers (Apophthegmata Patrum) corpus now contains 52 sayings/apophthegms (more than 7,100 words). We have edited previously published sayings for consistency in annotation, and we’ve released new sayings edited by Christine Luckritz Marquis, Elizabeth Platte, and our newest contributor, Dana Robinson. Read or browse the Sayings online. Click on the “Analytic” button to read a saying in Coptic with a parallel English translation and part-of-speech tags for each Coptic word.

Or click on the “Norm” button (short for “normalized”) to read the Coptic. Clicking on any Coptic word in the normalized visualization will take you to an online Coptic-English dictionary, and hovering your cursor over a passage will show its English translation in a pop-up window.

AP 96 Normalized view screenshot

Shenoute’s I See Your Eagerness now includes numerous newly published manuscript fragments (over 16,000 words). We have also edited previously published witnesses for consistency in annotation. These documents were transcribed and collated from the manuscripts by David Brakke and annotated for digital publication by Rebecca Krawiec. Now you can read Shenoute’s I See Your Eagerness in Coptic nearly in its entirety. We provide several paths for you to explore this text:

Read the text from start to end, beginning with the first manuscript fragment. Click “NEXT” to keep reading.
MONB.GL fragment D diplomatic visualization

(No English translation is provided, but in the “Note” metadata field below the Coptic, you can find page numbers for David Brakke’s and Andrew Crislip’s translation in their book, Discourses of Shenoute.) “Next” and “Previous” buttons will take you through the path we consider optimal for reading the text. This path wanders through various manuscript witnesses, following the path with the fewest lacunae. Want to see parallel witnesses? Check out the “Witness” metadata field below the text.

MONB.GL 29-30 metadata screenshot
Read through all surviving pages in one codex/manuscript witness by filtering for a particular codex. For example, if you want to read through all the fragments of codex MONB.GL, go to data.copticscriptorium.org, use the menu to filter by Corpus for the shenoute.eagerness corpus, and then filter by manuscript name for the MONB.GL codex. Click through the documents in that codex.
Perform a search/query in our ANNIS database. For example, search for all occurrences of “wicked” (ⲡⲟⲛⲏⲣⲟⲛ) in the corpus, or search for occurrences of “wicked” while controlling for duplicate hits in parallel manuscript witnesses. See our guide to queries in ANNIS for more tips.

You also can download the entire corpus in TEI XML, PAULA XML, and relANNIS formats from our GitHub site.

New release – Coptic Treebank V2

March 16, 2017 / Amir Zeldes

We are happy to announce the release of version 2 of the Coptic Universal Dependencies Treebank. With over 8,500 tokens from 14 documents, the Treebank is the largest syntactically annotated resource in Coptic. The annotation scheme follows the Universal Dependencies guidelines, version 2, and is therefore comparable with UD data from 70 treebanks in 50 languages, including English, Latin, Classical Greek, Arabic, Hebrew and more.

You can search the Treebank using ANNIS. For example, the following query finds verbs that dominate a complement clause (e.g. “say … that …”):

pos="V" ->dep[func="ccomp"] norm

More about the Treebank:
https://corpling.uis.georgetown.edu/coptic-treebank/
More about Universal Dependencies:
http://universaldependencies.org/
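For readers who download the treebank from GitHub rather than searching it in ANNIS, the pattern from the query above (a verb heading a `ccomp` dependent) can also be found directly in the CoNLL-U files that UD treebanks use. The Python sketch below is a minimal illustration, not a tool we distribute. It assumes the standard ten tab-separated CoNLL-U columns and checks the universal POS column for `VERB`, whereas the ANNIS query uses the Coptic SCRIPTORIUM tag `V`; adjust the `upos` parameter (or read the XPOS column instead) if your files differ.

```python
def verb_ccomp_pairs(conllu_text, upos="VERB"):
    """Return (head_form, complement_head_form) pairs for ccomp dependents of verbs.

    Minimal CoNLL-U reader: columns are ID, FORM, LEMMA, UPOS, XPOS, FEATS,
    HEAD, DEPREL, DEPS, MISC. Multiword-token ranges (e.g. "1-2") and empty
    nodes (e.g. "3.1") are skipped, since they carry no basic HEAD/DEPREL.
    """
    pairs = []
    tokens = {}  # token ID -> (FORM, UPOS) for the current sentence
    ccomps = []  # (head ID, dependent ID) for ccomp relations in the sentence

    def flush():
        # Resolve the sentence's ccomp relations once all its tokens are known.
        for head_id, dep_id in ccomps:
            head = tokens.get(head_id)  # head "0" (the root) is never in tokens
            if head is not None and head[1] == upos:
                pairs.append((head[0], tokens[dep_id][0]))
        tokens.clear()
        ccomps.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()  # a blank line ends a sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        tok_id = cols[0]
        if "-" in tok_id or "." in tok_id:
            continue
        tokens[tok_id] = (cols[1], cols[3])
        # Language-specific subtypes (written "ccomp:xxx") count as ccomp.
        if cols[7].split(":")[0] == "ccomp":
            ccomps.append((cols[6], tok_id))
    flush()  # in case the text does not end with a blank line
    return pairs
```

Because this sketch and the ANNIS query read different tag layers, their hit counts may not match exactly.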

White Paper for NEH DH Startup grant now online

December 30, 2016 / ctschroeder

We have concluded our round of “startup” funding from the National Endowment for the Humanities Office of Digital Humanities. Our White Paper documents our activities and our outcomes for the period, including the following grant products:

A Digitized Coptic Corpus in Multiple Formats and Visualizations
Digital and Computational Tools (tokenizer, part-of-speech tagger, lemmatizer, and more)
ANNIS Database instance to query and search the multilayer corpus
Documentation in the toolsets, on our wiki, and on our blog
Web application for users to read and cite visualizations of textual data
Symposium and workshop (“Digital Coptic 2,” March 2015) at Georgetown University + public tutorial and workshop at the Coptic Congress
Articles and conference papers to distribute the results of our work

CHECK IT OUT! We heartily thank the NEH ODH for its support, as well as the NEH Preservation and Access division for its concurrent grant. We also thank our many participants, contributors, and collaborators, all of whom are acknowledged in the White Paper.

White Paper for NEH ODH Startup Grant

See also our White Paper for the P&A grant submitted in August.

December 2016 corpus release (v 2.2.0)

December 16, 2016 / ctschroeder

We are happy to release the following new and revised documents to our corpora. A copy of the official release notes is below. The data is available for download from GitHub in TEI XML, PAULA XML, and relANNIS formats. The corpora can be viewed at data.copticscriptorium.org, and all of them can be queried in ANNIS. We plan another release with more documents in March 2017.

As always: if you have comments or corrections, please submit a pull request on GitHub or send us an email at contact [at] copticscriptorium [dot] org.

____

This corpus release includes new or revised documents for:

1 Corinthians: machine and manual annotations; new documents are chapters 13-16; edits to already published chapters include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
Mark: machine and manual annotations; edits to already published chapters include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
Not Because a Fox Barks (Shenoute): machine and manual annotations; edits to already published document include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
Besa letters: machine and manual annotations; edits to already published documents include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines

All other documents in our corpora are unchanged from the last release.

New metadata and corpus feature: We are beginning to add to our documents a metadata field called “order,” which allows us to present documents in a logical order for browsing or reading. We’ve implemented it in the Besa letters corpus and will roll it out for other corpora in the future. When you filter for that corpus, our Document Retrieval web application (data.copticscriptorium.org) now lists the documents in the order in which they appear in the manuscript tradition, so users who wish to read or browse the documents in that order can do so easily.
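The effect of the “order” field on browsing can be sketched as a simple sort. This Python example is hypothetical: it assumes each document's metadata is a dictionary, that “order” holds a number (possibly stored as a string), and that documents without the field should follow the ordered ones alphabetically by a `title` key we made up for illustration.

```python
def reading_order(documents):
    """Sort document metadata dicts for reading.

    Documents with an "order" value come first, in numeric order; documents
    without one follow, sorted by their (hypothetical) "title" field.
    """
    def sort_key(doc):
        order = doc.get("order")
        if order is None:
            return (1, 0.0, doc.get("title", ""))
        # Compare numerically: as strings, "10" would sort before "9".
        return (0, float(order), doc.get("title", ""))

    return sorted(documents, key=sort_key)
```

Converting to a number is the important design choice here, since manuscript order should not depend on how many digits a position has.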

Version control: We have set the version number in our document metadata, corpus metadata (in ANNIS), and release information (in GitHub) to match. Version numbers and dates change only when a document is revised. So if no documents in our AP corpus have been revised and republished, and no new documents for that corpus have been published, then the version number on the documents and corpus does not change. Only new and newly edited documents (and their corpora) will have version 2.2.0 and the date 08 December 2016 in their metadata.
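The versioning rule above amounts to a small update procedure. The Python sketch below illustrates it under stated assumptions: a hypothetical `Doc` record with a `changed` flag marking documents that are new or edited in this release, and a mapping from corpus name to its current version. Only changed documents, and the corpora that contain them, receive the new version and date; everything else keeps its existing values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Doc:
    corpus: str
    name: str
    version: str
    date: str
    changed: bool  # hypothetical flag: new or edited since the last release


def apply_release(docs, corpus_versions, version, date):
    """Return (updated_docs, updated_corpus_versions) for a new release."""
    touched = set()  # corpora containing at least one new or edited document
    updated_docs = []
    for doc in docs:
        if doc.changed:
            updated_docs.append(replace(doc, version=version, date=date, changed=False))
            touched.add(doc.corpus)
        else:
            updated_docs.append(doc)  # unchanged documents keep version and date
    updated_corpora = dict(corpus_versions)
    for corpus in touched:
        updated_corpora[corpus] = version  # also covers corpora new in this release
    return updated_docs, updated_corpora
```

For the December 2016 release, calling `apply_release(docs, corpus_versions, "2.2.0", "08 December 2016")` would leave an untouched corpus such as AP at its previous version.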