We are happy to release the following new and revised documents to our corpora. A copy of the official release notes is below. The data is available for download from GitHub in TEI XML, PAULA XML, and relANNIS formats. The corpora can be viewed and accessed at data.copticscriptorium.org, and they all can be queried in ANNIS. We plan for another release with more documents in March 2017.
As always: if you have comments or corrections, please submit a pull request on GitHub or send us an email at contact [at] copticscriptorium [dot] org.
This corpus release includes new or revised documents for:
- 1 Corinthians: machine and manual annotations; new documents are chapters 13-16; edits to already published chapters include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
- Mark: machine and manual annotations; edits to already published chapters include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
- Not Because a Fox Barks (Shenoute): machine and manual annotations; edits to already published document include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
- Besa letters: machine and manual annotations; edits to already published documents include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
All other documents in our corpora are unchanged from the last release.
New metadata and corpus feature: We are beginning to add to our documents a metadata field called “order” which will allow us to present documents in a logical order for browsing or reading. We’ve implemented it in the Besa letters corpus and will roll it out for other corpora in the future. Our Document Retrieval web application (data.copticscriptorium.org) now lists the documents in the order in which they appear in the manuscript tradition when you filter for that corpus. Thus, users who wish to read or browse the documents in that order can do so easily.
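As a rough illustration of how an “order” field can drive a reading list, here is a minimal Python sketch. It is not the code behind our web application: the document titles, the dictionary layout, and the assumption that “order” holds a whole number stored as text are all hypothetical.

```python
from typing import Dict, List, Tuple


def sort_by_order(documents: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort documents by their 'order' metadata value.

    Assumption (hypothetical): 'order' is a whole number stored as text.
    Documents with a missing or non-numeric 'order' are placed last,
    sorted by title.
    """
    def sort_key(doc: Dict[str, str]) -> Tuple[int, int, str]:
        title = doc.get("title", "")
        try:
            return (0, int(doc.get("order")), title)
        except (TypeError, ValueError):
            # No usable 'order' value: push to the end of the list.
            return (1, 0, title)

    return sorted(documents, key=sort_key)


# Hypothetical sample metadata, not taken from the actual corpus.
docs = [
    {"title": "Letter C", "order": "3"},
    {"title": "Letter A", "order": "1"},
    {"title": "Untitled fragment"},
    {"title": "Letter B", "order": "2"},
]

for doc in sort_by_order(docs):
    print(doc.get("order", "-"), doc["title"])
```

Putting documents that lack the field at the end, rather than raising an error, reflects the gradual rollout described above: corpora without “order” values would still list cleanly.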
Version control: We have set the version number on our document metadata, corpus metadata (in ANNIS), and release information (in GitHub) all to match. Version #s and dates are only revised when a document is revised. So if no documents in our AP corpus have been revised and republished, and no new documents for that corpus have been published, then the version # on the documents and corpus does not change. Only new and newly edited documents (and their corpora) will have version 2.2.0 and date 08 December 2016 in their metadata.
We at Coptic SCRIPTORIUM have been fortunate to have received three grants from the National Endowment for the Humanities for our work. We cannot thank the NEH enough for its support. So much of what we have done over the past 2+ years could not have happened without this funding.
We just completed a White Paper for a Foundations grant from the Humanities Collections and Reference Resources program in the Division of Preservation and Access. The grant, “Coptic SCRIPTORIUM: Digitizing a Corpus for Interdisciplinary Research in Ancient Egyptian,” ran from May 2014 until now.
Our White Paper documents our work and especially the standards and practices we developed for digitizing a pilot Coptic corpus.
If you want to know more about what truly interdisciplinary DH work looks like, check it out. We try to break down the complexities of creating a digital corpus for research in linguistics, history, religious studies, biblical studies, and manuscript studies. We’ve got data models, workflows, digitization standards, transcription guidelines, and more all laid out for you here.
There is so much more to do; this is only a start. Thanks to everyone who has had faith in our work.
White Paper, NEH Grant PW-51672-14 (Preservation and Access): “Coptic SCRIPTORIUM: Digitizing a Corpus for Interdisciplinary Research in Ancient Egyptian” 29 August 2016
We’re very excited to announce a new feature at Coptic SCRIPTORIUM. We’ve created a new online web application that we think will allow users to read and reference our material much more easily.
Users can read Coptic documents on HTML pages taken from the data visualizations. There are also easy links to our search tool ANNIS and to our GitHub repository for downloading files.
And we have a system of canonical URNs that provide persistent identifiers for documents, texts, authors, and text groups. This means you can cite our data in your scholarship, and then readers will be able to go back to our site and find our most recent versions of the documents you have cited.
We’ve got a little video to introduce it, or dive right in at http://data.copticscriptorium.org.
This is a BETA release, which means you might see a few things that need to be ironed out. (For one thing, our small corpus of documentary papyri is not yet in the system; stay tuned, and in the meantime you can still read and query them in ANNIS.) We are pretty pleased with how it’s turning out and look forward to future developments.
Many thanks to Bridget Almas of the Perseus Digital Library for helping us develop a canonical referencing system, and to Archimedes Digital for implementing the application.
We’ve released some new corpora (the papyri.info texts, for example) and some new documents to our existing corpora. You can download everything in three different formats from our GitHub repository: TEI XML, PAULA XML, and relANNIS.
To learn more about Coptic SCRIPTORIUM’s corpora, data model, and features, here is a video on how to use the tool ANNIS to explore the world of Coptic. Thanks go to Caroline T. Schroeder for the video from her YouTube channel.
(Originally posted on copticscriptorium.org)