We are happy to release the following new and revised documents to our corpora. A copy of the official release notes is below. The data is available for download from GitHub in TEI XML, PAULA XML, and relANNIS formats. The corpora can be viewed and accessed at data.copticscriptorium.org, and they all can be queried in ANNIS. We plan for another release with more documents in March 2017.
As always: if you have comments or corrections, please submit a pull request on GitHub or send us an email at contact [at] copticscriptorium [dot] org.
____
This corpus release includes new or revised documents for:
- 1 Corinthians: machine and manual annotations; new documents are chapters 13-16; edits to already published chapters include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
- Mark: machine and manual annotations; edits to already published chapters include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
- Not Because a Fox Barks (Shenoute): machine and manual annotations; edits to already published document include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
- Besa letters: machine and manual annotations; edits to already published documents include corrections and modifications to lemmas, normalization, part of speech, and/or tokenization to conform to evolving guidelines
All other documents in our corpora are unchanged from the last release.
New metadata and corpus feature: We are beginning to add to our documents a metadata field called “order” which will allow us to present documents in a logical order for browsing or reading. We’ve implemented it in the Besa letters, corpus and will roll it out for other corpora in the future. Our Document Retrieval web application (data.copticscriptorium.org) now lists the documents in the order in which they appear in the manuscript tradition, when you filter for that corpus. Thus, users who wish to read or browse the documents in that order can do so easily.
Version control: We have set the version number on our document metadata, corpus metadata (in ANNIS), and release information (in GitHub) all to match. Version #s and dates are only revised when a document is revised. So if no documents in our AP corpus have been revised and republished, or no new documents for that corpus have been published, then the version # on the documents and corpus do not change. Only new and newly edited documents (and their corpora) will have version 2.2.0 and date 08 December 2016 in their metadata.