Coptic SCRIPTORIUM Blog

New born-digital edition of a Shenoute fragment

This winter we’ve released a new document we’ve been working on for a while. It’s a born-digital publication: to our knowledge, this document has never been published before. The edition and annotations here were produced by Elizabeth Platte (Reed College) and Rebecca S. Krawiec (Canisius College) directly from digital photographs of the manuscript for digital publication.

Read the manuscript transcription or the normalized text, or query it in our database.

It’s a section of one of Shenoute’s texts for monks in volume three of his monastic Canons. This 14-page (seven-folio) fragment now resides in the Bibliothèque Nationale in Paris and originally derives from the White Monastery codex known by the siglum MONB.YB. We’ve released text and annotations for pages 307-320, which equate to the BN call number Ms Copte 130/2 ff. 51-57. Digital photos are now available online at Gallica.

We’ve transcribed the text from images of the manuscript and then annotated it for manuscript information. We’ve also broken the text down into the Coptic phrases known as “bound groups,” words, and morphs. Then we’ve annotated it all for part of speech, loan words (Greek, Latin, etc.), and lemmas.

By “we” I mean primarily Platte and Krawiec. Schroeder and Zeldes provided editorial review, as per our policy of having every published digital document reviewed by at least one editor.

As far as we know, this fragment has never been published; nor has any translation ever been published. We don’t have a translation yet, either.

As our first born-digital edition, this document is an experiment for us. Everything else we’ve worked with has already been published in an edition, and sometimes also has an English translation published by another scholar. Even though we digitize from the original manuscript, previous editions and translations make the transcription, annotation, and editing process much easier. This document is an unknown quantity.

This means we expect to have errors and welcome feedback on the document.

Since no translation exists yet, our goal is to translate the document ourselves and then revise the transcription and annotations as we work. We hope to publish an essay on how the digital annotation process affected the creation of an edition.

In the meantime, use it to practice your Coptic. Let us know if you find errors. We’ll credit you.

2015 Society of Biblical Literature (SBL) Annual Meeting

November 17, 2015 / David Sriboonreuang

The 2015 SBL conference is the largest gathering of scholars interested in the study of religion in the world. Each session showcases the latest research and fosters collegial contacts. The 2015 meeting is taking place in Atlanta, GA, and members of the Coptic SCRIPTORIUM team will be presiding over and presenting at sessions.

Digital Humanities in Biblical, Early Jewish, and Christian Studies; Papyrology and Early Christian Backgrounds will be chaired by Caroline T. Schroeder. It takes place in International A (International Level) at the Marriott on 11/23/2015 from 9:00 AM to 11:30 AM.

Also,

“A Wild Patience Has Taken this Far”: Future Avenues of Feminist Scholarship will also have Caroline T. Schroeder presenting. It takes place in Room A602 (Atrium Level) at the Marriott from 4:00 PM to 6:30 PM.

and

Art and Religions of Antiquity; Violence and Representations of Violence among Jews and Christians will be chaired by Christine Luckritz Marquis. It takes place in Room 209 (Level 2) at the Hilton on 11/23/2015 from 9:00 AM to 11:30 AM.

We are looking forward to their presentations and hope others are as well!

New Coptic morphological analysis

November 9, 2015 / Amir Zeldes

A new component has been added to the Coptic NLP pipeline at:

https://corpling.uis.georgetown.edu/coptic-nlp/

This adds morphological analysis of complex word forms, including multiple affixes (e.g. derived nouns with affixes such as Coptic ‘mnt’, equivalent to English ‘-ness’), compounds (noun-noun combinations) and complex verbs. Using the automatic morphological analysis will substantially reduce the amount of manual work involved in putting new texts online, meaning we will be able to concentrate on getting more texts out there faster, as well as developing new tools and ways of interacting with the data.
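As a rough illustration of what segmenting a derived noun involves, here is a minimal sketch of recursive prefix stripping against a toy lexicon. It is not the algorithm the pipeline uses. The function name `segment`, the transliterated prefix list, and the one-word stem lexicon are all assumptions made for this example; a real analyzer works on Coptic script with a full lexicon and part-of-speech context.

```python
# Toy data, transliterated for readability (hypothetical, not the pipeline's lexicon).
PREFIXES = ("mnt",)   # abstract-noun prefix, roughly like English '-ness'
STEMS = {"rro"}       # 'king'; so 'mntrro' is roughly 'kingship'


def segment(word, prefixes=PREFIXES, stems=STEMS):
    """Return a list of morphs for `word`, or None if no analysis is found.

    Strips known prefixes from the left, one at a time, until what
    remains is a known stem. This handles stacked prefixes as well.
    """
    if word in stems:
        return [word]
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = segment(word[len(prefix):], prefixes, stems)
            if rest is not None:
                return [prefix] + rest
    return None


print(segment("mntrro"))   # ['mnt', 'rro']
print(segment("unknown"))  # None
```

Returning `None` for unanalyzable words, rather than guessing, leaves those cases for manual review.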

Coptic NLP pipeline Part 2

October 23, 2015 / David Sriboonreuang

With the creation of the Coptic NLP (Natural Language Processing) pipeline by Amir Zeldes, it is now possible to run all our NLP tools in one step, without downloading and running each one individually. The web application tokenizes bound groups into words and normalizes the spelling of words and diacritics. It also tags for part of speech, lemmatizes, and tags the language of origin of borrowed (foreign) words. The interface is XML tolerant (it preserves tags in the input), and the output is tagged in SGML. One option encodes the line breaks within a word or sentence, which is useful for encoding manuscripts. Keep in mind, however, that the interface is still in beta, so double-check the results.

As an example, the screenshot below is a snippet from I See Your Eagerness from manuscript MONB.GL29.

Notice that it contains an XML tag encoding a letter as “large ekthetic”. The tag marks the letter alpha as a large character set in the left margin of the manuscript’s column of text. This tag will be preserved in the output.

The results are shown above. Bound groups are shown along with the part-of-speech tag, abbreviated as “pos”. The snippet from I See Your Eagerness has also been lemmatized, shown as “lemma”. Near the bottom of the screenshot, the language of origin of borrowed (foreign) words in the snippet has been identified as “Greek”. These tags also correspond to the annotation layers you see in our multi-layer search and visualization tool ANNIS.
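To make “XML tolerant” more concrete, here is a minimal sketch of one way a tool can change the text while leaving tags untouched. It is an illustration only, not the pipeline’s code. The function names, the `hi rend` markup in the sample, and the simple mark-stripping step (standing in for real normalization) are all assumptions.

```python
import re
import unicodedata

# Matches one tag. Simplification: assumes no '>' inside attribute values.
TAG_PATTERN = re.compile(r"(<[^>]+>)")


def strip_combining_marks(text):
    """Stand-in for normalization: drop combining marks such as supralinear strokes."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def process_preserving_tags(text, transform):
    """Apply `transform` to the text between tags and copy the tags through unchanged."""
    parts = TAG_PATTERN.split(text)  # the capturing group keeps the tags in the result
    return "".join(
        part if TAG_PATTERN.fullmatch(part) else transform(part)
        for part in parts
    )


# Hypothetical input: a tagged large initial alpha, then text with a stroke (U+0304).
sample = '<hi rend="large ekthetic">ⲁ</hi>ⲩⲱ ⲙ\u0304ⲙⲟϥ'
print(process_preserving_tags(sample, strip_combining_marks))
```

The tag survives exactly as typed, while the supralinear stroke in the running text is removed.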

We hope the NLP service serves you well.

New Coptic NLP pipeline

October 5, 2015 / Amir Zeldes

The entire tool chain of Coptic Natural Language Processing has been difficult to get running for some: it involves a bunch of command line tools, and special attention needed to be paid to coordinating word division expectations between the tools (normalization, tagging, language of origin detection). To make this process simpler, we now offer a Web interface that lets you paste in Coptic text and run all tools on the input automatically, without installing anything. You can find the interface here:

https://corpling.uis.georgetown.edu/coptic-nlp/

The pipeline is XML tolerant (preserves tags in the input) and there’s also a machine actionable API version for external software to use these resources. Please let the Scriptorium team know if you’re using the pipeline and/or run into any problems.

Happy processing!

Introducing the Lemmatizer Tool

September 29, 2015 / David Sriboonreuang

A new tool available at the Coptic SCRIPTORIUM webpage is the lemmatizer. The lemmatizer annotates words with their dictionary head word. The purpose of lemmatization is to group together the different inflected forms of a word so they can be analyzed as a single item.

For example, in English, the verb ‘to walk’ may appear as ‘walk’, ‘walked’, ‘walks’, and ‘walking’. The base form, ‘walk’, might be the word to look up in the dictionary, and it would be called the lemma for the word.

In Coptic, plural nouns sometimes have different forms, and verbs have different forms as well. A lemmatized corpus lets you search for all the forms of a word at once, and it also makes it possible to link every form of a word to an online dictionary in the future.
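The idea can be sketched in a few lines using the English example above. This is a toy illustration with a hand-written lookup table, not how our lemmatizer is implemented. The names `LEMMA_LEXICON`, `lemmatize`, and `group_by_lemma` are made up for this example.

```python
from collections import defaultdict

# Toy lexicon mapping inflected forms to their lemma (hypothetical).
# A real lemmatizer relies on a much larger lexicon and on part-of-speech tags.
LEMMA_LEXICON = {
    "walk": "walk",
    "walked": "walk",
    "walks": "walk",
    "walking": "walk",
}


def lemmatize(token, lexicon=LEMMA_LEXICON):
    """Return the lemma for `token`, or the token itself if it is unknown."""
    return lexicon.get(token.lower(), token)


def group_by_lemma(tokens, lexicon=LEMMA_LEXICON):
    """Map each lemma to the surface forms that occur in `tokens`."""
    groups = defaultdict(list)
    for token in tokens:
        groups[lemmatize(token, lexicon)].append(token)
    return dict(groups)


tokens = ["She", "walked", "home", "and", "walks", "daily"]
print(group_by_lemma(tokens)["walk"])  # ['walked', 'walks']
```

Searching by lemma then amounts to looking up one key and getting every attested form back.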

Two of our corpora are annotated with lemmas: Not Because a Fox Barks (Shenoute) and the Apophthegmata. As illustrated in the image below, I have searched for ⲟⲩⲱϩ, to live or dwell.

Also note that in the corpus list, I have chosen to look in the corpus ‘Not Because a Fox Barks’, as indicated by the highlighted blue selection.

Notice that the word forms corresponding to the lemma I searched for are highlighted in the chosen corpus. Two forms of the verb ⲟⲩⲱϩ appear in the results: ⲟⲩⲱϩ and ⲟⲩⲏϩ. In addition, there is also an annotation grid.

Clicking on the annotations grid reveals a plethora of information including the translation of the text along with its parts of speech. Hovering over the text using your computer’s mouse allows you to also find parts that may be related. For example, below the POS (part of speech) is V (verb), and when the mouse is hovering over V, a highlight indicates what word in the text the verb is referring to.

The tool is a feature in our part-of-speech tagger, so you can lemmatize at the same time you annotate a corpus for parts of speech. See https://github.com/CopticScriptorium/tagger-part-of-speech/.

Additional guidelines are available here: https://github.com/CopticScriptorium/tagger-part-of-speech/blob/master/Coptic%20SCRIPTORIUM%20lemmatization%20guidelines.pdf

Wishing the NEH a happy 50th Anniversary!

September 29, 2015 / David Sriboonreuang

The Coptic SCRIPTORIUM team would like to wish the National Endowment for the Humanities (NEH) a happy 50th anniversary! We would like to thank the NEH for supporting Coptic SCRIPTORIUM. Cheers to the NEH!

Hiring: Digital Humanities Specialist for KELLIA and U Pacific Library

August 11, 2015 / ctschroeder

Digital Humanities Specialist at the University of the Pacific

The University of the Pacific seeks to hire a creative and collaborative Digital Humanities Specialist (DHS) to develop and manage strategies and infrastructure for curating digital and pre-digital content and data; provide computer programming support for projects; and author and/or co-author new digital humanities resources or scholarship. This is a full-time 20-24 month pilot staff position. The DHS will work half-time contributing to the University Library’s archival and digital initiatives and half-time on an interdisciplinary NEH-funded Digital Humanities research project, KELLIA. The DHS will report to Prof. Caroline T. Schroeder in the Department of Religious Studies and Michael Wurtz, the Head of Special Collections.

[Apply for this position at the University of the Pacific website]

KELLIA (Koptische/Coptic Electronic Language and Literature International Alliance) is an international DH project funded by the NEH and the DFG (Germany) to develop international standards and promote digital scholarship in the language and literature of ancient Egypt. Researchers at the University of the Pacific, Georgetown University, Goettingen University, and Muenster University will be collaborating on digital methods in textual studies, linguistics, history, and manuscript studies.

The William Knox Holt Memorial Library on the Stockton campus serves a diverse community of liberal arts and professional faculty. The Holt-Atherton Special Collections is home to several important American cultural heritage collections: the multimedia archives of jazz legend Dave Brubeck; primary source documents from World War II Japanese-American Internment Camps; the papers of renowned naturalist and conservationist John Muir; and the papers and video archive of former San Francisco Mayor George R. Moscone.

Duties

The Digital Humanities Specialist may perform some but not all of the following duties and/or may be assigned additional duties:

Develops and manages strategies and infrastructure for curating digital humanities content and data.
Authors/co-authors new digital humanities resources or scholarship.
Provides web development and programming for humanities research.
Contributes to original research in digital humanities.
Contributes to planning and decision-making about KELLIA’s technological development and long-term sustainability.
Identifies, recommends, and implements linked open data technologies for humanities research.
Identifies, recommends, and implements digital asset management and digital archiving in the Library.
Participates in archival processing and reference duties in a special collections environment.
Designs forward-facing, interactive digital initiatives, websites, and/or exhibits.
Provides library and special collections instruction.

QUALIFICATIONS:

Education/Work Experience/Certifications:

1) MA in Digital Humanities OR 2) MLIS from an accredited ALA program or MA in Archival Studies with demonstrated digital/technological training/certification OR 3) MA in a Humanities discipline or related field with demonstrated digital/technological training or certification
Documented research and/or teaching experience in digital scholarship or pedagogy in a humanities discipline or related field
Demonstrated experience in web development and programming for research and/or teaching in the humanities or a related field (including archival studies and library and information science)

Skills/Knowledge and Expertise:

Required skills/knowledge and expertise

Excellent interpersonal, presentation, and communication skills
Demonstrated expertise in digital humanities technologies of web development (HTML, CSS, PHP, JavaScript), text encoding (XML), and programming (Python, Java)
Commitment to open access technologies and data for the humanities or a related field
Proven ability to work collaboratively in team-based initiatives
Proven ability to contribute to original scholarship in the humanities or a related field
Enthusiasm to build international and interdisciplinary research partnerships
Proven ability to work successfully with diverse populations and demonstrated commitment to promote and enhance diversity and inclusion
Knowledge of ancient languages, while welcome, is not a requirement for this position.

Preferred skills/knowledge and expertise

Demonstrated expertise with data curation techniques for a variety of digitized and born-digital media (text, code, images, music, etc.) and tools (e.g., DSpace, EPrints, Fedora, contentDM, etc.)
Demonstrated experience with linked data technologies and methodologies (e.g., JSON, RDF)
Experience managing CMS and LMS systems
Command of archival theory and best practices, especially as they relate to the particular issues posed by born-digital content.

APPLICATION:

To apply for this position visit https://pacific.peopleadmin.com/postings/5822 and submit:

Letter of interest
CV
Names and contact information for 3 references

Review of applications will begin on September 1.

Questions about the position may be directed to cschroeder@pacific.edu and mwurtz@pacific.edu. For questions about the online application process, please consult the online help system.

This position is funded by the University of the Pacific Library and the National Endowment for the Humanities (through the joint NEH-DFG bilateral Digital Humanities grant program).

New web application to read documents, cite data, and access data (BETA release)

July 16, 2015 / ctschroeder

We’re very excited to announce a new feature at Coptic SCRIPTORIUM. We’ve created a new online web application that we think will allow users to read and reference our material much more easily.

Users can read Coptic documents on HTML pages taken from the data visualizations. There are also easy links to our search tool ANNIS and to our GitHub repository for downloading files.

And we have a system of canonical URNs that provide persistent identifiers for documents, texts, authors, and text groups. This means you can cite our data in your scholarship, and readers will be able to go back to our site and find the most recent versions of the documents you have cited.
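For readers who want to work with such identifiers in code, here is a minimal sketch of splitting a URN into its parts. It assumes the identifiers follow the general CTS URN shape used by the Perseus Digital Library (`urn:cts:namespace:textgroup.work.version:passage`); this post does not spell out the syntax, so treat that as an assumption. The sample URN is illustrative only, and the sketch handles just three levels of the work component.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    text_group: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS-style URN into namespace, text group, work, version, and passage."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS-style URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    # The passage reference is optional and comes after the work component.
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    levels = work_component.split(".")
    if not namespace or not all(levels) or len(levels) > 3:
        raise ValueError(f"malformed work component in: {urn!r}")
    text_group = levels[0]
    work = levels[1] if len(levels) > 1 else None
    version = levels[2] if len(levels) > 2 else None
    return CtsUrn(namespace, text_group, work, version, passage)


# Illustrative URN only; check the site for the identifiers actually in use.
print(parse_cts_urn("urn:cts:copticLit:shenoute.fox"))
```

Because the version is its own level, a citation can point at a work in general, while the site resolves it to the most recent version.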

We’ve got a little video to introduce it, or dive right in at http://data.copticscriptorium.org.

This is a BETA release, which means you might see a few things that need to be ironed out. (For one thing, our small corpus of documentary papyri is not yet in the system; stay tuned, and in the meantime you can still read and query them in ANNIS.) We are pretty pleased with how it’s turning out and look forward to future developments.

Many thanks to Bridget Almas of the Perseus Digital Library for helping us develop a canonical referencing system, and to Archimedes Digital for implementing the application.

Download release of all corpora in TEI XML, PAULA XML, relANNIS

June 12, 2015 / ctschroeder

We’ve released some new corpora (the papyri.info texts, for example) and added some new documents to our existing corpora. You can download everything from our GitHub repository in three different formats: TEI XML, PAULA XML, and relANNIS.