Author: ctschroeder (page 4 of 5)

Coptic SCRIPTORIUM at the Coptic Congress

Much of the Coptic SCRIPTORIUM team is in Claremont this week for the Congress of the International Association of Coptic Studies.

We started out with a pre-conference, 2-day workshop with our KELLIA partners from Germany, where we worked on sharing data and technologies across digital Coptic projects.  Look here soon for an announcement about a really cool fruit of our labors.

Thursday there are two panels, and Friday there are two workshops.

Thursday 2-4 pm Coptic Digital Studies (Burkle 16)

David Brakke, Chair

Prof. Dr. Caroline Schroeder, Coptic SCRIPTORIUM: A Digital Platform for Research in Coptic Language and Literature

Dr. Christine Luckritz Marquis, Reimagining the Apophthegmata Patrum in a Digital Culture

Prof. Amir Zeldes, A Quantitative Approach to Syntactic Alternations in Sahidic

Dr. Rebecca Krawiec, Charting Rhetorical Choices in Shenoute: Abraham our Father and I See Your Eagerness as case-studies

Thursday 4:30-6:30 Coptic Digital Humanities (Burkle 16)

Caroline T. Schroeder, Chair

Dr. Paul Dilley, Coptic Scriptorium beyond the Manuscript: Towards a Distant Reading of Coptic Texts

Mr. So Miyagawa and Dr. Marco Büchler, Computational Analysis of Text Reuse in Shenoute and Besa

Mr. Uwe Sikora, Text Encoding – Opportunities and Challenges

Ms. Eliese-Sophia Lincke, Optical Character Recognition (OCR) for Coptic. Testing Automated Digitization of Texts with OCRopy

 

Friday 11-12:30 Workshop on Coptic Fonts & Coptic Bible (AA)

Christian Askeland, Frank Feder

Friday 4:30-6 Digital Tools for Beginners (Workshop on Coptic SCRIPTORIUM)

Caroline T. Schroeder, Amir Zeldes, Rebecca S. Krawiec

New publication: Raiders of the Lost Corpus in DHQ

Amir Zeldes and Caroline T. Schroeder have recently published an article in Digital Humanities Quarterly about the need for digital tools and a digitized corpus for Coptic, and about the research questions that drive Coptic SCRIPTORIUM.

“Raiders of the Lost Corpus” is freely available on the DHQ website as part of a special issue on Digital Methods and Classical Studies edited by Neil Coffee and Neil W. Bernstein.  Schroeder presented an earlier version of this paper at the Digital Classics conference at the University at Buffalo in 2013.

 

Full, machine-annotated New Testament Corpus updated

We’ve updated and re-released our fully machine-annotated New Testament corpus.  sahidica.nt V2.1.0 contains the Sahidica NT text from Warren Wells’s Sahidica online NT, with the following features:

  • Annotated with our latest NLP tools (part-of-speech tagger 1.9, tokenizer 4.1.0, language tagger, and lemmatizer), which include lexical entries from the Database and Dictionary of Greek Loanwords in Coptic (DDGLC)
  • Now contains the morph layer (annotating compound words and Coptic morphs such as ⲣⲉϥ- ⲙⲛⲧ- ⲁⲧ-)
  • Visualizations for linguistic analysis

Please keep in mind that this fully machine-annotated corpus is more accurate than previous versions but will nonetheless contain more errors than a corpus manually corrected by a human.

Search and queries

To find specific terms in this corpus using our ANNIS database, we recommend searching the normalized words with regular expressions.  This captures instances of the desired word that are still embedded in a Coptic bound group because our tokenizer missed them.

Lemma searches are now also possible.  You may wish to search for the lemma using regular expressions, as well, in order to find lemmas of some compound words.  For example, the following search will find entries containing ⲥⲱⲧⲙ in the lemma:

The results include various forms of ⲥⲱⲧⲙ (including ⲥⲟⲧⲙ) lemmatized to the lexical entry “ⲥⲱⲧⲙ”, compound words lemmatized to ⲥⲱⲧⲙ or to a lexical entry containing ⲥⲱⲧⲙ, and some bound groups containing the word form ⲥⲱⲧⲙ, which our tokenizer did not catch:

Frequency table of normalized words lemmatized to ⲥⲱⲧⲙ or a lemma form containing ⲥⲱⲧⲙ (May 2016 Sahidica corpus)

As you can see, most of the hits are accurate (e.g., ⲥⲟⲧⲙ, ⲁⲧⲥⲱⲧⲙ, ⲣⲁⲧⲥⲱⲧⲙ, ⲣⲉϥⲥⲱⲧⲙ); some of the Coptic bound groups did not tokenize properly (e.g., ⲉⲡⲥⲱⲧⲙ, ⲙⲁⲣⲟⲩⲥⲱⲧⲙ).  We expect accuracy to increase as we incorporate more texts into our corpora that have been machine annotated and then manually edited.
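
The regular-expression lemma search described above can be imitated in a few lines of Python.  This is only a minimal sketch, not one of our tools: the token list is a hypothetical sample built from the forms mentioned in this post (the lemma assignments for the compounds and the bound group, the extra word ⲛⲟⲩⲧⲉ, and all counts are invented for illustration), and the function name lemma_hits is likewise made up.  In ANNIS itself a regular expression is written between slashes and has to match the whole annotation value, so a query of this kind would presumably take the form lemma=/.*ⲥⲱⲧⲙ.*/; treat that exact query string as an assumption.

```python
import re
from collections import Counter

# Hypothetical (normalized word, lemma) pairs standing in for corpus tokens.
# The word forms come from this post; the lemma values and counts are
# illustrative only and are NOT taken from the real corpus.
SAMPLE_TOKENS = [
    ("ⲥⲱⲧⲙ", "ⲥⲱⲧⲙ"),
    ("ⲥⲟⲧⲙ", "ⲥⲱⲧⲙ"),        # variant form lemmatized to ⲥⲱⲧⲙ
    ("ⲣⲉϥⲥⲱⲧⲙ", "ⲣⲉϥⲥⲱⲧⲙ"),  # compound whose lemma contains ⲥⲱⲧⲙ
    ("ⲁⲧⲥⲱⲧⲙ", "ⲁⲧⲥⲱⲧⲙ"),
    ("ⲉⲡⲥⲱⲧⲙ", "ⲉⲡⲥⲱⲧⲙ"),    # bound group the tokenizer did not split
    ("ⲛⲟⲩⲧⲉ", "ⲛⲟⲩⲧⲉ"),      # unrelated word, should not match
    ("ⲥⲱⲧⲙ", "ⲥⲱⲧⲙ"),
]


def lemma_hits(tokens, word):
    """Count normalized forms whose lemma contains `word` anywhere.

    The pattern is wrapped in .* on both sides and applied with fullmatch,
    mirroring a search that must match the entire annotation value.
    """
    pattern = re.compile(".*" + re.escape(word) + ".*")
    return Counter(norm for norm, lemma in tokens if pattern.fullmatch(lemma))


if __name__ == "__main__":
    # Print a small frequency table, most frequent form first.
    for norm, count in lemma_hits(SAMPLE_TOKENS, "ⲥⲱⲧⲙ").most_common():
        print(norm, count)
```

Searching for the bare word without the surrounding .* would return only tokens whose lemma is exactly ⲥⲱⲧⲙ, which is why the wildcard form is the one that also surfaces compounds and unsplit bound groups.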

Reading by individual chapter

You can also read these documents and see the linguistic analysis visualizations at data.copticscriptorium.org/urn:cts:copticLit:nt.  The first documents you will see (Gospel of Mark, 1 Corinthians) are manually annotated.  Scroll down for “New Testament,” which is the full, machine-annotated Sahidica New Testament.  Click on “Chapter” to read each chapter as normalized Coptic (with English translation as a pop-up when you hover your cursor).  Click on “Analytic” for the normalized Coptic, part of speech analysis, and English translation for each chapter.  Please keep in mind the English translation provided is a free, open-access New Testament translation from the World English Bible; it is not a direct translation from the Coptic.

Note:  we know that our server is slow generating the documents for this corpus.  It may take several minutes to load; please be patient.  For faster access, use ANNIS.  Visualizations to read the chapters are available by clicking on the corpus and the icon for visualizations.

Accessing document visualizations of the Sahidica corpus via ANNIS

We hope this corpus is useful to researchers.

New Besa fragment published

We’ve published another small fragment of Besa on Coptic SCRIPTORIUM.  So Miyagawa has edited and translated the letter fragment known as On Lack of Food.  Read it online or search the letters of Besa we have published.

Annotation tools now include DDGLC Greek Loanword List

We are pleased to announce the release of our newest versions of some of our natural language processing tools for Coptic which incorporate the lemma list of loanwords developed by the Database and Dictionary of Greek Loanwords in Coptic (DDGLC).

The DDGLC is part of the KELLIA partnership between American and German digital Coptic projects funded by the NEH Office of Digital Humanities and the DFG.  The DDGLC, under the direction of Prof. Dr. Tonio Sebastian Richter, has been building a database of Greek loanwords in Coptic in order to facilitate the study of language contact, language borrowing, and multilingualism in Egypt.

We have integrated the Greek lemma list into our language of origin tagger, tokenizer and morphology analysis, and lemmatizer.

Our online natural language processing web service (which bundles together all of our NLP tools into one web application) also includes this new data from the DDGLC.

The Greek loanword list should greatly increase the accuracy of many of our tools.  If you use them, please let us know how it goes!

We at Coptic SCRIPTORIUM are grateful for this partnership and the generosity of the DDGLC team.

New born-digital edition of a Shenoute fragment

This winter we’ve released a new document we’ve been working on for a while.  It’s a born-digital publication, in the sense that this document to our knowledge has never been published previously.  The edition and annotations here were produced by Elizabeth Platte (Reed College) and Rebecca S. Krawiec (Canisius College) directly from digital photographs of the manuscript for digital publication.

Read the manuscript transcription or the normalized text, or query it in our database.

It’s a section of one of Shenoute’s texts for monks in volume three of his monastic Canons.  This 14-page (seven-folio) fragment now resides in the Bibliothèque Nationale in Paris and originally derives from the White Monastery codex known by the siglum MONB.YB.  We’ve released text and annotations for pages 307-320, which equate to the BN call number Ms Copte 130/2 ff. 51-57.  Digital photos are now available online at Gallica.

We’ve transcribed the text from images of the manuscript and then annotated it for manuscript information.  We’ve also broken the text down into the Coptic phrases known as “bound groups,” words, and morphs.  Then we’ve annotated it all for part of speech, loan words (Greek, Latin, etc.), and lemmas.

By “we” I mean primarily Platte and Krawiec.  Schroeder and Zeldes provided editorial review, as per our policy of having every published digital document reviewed by at least one editor.

As far as we know, this fragment has never been published; nor has any translation ever been published.  We don’t have a translation yet, either.

As our first born-digital edition, this document is an experiment for us.  Everything else we’ve worked with has been published in an edition, and sometimes even has an English translation that another scholar has published.  Even though we digitize from the original manuscript, previous editions and translations make the transcription, annotation, and editing process much easier.  This document is an unknown quantity.

This means we expect to have errors and welcome feedback on the document.

We also have no translation as of yet.  Our goal is to translate the document and then edit the transcription and annotations again as we work.  We hope to publish an essay on how the digital annotation process affected the creation of an edition.

In the meantime, use it to practice your Coptic.  Let us know if you find errors.  We’ll credit you.

Hiring: Digital Humanities Specialist for KELLIA and U Pacific Library

Digital Humanities Specialist at the University of the Pacific

The University of the Pacific seeks to hire a creative and collaborative Digital Humanities Specialist (DHS) to develop and manage strategies and infrastructure for curating digital and pre-digital content and data; provide computer programming support for projects; and author and/or co-author new digital humanities resources or scholarship.  This is a full-time 20-24 month pilot staff position. The DHS will work half-time contributing to the University Library’s archival and digital initiatives and half-time on an interdisciplinary NEH-funded Digital Humanities research project, KELLIA.  The DHS will report to Prof. Caroline T. Schroeder in the Department of Religious Studies and Michael Wurtz, the Head of Special Collections.

[Apply for this position at the University of the Pacific website]

KELLIA (Koptische/Coptic Electronic Language and Literature International Alliance) is an international DH project funded by the NEH and the DFG (Germany) to develop international standards and promote digital scholarship in the language and literature of ancient Egypt.  Researchers at the University of the Pacific, Georgetown University, Goettingen University, and Muenster University will be collaborating on digital methods in textual studies, linguistics, history, and manuscript studies.

The William Knox Holt Memorial Library on the Stockton campus serves a diverse community of liberal arts and professional faculty.  The Holt-Atherton Special Collections is home to several important American cultural heritage collections:  the multimedia archives of jazz legend Dave Brubeck; primary source documents from World War II Japanese-American Internment Camps; the papers of renowned naturalist and conservationist John Muir; and the papers and video archive of former San Francisco Mayor George R. Moscone.

Duties

The Digital Humanities Specialist may perform some but not all of the following duties and/or may be assigned additional duties:

  1. Develops and manages strategies and infrastructure for curating digital humanities content and data.
  2. Authors/co-authors new digital humanities resources or scholarship.
  3. Provides web development and programming for humanities research.
  4. Contributes to original research in digital humanities.
  5. Contributes to planning and decision-making about KELLIA’s technological development and long-term sustainability.
  6. Identifies, recommends, and implements linked open data technologies for humanities research.
  7. Identifies, recommends, and implements digital asset management and digital archiving in the Library.
  8. Participates in archival processing and reference duties in a special collections environment.  
  9. Designs forward-facing, interactive digital initiatives, websites, and/or exhibits.
  10. Provides library and special collections instruction.

 

QUALIFICATIONS:

Education/Work Experience/Certifications:

  • 1) MA in Digital Humanities OR 2) MLIS from an accredited ALA program or MA in Archival Studies with demonstrated digital/technological training/certification OR 3) MA in a Humanities discipline or related field with demonstrated digital/technological training or certification
  • Documented research and/or teaching experience in digital scholarship or pedagogy in a humanities discipline or related field
  • Demonstrated experience in web development and programming for research and/or teaching in the humanities or a related field (including archival studies and library and information science)

Skills/Knowledge and Expertise:

Required skills/knowledge and expertise

  • Excellent interpersonal, presentation, and communication skills
  • Demonstrated expertise in digital humanities technologies of web development (HTML, CSS, PHP, JavaScript), text encoding (XML), and programming (Python, Java)
  • Commitment to open access technologies and data for the humanities or a related field
  • Proven ability to work collaboratively in team-based initiatives
  • Proven ability to contribute to original scholarship in the humanities or a related field
  • Enthusiasm to build international and interdisciplinary research partnerships
  • Proven ability to work successfully with diverse populations and demonstrated commitment to promote and enhance diversity and inclusion
  • Knowledge of ancient languages, while welcome, is not a requirement for this position.

Preferred skills/knowledge and expertise

  • Demonstrated expertise with data curation techniques for a variety of digitized and born-digital media (text, code, images, music, etc.) and tools (e.g., DSpace, EPrints, Fedora, contentDM, etc.)
  • Demonstrated experience with linked data technologies and methodologies (e.g., JSON, RDF)
  • Experience managing CMS and LMS systems
  • Command of archival theory and best practices, especially as they relate to the particular issues posed by born-digital content.  

APPLICATION:

To apply for this position visit https://pacific.peopleadmin.com/postings/5822 and submit:

  • Letter of interest
  • CV
  • Names and contact information for 3 references

Review of applications will begin on September 1.

Questions about the position may be directed to cschroeder@pacific.edu and mwurtz@pacific.edu.  For questions about the online application process, please consult the online help system.

This position is funded by the University of the Pacific Library and the National Endowment for the Humanities (through the joint NEH-DFG bilateral Digital Humanities grant program).

New web application to read documents, cite data, and access data (BETA release)

We’re very excited to announce a new feature at Coptic SCRIPTORIUM.  We’ve created a new online web application that we think will allow users to read and reference our material much more easily.

Users can read Coptic documents on HTML pages taken from the data visualizations.  There are also easy links to our search tool ANNIS and to our GitHub repository for downloading files.

And we have a system of canonical URNs that provide persistent identifiers for documents, texts, authors, and text groups.   This means you can cite our data in your scholarship, and then readers will be able to go back to our site and find our most recent versions of the documents you have cited.

We’ve got a little video to introduce it, or dive right in at http://data.copticscriptorium.org.
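
To make the idea of citing by URN a little more concrete, here is a small Python sketch that splits a CTS-style URN into its parts and builds a link from it.  It is an illustration under stated assumptions, not a description of how the application is implemented: the class CtsUrn and the functions parse_cts_urn and urn_to_url are hypothetical names, the parsing covers only the basic urn:cts:namespace:work[:passage] shape, and the URL pattern (site address followed by the URN) is inferred from the New Testament address quoted earlier on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Site address given in this post; appending the URN to it is an assumption
# based on the data.copticscriptorium.org/urn:cts:copticLit:nt link above.
BASE_URL = "http://data.copticscriptorium.org"


@dataclass(frozen=True)
class CtsUrn:
    """Parts of a CTS-style URN (hypothetical helper type)."""
    namespace: str
    work: str
    passage: Optional[str] = None


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split 'urn:cts:<namespace>:<work>[:<passage>]' into its components."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    if not parts[2] or not parts[3]:
        raise ValueError(f"missing namespace or work in: {urn!r}")
    passage = parts[4] if len(parts) == 5 else None
    return CtsUrn(namespace=parts[2], work=parts[3], passage=passage)


def urn_to_url(urn: str) -> str:
    """Validate the URN, then append it to the site address."""
    parse_cts_urn(urn)  # raises ValueError if the URN is malformed
    return f"{BASE_URL}/{urn}"


if __name__ == "__main__":
    example = "urn:cts:copticLit:nt"
    print(parse_cts_urn(example))
    print(urn_to_url(example))
```

Because the identifier itself stays the same while the documents behind it are updated, a citation that records only the URN keeps pointing at the current version of the text.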

This is a BETA release, which means you might see a few things that need to be ironed out.  (For one thing, our small corpus of documentary papyri is not yet in the system; stay tuned, and in the meantime you can still read and query the papyri in ANNIS.)  We are pretty pleased with how it’s turning out and look forward to future developments.

Many thanks to Bridget Almas of the Perseus Digital Library for helping us develop a canonical referencing system, and to Archimedes Digital for implementing the application.

 

 

Download release of all corpora in TEI XML, PAULA XML, relANNIS

We’ve released some new corpora (the papyri.info texts, for example) and added some new documents to our existing corpora.  You can download everything from our GitHub repository in three different formats: TEI XML, PAULA XML, and relANNIS.

New American-German DH Collaboration: KELLIA

Koptische/Coptic Electronic Language and Literature International Alliance is a collaboration between Coptic SCRIPTORIUM, the Göttingen Coptic Old Testament Project, and other partners. KELLIA has been awarded a joint NEH-DFG bilateral grant for sharing data and technologies and for developing common standards in Coptic DH.

 

Kellia photo

From Les Kellia. Ermitages coptes en Basse Égypte. Genève: Musée d’art et d’histoire de Genève, 1989
