Category: KELLIA

New release – Coptic Treebank V2

We are happy to announce the release of version 2 of the Coptic Universal Dependency Treebank. With over 8,500 tokens from 14 documents, the Treebank is the largest syntactically annotated resource in Coptic. The annotation scheme follows the Universal Dependency Guidelines, version 2, and is therefore comparable with UD data from 70 treebanks in 50 languages, including English, Latin, Classical Greek, Arabic, Hebrew and more.

You can search in the Treebank using ANNIS. For example, the following query finds cases of verbs dominating a complement clause (e.g. “say … that …”):

pos="V" ->dep[func="ccomp"] norm

[Link to this query]
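If you prefer to work with the treebank files offline, the same pattern can be sketched in Python. This is only a minimal sketch: it assumes the data are in the standard CoNLL-U format used by Universal Dependencies releases, it checks the universal tag VERB rather than the tag "V" used in the ANNIS query, and the sample sentence is an invented English example, not a sentence from the Treebank. The function names are our own.

```python
# Invented English sample in CoNLL-U column order (not taken from the Treebank).
ROWS = [
    "1 She she PRON _ _ 2 nsubj _ _",
    "2 said say VERB _ _ 0 root _ _",
    "3 that that SCONJ _ _ 5 mark _ _",
    "4 he he PRON _ _ 5 nsubj _ _",
    "5 left leave VERB _ _ 2 ccomp _ _",
]
# CoNLL-U columns are tab-separated; build the sample text from the rows.
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS) + "\n"


def parse_conllu(text):
    """Yield each sentence as a list of token dicts."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword ranges (1-2), empty nodes (1.1), and malformed lines.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        sentence.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


def find_ccomp_pairs(text):
    """Return (verb form, complement head form) pairs linked by ccomp."""
    pairs = []
    for sentence in parse_conllu(text):
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            if tok["deprel"] != "ccomp":
                continue
            head = by_id.get(tok["head"])
            if head is not None and head["upos"] == "VERB":
                pairs.append((head["form"], tok["form"]))
    return pairs


print(find_ccomp_pairs(SAMPLE))
```

To try it on real data, read a downloaded .conllu file into a string and pass it to find_ccomp_pairs.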

Online lexicon linked to our corpora!

We have a great announcement today.  Along with our German research partners as part of the KELLIA project, we are releasing an online Coptic lexicon linked to our corpora.

For over three years, the Berlin-Brandenburg Academy of Sciences has been working on a digital lexicon for Coptic.  Frank Feder began creating it, encoding definitions for Coptic lemmas in three languages: English, French, and German. The final entries were completed by Maxim Kupreyev at the academy and Julien Delhez in Göttingen.  The base lexicon file is encoded in TEI-XML.  This summer Amir Zeldes and his student, Emma Manning, created a web interface.  We will release the source code soon as part of the KELLIA project.

It may still need some refinements and updates, but we think it is a useful achievement that will help anyone interested in Coptic.

Entries have definitions in French, German, and English.

You can use the lexicon as a standalone website.  For the pilot launch, it’s on the Georgetown server, but make no mistake, this is a major research outcome for the BBAW.

We’ve also linked the dictionary to our texts in Coptic SCRIPTORIUM.  You can click on the ANNIS icon in the dictionary entry to search all corpora in Coptic SCRIPTORIUM for that word.

The link also goes in the other direction.  In the normalized visualization of our texts, you can click on a word and get taken to the entry for that word’s lemma in the dictionary.  You can do this in the normalized visualization in our web application for reading and accessing texts (pictured below), or in the normalized visualization embedded in the ANNIS tool.

[Screenshot: the normalized visualization in our web application]

Of course there will be refinements and developments to come.  We would love to hear your feedback on what works, what could work better, and where you find glitches.

On a more personal note, when Amir and I first came up with the idea for the project, we dreamed of creating a Perseus Digital Library for Coptic.  This dictionary is a huge step forward.  And honestly, I myself had almost nothing to do with this piece of the project.  It’s an example of the importance and power of collaboration.

Coptic SCRIPTORIUM at the Coptic Congress

Much of the Coptic SCRIPTORIUM team is in Claremont this week for the Congress of the International Association of Coptic Studies.

We started out with a pre-conference, 2-day workshop with our KELLIA partners from Germany, where we worked on sharing data and technologies across digital Coptic projects.  Look here soon for an announcement about a really cool fruit of our labors.

Thursday there are two panels, and Friday there are two workshops.

Thursday 2-4 pm Coptic Digital Studies (Burkle 16)

David Brakke chair 

Prof. Dr. Caroline Schroeder, Coptic SCRIPTORIUM: A Digital Platform for Research in Coptic Language and Literature

Dr. Christine Luckritz Marquis, Reimagining the Apophthegmata Patrum in a Digital Culture

Prof. Amir Zeldes, A Quantitative Approach to Syntactic Alternations in Sahidic

Dr. Rebecca Krawiec, Charting Rhetorical Choices in Shenoute: Abraham our Father and I See Your Eagerness as case-studies

Thursday 4:30-6:30 Coptic Digital Humanities (Burkle 16)

Caroline T. Schroeder, Chair

Dr. Paul Dilley, Coptic Scriptorium beyond the Manuscript: Towards a Distant Reading of Coptic Texts

Mr. So Miyagawa and Dr. Marco Büchler, Computational Analysis of Text Reuse in Shenoute and Besa

Mr. Uwe Sikora, Text Encoding – Opportunities and Challenges

Ms. Eliese-Sophia Lincke, Optical Character Recognition (OCR) for Coptic. Testing Automated Digitization of Texts with OCRopy

 

Friday 11-12:30 Workshop on Coptic Fonts & Coptic Bible (AA)

Christian Askeland, Frank Feder

Friday 4:30-6 Digital Tools for Beginners (Workshop on Coptic SCRIPTORIUM)

Caroline T. Schroeder, Amir Zeldes, Rebecca S. Krawiec

Full, machine-annotated New Testament Corpus updated

We’ve updated and re-released our fully machine-annotated New Testament corpus.  sahidica.nt V2.1.0 contains the Sahidica NT text from Warren Wells’s Sahidica online NT, with the following features:

  • Annotated with our latest NLP tools (part-of-speech tagger 1.9, tokenizer 4.1.0); the language tagger and lemmatizer include lexical entries from the Database and Dictionary of Greek Loanwords in Coptic (DDGLC)
  • Now contains the morph layer (annotating compound words and Coptic morphs such as ⲣⲉϥ- ⲙⲛⲧ- ⲁⲧ-)
  • Visualizations for linguistic analysis

Please keep in mind that this fully machine-annotated corpus is more accurate than previous versions but will nonetheless contain more errors than a corpus manually corrected by a human.

Search and queries

To find specific terms in this corpus with our ANNIS database, we recommend searching the normalized words using regular expressions.  This captures instances of the desired word that may still be embedded in a Coptic bound group (instances that our tokenizer may have missed).

Lemma searches are now also possible.  You may wish to search for the lemma using regular expressions, as well, in order to find lemmas of some compound words.  For example, the following search will find entries containing ⲥⲱⲧⲙ in the lemma:

The results include various forms of ⲥⲱⲧⲙ (including ⲥⲟⲧⲙ) lemmatized to the lexical entry “ⲥⲱⲧⲙ”, compound words lemmatized to ⲥⲱⲧⲙ or to a lexical entry containing ⲥⲱⲧⲙ, and some bound groups containing the word form ⲥⲱⲧⲙ, which our tokenizer did not catch:


Frequency table of normalized words lemmatized to ⲥⲱⲧⲙ or a lemma form containing ⲥⲱⲧⲙ (May 2016 Sahidica corpus)

As you can see, most of the hits are accurate (e.g., ⲥⲟⲧⲙ, ⲁⲧⲥⲱⲧⲙ, ⲣⲁⲧⲥⲱⲧⲙ, ⲣⲉϥⲥⲱⲧⲙ); some of the Coptic bound groups did not tokenize properly (e.g., ⲉⲡⲥⲱⲧⲙ, ⲙⲁⲣⲟⲩⲥⲱⲧⲙ).  We expect accuracy to increase as we incorporate more texts into our corpora that have been machine annotated and then manually edited.
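The regular-expression searches described above can be sketched in Python. This is an illustration under stated assumptions, not the exact queries we ran: it assumes the annotation layers are named norm and lemma, that ANNIS writes a regular-expression search as layer=/pattern/, and that the pattern has to match the whole annotation value (hence the leading and trailing .*). The helper names are our own, and the word forms are the ones mentioned in this post.

```python
import re


def aql_regex_query(layer, term):
    """Build a query string that looks for `term` anywhere in a layer's value.

    Assumes `term` is a plain word without regex metacharacters.
    """
    return f"{layer}=/.*{term}.*/"


def simulate_search(values, term):
    """Mimic a whole-value regex match locally with Python's re module."""
    pattern = re.compile(f".*{re.escape(term)}.*")
    return [value for value in values if pattern.fullmatch(value)]


# Normalized forms discussed in this post.
NORM_FORMS = ["ⲥⲱⲧⲙ", "ⲥⲟⲧⲙ", "ⲁⲧⲥⲱⲧⲙ", "ⲣⲉϥⲥⲱⲧⲙ", "ⲉⲡⲥⲱⲧⲙ", "ⲙⲁⲣⲟⲩⲥⲱⲧⲙ"]

print(aql_regex_query("norm", "ⲥⲱⲧⲙ"))
print(aql_regex_query("lemma", "ⲥⲱⲧⲙ"))

# An exact comparison finds only the bare form ...
exact_hits = [form for form in NORM_FORMS if form == "ⲥⲱⲧⲙ"]
# ... while the regex also catches compounds and untokenized bound groups.
regex_hits = simulate_search(NORM_FORMS, "ⲥⲱⲧⲙ")
print(exact_hits)
print(regex_hits)
```

Note that the form ⲥⲟⲧⲙ does not contain the string ⲥⲱⲧⲙ, so a search on the normalized words alone will miss it; it turns up only when you search the lemma instead.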

Reading by individual chapter

You can also read these documents and see the linguistic analysis visualizations at data.copticscriptorium.org/urn:cts:copticLit:nt.  The first documents you will see (Gospel of Mark, 1 Corinthians) are manually annotated.  Scroll down for “New Testament,” which is the full, machine-annotated Sahidica New Testament.  Click on “Chapter” to read each chapter as normalized Coptic (with English translation as a pop-up when you hover your cursor).  Click on “Analytic” for the normalized Coptic, part of speech analysis, and English translation for each chapter.  Please keep in mind the English translation provided is a free, open-access New Testament translation from the World English Bible; it is not a direct translation from the Coptic.

Note:  we know that our server is slow generating the documents for this corpus.  It may take several minutes to load; please be patient.  For faster access, use ANNIS.  Visualizations to read the chapters are available by clicking on the corpus and the icon for visualizations.

Accessing document visualizations of the Sahidica corpus via ANNIS


We hope this corpus is useful to researchers.

New Besa fragment published

We’ve published another small fragment of Besa on Coptic SCRIPTORIUM.  So Miyagawa has edited and translated the letter fragment known as On Lack of Food.  Read it online or search the letters of Besa we have published.

Annotation tools now include DDGLC Greek Loanword List

We are pleased to announce the release of our newest versions of some of our natural language processing tools for Coptic which incorporate the lemma list of loanwords developed by the Database and Dictionary of Greek Loanwords in Coptic (DDGLC).

The DDGLC is part of the KELLIA partnership between American and German digital Coptic projects funded by the NEH Office of Digital Humanities and the DFG.  The DDGLC, under the direction of Prof. Dr. Tonio Sebastian Richter, has been building a database of Greek loanwords in Coptic in order to facilitate the study of language contact, language borrowing, and multilingualism in Egypt.

We have integrated the Greek lemma list into our language of origin tagger, tokenizer and morphology analysis, and lemmatizer.

Our online natural language processing web service (which bundles together all of our NLP tools into one web application) also includes this new data from the DDGLC.

The Greek loanword list should greatly increase the accuracy of many of our tools.  If you use them, please let us know how it goes!

We at Coptic SCRIPTORIUM are grateful for this partnership and the generosity of the DDGLC team.

Coptic NLP pipeline Part 2

With the creation of the Coptic NLP (Natural Language Processor) pipeline by Amir Zeldes, it is now possible to run all our NLP tools at once, without downloading and running each one individually. The web application will tokenize bound groups into words, and will normalize the spelling of words and diacritics. It will also tag for part of speech, lemmatize, and tag for language of origin for borrowed (foreign) words. The interface is XML tolerant (it preserves tags in the input), and the output is tagged in SGML. One of the options is to encode the line breaks in a word or sentence, which is useful for encoding manuscripts. However, please double-check the results, because the interface is still in the beta stage.

As an example, the screenshot below is a snippet from I See Your Eagerness from manuscript MONB.GL29.

 

[Screenshot: input snippet from I See Your Eagerness in the NLP web interface]

Notice it contains an XML tag to encode a letter as “large ekthetic”. “Large ekthetic” corresponds to the alpha letter to designate it as a large character in the left margin of the manuscript’s column of text.  This tag will be preserved in the output.

[Screenshot: output of the NLP web service for the snippet]

The results are shown above. Bound groups are shown along with the part-of-speech tag, abbreviated as “pos”. The snippet from I See Your Eagerness has also been lemmatized, shown as “lemma”. Also, near the bottom of the screenshot, the language of origin of borrowed (foreign) words in the snippet has been identified as “Greek”.  These tags also correspond to the annotation layers you see in our multi-layer search and visualization tool ANNIS.
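To give a rough idea of what “XML tolerant” means, here is a toy sketch in Python. It is not the code of our pipeline: the function name, the tag in the example (a hypothetical hi element with a rend attribute), and the use of upper-casing as a stand-in for a real processing step are all assumptions made for illustration. The point is only that tags are passed through untouched while the text between them is processed.

```python
import re

# Anything that looks like <...> is treated as a tag. The capture group makes
# re.split keep the tags in the resulting list.
TAG_PATTERN = re.compile(r"(<[^>]+>)")


def process_preserving_tags(text, transform):
    """Apply `transform` to the text between tags and leave the tags as they are."""
    output = []
    for part in TAG_PATTERN.split(text):
        if TAG_PATTERN.fullmatch(part):
            output.append(part)  # a tag: copy it through unchanged
        else:
            output.append(transform(part))  # plain text: process it
    return "".join(output)


# Hypothetical input: a tag marking one large letter, followed by more text.
# Latin letters stand in for the Coptic text of a manuscript.
sample = '<hi rend="large ekthetic">a</hi>bc def'

# str.upper is only a placeholder for a real step such as normalization.
result = process_preserving_tags(sample, str.upper)
print(result)
```

The tag and its attribute come out exactly as they went in; only the letters around it are changed.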

We hope the NLP service serves you well.

 

Hiring: Digital Humanities Specialist for KELLIA and U Pacific Library

Digital Humanities Specialist at the University of the Pacific

The University of the Pacific seeks to hire a creative and collaborative Digital Humanities Specialist (DHS) to develop and manage strategies and infrastructure for curating digital and pre-digital content and data; provide computer programming support for projects; and author and/or co-author new digital humanities resources or scholarship.  This is a full-time 20-24 month pilot staff position. The DHS will work half-time contributing to the University Library’s archival and digital initiatives and half-time on an interdisciplinary NEH-funded Digital Humanities research project, KELLIA.  The DHS will report to Prof. Caroline T. Schroeder in the Department of Religious Studies and Michael Wurtz, the Head of Special Collections.

[Apply for this position at the University of the Pacific website]

KELLIA (Koptische/Coptic Electronic Language and Literature International Alliance) is an international DH project funded by the NEH and the DFG (Germany) to develop international standards and promote digital scholarship in the language and literature of ancient Egypt.  Researchers at the University of the Pacific, Georgetown University, Goettingen University, and Muenster University will be collaborating on digital methods in textual studies, linguistics, history, and manuscript studies.

The William Knox Holt Memorial Library on the Stockton campus serves a diverse community of liberal arts and professional faculty.  The Holt-Atherton Special Collections is home to several important American cultural heritage collections:  the multimedia archives of jazz legend Dave Brubeck; primary source documents from World War II Japanese-American Internment Camps; the papers of renowned naturalist and conservationist John Muir; and the papers and video archive of former San Francisco Mayor George R. Moscone.

Duties

The Digital Humanities Specialist may perform some but not all of the following duties and/or may be assigned additional duties:

  1. Develops and manages strategies and infrastructure for curating digital humanities content and data.
  2. Authors/co-authors new digital humanities resources or scholarship.
  3. Provides web development and programming for humanities research.
  4. Contributes to original research in digital humanities.
  5. Contributes to planning and decision-making about KELLIA’s technological development and long-term sustainability.
  6. Identifies, recommends, and implements linked open data technologies for humanities research.
  7. Identifies, recommends, and implements digital asset management and digital archiving in the Library.
  8. Participates in archival processing and reference duties in a special collections environment.  
  9. Designs forward-facing, interactive digital initiatives, websites, and/or exhibits.
  10. Provides library and special collections instruction.

 

QUALIFICATIONS:

Education/Work Experience/Certifications:

  • 1) MA in Digital Humanities OR 2) MLIS from an accredited ALA program or MA in Archival Studies with demonstrated digital/technological training/certification OR 3) MA in a Humanities discipline or related field with demonstrated digital/technological training or certification
  • Documented research and/or teaching experience in digital scholarship or pedagogy in a humanities discipline or related field
  • Demonstrated experience in web development and programming for research and/or teaching in the humanities or a related field (including archival studies and library and information science)

Skills/Knowledge and Expertise:

Required skills/knowledge and expertise

  • Excellent interpersonal, presentation, and communication skills
  • Demonstrated expertise in digital humanities technologies of web development (HTML, CSS, PHP, JavaScript), text encoding (XML), and programming (Python, Java)
  • Commitment to open access technologies and data for the humanities or a related field
  • Proven ability to work collaboratively in team-based initiatives
  • Proven ability to contribute to original scholarship in the humanities or a related field
  • Enthusiasm to build international and interdisciplinary research partnerships
  • Proven ability to work successfully with diverse populations and demonstrated commitment to promote and enhance diversity and inclusion
  • Knowledge of ancient languages, while welcome, is not a requirement for this position.

Preferred skills/knowledge and expertise

  • Demonstrated expertise with data curation techniques for a variety of digitized and born-digital media (text, code, images, music, etc.) and tools (e.g., DSpace, EPrints, Fedora, contentDM, etc.)
  • Demonstrated experience with linked data technologies and methodologies (e.g., JSON, RDF)
  • Experience managing CMS and LMS systems
  • Command of archival theory and best practices, especially as they relate to the particular issues posed by born-digital content.  

APPLICATION:

To apply for this position visit https://pacific.peopleadmin.com/postings/5822 and submit:

  • Letter of interest
  • CV
  • Names and contact information for 3 references

Review of applications will begin on September 1.

Questions about the position may be directed to cschroeder@pacific.edu and mwurtz@pacific.edu.  For questions about the online application process, please consult the online help system.

This position is funded by the University of the Pacific Library and the National Endowment for the Humanities (through the joint NEH-DFG bilateral Digital Humanities grant program).

New American-German DH Collaboration: KELLIA

Koptische/Coptic Electronic Language and Literature International Alliance is a collaboration between Coptic SCRIPTORIUM, the Göttingen Coptic Old Testament Project, and other partners. KELLIA has been awarded a joint NEH-DFG bilateral grant for sharing data and technologies and for developing common standards in Coptic DH.

 

Kellia photo

From Les Kellia. Ermitages coptes en Basse Égypte. Genève: Musée d’art et d’histoire de Genève, 1989

© 2017
