Category: KELLIA

Example of research using the online Coptic Dictionary: standalone G Thomas transcription

Martijn Linssen, an independent researcher, has been working on the Gospel of Thomas for some time and recently published a stand-alone “interactive Coptic-English translation” of the Gospel of Thomas on his Academia.edu site. The Coptic is linked to entries in the online Coptic Dictionary! We invite you to check it out!

We are always excited to see what kind of work people are doing with our project. Please get in touch if you’ve been using the dictionary or any of Coptic Scriptorium’s tools, corpora, annotations, etc., in your work!

The online dictionary is part of the KELLIA collaboration between Coptic Scriptorium (Georgetown University and the University of Oklahoma), the Berlin-Brandenburg Academy, the Goettingen Academy, the Free University in Berlin, and Goettingen University.

Comprehensive Coptic Lexicon v1.2

The “Thesaurus Linguae Aegyptiae” project (“Strukturen und Transformationen des Wortschatzes der ägyptischen Sprache”, BBAW), the “Database and Dictionary of Greek Loanwords in Coptic” (DDGLC, Freie Universität Berlin), and “Coptic Scriptorium: Digital Research in Coptic Language and Literature” are pleased to announce the latest release of the “Comprehensive Coptic Lexicon”: Version 1.2. The raw data can be downloaded from: 

  • D. Burns, F. Feder, K. John, M. Kupreyev, et al. 2020-07-24. Comprehensive Coptic Lexicon: Including Loanwords from Ancient Greek, Berlin: Freie Universität Berlin, http://dx.doi.org/10.17169/refubium-27566

The processed data has been published by the Coptic Dictionary Online.

The major new features include: 

  • Standardized use of parentheses “( )” in word forms.
  • Optimized data structure (e.g., the <sense/> element now contains a unique ID, facilitating the ongoing work on linking the CCL to databases of semantic relations such as Coptic WordNet).
  • Correction of orthographic, grammatical and semantic information of the existing entries and addition of new entries.
  • Linking to Perseus Greek morphology tool via the Greek head words. DDGLC lemma IDs are now displayed in the entry view of Coptic Dictionary Online.
  • Improved usability of the Greek loanwords section through the exclusion or revision of a number of senses.
  • Link to attestation search for nouns filtered by entity-type (e.g., search for ⲟⲩⲟⲛ standing for a person, an animal, or an inanimate object) in Coptic Scriptorium.
  • Phrase network visualization of most common word sequences containing nouns, verbs and prepositions.
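To illustrate why a unique ID on each sense helps with linking, here is a minimal Python sketch. It is not taken from the released data: the XML fragment, the ID values, the placeholder glosses, and the external link table are all invented for illustration. The only thing it borrows from the release is the idea that every <sense/> element carries its own ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: the element layout, ID values and glosses are
# assumptions made for this sketch, not the lexicon's actual encoding.
SAMPLE = """
<entry xml:id="E1">
  <form>ⲟⲩⲟⲛ</form>
  <sense xml:id="E1.s1"><def>placeholder gloss A</def></sense>
  <sense xml:id="E1.s2"><def>placeholder gloss B</def></sense>
</entry>
""".strip()

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def senses_by_id(xml_text):
    """Return a dict mapping each sense ID to its definition text."""
    root = ET.fromstring(xml_text)
    return {s.get(XML_ID): s.findtext("def") for s in root.iter("sense")}


senses = senses_by_id(SAMPLE)

# Hypothetical external resource keyed by sense ID (e.g. a synset table).
external_links = {"E1.s1": "synset-0001"}

# Because every sense has a stable ID, a link can target one sense
# of an entry rather than the entry as a whole.
linked = {sid: external_links[sid] for sid in senses if sid in external_links}
```

With per-sense IDs, only the first sense of the made-up entry ends up linked, which would not be expressible if links could point only at whole entries.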

For a full description of Version 1.2, please refer to the “Release Notes”: https://refubium.fu-berlin.de/bitstream/handle/fub188/27813/Comprehensive_Coptic_Lexicon_v1.2_Release_Notes.pdf?sequence=5&isAllowed=y


The Comprehensive Coptic Lexicon v1.2 now contains 11,263 entries and 31,847 forms across the Egyptian-Coptic and Greek-Coptic datasets. TLA, DDGLC and Coptic Scriptorium invite you to take a look at the new data and would welcome your feedback.

Digital Coptic 3 – program online!

The program for the third edition of Digital Coptic is now online. Check out the workshop website for the list of projects, talks and presenters.

Please join us for the workshop on July 12 and 13 – participants will receive a Zoom link and password for interactive presentations and discussion, and the workshop will also be cast to YouTube for larger audiences and offline viewing after the workshop. We look forward to hearing all the talks!

Comprehensive Coptic Lexicon v1 on Coptic Dictionary Online

The “Database and Dictionary of Greek Loanwords in Coptic” (DDGLC, Freie Universität Berlin), the research project “Strukturen und Transformationen des Wortschatzes der ägyptischen Sprache” (“Thesaurus Linguae Aegyptiae”, TLA, Berlin-Brandenburgische Akademie der Wissenschaften), and “Coptic Scriptorium: Digital Research in Coptic Language and Literature” are happy to announce the release of version 1 of the “Comprehensive Coptic Lexicon”. The processed data has been published by the Coptic Dictionary Online:

  • Coptic Dictionary Online, ed. by the Koptische/Coptic Electronic Language and Literature International Alliance (KELLIA), https://coptic-dictionary.org/

The raw data can be downloaded at:

  • D. Burns, F. Feder, K. John, M. Kupreyev, et al. 12.5.2019. Comprehensive Coptic Lexicon: Including Loanwords from Ancient Greek, Berlin: Freie Universität Berlin, https://doi.org/10.17169/refubium-2333

The Comprehensive Coptic Lexicon includes ca. 8,000 Egyptian-Coptic lemmata with ca. 10,000 word forms, as well as ca. 3,250 Greek-Coptic lemmata with ca. 10,000 forms.

DDGLC, TLA and Coptic Scriptorium invite you to take a look at the new data and would welcome your feedback.

Spring 2019 Corpora Release 2.7.0

We at Coptic Scriptorium are pleased to announce version 2.7.0 of our corpora. The release includes several new documents:

  • several more sayings in the Coptic Apophthegmata Patrum (edited & annotated by Marina Ghaly)
  • additional fragments of Shenoute’s sermon Some Kinds of People Sift Dirt (edited & annotated by Christine Luckritz Marquis, editions provided by David Brakke)
  • Besa’s letter On Vigilance (edited and annotated by So Miyagawa and others)
  • several more fragments of the monastic canons of Apa Johannes (annotated by Elizabeth Platte and Caroline T. Schroeder, digital edition provided by Diliana Atanassova)

All documents have metadata for word segmentation, tagging, and parsing to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of manual parsing (gold).

You can search all corpora at https://corpling.uis.georgetown.edu/annis/scriptorium and download the data in 4 formats (relANNIS database files, PAULA XML files, TEI XML files, and SGML files in Tree-tagger format).

Our total annotated corpora are now at over 780,000 words; corpora that have human editors who reviewed the machine annotations amount to over 100,000 words.

Enjoy!

Corpora release 2.6

We are pleased to announce release 2.6 of our corpora! Some exciting new things:

  • Expanded Coptic Old Testament
  • More gold-standard treebanked texts
  • Updated files of Shenoute’s Abraham Our Father and Acephalous Work 22
  • New metadata fields to indicate whether documents have been machine annotated or if an editor has reviewed the machine annotations

Expanded Coptic Old Testament

Our Coptic Old Testament corpus is updated and expanded, with digital text from our partners at the Digital Edition of the Coptic Old Testament project in Goettingen. All annotations in this corpus are fully machine-processed (no human editing, because it’s BIG). You can read through all the text in two different visualizations online and search it in the ANNIS database:

  1. analytic: the normalized text segmented into words aligned with part of speech tags; each verse is aligned with Brenton’s English translation of the Septuagint
  2. chapter: the normalized text presented as chapters and verses; each word links to the online Coptic dictionary
  3. ANNIS search: full search of text, lemmas, parts of speech, syntactic annotations, etc. (see our ANNIS tips if you’re new to ANNIS)

Please keep in mind this corpus is fully machine-annotated, and we currently do not have the capacity to make manual changes to a corpus of this size.  If you notice systemic errors (the same thing tagged incorrectly often, for example) please let us know.  Otherwise, please be patient: as the tools improve, we will update this corpus.

We’ve also machine-aligned the text with Brenton’s English translation of the Septuagint. It’s possible there will be some misalignments.  Thanks for your understanding!

Treebanks

We’ve added more documents to our separate gold-standard treebank corpus.  (Want to learn more about treebanks?) In this corpus, the treebank/syntactic annotations have been manually corrected; the documents are part of the Universal Dependencies project for cross-language linguistics research.  New treebanked documents include selections from 1 Corinthians, the Gospel of Mark, Shenoute’s Abraham Our Father, Shenoute’s Acephalous Work 22, and the Martyrdom of Victor.  This means the self-standing treebank corpus is expanded, and any documents we’ve treebanked have updated word segmentation, part of speech tagging, etc., in their regular corpora.

Updated Shenoute Documents

Documents in the corpora for Shenoute’s Abraham Our Father and Acephalous Work 22 have several updates.

First, some documents are in our treebank corpus and are now significantly more accurate in terms of word segmentation, tagging, etc.

Second, we’ve added chapter and verse segmentation to these works. Since there are no comprehensive print editions of these works with versification, we’ve applied our own chapter and verse numbers. We recognize that versification is arbitrary, but nonetheless useful for citation. For texts transcribed from manuscripts, chapter divisions typically occur when an ekthesis occurs in the manuscript. (Ekthesis describes a letter hanging in the margin.) They do not necessarily occur with each ekthesis (if ekthesis is very frequent), but we try to make the divisions occur only with ekthesis. Verses typically equal one sentence; very short sentences may be grouped into a single verse, and very long Shenoutean sentences may be split across several verses.

Third, we’ve added “Order” metadata to make it easier to read a work in order if it’s broken into multiple documents.  Check out Abraham Our Father, for example: the first document in the list is the beginning of the work.

Screen shot of list of documents in Abraham Our Father Corpus

When you’re reading through a document, click on “Next” to get the next document in reading order.  (If there are multiple manuscript witnesses to a work, we’ll send you to the next document in order with the fewest lacunae, or missing segments.)
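The “Next” behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not our actual implementation: the `Doc` class, the `lacunae` count, the document names, and the idea that parallel witnesses share the same order value are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    name: str
    order: int    # position in reading order (modelled on the "Order" field)
    lacunae: int  # hypothetical count of missing segments in this witness


def next_document(docs: List[Doc], current: Doc) -> Optional[Doc]:
    """Pick the next document in reading order.

    If several witnesses cover the next position (assumed here to
    share one order value), prefer the one with the fewest lacunae.
    """
    later = [d for d in docs if d.order > current.order]
    if not later:
        return None  # already at the end of the work
    next_order = min(d.order for d in later)
    candidates = [d for d in later if d.order == next_order]
    return min(candidates, key=lambda d: d.lacunae)


# Made-up example: two witnesses overlap at position 2.
docs = [
    Doc("part1", order=1, lacunae=0),
    Doc("witness_A_part2", order=2, lacunae=5),
    Doc("witness_B_part2", order=2, lacunae=1),
    Doc("part3", order=3, lacunae=0),
]

following = next_document(docs, docs[0])
```

In this toy list, the reader leaving `part1` is sent to `witness_B_part2`, because it covers the next position with fewer gaps than `witness_A_part2`.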

Screen shot of beginning of Abraham Our Father

Of course, you can always click on documents in any order you want to read however you like!

And everything is fully searchable across all documents in ANNIS.

New Metadata Fields Documenting Annotation

We sometimes get asked: which corpora do scholars annotate and which corpora are machine-annotated? The answer is complicated: almost everything is machine annotated, with different levels of scholarly review. So we’re adding three new metadata fields to help show users what kinds of annotation each document gets:

  • Segmentation refers to word segmentation (or “tokenization”) performed by the tokenizer tool.
  • Tagging refers to part of speech, language of origin, and lemma tagging performed by our tagger.
  • Parsing refers to dependency syntax annotations (which are part of our treebanking).

Each of these fields contains one of the following values:

  • automatic: fully machine annotated; no manual review or corrections to the tool output
  • checked: the tool has annotated the text, and a scholar has reviewed the annotations before publication
  • gold: the tools have been run and the annotations have received thorough review; this value usually applies only to documents that have been treebanked by a scholar (requiring rigorous review of word segmentation and tagging along the way)
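For anyone working with downloaded metadata, the three fields and three values can be treated as a small ordered scale. The following Python sketch is a rough illustration only: the ranking automatic < checked < gold is our reading of the descriptions above, and the document names and dictionary layout are made up rather than an actual export format.

```python
from enum import IntEnum


class Review(IntEnum):
    """Assumed ordering of review levels, lowest to highest."""
    AUTOMATIC = 0
    CHECKED = 1
    GOLD = 2


def parse_level(value: str) -> Review:
    """Convert a metadata value such as 'checked' into a Review level."""
    return Review[value.strip().upper()]


def meets(meta: dict, field: str, minimum: Review) -> bool:
    """True if the given field is at least at the requested level."""
    return parse_level(meta[field]) >= minimum


# Hypothetical documents mirroring the two cases described in this post.
docs = {
    "doc_a": {"segmentation": "checked", "tagging": "checked", "parsing": "automatic"},
    "doc_b": {"segmentation": "gold", "tagging": "gold", "parsing": "gold"},
}

# Documents whose tagging was reviewed by a scholar (checked or gold).
reviewed_tagging = [name for name, meta in docs.items()
                    if meets(meta, "tagging", Review.CHECKED)]

# Documents whose parsing has been manually corrected.
gold_parsing = [name for name, meta in docs.items()
                if meets(meta, "parsing", Review.GOLD)]
```

Treating the values as ordered makes it easy to ask for "at least checked" without listing every acceptable value by hand.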

For example, in the first image of document metadata visible in ANNIS, the document has automatic parsing; a scholar has checked the word segmentation and tagging.

 

Screenshot of document metadata showing checked word segmentation and tagging

In the next image of document metadata, a scholar has treebanked the text, making segmentation, tagging, and parsing all gold.

Screenshot of document metadata showing gold level annotations

We are rolling out these metadata fields with each new and newly edited corpus. Not every corpus has them yet; so far only the ones in this release do. Our New Testament and Old Testament corpora are machine-annotated (automatic) in all annotations.

We hope you enjoy!

Rebecca Krawiec Featured Digital Humanities Researcher at Canisius

Rebecca Krawiec presenting in the Canisius College Digital Humanities speaker series

Project participant Rebecca Krawiec, Professor and Chair of Religious Studies and Theology at Canisius College, presented her work with Coptic Scriptorium as part of the Digital Humanities speaker series at Canisius. Her talk, “Studying Ancient Egyptian Christianity in a Modern Digital World,” discussed how the many layers of annotation in Coptic Scriptorium’s corpora enhance research into late antique Christianity.  Read a description of this event on the Canisius College website or watch the lecture online.

Recent presentations by Coptic Scriptorium team members (post 1 of 2)!

This fall, Coptic Scriptorium team members have presented their work in a number of environments.

Research Talk, Georgetown University Linguistics Speaker Series

In September, as part of the Georgetown University Department of Linguistics Friday Speaker Series, the project presented a summary of our latest work and our goals for the new NEH Digital Humanities Advancement Grant we received, “A Linked Digital Environment for Coptic Studies”. Caroline T. Schroeder provided an overview of the project. Amir Zeldes presented the technology required to machine-process Coptic text in order to produce an annotated, digital corpus and linked online lexicon. Rebecca Krawiec discussed the potential of an annotated digital corpus for research in early monasticism. Elizabeth Platte introduced the concept of linked data and demonstrated our linked geographic data features. (Christine Luckritz Marquis was scheduled to present research on space and place in monastic literature but was unfortunately sidelined by a hurricane.)

Rebecca Krawiec, Elizabeth Platte, Amir Zeldes, Caroline T. Schroeder at Georgetown University, 2018

Material of Christian Apocrypha Conference

In December, Caroline T. Schroeder gave a paper at the Material of Christian Apocrypha Conference hosted at the University of Virginia, under the auspices of the North American Society for the Study of Christian Apocryphal Literature. Dr. Schroeder’s paper, “The Materiality of Digital Apocryphal Studies,” addressed the role of digital humanities in studying the colonial history of manuscripts, people and places in early Christian literature, and public humanities. It was part of a panel on Christian Apocrypha and the Digital Humanities, which also included papers by James Walters (Rochester College) on “The Digital Syriac Corpus: A New Resource for the Study of Syriac Texts” and Brandon Hawk (Rhode Island College) on “The Medieval Social Network of the Gospel of Pseudo-Matthew”. Datasets used in the presentation are available at Dr. Schroeder’s GitHub site.

Caroline T. Schroeder presenting about the manuscripts digitized by Coptic Scriptorium

Caroline T. Schroeder presenting visualizations of occurrences of proper names in some of Coptic Scriptorium’s corpora

Coptic Scriptorium’s summer adventures

This has been a summer of writing, annotating, and conferencing!

German PI Dr. Prof. Heike Behlmer and US PI Caroline T. Schroeder at Schroeder’s recent visit to the Coptic Old Testament Project at the University and the Goettingen Academy.

We are winding up our collaborative grant with our German partners (Coptic Old Testament Project, the Thesaurus Linguae Aegyptiae, the DDGLC, and the INTF). Our German and US PIs met in Göttingen, Germany, earlier this summer. We’re working on writing our final reports and exchanging data and technologies. We’re hoping to publish more annotated texts later this year.

We also have had a series of conference papers, including a paper on one of our collaboration’s proudest achievements, the online Coptic Dictionary.  Here are some of the lectures and conference presentations this summer:

Miyagawa, So and Zeldes, Amir (2018) “A Semantic Map of the Coptic Complementizer če Based on Corpus Analysis: Grammaticalization and Areal Typology in Africa,” International Workshop on Semantic maps: Where do we stand and where are we going? Liège, Belgium. June.

Schroeder, Caroline T. (2018) “A Homily is a Homily is a Homily is a Corpus:  Digital Approaches to Shenoute,” The Transmission of Early Christian Homilies from Late Antiquity to the Middle Ages Conference, Goethe-Universität Frankfurt am Main. June.

Schroeder, Caroline T. (2018) “Coptic Studies in the Digital Age,” Department of Ancient History, Macquarie University. July.

Schroeder, Caroline T. (2018) “Coptic Studies in a Digital Age,” UCLA-St. Shenouda Foundation Coptic Studies Conference, Los Angeles. July.

Feder, Frank, Kupreyev, Maxim, Manning, Emma, Schroeder, Caroline T. and Zeldes, Amir (2018) “A Linked Coptic Dictionary Online”. Proceedings of LaTeCH 2018 – The 11th SIGHUM Workshop at COLING2018. Santa Fe, NM. August. [paper online]

As always, thanks to all our contributors, collaborators, and board members for their insight and labor.

New paper about the Coptic Dictionary Online

A new paper about the Coptic Dictionary Online will be presented at this year’s ACL SIGHUM workshop on Language Technology for Cultural Heritage. This work is a collaboration between the Akademie der Wissenschaften zu Göttingen, Berlin-Brandenburgische Akademie der Wissenschaften, and Coptic Scriptorium.

The paper presents the structure and underlying principles of the dictionary and its Web interface, and also gives a quantitative analysis of the dictionary’s coverage of Coptic lexical material in corpus data. You can check out the pre-print here:

Feder, Frank, Kupreyev, Maxim, Manning, Emma, Schroeder, Caroline T. and Zeldes, Amir (2018) “A Linked Coptic Dictionary Online”. Proceedings of LaTeCH 2018 – The 11th SIGHUM Workshop at COLING2018. Santa Fe, NM.

[paper]
