Tag: dictionary

Coptic Dictionary and ANNIS database down

We are sorry to report that the server hosting the Coptic Dictionary Online and Coptic Scriptorium’s ANNIS database is down. (Some of the NLP tools and internal tools such as GitDox are down as well.)

We are working on fixing the problem, but for now we do not have a timeline for when they will be up and running.

In the meantime, reading and browsing texts at http://data.copticscriptorium.org still work.

Thank you for your patience! We will let you know when the systems are up again.

Example of research using the online Coptic Dictionary: standalone G Thomas transcription

Martijn Linssen, an independent researcher, has been working on the Gospel of Thomas for some time and recently published a stand-alone “interactive Coptic-English translation” of the Gospel of Thomas on his Academia.edu site. The Coptic is linked to entries in the online Coptic Dictionary! We invite you to check it out!

We are always excited to see what kind of work people are doing with our project. Please get in touch if you’ve been using the dictionary or any of Coptic Scriptorium’s tools, corpora, annotations, etc., in your work!

The online dictionary is part of the KELLIA collaboration between Coptic Scriptorium (Georgetown University and the University of Oklahoma), the Berlin-Brandenburg Academy, the Goettingen Academy, the Free University in Berlin, and Goettingen University.

Comprehensive Coptic Lexicon v1.2

The “Thesaurus Linguae Aegyptiae” project (“Strukturen und Transformationen des Wortschatzes der ägyptischen Sprache”, BBAW), the “Database and Dictionary of Greek Loanwords in Coptic” (DDGLC, Freie Universität Berlin), and “Coptic Scriptorium: Digital Research in Coptic Language and Literature” are pleased to announce the latest release of the “Comprehensive Coptic Lexicon”: Version 1.2. The raw data can be downloaded from: 

  • D. Burns, F. Feder, K. John, M. Kupreyev, et al. 2020-07-24. Comprehensive Coptic Lexicon: Including Loanwords from Ancient Greek, Berlin: Freie Universität Berlin, http://dx.doi.org/10.17169/refubium-27566

The processed data has been published by the 

The major new features include: 

  • Standardized use of parentheses “( )” in word forms.
  • Optimized data structure (e.g., the <sense/> element now contains a unique ID, facilitating the ongoing work on linking CCL to databases of semantic relations such as Coptic WordNet).
  • Correction of orthographic, grammatical and semantic information of the existing entries and addition of new entries.
  • Linking to Perseus Greek morphology tool via the Greek head words. DDGLC lemma IDs are now displayed in the entry view of Coptic Dictionary Online.
  • Improved usability of the section of Greek loanwords due to exclusion or change of a number of senses.
  • Link to attestation search for nouns filtered by entity-type (e.g., search for ⲟⲩⲟⲛ standing for a person, an animal, or an inanimate object) in Coptic Scriptorium.
  • Phrase network visualization of most common word sequences containing nouns, verbs and prepositions.
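
To illustrate why a unique ID on each sense helps with linking to external resources, here is a minimal sketch that indexes the senses of a single TEI-XML entry by ID. The sample entry, its ID values, its glosses, and the use of the `xml:id` attribute are assumptions made for illustration; the actual attribute name and entry layout should be checked against the released data and the Release Notes.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical entry: the layout, IDs and glosses are illustrative only,
# not copied from the Comprehensive Coptic Lexicon.
SAMPLE_ENTRY = """<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="C0001">
  <form type="lemma"><orth>ⲟⲩⲟⲛ</orth></form>
  <sense xml:id="C0001_s1">
    <cit type="translation" xml:lang="en"><quote>someone</quote></cit>
  </sense>
  <sense xml:id="C0001_s2">
    <cit type="translation" xml:lang="en"><quote>something</quote></cit>
  </sense>
</entry>"""


def index_senses(entry_xml):
    """Return {sense ID: first gloss} for every <sense> that carries an ID."""
    entry = ET.fromstring(entry_xml)
    senses = {}
    for sense in entry.iter(f"{{{TEI_NS}}}sense"):
        sense_id = sense.get(XML_ID)  # assumed to be stored as xml:id
        quote = sense.find(f".//{{{TEI_NS}}}quote")
        if sense_id is not None and quote is not None:
            senses[sense_id] = (quote.text or "").strip()
    return senses


SENSE_INDEX = index_senses(SAMPLE_ENTRY)
```

With an index like this, an external resource can point at one specific sense (here the hypothetical `C0001_s2`) rather than at the whole entry.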

For a full description of Version 1.2, please refer to the “Release Notes”: https://refubium.fu-berlin.de/bitstream/handle/fub188/27813/Comprehensive_Coptic_Lexicon_v1.2_Release_Notes.pdf?sequence=5&isAllowed=y


The Comprehensive Coptic Lexicon Version 1.2 now contains 11,263 entries and 31,847 forms across the Egyptian-Coptic and Greek-Coptic datasets. TLA, DDGLC and Coptic Scriptorium invite you to take a look at the new data and would welcome your feedback.

The Coptic Dictionary Online wins the 2019 DH Award for Best Tool

We are very happy to announce that the Coptic Dictionary Online (CDO) has won the 2019 Digital Humanities Award in the category Best Tool or Suite of Tools! The dictionary interface, shown below, gives users access to searches by Coptic word forms, definitions in three languages (English, French and German), pattern and part of speech searches, and more. We have also added links to quantitative corpus data, including attestation and collocation analyses from data published by Coptic Scriptorium.

The dictionary is the result of a collaboration between Coptic Scriptorium and lexicographers in Germany at the Berlin-Brandenburg and Göttingen Academies of Science, the Free University in Berlin, and the Universities of Göttingen and Leipzig. This collaboration has been funded by the National Endowment for the Humanities (NEH) and the German Research Foundation (DFG).

To read more about the dictionary’s structure and creation, see Feder et al. (2018).

A view of the Coptic Dictionary Online

Comprehensive Coptic Lexicon v1 on Coptic Dictionary Online

The “Database and Dictionary of Greek Loanwords in Coptic” (DDGLC, Freie Universität Berlin), the research project “Strukturen und Transformationen des Wortschatzes der ägyptischen Sprache” (“Thesaurus Linguae Aegyptiae”, TLA, Berlin-Brandenburgische Akademie der Wissenschaften) and “Coptic Scriptorium: Digital Research in Coptic Language and Literature” are happy to announce the release of version 1 of the “Comprehensive Coptic Lexicon”. The processed data has been published by the Coptic Dictionary Online:

  • Coptic Dictionary Online, ed. by the Koptische/Coptic Electronic Language and Literature International Alliance (KELLIA), https://coptic-dictionary.org/

The raw data can be downloaded at:

  • D. Burns, F. Feder, K. John, M. Kupreyev, et al. 12.5.2019. Comprehensive Coptic Lexicon: Including Loanwords from Ancient Greek, Berlin: Freie Universität Berlin, https://doi.org/10.17169/refubium-2333

The Comprehensive Coptic Lexicon includes ca. 8,000 Egyptian-Coptic lemmata with ca. 10,000 word forms, as well as ca. 3,250 Greek-Coptic lemmata with ca. 10,000 forms.

DDGLC, TLA and Coptic Scriptorium invite you to take a look at the new data and would welcome your feedback.

Corpora release 2.6

We are pleased to announce release 2.6 of our corpora! Some exciting new things:

  • Expanded Coptic Old Testament
  • More gold-standard treebanked texts
  • Updated files of Shenoute’s Abraham Our Father and Acephalous Work 22
  • New metadata fields to indicate whether documents have been machine annotated or if an editor has reviewed the machine annotations

Expanded Coptic Old Testament

Our Coptic Old Testament corpus is updated and expanded, with digital text from our partners at the Digital Edition of the Coptic Old Testament project in Goettingen. All annotations in this corpus are fully machine-processed (no human editing, because it’s BIG). You can read through all the text in two different visualizations online and search it in the ANNIS database:

  1. analytic: the normalized text segmented into words aligned with part of speech tags; each verse is aligned with Brenton’s English translation of the Septuagint
  2. chapter: the normalized text presented as chapters and verses; each word links to the online Coptic dictionary
  3. ANNIS search: full search of text, lemmas, parts of speech, syntactic annotations, etc. (see our ANNIS tips if you’re new to ANNIS)

Please keep in mind this corpus is fully machine-annotated, and we currently do not have the capacity to make manual changes to a corpus of this size.  If you notice systemic errors (the same thing tagged incorrectly often, for example) please let us know.  Otherwise, please be patient: as the tools improve, we will update this corpus.

We’ve also machine-aligned the text with Brenton’s English translation of the Septuagint. It’s possible there will be some misalignments.  Thanks for your understanding!

Treebanks

We’ve added more documents to our separate gold-standard treebank corpus.  (Want to learn more about treebanks?) In this corpus, the treebank/syntactic annotations have been manually corrected; the documents are part of the Universal Dependencies project for cross-language linguistics research.  New treebanked documents include selections from 1 Corinthians, the Gospel of Mark, Shenoute’s Abraham Our Father, Shenoute’s Acephalous Work 22, and the Martyrdom of Victor.  This means the self-standing treebank corpus is expanded, and any documents we’ve treebanked have updated word segmentation, part of speech tagging, etc., in their regular corpora.

Updated Shenoute Documents

Documents in the corpora for Shenoute’s Abraham Our Father and Acephalous Work 22 have several updates.

First, some documents are in our treebank corpus and are now significantly more accurate in terms of word segmentation, tagging, etc.

Second, we’ve added chapter and verse segmentation to these works. Since there are no comprehensive print editions of these works with versification, we’ve applied our own chapter and verse numbers. We recognize that versification is arbitrary, but it is nonetheless useful for citation. For texts transcribed from manuscripts, chapter divisions typically fall where an ekthesis (a letter hanging in the margin) occurs in the manuscript. Where ekthesis is very frequent, not every ekthesis starts a new chapter, but we try to place chapter divisions only at an ekthesis. A verse typically equals one sentence, though several very short sentences may share a verse and a very long Shenoutean sentence may span more than one verse.

Third, we’ve added “Order” metadata to make it easier to read a work in order if it’s broken into multiple documents.  Check out Abraham Our Father, for example: the first document in the list is the beginning of the work.

Screen shot of list of documents in Abraham Our Father Corpus


When you’re reading through a document, click on “Next” to get the next document in reading order.  (If there are multiple manuscript witnesses to a work, we’ll send you to the next document in order with the fewest lacunae, or missing segments.)
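
The “Next” behavior described above could be modeled roughly as follows. This is only a sketch under stated assumptions, not the site’s actual code: the `Doc` class, the `lacunae` count, and the idea that parallel manuscript witnesses share the same `order` value are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Doc:
    title: str
    order: int    # position within the work (modeled on the "Order" metadata)
    lacunae: int  # hypothetical count of missing segments in this witness


def reading_order(docs):
    """One document per position, preferring the witness with the fewest lacunae."""
    ranked = sorted(docs, key=lambda d: (d.order, d.lacunae))
    # After sorting, the first document in each order group has the fewest lacunae.
    return [next(group) for _, group in groupby(ranked, key=lambda d: d.order)]


def next_document(docs, current):
    """Return the document that 'Next' would lead to, or None at the end of the work."""
    later = [d for d in reading_order(docs) if d.order > current.order]
    return later[0] if later else None


# Illustrative data only: two witnesses ("B1", "B2") cover the second position.
DOCS = [Doc("A", 1, 0), Doc("B1", 2, 5), Doc("B2", 2, 1), Doc("C", 3, 0)]
```

In this toy data, following “Next” from document A leads to B2 rather than B1, because B2 has fewer missing segments.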

Screen shot of beginning of Abraham Our Father


Of course, you can always click on documents in any order you want to read however you like!

And everything is fully searchable across all documents in ANNIS.

New Metadata Fields Documenting Annotation

We sometimes get asked: which corpora do scholars annotate and which corpora are machine-annotated? The answer is complicated: almost everything is machine annotated, with different levels of scholarly review. So we’re adding three new metadata fields to help show users what kinds of annotation each document gets:

  • Segmentation refers to word segmentation (or “tokenization”) performed by the tokenizer tool.
  • Tagging refers to part of speech, language of origin, and lemma tagging performed by our tagger.
  • Parsing refers to dependency syntax annotations (which are part of our treebanking).

Each of these fields contains one of the following values:

  • automatic: fully machine annotated; no manual review or corrections to the tool output
  • checked: the tool has annotated the text, and a scholar has reviewed the annotations before publication
  • gold: the tools have been run and the annotations have received thorough review; this value usually applies only to documents that have been treebanked by a scholar (requiring rigorous review of word segmentation and tagging along the way)
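
For anyone working with exported metadata, the three fields and their values can be treated as a small ordered scale. The sketch below is one possible way to do that; the lowercase dictionary keys, the ranking automatic < checked < gold, and the sample documents are assumptions for illustration, not a description of how ANNIS or the project tools store the values.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed ranking, from least to most human review.
    AUTOMATIC = 0
    CHECKED = 1
    GOLD = 2


FIELDS = ("segmentation", "tagging", "parsing")  # assumed key names


def parse_levels(metadata):
    """Convert a metadata dict into {field: Level}; unknown values raise KeyError."""
    return {field: Level[metadata[field].upper()] for field in FIELDS}


def meets(metadata, minimum):
    """True if every annotation layer has at least the given review level."""
    return all(level >= minimum for level in parse_levels(metadata).values())


# Hypothetical documents, not real corpus metadata.
REVIEWED_DOC = {"segmentation": "checked", "tagging": "checked", "parsing": "automatic"}
TREEBANKED_DOC = {"segmentation": "gold", "tagging": "gold", "parsing": "gold"}
```

Here `meets(REVIEWED_DOC, Level.CHECKED)` is false because the parsing layer is only automatic, while `meets(TREEBANKED_DOC, Level.GOLD)` is true.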

For example, in the first image of document metadata visible in ANNIS, the document has automatic parsing; a scholar has checked the word segmentation and tagging.

 

Screenshot of document metadata showing checked word segmentation and tagging


In the next image of document metadata, a scholar has treebanked the text, making segmentation, tagging, and parsing all gold.

Screenshot of document metadata showing gold level annotations


 

We are rolling out these annotations with each new corpus and newly edited corpus; not every corpus has them, yet — only the ones in this release.  Our New Testament and Old Testament corpora are machine-annotated (automatic) in all annotations.

 

We hope you enjoy!

 

Coptic Scriptorium’s summer adventures

This has been a summer of writing, annotating, and conferencing!


German PI Dr. Prof. Heike Behlmer and US PI Caroline T. Schroeder at Schroeder’s recent visit to the Coptic Old Testament Project at the University and the Goettingen Academy.

We are winding up our collaborative grant with our German partners (Coptic Old Testament Project, the Thesaurus Linguae Aegyptiae, the DDGLC, and the INTF). Our German and US PIs met in Göttingen, Germany, earlier this summer. We’re working on writing our final reports and exchanging data and technologies. We’re hoping to publish more annotated texts later this year.

We also have had a series of conference papers, including a paper on one of our collaboration’s proudest achievements, the online Coptic Dictionary.  Here are some of the lectures and conference presentations this summer:

Miyagawa, So and Zeldes, Amir (2018) “A Semantic Map of the Coptic Complementizer če Based on Corpus Analysis: Grammaticalization and Areal Typology in Africa,” International Workshop on Semantic maps: Where do we stand and where are we going? Liège, Belgium. June.

Schroeder, Caroline T. (2018) “A Homily is a Homily is a Homily is a Corpus:  Digital Approaches to Shenoute,” The Transmission of Early Christian Homilies from Late Antiquity to the Middle Ages Conference, Goethe-Universität Frankfurt am Main. June.

Schroeder, Caroline T. (2018) “Coptic Studies in the Digital Age,” Department of Ancient History, Macquarie University. July.

Schroeder, Caroline T. (2018) “Coptic Studies in a Digital Age,” UCLA-St. Shenouda Foundation Coptic Studies Conference, Los Angeles. July.

Feder, Frank, Maxim Kupreyev, Emma Manning, Caroline T. Schroeder, Amir Zeldes. “A Linked Coptic Dictionary Online”. Proceedings of LaTeCH 2018 – The 11th SIGHUM Workshop at COLING2018. Santa Fe, NM. August. [paper online]

As always, thanks to all our contributors, collaborators, and board members for their insight and labor.

New paper about the Coptic Dictionary Online

Coptic Dictionary Online

A new paper about the Coptic Dictionary Online will be presented at this year’s ACL SIGHUM workshop on Language Technology for Cultural Heritage. This work is a collaboration between the Akademie der Wissenschaften zu Göttingen, Berlin-Brandenburgische Akademie der Wissenschaften, and Coptic Scriptorium.

The paper presents the structure and underlying principles of the dictionary and its Web interface, and also gives a quantitative analysis of the dictionary’s coverage of Coptic lexical material in corpus data. You can check out the pre-print here:

Feder, Frank, Kupreyev, Maxim, Manning, Emma, Schroeder, Caroline T. and Zeldes, Amir (2018) “A Linked Coptic Dictionary Online”. Proceedings of LaTeCH 2018 – The 11th SIGHUM Workshop at COLING2018. Santa Fe, NM.

[paper]

Online lexicon linked to our corpora!

We have a great announcement today.  Along with our German research partners as part of the KELLIA project, we are releasing an online Coptic lexicon linked to our corpora.

For over three years, the Berlin-Brandenburg Academy of Sciences has been working on a digital lexicon for Coptic. Frank Feder began creating it, encoding definitions for Coptic lemmas in three languages: English, French, and German. The final entries were completed by Maxim Kupreyev at the academy and Julien Delhez in Göttingen. The base lexicon file is encoded in TEI-XML. This summer Amir Zeldes and his student, Emma Manning, created a web interface. We will release the source code soon as part of the KELLIA project.

It may still need some refinements and updates, but we think it is a useful achievement that will help anyone interested in Coptic.

Entries have definitions in French, German, and English.

You can use the lexicon as a standalone website. For the pilot launch, it’s on the Georgetown server, but make no mistake, this is a major research outcome for the BBAW.

We’ve also linked the dictionary to our texts in Coptic SCRIPTORIUM.  You can click on the ANNIS icon in the dictionary entry to search all corpora in Coptic SCRIPTORIUM for that word.

The link also goes in the other direction. In the normalized visualization of our texts, you can click on a word and get taken to the entry for that word’s lemma in the dictionary. You can do this in the normalized visualization in our web application for reading and accessing texts (pictured below), or in the normalized visualization embedded in the ANNIS tool.


Of course there will be refinements and developments to come.  We would love to hear your feedback on what works, what could work better, and where you find glitches.

On a more personal note, when Amir and I first came up with the idea for the project, we dreamed of creating a Perseus Digital Library for Coptic.  This dictionary is a huge step forward.  And honestly, I myself had almost nothing to do with this piece of the project.  It’s an example of the importance and power of collaboration.