Category: Tutorials

New Tutorials & Recent Workshop Wrap-up

Coptic Scriptorium team members Caroline T. Schroeder and Rebecca Krawiec recently led a workshop on Digital Corpora and Digital Editions at the North American Patristics Society annual meeting. We created detailed tutorials, available on our GitHub site, that are useful to both beginners and more advanced users. These tutorials cover:

  • an introduction to digital editions and corpora
  • working with the online Coptic Dictionary
  • simple and complex searches of Coptic literature in our ANNIS database
  • creating a digital corpus with EpiDoc TEI-XML annotations and natural language processing

We invite everyone to use these tutorials on their own. They’re designed for self-paced work.

We were pleased to participate in the pre-conference Digital Humanities workshops that included another session on mapping led by Sarah Bond and Jennifer Barry.  We had attendees from four countries, who ranged in their careers from graduate students to senior professors.  Thanks to NAPS for hosting these workshops, and to the NEH and the DFG for making our work possible.

ANNIS embeds for websites and blogs

Starting this week, there’s a new feature in our ANNIS web interface: ANNIS embeds.

The ANNIS interface can now give you an HTML snippet that you can embed on your webpage, blog post and more.

Here’s an example of an embedded visualizer for a passage from Besa’s letter to Aphthonia, in which he recounts Aphthonia’s threat to go to another monastery:

(MONB.BA 47, urn:cts:copticLit:besa.aphthonia.monbba)

The code snippet for this visualization is as follows:

<iframe src="https://corpling.uis.georgetown.edu/annis/?id=31e2a273-426f-4aaf-922a-7fa0f0b311e1" width="100%" height="500"></iframe>

To get an embeddable snippet, click the share icon at the top left of your search result and choose the visualization you want. To share the entire set of results, use the share button at the top right of the results page. If you want to share an ANNIS search result via e-mail, you can still copy and paste the URL as before, but you can now also get a shareable link for an individual hit using the same share button.
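If you embed several visualizations, you may prefer to generate the markup rather than paste it by hand. Here is a minimal sketch in Python, assuming the embed URL follows the pattern in the example above. `build_embed_snippet` is a hypothetical helper written for this post, not part of ANNIS, and it writes an explicit closing `</iframe>` tag because HTML does not treat `iframe` as a self-closing element.

```python
from html import escape


def build_embed_snippet(embed_url: str, width: str = "100%", height: str = "500") -> str:
    """Wrap an ANNIS embed URL in an iframe element.

    Hypothetical helper for illustration only; ANNIS itself produces the
    snippet for you when you click the share icon.
    """
    # Escape attribute values so characters such as & or " cannot break the markup.
    return (
        f'<iframe src="{escape(embed_url, quote=True)}" '
        f'width="{escape(width, quote=True)}" '
        f'height="{escape(height, quote=True)}"></iframe>'
    )


# The embed URL from the example above.
url = "https://corpling.uis.georgetown.edu/annis/?id=31e2a273-426f-4aaf-922a-7fa0f0b311e1"
print(build_embed_snippet(url))
```

The width and height defaults simply mirror the values in the example snippet; adjust them to suit your page layout.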

Let us know if you have any feedback!

Introducing the Lemmatizer Tool

A new tool available at the Coptic SCRIPTORIUM webpage is the lemmatizer. The lemmatizer annotates words with their dictionary head word. The purpose of lemmatization is to group together the different inflected forms of a word so they can be analyzed as a single item.

For example, in English, the verb ‘to walk’ may appear as ‘walk’, ‘walked’, ‘walks’, and ‘walking’. The base form, ‘walk’, is the word you would look up in a dictionary; it is called the lemma of the word.

In Coptic, some nouns have distinct plural forms, and verbs can appear in several different forms. A lemmatized corpus lets you search for all the forms of a word at once, and it also lets you link every form of a word to its entry in an online dictionary.
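To make the idea concrete, here is a toy sketch in Python using the English forms of ‘walk’. It is an illustration only: the lookup table and the function names are invented for this example, and it does not reflect how the Coptic SCRIPTORIUM lemmatizer is implemented. It shows the two steps involved: annotating each token with a lemma, then searching by lemma to find every form.

```python
# Toy lookup table mapping each word form to its lemma (illustrative only).
LEMMA_TABLE = {
    "walk": "walk",
    "walks": "walk",
    "walked": "walk",
    "walking": "walk",
}


def lemmatize(tokens):
    """Pair each token with its lemma; unknown forms fall back to their lowercased form."""
    return [(tok, LEMMA_TABLE.get(tok.lower(), tok.lower())) for tok in tokens]


def search_by_lemma(annotated, lemma):
    """Return the position and form of every token annotated with `lemma`."""
    return [(i, form) for i, (form, lem) in enumerate(annotated) if lem == lemma]


tokens = "She walked home and he walks there while they are walking".split()
annotated = lemmatize(tokens)

# One query on the lemma finds all three inflected forms.
print(search_by_lemma(annotated, "walk"))
# → [(1, 'walked'), (5, 'walks'), (10, 'walking')]
```

A search for the plain string ‘walk’ would miss none of these only by accident of English spelling; in a lemmatized corpus the match is made on the annotation, so it works even when the forms look quite different from the lemma.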

Two of our corpora are annotated with lemmas: Not Because a Fox Barks (Shenoute) and the Apophthegmata. In the image below, I have searched for the lemma ⲟⲩⲱϩ, ‘to live or dwell’.

[Image 1]

Note also that in the corpus list I have selected the corpus ‘Not Because a Fox Barks’, as indicated by the blue highlighting.

[Image: Coptic SCRIPTORIUM ANNIS corpus search]

Notice that the word forms corresponding to the lemma I searched for are highlighted in the results from the chosen corpus. Two forms of the verb ⲟⲩⲱϩ appear in the results: ⲟⲩⲱϩ and ⲟⲩⲏϩ. Each result also has an annotation grid.

[Image: desktop screenshot]

Clicking on the annotation grid reveals more information, including a translation of the text and the part of speech of each word. Hovering over an annotation with the mouse highlights the parts of the text it relates to. For example, below, the POS (part of speech) is V (verb), and when the mouse hovers over V, a highlight shows which word in the text the tag refers to.

[Image 2]

[Image 3]

The lemmatizer is a feature of our part-of-speech tagger, so you can lemmatize a corpus at the same time as you annotate it for parts of speech. See https://github.com/CopticScriptorium/tagger-part-of-speech/.

Additional guidelines are available here:  https://github.com/CopticScriptorium/tagger-part-of-speech/blob/master/Coptic%20SCRIPTORIUM%20lemmatization%20guidelines.pdf

Introducing the project texts and data model, and how to use ANNIS

To learn more about Coptic SCRIPTORIUM’s corpora, data model, and features, here is a video that introduces ANNIS and shows how to use it to explore Coptic texts. Thanks go to Caroline T. Schroeder for the video, which is from her YouTube channel.

(Originally posted on copticscriptorium.org)

© 2017
