
GuidelinesOdd


Title: EpiDoc Guidelines TEI P5 ODD

This is an EpiDoc UserStory

Originator(s): GabrielBodard and ZanetaAu
Associated Bug/Feature Request(s):
- n/a
Markup List subject line(s)
- n/a

Summary

As a prelude to converting all of the EpiDoc Guidelines and recommendations to a TEI P5 schema, some work has been done on converting the XML of the Guidelines themselves to a P5 ODD.

Examples

Dependencies and rules

Questions

Acceptance Tests

Full report on EpiDoc ODD (as delivered by ZanetaAu in Paris, June 2006)

The study of Greek and Latin inscriptions has been an established sub-discipline of classics since the Renaissance. In the 19th century several multi-volume publications of inscriptions, called corpora and intended to be comprehensive in scope, began to appear in Europe. From this point it became especially important to agree on consistent and effective editorial and typographic conventions.

These conventions reached an important watershed in 1931, when an international meeting of interested scholars convened at Leiden to work out agreed notational practices for epigraphic and papyrological transcription. The resulting "Leiden Conventions", subject to subsequent refinements, continue to provide Greek and Latin epigraphists with a common editorial shorthand down to the present day.

New technologies have brought new challenges and opportunities. Since 1999, the members of the EpiDoc community have been working to migrate epigraphic publication practices into the digital arena, extending them to take advantage of the opportunities offered by new media. Together, they are developing free and open guidelines and tools for the encoding of digital epigraphic editions in XML, using the TEI tagset.

To date, the Community have concentrated their efforts on developing uniform guidance for all aspects of epigraphic transcription, and on forwarding three definitive EpiDoc projects: one at Oxford (focused on the Roman-era Writing Tablets from Vindolanda), another at King's College London (publishing the inscribed documents from the ancient city of Aphrodisias in Turkey), and a third at Brown University (producing a comprehensive corpus of Greek and Latin inscriptions held in U.S. collections). A number of other projects are now under consideration or in active preparation, touching on Greek, Latin, Aramaic, Etruscan, and Lycian epigraphy.

The current version of the EpiDoc Guidelines is freely available online, either as a downloadable set of XML files from SourceForge, or in HTML on the Stoa Consortium website.

As an example of the Guidelines' scope, let's briefly examine the entry for erroneous omission. When the editor restores characters which she believes were inadvertently omitted by the stonecutter, the traditional typographic convention is to enclose the restored characters between angle brackets. In EpiDoc XML we use the TEI element <supplied> with @reason='omitted'.

The three sections of interest are:

1. Traditional typographic representation - hard coded in Leiden

2. EpiDoc Encoding - hard coded in EpiDoc TEI P4

3. The appearance generated by the standard EpiDoc XSLT stylesheets from the EpiDoc encoding.

At the moment, the generated output (section 3) can be compared visually to the traditional typographic representation (section 1), and therefore acts as a test on both the stylesheets and the markup. We plan to automate such tests (using XSLT and Schematron), and to use the Guidelines themselves to store machine-readable configuration data showing equivalences between Leiden and TEI. These equivalences may then govern the behavior of EpiDoc tools used in many projects.
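One way to picture such machine-readable equivalences is as a small lookup table. The sketch below is written in Python purely for illustration (the tooling described in this report is XSLT, Schematron and JavaScript). The names `Equivalence`, `EQUIVALENCES`, `encode` and `render` are hypothetical, and the table holds only the three bracket pairs that this report mentions.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Equivalence:
    """One Leiden bracket pair and the TEI encoding it corresponds to."""
    leiden_open: str
    leiden_close: str
    element: str
    reason: str


# Only the pairs named in this report; a real table would be far longer.
EQUIVALENCES = {
    "omitted": Equivalence("<", ">", "supplied", "omitted"),
    "lost": Equivalence("[", "]", "supplied", "lost"),
    "abbreviation": Equivalence("(", ")", "supplied", "abbreviation"),
}


def encode(rule: Equivalence, chars: str) -> str:
    """Return the TEI encoding of `chars` under `rule`."""
    return f'<{rule.element} reason="{rule.reason}">{escape(chars)}</{rule.element}>'


def render(rule: Equivalence, chars: str) -> str:
    """Return the traditional typographic form of `chars` under `rule`."""
    return f"{rule.leiden_open}{chars}{rule.leiden_close}"


omission = EQUIVALENCES["omitted"]
print(render(omission, "a"))  # typographic form of a restored omission
print(encode(omission, "a"))  # corresponding EpiDoc form
```

Keeping both directions in one record is the point of the design: a converter and a stylesheet that read the same table cannot drift apart.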

Hypothetical EpiDoc Use Case

This diagram, which illustrates the use of EpiDoc tools and practices in developing digital epigraphic publications, may clarify our motives for moving from our present reliance on TEI P4 to P5.

Firstly, trained transcribers can enter texts, using the well-known, standard Leiden conventions, or pull them from existing digital sources. (a)

Next, these texts may be converted into XML with the help of the Chapel Hill Electronic Text Converter (CHET-C), one of the open-source tools developed by the EpiDoc community. (b)

CHET-C is a JavaScript application that runs in a browser and draws on a series of regular expressions to convert conventionally typed epigraphic texts into TEI XML. Let's take a couple of lines from a Latin inscription as an example.

Fusco et De[x]/tro co(n)[s(ulibus)]

CHET-C output, converted to EpiDoc XML:

<ab>Fusco et De<supplied reason="lost">x</supplied><lb type="worddiv"/>tro <expan>co<supplied reason="abbreviation">n</supplied><supplied reason="lost">s<supplied reason="abbreviation">ulibus</supplied></supplied></expan></ab>

You will notice:

1. [] = supplied @reason='lost'
2. () = supplied @reason='abbreviation'
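The conversion of these sample lines can be sketched with a handful of regular expressions. This is not CHET-C's code (CHET-C is a JavaScript application and its actual rules are not reproduced here); it is a minimal Python illustration whose rules are inferred from the single example above. The function names `convert_word` and `leiden_to_xml` are hypothetical, as is the assumption that a slash inside a word always marks a word-dividing line break.

```python
import re

# Hypothetical rule inferred from the example above: text in parentheses
# is an editorial expansion of an abbreviation.
ABBREVIATION = re.compile(r"\(([^()]*)\)")


def convert_word(word: str) -> str:
    """Convert one whitespace-delimited word from Leiden notation to XML."""
    has_expansion = "(" in word
    # A slash inside a word is assumed to be a line break dividing the word.
    # Replace it first, while the word still contains no markup with "/" in it.
    out = word.replace("/", '<lb type="worddiv"/>')
    out = ABBREVIATION.sub(r'<supplied reason="abbreviation">\1</supplied>', out)
    # Square brackets may enclose an abbreviation, so open and close
    # are replaced independently to allow nesting.
    out = out.replace("[", '<supplied reason="lost">').replace("]", "</supplied>")
    return f"<expan>{out}</expan>" if has_expansion else out


def leiden_to_xml(text: str) -> str:
    """Wrap the converted words of a transcription in a single <ab>."""
    return "<ab>" + " ".join(convert_word(w) for w in text.split()) + "</ab>"


print(leiden_to_xml("Fusco et De[x]/tro co(n)[s(ulibus)]"))
```

For the sample lines this produces the same XML as shown above. The sketch makes no attempt to handle unbalanced brackets or the many other sigla of the Leiden conventions.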

Alternatively, the epigraphic text can be transcribed by hand in XML; individual practitioners and projects must decide how to create their texts.

In any case, initial transcriptions in XML generally require iterative revision before they are ready for release. Extended commentary, figures and other material must be added, and the transcribed text must be checked for accuracy. Projects may elect to automatically transform the epigraphic transcription back to conventional typographic form. (c) It may then be programmatically compared with the originally typed text and problem cases referred to a human expert. The EpiDoc community maintains a set of "standard" XSLT files (originally designed for HTML output) that could be modified to perform this function.

Maintaining consistency across this roundtrip has proved a challenge. It is essential that the encoding recommendations of the Guidelines match the constraints enshrined in the EpiDoc schema. Similarly, the regular expressions used by CHET-C to convert traditional typographic notation to XML should match exactly what the typographic exempla in the Guidelines specify, and produce exactly the corresponding TEI encoding. The standard XSLT should also perform the same transformation in the other direction. We have demonstrated to ourselves that humans should not undertake to maintain such consistency "by hand": the process is both error-prone and time-consuming.

The ODD mechanism, introduced by TEI for P5, takes care of synchronizing the guidance and the governing schema by unifying both explanations and schema specifications. Turning the EpiDoc Guidelines into a P5 ODD and processing it with Roma will automate this manual task. Hence our motivation to move quickly to P5.
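The return leg of this roundtrip can be illustrated in a few lines. The EpiDoc community does this work in XSLT; the Python below is only a stand-in sketch, covering just the elements that occur in the sample lines. The names `BRACKETS`, `to_leiden` and `roundtrip_ok` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Bracket pairs for <supplied>, keyed by @reason (as named in this report).
BRACKETS = {"lost": ("[", "]"), "abbreviation": ("(", ")"), "omitted": ("<", ">")}


def to_leiden(elem: ET.Element) -> str:
    """Recursively render an element and its descendants as Leiden text."""
    opening, closing = "", ""
    if elem.tag == "supplied":
        opening, closing = BRACKETS.get(elem.get("reason"), ("", ""))
    parts = [opening, elem.text or ""]
    if elem.tag == "lb" and elem.get("type") == "worddiv":
        parts.append("/")
    for child in elem:
        parts.append(to_leiden(child))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(closing)
    return "".join(parts)


def roundtrip_ok(typed: str, encoded_xml: str) -> bool:
    """True when the XML renders back to exactly the text originally typed."""
    return to_leiden(ET.fromstring(encoded_xml)) == typed


TYPED = "Fusco et De[x]/tro co(n)[s(ulibus)]"
ENCODED = (
    '<ab>Fusco et De<supplied reason="lost">x</supplied><lb type="worddiv"/>tro '
    '<expan>co<supplied reason="abbreviation">n</supplied>'
    '<supplied reason="lost">s<supplied reason="abbreviation">ulibus</supplied>'
    "</supplied></expan></ab>"
)
print(roundtrip_ok(TYPED, ENCODED))
```

A transcription for which `roundtrip_ok` returns false is the kind of problem case that would be referred to a human expert.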

As we have shown, we have built XSL transformations to render both the HTML Guidelines and everyday EpiDoc source files, with consistency checking in mind. A quick visual check, comparing the typographic exempla to the transformed output of the corresponding example encoding, verifies the integrity of the Guidelines entry and corresponding transformation template. Recent experiments, in which a small amount of linking information is added to the Guidelines source files, have demonstrated that Schematron can be used to perform and evaluate such checks across the entire guidelines automatically. This automated checking releases developers from tedious, manual patrolling of guidelines and configuration files, freeing them to focus on more rapid extension and perfection of Guidelines and tools.

With the addition of a few more elements in each Guidelines entry, these automated tests can be expanded to encompass the regular expressions required by CHET-C. The configuration file for CHET-C may then be generated from the Guidelines source itself via a separate XSL transformation. In this way, duplication is eliminated, manual effort minimized and the probability of consistency across tools radically improved.
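The idea of deriving a converter configuration from the Guidelines source, and checking it at the same time, can be sketched as follows. The element names `<epiEntry>`, `<typogBox>`, `<regex>` and `<egXML>` are taken from this report; everything else (the `<guidelines>` wrapper, the `<pattern>` and `<replacement>` children, the `id` attribute, the use of CDATA, the JSON output, and the use of Python in place of XSLT) is an assumption made for the sake of a runnable example.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical Guidelines entry; the internal layout is invented for this sketch.
GUIDELINES = r"""
<guidelines>
  <epiEntry id="omission">
    <typogBox><![CDATA[<a>]]></typogBox>
    <regex>
      <pattern><![CDATA[<([^<>]+)>]]></pattern>
      <replacement><![CDATA[<supplied reason="omitted">\1</supplied>]]></replacement>
    </regex>
    <egXML><![CDATA[<supplied reason="omitted">a</supplied>]]></egXML>
  </epiEntry>
</guidelines>
"""


def build_config(source: str) -> list[dict[str, str]]:
    """Collect converter rules, checking each against its own exemplum."""
    rules = []
    for entry in ET.fromstring(source.strip()).iter("epiEntry"):
        typographic = entry.findtext("typogBox", "").strip()
        expected = entry.findtext("egXML", "").strip()
        pattern = entry.findtext("regex/pattern", "").strip()
        replacement = entry.findtext("regex/replacement", "").strip()
        # The rule must turn the typographic example into the encoded example.
        produced = re.sub(pattern, replacement, typographic)
        if produced != expected:
            raise ValueError(f"entry {entry.get('id')}: {produced!r} != {expected!r}")
        rules.append({"id": entry.get("id"), "pattern": pattern, "replacement": replacement})
    return rules


print(json.dumps(build_config(GUIDELINES), indent=2))
```

Because each rule is verified against the exemplum stored beside it, an entry whose regular expression and encoded example disagree is rejected before any configuration is written.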

The EpiDoc P5 schemata and relationships

Let me take a minute to explain the colours. Blue lines mark files that are authored by hand, and pink lines mark validation paths. Pink boxes represent schemata and blue boxes represent ODD files.

Because our guidelines must contain such a variety of highly structured information, and must support more functions than those for which a regular TEI ODD is intended, we have elected to employ a hierarchy of ODD-generated schemata to govern and reinforce the entire process.

At the end of the line, we seek to produce high quality digital editions of inscriptions. The human encoders of these editions require EXPLANATIONS and EXAMPLES to guide their work, and a coordinating SCHEMA (epidoc.rnc) to help ensure that their encoded documents adhere to the Guidelines. The EPIDOC GUIDELINES ODD provides these components. As added insurance, the SCHEMA generated from the GUIDELINES ODD is also used to validate the examples encoded in the guidelines.

Without going into too much detail, we have created new div-like elements such as <epiEntry> and <typogBox> for the different elements and typographic examples. We have also altered some elements; for example, we have changed <cRefPattern> to <regex> to support our regular expression testing. Usage examples are encapsulated with <exemplum> and <egXML>.

The growing number of individuals who contribute content to the EpiDoc Guidelines form a wildly heterogeneous group with regard to programming and encoding skills, TEI familiarity and epigraphic training, but they all bring perspectives of value to the Guidelines process. In order to make the creation of the enhanced P5 Guidelines a less arduous task for this group, we have developed EpiMetaDoc, a separate TEI customization governing the encoding of the Guidelines. EpiMetaDoc provides answers to questions like "what is the section for the recommended P5 encoding called?" and "what do I do if I don't have sample code?". Via Roma, it also produces SCHEMATA to shape and validate the structure and completeness of their contributions.

This is all work in progress and epidoc.rnc is not yet functional; hence these two validation paths are imaginary. As an interim measure until we have this schema in place, we have created another ODD and schema channel for validating the examples in the EpiMetaDoc ODD. The examples in the actual Guidelines will be in the EpiDoc namespace proper and will therefore validate against epidoc.rnc.

In summary:

We have a constellation of tools
- The TEI customization
- Schemata
- Guidelines and documentation in XML
- Software tools and configuration data for them.
Our own customizations of the standard TEI customization process
- So that we can allow community authoring of Guidelines and ODD
- So that the Guidelines and ODD can be used for both schema and documentation generation, and
- As a single source of, and automated verification system for, tool configuration for CHET-C, the stylesheets, etc.
