Best Practices for TEI in Libraries

A guide for mass digitization, automated workflows, and promotion of interoperability with XML using the TEI

Editors: Kevin Hawkins, Michelle Dalmau, Elli Mylonas, and Syd Bauman

Creation of ODDs: Syd Bauman

Version: 4.0.0 (published September 2018)

1. Introduction

This document is the fourth major revision of a document formerly known as TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices, which has been updated to comply with the Text Encoding Initiative’s Guidelines for Text Encoding and Interchange (P5). These Best Practices were originally created for use in large, library-based digitization projects but are useful as a way of approaching digitization and encoding as a whole.

There are many different library text digitization projects, serving a variety of purposes. With this in mind, these Best Practices are meant to be as inclusive as possible by specifying five encoding levels. These levels are meant to allow for a range of practice, from wholly automated text creation and encoding, to encoding that requires expert content knowledge, analysis, and editing. The encoding levels are not strictly cumulative: while higher levels tend to build upon lower levels by including more elements, higher levels are not supersets because some elements used at lower levels are not used at higher levels—often because more specific elements replace generic elements.

In brief, the encoding levels (with approximate examples) are as follows:

Level | Description | Example of encoding of Alger Hiss document | Display example
Level 1 | The text is generated through OCR, is subordinate to the page image, and is not intended to stand alone as an electronic text (without page images). | Alger Hiss document | example
Level 2 | The text is generated through OCR and is mainly subordinate to the page image, though navigational markers (such as textual divisions and headings) are captured. | Alger Hiss document | example
Level 3 | The text is created by conversion, either by way of OCR or keyboarding. Some structural elements of the text are encoded. The text may be used with or without page images. | Alger Hiss document | example
Level 4 | The text is generated either through corrected OCR or keyboarding and is able to stand alone without page images so that it can be read by students, scholars, and general readers. | Alger Hiss document | example
Level 5 | The text is generated either through corrected OCR or keyboarding and is able to stand alone without page images, as in Level 4. In addition, the text includes semantic, linguistic, prosodic, or other types of tagging created through substantial human intervention by encoders with subject knowledge. | (none) | example

In these Best Practices, use of elements and attributes tends toward explicitness for ease of processing even though a human or possibly machine reader might be able to make inferences based on context. Only those elements and attributes mentioned below are recommended for use in encoding based on these Best Practices; use of other TEI elements and attributes is not recommended. Consult the full TEI Guidelines (P5) for guidance on use of elements and attributes beyond what is described below.

As guidelines rather than a specification, these Best Practices use should or recommended instead of must to explain conformance to the practices described here; the only exception is a practice required by P5, indicated by required. Optional practices are indicated by may or optional. However, to encourage conformance to these Best Practices, the ODD files will generate schemas that require use of recommended elements (those indicated by should or recommended).

These Best Practices specify a recommended archival storage format. Local system needs may require transformation of documents in this archival format to another XML format (another TEI customization or another type of XML) for use by a local indexing or delivery software.

2. Relation to other TEI customizations

2.1. TEI Tite

The TEI Tite customization of P5 was developed as a subset of the TEI to be used as a vendor specification for outsourced encoding of the type often initiated by libraries, archives and other cultural heritage organizations. These Best Practices were created to support in-house encoding that adheres as closely as possible to common TEI practice and library standards yet still leaves room for variation in local practice.

If a library uses TEI Tite for outsourced encoding, it should find that converting files from the TEI Tite format to a format conforming to these Best Practices is not difficult. TEI Tite files may be converted to Best Practices Level 3 with some loss of granularity, or to Level 4 with some additional markup and minimal human intervention. (The reason Level 3 does not contain as many elements as TEI Tite is to allow for use of this encoding level, whether for encoding of born-digital source documents or for upgrading Level 1 or Level 2 texts, with less human intervention than would be required by TEI Tite.)

These Best Practices are meant to complement the TEI Tite customization of P5. Whereas TEI Tite is meant for vendors who need exact specifications for encoding, these Best Practices document how a large-scale encoding project, whether based in a library or not, might create conformant TEI documents out of vendor-generated or locally-created TEI documents. TEI Tite lacks header metadata and elements for encoding textual structures of possible interest to libraries; however, once Tite documents are transformed to a TEI-conformant encoding used by an institution, these Best Practices can serve as a point of reference for developing the TEI header and applying richer markup as reflected in Level 4 or 5 of these Best Practices.

For a comparison of the TEI Tite schema to these Best Practices, see TEI Tite's Appendix A.

2.2. TEI simplePrint

TEI simplePrint is a customization of P5 designed to be an entry point for a wide variety of encoding projects, especially those dealing with Western European early modern printed material. An implicit aim of the creators of this customization, as with these Best Practices, was to prescribe a single way to encode common features in source documents. The features to be encoded are more granular than those in Levels 1 through 4 of these Best Practices; therefore, TEI simplePrint is discussed below with Level 5. Note that unlike these Best Practices, TEI simplePrint specifies a processing model for each element, recommending how they should be displayed or otherwise processed.

3. General Recommendations

3.1. Standards and Local Practice

The goal of the TEI is interchange, not interoperability. While seamless interoperability of texts created for different purposes is an elusive goal, use of a common markup vocabulary and syntax greatly aids interchange. Nevertheless, keep in mind that others, even within your organization, may later use your texts for purposes other than those you intended when encoding them.

An encoding project should strive for internal consistency and for use of standards so that the data can be modified or enhanced in the future with ease. In cases where local practice deviates from standards, there should at least be internal consistency in the local practice.

3.2. Character Encoding

For the purpose of interchange, TEI documents should be saved in UTF-8 format, without beginning with a byte-order mark (U+FEFF). Furthermore, while TEI allows for capturing of glyph variants not distinguished by Unicode (see chapter 5 of P5), the special elements for this purpose are not recommended in these Best Practices in order to facilitate interchange.
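
As a minimal sketch of this recommendation, a TEI document saved as UTF-8 would typically open with an XML declaration naming that encoding. The declaration is standard XML; the rest of the document is elided here and is only a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The "<" of the declaration above should be the very first byte of the
     file: no byte-order mark (U+FEFF) precedes it. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader and text omitted from this sketch -->
</TEI>
```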

3.3. Transcription

When reformatting to digital media using any level of encoding, the electronic text should begin with the transcription of the first word on the first leaf of the original work. When using Levels 1 to 3 of encoding, it may be impractical or undesirable to transcribe and encode certain features of the text, such as publisher’s advertisements or indexes. Any systematic omissions of material found in the original work should be noted in the <samplingDecl> in the TEI header. Non-systematic omissions should be indicated in the <notesStmt> of the <teiHeader>. At Level 4 and above, the transcription should be complete.
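
The following sketch suggests how the two kinds of omission might be recorded in a header. The title, the wording of the notes, and the omitted material are invented for illustration; only the placement of <notesStmt> and <samplingDecl> follows the recommendation above.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <fileDesc>
    <titleStmt>
      <title>Sample title (placeholder)</title>
    </titleStmt>
    <publicationStmt>
      <p>Publication details omitted from this sketch.</p>
    </publicationStmt>
    <notesStmt>
      <!-- Non-systematic omission (hypothetical wording) -->
      <note>The damaged errata slip following page 12 was not transcribed.</note>
    </notesStmt>
    <sourceDesc>
      <p>Source description omitted from this sketch.</p>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <samplingDecl>
      <!-- Systematic omission (hypothetical wording) -->
      <p>Publisher's advertisements and the index were not transcribed.</p>
    </samplingDecl>
  </encodingDesc>
</teiHeader>
```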

When encoding composite documents (an issue of a journal, a volume of conference papers, a book of poems, etc.) at Level 3 and above, consider creating separate TEI documents for each unit (journal article, conference paper, poem, etc.) so that more granular metadata may be attached to each unit in its own TEI header. A further step will be necessary in order to ensure that the structure of the composite document—how the units relate to each other—is adequately represented. A common way to accomplish this is to create a separate TEI document for the composite document that uses XInclude to include the relevant portions of the TEI documents for the units in the correct order.
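
One possible shape for such a composite document is sketched below. The file names (article01.xml, article02.xml) and the identifiers are hypothetical; the sketch assumes that each unit's <text> element carries an xml:id and that the file is processed with an XInclude-aware parser. Before inclusion is resolved, the file is not itself valid TEI.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader>
    <!-- metadata for the journal issue as a whole (omitted from this sketch) -->
  </teiHeader>
  <text>
    <group>
      <!-- Each xpointer names the xml:id of the <text> element in the unit's file;
           the order of the includes records the order of the units. -->
      <xi:include href="article01.xml" xpointer="article01_text"/>
      <xi:include href="article02.xml" xpointer="article02_text"/>
    </group>
  </text>
</TEI>
```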

3.3.1. Punctuation

While the TEI gives the option of removing from the transcribed text punctuation marks, such as quotation marks or end-of-line hyphens, that could be encoded using XML tags, it is recommended to leave these as XML character data in order to simplify rendering of the encoded text as it appears in the source document. For example:
<p>When she said <said>“Nobody uses the term  <mentioned>‘electronic text’</mentioned> anymore”</said> he nearly passed out!</p>
Use the <quotation> element in the header to record whether quotation marks have been removed, and use the <punctuation> element in the header to record whether other punctuation has been removed.
When removing punctuation marks whose rendering in the source document is worth preserving in the encoding, use the style attribute on the appropriate element to describe the punctuation marks that were removed. For example:
<p>When she said <said style="quotes: '“' '”'">Nobody uses the term <mentioned style="quotes: '‘' '’'">electronic text</mentioned> anymore</said> he nearly passed out!</p>
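
For a project that leaves punctuation in place as recommended, the corresponding header declarations might look like the sketch below. The marks and placement attributes are defined by P5 for these elements; the prose inside each element is invented, and whether a given project schema permits every attribute shown is an assumption.

```xml
<editorialDecl xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Quotation marks were kept as character data in the transcription -->
  <quotation marks="all">
    <p>All quotation marks in the source are retained in the text.</p>
  </quotation>
  <!-- Other punctuation was likewise kept; placement="internal" records that
       punctuation adjacent to a tagged phrase sits inside the element -->
  <punctuation marks="all" placement="internal">
    <p>All punctuation in the source is retained in the text.</p>
  </punctuation>
</editorialDecl>
```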

3.3.2. Hyphenation

Encoding of end-of-line, end-of-column, and end-of-page hyphenation varies considerably in the TEI community. Some capture all hyphens found on the printed page, while others remove those in the middle of words not normally hyphenated for easier implementation of full-text retrieval. If preserving hyphens, some will capture all hyphens using the same character, while others will distinguish hyphens that must be present in any case (often called hard hyphens) and those that are only present by virtue of being at the end of a line, column, or page (often called soft hyphens).

This issue is complicated by the fact that Unicode prescribes use of a soft hyphen not for a visible hyphen that might have been absent but instead for a place where a hyphen might occur. Furthermore, it includes a non-breaking hyphen, used in cases like ‘re-creation’ (meaning to create again, as opposed to ‘recreation’, meaning relaxation), in addition to a regular hyphen, which would normally count as a word boundary. In short, Unicode is oriented toward electronic text that may be processed with a computer in various ways, not toward capturing source documents.

Since OCR software relies on dictionaries to determine the probability not simply of characters but of whole words, it is often able to capture hyphenation in different ways, per the needs of a specific project.

At Levels 1 and 2, do not attempt to disambiguate different uses of hyphens. Encode all hyphens appearing in the source document using character U+2010 (HYPHEN) if possible; alternatively, use the semantically ambiguous U+002D (HYPHEN-MINUS).

At Level 3, if <lb>, <cb> and <pb> are used, optionally distinguish uses of the hyphen with <pc> and force. At Level 4, if <lb>, <cb> and <pb> are used, the use of <pc> and force on these elements is required when all original hyphenation and line breaks are being retained. This is important when encoding a text that may be republished in different formats causing line breaks and hyphenation to differ from the original.

Sample hyphenation markup follows, using the example of a word broken at the end of a line. (If such hyphens occur at a column or page break instead of a line break, use <cb> or <pb> instead.)

Colloquial name: Hard hyphen
Appearance in source document:
This is not a run-
on sentence.
Encoding: This is not a run<pc force="weak">-</pc><lb break="no"/>on sentence.
Note: The use of weak as the value of force indicates that the encoder considers "run-on" to be a single orthographic token, where the hyphen is not a word separator. The use of no as the value of break also indicates that the line break occurs inside an orthographic token (single word) which is broken across a line.

Colloquial name: Hard hyphen
Appearance in source document:
This is not a run-
on sentence.
Encoding: This is not a run<pc force="strong">-</pc><lb break="yes"/>on sentence.
Note: The use of strong as the value of force indicates that the encoder considers "run-on" to be two orthographic tokens, where the hyphen is also a word separator. The use of yes as the value of break indicates that the line break occurs between two words.

Colloquial name: Soft hyphen
Appearance in source document:
UTF-8 is a char-
acter encoding for Unicode.
Encoding: UTF-8 is a char<pc force="weak">-</pc><lb break="no"/>acter encoding for Unicode.
Note: As in the first example, the use of weak as the value of force indicates that the encoder considers "character" to be a single orthographic token where the hyphen is only indicating that the word is broken across a line. The use of no as the value of break also indicates that the line break occurs inside an orthographic token (single word) which is broken across a line.

Colloquial name: Unclear case
Appearance in source document:
Some people say TEI is a mark-
up language.
Encoding: Some people say TEI is a mark<pc force="inter">-</pc><lb break="maybe"/>up language.
Note: The use of inter as the value of force indicates that the encoder is not taking a position on whether "mark-up" is a single orthographic token or two words. The use of maybe as the value of break is also inconclusive on whether the line break occurs in between words or inside a word.
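
The same pattern can be sketched for a word broken across a page break rather than a line break. The page number is invented; the sketch assumes that "char-" ends one page and "acter" begins page 13.

```xml
<!-- Source: "char-" is the last word fragment on page 12; "acter" opens page 13 -->
<p xmlns="http://www.tei-c.org/ns/1.0">UTF-8 is a char<pc force="weak">-</pc><pb n="13" break="no"/>acter
encoding for Unicode.</p>
```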

Do not confuse the following characters with hyphens:

  • en dash (U+2013)
  • em dash (U+2014)
  • minus sign (U+2212)

3.4. Filenames

A filename scheme that is internally consistent should be established for any encoding project.

Consider the following best practices when determining the file name scheme for your project:

3.5. URIs

A number of attributes take a URI (Uniform Resource Identifier) as their value. Note that in addition to the full form of reference defined by URI syntax, these attributes can take a relative reference (e.g., filename.ext) or a fragment identifier (e.g., #foo).
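
The three forms might appear as follows on the target attribute of <ref>. The element choice, the host name, the file name, and the identifier are all hypothetical and serve only to show the shape of each value.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Full (absolute) URI -->
  <ref target="https://www.example.org/texts/hiss001.xml">the full text</ref>
  <!-- Relative reference, resolved against the location of the current file -->
  <ref target="chapter02.xml">chapter 2</ref>
  <!-- Fragment identifier, pointing at xml:id="note12" in the same document -->
  <ref target="#note12">note 12</ref>
</p>
```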

3.6. Textual Divisions

At Levels 2 and higher, when <text> is used, an encoding project should use only numbered divisions (i.e., <div1>, <div2>, etc.) or unnumbered divisions (i.e., <div>) but not both. This applies both within a TEI document (i.e., within <front>, <body>, <back>, even if nested within <group> or <floatingText>), and across TEI documents in any given collection. Keep in mind that numbering of textual subdivisions starts over (at <div1>) within <floatingText> elements nested inside a subdivision, so any software that expects to process nested numbered divisions within a document will need to account for this.
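
The restart of numbering inside <floatingText> can be sketched as follows for a project that has chosen numbered divisions. The headings and text are invented placeholders.

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div1 type="chapter">
    <head>Chapter 1</head>
    <div2 type="section">
      <head>A Letter Arrives</head>
      <p>She unfolded the page and read:</p>
      <floatingText>
        <body>
          <!-- Numbering starts over at div1, although this division is
               nested inside a div2 of the enclosing text -->
          <div1 type="letter">
            <p>Text of the embedded letter.</p>
          </div1>
        </body>
      </floatingText>
    </div2>
  </div1>
</body>
```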

The choice of numbered or unnumbered divisions should be documented with the <tagUsage> element in the header. See section 4.1.8, Element and Attribute Recommendations for the TEI Header, below.
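
A declaration of this kind might look like the sketch below, here for a project that has chosen unnumbered divisions. The wording of the statement is an assumption; P5 requires <tagUsage> to sit inside a <namespace> element within <tagsDecl>.

```xml
<encodingDesc xmlns="http://www.tei-c.org/ns/1.0">
  <tagsDecl>
    <namespace name="http://www.tei-c.org/ns/1.0">
      <tagUsage gi="div">Unnumbered divisions are used throughout this
        collection; numbered divisions (div1, div2, etc.) are not used.</tagUsage>
    </namespace>
  </tagsDecl>
</encodingDesc>
```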

Whether numbered or unnumbered divisions are used, the type attribute of the division element is not recommended at Level 1 (because only one encoded division in the text exists), is optional at Level 2 (because the division-level metadata need not classify these divisions), is recommended at Level 3 (for broad yet useful analysis of text divisions), and is strongly recommended at Levels 4 and 5 (for full analysis of the text structure). Recommended values for the type attribute are listed below. Other values may be used if these are not applicable. See section 3.9.1, on the type attribute, below for a discussion of adding customized values to the project ODD.

3.6.1. Front Matter

Attribute Value | Description
abstract | Usually appears in front matter.
acknowledgement | Usually appears in front matter.
contents | A table of contents. Usually appears in front matter.
dedication | A dedication. Usually appears in front matter.
docAuthorization | A statement indicating that the document's printing was officially authorized.
foreword | Usually appears in front matter.
frontispiece | A portrait or other image (usually of the author, usually full page) printed at the front of a document. Usually appears in front matter.
imprimatur | A formal indication (usually on the title page or in the front matter) that the document has received official license to be printed.
preface | A section of the front matter that does not carry a more specific designation. Usually appears in front matter.

3.6.2. Body

Attribute Value | Description
book | A major structural component of a long work, identified explicitly in the text as a “book”
chapter | A chapter, typically in a prose document.
part | A major component of a work, containing further subdivisions.
section | A generic section of a larger work.
subsection | A generic subdivision of a section.
volume | A single printed volume in a multi-volume work.
drama | A dramatic text.
dramaPart | A portion of a drama other than a prologue, act, scene, or epilogue.
act | An act in a dramatic text.
scene | A scene in a dramatic text.
castlist | A list of characters in a dramatic text.
poem | A poem.
poemGroup | A group of two or more poems under a common heading.
argument | A short passage at the start of a document or section giving a prose description of its contents.
corrigenda | A section describing corrections to be made to the document.
entry | An entry in a document that is organized as a log or diary with dated entries.
epigraph | A short quotation at the start of a document or section, often accompanied by an attribution.
epilogue | A short concluding section, usually of a dramatic or fictional work.
prologue | An opening section of a literary work (typically drama or poetry).
narrative | An embedded narrative.
nonfictionProse | A text intended to be nonfiction consisting primarily of prose.
novelPart | A portion of a novel.
advert | An advertisement for a printed work or other product.
calendar | A formal calendar; a document or document section identifying itself as a calendar.
essay | A short prose non-fiction document.
examination | A section that is identified explicitly as an ‘examination’ in a text that uses this as a main structural division.
letter | Any document in epistolary form, i.e. addressed by a sender to a recipient.
prayer | A prayer.
recipe | A recipe in a cookbook.
speech | A section described as a speech or public lecture, in a document that uses this as a main structural division.
timeline | A timeline.
tract | A short treatise in pamphlet form often on a religious subject.

3.6.3. Back Matter

Attribute Value | Description
addendum | Additional material, typically omissions, added at the end of a publication.
appendix | Usually appears in back matter.
bibliography | Usually appears in back matter.
colophon | A short inscription, typically at the end of a book or manuscript, containing the title, printer, date and place of printing, etc. Usually appears in back matter.
concluding | Any section of the back matter that does not carry a more specific designation; includes afterwords, epilogues, codas, etc.
endnotes | A section containing endnotes for the document.
glossary | Usually appears in back matter.
index | An alphabetical listing of the topics in a document, usually with accompanying page references. Usually appears in back matter.
notes | Usually appears in back matter.
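
As a brief sketch, the recommended values for front matter, body, and back matter might be applied to unnumbered divisions as follows. The headings and text are invented placeholders.

```xml
<text xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <front>
    <div type="preface">
      <head>Preface</head>
      <p>Prefatory text.</p>
    </div>
  </front>
  <body>
    <div type="chapter" n="1">
      <head>Chapter 1</head>
      <p>Chapter text.</p>
    </div>
  </body>
  <back>
    <div type="index">
      <head>Index</head>
      <p>Index entries.</p>
    </div>
  </back>
</text>
```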

3.7. Page Breaks

Page breaks should be encoded using the <pb> element, with the value of the n attribute denoting the number of the page whose content follows this element. The <pb> element should always be contained within a text division for ease of retrieval with indexing software. For example, a page break that occurs between chapters 2 and 3 should be encoded right after the opening tag of the text division that opens chapter 3 rather than before the closing tag of the division that ends chapter 2 or between the two text divisions. In the case of a page break occurring at the boundary of nested textual divisions, the <pb> element should always be encoded within the lowest level division. For example, a page break that occurs between chapters 2 and 3, where chapter 3 begins with an editor's note requiring its own text division, would be encoded right after the opening tag of the text division for the editor's note.
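
The two placements described above can be sketched as follows. The page numbers, headings, and the choice of section as the type of the editor's note division are assumptions made for illustration.

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="chapter" n="2">
    <head>Chapter 2</head>
    <p>Last paragraph of chapter 2, ending at the foot of page 41.</p>
  </div>
  <!-- Simple case: the break falls between chapters, so pb is placed
       right after the opening tag of the chapter 3 division -->
  <div type="chapter" n="3">
    <pb n="42"/>
    <head>Chapter 3</head>
    <p>First paragraph of chapter 3.</p>
  </div>
  <!-- Nested case: the chapter opens with an editor's note in its own
       division, so pb moves into that lowest-level division -->
  <div type="chapter" n="4">
    <div type="section">
      <pb n="57"/>
      <head>Editor's Note</head>
      <p>Text of the editor's note.</p>
    </div>
  </div>
</body>
```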

Any divergence from this practice should be documented in a <p> in the <editorialDecl>.

3.8. Linking Between Encoded Text and Images of Source Documents

There are three recommended mechanisms for linking between the encoded text and facsimile page images of source documents. Projects may use any of the following methods:

For those projects relying on METS, note that the xml:id attribute is used as a conceptual identifier for content as opposed to an explicit pointer to a specific representation of that content. (Conversely, the facs attribute is a pointer, not an identifier.) These identifiers are then used to generate a METS document that bundles the various content types (e.g., master image files, derivative image files for Web delivery, PDFs, etc.), explicitly lists all versions of the content, and defines the relationships between the constituent parts. This is achieved through the use of the <mets:fileSec> and <mets:structMap> sections of the METS document (see sample METS document for a TEI project).
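
The difference between a pointer and an identifier can be sketched on <pb> as follows. The image file name and the identifier are hypothetical, and a project would normally use one approach or the other rather than mixing them.

```xml
<div xmlns="http://www.tei-c.org/ns/1.0" type="section">
  <!-- facs is a pointer: here a relative reference to a page image file -->
  <pb n="1" facs="hiss_0001.tif"/>
  <p>Text of page 1.</p>
  <!-- xml:id is an identifier: a separate METS document would map it to
       the master image, Web derivatives, and other versions -->
  <pb n="2" xml:id="hiss_0002"/>
  <p>Text of page 2.</p>
</div>
```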

The International Image Interoperability Framework (IIIF), with its growing user community, is likely to emerge as an alternative mechanism for linking between encoded text and images of source documents. The value of a facs attribute could be a IIIF Image API Request URI (e.g., to access a full-sized image). However, note that the TEI and IIIF communities have not yet agreed upon specific details of a mechanism for integrating data from these two communities of practice.
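
A facs value of this kind might look like the sketch below. The server and image identifier are invented; the path segments follow the IIIF Image API pattern of identifier, region, size, rotation, quality, and format. Since no convention has been agreed between the two communities, this is only one plausible form.

```xml
<!-- Hypothetical IIIF Image API request for the whole image at the
     largest available size, unrotated, in default quality, as JPEG -->
<pb xmlns="http://www.tei-c.org/ns/1.0" n="1"
    facs="https://iiif.example.org/image/hiss_0001/full/max/0/default.jpg"/>
```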

3.9. General Guidelines for Attribute Usage

These Best Practices provide recommended usage of attributes as used in the TEI header and within the body of the TEI document (within the <text> element), as evidenced by attributes used in encoding example snippets and the prose description of this document.

This section contains general advice on the use of particular attributes commonly needed for library encoding projects. (All of the attributes below are commonly used on various elements, but not every element requires or even allows these attributes.)

3.9.1. type

Constructing a list of acceptable attribute values for the type attribute for each element, on which everyone could agree, is impossible. Instead, it is recommended that projects describe the type attribute values used in their texts in the project ODD file and that this list be made available to people using the texts. It is worth noting that, at present, Roma, the web front-end editor for ODD files, does not have a mechanism for providing this documentation; instead, this information should be added to the ODD file manually. For a list of standard names and definitions of bibliographic features of printed books, see ABC for Book Collectors by John Carter (8th edition, New Castle, Del. and London: Oak Knoll Books and the British Library, 2004, available online at https://www.ilab.org/articles/john-carter-abc-book-collectors).
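
A hand-written addition to a project ODD might resemble the sketch below, which would sit inside the <schemaSpec> of the ODD. The value fieldNotes and its description are hypothetical; the choice of a closed list is also an assumption, and a project could instead use a semi-open list to suggest values without enforcing them.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="div" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <!-- type="closed" makes the generated schema reject other values -->
      <valList type="closed" mode="replace">
        <valItem ident="chapter">
          <desc>A chapter, typically in a prose document.</desc>
        </valItem>
        <valItem ident="fieldNotes">
          <desc>A hypothetical project-specific value for transcribed field notes.</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```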

3.9.2. n

This attribute is sometimes used to number elements for machine processing, but it often includes data represented in the source document, such as page numbers or footnote numbers. Example: <pb n="456"/>

3.9.3. ref and tag URIs

The ref attribute is available on a variety of elements, including <persName>, <orgName>, <author>, and <title>. Its value is a URI that identifies external metadata about the content of the element. Since Linked Data applications in libraries make many authority records accessible through URIs, this attribute can be used for disambiguation (authority control). For example:

<placeName ref="http://vocab.getty.edu/page/tgn/7012924">Indianapolis</placeName>
<author>
 <persName>Emily Dickinson</persName>
 <idno type="VIAF">http://viaf.org/viaf/31995584</idno>
 <idno type="ISNI">http://isni.org/isni/0000000121265882</idno>
 <idno type="LCCN">http://id.loc.gov/authorities/names/n79054166</idno>
 <idno type="Wikidata">https://www.wikidata.org/wiki/Q4441</idno>
</author>
Still, there are times when an outside URI for disambiguation is not available. An encoding project could use URI fragment identifiers (see above) to point to an authority record (identified with xml:id) within the same TEI document. For example, in the transcription of the text, you can use
<placeName ref="#geo_0034">Texas</placeName>
which would be defined in a controlled vocabulary elsewhere in the same document:
<listPlace>
 <place xml:id="geo_0034">
  <placeName>
   <settlement type="village">Texas</settlement>
   <region type="state">Maryland</region>
  </placeName>
 </place>
</listPlace>

XML software will then confirm when it encounters ref="#geo_0034" that xml:id="geo_0034" exists elsewhere in the document.

If there is a desire to have an authority file spanning documents, it might be tempting to use relative URIs (as discussed above), rather than absolute URIs, in the value of ref in order to create a local authority file. However, such URIs cannot be resolved in a stable way as the location of the TEI document changes. As in section 3.5.1, Referring Strings, of P5, these Best Practices recommend the use of the ref attribute with tag URIs as described in IETF RFC 4151 in order to create unique local identifiers that can also be shared across files and projects. For example, a tag URI created in 2016 for nabokovpapers.org might look like this:
<persName ref="tag:nabokovpapers.org,2016:B_V001">Blavdak Vinomori</persName>
If desired, such tag URIs could be shortened by creating a private URI scheme as described in section 16.2.3 of P5.
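
Such a private URI scheme might be declared as in the sketch below. The prefix nab and the match pattern are hypothetical choices; the <prefixDef> mechanism and its attributes come from P5.

```xml
<!-- In the header, inside encodingDesc -->
<listPrefixDef xmlns="http://www.tei-c.org/ns/1.0">
  <prefixDef ident="nab"
             matchPattern="([A-Za-z0-9_]+)"
             replacementPattern="tag:nabokovpapers.org,2016:$1">
    <p>Values of the form nab:B_V001 expand to tag URIs under the
       nabokovpapers.org tagging authority.</p>
  </prefixDef>
</listPrefixDef>

<!-- In the transcription: shorthand for tag:nabokovpapers.org,2016:B_V001 -->
<persName xmlns="http://www.tei-c.org/ns/1.0" ref="nab:B_V001">Blavdak Vinomori</persName>
```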

3.9.4. style, rendition, and rend

At Levels 3 and above, the style, rend, and rendition attributes may be used when it is desirable to record information about how a textual feature was displayed in the source document.

Never use these attributes on header elements since, in the header, metadata is transcribed and possibly regularized, as in a catalog record, but its exact appearance is not meant to be captured.

If a project is normalizing the rendering of text objects (for example, such that all titles should be italicized, regardless of how they appeared in the source document), there is no need to use these attributes; instead, a stylesheet will determine that all titles are displayed in italics.

However, if a project is faithfully recording the rendering in the source document, one of these attributes should be used to indicate this rendering, either on all elements to be rendered differently from the surrounding text or on all elements whose rendering does not follow the default stylesheet.

For the value of the style attribute, use only valid CSS properties and values as in a CSS declaration-block but without curly braces. For example:

<foreign style="font-style: italic">

<title style="text-decoration: underline; font-size: x-large">
Default descriptions of source document renditions can be established using the <rendition> element in the <teiHeader>. Specify the desired rendition (as CSS) in the content of the <rendition>, and to which elements it applies (using CSS selector syntax) in the selector attribute. For example:
<rendition selector="div[type='chap']>head"
 scheme="css">font-style: italic; font-weight: bold</rendition>
<rendition selector="div[type='sect']>head"
 scheme="css">font-style: italic</rendition>
This example indicates that headings of chapters in the source document were in bold italics, whereas headings of sections were in italics, but not boldface.
Alternatively, use the rendition attribute to give an internal scheme:
<foreign rendition="#i">
documented with the <rendition> element in the header:
<rendition xml:id="i" scheme="css">font-style: italic</rendition>
Use of the rendition attribute and element offers an additional level of indirection, decreasing the total number of keystrokes in the XML and therefore possibly reducing the chance of typos being introduced in the encoding.

If there are formatting features in the source that cannot be expressed with CSS syntax, the rend attribute may be used instead of style or rendition to supply a rendering using a locally defined style language.

3.9.5. xml:lang

This attribute is used to indicate the natural language of the content of an element. It is generally not found on or within the <sourceDoc> or <text> elements at Level 1 or Level 2, but is common at Level 3 and above. See the documentation of the teidata.language datatype in P5 for information on the values of xml:lang.
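
At Level 3 and above, a passage in another language might be marked as in the sketch below. The sentence is invented; en and fr are standard language tags of the kind the teidata.language datatype accepts.

```xml
<text xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <body>
    <div type="chapter">
      <!-- The div inherits xml:lang="en" from text; only the French
           phrase needs its own value -->
      <p>She signed the letter <foreign xml:lang="fr">bien à vous</foreign>
        and sealed it.</p>
    </div>
  </body>
</text>
```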

4. Structure of a TEI Document

Element Description
<TEI xml:id="___" xmlns="http://www.tei-c.org/ns/1.0"> The root element of a TEI document. Use of the xml:id attribute is recommended, giving the same unique identifier for the TEI document as in teiHeader/fileDesc/publicationStmt/idno.
<teiHeader xml:lang="___"> [required] Contains metadata about the TEI document. The xml:lang is recommended; it indicates the language used for the metadata describing the document.
<sourceDoc> [recommended at Level 1, optional at Levels 2–5] Contains a direct, generally machine-generated, transcription of document pages without further logical or structural information.
<facsimile> [optional] Defines sets of images that correspond with the text. Should only be used if page images are included and if this particular mechanism for linking page images is chosen. See section 3.8, Linking Between Encoded Text and Images of Source Documents.
<text xml:lang="___"> [recommended at Levels 2–5] Contains the encoded transcription of the source document. The xml:lang attribute is recommended; it indicates the primary language of the source document.

The child elements of these top-level elements are described below.
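
Put together, the top-level structure might look like the following sketch of a document at Level 2 or above. The identifier hiss001 and all header content are placeholders, and the header shown is the bare minimum P5 requires rather than what these Best Practices recommend.

```xml
<TEI xml:id="hiss001" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="en">
    <fileDesc>
      <titleStmt>
        <title>Title of the electronic text (placeholder)</title>
      </titleStmt>
      <publicationStmt>
        <publisher>Name of the publishing library (placeholder)</publisher>
        <!-- Same identifier as the xml:id on the root element -->
        <idno>hiss001</idno>
      </publicationStmt>
      <sourceDesc>
        <p>Description of the source document (placeholder).</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text xml:lang="en">
    <body>
      <div type="section">
        <p>Transcription of the source document.</p>
      </div>
    </body>
  </text>
</TEI>
```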

4.1. The TEI Header

4.1.2. Introduction

The TEI header is a metadata record for an encoded text. It includes bibliographic information related to the electronic document and, if appropriate, the bibliographic data for the original analog source document from which the electronic edition was created. The TEI header often includes a description of the encoding decisions or practices used to create the electronic document. While TEI Lite calls the header ‘the electronic title page’, it actually more closely resembles a catalog record with additional data not routinely stored in MARC records.

As with any descriptive metadata, the metadata in the TEI header can serve multiple audiences. In the local context, a TEI header provides metadata about the TEI document, its source, and its provenance. The TEI header may be used to enable metadata exchange, to automatically create indexes (author lists, title lists) for a collection of TEI documents, and to aid in browsing heterogeneous TEI documents. TEI headers may also be used as a basis for other metadata records (such as MARC or Dublin Core), though generation of other formats may require human intervention because they often are more granular, or have different granularity, than TEI headers.

4.1.3. The TEI Header and MARC

While a TEI header is often perceived as similar to or at least related to a MARC bibliographic record, a TEI header does not typically have a one-to-one correspondence with a MARC record. For example, one MARC bibliographic record may be used to describe a collection of TEI documents with individual headers. Furthermore, while a MARC bibliographic record captures metadata about a bibliographic entity in a library’s collection, a TEI header records information both about an encoded text and about the source document for that encoded text.

Each institution and even each project may have a different approach to the way electronic texts are created in TEI and then represented in a larger public catalog through MARC bibliographic records. At one institution, the same unit (e.g., a cataloging department) may be responsible for creating both TEI headers and MARC bibliographic records, while at other institutions the work may be distributed among different units. Within the library domain, metadata or cataloging experts are usually required for at least review and standardization of both the TEI header and the MARC bibliographic record.

To allow automatic generation of TEI headers from MARC bibliographic records, and of MARC bibliographic records from TEI headers, these Best Practices recommend that some elements contain content that is not typical for TEI practice but is necessary because of a lack of granularity in the MARC format.

4.1.4. The TEI Header and METS

The Metadata Encoding and Transmission Standard (METS) includes direct support for TEI Headers; that is, a complete <teiHeader> element can be included in a METS metadata record alongside MARC, MODS, and DC metadata. METS specifies an MDTYPE of TEIHDR to indicate a TEI Header. For an example of how this might be done, see this sample METS record for The “Sure to Rise” Cookery Book. We are unaware of any METS implementations which provide significant support for TEI headers out-of-the-box.
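
For readers who do not have the sample record to hand, the fragment below sketches one way a header might be wrapped in METS. It assumes the standard METS namespace and the usual <mdWrap>/<xmlData> wrapping; the ID and LABEL values and all header content are placeholders, and the header is abbreviated (prose <p> in <sourceDesc>) rather than following the element recommendations in section 4.1.8.

```xml
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
 <mets:dmdSec ID="DMD_TEI_1">
  <!-- MDTYPE="TEIHDR" signals that the wrapped metadata is a TEI header -->
  <mets:mdWrap MDTYPE="TEIHDR" MIMETYPE="text/xml" LABEL="TEI header">
   <mets:xmlData>
    <teiHeader xmlns="http://www.tei-c.org/ns/1.0">
     <fileDesc>
      <titleStmt>
       <title>Placeholder title</title>
      </titleStmt>
      <publicationStmt>
       <publisher>Placeholder publisher</publisher>
      </publicationStmt>
      <sourceDesc>
       <p>Placeholder source description</p>
      </sourceDesc>
     </fileDesc>
    </teiHeader>
   </mets:xmlData>
  </mets:mdWrap>
 </mets:dmdSec>
 <!-- METS requires a structural map; it is left empty in this sketch -->
 <mets:structMap>
  <mets:div/>
 </mets:structMap>
</mets:mets>
```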

4.1.5. The TEI Header and FRBR

Functional Requirements for Bibliographic Records (FRBR; https://www.ifla.org/publications/functional-requirements-for-bibliographic-records) is a conceptual model developed by the International Federation of Library Associations (IFLA; https://www.ifla.org/) ‘for relating the data that are stored in bibliographic records to the needs of the users of those records’ (Functional Requirements for Bibliographic Records: Final Report, February 2009, p. 7, https://www.ifla.org/files/assets/cataloguing/frbr/frbr_2008.pdf). It is best known for its model of four primary bibliographic entities: work, expression, manifestation, and item. The FRBR model also includes primary relationships between certain of these entities: a work may be realized through one or more expressions, an expression may be embodied in one or more manifestations, and a manifestation may be exemplified by one or more items. Since the publication of FRBR in 1998, the library metadata community has collaborated to develop more precise entity-relationship models relating to bibliographic metadata, notably FRBRoo (http://www.cidoc-crm.org/frbroo/) and BIBFRAME (https://www.loc.gov/bibframe/), each with a configuration of bibliographic entities that differs from the four primary entities in FRBR.

It is important to note that these Best Practices, like P5 as a whole, do not distinguish between work, expression, manifestation, and item according to their definitions in FRBR.

Due to sustained interest in the original FRBR model from within the TEI community, these Best Practices describe how an encoder might identify bibliographic entities relating to a TEI document and express relationships between them. For example, a TEI document might contain the text of a particular edition of a rare book (a FRBR manifestation), and the header could reference all extant copies (FRBR items) of this book. These relationships and references to bibliographic entities are encoded using the <listRelation> element, an optional child of <sourceDesc>.

The following example expresses all three FRBR relationships between instances of the four primary FRBR entities:
<listRelation>
 <relation ref="http://purl.org/vocab/frbr/core#realization"
  active="http://d-nb.info/gnd/4128140-8"
  passive="http://d-nb.info/gnd/4217850-2"/>
 <relation ref="http://purl.org/vocab/frbr/core#embodiment"
  active="http://d-nb.info/gnd/4217850-2"
  passive="http://d-nb.info/1022142836"/>
 <relation ref="http://purl.org/vocab/frbr/core#exemplar"
  active="http://d-nb.info/1022142836"
  passive="http://lobid.org/items/HT017412936:DE-361:NA088.88-4442721#!"/>
</listRelation>

Note that a TEI document itself, like any other XML document, occupies an ambiguous territory between a FRBR expression and a FRBR manifestation. (Note: Renear, Allen, Christopher Phillippe, Pat Lawton, and David Dubin. 2003. An XML document corresponds to which FRBR Group 1 entity? Proceedings of Extreme Markup Languages 2003. http://conferences.idealliance.org/extreme/html/2003/Lawton01/EML2003Lawton01.html.)

4.1.6. The TEI Header and Other Metadata Schemas

Several other descriptive metadata schemas are prevalent within the library domain, including Dublin Core (DC), Dublin Core Qualified (DCQ), and the Metadata Object Description Schema (MODS). Each of these schemas contains elements that capture the same data as many of the elements in the TEI header. As with MARC, a variety of automated or manual workflows can be implemented to crosswalk metadata from one standard to another and provide for increased sharing of metadata about electronic texts in larger contexts. In particular, DC and MODS are common schemas used by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and may be particularly valuable for sharing metadata across institutions.
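
An automated crosswalk of the kind described above could be sketched in XSLT 1.0 as follows. This is an illustrative sketch, not a mapping endorsed by these Best Practices: the choice of four Dublin Core elements, and the decision to skip titles typed as filing (which section 4.1.8 says are not for display), are assumptions made for the example.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  exclude-result-prefixes="tei">

 <xsl:output method="xml" indent="yes"/>

 <!-- Build one oai_dc record from the fileDesc of a TEI header -->
 <xsl:template match="/">
  <oai_dc:dc>
   <xsl:apply-templates
     select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[not(@type='filing')]"/>
   <xsl:apply-templates
     select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author"/>
   <xsl:apply-templates
     select="//tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:publisher"/>
   <xsl:apply-templates
     select="//tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:date/@when"/>
  </oai_dc:dc>
 </xsl:template>

 <!-- Each title of the TEI document becomes a dc:title -->
 <xsl:template match="tei:titleStmt/tei:title">
  <dc:title><xsl:value-of select="normalize-space(.)"/></dc:title>
 </xsl:template>

 <!-- Each author (one name per element) becomes a dc:creator -->
 <xsl:template match="tei:titleStmt/tei:author">
  <dc:creator><xsl:value-of select="normalize-space(.)"/></dc:creator>
 </xsl:template>

 <xsl:template match="tei:publicationStmt/tei:publisher">
  <dc:publisher><xsl:value-of select="normalize-space(.)"/></dc:publisher>
 </xsl:template>

 <!-- The machine-readable when attribute is copied as the date -->
 <xsl:template match="tei:publicationStmt/tei:date/@when">
  <dc:date><xsl:value-of select="."/></dc:date>
 </xsl:template>

</xsl:stylesheet>
```

Because the mapping reads only <titleStmt> and <publicationStmt>, the resulting record describes the TEI document, not its source; a record for the source document would draw on <sourceDesc> instead.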

Unfortunately, there is currently no mechanism for specifying that the content of an element should be drawn from an outside metadata source or that this outside metadata source should supplement the content of the element. In the absence of such mechanisms, users of these Best Practices may use the <idno> element to supply identifiers for outside metadata records and may supply identifiers for certain authority records using the ref attribute, allowed on certain elements.
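
The two fragments below sketch these workarounds. Both URIs and the type value catalog_record are hypothetical, chosen only to show where such identifiers would go; a project would substitute its own catalog and authority file.

```xml
<!-- In <publicationStmt>: identifier of an outside metadata record (hypothetical URI and type) -->
<idno type="catalog_record">https://catalog.example.org/record/12345</idno>

<!-- In <titleStmt>: authority record referenced with the ref attribute (hypothetical URI) -->
<author>
 <persName ref="https://authority.example.org/names/0001">Welles, Gideon, 1802-1878</persName>
</author>
```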

The header's <xenoData> element may be used to embed non-TEI metadata in a TEI document.
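
As an illustration, a simple Dublin Core record might be embedded as sketched below. The use of the oai_dc container is an assumption made for this example (any non-TEI XML vocabulary could be used), and the values are taken from the sample header in section 4.1.9. In P5, <xenoData> is a child of <teiHeader> that follows <fileDesc>.

```xml
<xenoData xmlns="http://www.tei-c.org/ns/1.0">
 <!-- non-TEI metadata travels in its own namespaces -->
 <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Lincoln and Seward</dc:title>
  <dc:creator>Welles, Gideon, 1802-1878</dc:creator>
  <dc:date>1874</dc:date>
  <dc:language>en</dc:language>
 </oai_dc:dc>
</xenoData>
```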

4.1.7. Determining Data Values for the TEI Header

Within the library domain, there are several authoritative publications on how to create bibliographic and descriptive metadata for objects. These are usually called “descriptive cataloging standards”; two prominent examples are the International Standard Bibliographic Description (ISBD) and Resource Description and Access (RDA), an implementation of ISBD that has replaced the Anglo-American Cataloguing Rules, Second Edition (AACR2) and other national cataloging codes. These standards are extensive and outline a set of rules that enforce consistency across a voluminous amount of metadata.

It is recommended that metadata about the source document included in the header be taken from the catalog record for the source document. However, there may be cases when this information is incomplete or insufficient. The following sources of information are recommended in creating the TEI header:

  1. For an electronic document with a digitized title page and title page verso, the chief source of information is the information coded as the title page and title page verso.
  2. If there is no digitized title page but the header creator knows the physical source document from which it was derived, the header creator should refer to that source document for metadata creation.
  3. If no title page is present and there is no evidence from a source document, the header creator may assign a title and author, if appropriate, enclosing the information in square brackets.
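
For the third case, a supplied title and author might be recorded as in the sketch below. The title and name are invented for illustration; only the use of square brackets reflects the recommendation above.

```xml
<titleStmt>
 <!-- title and author supplied by the header creator, hence the square brackets -->
 <title type="main">[Letter concerning a land survey]</title>
 <author>
  <persName>[Smith, Jane]</persName>
 </author>
</titleStmt>
```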

4.1.8. Element and Attribute Recommendations for the TEI Header

Below is documentation on use of elements and attributes within the <teiHeader> element. These recommendations apply to all levels of encoding. The mapping to MARC for cataloging the TEI document assumes that the header is created based on the encoded text, not the source document. Note that the terms “work” and “item” are not used in the narrow way that they are defined in RDA.

Element Description Equivalent in MARC when cataloging the TEI document according to RDA April 2017 update Equivalent in MARC for the source document (whether cataloged according to RDA or AACR2)
<teiHeader xml:lang="___"> [required] The <teiHeader> contains metadata about the TEI document. The xml:lang attribute is recommended; it indicates the language used for the metadata describing the document. 040 $b n/a
<fileDesc> [required] The <fileDesc> contains bibliographic metadata about the TEI document. One of its child elements, <sourceDesc>, describes the source document from which the TEI document was created. n/a n/a
<titleStmt> [required] n/a n/a
<title type="___">

[required] One or more <title> elements are required to give the title of the TEI document being created. It is suggested that titles be constructed based on the source document according to the descriptive cataloging standard used.

Use of the level attribute is not recommended since it does not apply to a TEI document in a collection.

Use of the type attribute is recommended. It should have one of the following values as suitable in local practice:

  • main
  • sub
  • alt
  • short
  • desc
  • translated
  • marc245a (used for the title proper and alternative title according to the descriptive cataloging standard used)
  • filing (used for a version of the title with initial articles removed, to be used for sorting titles alphabetically but not for display)
  • marc245b (used for the remainder of the title information — parallel titles, titles subsequent to the first, and other title information — according to the descriptive cataloging standard used)
  • preferred (used for a preferred title according to the descriptive cataloging standard used)
  • 130
  • 210
  • 240
  • 242
  • 245 $a,$b
  • 246
  • 247
  • 130
  • 210
  • 240
  • 242
  • 245 $a,$b
  • 246
  • 247
<author> [recommended] One or more <author> elements (one name per element) are used to encode the names of entities primarily responsible for the content of the TEI document—usually, the author(s) of the source document. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the authorized form of the name from a name authority file. Examples:
  • <author><persName>Shakespeare, William, 1564-1616</persName></author>
  • <author><orgName>National Organization for Women</orgName></author>
  • <author><orgName>Agreda (Family: Logroño, Spain)</orgName></author>
  • <author><persName>X, Malcolm</persName></author>
  • <author><persName>Thomas (Anglo-Norman poet)</persName></author>
  • <author><persName>Catherine II, Empress of Russia</persName></author>
  • <author><persName>Joannes, Actuarius, 13th/14th cent.</persName></author>
Optionally use the ref attribute on <persName> and <orgName> to supply a URI for a record in an authority file (see section 4.2.5.6.2).
  • 100
  • 110
  • 111
  • 700
  • 710
  • 711
For all, use the relationship designator “author”.
  • 100
  • 110
  • 111
  • 700
  • 710
  • 711
<editor>

[recommended] If applicable, use one or more <editor> elements (one name per element) to encode the names of entities besides those in <author> elements that acted as editors of the TEI document—usually, the editor(s) of the source document. If considered appropriate by the encoding project, the editor of the TEI document should be entered here. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the authorized form of the name from a name authority file.

Unlike in P5, do not use this element for translators, illustrators, compilers, or others whose role is not generally considered editorial. Therefore, do not use the role attribute.

  • 700
  • 710
  • 711
For all, use the relationship designator “editor”.
  • 700
  • 710
  • 711
<respStmt> [recommended] Record the names of other persons or organizations, one responsibility or party per <respStmt>, that have responsibility for the intellectual or artistic content of the TEI document—often by transitivity from the source document—not covered by <author> and <editor>. This includes translators, illustrators, compilers, proofreaders, encoders, and those who wrote a preface or introduction. Each <respStmt> should contain either:
  • one <resp> followed by one or more of <persName> or <orgName>
  • one or more of <persName> or <orgName> followed by one <resp>
Whenever possible, establish or use the authorized form of the name from a name authority file.
  • 700
  • 710
  • 700
  • 710
<editionStmt> <p> [recommended] Include a statement about the edition of the TEI document produced, not the source document. 250 n/a
<publicationStmt> [required] Use the child elements below (rather than <p> for a prose description). n/a n/a
<publisher> [recommended] The publisher is the party responsible for making the file (the TEI document, not the source document) public. 264_1 $b n/a
<distributor> [recommended] The distributor is the party from whom copies of the file (the TEI document, not the source document) can be obtained. Often the same as <publisher>, in which case no <distributor> should be given. 264_2 $b n/a
<idno> [recommended] Any unique identifier for the TEI document as determined by the publisher of the TEI document. Optionally use a type attribute to indicate the type of identifier. 028 5_ n/a
<availability> [recommended] Provide a prose rights statement for the TEI document. If possible, provide a standard license, such as one from Creative Commons, using the <licence> element. Otherwise, provide information on all applicable rights (rights in the original work, rights in page images of the source document, and rights in the encoded text) in a <p> element. 540 n/a
<date when="___"/> [recommended] Refers to the date of the first publication of this edition of the TEI document. The when attribute (see att.datable.w3c class) is used instead of free content to aid machine processing. (In this context, the <date> element should have no content.) 264_1 $c n/a
<seriesStmt> [optional] This element contains information about the electronic series being created. n/a n/a
<title level="s" type="___"> [recommended] This element contains the title of the series. Whenever possible, establish or use the authorized form of the title from a name authority file. Use of the type attribute is optional, but if it is used, it should follow the instructions for use of this element in P5.
  • 490
  • 8xx (optional)
n/a
<notesStmt> <note> [optional] Use one <note> element for each note about the TEI document that does not have an appropriate location elsewhere in the header. 5xx 5xx
<sourceDesc> [required] Use one <sourceDesc> per source document. Metadata for the source document may be automatically generated from a MARC record. n/a n/a
<biblStruct> [recommended] Use <biblStruct> with child elements arranged in the order below for ease of display according to ISBD. (This element is used instead of <bibl> to enforce structure, but <biblFull> is not used because it requires more elements than are typically available in library metadata sources.) n/a n/a
<analytic> [recommended] Use this element and its children only if the object of encoding is part of a larger work: for example, an article in a journal issue, a chapter in a book, or a poem in a collection. Below, the object of encoding that is part of a larger work is referred to as the analytic item. n/a n/a
<author> [recommended] One or more <author> elements (one name per element) are used to encode the name for the personal author or corporate body responsible for the creation of the intellectual or artistic content of the analytic item. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the authorized form of the name from a name authority file. n/a n/a
<title level="a" type="___">

[recommended] At least one <title> element is required for the title of the analytic item. Transcribe the title according to the descriptive cataloging standard used.

Use of the type attribute is recommended. It should have one of the following values as suitable in local practice:

  • main
  • sub
  • alt
  • short
  • desc
  • translated
  • filing (used for a version of the title with initial articles removed, to be used for sorting titles alphabetically but not for display)
n/a n/a
<ptr target="___"> [optional] In the value of target, provide a URI for the analytic item. For example, this could be a link to a facsimile of the analytic item that is different from the TEI document. n/a n/a
<monogr> [required] Use this element to group together the elements describing the source document bibliographic item that is published as an independent item. The TEI definition of this element specifies that it is used even for works that might not otherwise be considered “monographs,” so, for example, bibliographic data about a journal title is included in this element. n/a n/a
<author> [recommended] One or more <author> elements (one name per element) are used to encode the name for the personal author or corporate body responsible for the creation of the intellectual or artistic content of the source document bibliographic item, even if this creator is not the main entry in the catalog record. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the authorized form of the name from a name authority file. 534 $a = 1st author
  • 100
  • 110
  • 700
  • 710
<title level="___" type="___">

[recommended] At least one <title> element is recommended for the title of the source document bibliographic item. Transcribe the title according to the descriptive cataloging standard used.

Use of the level attribute is optional. If used, it should be used as in P5.

Use of the type attribute is recommended. It should have one of the following values as suitable in local practice:

  • marc245a (used for the title proper and alternative title according to the descriptive cataloging standard used)
  • filing (used for a version of the title with initial articles removed, to be used for sorting titles alphabetically but not for display)
  • marc245b (used for the remainder of the title information — parallel titles, titles subsequent to the first, and other title information — according to the descriptive cataloging standard used)
  • marc245c (used for the statement of responsibility according to the descriptive cataloging standard used)
  • preferred (used for a preferred title according to the descriptive cataloging standard used)
534 $t
  • 240
  • 245 $a,$b
  • 246
<respStmt>

[recommended] Statement of responsibility on the source document bibliographic item, according to the descriptive cataloging standard used. Record one responsibility or party per <respStmt>. Each <respStmt> should contain either:

  • one <resp> followed by one or more of <persName> or <orgName>
  • one or more of <persName> or <orgName> followed by one <resp>

Whenever possible, establish or use the authorized form of the name from a name authority file.

If generating the <sourceDesc> from a MARC record, it will be difficult to split the content of the 245c field into <resp> and <persName> (or <orgName>) elements, so it is recommended to use <title type="marc245c"> instead of this element.

500 n/a
<edition> [recommended] Edition statement (if present) according to the descriptive cataloging standard used. 534 $b 250
<imprint> [required] n/a n/a
<pubPlace> [recommended] Place of publication from the source document bibliographic item according to the descriptive cataloging standard used. Optionally remove ISBD punctuation for separating areas of the bibliographic description (such as a colon) when deriving from a MARC record. However, use brackets to indicate supplied information according to the descriptive cataloging standard used, e.g. “[Toronto?]” or “[Place of publication not identified]”. 534 $c
  • 260 $a
  • 264_1 $a
<publisher> [recommended] Name of publisher, distributor, etc. from the source document bibliographic item according to the descriptive cataloging standard used. Optionally remove ISBD punctuation for separating areas of the bibliographic description (such as a comma) when deriving from a MARC record. However, leave brackets that indicate supplied information according to the descriptive cataloging standard used, e.g. “[publisher not identified]”. 534 $c
  • 260 $b
  • 264_1 $b
<date when="___">
or
<date notBefore="___" notAfter="___">
or
<date from="___">
or
<date to="___">
or
<date from="___" to="___">

[recommended] Date of publication, distribution, etc. from the source document bibliographic item. The content of the element is the statement of this data according to the descriptive cataloging standard used.

Since the content of the element according to the descriptive cataloging standard used is not easily processed by machine, when possible include the following attribute(s) with valid values: either when, or both notBefore and notAfter, or one or both of from and to.

Descriptive cataloging standards may record a possible range of dates for publication (such as “[between 1860 and 1868?]”). In the case of uncertainty, use cert="low".

If the date is unknown (recorded according to the descriptive cataloging standard used as, for example, “[date of publication not identified]” or “[date of distribution not identified]”), use cert="unknown".

  • 534 $c (content of element)
  • Dates fixed fields (value of attribute(s))
  • 260 $c
  • 264_1 $c
  • 264_2 $c
<extent> [recommended] Describes the extent of the source document bibliographic item. If the data is generated by hand, it should include a comprehensible statement of the size of the item, such as the number of pages or leaves. If generated from a catalog record, there should be two <extent> elements: one for the extent of the item (e.g., number of pages) and other physical details, and a second one for the dimension(s). Both should be recorded according to the descriptive cataloging standard used. 534 $e 300
<note> [optional] Use for notes about the source document bibliographic item, given according to the descriptive cataloging standard used. 534 $n 5xx
<idno> [optional] Use one or more <idno> elements to give identifiers for the source document, text, or work of the bibliographic item, whether assigned by the holding library (such as a call number), the publisher of the original document (such as an ISBN), or a standard bibliography (such as an identifier from the Short Title Catalogue or Books in Maori). Use the following values for the type attribute if applicable, and create other values if appropriate:
  • LCCN (for a Library of Congress control number)
  • LC_call_number
  • Those listed in P5
534 $z for ISBN
  • 010
  • 015
  • 016
  • 020
  • 024
  • 025
  • 027
  • 028
  • 035
  • 050-099
etc.
<ptr target="___"> [optional] In the value of target, provide a URI for the source document. For example, this could be a link to a facsimile of the source document. 856 $u when 2nd indicator = 2 and $3 = “Source” n/a
<series> [recommended] If applicable, give information about the series to which the source document bibliographic item belongs, given according to the descriptive cataloging standard used.
<title level="s"> [recommended] Contains the title of the series. Whenever possible, establish or use the authorized form of the title from a name authority file. Use of the type attribute is optional, but if it is used, it should follow the instructions for use of this element in P5. 534 $f
  • 4xx
  • 8xx
<biblScope unit="volume"> [recommended] If applicable, use for volume numbering of the source document within the series. 534 $f
  • 4xx
  • 8xx
<idno type="ISSN"> [recommended] If applicable, use for an ISSN for the series.
<ptr target="___"> [optional] In the value of target, provide a URI for a catalog record for the source document.
<relatedItem> <biblStruct> [recommended] If applicable, describe a work related to the source document, following the guidelines for the components of the <biblStruct> that is a child of <sourceDesc>.
  • 700 $t
  • 710 $t
  • 711 $t
  • 730
  • 740
  • 700 $t
  • 710 $t
  • 711 $t
  • 730
  • 740
<listRelation> [optional] Use <listRelation> with <relation> child elements to express any FRBR entities and relationships relating to the TEI document (see section 4.1.5, The TEI Header and FRBR, above). n/a n/a
<encodingDesc> [recommended] n/a n/a
<projectDesc> <p> [optional] Include a description of the purpose for which the electronic file was encoded. 500 n/a
<schemaRef url="___">

[recommended] Documents the encoding level according to these Best Practices. Use the url attribute for the value that indicates the encoding level:

  • Level 1: bptl:L1-v4.0.0
  • Level 2: bptl:L2-v4.0.0
  • Level 3: bptl:L3-v4.0.0
  • Level 4: bptl:L4-v4.0.0
  • Level 5: bptl:L5-v4.0.0
  • 856 $z, which should include boilerplate text describing how the TEI document is presented to the user (as page images, text, or both)
n/a
<editorialDecl>

[optional]

n/a n/a
<correction status="___" method="___"> [optional] A <correction> element may be used to describe what corrections, if any, have been made to the source text, and how they were made. n/a n/a
<hyphenation eol="___"> [optional] A <hyphenation> element may be used to describe how both soft and hard hyphens have been recorded. n/a n/a
<normalization method="___"> [optional] One or more <normalization> elements may be provided to explicitly declare the extent of normalization or regularization. n/a n/a
<punctuation marks="___" placement="___"> [optional] A <punctuation> element may be provided to explicitly declare whether punctuation marks have been retained or replaced by markup and, if retained, where they have been placed with respect to intra-paragraph elements. n/a n/a
<quotation> [optional] A specialization of the <punctuation> element, the <quotation> element may be provided to explicitly declare whether quotation marks have been retained as content or not. Optionally use the marks attribute. n/a n/a
<p> [optional] Include up to one <p> element for each of the following in the order given, addressing in prose:
  1. notes about omissions of material found in the original work
  2. the format of the data in the header: Does the data in the <sourceDesc> follow RDA rules? How about in the <fileDesc>? Is ISBD punctuation included?
  3. automated processes used to generate the markup or content
  4. external files or databases (such as those containing authority data) referenced in the TEI document
  5. whether line breaks, column breaks, page breaks, or a combination are encoded
  6. any other editorial decisions made during encoding
  7. an explanation of any tag URI schemes used in the TEI document
  • 008/18
  • 040 $e
  • 500
n/a
<tagsDecl> [recommended] n/a n/a
<rendition selector="___" scheme="css"> [recommended] Include one or more <rendition> elements to indicate the default renditions of various elements in the source document. Use the selector attribute to indicate to which elements this rendition should be applied. Alternatively, use an xml:id attribute and point to it from a rendition attribute. n/a n/a
<namespace name="http://www.tei-c.org/ns/1.0"> <tagUsage> [recommended] This element should have one and only one child <tagUsage> element for describing the encoding of logical divisions, which should be one of the following:
  • <tagUsage gi="div1">Numbered divs used.</tagUsage>
  • <tagUsage gi="div">Unnumbered divs used.</tagUsage>
  • <tagUsage gi="sourceDoc">Logical divisions not encoded.</tagUsage>
Note that the intent of the <tagUsage> is to explicitly express whether logical divisions and subdivisions are encoded using numbered or unnumbered <div> elements. Thus the third option (with sourceDoc as the value of gi) is only intended for use at level 1. At levels 2 and above <sourceDoc> is optional, and the <tagUsage> should describe the divisions used within the required <text>.
n/a n/a
<classDecl> <taxonomy xml:id="___"> <bibl> [optional] Use to document classification schemes and controlled vocabularies referenced by a scheme attribute elsewhere in the header or body of the TEI document. For example:
  • <taxonomy xml:id="LCC"><bibl>Library of Congress Classification</bibl></taxonomy>
  • <taxonomy xml:id="LCSH"><bibl>Library of Congress Subject Headings</bibl></taxonomy>
  • <taxonomy xml:id="AAT"><bibl>Art &amp; Architecture Thesaurus</bibl></taxonomy>
The xml:id attribute is required in order to provide an identifier to which scheme attributes elsewhere in the header refer.
<samplingDecl> <p> [optional] Used to record a prose description of the rationale and methods used in selecting texts, or parts of text, for inclusion. 500 n/a
<appInfo> <application> [optional] Used by external processes to record information about themselves and the changes they have made to the file.
<listPrefixDef> <prefixDef ident="bptl" matchPattern="L([1-5])-v(\d+\.\d+\.\d+[aαβb]?)" replacementPattern="http://www.tei-c.org/SIG/Libraries/teiinlibraries/$2/"> [recommended] Defines the “bptl:” prefix used on the value of the url attribute of the <schemaRef> element. Additional <prefixDef> elements may be specified if other private URI schemes are to be used.
<profileDesc> [optional] n/a n/a
<langUsage> [optional] Use this element and child <language> elements to list languages used in the text. This supplements the xml:lang attribute on the <text> (which is outside the header) in cases where more than one language is used in the text. It is not expected that the <langUsage> element will contain any description of language usage. n/a n/a
<language ident="___"> [optional] Use one or more <language> elements to indicate language(s) used in the source document. Use of the ident attribute is required as in P5. Since the value of this attribute is usually sufficient to indicate the language, the <language> element should normally have no content. In the unusual case where ident is insufficient, provide additional information about the language as content of the element.
  • 008/35-37
  • 041
  • 546
  • 008/35-37
  • 041
  • 546
<textClass> [optional] n/a n/a
<classCode scheme="___"> [optional] True classification numbers as opposed to call numbers may be entered here. The value of the scheme attribute corresponds to a classification scheme defined previously in <classDecl>.
Example: scheme="#LCC"
050-099 050-099
<keywords scheme="___"> [optional] Repeat this element as many times as there are keyword schemes. The value of the scheme attribute is a URI for a controlled or uncontrolled vocabulary. The URI may be an absolute URI for an online version of the vocabulary or a reference to a taxonomy defined previously in <classDecl>.
Example: scheme="#LCSH"
6xx 2nd indicator or 6xx $2 when 2nd indicator = 7 6xx 2nd indicator or 6xx $2 when 2nd indicator = 7
<term> [optional] Use for terms from controlled or uncontrolled vocabularies as defined according to the containing <keywords> element. 6xx 6xx
<xenoData> [optional] Use as a wrapper for embedded non-TEI metadata. n/a n/a
<revisionDesc> <change when="YYYY-MM-DD" who="[URI]">

[optional] Create a <change> element to record each significant change to the TEI document, in reverse chronological order (i.e., most recent first). A prose description of the change is recorded as the content of each <change> element. This prose may contain lists for organization, and phrase-level markup (like <gi>, <ptr>, or <date>), but not paragraphs.

The date of the change should be recorded using the when attribute (see att.datable.w3c class).

The person who is responsible for making the change should be indicated by the who attribute of <change>. Its value is a URI that points to a <respStmt> or <person> that encodes information about the responsible party. Note that this reference is a URI, not an IDREF, and thus is typically not checked by validation software. Small projects sometimes take advantage of this by putting information into the URI itself, and not having a <respStmt> or <person> element. For example, the document might simply give who="#Jane_Smith", relying on human readers to understand this reference.

n/a n/a
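
As a minimal sketch of the <revisionDesc> recommendations above, the fragment below records two changes, most recent first, with who pointing to a <respStmt> elsewhere in the header. The identifier JS, the name, the dates, and the change descriptions are all hypothetical, and placing the <respStmt> in <titleStmt> is an assumption made for illustration.

```xml
<!-- Hypothetical fragment: identifiers, names, and dates are illustrative only -->
<teiHeader xml:lang="en">
  <fileDesc>
    <titleStmt>
      <title>[ title of the electronic text ]</title>
      <respStmt xml:id="JS">
        <resp>Encoder</resp>
        <name>Jane Smith</name>
      </respStmt>
    </titleStmt>
    <!-- rest of fileDesc goes here -->
  </fileDesc>
  <revisionDesc>
    <!-- most recent change first -->
    <change when="2018-09-12" who="#JS">Corrected the <gi>date</gi> in the
      <gi>publicationStmt</gi>.</change>
    <change when="2018-06-01" who="#JS">Header generated from export of MARC record.</change>
  </revisionDesc>
</teiHeader>
```

Because who is a URI rather than an IDREF, a validator would typically not complain if the <respStmt> were missing, so the pointer has to be checked by other means.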

4.1.9. Sample TEI Header

<teiHeader xml:lang="en">
  <fileDesc>
    <titleStmt>
      <title type="main">Lincoln and Seward.</title>
      <author>
        <persName>Welles, Gideon, 1802-1878.</persName>
      </author>
    </titleStmt>
    <publicationStmt>
      <publisher>University of Michigan, Digital Library Initiatives</publisher>
      <availability>
        <p>These pages may be freely searched and displayed. Permission must
          be received for subsequent distribution in print or electronically.
          Please go to http://www.umdl.umich.edu/ for more information.</p>
      </availability>
      <date when="1996"/>
    </publicationStmt>
    <seriesStmt>
      <title level="s" type="main">Making of America</title>
    </seriesStmt>
    <sourceDesc>
      <biblStruct>
        <monogr>
          <author>
            <persName>Welles, Gideon, 1802-1878.</persName>
          </author>
          <title level="m" type="marc245a">Lincoln and Seward.</title>
          <title level="m" type="marc245b">Remarks upon the memorial address of Chas. Francis
            Adams, on the late William H. Seward, with incidents and comments illustrative of the
            measures and policy of the administration of Abraham Lincoln. And views as to the
            relative positions of the late President and secretary of state.</title>
          <title type="marc245c">By Gideon Welles</title>
          <imprint>
            <pubPlace>New York</pubPlace>
            <publisher>Sheldon &amp; company</publisher>
            <date when="1874">1874</date>
          </imprint>
          <extent>viii, [7]-215 p</extent>
          <extent>20 cm.</extent>
          <note>First published in condensed form in the Galaxy, v. 16, 1873, p. [518]-530,
            [687]-700, [793]-804.</note>
          <idno type="ISBN">1-4255-1817-6</idno>
          <idno type="LC_call_number">E456 .W44</idno>
        </monogr>
      </biblStruct>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <projectDesc>
      <p>XML created for the Making of America collection.</p>
    </projectDesc>
    <schemaRef url="bptl:L1-v4.0.0"/>
    <editorialDecl>
      <p>Data in the <gi>sourceDesc</gi> of the header comes from a pre-AACR2 record. Other data
        follows AACR2 when applicable.</p>
      <p>The <gi>sourceDesc</gi> was created by exporting from the catalog on 2008-06-15.</p>
      <p>This electronic text file was created by optical character recognition (OCR). No
        corrections have been made to the OCR-ed text and no editing has been done to the content
        of the original document. Encoding has been done using the recommendations for Level 1 of
        the <title>Best Practices for TEI in Libraries</title>.</p>
      <p>All hyphens and quotation marks have been retained.</p>
    </editorialDecl>
    <tagsDecl>
      <namespace name="http://www.tei-c.org/ns/1.0">
        <tagUsage gi="sourceDoc">Logical divisions not encoded.</tagUsage>
      </namespace>
    </tagsDecl>
    <classDecl>
      <taxonomy xml:id="LCC">
        <bibl>Library of Congress Classification</bibl>
      </taxonomy>
      <taxonomy xml:id="LCSH">
        <bibl>Library of Congress Subject Headings</bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <language ident="en"/>
    </langUsage>
    <textClass>
      <classCode scheme="#LCC">E456</classCode>
      <keywords scheme="#LCSH">
        <list>
          <item>Lincoln, Abraham, 1809-1865.</item>
          <item>Seward, William Henry, 1801-1872.</item>
          <item>Adams, Charles Francis, 1807-1886. Address of Charles Francis Adams ... on the life
            ... of William H. Seward.</item>
        </list>
      </keywords>
    </textClass>
  </profileDesc>
  <revisionDesc>
    <change who="#CKP" when="2005-05-25">Header generated from export of MARC record</change>
  </revisionDesc>
</teiHeader>
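
The sample header wraps its subject headings in <list> and <item>. The element table in the preceding section also describes <term> for keywords, so the same three headings could be expressed as in the sketch below. It reuses the scheme pointers and heading text from the sample and is offered as one possible alternative, not as the form the sample's authors intended.

```xml
<!-- Alternative sketch: the same headings as the sample header, encoded with <term> -->
<textClass>
  <classCode scheme="#LCC">E456</classCode>
  <keywords scheme="#LCSH">
    <term>Lincoln, Abraham, 1809-1865.</term>
    <term>Seward, William Henry, 1801-1872.</term>
    <term>Adams, Charles Francis, 1807-1886. Address of Charles Francis Adams ... on the life
      ... of William H. Seward.</term>
  </keywords>
</textClass>
```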

4.2. Encoding Levels

4.2.1. Caveats About Examples

In the examples given in the description of each encoding level below, XML comments are illustrative, and are not meant to be included in encoded documents. Here is an example of such a comment:

<!-- uncorrected OCR for first page image begins here -->

Note that for technical reasons the namespace is not shown in these examples, but it should always be supplied on the root <TEI> element, e.g.:

<TEI xmlns="http://www.tei-c.org/ns/1.0">

4.2.2. Level 1: Fully Automated Conversion and Encoding

4.2.2.2. Purpose

To create electronic text with the primary purpose of keyword searching and linking to page images. The primary advantage in using the TEI at this very strictly limited level of encoding is that a TEI header is attached to the text file.

4.2.2.3. Rationale

The text is subordinate to the page image, and is not intended to stand alone as an electronic text without accompanying page images. Level 1 texts are not intended to be adequate for textual analysis; they are more likely to be suited to the goals of a preservation unit or mass digitization initiative. These texts are meant to be a faithful representation of the appearance of the source document derived from OCR, providing the basis for subsequent encoding at Level 2 or higher of these Best Practices.

Level 1 is most suitable for projects with the following characteristics:
  • A large volume of material is to be made available online quickly.
  • A digital image of each page can be displayed to the user.
  • No manual intervention will be performed in the text creation process.
  • The material is of interest to a large community of users who wish to explore texts using keyword searching.
  • Sophisticated search and display capabilities based on the structure of the text are not necessary.
  • Extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added at a later date.
4.2.2.4. Workflow

Texts at Level 1 can be created and encoded by fully automated means. Page images are scanned and processed using OCR, but the text is generally left uncorrected (“dirty OCR”) and the XML is generated from the OCR output. If desired, such automated output can be enhanced by tagging individual page elements to indicate key textual features, such as a title page, front matter, or the start of a new chapter.
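
One possible way to record such enhancements, sketched here on the assumption that the schema in use permits a type attribute on <surface> (TEI P5 itself does), is to label the relevant pages. The type values and file names below are hypothetical and are not a vocabulary defined by these Best Practices.

```xml
<sourceDoc xml:lang="en">
  <!-- type values are illustrative assumptions, not a prescribed vocabulary -->
  <surface type="titlePage" facs="00000001.tif">
    <line>[ OCR content of title page line 1 here ]</line>
  </surface>
  <surface type="chapterStart" n="1" facs="00000005.tif">
    <line>[ OCR content of first page of chapter 1 here ]</line>
  </surface>
</sourceDoc>
```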

4.2.2.5. Element Recommendations for Level 1
<sourceDoc> [recommended] There should be only one element following the <teiHeader>, a single <sourceDoc> containing a raw transcription of the text of the source document.
<surface> [recommended] Defines a written surface as a two-dimensional coordinate space. There should be one <surface> for each encoded page, whether it is represented via a facsimile image file, a transcription, or, most likely, a textual representation obtained from OCR.
<zone> [optional] A two-dimensional region. May be used to divide a <surface> or <line> into two-dimensional regions, e.g. a column, a dropped initial capital, or a word of interest. A <zone> may itself be divided into <line>s or <zone>s.
<line> [recommended] Contains text appearing in a single physical line on the page.

See section 3.8, Linking Between Encoded Texts and Images of Source Documents.
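
As a small illustration of the optional <zone> element, the sketch below divides one page into two columns, each holding its own <line> elements. The page number, file name, type value, and coordinates are hypothetical.

```xml
<!-- Hypothetical two-column page; coordinates are invented for illustration -->
<surface n="12" facs="00000012.tif" ulx="0" uly="0" lrx="2550" lry="3300">
  <zone type="column" ulx="200" uly="300" lrx="1225" lry="3000">
    <line>[ OCR content of column 1 line 1 here ]</line>
    <line>[ OCR content of column 1 line 2 here ]</line>
  </zone>
  <zone type="column" ulx="1325" uly="300" lrx="2350" lry="3000">
    <line>[ OCR content of column 2 line 1 here ]</line>
    <line>[ OCR content of column 2 line 2 here ]</line>
  </zone>
</surface>
```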

4.2.2.6. Level 1 Examples
4.2.2.6.1. Level 1 Alger Hiss document
<TEI xml:id="someid1" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="en">
    <!-- header goes here -->
  </teiHeader>
  <sourceDoc xml:lang="en">
    <surface n="113" facs="00000001.tif">
      <!-- uncorrected OCR for first page image begins here -->
      <line>POINT VIII.</line>
      <line>BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S</line>
      <line>CONVICTION SHOULD BE VACATED; ALTERNATIVELY,</line>
      <line>DISCOVERY AND A HEARING SHOULD BE ORDERED.</line>
      <line>The nature and extent of surveillance of Hiss, his</line>
      <line>family and associates was not known at the time of trial by</line>
      <line>the defense. Even now, with the release of some of the govern‐</line>
      <line>ment documents concerning FBI investigative techniques regarding</line>
      <line>Hiss, the full extent of surveillance -- wiretapping, mail open‐</line>
      <line>ings, mail covers, physical surveillance, and other intrusive</line>
      <line>techniques -- is still not 'clear. Nevertheless, it is apparent</line>
      <line>that information gathered through the exploitation of unlawful</line>
      <line>wiretaps and other illegal surveillance was used at trial and</line>
      <line>consequently the conviction must be reversed. Alternatively,</line>
      <line>further discovery and a hearing is essential to a fair deter‐</line>
      <line>mination regarding these issues.</line>
      <line>FBI surveillance of Hiss began in earnest in 1941 with</line>
      <line>the institution of a mail cover on his incoming correspondence</line>
      <line>at his home in connection with an FBI investigation of possible</line>
      <line>Hatch Act violations. CN Ex. 98A. Another mail cover was placed</line>
      <line/>
      <line/>
      <line>-113 -</line>
      <line/>
      <line/>
      <!-- uncorrected OCR for first page image ends here -->
    </surface>
    <surface n="114" facs="00000002.tif">
      <!-- uncorrected OCR for second page image begins here -->
      <line>on the Hiss mail in 1945, and at the same time the FBI obtained</line>
      <line>toll call records from the Hiss residence Telephone for the</line>
      <line>years 1943 and 1944 as well. CN Ex. 99. In September, 1945,</line>
      <line>the FBI intercepted telegrams to Hiss as well. CN Ex. 100.</line>
      <line>In late November, 1945, FBI surveillance of the Hiss</line>
      <line>residence in Washington, D.C., escalated. For the third time,</line>
      <line>a mail cover was instituted beginning on November 28, 1945,</line>
      <line>which was continued at least until 1946. CN Ex. 101 at p. 70;</line>
      <line>CN Ex. 102. Continuous physical surveillance of Hiss was begun</line>
      <line>as well. CN Ex. 101 at p. 72. Although this twenty-four-hour</line>
      <line>surveillance was discontinued on December 14, 1945, physical</line>
      <line>surveillance was conducted frequently at various times until</line>
      <line>September, 1947. CN Ex. 102; CN Ex. 103.</line>
      <line>The most intrusive invasion of petitioner's rights</line>
      <line>68/ Also before 1947, a letter from Priscilla Hiss addressed</line>
      <line>to her son, Timothy Hobson, was intercepted and its contents</line>
      <line>read. CN Ex. 100A at p. 167. In approximately March, 1947,</line>
      <line>a letter from a Michael Greenberg addressed to petitioner re‐</line>
      <line>garding an application for employment with the United Nations</line>
      <line>was also intercepted, in a manner not revealed by the docu‐</line>
      <line>ments. CN Ex. 100B</line>
      <line/>
      <line/>
      <line>-114 -</line>
      <line/>
      <line/>
      <line/>
      <!-- uncorrected OCR for second page image ends here -->
    </surface>
    <surface n="115" facs="00000003.tif">
      <!-- uncorrected OCR for third page image begins here -->
      <line>occurred from December 13, 1945 until the Hisses moved from</line>
      <line>Washington, D.C. to New York City on September 13, 1947. A</line>
      <line>"technical surveillance," -- a wiretap -- was placed on the Hiss</line>
      <line>telephone at their residence on P Street-in Washington, D.C.</line>
      <line>The logs of this surveillance constitute twenty-nine volumes</line>
      <line>of FBI serials and are roughly 2,500 pages in length, in which</line>
      <line>an enormous amount of information concerning the Hisses' per‐</line>
      <line>sonal lives, relationships with friends and associates, and</line>
      <line>habits is recorded.</line>
      <line>The wiretap was installed following FBI Director Hoover's</line>
      <line>application to the Attorney General for authorization, although</line>
      <line>no written authorization appears in the documents released to</line>
      <line>Hiss. The purpose of the application was to gather information</line>
      <line>regarding Hiss' alleged contacts with Soviet espionage agents and</line>
      <line>communists in government service, general allegations which had</line>
      <line>been made by Elizabeth Bentley and Chambers.</line>
      <line>As one would expect, the interception of every telephone</line>
      <line>h9/ Hoover's initial request was answered by a note reques‐</line>
      <line>ting information on Hiss. CN Ex. 104. Additional information</line>
      <line>was furnished by letter dated November 30, 1945. CN Ex. 105.</line>
      <line/>
      <line/>
      <line>-115 -</line>
      <line/>
      <line/>
      <!-- uncorrected OCR for third page image ends here -->
    </surface>
  </sourceDoc>
</TEI>
4.2.2.6.2. Level 1 Text with OCR Coordinates
<TEI xml:id="someid1a" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="en">
    <!-- header goes here -->
  </teiHeader>
  <sourceDoc xml:lang="en">
    <surface n="113" facs="00000001.tif" ulx="0" uly="0" lrx="2550" lry="3300">
      <!-- uncorrected OCR for first page image begins here -->
      <zone type="print_space" ulx="407" uly="40" lrx="2525" lry="3219">
        <zone type="text_block" ulx="1255" uly="386" lrx="1605" lry="426">
          <line type="text_line" ulx="1256" uly="387" lrx="1604" lry="425">
            <zone type="word" ulx="1256" uly="390" lrx="1411" lry="425">POINT</zone>
            <zone type="word" ulx="1452" uly="387" lrx="1579" lry="424">VIII</zone>
            <zone type="word" ulx="1592" uly="408" lrx="1604" lry="418">.</zone>
          </line>
        </zone>
        <zone type="text_block" ulx="744" uly="481" lrx="2222" lry="534">
          <line type="text_line" ulx="745" uly="482" lrx="2221" lry="533">
            <zone type="word" ulx="745" uly="498" lrx="970" lry="533">BECAUSE</zone>
            <zone type="word" ulx="1006" uly="498" lrx="1064" lry="529">OF</zone>
            <zone type="word" ulx="1102" uly="494" lrx="1353" lry="528">UNLAWFUL</zone>
            <zone type="word" ulx="1392" uly="491" lrx="1798" lry="527">SURVEILLANCE,</zone>
            <zone type="word" ulx="1847" uly="482" lrx="2221" lry="521">PETITIONER'S</zone>
          </line>
        </zone>
        <zone type="text_block" ulx="746" uly="546" lrx="2155" lry="593">
          <line type="text_line" ulx="747" uly="547" lrx="2154" lry="592">
            <zone type="word" ulx="748" uly="556" lrx="1068" lry="592">CONVICTION</zone>
            <zone type="word" ulx="1103" uly="555" lrx="1291" lry="588">SHOULD</zone>
            <zone type="word" ulx="1326" uly="553" lrx="1389" lry="585">BE</zone>
            <zone type="word" ulx="1421" uly="551" lrx="1673" lry="588">VACATED?</zone>
            <zone type="word" ulx="1712" uly="547" lrx="2154" lry="584">ALTERNATIVELY,</zone>
          </line>
        </zone>
        <zone type="text_block" ulx="746" uly="607" lrx="2091" lry="652">
          <line type="text_line" ulx="747" uly="608" lrx="2090" lry="651">
            <zone type="word" ulx="747" uly="617" lrx="1037" lry="651">DISCOVERY</zone>
            <zone type="word" ulx="1069" uly="616" lrx="1163" lry="648">AND</zone>
            <zone type="word" ulx="1197" uly="615" lrx="1228" lry="647">A</zone>
            <zone type="word" ulx="1262" uly="612" lrx="1488" lry="647">HEARING</zone>
            <zone type="word" ulx="1522" uly="610" lrx="1712" lry="643">SHOULD</zone>
            <zone type="word" ulx="1750" uly="610" lrx="1810" lry="641">BE</zone>
            <zone type="word" ulx="1847" uly="608" lrx="2090" lry="641">ORDERED.</zone>
          </line>
        </zone>
        <zone type="text_block" ulx="414" uly="802" lrx="2482" lry="2190">
          <line type="text_line" ulx="741" uly="803" lrx="2345" lry="848">
            <zone type="word" ulx="741" uly="808" lrx="833" lry="844">The</zone>
            <zone type="word" ulx="870" uly="809" lrx="1059" lry="843">nature</zone>
            <zone type="word" ulx="1099" uly="807" lrx="1191" lry="842">and</zone>
            <zone type="word" ulx="1225" uly="808" lrx="1415" lry="841">extent</zone>
            <zone type="word" ulx="1453" uly="805" lrx="1511" lry="841">of</zone>
            <zone type="word" ulx="1551" uly="804" lrx="1930" lry="840">surveillance</zone>
            <zone type="word" ulx="1968" uly="805" lrx="2027" lry="839">of</zone>
            <zone type="word" ulx="2060" uly="803" lrx="2210" lry="844">Hiss,</zone>
            <zone type="word" ulx="2255" uly="803" lrx="2345" lry="839">his</zone>
          </line>
          <line type="text_line" ulx="420" uly="924" lrx="2318" lry="976">
            <zone type="word" ulx="420" uly="930" lrx="609" lry="975">family</zone>
            <zone type="word" ulx="645" uly="930" lrx="740" lry="965">and</zone>
            <zone type="word" ulx="777" uly="927" lrx="1092" lry="965">associates</zone>
            <zone type="word" ulx="1128" uly="938" lrx="1221" lry="964">was</zone>
            <zone type="word" ulx="1258" uly="930" lrx="1351" lry="964">not</zone>
            <zone type="word" ulx="1386" uly="928" lrx="1547" lry="962">known</zone>
            <zone type="word" ulx="1584" uly="930" lrx="1645" lry="962">at</zone>
            <zone type="word" ulx="1681" uly="927" lrx="1769" lry="962">the</zone>
            <zone type="word" ulx="1809" uly="925" lrx="1932" lry="961">time</zone>
            <zone type="word" ulx="1969" uly="927" lrx="2027" lry="962">of</zone>
            <zone type="word" ulx="2066" uly="925" lrx="2219" lry="961">trial</zone>
            <zone type="word" ulx="2253" uly="926" lrx="2318" lry="971">by</zone>
          </line>
          <!-- encoded text continues -->
        </zone>
      </zone>
    </surface>
    <!-- encoded text continues -->
  </sourceDoc>
</TEI>

4.2.3. Level 2: Minimal Encoding

4.2.3.1. Reference

Note that this is a ‘syntactically conformant’ customization, in that documents that are valid against this schema will also be valid against the TEI_all schema. However, it is unclear whether or not it is truly ‘TEI conformant’, as P5 does not make clear whether or not encoding of individual paragraphs is mandatory.

4.2.3.2. Purpose

To create electronic text for full-text searching, linking to page images, and identifying simple structural hierarchy to improve navigation. (For example, you can generate a table of contents automatically from such encoding.)

4.2.3.3. Rationale

The text is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. However, the text could stand alone as electronic text (without page images) if the accuracy of its contents is suitable to its intended use and it is not necessary to display low-level typographic or structural information. Use cases for Level 2 require a set of elements more granular than those of Level 1, including bibliographic or structural information below the monographic or volume level. One of the motivations for using Level 2 is to avoid expensive analysis of textual elements and/or the expense of accurate text conversion, e.g., double-keying or detailed proofreading of automatic OCR.

For the most part, though, Level 2 texts are not intended to be displayed separately from their page images. Level 2 encoding of sections and headings provides greater navigational possibilities than Level 1 encoding, and enables searching to be restricted within particular textual divisions (for example, searching for two phrases within the same chapter).

Level 2 is most suitable for projects with the following characteristics:
  • A large volume of material is to be made available online quickly.
  • A digital image of each page is desired.
  • The material is of interest to a large community of users who wish to read texts that allow keyword searching.
  • Rudimentary search and display capabilities based on the large structures of the text are desired.
  • Each text is checked to ensure that textual divisions and headers are properly identified.
  • Extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added at a later date.
4.2.3.4. Workflow

Level 2 texts can generally be created and encoded by automated means. Pagination is identified as in Level 1, and metadata for the textual divisions is created, likely based on the page images. The textual division metadata might contain the page number on which the division begins and a transcription of that division's heading. This metadata is inserted into the OCR at the appropriate points, forming a valid XML document. Level 2 texts do not require any special knowledge or manual intervention below the section level.

4.2.3.5. Element Recommendations for Level 2

Optionally use the elements specified in Level 1. In addition, use the following:

<text> [recommended] Used as a wrapper for the encoded transcription of the source document.
<front>, <back> [optional] Contains one or more <div> or <div1>.
<body> [recommended] Contains one or more <div> or <div1>.
<div1> or <div> [recommended] One <div> or <div1> is used per section of the text identified with division-level metadata. If no type attribute is specified, a type value of section should be presumed.
<head> [recommended] Use if headings are present. As in P5, this element must appear before the <ab> for that division.
<ab> [recommended] There should be only one child of the <div> (or <div1>): a single <ab> wrapping all of the OCR text. If the TEI document is ever ‘upgraded’ to Level 3 or higher, the <ab> element will be replaced by structural elements like <p> and <table>.
<lb> [optional] Indicates the beginning of a new line.
<cb> [optional] Indicates the beginning of a new column.
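
As a brief sketch of how the optional <lb> and <cb> milestones might sit inside the single <ab> of a division, the fragment below shows one section printed in two columns. The page number, the n values on <cb>, and the bracketed placeholders are hypothetical.

```xml
<!-- Hypothetical two-column section; numbering on <cb> is an illustrative choice -->
<div type="section">
  <pb n="5" facs="[URI of page 5 image]"/>
  <head>[ heading of section ]</head>
  <ab>
    <cb n="1"/>
    [ OCR text of column 1 line 1 ]
    <lb/>[ OCR text of column 1 line 2 ]
    <cb n="2"/>
    [ OCR text of column 2 line 1 ]
    <lb/>[ OCR text of column 2 line 2 ]
  </ab>
</div>
```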
4.2.3.6. Level 2 Examples
4.2.3.6.1. Level 2 Basic Structure
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="en">
    <!-- header goes here -->
  </teiHeader>
  <sourceDoc>
    <!-- entire <sourceDoc> is optional, but it might have, for example: -->
    <surface facs="imgs/xmp_pg01.jpg">
      <line>[ OCR content of page 1 line 1 here ]</line>
      <line>[ OCR content of page 1 line 2 here ]</line>
      <line>[ OCR content of page 1 line 3 here ]</line>
      <!-- ... -->
    </surface>
    <surface facs="imgs/xmp_pg02.jpg">
      <line>[ OCR content of page 2 line 1 here ]</line>
      <line>[ OCR content of page 2 line 2 here ]</line>
      <line>[ OCR content of page 2 line 3 here ]</line>
      <!-- ... -->
    </surface>
    <!-- ... -->
  </sourceDoc>
  <text xml:lang="en">
    <front>
      <!-- entire <front> is optional, but it might have, for example: -->
      <div type="titlePage">
        <pb facs="[URI of title page image]"/>
        <ab>[ entire title page here ]</ab>
      </div>
      <div type="TOC">
        <pb n="ii" facs="[URI of table of contents]"/>
        <head>[ heading of table of contents ]</head>
        <ab>[ entire table of contents here ]</ab>
      </div>
      <div type="preface">
        <head>[ heading of preface ]</head>
        <ab>[ entire preface, with interspersed <gi>pb</gi> elements pointing
          to page images as needed, here ]</ab>
      </div>
    </front>
    <body>
      <div type="section">
        <pb n="1" facs="[URI of page 1 image]"/>
        <head>[ heading of section 1 ]</head>
        <ab>[ entire contents of section 1 here, with
          interspersed <gi>pb</gi> elements pointing to page
          images; in this example there are 26 more pages
          to section 1 ]</ab>
      </div>
      <div type="section">
        <pb n="27" facs="[URI of page 27 image]"/>
        <div type="subsection">
          <head>[ heading of section 2 subsection 1 ]</head>
          <ab>[ all the paragraphs of subsection one go here
            with page breaks inserted ]</ab>
        </div>
      </div>
    </body>
    <back>
      <!-- optional: organized like <front>, with 1 or more <div> or <div1>,
        each with a single <ab> -->
    </back>
  </text>
</TEI>
4.2.3.6.2. Level 2 Alger Hiss document
<TEI xml:id="someid2" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="en">
    <!-- header goes here -->
  </teiHeader>
  <text xml:lang="en">
    <body>
      <div1>
        <pb n="113" facs="00000001.tif"/>
        <!-- content of head element was transcribed from page image -->
        <head>POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S
          <lb/>CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING
          <lb/>SHOULD BE ORDERED.</head>
        <ab>
          <!-- uncorrected OCR for first page image begins here -->
          POINT VIII.
          <lb/>BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S
          <lb/>CONVICTION SHOULD BE VACATED; ALTERNATIVELY,
          <lb/>DISCOVERY AND A HEARING SHOULD BE ORDERED.
          <lb/>The nature and extent of surveillance of Hiss, his
          <lb/>family and associates was not known at the time of trial by
          <lb/>the defense. Even now, with the release of some of the govern‐
          <lb/>ment documents concerning FBI investigative techniques regarding
          <lb/>Hiss, the full extent of surveillance -- wiretapping, mail open‐
          <lb/>ings, mail covers, physical surveillance, and other intrusive
          <lb/>techniques -- is still not 'clear. Nevertheless, it is apparent
          <lb/>that information gathered through the exploitation of unlawful
          <lb/>wiretaps and other illegal surveillance was used at trial and
          <lb/>consequently the conviction must be reversed. Alternatively,
          <lb/>further discovery and a hearing is essential to a fair deter‐
          <lb/>mination regarding these issues.
          <lb/>FBI surveillance of Hiss began in earnest in 1941 with
          <lb/>the institution of a mail cover on his incoming correspondence
          <lb/>at his home in connection with an FBI investigation of possible
          <lb/>Hatch Act violations. CN Ex. 98A. Another mail cover was placed
          <lb/>-113 -
          <!-- uncorrected OCR for first page image ends here -->
          <pb n="114" facs="00000002.tif"/>
          <!-- uncorrected OCR for second page image begins here -->
          <lb/>on the Hiss mail in 1945, and at the same time the FBI obtained
          <lb/>toll call records from the Hiss residence Telephone for the
          <lb/>years 1943 and 1944 as well. CN Ex. 99. In September, 1945,
          <lb/>the FBI intercepted telegrams to Hiss as well. CN Ex. 100.
          <lb/>In late November, 1945, FBI surveillance of the Hiss
          <lb/>residence in Washington, D.C., escalated. For the third time,
          <lb/>a mail cover was instituted beginning on November 28, 1945,
          <lb/>which was continued at least until 1946. CN Ex. 101 at p. 70;
          <lb/>CN Ex. 102. Continuous physical surveillance of Hiss was begun
          <lb/>as well. CN Ex. 101 at p. 72. Although this twenty-four-hour
          <lb/>surveillance was discontinued on December 14, 1945, physical
          <lb/>surveillance was conducted frequently at various times until
          <lb/>September, 1947. CN Ex. 102; CN Ex. 103.
          <lb/>The most intrusive invasion of petitioner's rights
          <lb/>68/ Also before 1947, a letter from Priscilla Hiss addressed
          <lb/>to her son, Timothy Hobson, was intercepted and its contents
          <lb/>read. CN Ex. 100A at p. 167. In approximately March, 1947,
          <lb/>a letter from a Michael Greenberg addressed to petitioner re‐
          <lb/>garding an application for employment with the United Nations
          <lb/>was also intercepted, in a manner not revealed by the docu‐
          <lb/>ments. CN Ex. 100B
          <lb/>-114 -
          <!-- uncorrected OCR for second page image ends here -->
          <pb n="115" facs="00000003.tif"/>
          <!-- uncorrected OCR for third page image begins here -->
          <lb/>occurred from December 13, 1945 until the Hisses moved from
          <lb/>Washington, D.C. to New York City on September 13, 1947. A
          <lb/>"technical surveillance," -- a wiretap -- was placed on the Hiss
          <lb/>telephone at their residence on P Street-in Washington, D.C.
          <lb/>The logs of this surveillance constitute twenty-nine volumes
          <lb/>of FBI serials and are roughly 2,500 pages in length, in which
          <lb/>an enormous amount of information concerning the Hisses' per‐
          <lb/>sonal lives, relationships with friends and associates, and
          <lb/>habits is recorded.
          <lb/>The wiretap was installed following FBI Director Hoover's
          <lb/>application to the Attorney General for authorization, although
          <lb/>no written authorization appears in the documents released to
          <lb/>Hiss. The purpose of the application was to gather information
          <lb/>regarding Hiss' alleged contacts with Soviet espionage agents and
          <lb/>communists in government service, general allegations which had
          <lb/>been made by Elizabeth Bentley and Chambers.
          <lb/>As one would expect, the interception of every telephone
          <lb/>h9/ Hoover's initial request was answered by a note reques‐
          <lb/>ting information on Hiss. CN Ex. 104. Additional information
          <lb/>was furnished by letter dated November 30, 1945. CN Ex. 105.
          <lb/>-115 -
          <!-- uncorrected OCR for third page image ends here -->
        </ab>
      </div1>
    </body>
  </text>
</TEI>

4.2.4. Level 3: Simple Analysis

4.2.4.2. Purpose

To create a stand-alone electronic text and identify hierarchy (logical structure) and typography without content analysis being of primary importance.

4.2.4.3. Rationale

Encoding at this level provides the foundation for upgrading to higher levels of encoding. Level 3 generally requires some human editing, but the features to be encoded are determined by the logical structure and appearance of the text and not specialized content analysis.

Level 3 texts identify front and back matter, textual divisions, and all paragraph breaks. Floating texts (sub-texts like a poem or letter embedded in the greater text) are supported in this level. The finer granularity of encoding these features, as well as figures, notes, and all changes of typography, allows a range of options for display, delivery, and searching. For example, one has the option of identifying, and therefore specifying, the display characteristics of different typographic styles, and regularizing the display and placement of note text.

Level 3 texts can stand alone as text without page images and therefore can be uploaded, downloaded, and delivered to users quickly. In addition, they require less storage space than digital collections with page images. However, the simple level of structural analysis and absence of specialized content analysis reflected in Level 3 encoding may make it desirable for some, depending on project priorities, to include page images in order to provide users with a fuller set of resources on the source document.

Level 3 is most suitable for projects with the following characteristics:
  • The material is of interest to a large community of users who wish to read texts that allow for keyword searching.
  • Some sophistication of display, delivery, and searching based on structure of the text is desired.
  • Each text will undergo quality control to ensure that encoding decisions have been made appropriately.
  • The users of the texts may have limited storage or display capabilities.
  • The creator of the texts has limited or no ability to provide content expertise to analyze, tag, or review texts.
  • Extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added at a later date.
4.2.4.4. Workflow

Level 3 texts can be created by semi-automated conversion from an electronic source such as an HTML file or word-processor document or from a print source, either through OCR or keyboarding; some human intervention is likely necessary. Level 3 texts can also be generated trivially by converting from outsourced double-keyboarded texts conforming to TEI Tite, though some granularity of encoding will be lost in the translation.

4.2.4.5. Element Recommendations for Level 3

Use all elements specified in Level 2 except <ab> as defined for that level, plus the following:

<front>, <back> [recommended] Use if front or back matter is present.
<div> or <div1> [recommended] At least one <div> or <div1> is recommended within each of <front>, <body>, and <back>; type attribute is recommended. If a bibliography is included in the document, it is recommended that it be encoded within a <div> whose type attribute has the value bibliography.
<p> [recommended] Use for paragraphs in prose.
<lg> and <l> [recommended] Use for identifying groups of lines and lines, respectively.
<figure> and appropriate child elements [recommended] Use to refer to illustrative images and descriptive information about those images.
<floatingText> [optional] Use to indicate a floating text.
<note> [recommended] Use for notes.
<ptr> or <ref> [recommended] If a table of contents is encoded, <ptr> or <ref> is recommended for linking to sections of the document. If notes are encoded at the point they occur in the text or at another point convenient when converting from a born-digital source document, <ptr> or <ref> is recommended for encoding the point of reference.
<hi> [recommended] Indicates changes in typeface; rend attribute is optional.
<list> and <item> [optional] Use to indicate ordered and unordered list structures.
<table>, <row>, and <cell> [optional] Use to indicate table structures.
<listBibl> [recommended] If a bibliography is present and encoded, use for identifying the bibliographic entries. Note that any header or title should be encoded by a <head> in the containing <div>.
<bibl> [recommended] Contains a single bibliographic entry. Phrase level markup (e.g., <hi>) is optional. Further structural markup (e.g., <author>) is not used at level 3.
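As a minimal sketch of the bibliography recommendations above, the following fragment shows a <div> of type bibliography whose heading is a <head> and whose entries sit in a <listBibl>. The two entries (authors, titles, publisher, dates) are invented for illustration and do not come from any source document. Only phrase-level <hi> appears inside each <bibl>, as described for Level 3.

```xml
<back>
  <div type="bibliography">
    <!-- The heading belongs to the containing <div>, not to <listBibl> -->
    <head>Bibliography</head>
    <listBibl>
      <!-- Hypothetical entries: one <bibl> each, no <author> or <title> at Level 3 -->
      <bibl>Doe, Jane. <hi rend="font-style: italic">A Hypothetical Title</hi>.
        Anytown: Example Press, 1901.</bibl>
      <bibl>Roe, Richard. <hi rend="font-style: italic">Another Hypothetical Title</hi>.
        Anytown: Example Press, 1902.</bibl>
    </listBibl>
  </div>
</back>
```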
4.2.4.6. General Level 3 Recommendations
4.2.4.6.1. Forme Work

Running heads, catch words, signatures, and other artifacts derived from printing should not be included in Level 3. Page numbers are the one exception: record them using the n attribute on <pb>. If upgrading a text from Level 1 or Level 2 that was generated using OCR, discard any forme work text.
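A minimal sketch of this practice, assuming a hypothetical page break between pages 12 and 13. The xml:id value and the bracketed placeholder text are invented for illustration.

```xml
<!-- The running head and the printed folio "13" are not transcribed;
     the page number survives only as the value of n on <pb>. -->
<p>[text at the foot of page 12]
  <pb xml:id="example_013" n="13"/>
  [text continuing at the top of page 13]</p>
```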

4.2.4.6.2. Level 3 Figures

<figure> groups elements representing or containing graphic information such as an illustration or figure. A <figure> should contain the following elements:

<head>
for a caption label (e.g., ‘Figure 1’) or, when no caption label is present, a literal transcription of a caption. Use when this feature is present in the source document.
<p>
When a caption label is present and encoded using <head>, use <p> for a literal transcription of the caption. Use when this feature is present in the source document.
<figDesc>
for a description of the image to serve as an alternative to viewing the image. This is recommended in order to create digital texts that will be accessible to the visually impaired.
<graphic>
for pointing to the URI of the image itself using a url attribute and containing other presentation instructions such as dimension at which the graphic should be displayed, etc. This is recommended in order to point to the corresponding image file.
Here is an example of the encoding of a frontispiece using <figure>:
<front>  <div type="frontispiece">   <figure>    <head>Sojourner Truth.</head>    <figDesc>Woodcut of Sojourner Truth.</figDesc>    <graphic url="http://docsouth.unc.edu/neh/truth50/frontis.html"     scale="0.5"/>   </figure>  </div>  <!-- ... --> </front>
Note: Truth, Sojourner and Olive Gilbert. 1850. Narrative of Sojourner Truth, a Northern Slave, Emancipated from Bodily Servitude by the State of New York, in 1828. Boston: n.p.
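The frontispiece above has only a caption. As a hedged sketch of the other case described in this section, the fragment below shows a figure that has both a caption label and a caption, so the label goes in <head> and the caption in <p>. The caption text, description, and image path are hypothetical.

```xml
<figure>
  <!-- Caption label as printed in the source -->
  <head>Figure 1</head>
  <!-- Literal transcription of the caption that follows the label -->
  <p>[caption text as printed]</p>
  <!-- Prose alternative to the image, for readers who cannot view it -->
  <figDesc>Line drawing of a two-story brick building.</figDesc>
  <!-- Hypothetical relative URI of the image file -->
  <graphic url="images/fig001.jpg"/>
</figure>
```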
4.2.4.6.3. Tables of Contents

You may wish not to include in your TEI document front matter such as a table of contents or a list of illustrations, especially if you plan to automatically generate these from the encoded text. If you do, however, plan to manually encode such textual features, use a <div> (or <div1>) element with an appropriate type attribute (e.g., <div type="contents">) to surround the encoding of the feature. Within this division, use the <list> element to mark up the table of contents, list of illustrations, etc. Each list item should have a <ptr> or <ref> element with a target attribute referencing an xml:id attribute on the <pb> or on the <div> (or <div1>) of the referenced page or section. Use <ref> if you wish to transcribe page numbers in the table of contents; use <ptr> if you do not.
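A minimal sketch of the <ptr> variant, assuming the printed page numbers are not transcribed and each entry points at the xml:id of a chapter <div>. The identifiers ch01 and ch02 and the bracketed placeholders are invented for illustration.

```xml
<front>
  <div type="contents">
    <head>CONTENTS</head>
    <list type="simple">
      <!-- <ptr> is an empty element, so no page number is recorded here -->
      <item>I. [chapter title] <ptr target="#ch01"/></item>
      <item>II. [chapter title] <ptr target="#ch02"/></item>
    </list>
  </div>
</front>
<body>
  <!-- Each target value matches an xml:id on the referenced division -->
  <div type="chapter" xml:id="ch01">[text]</div>
  <div type="chapter" xml:id="ch02">[text]</div>
</body>
```

If the page numbers were transcribed, each <ptr> would become a <ref> containing the number.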

4.2.4.6.4. Notes

Use the <note> element to encode the text of a margin note, footnote, endnote, or other note found in the source document. If a point of attachment is marked with a siglum (such as an asterisk or superscript number), encode the siglum with a <ref> element. Alternatively, if there is no marked point of attachment, insert a <ptr> at the most likely implied point of attachment for the note.

The text of the note (inside a <note> element) may either (a) be left where it occurs in the layout of the page (as may be more convenient in the case of conversion from OCR text) or (b) be moved so that the <note> occurs just following the <ref> or <ptr> marking the point of attachment. If a note spans pages, be sure to insert a <pb> to mark the start of the new page. Similarly, if the note in the source document begins on a different page from the place of reference, a <pb> should be inserted before the first word of the note within the <note> to record this.

The siglum is often repeated at the beginning of the note to aid the reader in matching the siglum in the text to the note. Encode any siglum within the note using <label>.

Here is an example encoded according to method (a)—first the sigla:
<p>The three little pigs built their houses out of straw,<ref target="#n_a1">1</ref> sticks,<ref target="#n_a2">2</ref> and bricks.<ref target="#n_a3">3</ref> </p>
Note: Christian. 2012. How to properly typeset footnotes/superscripts after punctuation marks?. https://tex.stackexchange.com/questions/56063/how-to-properly-typeset-footnotes-superscripts-after-punctuation-marks/56085.

And then the annotations themselves, encoded as <note> elements inside a division in the back matter that holds the endnotes:

<back>  <div type="endnotes">   <head>The Collected End-notes</head>   <note place="bottom" anchored="true"    xml:id="n_a1">    <label>1</label>      not to be confused with hay</note>   <note place="bottom" anchored="true"    xml:id="n_a2">    <label>2</label>      or lumber according to some sources</note>   <note place="bottom" anchored="true"    xml:id="n_a3">    <label>3</label>      probably fired clay bricks</note>  </div> </back>
Here is an example encoded according to method (b), with notes encoded right after the point of attachment:
<p>The three little pigs built their houses out of straw,<ref target="#n_b1">1</ref>  <note place="bottom" anchored="true"   xml:id="n_b1">   <label>1</label> not to be confused with hay  </note> sticks,<ref target="#n_b2">2</ref>  <note place="bottom" anchored="true"   xml:id="n_b2">   <label>2</label> or lumber according to some sources  </note> and bricks.<ref target="#n_b3">3</ref>  <note place="bottom" anchored="true"   xml:id="n_b3">   <label>3</label> probably fired clay bricks  </note> </p>
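The examples above both have a printed siglum. As a hedged sketch of the two remaining cases described in this section, the fragment below shows a marginal note with no marked point of attachment (an empty <ptr> at the most likely implied point) and a footnote that runs over a page break (a <pb> inside the <note>). The sentence, the identifiers n_c1 and n_c2, and the page number are invented; anchored="false" is this sketch's assumption for recording that the source does not mark the point of attachment.

```xml
<p>The wolf came first to the house of straw
  <!-- No siglum in the source: <ptr> marks the implied point of attachment -->
  <ptr target="#n_c1"/>
  <note place="margin" anchored="false" xml:id="n_c1">[text of the marginal note]</note>
  and then to the house of sticks.<ref target="#n_c2">4</ref>
  <note place="bottom" anchored="true" xml:id="n_c2">
    <label>4</label> [start of the footnote at the foot of page 14]
    <!-- The note continues on the next page, so the break is recorded inside it -->
    <pb n="15"/>
    [rest of the footnote at the foot of page 15]
  </note>
</p>
```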
4.2.4.7. Level 3 Examples
4.2.4.7.1. Level 3 Basic Structure: Prose
<TEI xml:id="MBFG0236" xmlns="http://www.tei-c.org/ns/1.0">  <teiHeader xml:lang="en"> <!-- header goes here -->  </teiHeader>  <text xml:lang="en">   <front>    <div type="frontispiece">[figure]</div>    <titlePage>[text]</titlePage>    <div type="dedication">[text]</div>    <div type="contents">[text]</div>   </front>   <body>    <div type="book">     <head>[book title]</head>     <div type="chapter">[text]</div>     <div type="chapter">[text]</div>     <div type="chapter">[text]</div>     <div type="chapter">[text]</div>     <div type="chapter">[text]</div>    </div>   </body>   <back>    <div type="appendix">[text]</div>    <div type="index">[text]</div>   </back>  </text> </TEI>
4.2.4.7.2. Level 3 Basic Structure: Verse
<TEI xml:id="VAA2383" xmlns="http://www.tei-c.org/ns/1.0">  <teiHeader xml:lang="en"> <!-- header goes here -->  </teiHeader>  <text xml:lang="en">   <front>    <titlePage>[text]</titlePage>    <div type="dedication">[text]</div>    <div type="contents">[text]</div>   </front>   <body>    <div type="book">     <head>[book title]</head>     <div type="part">      <head>[section title]</head>      <div type="poem">       <head>THE DAYS GONE BY.</head>       <lg>        <l>O the days gone by! O the days gone by!</l>        <l>The apples in the orchard, and the pathway through the rye;</l>        <l>The chirrup of the robin, and the whistle of the quail</l>        <l>As he piped across the meadows sweet as any nightingale;</l>        <l>When the bloom was on the clover, and the blue was in the sky,</l>        <l>And my happy heart brimmed over&#x2014;in the happy days gone by.</l>       </lg>       <lg>[lines of poetry]</lg>       <lg>[lines of poetry]</lg>       <lg>[lines of poetry]</lg>      </div>     </div>    </div>   </body>  </text> </TEI>
4.2.4.7.3. Level 3 Table of Contents
<!--target attribute references page break identifier--> <div type="contents">  <head>CONTENTS</head>  <list type="simple">   <item>I. A Boy and His Dog <ref target="#VAA2383_011"     rend="text-align: right">3</ref>   </item>   <item>II. Romance <ref target="#VAA2383_020"     rend="text-align: right">12</ref>   </item>   <item>III. The Costume <ref target="#VAA2383_029"     rend="text-align: right">21</ref>   </item>   <item>IV. Desperation <ref target="#VAA2383_038"     rend="text-align: right">30</ref>   </item>   <item>V. The Pageant of the Table Round <ref target="#VAA2383_046"     rend="text-align: right">38</ref>   </item>  </list> </div>
4.2.4.7.4. Level 3 Chapter with Letter
<div type="chapter">  <pb xml:id="VAA2383_126" n="118"/>  <head type="main">CHAPTER XIV</head>  <head type="subtitle">MAURICE LEVY'S CONSTITUTION</head>  <p>   <hi rend="font-weight: bold">L</hi>O, SAM!" said Maurice cautiously. "What you    doin'?"</p>  <p>Penrod at that instant had a singular experience&#x2014;an intellectual shock like a flash    of fire in the brain. Sitting in darkness, a great light flooded him with wild    brilliance. He gasped!</p> <!--Text removed from example--> <p>"What you doin'?" asked Maurice for the third time, Sam Williams not having decided    upon a reply.</p>  <pb xml:id="VAA2383_127" n="119"/>  <p>It was Penrod who answered.</p>  <p>"Drinkin' lickrish water," he said simply, and wiped his mouth with such delicious    enjoyment that Sam's jaded thirst was instantly stimulated. He took the bottle    eagerly from Penrod.</p>  <p>"A-a-h!" exclaimed Penrod, smacking his lips. "That was a good un!"</p> <!--Text removed from example--> <p>Penrod uttered some muffled words and then waved both arms&#x2014;either in response or as    an expression of his condition of mind; it may have been a gesture of despair. How    much intention there was in this act&#x2014;obviously so rash, considering the position he    occupied&#x2014;it is impossible to say. Undeniably there must remain a suspicion of    deliberate purpose.</p> <!--Text removed from example--> <pb xml:id="VAA2383_138" n="130"/>  <p>The damsel curtsied again and handed him the following communication, addressed to    herself: </p>  <floatingText>   <body>    <div type="letter">     <p>"Dear madam Please excuse me from dancing the cotilo with you this afternoon          as I have fell off the barn</p>     <p>"Sincerly yours<lb/> "<hi rend="font-variant: small-caps">Penrod            Schofield</hi>." </p>    </div>   </body>  </floatingText> </div>
4.2.4.7.5. Level 3 Alger Hiss document
<TEI xml:id="someid3" xmlns="http://www.tei-c.org/ns/1.0">  <teiHeader xml:lang="en"> <!-- header goes here -->  </teiHeader>  <text xml:lang="en">   <body>    <div1>     <pb n="113" facs="00000001.tif"/>     <head>POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION          SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE          ORDERED.</head>     <p>The nature and extent of surveillance of Hiss, his family and associates was          not known at the time of trial by the defense. Even now, with the release of          some of the govern­ ment documents concerning FBI investigative techniques          regarding Hiss, the full extent of surveillance -- wiretapping, mail open­          ings, mail covers, physical surveillance, and other intrusive techniques -- is          still not 'clear. Nevertheless, it is apparent that information gathered          through the exploitation of unlawful wiretaps and other illegal surveillance          was used at trial and consequently the conviction must be reversed.          Alternatively, further discovery and a hearing is essential to a fair deter­          mination regarding these issues.</p>     <p>FBI surveillance of Hiss began in earnest in 1941 with the institution of a          mail cover on his incoming correspondence at his home in connection with an          FBI investigation of possible Hatch Act violations. CN Ex. 98A. Another mail          cover was placed <pb n="114" facs="00000002.tif"/> on the Hiss mail in 1945,          and at the same time the FBI obtained toll call records from the Hiss          residence Telephone for the years 1943 and 1944 as well. CN Ex. 99. In          September, 1945, the FBI intercepted telegrams to Hiss as well. CN Ex.          100.</p>     <p>In late November, 1945, FBI surveillance of the Hiss residence in Washington,          D.C., escalated. 
For the third time, a mail cover was instituted beginning on          November 28, 1945, which was continued at least until 1946. CN Ex. 101 at p.          70; CN Ex. 102. Continuous physical surveillance of Hiss was begun as well. CN          Ex. 101 at p. 72. Although this twenty-four-hour surveillance was discontinued          on December 14, 1945, physical surveillance was conducted frequently at          various times until September, 1947. <ref target="#n68">68</ref>      <note place="bottom" anchored="true"       xml:id="n68">       <label>68</label>Also before            1947, a letter from Priscilla Hiss addressed to her son,            Timothy Hobson, was intercepted and its contents read. CN Ex. 100A at p.            167. In approximately March, 1947, a letter from a Michael Greenberg            addressed to petitioner re­ garding an application for employment with the            United Nations was also intercepted, in a manner not revealed by the docu­            ments. CN Ex. 100B</note> CN Ex. 102; CN Ex. 103.</p>     <p>The most intrusive invasion of petitioner's rights <pb n="115" facs="00000003.tif"/> occurred from December 13, 1945 until the Hisses moved          from Washington, D.C. to New York City on September 13, 1947. A "technical          surveillance," -- a wiretap -- was placed on the Hiss telephone at their          residence on P Street-in Washington, D.C. 
The logs of this surveillance          constitute twenty-nine volumes of FBI serials and are roughly 2,500 pages in          length, in which an enormous amount of information concerning the Hisses' per­          sonal lives, relationships with friends and associates, and habits is          recorded.</p>     <p>The wiretap was installed following FBI Director Hoover's application to the          Attorney General for authorization, <ref target="#n69">69</ref>      <note place="bottom" anchored="true"       xml:id="n69">       <label>69</label>Hoover's initial request was answered by a note reques­ ting            information on Hiss. CN Ex. 104. Additional information was furnished by            letter dated November 30, 1945. CN Ex. 105.</note> although no written          authorization appears in the documents released to Hiss. The purpose of the          application was to gather information regarding Hiss' alleged contacts with          Soviet espionage agents and communists in government service, general          allegations which had been made by Elizabeth Bentley and Chambers.</p>     <p>As one would expect, the interception of every telephone</p>    </div1>   </body>  </text> </TEI>

4.2.5. Level 4: Basic Content Analysis

4.2.5.2. Purpose

To create text that can stand alone as electronic text, identifies hierarchy and typography, specifies the function of textual and structural elements, and describes the nature of the content and not merely its appearance. However, this level is not meant to encode or identify all structural, semantic, or bibliographic features of the text.

4.2.5.3. Rationale

Greater description of function and content at this level of encoding allows for:

  • flexibility of display and delivery
  • sophisticated searching within specified textual and structural elements
  • combining the broadest range of uses and audiences

Level 4 texts contain elements and attributes that describe content, not just appearance, of the text. Texts encoded at Level 4 are able to stand alone without page images in order for them to be read by students, scholars, and general readers, and the encoding of content allows these texts to work effectively with screen readers and other applications that rely on the structure of a text, not just its appearance.

Finally, functionally accurate encoding in Level 4 texts allows them to be searched or displayed in sophisticated ways. For example, perhaps a searcher could limit their search in a dramatic text to stage directions or in a verse text to only first lines. Alternatively, in a political tract published by subscription, a search could be confined to names that appear in lists, thus limiting a search to names of people who subscribed to a particular volume. This ability to limit searches becomes more significant as textbases become larger, and thus is of great importance to the library community as it attempts to build into the initial design and implementation of textbases the features needed to enhance interoperability.

Level 4 is most suitable for projects with the following characteristics:
  • Sophisticated search and retrieval capabilities are desired.
  • The texts will be used for textual analysis.
  • Extensibility is desired; that is, one desires to keep open the option for a higher level of encoding to be added by the scholarly community at a later date.
  • The users of the texts may have limited storage or display capabilities.
4.2.5.4. Workflow

Text is generated by keyboarding (likely outsourced double keyboarding from page images using TEI Tite) or possibly by correcting OCR text using software that identifies spelling mistakes or consults a log from the OCR software to find regions of uncertainty in the OCR text. If converting from TEI Tite, minimal additional markup should be added, as discussed in Appendix A of TEI Tite.

4.2.5.5. Element Recommendations for Level 4

Use all elements specified in Levels 2 and 3 except <ab> as defined for Level 2, plus elements in the following table. Note that some of these elements are defined in Level 3 as well, but their use in Level 4 is more strict.

<titlePage> and appropriate child elements [recommended]
<group> [recommended] Use to encode a collection of independent texts that are regarded as a single group for processing or other purposes.
<div> or <div1>, <div2>, <div3>, etc. [recommended] Use for encoding a hierarchy of textual divisions. Use as many levels of hierarchy as needed to represent the source document.
<floatingText> [recommended] Use when a floating text is identified.
<list> and <item> [recommended] Use to indicate ordered and unordered list structures.
<table>, <row>, and <cell> [recommended] Use to indicate table structures.
<hi> [recommended] Use to indicate change in rendition when a more specific element is not being used; rend attribute is optional.
<opener>, <dateline>, <salute>, <closer>, <signed>, <postscript> [recommended] Use to indicate specific parts of letters.
<castList>, <castItem>, <sp>, <speaker>, and <stage> [recommended] Use to encode different structures in performance texts (i.e., drama).
<sp> and <speaker> [recommended] Use to encode oral history interviews.
<epigraph> [recommended] Use for encoding epigraphs found as front matter.
<quote rend="___"> [recommended] Use for encoding blockquotes that appear outside the flow of a paragraph. Use style, rendition, or rend to describe the appearance (such as style="padding-left: 0.5in;").
<argument> [recommended] Use to encode a list of topics sometimes found at the start of a chapter or other textual division.
<trailer> [recommended] Use to encode a closing title or footer at the end of a division.
<quote>, <said>, <mentioned>, and <soCalled> [optional]
<emph>, <distinct>, <foreign>, <gloss>, and <term> [optional]
<title type="_"> [optional] Use of this element within the <text> (not the <teiHeader>) is optional, especially when text is typographically distinct. Optionally use the type attribute with a value as given in P5 except for main titles. (The main value should be used, when appropriate, for <title>s within a TEI header, but is not needed for <title>s elsewhere in a document.)
<ptr> or <ref> [optional] In addition to using to point to notes (as in Level 3), use for identifying cross-references within the text.
<sic>, <corr>, or <choice> [optional] Use to encode errors or typos.
<add>, <del>, <gap>, and <unclear> [optional] Use to encode material that is added, marked for deletion, or is illegible, invisible, or inaudible.
<persName>, <placeName>, <geogName>, <orgName>, and <name type="___"> [optional] Use to encode personal, place, organizational, and other names used in a text. The <persName> element is also used to indicate a personal name inside a <person> element (see below).
<listName>, <listPlace>, and <listOrg> [optional] Use in support of personal, place, and organizational names normalization and to capture additional information about the names. Should be captured in an external TEI file or database for easier maintenance of names.
<listPerson> [optional] Use in support of transcriptions of oral interviews.
<person> [optional] Use in support of transcriptions of oral interviews. The use of the role attribute is strongly recommended, particularly to differentiate interviewer(s) from interviewee(s).
<birth> [optional] Use inside a <person> to indicate the date or place of birth of the person being documented.
<bibl> [optional] Contains a single bibliographic entry. Use of at least one <author> or <editor> child element is recommended. Other valid child elements given in P5 may also be used to encode a bibliographic citation, such as <title>, <publisher>, and <biblScope>. It is recommended that the <date> always appear with a when attribute in order to provide a canonical form of the date.
<ab type="typography"> [recommended] Use to mark typographical elements that indicate a structural break or boundary.
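As a hedged sketch of the Level 4 <bibl> recommendation in the list above, the following fragment encodes one invented citation with an <author>, further child elements from P5, and a <date> carrying a when attribute. The author, title, place, publisher, and page range are hypothetical.

```xml
<listBibl>
  <bibl>
    <!-- At least one <author> or <editor> is recommended at Level 4 -->
    <author>Doe, Jane</author>.
    <title>A Hypothetical Title</title>.
    <pubPlace>Anytown</pubPlace>:
    <publisher>Example Press</publisher>,
    <!-- when gives a canonical form of the date as printed -->
    <date when="1901">1901</date>.
    <biblScope unit="page">12-15</biblScope>
  </bibl>
</listBibl>
```

At Level 3 the same entry would be a <bibl> with no structural child elements.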
4.2.5.6. General Level 4 Recommendations and Examples

As shown above, there are many optional elements at Level 4. While content for many of these elements can be identified within running prose based on changes in typography or use of quotation marks in the source document, they are not always so easily identified, or they may occur so often that identification of each instance is impractical. Use only those optional elements that are appropriate for your users' needs and your encoding budget.

Below are notes on usage of specific elements:
  • The use of <group> is recommended when you need to encode a body of distinct texts that are grouped together and are regarded as a unit. Most typical examples of such composite texts would be anthologies, collected works of an author, etc. Section 4.3.1 Grouped Texts states, ‘The presence of common front matter referring to the whole collection, possibly in addition to front matter relating to each individual text, is a good indication that a given text might usefully be encoded in this way.’
  • Use <argument> to encode a prefatory list or prose description of the topics usually found at the beginning of a chapter. The content within the <argument> element can be presented as a list or as a paragraph:
    <div type="chapter" n="1">  <pb xml:id="albert14" n="14"/>  <head>CHAPTER I.</head>  <head>CHARLOTTE BROOKS.</head>  <argument>   <p>Causes of immorality among colored people - Charlotte Brooks - She is sold South -      Sunday work.</p>  </argument>  <p> ... </p> </div>
    Note: Albert, Octavia V. Rogers. 1890. The House of Bondage, or, Charlotte Brooks and Other Slaves, Original and Life Like, As They Appeared in Their Old Plantation and City Slave Life; Together with Pen-Pictures of the Peculiar Institution, with Sights and Insights into Their New Relations as Freedmen, Freemen, and Citizens. New York: Hunt & Eaton.
  • The <trailer> element is recommended to encode heading- or title-like content at the end of a textual division:
    <body>  <head>[book title]</head>  <div type="chapter" n="1">   <head>[chapter title]</head>   <p>[text]</p>   <trailer>Here ends the Chapter 1.</trailer>  </div>  <div type="chapter" n="2">   <head>[chapter title]</head>   <p>[text]</p>   <trailer>Here ends the Chapter 2.</trailer>  </div>  <trailer>FINIS.</trailer> </body>
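  • Typographically distinct text may be encoded using the optional phrase-level elements recommended above, such as <quote>, <said>, <mentioned>, <soCalled>, <emph>, <distinct>, <foreign>, <gloss>, <term>, and <title>.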
  • Any ambiguous typographically distinct text should be encoded as <hi> (e.g. <hi rend="font-weight: bold">). This element may also be used if the more specific elements above are not used.
  • Any of the following three methods may be used to encode errors or typos in original texts:
    • <sic> may be used alone to indicate errors without correcting them
    • <corr> may be used alone to provide corrections without indicating the initial error
    • <choice> allows both the apparent error and its editorial correction to be recorded, as in the following examples:
      <p>He has no Scruple about Fish; but won't touch a bit of Pork, it being <choice>   <sic>expresly</sic>   <corr>expressly</corr>  </choice> forbidden by their Law.</p>
      Note: Bluett, Thomas. 1734. Some Memoirs of the Life of Job, the Son of Solomon, the High Priest of Boonda in Africa; Who was a Slave About Two Years in Maryland; and Afterwards Being Brought to England, was Set Free, and Sent to His Native Land in the Year 1734. London: n.p.
      or
      <p>4. The art of writing she obtained by her own industry and curiosity, and in so short a time that in the year 1765, when she was not more than twelve years of <choice>   <sic>age,she</sic>   <corr>age, she</corr>  </choice> was capable of writing letters to her friends <pb xml:id="p11" n="11"/> on various subjects. She also wrote to several persons in high stations.</p>
      Note: Mott, Abigail. 1826. Biographical Sketches and Interesting Anecdotes of Persons of Colour. To Which is Added, a Selection of Pieces in Poetry. New-York: M. Day.
  • The elements <add>, <del>, <unclear>, <gap> may be used to indicate instances when a text (i.e., a word or part of it, or a phrase or part of it) has been added or marked for deletion, or to indicate cases where transcription is difficult (<unclear>) or impossible (<gap>) because the material is illegible, invisible, or inaudible (such as while transcribing oral history interviews):
    <p>But it is well authenticated by the observation of every one, that <del rend="text-decoration: line-through"   hand="#JHL">their manner</del>  <add rend="vertical-align: super"   hand="#JHL">this way—i.e.    the above</add> of writing influences the style of compos. of those who practise it considerably, when they grow up to years of manhood; for their productions, <del hand="#JHL"   rend="text-decoration: line-through">instead</del> far from being terse, argumentative, convincing, are without head or tail &amp; are generally an incongruous mass mixed up in the most disgusting manner, without divisions or heads &amp; in short without a subject (so to speak).</p>
    Note: Lacy, J.H.. [1851]. Prejudice Against Composition Writing
    <p>But I still hope for &amp; trust in God and I believe he will animate our brave defenders with a superhuman power and we will yet drive from our soil the hated invaders whose tread <gap reason="ink blot"/> profanation, but this is an hour to try men's souls—Fort Donelson has been taken by the enemy. Frank was there and covered himself with honor but his bravery cost him a wound; he was wounded in the leg slightly—a flesh wound only, you must not be uneasy.</p>
    Note: Kimberly Family Personal Correspondence, 1862-1864. Transcript of the manuscript, UNC-Chapel Hill, Southern Historical Collection.
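The examples above show <add>, <del>, and <gap>. As a minimal sketch of the remaining element, the fragment below uses <unclear> for a reading that is difficult but possible and contrasts it with a <gap> whose extent is recorded. The sentence is invented, and the reason values and the use of quantity and unit are illustrative assumptions rather than values prescribed by these Best Practices.

```xml
<p>We moved to
  <!-- The transcriber can read the word but is not certain of it -->
  <unclear reason="faded ink">Raleigh</unclear>
  in the spring, and father found work at the
  <!-- Nothing can be transcribed here, so only the size of the loss is recorded -->
  <gap reason="illegible" quantity="2" unit="word"/>
  mill.</p>
```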
4.2.5.6.1. Level 4 Front and Back Matter

Encode each section of front and back matter as its own textual division. Beyond what is described in the P5 Guidelines, note the following:

Title pages (recto and verso)
The use of the <titlePage> element with appropriate child elements describing the major features of most title pages is recommended. The child elements are listed in Section 4.6, "Title Pages", of P5. <titlePage> should include the verso if present, divided by <pb n="verso"/>.
Tables of contents, errata, subscription lists, lists of other titles by the same author, and other such lists
These should be encoded using a <list> with <item>s. For an index, use <ref target="____"> to mark up page numbers given in the index, with the value of target referring to the xml:id attribute of the <pb> of the referenced page.
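The two recommendations above can be sketched together as follows: a <titlePage> whose verso is separated by <pb n="verso"/>, and an index whose page numbers are <ref> elements pointing at the xml:id of the relevant <pb>. The title, author, imprint, index entry, and the identifiers p012 and p047 are all invented for illustration.

```xml
<text>
  <front>
    <titlePage>
      <docTitle>
        <titlePart type="main">A Hypothetical Title</titlePart>
      </docTitle>
      <byline>by <docAuthor>Jane Doe</docAuthor></byline>
      <docImprint><pubPlace>Anytown</pubPlace>: <publisher>Example Press</publisher>,
        <docDate when="1901">1901</docDate></docImprint>
      <!-- The verso stays inside <titlePage>, divided from the recto by <pb> -->
      <pb n="verso"/>
      <docImprint>Copyright, 1901, by Example Press</docImprint>
    </titlePage>
  </front>
  <body>
    <div type="chapter">
      <pb xml:id="p012" n="12"/>
      <p>[text]</p>
      <pb xml:id="p047" n="47"/>
      <p>[text]</p>
    </div>
  </body>
  <back>
    <div type="index">
      <head>INDEX</head>
      <list>
        <!-- Each target value refers to the xml:id of the <pb> for that page -->
        <item>Apples, <ref target="#p012">12</ref>, <ref target="#p047">47</ref></item>
      </list>
    </div>
  </back>
</text>
```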
4.2.5.6.2. Level 4 Name Tagging

Names should be encoded using <persName>, <placeName>, <geogName>, and <orgName> elements. For names of entities other than persons, places, and organizations, use <name> with a type attribute containing an appropriate value, such as event for the name of an event.

For all of these elements, use the ref attribute (see section 3.9.3 above) to provide a reference to a <person>, <place>, <org>, <event>, or other element in an external file or database for managing name normalization and compilation of additional information such as biographical or geospatial information. An external TEI file may contain an entry for each name, grouped accordingly under <listPerson>, <listPlace>, <listOrg>, or <listEvent>, with each name uniquely identified with an xml:id attribute. In this case the value of the ref attribute in the main TEI document (the transcription of the source document) references the value of the xml:id attribute in the external file. (In the examples below, the external file is named context.xml for ‘contextual information’ and is in the same directory as the source file, but it may be named anything and placed anywhere that can be referenced by a URI.)

When referencing external files or databases by tag URI, it is strongly recommended to provide an explanation in a <p> element in the <editorialDecl> section of the TEI header. When referencing a controlled vocabulary by relative URI, be sure to specify the controlled vocabulary in the <classDecl> section of the TEI header.

  • Place-name tagging example in main TEI document (the transcription of the source document):
    <p>The first Jews arrived in <placeName ref="http://vocab.getty.edu/page/tgn/7012924">Indianapolis</placeName> in the middle of the 19th century. Primarily immigrants from <placeName ref="context.xml#tgn_7000084"> Germany</placeName> and other points in central Europe (though many had lived elsewhere in the <placeName ref="http://vocab.getty.edu/page/tgn/7012149">United States</placeName> before they arrived in the city), they were drawn from throughout the Midwest by the growth of commerce and rail lines in <placeName ref="http://vocab.getty.edu/page/tgn/7012924">Indianapolis</placeName>. </p>
  • Personal and organizational name tagging example in main TEI document (the transcription of the source document):
    <p>PRIZE LIBRARY GIFT-Indiana University President <persName ref="http://id.loc.gov/authorities/names/n82134365.html">Elvis J.    Stahr</persName> (right), a former law dean and practicing attorney, reminisces with Professor of Law <persName ref="http://id.loc.gov/authorities/names/n00113347">W. Howard    Mann</persName> as the two inspect some of the nearly 3,000 volumes of <orgName ref="http://id.loc.gov/authorities/names/n79006848">U.S. Supreme    Court</orgName> records recently transferred to I.U. from the <orgName ref="http://id.loc.gov/authorities/names/n79109178">Indiana Supreme Court    Library</orgName>. The collection, dating back to 1925, is one of the oldest and most complete sets in existence.</p>
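As a hedged sketch of the external file described in this section, the following shows what a context.xml might contain for the reference context.xml#tgn_7000084 used in the place-name example. Only the file name and the xml:id come from that example. The header content is placeholder text, and the <idno> is an assumption inferred from the form of the identifier.

```xml
<!-- context.xml (hypothetical): contextual information kept outside the transcription -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Contextual information for name normalization</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished working file.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Compiled by project staff.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <listPlace>
        <!-- xml:id is the value that ref="context.xml#tgn_7000084" points to -->
        <place xml:id="tgn_7000084">
          <placeName>Germany</placeName>
          <!-- Assumed: the identifier mirrors a Getty TGN record number -->
          <idno type="TGN">7000084</idno>
        </place>
      </listPlace>
    </body>
  </text>
</TEI>
```

Keeping one <place> per name in a single file means a correction to the normalized form is made once rather than in every transcription that refers to it.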
4.2.5.6.3. Level 4 Embedded Texts

If the embedded text is more than a short quotation, use <floatingText> even if the instance is still only an excerpt of the embedded text.

Personal letters are a common example of an embedded text. While a collection of letters would use a textual division for each letter, if a letter is quoted as part of a larger text, use <floatingText><body><div1 type="letter"> (or <floatingText><body><div type="letter"> if using unnumbered textual divisions) with <opener>, <dateline>, <salute>, <signed>, <closer>, <postscript> included as appropriate. For example:
<p>She opened and read as follows:</p>
<floatingText>
  <body>
    <div1 type="letter">
      <opener>
        <dateline>AUGUSTA, March 4th, 18—</dateline>
        <salute>
          <hi rend="font-style: italic">Mrs. A. Mitten:</hi>
        </salute>
      </opener>
      <p>"Having recently understood that you have procured a private teacher, we have
        ventured to stop your advertisement, <hi rend="font-style: italic">though ordered to continue it
        until forbid,</hi> under the impression that you have probably forgotten to have it
        stopped. If, however, we have been misinformed, we will promptly resume the
        publication of it. You will find our account below; which as we are much in want of
        funds, you will oblige us by settling as soon as convenient. Hoping your teacher is
        all that you could desire in one,</p>
      <closer>
        <salute>"We remain, your ob't. serv'ts,</salute>
        <signed>"H—&amp; B—&#x201D;</signed>
      </closer>
    </div1>
  </body>
</floatingText>
Note: Longstreet, Augustus B. 1864. Master William Mitten: or, A Youth of Brilliant Talents, Who Was Ruined by Bad Luck. Macon, Ga.: Burke, Boykin.
4.2.5.6.4. Level 4 Drama

Within the front matter (<front>) of a performance text, cast lists should be encoded as <castList>s, with each item in that list encoded as a <castItem>. If desired, each <castItem> may be uniquely identified with an xml:id attribute.

For example,
<front>
  <castList>
    <head>Dramatis Personae</head>
    <castItem xml:id="kllear">LEAR king of Britain</castItem>
    <castItem xml:id="klfrance">KING OF FRANCE</castItem>
    <castItem xml:id="klburgundy">DUKE OF BURGUNDY</castItem>
    <castItem xml:id="klcornwall">DUKE OF CORNWALL</castItem>
    <castItem xml:id="klalbany">DUKE OF ALBANY</castItem>
    <castItem xml:id="klkent">EARL OF KENT</castItem>
    <castItem xml:id="klgloucester">EARL OF GLOUCESTER</castItem>
    <castItem xml:id="kledgar">EDGAR son to Gloucester.</castItem>
    <castItem xml:id="kledmund">EDMUND bastard son to Gloucester.</castItem>
    <!-- ... -->
  </castList>
</front>
Note: Shakespeare. King Lear.

Within the body of performative texts:

  • Speeches are encoded as <sp>, and speakers are identified by the <speaker> element, which is a child of <sp>.
  • Stage directions are encoded as <stage> and enclose content describing scenery, stage directions, etc.
  • When encoding the speech content itself, use elements and attributes that correspond to the type of dramatic speech presented (e.g., <p> for prose speech, with <lb> marking a new line in a particular edition of the text, or <lg> and <l> for dramatic verse structures).
  • If it is desired to tie the speaker(s) of a speech to a particular persona in the cast list, the who attribute of <sp> may be used to refer to the <castItem> of the speaker. When who is used, <speaker> is optional.
    <div type="act" n="1">
      <head>Act 1</head>
      <div type="scene" n="1">
        <head>Scene 1</head>
        <stage>King Lear's palace.</stage>
        <stage>Enter KENT, GLOUCESTER, and EDMUND</stage>
        <sp n="1" who="#klkent">
          <speaker>KENT</speaker>
          <p>I thought the king had more affected the Duke of<lb/>
            Albany than Cornwall.</p>
        </sp>
        <sp n="2" who="#klgloucester">
          <speaker>GLOUCESTER</speaker>
          <p>It did always seem so to us: but now, in the<lb/>
            division of the kingdom, it appears not which of<lb/>
            the dukes he values most; for equalities are so<lb/>
            weighed, that curiosity in neither can make choice<lb/>
            of either's moiety.</p>
        </sp>
        <sp n="3" who="#klkent">
          <speaker>KENT</speaker>
          <p>Is not this your son, my lord?</p>
        </sp>
        <!-- ... -->
      </div>
    </div>
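The example above shows only prose speeches. For verse, the same <sp> structure might hold <l> elements (grouped in <lg> where the source has stanzas) in place of <p>. The following is a minimal sketch, assuming a verse speech by Lear from the same scene and the kllear identifier from the cast list above; the line division is illustrative and not taken from a particular edition.

```xml
<sp who="#kllear">
  <speaker>KING LEAR</speaker>
  <!-- Each verse line is an <l>; the element itself marks the line, so no <lb/> is used -->
  <l>Meantime we shall express our darker purpose.</l>
  <l>Give me the map there. Know that we have divided</l>
  <l>In three our kingdom: and 'tis our fast intent</l>
  <!-- ... -->
</sp>
```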
4.2.5.6.5. Level 4 Transcription of Oral History

The list of participants in the transcription of an oral history may be encoded in the body of the TEI document using the structured <listPerson> element containing <person> elements for each participant. Alternatively, the list of participants may be encoded as a free-form <list> in the body. The latter approach is especially useful when the list is included in the source document in an unstructured form.
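As a rough illustration of the structured approach, a <listPerson> for a two-person interview might look like the sketch below. The participant names are borrowed from the interview excerpt later in this section; the role values and the placement of xml:id on <persName> are assumptions made for the example, not requirements stated by these Best Practices.

```xml
<listPerson>
  <head>Interview Participants</head>
  <!-- role values here are illustrative labels, not a controlled vocabulary -->
  <person role="interviewee">
    <persName xml:id="spk1">William C. Friday</persName>
  </person>
  <person role="interviewer">
    <persName xml:id="spk2">William Link</persName>
  </person>
</listPerson>
```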

If an oral history document has no title of its own, the speakers in oral history interviews, i.e., interviewee(s) and interviewer(s), may also be identified in the <teiHeader> as a list of <author> elements (typically each with its own <persName>) within <fileDesc> / <titleStmt>.

Regardless, use an xml:id on the <persName> element to uniquely identify the individual participant.
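For the header-based approach, a sketch of the relevant part of <fileDesc> might look as follows. The title is hypothetical (supplied for an interview that has none of its own), and the spk1 and spk2 identifiers are illustrative; the rest of the header is omitted.

```xml
<fileDesc>
  <titleStmt>
    <!-- Hypothetical supplied title -->
    <title>Interview with William C. Friday</title>
    <author>
      <persName xml:id="spk1">William C. Friday</persName>
    </author>
    <author>
      <persName xml:id="spk2">William Link</persName>
    </author>
  </titleStmt>
  <!-- publicationStmt and sourceDesc, required in a complete header, are omitted here -->
</fileDesc>
```

A who attribute on <sp> can then point to these identifiers (for example, who="#spk2") in the same way as it would point to entries in a participant list in the body.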

Questions and answers from interviewees and interviewers are encoded as <sp>, with each speaker identified either:

  • within <speaker> elements, which are the first child of <sp>, or
  • with a who attribute on <sp>, the value of which points to the <item> for the given speaker in the list of interview participants (by its xml:id), or
  • both.
<list type="simple">
  <head>Interview Participants</head>
  <item>
    <persName xml:id="spk1"
      ref="tag:docsouth.unc.edu,2016:wf" type="interviewee">WILLIAM C. FRIDAY</persName>, interviewee
  </item>
  <item>
    <persName xml:id="spk2"
      ref="tag:docsouth.unc.edu,2016:wl" type="interviewer">WILLIAM LINK</persName>, interviewer
  </item>
</list>
<!-- ... -->
<sp who="#spk2">
  <speaker n="2">WILLIAM LINK:</speaker>
  <p>Last time we were talking about Frank Porter Graham. And I have a couple of questions
    about Graham, and I wonder if you could clear them up for me. You have mentioned that you
    had worked with him as a student at North Carolina State, had you met him before?</p>
</sp>
<sp who="#spk1">
  <speaker n="1">WILLIAM C. FRIDAY:</speaker>
  <p>No. That budget hearing was the first that I knew of him, of course, but the first time
    that I ever encountered him. I was president of class at N.C. State, and that through me into
    this kind of public adventure. And so I went merrily on downtown and sat there in the budget
    hearing, along with the president of the student body, and some others.</p>
</sp>
One way to synchronize audio and transcript has been introduced in Oral Histories of the American South, using <milestone> with a timestamp attribute:
<milestone n="7248" unit="empty" type="stop" timestamp="00:08:54"/>
4.2.5.6.6. Level 4 Verse

Use <lg> and <l> as in Level 3. In addition, use the style, rendition, or rend attribute to indicate lines that are indented.

For example,
<div type="fit" n="1">
  <head>Fit the First: THE LANDING</head>
  <lg type="stanza" n="1">
    <l n="1.1">"Just the place for a Snark!" the Bellman cried,</l>
    <l n="1.2" rend="margin-left: 0.5in">As he landed his crew with care;</l>
    <l n="1.3">Supporting each man on the top of the tide</l>
    <l n="1.4" rend="margin-left: 0.5in">By a finger entwined in his hair.</l>
  </lg>
  <lg type="stanza" n="2">
    <l n="2.1">"Just the place for a Snark! I have said it twice:</l>
    <l n="2.2" rend="margin-left: 0.5in">That alone should encourage the crew.</l>
    <l n="2.3">Just the place for a Snark! I have said it thrice:</l>
    <l n="2.4" rend="margin-left: 0.5in">What I tell you three times is true."</l>
  </lg>
  <!-- ... -->
</div>
Note: Carroll, Lewis. 1876. The Hunting of the Snark. London: Macmillan & Co.
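The example above uses rend. As a sketch of the rendition alternative, the indentation could instead be declared once in the header and referenced from each indented line. The identifier indent is hypothetical, and this assumes that <tagsDecl> is available in the schema being used.

```xml
<!-- In the <teiHeader>, inside <encodingDesc>: -->
<tagsDecl>
  <rendition xml:id="indent" scheme="css">margin-left: 0.5in</rendition>
</tagsDecl>

<!-- In the <text>: -->
<lg type="stanza" n="1">
  <l n="1.1">"Just the place for a Snark!" the Bellman cried,</l>
  <l n="1.2" rendition="#indent">As he landed his crew with care;</l>
  <l n="1.3">Supporting each man on the top of the tide</l>
  <l n="1.4" rendition="#indent">By a finger entwined in his hair.</l>
</lg>
```

Declaring the rule once keeps the indentation consistent across a long poem. With the style attribute, the same CSS would instead be written inline on each line, as in <l style="margin-left: 0.5in">.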
4.2.5.6.7. Level 4 Typographic Separators
To mark typographic elements that indicate a structural break or boundary, use <ab type="typography">. The content of this element is the character(s) or device used to mark the division in the source document. The style attribute may be used to indicate how the separator is rendered. As a content-bearing element, <ab> is recommended over the empty elements <milestone> or <space>, as it can contain further markup and glyph information. For example:
<ab type="typography" style="text-align: center">*****</ab>

Like <pb> elements, these should always be placed within the lowest-level text division.

4.2.5.6.8. Level 4 Alger Hiss document
<TEI xml:id="project_document_identifier" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="en"> <!-- header goes here -->
  </teiHeader>
  <text xml:lang="en">
    <body>
      <div1>
        <pb n="113" facs="./pageImages/AH4_0113.jpg"/>
        <head>POINT VIII.</head>
        <head>BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S
        <lb/>CONVICTION SHOULD BE VACATED; ALTERNATIVELY,
        <lb/>DISCOVERY AND A HEARING SHOULD BE ORDERED.</head>
        <p>The nature and extent of surveillance of Hiss, his
        <lb/>family and associates was not known at the time of trial by
        <lb/>the defense. Even now, with the release of some of the govern-
        <lb break="no"/>ment documents concerning FBI investigative techniques regarding
        <lb/>Hiss, the full extent of surveillance -- wiretapping, mail open-
        <lb break="no"/>ings, mail covers, physical surveillance, and other intrusive
        <lb/>techniques -- is still not 'clear. Nevertheless, it is apparent
        <lb/>that information gathered through the exploitation of unlawful
        <lb/>wiretaps and other illegal surveillance was used at trial and
        <lb/>consequently the conviction must be reversed. Alternatively,
        <lb/>further discovery and a hearing is essential to a fair deter-
        <lb break="no"/>mination regarding these issues.</p>
        <p>FBI surveillance of Hiss began in earnest in 1941 with
        <lb/>the institution of a mail cover on his incoming correspondence
        <lb/>at his home in connection with an FBI investigation of possible
        <lb/>Hatch Act violations. CN Ex. 98A. Another mail cover was placed
        <pb n="114" facs="./pageImages/AH_0114.jpg"/>
        on the Hiss mail in 1945, and at the same time the FBI obtained
        <lb/>toll call records from the Hiss residence Telephone for the
        <lb/>years 1943 and 1944 as well. CN Ex. 99. In September, 1945,
        <lb/>the FBI intercepted telegrams to Hiss as well. CN Ex. 100.</p>
        <p>In late November, 1945, FBI surveillance of the Hiss
        <lb/>residence in Washington, D.C., escalated. For the third time,
        <lb/>a mail cover was instituted beginning on November 28, 1945,
        <lb/>which was continued at least until 1946. CN Ex. 101 at p. 70;
        <lb/>CN Ex. 102. Continuous physical surveillance of Hiss was begun
        <lb/>as well. CN Ex. 101 at p. 72. Although this twenty-four-hour
        <lb/>surveillance was discontinued on December 14, 1945, physical
        <lb/>surveillance was conducted frequently at various times until
        <lb/>September, 1947.<ptr target="#N68">68</ptr>
          <note place="bottom" anchored="true" xml:id="N68">
            <label>68</label>Also before 1947, a letter from Priscilla Hiss addressed
          <lb/>to her son, Timothy Hobson, was intercepted and its contents
          <lb/>read. CN Ex. 100A at p. 167. In approximately March, 1947,
          <lb/>a letter from a Michael Greenberg addressed to petitioner re-
          <lb break="no"/>garding an application for employment with the United Nations
          <lb/>was also intercepted, in a manner not revealed by the docu-
          <lb break="no"/>ments. CN Ex. 100B</note> CN Ex. 102; CN Ex. 103.</p>
        <p>The most intrusive invasion of petitioner's rights
        <pb n="115" facs="./pageImages/AH_0115.jpg"/>
        <lb/>occurred from December 13, 1945 until the Hisses moved from
        <lb/>Washington, D.C. to New York City on September 13, 1947. A
        <soCalled>technical surveillance</soCalled>, -- a wiretap -- was placed on the Hiss
        <lb/>telephone at their residence on P Street-in Washington, D.C.
        <lb/>The logs of this surveillance constitute twenty-nine volumes
        <lb/>of FBI serials and are roughly 2,500 pages in length, in which
        <lb/>an enormous amount of information concerning the Hisses' per-
        <lb break="no"/>sonal lives, relationships with friends and associates, and
        <lb/>habits is recorded.</p>
        <p>The wiretap was installed following FBI Director Hoover's
        <lb/>application to the Attorney General for authorization,
        <ptr target="#N69">69</ptr>
          <note place="bottom" anchored="true" xml:id="N69">
            <label>69</label>Hoover's initial request was answered by a note reques-
          <lb break="no"/>ting information on Hiss. CN Ex. 104<sic/>. Additional information
          <lb/>was furnished by letter dated November 30, 1945. CN Ex. 105<sic/>.</note>
        <lb/>although no written authorization appears in the documents released to
        <lb/>Hiss. The purpose of the application was to gather information
        <lb/>regarding Hiss' alleged contacts with Soviet espionage agents and
        <lb/>communists in government service, general allegations which had
        <lb/>been made by Elizabeth Bentley and Chambers.</p>
        <p>As one would expect, the interception of every telephone</p>
      </div1>
    </body>
  </text>
</TEI>

4.2.6. Level 5: Scholarly Encoding Projects

Level 5 texts are those that require substantial human intervention by encoders with subject knowledge. These texts might include encodings of semantic, linguistic, prosodic, or other features well beyond the basic structural elements discussed for Levels 1-4 above. They might also include elements for editorial, critical, or analytical additions; manuscript descriptions; translations; or other textual apparatus.

4.2.6.2. Purpose

To create deeply analytical encoded texts that might be appropriate for specific research purposes, as part of a scholarly publishing project, or for other encoding practices in library-based text encoding.

4.2.6.3. Rationale

A significant number of library-based projects engage in high-level analytical text encoding as part of their efforts in digitization, scholarly editing, academic support, or other research. Level 5 is intended to represent that work, which can take advantage of the full richness of the complete TEI Guidelines (P5), while still acknowledging the impact of library-specific practices on encoded text that is created under the auspices of a library.

4.2.6.4. Element Recommendations and Examples

Because of the vast range of possibilities for Level 5 encoding, these Best Practices have chosen to provide neither a list of recommended elements nor any specific examples for encoding a transcription of a source document at this level. Please refer to the TEI Header section above for recommendations for the <teiHeader>.

Those encoding Western European early modern printed material may find that TEI simplePrint, a TEI customization, provides appropriate guidance for encoding a transcription of the source document. Otherwise, refer to the General Recommendations section above and the Complete TEI P5 Guidelines for element recommendations and usage examples within the <text>.

5. Acknowledgments

This document is the work of a group of individuals with a range of experience in TEI text encoding, who came together under the umbrellas of the TEI Special Interest Group on Libraries and the Digital Library Federation. We would like to thank and acknowledge all of those who have given their time and expertise to develop these Best Practices.

The individuals who served as editors of this document are:

The individuals who have contributed to the writing of this document are:

The individuals who have contributed to the planning of this document are:

The individuals who have contributed complementary tools to this document are:

The individuals who have contributed to copyediting of this document are:

Lastly, we would like to thank the Digital Library Federation (DLF) for sponsoring two in-person meetings as part of the Spring 2008 Forum in Minneapolis, Minnesota, and the Spring 2009 Forum in Raleigh, North Carolina, in support of our revision work. The DLF also provided teleconferencing support for our regularly scheduled meetings.

6. Appendix: History of This Document

This document was formerly known as TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices.

The Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (referred to as the TEI Guidelines) were first published in 1994 and represent a tremendous achievement in electronic text standards by providing a highly sophisticated structure for encoding electronic text. Digital librarians have benefited greatly from the standardization provided by these guidelines, and the potential for interoperability and long-term preservation of digital collections facilitated by their wide adoption.

In 1998, the Digital Library Federation (DLF) sponsored the TEI and XML in Digital Libraries Workshop at the Library of Congress to discuss the use of the TEI Guidelines in libraries for electronic text, and to create a set of best practices for librarians implementing them. From this workshop, three working groups were formed, the members of which represented some of the largest and most mature digital library programs in the U.S.

Group 1 was charged to recommend some best practices for TEI header content and to review the relationship between the Text Encoding Initiative header and MARC. To this end, representatives of the University of Virginia Library and the University of Michigan Library gathered in Ann Arbor in early October 1998 to develop a recommended practice guide. This work was assisted by similar efforts that had taken place in the United Kingdom under the auspices of the Oxford Text Archive the previous year. The section on the header is based on a draft of those recommended practices. It was submitted to various constituencies for comment. In 2008 and 2009, it was heavily revised by Melanie Schlosser, Kevin Hawkins, and other members of the TEI SIG on Libraries.

Group 2 was charged with developing a set of recommendations for libraries using the TEI Guidelines in electronic text encoding. This group included the following representatives from six libraries:

At the ALA Midwinter Meeting (January 1999), the DLF task force revised a draft set of best practices, called TEI Text Encoding in Libraries: Guidelines for Best Practices (often referred to as TEI in Libraries Guidelines). The revised recommendations were circulated to the conference working group in May 1999 and presented at the joint annual meeting of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing in June 1999. Version 1.0 was circulated for comments in August 1999. These guidelines were endorsed by the DLF and have been used by many digital libraries, including those of the task force members, as a model for their own local best practices. Libraries, museums, and end users have benefited from a set of best practices for electronic text in a number of ways, including better interoperability between electronic text collections, better documented practices among digital libraries, and a starting point for discussion of best practices with commercial publishers regarding electronic text creation.

Written in 1998, this first iteration of TEI in Libraries Guidelines made no mention of XML, XSLT, or any of the other powerful tools that have now become common parlance and practice in creating digital documents and collections. Based on these important changes in markup technology, it came to the attention of the DLF and members of the original Task Force that the TEI in Libraries Guidelines required substantial revision. In 2002, the TEI Consortium published a new edition of the complete TEI Guidelines that conformed to XML specifications. In order to remain useful, the TEI in Libraries Guidelines had to be updated to reflect these developments.

Furthermore, librarians need more guidance than the original TEI in Libraries Guidelines provided. There are many library-specific encoding issues which need to be addressed and documented to ensure consistency. The intention of this document is to provide recommended paths of encoding for these issues.

In addition, these library guidelines have the potential to be much more useful if they can serve as a training document from which librarians can learn about text encoding and addressing particular encoding challenges. To fulfill this role, the guidelines require more examples and detailed explanations, giving documentation of the use of TEI in a library context. Librarians also need a set of standards and best practices for vendors and publishers who create electronic text for digital libraries, so that these collections adhere to the same archival standards as locally-created electronic text collections. With detailed guidelines that could serve as an encoding specification, librarians might encourage vendors to follow the principles in these standards, to facilitate the long-term preservation of commercially published electronic text collections, and more readily allow for cross-collection searching.

In order to facilitate the evolution of this document, another DLF-sponsored Task Force—some of the representatives of which were on the original Task Force—met on October 24-25, 2003 at the Cosmos Club in Washington, D.C.:

These representatives met to revise the original TEI in Libraries Guidelines in order that they:

After producing Version 2.0 of the Guidelines, this group (with some changes in membership) met again at the Cosmos Club on February 13-14, 2006. Those in attendance were:

The group then released Version 2.1 in March 2006.

In April 2008, select members from the TEI Consortium Libraries Special Interest Group (SIG) and the DLF-sponsored TEI Task Force partnered to update the Best Practices. The revision was prompted by the release of P5, the newest version of the TEI, and the desire to create a true library-centric customization of the TEI. The group convened for a DLF-sponsored meeting at the Spring Forum in Minneapolis, Minnesota to tackle the revision work. Those in attendance were

Work continued through conference calls, in which Renee McBride (University of North Carolina, Chapel Hill) and Richard Wisneski (Case Western Reserve University) also participated, and at a DLF-sponsored meeting that took place as part of the DLF Spring Forum in Raleigh, North Carolina on May 6, 2009.

In April 2009, a year after the revision work began, the significantly revamped Best Practices, soon to be known as Best Practices for TEI in Libraries (version 3.0), were disseminated for public comment. At the DLF Spring Forum in Raleigh, a Birds-of-a-Feather session entitled TEI Text Encoding in Libraries was held to gather in-person public feedback. Comments received at the in-person meeting, from the TEILIB-L listserv, through a survey, and by direct email were gathered and prioritized at the DLF meeting. Renee McBride (University of North Carolina, Chapel Hill) agreed to map header elements to MARC elements, and Vitus Tang (Stanford University) provided valuable comments. In addition to addressing most of the comments received, it was resolved that Syd Bauman would generate an ODD document (containing both schema constraints and prose) for levels 1-4, further ensuring interoperability of texts encoded according to these Best Practices.

Version 3.0 contained updated versions of the widely adopted encoding ‘levels’, ranging from fully automated conversion to content analysis and scholarly encoding. It also contained a substantially revised section on the TEI header, designed to support interoperability between text collections and the use of complementary metadata schemas such as MARC. Furthermore, it explored the relationship between METS and TEI and the relationship between these Best Practices and the new vendor specification, TEI Tite.

The new Best Practices also reflected an organizational shift. Originally authored by the DLF-sponsored TEI Task Force, the new revision was a partnership between members of the Task Force and the TEI SIG on Libraries. As a result of this partnership, responsibility for the Best Practices moved to the SIG, allowing closer work with the TEI Consortium as a whole and a stronger basis for advocating for the needs of libraries in future TEI releases.

At its meeting in November 2015, the SIG decided to undertake a revision to these Best Practices. A workgroup met approximately monthly from early 2016 to 2018 to carry out this work, releasing version 3.1.0a in November 2017 and version 4.0.0 in September 2018.

TEI SIG on Libraries. Date: 2018-09-10