Version 1.2.

Import

LERA natively supports plain text (TXT) and Extensible Markup Language (XML) files for the import of text documents (witnesses) according to a defined scheme described in this article.

# In General

LERA supports Unicode in UTF-8 encoding. Files in other encodings, such as ISO-8859-1 or UTF-16LE, are automatically converted to UTF-8 when they are uploaded. LERA focuses on comparing the content of text versions (witnesses); accordingly, only a few structural and special elements are processed and displayed. The native XML format follows the guidelines of the Text Encoding Initiative (TEI), of which LERA processes and renders a certain subset. These elements and attributes are described below. Additional elements and attributes in the XML files do not interfere with the import, but they are not processed either.

# Meta Data

LERA recognizes some metadata of a text witness on upload if it is encoded at a specific location in the XML file (i.e., within <TEI> … <teiHeader> … <sourceDesc> … <bibl> …). You can specify the following data:

# Structural elements

Line Breaks

Column Breaks


Page Breaks

# Special elements

Page Numbers

Text Alterations

This includes various interventions by the author or by the editors, for example to correct errors or to resolve abbreviations. In general, the original value is used for display, while the modified value is used for text comparison. Three types are currently supported: errata (sic), abbreviations (abbr) and regularization (orig).
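The split between a displayed original reading and a compared modified reading can be sketched in Python with `xml.etree.ElementTree`. This is not LERA's implementation. The sketch assumes that each alteration is encoded as a TEI `<choice>` that pairs `<sic>` with `<corr>`, `<abbr>` with `<expan>`, and `<orig>` with `<reg>`; these counterpart elements come from the TEI guidelines, not from this page. The names `PAIRS`, `pick` and `reading` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed pairing of original and modified readings inside <choice>,
# following the TEI guidelines (sic/corr, abbr/expan, orig/reg).
PAIRS = {"sic": "corr", "abbr": "expan", "orig": "reg"}


def local_name(tag: str) -> str:
    """Strip a namespace such as {http://www.tei-c.org/ns/1.0} from a tag."""
    return tag.rsplit("}", 1)[-1]


def pick(choice: ET.Element, use_original: bool):
    """Return the child of <choice> holding the requested reading, or None."""
    kids = {local_name(c.tag): c for c in choice}
    for orig_tag, mod_tag in PAIRS.items():
        if orig_tag in kids and mod_tag in kids:
            return kids[orig_tag] if use_original else kids[mod_tag]
    return None  # unknown pairing: this sketch drops it


def reading(elem: ET.Element, use_original: bool) -> str:
    """Flatten elem to text, resolving every <choice> to one reading."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) == "choice":
            chosen = pick(child, use_original)
            if chosen is not None:
                parts.append(reading(chosen, use_original))
        else:
            parts.append(reading(child, use_original))
        parts.append(child.tail or "")  # text following the child
    return "".join(parts)


p = ET.fromstring(
    "<p>The <choice><sic>teh</sic><corr>the</corr></choice> "
    "<choice><abbr>Dr.</abbr><expan>Doctor</expan></choice> came.</p>"
)
print(reading(p, use_original=True))   # display:    The teh Dr. came.
print(reading(p, use_original=False))  # comparison: The the Doctor came.
```

Keeping both readings in the document and choosing one at flattening time means the same source serves the display and the comparison without maintaining two copies of the text.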

Placeholder for figures such as sketches or diagrams

Notes and Marginals

Proper Names and Referencing Strings


Editorial Markers

Other special elements can be marked with <metamark> so that they are rendered differently or ignored during text comparison. Examples include lacunae, line fillers, special symbols or text decoration.
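Ignoring marked elements during comparison can be sketched as flattening the XML to text while skipping the content of `<metamark>`. This is a minimal Python sketch, assuming that every `<metamark>` is excluded from comparison; LERA may distinguish between different kinds of markers, which is not modelled here. The function name `comparison_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace such as {http://www.tei-c.org/ns/1.0} from a tag."""
    return tag.rsplit("}", 1)[-1]


def comparison_text(elem: ET.Element) -> str:
    """Flatten elem to text, leaving out the content of every <metamark>."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child.tag) != "metamark":
            parts.append(comparison_text(child))
        # The tail is the text after the child's closing tag; it belongs
        # to the surrounding text, so it is kept even for <metamark>.
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring("<p>end of line<metamark>----</metamark></p>")
print(comparison_text(line))  # end of line
```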


Anchors can be used to control the alignment when comparing text witnesses.
To enable prioritized alignment of segments with matching anchors, select the check box prioritize anchors (<anchor>). This option is available for the algorithm that aligns segments based on their similarity.
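The idea of prioritizing matching anchors can be illustrated by splitting each witness into segments that start at an `<anchor>` and pairing segments whose anchors share a key. This Python sketch is not LERA's algorithm. It assumes, hypothetically, that matching anchors carry the same value in an `n` attribute, and it leaves unmatched segments to the similarity-based alignment, which is not shown. The names `segments` and `anchored_pairs` are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace such as {http://www.tei-c.org/ns/1.0} from a tag."""
    return tag.rsplit("}", 1)[-1]


def segments(root: ET.Element, attr: str = "n") -> dict:
    """Split a witness into {anchor key: text}; each <anchor> opens a segment.

    Text before the first anchor is stored under the key None. Anchors that
    share a key (or lack the attribute) are merged into one segment.
    """
    segs = {None: []}
    current = None

    def walk(elem: ET.Element) -> None:
        nonlocal current
        if local_name(elem.tag) == "anchor":
            current = elem.get(attr)
            segs.setdefault(current, [])
            return
        segs[current].append(elem.text or "")
        for child in elem:
            walk(child)
            segs[current].append(child.tail or "")  # tail follows the child

    walk(root)
    return {key: "".join(texts) for key, texts in segs.items()}


def anchored_pairs(a: ET.Element, b: ET.Element, attr: str = "n") -> list:
    """Pair segments of two witnesses that start at anchors with the same key."""
    seg_a, seg_b = segments(a, attr), segments(b, attr)
    return [(key, seg_a[key], seg_b[key]) for key in seg_a if key in seg_b]


w1 = ET.fromstring('<p>Intro <anchor n="a1"/>first part <anchor n="a2"/>second part</p>')
w2 = ET.fromstring('<p>Preface <anchor n="a2"/>second, revised <anchor n="a1"/>first part</p>')
for key, left, right in anchored_pairs(w1, w2):
    print(key, repr(left), repr(right))
```

Because the pairing is done by key rather than by position, the segments are matched even when the anchors appear in a different order in the two witnesses.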


# Text Emphasis



Small Caps

(lowercase letters are converted to uppercase and displayed in a smaller font size)

Spaced Letters