Descriptive Mark-up

This section includes those mark-up elements related to structure and content rather than to semantics. While the structure obviously impacts presentation, STEMMA is not a presentation format and so many aspects are left to the control of such formats (e.g. HTML+CSS) when data has been transformed for presentation.


HTML was originally created with a rather relaxed syntax describing presentational mark-up, and was designed for rendering content from Web pages, etc. Through its various versions, there has been an attempt to separate mark-up related to structure and content from that related to its presentation. For instance, the <del> tag for representing deleted text is now recommended over the older <s> tag for representing strikethrough text (the even-earlier <strike> tag was discontinued after HTML V4). Similarly, the <em> and <strong> tags for representing emphasis and strong text are now recommended over explicit italic and bold control using the <i> and <b> tags. After HTML V4, and in parallel with HTML V5, an XML variant called XHTML was developed, and its XML foundation meant that its syntax was stricter and its meaning was extensible through namespaces; it also dropped many of the presentational tags and a reliance on Cascading Style Sheets (CSS) was presumed as a way of achieving a more flexible rendering.


This is less than adequate for STEMMA: although its earlier versions intended to use the XHTML set of tags, it quickly became clear that there are instances where rendering must be explicit, such as for pre-formatted citations, and where prior formatting or emphasis needs to be described accurately in transcriptions. The latter is not simply a presentational matter as the mark-up needs to describe the text as-it-was, which in turn impacts on the semantics and significance of that text. In this respect, STEMMA is trying to describe a document in the same way that TEI would, except that TEI is too comprehensive and the resulting overlap with STEMMA would cause a clash of approach that does not align with STEMMA’s micro-history goal. For instance, the cite, personography, and placeography tags are inappropriate in STEMMA. A gentle introduction to the way TEI handles transcription may be found at: Transcription Guidelines.


The current version of STEMMA therefore uses HTML-like tag names in the context of its XML representation (note the use of all-lowercase tag names here), albeit with some extra attributes, and avoids the direct incorporation of full dialects or namespaces associated with the above. The <Text> element will accept the following HTML-like mark-up in order to support structure and content type, including a primitive level of formatting and emphasis.


Element, and attributes

<b> narrative-text </b>
    Bold text
      • sic: As-it-was (see below)

<col>
    Start of next, or specific, column.
      • Text alignment. Default=L
      • Column index (1:n), must be in range of <page> setting.
      • [x=’x%’, y=’y%’]: Position of item relative to top-left of image

<colgroup> table-columns </colgroup>
    Defines the columns of a table using <tcol>
      • Table width as percentage of available width. Default 100%

<em> narrative-text </em>
    Emphasis (usually italic in text)

<i> narrative-text </i>
    Italic text
      • sic: As-it-was (see below)

<indent>
    Indent start of current line

<li> narrative-text </li>
    List item. Ignored outside of <ol> or <ul> elements.

<line>
    Start of a line in a transcription
      • Line number
      • posn: Position of displayed line numbers
      • [x=’x%’, y=’y%’]: Position of item relative to top-left of image

<ms> narrative-text </ms>
    Transcription or extract from manuscript text
      • id: Distinguishes different contributors
      • scheme: Distinguishes different styles

<ol> list-items </ol>
    Ordered (numbered) list
      • Starting number. Default=1
      • Numbering type (1, A, a, I, i)

<p> narrative-text </p>
      • Text alignment. Default=L
      • Indent left margin
      • Paragraph count (1:n) in column
      • [x=’x%’, y=’y%’]: Position of item relative to top-left of image

<page>
    Start of a page in a transcription
      • Number of columns. Default=1
      • id: Page identification. May be non-numeric
      • Position of any displayed page identification

<posn>
    Generic synchronisation point within image, rather than for a specific element
      • Position of location relative to top-left of image

<s> narrative-text </s>
    Strikethrough (deleted) text
      • sic: As-it-was (see below)
      • Number of strikes (1, 2, *=heavy)

<strong> narrative-text </strong>
    Strong (usually bold in text)

<sub> narrative-text </sub>
    Subscript text

<sup> narrative-text </sup>
    Superscript text

<table> body </table>
    Defines a table

<tcol>
    Defines a table column. Ignored outside of <colgroup>.
      • Text alignment. Default=L
      • Column width as percentage of table width.

<td> data-cell </td>
    Table data cell. Ignored outside of <table> element.
      • No. of columns for data cell
      • No. of rows for data cell

<th> header-cell </th>
    Table header cell. Ignored outside of <table> element.
      • No. of columns for header cell
      • No. of rows for header cell
      • Type of header cell (‘row’/‘col’)

<time>
    Synchronisation point in transcript of a recording
      • Number of hours, minutes, and seconds since start of recording

<tr> row </tr>
    Table row. Ignored outside of <table> element.

<ts> narrative-text </ts>
    Transcription or extract from typescript text
      • id: Distinguishes different contributors
      • scheme: Distinguishes different styles

<u> narrative-text </u>
    Underlined text
      • sic: As-it-was (see below)
      • Number of underlines (1, 2, *=heavy)

<ul> list-items </ul>
    Unordered (bullet) list
      • Marker (‘disc’, ‘circle’, ‘square’)

<voice> narrative-text </voice>
    Transcription or extract from voice
      • id: Distinguishes different contributors
      • scheme: Distinguishes different tones
      • overlap: Overlapping nested contributions, each transcribed. ‘id’ and ‘scheme’ not applicable with this attribute
      • bg: Untranscribed contribution (identified by ‘id’ attribute) is background to nested contributions. Exclusive of ‘overlap’.


General Formatting

Explicit mark-up for visual rendering in authored narrative, such as colour and font, is not recommended since STEMMA’s descriptive and semantic mark-up allows software products to select them in a consistent way using some sort of style gallery.


The <p> and <br> elements behave as per their HTML equivalents in authored work, including the fact that <br> is not recommended to simulate paragraphs. The <b>, <i>, <u>, and <s> elements provide some primitive visual attributes similar to older HTML definitions, but they all have more functional uses within transcripts and extracts (i.e. when sic=’1’). For instance, <u> and <s> can distinguish single, double, or heavier use of lines; also, <s> indicates deleted text rather than specifically the use of strikethrough. Although these could have been done using the ‘scheme’ attributes (see below), they were allowed in transcription since their presence for other purposes would undoubtedly mean they would get applied in textual transcription too.


The sic attribute (short for Latin: sic erat scriptum, meaning "thus was it written") is used to distinguish between voluntary use of the respective elements (e.g. for citation formatting) and usage describing a transcribed prior document. The default value is sic=’1’ inside an enclosing <ts> or <ms> element, and sic=’0’ otherwise.
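
The default rule for sic can be sketched in Python as follows. This is only an illustration, not part of STEMMA: the function name effective_sic and the sample text are hypothetical, and stopping the search at the nearest enclosing <Text> is an assumption based on the statement that <ts> and <ms> are ‘off’ in any new <Text> element.

```python
import xml.etree.ElementTree as ET

TRANSCRIPTION_TAGS = {"ts", "ms"}  # elements that imply sic='1' by default


def effective_sic(root, target):
    """Return the effective sic value ('0' or '1') for target within root."""
    explicit = target.get("sic")
    if explicit is not None:
        return explicit  # an explicit attribute always wins
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = parents.get(target)
    while node is not None:
        if node.tag in TRANSCRIPTION_TAGS:
            return "1"
        if node.tag == "Text":
            # Assumption: a <Text> boundary resets the transcription state.
            break
        node = parents.get(node)
    return "0"


# Hypothetical sample: one authored paragraph and one manuscript extract.
sample = ET.fromstring(
    "<Text>"
    "<p>Cited in <i>The Times</i>.</p>"
    "<ms><p>I give to my <u>eldest</u> son <s sic='0'>John</s></p></ms>"
    "</Text>"
)
italic = sample.find(".//i")
underline = sample.find(".//u")
strike = sample.find(".//s")
print(effective_sic(sample, italic))     # authored formatting
print(effective_sic(sample, underline))  # as-it-was, inherited from <ms>
print(effective_sic(sample, strike))     # explicit override
```

Here the italic title is authored formatting, the underline inherits the as-it-was default from the enclosing <ms>, and the explicit sic on <s> overrides that default.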


Because of their vague interpretation, the <em> and <strong> elements are designed for formatting in authored work rather than for representing transcribed material.


Text Structure

For authored work, a narrative article basically uses section headings, and paragraphs within sections. In acknowledgement of HTML, <p> elements nested within the same stream (i.e. excluding out-of-line text, such as notes) are ignored.


For textual transcription (including transcribed extracts), the structure is: pages, columns, paragraphs, and lines. Each of these may be related to specific positions in an image of the original material using SVG-like x and y coordinates. These specify percentage displacements from the top-left corner of the last image identified through a ResourceRef element with Mode=’SynchImage’. The fact that <page> also allows this means that a single image may show multiple pages.
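
As a minimal sketch of how such percentage displacements might be turned into pixel positions on a particular image, consider the following Python function. The name percent_to_pixels and the image dimensions are hypothetical; only the ‘x%’ and ‘y%’ value format comes from the element list above.

```python
def percent_to_pixels(x, y, image_width, image_height):
    """Convert 'N%' displacements from the top-left corner into pixel offsets."""

    def fraction(value):
        text = value.strip()
        if not text.endswith("%"):
            raise ValueError(f"expected a percentage such as '25%', got {value!r}")
        return float(text[:-1]) / 100.0

    return (round(fraction(x) * image_width), round(fraction(y) * image_height))


# A point a quarter of the way across and a tenth of the way down
# a hypothetical 2000 x 3000 pixel scan.
print(percent_to_pixels("25%", "10%", 2000, 3000))
```

Because the coordinates are percentages, the same mark-up remains valid if the image is later rescaled.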


Two main approaches to transcription are supported: line-based, which relies on <br> and <line> elements (possibly within <p>), and paragraph-based, which relies on flowed text within <p>.
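
The difference between the two approaches can be illustrated with a short Python sketch that recovers the original lines from a <p>. It assumes that each <br> ends a line and that inline mark-up such as <u> is simply flattened; the helper name split_lines and the sample wording are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_lines(paragraph):
    """Split a <p> into text lines at each <br>, flattening inline mark-up."""
    lines = []
    current = paragraph.text or ""
    for child in paragraph:
        if child.tag == "br":
            lines.append(current.strip())
            current = ""
        else:
            current += "".join(child.itertext())
        current += child.tail or ""
    lines.append(current.strip())
    return lines


line_based = ET.fromstring(
    "<p>In the name of God<br/>Amen. I <u>John</u> Smith<br/>of this parish</p>"
)
flowed = ET.fromstring(
    "<p>In the name of God Amen. I <u>John</u> Smith of this parish</p>"
)
print(split_lines(line_based))  # three lines, as in the original document
print(split_lines(flowed))      # a single flowed line
```

The line-based form preserves where each original line ended, whereas the paragraph-based form leaves line breaks to the eventual rendering.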


The <page> element marks the head of a new page. The id is the actual page identification, if any, e.g. id=’122’. It is not incremented automatically as it may be non-numeric. The id is optional in order to allow for unidentified pages. Line numbers are not reset on a new page.


<col> and <p> numbers run sequentially, starting at 1, unless set explicitly through the corresponding attributes.


The <line> element sets a new line count within a transcription. The count usually covers non-blank lines from the start of the outermost <Text> element but this can be changed, e.g. to record lines in a specific page, column, or paragraph. Line numbers are automatically incremented but this element may be used periodically to keep the count in step. If the document being transcribed already identifies line numbers, and the mark-up is mirroring them, then the posn attribute identifies whether they were displayed in the left or right margin. This setting carries through until the next posn attribute or the end of the <Text> element; note that alternate pages usually switch from left to right margins in practice.


The units for paragraph indentation (see <p>) and line indentation (see <indent>) are based on some externally-configurable unit, such as the width of four spaces.


Transcription source

The <ts>, <ms>, and <voice> elements distinguish between transcriptions of typescript, manuscript, and audio data. Their ‘id’ attribute distinguishes different contributors, and can therefore mark different (written) hands or different voices. Their ‘scheme’ attribute further distinguishes different styles, and can therefore mark different colours, different fonts, or different tones. The specific nature of these schemes is less important than their identification, and their interpretation must be part of the analytical process. STEMMA does not mandate how these two attributes are used, but it is recommended that they be described in commentary accompanying a transcription.


The use of the ‘id’ and ‘scheme’ attributes effectively separates structure and content from presentational or stylistic issues in a transcription, analogous to the similar goals for formatted text in HTML5.


NB: these three elements are automatically ‘off’ in any new <Text> element; this means that they’re initially ‘off’ within nested <Text> elements, but the previous status is un-stacked at the end of the <Text> element. As with <b>, <i>, etc., the settings for nested cases of each element are merged, and unmerged as each inner element is closed.
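
One way a processor might track this behaviour is sketched below in Python. It is an illustration under assumptions rather than a prescribed algorithm: the function name walk and the sample id and scheme values are hypothetical, and "merging" is read here as an inner element overriding only the attributes it specifies.

```python
import xml.etree.ElementTree as ET

SOURCE_TAGS = ("ts", "ms", "voice")


def walk(element, state=None):
    """Yield (text, state) pairs; state maps each active source tag to its merged settings."""
    state = {} if state is None else state
    if element.tag == "Text":
        # A new <Text> starts with all three elements 'off'. The caller's
        # state is untouched, so it is restored when this <Text> ends.
        state = {}
    elif element.tag in SOURCE_TAGS:
        # Merge this element's attributes over any enclosing settings for
        # the same tag; copying keeps the outer settings for later siblings.
        merged = dict(state.get(element.tag, {}))
        merged.update(element.attrib)
        state = {**state, element.tag: merged}
    if element.text and element.text.strip():
        yield element.text.strip(), state
    for child in element:
        yield from walk(child, state)
        if child.tail and child.tail.strip():
            yield child.tail.strip(), state


# Hypothetical manuscript extract with a nested change of ink and a nested <Text>.
sample = ET.fromstring(
    "<Text>"
    "<ms id='A' scheme='black'>My will "
    "<ms scheme='red'>amended</ms> this day"
    "<Text>Editor's note</Text>"
    "</ms>"
    "</Text>"
)
for text, state in walk(sample):
    print(repr(text), state)
```

The inner <ms> inherits id=’A’ but replaces the scheme, the outer settings return once it closes, and the nested <Text> sees no active transcription source at all.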


Audio Transcription

Support for audio recordings may be divided into the following broad areas:


  • Specific audio contributions (such as the voice of an individual) – <voice>
  • Anomalous contributions from an individual that cannot be represented as text, including noises, pauses, and gestures – <Anom>
  • Alternative word meanings, clarifications, or other notes – <Alt> and <NoteRef>, exactly as with textual transcription
  • Time synchronisation – <time>. This is analogous to <posn>, and other x/y coordinates, used in textual transcription.


The ‘scheme’ attribute allows intonation or emotional changes to be marked in the transcript, e.g. fast/slow, loud/soft/whisper, laughing, singing, false accent, imitation.


The ‘overlap’ and ‘bg’ attributes provide two different ways of representing overlapping audio contributions. The ‘overlap’ attribute defines a container <voice> element for multiple transcribed contributions. The ‘bg’ attribute identifies an untranscribed background contribution to its nested transcribed contributions.
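
The constraints on these two attributes can be checked mechanically. The following Python sketch tests only for the presence of the attributes; the function name voice_problems, the speaker names, and the attribute value ‘1’ (chosen by analogy with sic) are assumptions, since the permitted values are not stated here.

```python
import xml.etree.ElementTree as ET


def voice_problems(voice):
    """Return a list of rule violations for one <voice> element."""
    problems = []
    has_overlap = "overlap" in voice.attrib
    has_bg = "bg" in voice.attrib
    if has_overlap and has_bg:
        problems.append("'overlap' and 'bg' are mutually exclusive")
    if has_overlap:
        for name in ("id", "scheme"):
            if name in voice.attrib:
                problems.append(f"'{name}' is not applicable with 'overlap'")
    return problems


# Hypothetical dialogue: an overlap container, a background contribution,
# and a deliberately invalid combination.
sample = ET.fromstring(
    "<Text>"
    "<voice overlap='1'><voice id='Ann'>Yes, but</voice><voice id='Bob'>No, wait</voice></voice>"
    "<voice bg='1' id='Radio'><voice id='Ann'>Turn that down</voice></voice>"
    "<voice overlap='1' bg='1' id='Ann'>invalid</voice>"
    "</Text>"
)
voices = list(sample.iter("voice"))
for voice in voices:
    for problem in voice_problems(voice):
        print(problem)
```

Only the last <voice> is reported, because it combines ‘overlap’ with both ‘bg’ and ‘id’.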


See the example at Dialogue Transcription for a more detailed illustration.



The <table>, <colgroup>, <tcol>, <tr>, <th>, and <td> elements are very similar to their HTML equivalents (<tcol> being analogous to HTML <col>). They attempt to focus on the structure and content rather than the presentation, something that is not relevant until the STEMMA representation has been transformed into a particular visual representation, such as HTML+CSS, Word, or PDF.


Enclosed <Text> segments with Class=’Caption’ are used as the table caption. Ones with Class=’Tablenote’ cause tablenotes to be deposited at the foot of the table.
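
A transformation step might gather these segments before rendering the table, roughly as in the Python sketch below. The function name caption_and_notes and the sample data are hypothetical, and the sketch assumes that such <Text> segments may appear anywhere inside the <table>, including within cells.

```python
import xml.etree.ElementTree as ET


def caption_and_notes(table):
    """Collect caption and tablenote text from <Text> segments enclosed by a <table>."""
    captions, notes = [], []
    for segment in table.iter("Text"):
        content = "".join(segment.itertext()).strip()
        if segment.get("Class") == "Caption":
            captions.append(content)
        elif segment.get("Class") == "Tablenote":
            notes.append(content)
    return captions, notes


# Hypothetical table with a caption and one note attached to a cell.
sample = ET.fromstring(
    "<table>"
    "<Text Class='Caption'>Baptisms recorded in 1841</Text>"
    "<tr><th>Name</th><th>Date</th></tr>"
    "<tr><td>Ann Smith<Text Class='Tablenote'>Surname partly illegible</Text></td><td>3 May</td></tr>"
    "</table>"
)
captions, notes = caption_and_notes(sample)
print(captions)
print(notes)
```

The collected notes would then be deposited at the foot of the rendered table by whatever presentation format is being generated.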



The <ul>, <ol>, and <li> elements are very similar to their HTML equivalents.