Presentational Mark-up

HTML was originally created with a rather relaxed syntax that included presentational mark-up, since it was designed for rendering Web pages. Through its various versions, there has been an attempt to separate mark-up that describes the structure of content from mark-up that describes its presentation. For instance, the <del> tag for representing deleted text is now recommended over the older <s> tag for representing strikethrough text (the even-earlier <strike> tag was discontinued after HTML V4). Similarly, the <em> and <strong> tags for representing emphasis and strong text are now recommended over explicit italic and bold control using the <i> and <b> tags. After HTML V4, and in parallel with HTML V5, an XML variant called XHTML was developed. Its XML foundation made its syntax stricter and its meaning extensible through namespaces; it also dropped many of the presentational tags, presuming instead that Cascading Style Sheets (CSS) would be used to achieve more flexible rendering.

That approach is less than adequate for STEMMA. Although earlier versions of STEMMA intended to use the XHTML set of tags, it quickly became clear that there are instances where rendering must be explicit, such as for pre-formatted citations, and where prior formatting or emphasis must be described accurately, such as in transcriptions. The latter is not simply a presentational matter, because the mark-up must describe the text as-it-was, and that in turn affects the semantics and significance of the text. In this respect, STEMMA tries to describe a document in the same way that TEI (the Text Encoding Initiative, http://www.tei-c.org/) would, except that TEI is too comprehensive: its overlap with STEMMA would cause a clash of approach that does not align with STEMMA's micro-history goal. For instance, TEI's cite, personography, and placeography tags are inappropriate.

The current version of STEMMA therefore uses HTML-like tag names within its XML representation (note the use of all-lowercase tag names here), albeit with some extra attributes, and avoids directly incorporating the full XHTML or TEI dialects or their namespaces. The Narrative Text element accepts the following HTML-like mark-up to support a primitive level of formatting and emphasis.

Element, and attributes            Description

<br/>                              Line-break
<p> narrative-text </p>            Paragraph
    [align='L/R/C']                Text alignment. Default=L
    [indent='n']                   Indent left margin
    [cols='n']                     Number of columns. Default=1
<cb/>                              Column-break, to next column. Ignored outside of a <p> element.
    [align='L/R/C']                Text alignment. Default=L
<indent/>                          Indent start of current line

<ol>                               Ordered (numbered) list
    [start='n']                    Starting number. Default=1
    [type='chr']                   Numbering type (1, A, a, I, i)
<ul>                               Unordered (bullet) list
    [type='chr']                   Marker (disc, circle, square)
<li> narrative-text </li>          List item. Ignored outside of <ol> or <ul> elements.

<em> narrative-text </em>          Emphasis (usually italic)
<strong> narrative-text </strong>  Strong (usually bold)

<b> narrative-text </b>            Bold text
    [sic='boolean']                As-it-was (see below)
<i> narrative-text </i>            Italic text
    [sic='boolean']                As-it-was (see below)
<u> narrative-text </u>            Underlined text
    [sic='boolean']                As-it-was (see below)
    [style='value']                Number of underlines (1, 2, *=heavy)
<s> narrative-text </s>            Strikethrough (deleted) text
    [sic='boolean']                As-it-was (see below)
    [style='value']                Number of strikes (1, 2, *=heavy)

<sup> narrative-text </sup>        Superscript text
<sub> narrative-text </sub>        Subscript text

<ts> narrative-text </ts>          Typescript (see below)
<ms> narrative-text </ms>          Manuscript (see below)

<table> body </table>              Defines a table
<tr> row </tr>                     Table row. Ignored outside of a <table> element.
<th> header-cell </th>             Table header cell. Ignored outside of a <table> element.
    [colspan='n']                  Number of columns spanned by the header cell
    [rowspan='n']                  Number of rows spanned by the header cell
    [scope='term']                 Type of header cell ('row' or 'col')
<td> data-cell </td>               Table data cell. Ignored outside of a <table> element.
    [colspan='n']                  Number of columns spanned by the data cell
    [rowspan='n']                  Number of rows spanned by the data cell
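
The "Ignored outside of" rules in the table above can be checked mechanically before rendering. The following is a minimal sketch, assuming the narrative is available as well-formed XML and parsed with Python's standard xml.etree.ElementTree module. The names REQUIRED_CONTEXT and find_ignored are hypothetical, and the sketch assumes that any enclosing ancestor satisfies a rule (STEMMA may instead intend the immediate parent).

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element -> tags, at least one of which must enclose it.
# Assumption: any ancestor counts, not only the immediate parent.
REQUIRED_CONTEXT = {
    "cb": {"p"},
    "li": {"ol", "ul"},
    "tr": {"table"},
    "th": {"table"},
    "td": {"table"},
}


def find_ignored(root: ET.Element) -> list[str]:
    """Return, in document order, the tags of elements a renderer would ignore."""
    ignored: list[str] = []

    def walk(elem: ET.Element, ancestors: tuple[str, ...]) -> None:
        required = REQUIRED_CONTEXT.get(elem.tag)
        if required is not None and not required.intersection(ancestors):
            ignored.append(elem.tag)
        for child in elem:
            walk(child, ancestors + (elem.tag,))

    walk(root, ())
    return ignored


text = ET.fromstring(
    "<Text><p>One<cb/>Two</p><cb/>"
    "<ul><li>item</li></ul><li>stray</li></Text>"
)
print(find_ignored(text))  # ['cb', 'li']
```

Reporting rather than silently dropping such elements is a design choice here; it lets an authoring tool warn about misplaced mark-up while a renderer still ignores it as the table specifies.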

Explicit mark-up for visual attributes in authored narrative, such as colour, font, or bold, is not recommended, since STEMMA's semantic mark-up allows software products to select those attributes consistently, for instance from a style gallery. In the absence of explicit paragraph mark-up, each <Text> element implicitly constitutes a separate paragraph. Each paragraph has an implicit line-break at the end.
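
The implicit-paragraph rule can be illustrated with a short sketch using Python's xml.etree.ElementTree. It assumes that when a <Text> element has <p> children, each of those is a paragraph (text outside them is ignored in this sketch), and that otherwise the whole element is one implicit paragraph. The function name paragraphs is hypothetical, and inline mark-up is simply flattened to plain text.

```python
import xml.etree.ElementTree as ET


def paragraphs(text_elem: ET.Element) -> list[str]:
    """Split a <Text> element into plain-text paragraphs.

    Assumption: when the element has <p> children, each one is a paragraph
    (any text outside them is dropped here); otherwise the whole element is
    one implicit paragraph.
    """
    explicit = text_elem.findall("p")
    blocks = explicit if explicit else [text_elem]
    # itertext() flattens inline mark-up such as <em>; the trailing "\n" stands
    # for the implicit line-break that ends every paragraph.
    return ["".join(block.itertext()).strip() + "\n" for block in blocks]


print(paragraphs(ET.fromstring("<Text>A <em>single</em> paragraph</Text>")))
# ['A single paragraph\n']
print(paragraphs(ET.fromstring("<Text><p>First</p><p>Second</p></Text>")))
# ['First\n', 'Second\n']
```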

The sic attribute (short for the Latin sic erat scriptum, meaning "thus was it written") distinguishes voluntary use of the respective elements (e.g. for citations) from usage that describes a transcribed prior document. Its default depends on the enclosing <Text> element: if that element has Transcript='1', the default is sic='1'; otherwise it is sic='0'.
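
The defaulting rule for sic can be sketched as follows, again with xml.etree.ElementTree. This assumes the <Text> element is the root being processed and that booleans are written '1'/'0' (the helper also accepts 'true'/'false'). The names SIC_ELEMENTS, truthy, and sic_values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements that accept the sic attribute, per the table above.
SIC_ELEMENTS = {"b", "i", "u", "s"}


def truthy(value: str) -> bool:
    # Assumption: booleans are written '1'/'0', as in Transcript='1';
    # 'true'/'false' are also accepted here for convenience.
    return value.strip().lower() in {"1", "true"}


def sic_values(text_elem: ET.Element) -> list[tuple[str, bool]]:
    """Return (tag, effective sic) for each b/i/u/s element in document order."""
    # Default: sic='1' inside a transcript, sic='0' otherwise.
    default = truthy(text_elem.get("Transcript", "0"))
    result = []
    for elem in text_elem.iter():
        if elem.tag in SIC_ELEMENTS:
            explicit = elem.get("sic")
            result.append((elem.tag, default if explicit is None else truthy(explicit)))
    return result


transcript = ET.fromstring(
    "<Text Transcript='1'><u style='2'>Total</u> <b sic='0'>note</b></Text>"
)
print(sic_values(transcript))  # [('u', True), ('b', False)]
```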

The <ts> and <ms> elements specify whether transcribed text was in typescript or manuscript form. The default is governed by the Manuscript='boolean' attribute on the enclosing <Text> element.
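
A renderer needs to know, for every run of text, which form applies. The sketch below assumes that the innermost enclosing <ts> or <ms> wins, that the <Text> element is the root being processed, and that an absent Manuscript attribute means typescript (the source does not state that default). The names truthy and text_forms are hypothetical.

```python
import xml.etree.ElementTree as ET


def truthy(value: str) -> bool:
    # Assumption: booleans are written '1'/'0' ('true'/'false' also accepted).
    return value.strip().lower() in {"1", "true"}


def text_forms(text_elem: ET.Element) -> list[tuple[str, str]]:
    """Return (text, form) runs, where form is 'manuscript' or 'typescript'."""
    # Assumption: an absent Manuscript attribute means typescript.
    default = "manuscript" if truthy(text_elem.get("Manuscript", "0")) else "typescript"
    runs: list[tuple[str, str]] = []

    def walk(elem: ET.Element, form: str) -> None:
        # The innermost enclosing <ts>/<ms> overrides the inherited form.
        if elem.tag == "ms":
            form = "manuscript"
        elif elem.tag == "ts":
            form = "typescript"
        if elem.text:
            runs.append((elem.text, form))
        for child in elem:
            walk(child, form)
            if child.tail:
                # A child's tail lies outside the child, so it keeps this element's form.
                runs.append((child.tail, form))

    walk(text_elem, default)
    return runs


letter = ET.fromstring("<Text Manuscript='0'>Typed form, signed <ms>J. Smith</ms>.</Text>")
print(text_forms(letter))
# [('Typed form, signed ', 'typescript'), ('J. Smith', 'manuscript'), ('.', 'typescript')]
```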

For side-by-side text, a paragraph can be notionally divided into independent columns, which have no dividers and whose widths are judged by the rendering application. Text alignment can be controlled separately for each column. Text indentation is cumulative for nested paragraphs, and columns divide the width that remains after any indentation.
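
The interaction of cumulative indentation and columns can be sketched as a small layout calculation. This is one possible interpretation, not the STEMMA rule: it assumes indent values and page width share one unit (e.g. character cells), that columns share the remaining width equally, and that a nested <p> sits within a single column of its parent. The function name layout and the page_width parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def layout(text_elem: ET.Element, page_width: float = 80.0) -> list[tuple[float, float]]:
    """Return (cumulative indent, column width) for each <p> in document order.

    Assumptions: indent and page_width share one unit, columns have equal
    widths, and a nested <p> sits within a single column of its parent.
    """
    result: list[tuple[float, float]] = []

    def walk(elem: ET.Element, indent: float, width: float) -> None:
        for child in elem:
            if child.tag == "p":
                own = float(child.get("indent", "0"))
                cols = int(child.get("cols", "1"))
                total_indent = indent + own       # indentation accumulates
                col_width = (width - own) / cols  # columns share what is left
                result.append((total_indent, col_width))
                walk(child, total_indent, col_width)
            else:
                walk(child, indent, width)

    walk(text_elem, 0.0, page_width)
    return result


text = ET.fromstring(
    "<Text><p indent='4'>Outer<p indent='2' cols='2'>Left<cb/>Right</p></p></Text>"
)
print(layout(text))  # [(4.0, 76.0), (6.0, 37.0)]
```

A real renderer would judge column widths from the content, as the text above says; the equal split simply makes the "width after indentation" relationship concrete.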