

A Citation entity identifies the source of some information mentioned in the Dataset. Examples might include books, newspaper articles, BMD certificates, census data, tax records, court records, film, tombstones, military service records, journey manifests, cemetery records, oral history, church records, pension records, land or property transfers, etc.


These are common historical sources, and there are accepted printed citation formats applicable to each of them, but this Citation entity goes further; it can also identify a collection of works, a repository or institution, or even represent attribution to an individual.


In the citations of normal written or printed works, there are two main citation modes that may be employed within text: document labels and source labels, being applicable to documents and images respectively. Citations may involve reference notes linked to inline superscript indicators in the main text. Alternatively, they may involve a source list or bibliography at the end of the work. Parenthetical in-text citations, such as “Smith (2004, p. 39) claims that...” or, if all details are parenthesised, “...(Smith 2004, p. 39)...”, are commonly associated with published sources in academic work and are less appropriate for genealogical or historical citations, because they do not accommodate the source provenance or analytical notes that are frequently required.


There are citation conventions that apply to different source types and scenarios in order to present some consistency, and these have specifications for their layout, quotation marks, punctuation, and use of italics. Several citation styles are in common use. For instance, in the humanities there are: Modern Language Association (MLA), Harvard referencing, Modern Humanities Research Association (MHRA), and the Chicago Manual of Style (CMOS). There are other styles commonly used in law or the sciences too.


The Board for Certification of Genealogists (BCG) recommends the use of both CMOS and EE for family history. EE is a style devised by Elizabeth Shown Mills in Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore: Genealogical Publishing Co., 2009) to cover the wider range of historical and unpublished sources used in family history.


It should be understood, though, that all these citation styles and modes relate to final-form written or printed citations. Their application is therefore relevant to a specific end-user rather than to computer storage. Since those final-form citations are designed to be human-readable, they also embody elements of a specific locale, culture, and preferred style. This is a problem for electronic documents because such citations are not computer-readable, and so cannot be adjusted to suit the locale or preferences of an arbitrary end-user. It is therefore necessary to go back to the essence of a citation rather than consider specific physical implementations, i.e. to provide sufficient information through a digested citation to uniquely identify a source, its characteristics, and any analytical assessment. These citation-elements (implemented through STEMMA’s Parameter mechanism) should be sufficient to support the formatting appropriate for any given end-user.


Although it is possible to generate citations of different styles, and for different locales, from the discrete citation-element values, there are many complications in the real world. A citation sentence may contain different layers describing the provenance of the source and its information, or it may contain analytical notes. A reference note may contain multiple citation sentences; a tour of these scenarios is given in Cite Seeing. Subsequent references to the same source would typically use a shortened form of the associated reference note (see ‘WhereIn’ attribute), or the author may have employed an explicit hereinafter-cited-as term, or the Latin abbreviation Ibid. A footnote may have woven two source references into the same piece of text. Certain parts of a citation may not have been available (e.g. an undated document), or may have been erroneous, and so the citation would need to override any simple template-like formatting. In effect, authors of narrative work are loath to delegate generation of their citations to a piece of software working blindly from a set of data values. It is therefore necessary to support hand-crafted forms, and to change the focus of citation-elements to that of correlation and interrogation rather than formatting.


The scheme presented here is a generalised computer-readable one that would cope with all possible source types and scenarios. It does not strive to enumerate all possible source types, or specify what elements they require, or mandate a particular presentation style; the main goals of this scheme are to keep it open-ended so that source types can be defined freely, to parameterise the scheme so that it can interface to external citation-templates, and to give it a hierarchical structure for representing different layers of a citation (e.g. for provenance or location).




<Citation Key=’key’ [Abstract=’boolean’]>

[ <Title> citation-title </Title> ]

[ <URI> source-type-uri </URI> ]

[ <Params>

{ PARAM_DEF... } | { PARAM_VALUE ... }

</Params> ]

[ <DisplayFormat [Mode=’citation-format-mode’]>


</DisplayFormat> ] ...


[ <BaseCitationLnk Key=’key’>

[ TEXT_SEG ] ...

</BaseCitationLnk> ]

[ TEXT_SEG ] ...

</Citation>





<ParentCitationLnk Key=’key’ [Type=’layer-type’]>



[ TEXT_SEG ] ...

</ParentCitationLnk>





<Param Name=’name’ [Type=’type’]  [SemType=’sem-type’]

[ItemList=’boolean’] [Optional=’boolean’] [WhereIn=’boolean’]>

</Param>






{ <Param Name=’name’  [Key=’key’] [Subst=’substitution’]>

value

</Param> }


{ <Param Name=’name’ [Subst=’substitution’]>

{ <Item [Key=’key’]> value </Item> } ...

</Param> }



The parameterisation is available in the citation-title, the format-strings, narrative elements, and the values of Parameters themselves (i.e. within a Params element).


Note that Parameter names are local to the corresponding source-type. There is no sharing of Parameter names between different source-types, and no implied semantics in any of their names. If two source-types each have a Parameter called ‘Publisher’ then they are each interpreted in the context of their respective source-types. In effect, no semantics are conveyed directly by the Parameter name; that is the purpose of the optional SemType attribute.


The valid Parameter data-types are documented at: Data Types. The same ItemList approach to lists is taken as for Property values. The semantic type is indicated by the SemType attribute which may use the Dublin Core vocabulary, e.g. SemType=’DC:title’ or SemType=’DC:publisher’. The default value for the Optional attribute is 0 (i.e. false) which means that a non-blank value must be provided. The ‘WhereIn’ attribute flags parameters that identify a location or entry within a source, as opposed to the source itself, its provenance, or its location. The ‘Subst’ attribute allows the formatting of a value to be overridden, and is especially useful for unknown values. For instance, an undated document might be represented with a date Parameter having a value of “?” but a substitution of “n.d.” or “[1832]”.
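The interaction of the ‘Optional’ and ‘Subst’ attributes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML fragment, the helper names display_value and check_required, and the rule that a substitution always takes precedence over the stored value are not taken from the STEMMA specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: an undated document whose date value is unknown ("?").
PARAMS_XML = """
<Params>
  <Param Name="Author">James Granger</Param>
  <Param Name="Date" Subst="n.d.">?</Param>
</Params>
"""

def display_value(param):
    """Text to show for a Param: an explicit Subst overrides the stored value (assumed rule)."""
    subst = param.get("Subst")
    if subst is not None:
        return subst
    return (param.text or "").strip()

def check_required(param, optional=False):
    """Reject a blank value unless the Parameter is optional (Optional defaults to false)."""
    if not optional and not (param.text or "").strip():
        raise ValueError(f"Parameter {param.get('Name')!r} requires a value")

params = {p.get("Name"): p for p in ET.fromstring(PARAMS_XML)}
for p in params.values():
    check_required(p)

shown = {name: display_value(p) for name, p in params.items()}
```

The stored value (“?”) is left untouched, so it remains available for correlation and interrogation; only the displayed text changes.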


The <BaseCitationLnk> element may nominate an Abstract Citation from which data may be inherited by the current Citation, in much the same vein as base classes and derived classes in software programming. An Abstract Citation must define no embedded Keys, can only reference other abstract entities, and must contain Parameter definitions rather than Parameter settings. Any application of Parameter substitution must therefore occur after the inheritance process has completed. If an implementation creates a temporary conglomerate entity in memory by doing a physical merge then it must not be persisted back to the data file, otherwise it constitutes a data corruption. See Inheritance and Parameters for more information.
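As a rough illustration of this inheritance rule, the following Python sketch builds a merged view in memory without touching the stored entities. The Citation class, its field names, the effective function, and the example keys and URI are all hypothetical; the real loading and merging rules are those given in Inheritance and Parameters.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Citation:
    key: str
    abstract: bool = False
    uri: Optional[str] = None
    base: Optional[str] = None        # key named by BaseCitationLnk, if any
    param_defs: tuple = ()            # Parameter definitions (abstract Citations)
    param_values: dict = field(default_factory=dict)  # Parameter settings

def effective(key, store):
    """Return a temporary merged view; the entities in `store` are never modified."""
    cit = store[key]
    if cit.base is None:
        return {"uri": cit.uri,
                "defs": list(cit.param_defs),
                "values": dict(cit.param_values)}
    if not store[cit.base].abstract:
        raise ValueError("BaseCitationLnk must name an Abstract Citation")
    merged = effective(cit.base, store)          # inherit first ...
    merged["uri"] = cit.uri or merged["uri"]     # ... then apply local data
    merged["defs"] += [d for d in cit.param_defs if d not in merged["defs"]]
    merged["values"].update(cit.param_values)
    return merged

# Hypothetical data: an abstract book source-type and a concrete Citation.
store = {
    "cBook": Citation("cBook", abstract=True, uri="urn:example:book",
                      param_defs=("Author", "Title", "Date")),
    "cOldNottm": Citation("cOldNottm", base="cBook",
                          param_values={"Author": "James Granger", "Date": "1904"}),
}
view = effective("cOldNottm", store)
```

Because the merged view is a separate dictionary, it can be discarded after use and nothing inherited is written back to the data file.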


It is important to retain a clear view of the distinction between a Citation and a Resource. As an example, consider UK BMD references. These might be linked to the defining body in order to create a unique source citation. However, if you wanted to be able to pull up the appropriate census page from some Web site then that would be done via a corresponding Resource entity.


Some related articles may be found at: Cite Seeing and Citations for Online Trees.


Semantic Typing


The simple Dublin Core (see Dublin Core Metadata Initiative) terms cannot clearly distinguish, say, the title of an article from the title of a journal containing that article, or provide a clear indication of other data related to the containing journal such as publication date as distinct from the article submission date, or the volume and issue numbers. That same page recommends the use of the OpenURL (ANSI/NISO standard, Z39.88-2004) ContextObject for representing the context of a bibliographic citation, although it does not take this to the level of a hierarchical chain. The OpenURL concept is designed to provide the context of a citation in a machine-readable form that can be resolved by an unspecified library or archive. In other words, the Dublin Core recommendation does not cite a source directly but instead provides a library-independent hyperlink to content. At best, it constitutes a reference to an indefinite source.


The SemType attribute associates such semantic information with the individual citation-elements (i.e. Parameters) but leaves the Parameter names to be chosen independently to suit the source-type. Other semantic types could be applied using the same attribute, but with a different namespace.
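The following Python sketch shows how software might use SemType, rather than the Parameter name, to find corresponding citation-elements in different source-types. The two Parameter sets, the names ‘Headline’ and ‘Newspaper’, and the helper names_by_semtype are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical source-types that name their title Parameter differently.
BOOK = """<Params>
  <Param Name="Title" SemType="DC:title"/>
  <Param Name="Publisher" SemType="DC:publisher"/>
</Params>"""

ARTICLE = """<Params>
  <Param Name="Headline" SemType="DC:title"/>
  <Param Name="Newspaper"/>
</Params>"""

def names_by_semtype(params_xml, sem_type):
    """List the Parameter names in one source-type that carry the given SemType."""
    root = ET.fromstring(params_xml)
    return [p.get("Name") for p in root.iter("Param")
            if p.get("SemType") == sem_type]

book_titles = names_by_semtype(BOOK, "DC:title")
article_titles = names_by_semtype(ARTICLE, "DC:title")
```

A query for ‘DC:title’ finds ‘Title’ in one source-type and ‘Headline’ in the other, while a Parameter with no SemType is simply not matched.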


The STEMMA scheme described here is fully in keeping with those Dublin Core recommendations but is not specifically tied to them. It allows each type of source to be represented by a source-type-uri. Parameters can be applied to build up a citation description for a specific instance of that source-type. The source-type-uri also acts as a global key for retrieving localised text for soliciting Parameter values, data-types for validating the Parameter values, and for interfacing to a citation-template system in order to generate a formatted string for the user. If the source-type-uri is omitted then an effective one must be available through inheritance.


Citation Chain


Citations may be linked to describe the provenance of a source, the provenance of the information itself, where the originals are held, and any analytical comments. These are known as citation layers and the associated chain forms part of a hierarchy created through the use of the <ParentCitationLnk> element.


Note that STEMMA Citation chains do not differentiate between citing a specific source of information, citing a collection or work that the information was contained within, or citing a repository or institution hosting that work or collection; they are all citing something in the more literal sense. They do not mandate the juxtaposition of definite and indefinite sources,[1] or the ordering of original and derivative references (see below). Supporting citation layers avoids duplication and provides a stronger representation overall.
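A chain of this kind can be traversed with a simple loop. The Python sketch below assumes, for illustration only, that each Citation has at most one parent link and that the links have already been loaded into a dictionary; the keys cEntry, cCollection, and cArchive and the function layers are hypothetical. ‘Repository’ is used for an unspecified link type because it is documented as the default.

```python
DEFAULT_LAYER_TYPE = "Repository"  # documented default for Type

def layers(key, parents):
    """Yield (citation key, type of its link to the parent) from the cited item upward.

    `parents` maps a Citation key to (parent key, layer-type or None).
    """
    seen = set()
    while key is not None:
        if key in seen:
            raise ValueError(f"cycle in citation chain at {key!r}")
        seen.add(key)
        link = parents.get(key)
        if link is None:
            yield key, None          # top of the chain: no parent link
            return
        parent_key, link_type = link
        yield key, link_type or DEFAULT_LAYER_TYPE
        key = parent_key

# Hypothetical chain: an entry, the collection holding it, and the archive.
parents = {
    "cEntry": ("cCollection", None),
    "cCollection": ("cArchive", "Repository"),
}
chain = list(layers("cEntry", parents))
```

The same collection or repository Citation can be the parent of many others, which is how the layering avoids duplication.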


The Dublin Core Metadata Initiative has encountered the issue of a chain but has tried to solve it by adding additional terms and namespaces (see dc-citation-guidelines/).


The links between the layers may be characterised using the Type=’layer-type’ attribute as follows. Note that this doesn’t describe the layers themselves (which should be obvious from their content) but rather the relationship between the layers.





A brief summary or a précis of --


Information cited by the source. Source-of-the-Source.


Analytical comments.


Consulted through derivative, usually online or in database


Extracted portion from --


Consulted through general image copy


Media conversion from --


Other provenance information, differing from ‘Citing’.

Repository (default)

Location of original source.


Revised, abridged, or otherwise modified from --


Transcribed details from --


Translated details from --


These should cope with instances of image derivatives where the emphasis is placed on either the image or the original document. This choice is covered in detail by Elizabeth Shown Mills at “QuickLesson 19: Layered Citations Work Like Layered Clothing”, Evidence Explained: Historical Analysis, Citation & Source Usage (posted 4 Sep 2014, updated 5 Mar 2016, accessed 4 Apr 2017), under “Online Records at State-Agency Sites”.


Display Format


STEMMA allows preferred hand-crafted citations to be specified in the <CitationRef> element (see CITATION_REF). This is particularly useful when there are shortened forms (e.g. employing ‘hereinafter cited as ...’) which cannot be generated directly from the Parameters representing the citation-elements. Individual Parameters may also override their default formatting, say for substituted text in the absence of a value, or for abbreviated list formatting. Taking the onus off formatting allows the Parameter settings to be used more for correlation and interrogation.


Citation entities will require formatting to a given style and locale before they can be displayed. A later version may allow styles to be automatically selected from Citation Style Language (CSL) templates. CSL is an open XML-based language for defining the parameters and formatting for different citation types. Such styles can be browsed and searched via the Zotero Style Repository, although it currently has no concept of a URI string, which is unfortunate because a URI would be a convenient handle for distinguishing the templates and applicable source-types in the repository. A problem with such citation-template schemes is that they try to format plain textual elements into a simple template, whereas STEMMA assumes that objects (in the OOP sense) representing, say, a Person, Place, or Contact can be provided. The advantage of the object-based approach is that the template system can call back on well-defined methods to obtain a particular style of name, or specific contact details; otherwise the genealogical software product is assumed to have intimate knowledge of the specific template.


In the absence of any external formatting support for citations, or any explicit hand-crafted citations, the <DisplayFormat> element can also be used as a simple STEMMA-defined citation-template. It allows a number of language-specific text strings to be defined for different formatting modes (e.g. full reference note, which is the default), and these can make use of mark-up and parameterisation to employ them in multiple scenarios. Although some brief examples are presented below, a fuller example may be found at: Citation Template. NB: this template feature is purely declarative and currently contains no decisional control over the generation of the citation text.
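The declarative nature of such a template can be illustrated with plain string substitution in Python. Everything specific here is an assumption made for the sketch: the mode names ‘FullNote’ and ‘ShortNote’, the ‘${Name}’ marker syntax (borrowed from Python’s string.Template, not from STEMMA), and the render helper.

```python
from string import Template

# Hypothetical format strings keyed by formatting mode.
# "${Name}" is Python's Template syntax, used here only as a stand-in
# for whatever parameter markers the real format-strings employ.
formats = {
    "FullNote": "${Author}, ${Title} (${Publisher}, ${Date}), ${Pages}.",
    "ShortNote": "${Author}, ${Title}, ${Pages}.",
}

def render(mode, values, default_mode="FullNote"):
    """Pick the format string for `mode` (falling back to the default) and fill it in.

    Substitution is purely declarative: no conditions or loops are evaluated,
    and a missing Parameter raises KeyError rather than being skipped.
    """
    fmt = formats.get(mode, formats[default_mode])
    return Template(fmt).substitute(values)

values = {
    "Author": "James Granger",
    "Title": "Old Nottingham",
    "Publisher": "Nottingham Daily Express Office",
    "Date": "1904",
    "Pages": "46-48",
}
full = render("FullNote", values)
short = render("ShortNote", values)
```

A template of this kind cannot, for example, drop the parentheses when the publisher is unknown; that is the sort of decisional control the text notes is currently absent.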




Here’s a simple example of a traditional book citation:


<Citation Key='cOldNottm'>

<Title>Old Nottingham Notes</Title>

<URI> http://stemma </URI>

<Params>

<Param Name='Author'>James Granger</Param>

<Param Name='Title'>OLD NOTTINGHAM : Its Streets, People, etc</Param>

<Param Name='Publisher'>Nottingham Daily Express Office</Param>

<Param Name='Date'>1904</Param>

<Param Name='Pages'/>

</Params>

Reprinted from the Nottingham Daily Express, October 3rd, 1903 – July 9th, 1904.

</Citation>


A corresponding citation invocation, for a specific page, might appear as:


<CitationRef Key='cOldNottm'>

<Param Name='Pages'>46-48</Param>

</CitationRef>

Whether this generates a long or short reference note depends on whether the same source is referenced earlier in the current <Narrative> element.
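The long-versus-short decision can be sketched as a single pass over the references in one narrative. The function note_forms and the key cCensus are hypothetical, and the sketch ignores hand-crafted forms such as ‘hereinafter cited as’ or Ibid.

```python
def note_forms(ref_keys):
    """For each reference in narrative order, choose 'long' on first use of a source, else 'short'."""
    seen = set()
    forms = []
    for key in ref_keys:
        forms.append("short" if key in seen else "long")
        seen.add(key)
    return forms

# Citation keys in the order they are referenced within one <Narrative> element.
forms = note_forms(["cOldNottm", "cCensus", "cOldNottm"])
```

The set of sources already seen would be reset for each <Narrative> element, since the rule is scoped to the current narrative.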


Citations can become very complex since the author will not only want to cite the source, and the information obtained from that source, but the context of how it substantiates or contradicts their assertions and conclusions. This often involves some type of analytical commentary in the citation. For instance:


Death notices, Ulster Gazette and Daily National Intelligencer, both dated 24 January 1815. Corra Bacon-Foster, "The Story of Kalorama," Records of the Columbia Historical Society (1910), 108, states Louisa left four children; three have been identified. In 1810, Charles "Cating" and a female, both over 44, were enumerated with one male and female aged 26-44; one male and female aged 16-25; and one male under 10 - suggesting that George, Louisa, and their first son may have been living in the Catton household. See 1810 U.S. census, Ulster County, New York, New Paltz, p. 116, line 6; NA micropublication M252, roll 37.


Each reference note may contain multiple “citation sentences” (separated by periods), and each of these may contain multiple layers (separated by semicolons). See Cite Seeing for a deeper discussion.
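The nesting of layers within sentences within a reference note can be modelled as a list of lists. The Python sketch below only assembles a note from already-formatted layer strings; the function reference_note is hypothetical, and going the other way (splitting an existing note on periods and semicolons) would not be reliable because abbreviations such as “p.” also contain periods.

```python
def reference_note(sentences):
    """Join layers with semicolons and citation sentences with periods.

    `sentences` is a list of citation sentences, each a list of layer strings.
    """
    parts = []
    for sentence_layers in sentences:
        text = "; ".join(layer.strip() for layer in sentence_layers)
        parts.append(text if text.endswith(".") else text + ".")
    return " ".join(parts)

# The final sentence of the example note above, split into its two layers.
note = reference_note([
    ["See 1810 U.S. census, Ulster County, New York, New Paltz, p. 116, line 6",
     "NA micropublication M252, roll 37"],
])
```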


[1] Note that academic citations, such as those in journals, often refer to an indefinite source. This allows them to be much briefer but it only works because such sources are published and easily accessible; it makes no difference where the article or paper was obtained from.