

STEMMA stands for “Source Text for Event and Ménage MApping”.


STEMMA was primarily a data model for long-term storage and data exchange although it has since become much more. It therefore strived to hold data entities and their connections rather than any presentation information. There is deliberately no concept of font control, colouring, or date and numeric formatting. These are the responsibility of the software tools utilising a STEMMA Document.


The STEMMA source format is portrayed here using XML (eXtensible Markup Language), but this should be viewed as just one physical representation since the data model itself could be represented in a variety of ways (e.g. JSON). XML is a textual serialisation format originally designed by the W3C to represent structured data in a generalised way, mainly to facilitate communication over the Internet. It is a well-established standard that provides for automatic validation according to schema definition files. It also supports namespaces, which are used here for versioning of the main schema and for formal extensions to it.
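As a rough illustration of the point that the data model is independent of its serialisation, the sketch below converts a small XML fragment into an equivalent JSON form. The element names, the `Key` attribute spelling, and the key values are invented for the example and are not taken from the STEMMA schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names, attribute names, and keys are
# illustrative assumptions, not real STEMMA syntax.
XML_TEXT = """
<Dataset>
  <Person Key="pJoseph"><Name>Joseph Example</Name></Person>
  <Place Key="plNottingham"><Name>Nottingham</Name></Place>
</Dataset>
"""

def element_to_dict(elem):
    """Convert one element into a plain dict: tag, attributes, text, children."""
    node = {"type": elem.tag, **elem.attrib}
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node

root = ET.fromstring(XML_TEXT)
as_dict = element_to_dict(root)          # neutral in-memory form
as_json = json.dumps(as_dict, indent=2)  # alternative serialisation
```

The same entities and keys survive the change of format; only the physical representation differs.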


STEMMA is designed to represent generalised micro-history data, including family history, as opposed to merely biological lineage (literally genealogical data). Genealogy is often associated with biological lineage, particularly in the form of family trees, whereas micro-history is concerned with more types of subject entity (such as persons, places, animals, and groups), all events in the history of those subjects, and arbitrary connections between them. Family history is therefore a form of micro-history. Representing micro-history is a far more ambitious goal than representing lineage, and STEMMA strives to find a balance between a strong formal approach and the flexibility to accommodate the unexpected and human contributions (i.e. narrative work, reasoning, and general notes).


STEMMA gives Persons, Places, Animals, Groups, and Events equal billing as top-level entities, thus providing social, geographical, and temporal dimensionality to the data. In addition, there are top-level entities representing narrative work, sources, citations, source analysis, and attachments, which contribute an informational dimension. Each of these entities has a Key by which it can be referenced from other entities, and each supports a strong narrative structure. This narrative is more than just a set of brief comments or plain-text notes since it can arbitrarily reference other entities by their Key. The embedding of references within narrative also allows a STEMMA viewing tool to easily generate hyperlinks to the referenced items, thus supporting drill-down in order to identify supporting evidence and reasoning, or to find the source of the evidence.
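The way a viewing tool might turn embedded references into hyperlinks can be sketched as follows. This is only an illustration: the `{{Key}}` notation, the entity table, and the `render_narrative` function are assumptions made for the example and do not reflect actual STEMMA mark-up.

```python
import re
from html import escape

# Hypothetical entities indexed by Key; keys and titles are invented.
ENTITIES = {
    "pJoseph": {"type": "Person", "title": "Joseph Example"},
    "eBaptism": {"type": "Event", "title": "Baptism of Joseph"},
}

# Assumed inline notation for a reference: {{Key}}. NOT real STEMMA mark-up.
REF = re.compile(r"\{\{(\w+)\}\}")

def render_narrative(text, entities):
    """Replace each {{Key}} with an HTML link to the referenced entity."""
    def link(match):
        key = match.group(1)
        entity = entities.get(key)
        if entity is None:
            return match.group(0)  # leave unresolved references visible
        return '<a href="#{}">{}</a>'.format(escape(key), escape(entity["title"]))
    return REF.sub(link, text)

html = render_narrative("{{pJoseph}} appears in {{eBaptism}}.", ENTITIES)
```

Because the reference is a Key rather than free text, the link target is unambiguous, and the displayed title can change without the narrative being edited.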


Most of these top-level elements are also hierarchical. A hierarchy for Persons or for Animals is expected (i.e. lineage), and some products already represent a Place-hierarchy where each place is connected to a broader parent Place. STEMMA goes further by giving Events a hierarchical structure, thus allowing longer-term Events to embrace shorter-term Events and so give them a more granular structure to help with the representation of history. Furthermore, it allows Groups to be hierarchical, and Citations to be hierarchical such that, say, two documents from the same collection can share common data along a chain.
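A minimal sketch of such hierarchies, assuming each entity simply records the Key of its broader parent, is shown below. The `Entity` class, the `ancestry` function, and all keys and titles are hypothetical and are used for Places and Events alike.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Entity:
    key: str
    title: str
    parent: Optional[str] = None  # Key of the broader parent, if any

def ancestry(key: Optional[str], entities: Dict[str, Entity]) -> List[str]:
    """Return titles from the given entity up to the top of its hierarchy."""
    chain: List[str] = []
    seen = set()
    while key is not None:
        if key in seen:
            raise ValueError("cycle detected at " + key)
        seen.add(key)
        entity = entities[key]
        chain.append(entity.title)
        key = entity.parent
    return chain

# Invented sample data: a Place hierarchy and an Event hierarchy.
places = {e.key: e for e in [
    Entity("plEngland", "England"),
    Entity("plNotts", "Nottinghamshire", parent="plEngland"),
    Entity("plNottingham", "Nottingham", parent="plNotts"),
]}
events = {e.key: e for e in [
    Entity("eService", "Military service"),
    Entity("eEngagement", "A single engagement", parent="eService"),
]}

place_chain = ancestry("plNottingham", places)
event_chain = ancestry("eEngagement", events)
```

The same traversal serves both cases, which is the practical benefit of giving Events the kind of parent link that Places already have in some products.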


Other noteworthy structural features of STEMMA include:


  • Ability to represent not just biological lineage, or extended family (as in adoptive parents), but arbitrary connections between people who may not be part of the family at all. See The Lineage Trap.
  • Powerful multi-language narrative feature allowing rich-text to be added to all entities, or used for complete research articles. Narrative subsections can be given attributes such as Surety and Sensitivity, and can distinguish information from inference. It can also embed computer-readable links to other entities, including other narrative, using semantic mark-up.
  • Separate syntax to handle all top-level entity definitions (e.g. Event) independently of references to them (e.g. EventLnk). This allows both the definitions and the references to contribute information in a natural contextual way.
  • Inheritance mechanism allowing entity data to be factored out to increase sharing and reduce duplication.
  • Defined methods for declaring extended vocabulary and properties, and for formally extending the core schema.
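Two of the features listed above, inheritance and the separation of definitions from references, can be sketched together. This is a simplified illustration under stated assumptions: the `base` field, the property names, the `role` carried by a reference, and every key are invented, and the real mechanism may differ.

```python
# Hypothetical definitions; "base" names the definition whose data is inherited.
DEFINITIONS = {
    "cCensus": {"properties": {"collection": "Example census",
                               "repository": "Example Archive"}},
    "cPage12": {"base": "cCensus", "properties": {"page": "12"}},
    "cPage13": {"base": "cCensus", "properties": {"page": "13",
                                                  "repository": "Another Archive"}},
    "eBaptism": {"properties": {"type": "Baptism"}},
}

def resolve(key, definitions):
    """Merge properties along the inheritance chain; nearer definitions win."""
    definition = definitions[key]
    base = definition.get("base")
    merged = resolve(base, definitions) if base else {}
    merged.update(definition.get("properties", {}))
    return merged

def describe(link, definitions):
    """Combine what a definition says with what a reference to it adds."""
    combined = resolve(link["key"], definitions)
    combined["role"] = link["role"]  # context supplied by the reference itself
    return combined

# A reference carries its own contextual data alongside the Key it points at,
# loosely mirroring the Event versus EventLnk distinction.
event_link = {"key": "eBaptism", "role": "witness"}

page12 = resolve("cPage12", DEFINITIONS)
witnessed = describe(event_link, DEFINITIONS)
```

Shared data is stated once on the common base, and each reference can add detail that only makes sense in its own context, such as the role a person played in an event.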


Although STEMMA was initially conceived as supporting import/export or long-term storage of data, this quickly became a secondary feature. Its deep level of representation meant that no database-orientated product could adequately index it.[1] However, indexing it into memory, on-the-fly, meant that (a) full and efficient indexing was possible, (b) no import/export was necessary as the definitive source format could be exchanged, and (c) no special consideration was needed for long-term storage or backup of database content. The article Do Genealogists Really Need a Database? argues that reliance on a conventional database is folly: it introduces performance degradation, risk of corruption, and incompatibility between different database vendors or proprietary schemas, and it forces the need to invent other representations for import/export, etc. The Introduction section mentions ‘historical-data document format’, meaning that a STEMMA Document can be shared and transmitted just as a word-processor document or spreadsheet might; you don’t need a database for viewing those!
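The idea of indexing a document into memory on load, rather than importing it into a database, might look something like the following sketch. Apart from the Event/EventLnk naming pattern mentioned earlier, the tag names, the `Key` attribute, and the `build_index` function are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document text; structure and names are illustrative only.
DOCUMENT = """
<Dataset>
  <Person Key="pJoseph">
    <EventLnk Key="eBaptism"/>
  </Person>
  <Person Key="pMary">
    <EventLnk Key="eBaptism"/>
  </Person>
  <Event Key="eBaptism"/>
</Dataset>
"""

def build_index(xml_text):
    """Parse the document once and build two in-memory indexes."""
    root = ET.fromstring(xml_text)
    by_key = {}                        # Key -> top-level entity element
    referenced_by = defaultdict(list)  # Key -> Keys of entities that link to it
    for entity in root:
        by_key[entity.get("Key")] = entity
        for child in entity.iter():
            # Assumption: reference elements are those whose tag ends in "Lnk".
            if child is not entity and child.tag.endswith("Lnk"):
                referenced_by[child.get("Key")].append(entity.get("Key"))
    return by_key, referenced_by

by_key, referenced_by = build_index(DOCUMENT)
```

The indexes are rebuilt from the document each time it is opened, so the document itself remains the only thing that needs to be stored, backed up, or exchanged.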

[1] A simple but demonstrable example is the transcription of Josʰ (with a superscript ‘h’), which is shorthand for Joseph but is invariably transcribed as Josh, and so is incorrect.