Integrating Tree Data into Forensic Genetic Genealogy Workflows

In forensic genetic genealogy investigations, whether you're a law enforcement investigator, a forensic scientist, or a genealogy specialist working an unsolved case, you or someone you work with is likely going to search a genetic genealogy database. When that search returns results, you’ll find yourself looking at matches, sometimes dozens or hundreds. Each of these matches represents a possible clue, a potential path that leads to the unknown person at the center of your investigation.

But matches alone don’t solve cases. They are just the beginning. What comes next is the process of building family trees, mapping out how these genetic matches are related to each other and, ultimately, how they might connect to the person you’re trying to identify. And the moment you start building trees, you’ll encounter a file format that underpins nearly all genealogical data exchange: GEDCOM.

The Othram research team is building new ways to visualize and analyze human relationships that go far beyond traditional family tree diagrams. Our graph-based approach allows us to overlay DNA matches, infer relationships, detect patterns, and streamline casework. You can explore a public preview of this work at maps.othram.com, and we’ll be writing a full post soon about how Othram Maps helps investigators go from raw tree data to actionable intelligence.

From DNA Matches to Trees

When you search a forensic genetic genealogy database, you’re presented with a set of individuals who share DNA with the unknown person you’re investigating. These matches might include estimated relationships, shared centimorgan values, and biogeographical ancestry predictions. But this data doesn’t lead directly to a name. It gives you a set of puzzle pieces that only make sense when placed into a broader genealogical context.

To place those matches in context, you build trees. You identify common ancestors between matches, trace lines of descent, and narrow the focus until you can resolve identity. Sometimes trees are constructed by a single researcher, but more often, tree-building is a collaborative effort shared among investigators, outside contributors, and genealogists. And the format almost universally used to share tree data is GEDCOM.

What’s important (and exciting!) to understand is that traditional tree-building is evolving. Graph structures—augmented with DNA matches, time-stamped records, and contextual metadata—will ultimately power AI-based systems capable of resolving identity rapidly and at scale. But even as we look forward, we must continue to support and understand the legacy data infrastructure that makes today’s work possible. GEDCOM is a key part of that infrastructure.

The Origins of GEDCOM

For decades, genealogists have shared family trees using a simple, text-based file format called GEDCOM. It’s a format that predates modern genomics, the internet as we know it, and even most consumer software. And yet, it continues to power much of the data exchange in genealogy and plays a surprisingly important role in modern forensic workflows.

GEDCOM, short for Genealogical Data Communication, was created in the 1980s by FamilySearch. At the time, the goal was to create a standard for exchanging genealogical data across software programs, many of which used proprietary formats. GEDCOM is structured, portable, and designed to be human-readable. Despite its age, it remains the most widely used format for genealogical data exchange.

A GEDCOM file organizes data as a hierarchy of individuals, families, life events, and relationships. Each person is assigned a unique identifier, and family connections are expressed through references to shared family records. Events like birth and death are recorded with dates and locations. Here’s an example:

0 @I1@ INDI
1 NAME John Ronald Reuel /Tolkien/
1 SEX M
1 BIRT
2 DATE 03 JAN 1892
2 PLAC Bloemfontein, Orange Free State (now South Africa)
1 DEAT
2 DATE 02 SEP 1973
2 PLAC Bournemouth, England
1 FAMS @F1@

This snippet defines a person named John Ronald Reuel Tolkien, his birth and death details, and a link (the FAMS line) to the family record in which he appears as a spouse. That family record, in turn, points to his partner and children.
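The line structure described above (a level number, an optional identifier, a tag, and an optional value) can be read with very little code. The following is a minimal sketch, not a full GEDCOM parser: it assumes well-formed lines, the names GedcomLine and parse_line are illustrative, and the @I1@ identifier in the sample is an assumed record header for the person.

```python
import re
from typing import NamedTuple, Optional

class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]   # e.g. "@I1@", present only on record header lines
    tag: str
    value: str

# level, optional @XREF@, tag, optional value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")

def parse_line(raw: str) -> GedcomLine:
    """Split one GEDCOM line into its four parts."""
    match = LINE_RE.match(raw.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {raw!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")

# The "0 @I1@ INDI" header is an assumption added for illustration.
sample = """0 @I1@ INDI
1 NAME John Ronald Reuel /Tolkien/
1 BIRT
2 DATE 03 JAN 1892
1 FAMS @F1@"""

lines = [parse_line(raw) for raw in sample.splitlines()]
for line in lines:
    print(line)
```

The level numbers are what carry the hierarchy: the level-2 DATE line belongs to the level-1 BIRT line above it, which belongs to the level-0 person record.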

Using GEDCOM in Forensic Casework

In forensic work, it is common to receive GEDCOM files as part of ongoing investigations. These may come from relatives of missing persons, law enforcement agencies, or genealogists assisting with a case. Some trees were built decades ago; others are assembled quickly in support of real-time forensic searches. They often vary in quality and completeness and may include non-standard tags introduced by commercial software.

Despite their inconsistencies, GEDCOM files contain valuable data that can be translated into something more powerful. At Othram, we parse and convert GEDCOM files into graph structures that allow us to visualize relationships, detect redundancies, and layer in additional evidence. Each person becomes a node. Each relationship becomes a connection. Life events and sources become metadata.
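To make the node-and-connection idea concrete, here is a small sketch of one way such a conversion could look. It is an illustration under stated assumptions, not a description of Othram’s implementation: the function gedcom_to_graph, the sample people, and the choice to read only NAME, HUSB, WIFE, and CHIL lines are all hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical three-person tree used only for illustration.
GEDCOM = """\
0 @I1@ INDI
1 NAME Alex /Example/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Blair /Example/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Casey /Example/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
"""

def gedcom_to_graph(text):
    """Return (nodes, parent_edges, spouse_edges) from GEDCOM text."""
    nodes = {}                       # person id -> metadata dict
    families = defaultdict(lambda: {"partners": [], "children": []})
    current = None                   # (record type, record id)
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = int(parts[0])
        if level == 0:
            # Record header: "0 @ID@ TYPE"; other level-0 lines are skipped.
            if len(parts) == 3 and parts[1].startswith("@"):
                current = (parts[2], parts[1])
                if parts[2] == "INDI":
                    nodes[parts[1]] = {}
            else:
                current = None
            continue
        if current is None or level != 1:
            continue
        tag, value = parts[1], parts[2] if len(parts) == 3 else ""
        rec_type, rec_id = current
        if rec_type == "INDI" and tag == "NAME":
            nodes[rec_id]["name"] = value.replace("/", "").strip()
        elif rec_type == "FAM" and tag in ("HUSB", "WIFE"):
            families[rec_id]["partners"].append(value)
        elif rec_type == "FAM" and tag == "CHIL":
            families[rec_id]["children"].append(value)

    parent_edges = []   # directed: (parent, child)
    spouse_edges = []   # undirected: frozenset({a, b})
    for fam in families.values():
        partners = fam["partners"]
        if len(partners) == 2:
            spouse_edges.append(frozenset(partners))
        for parent in partners:
            for child in fam["children"]:
                parent_edges.append((parent, child))
    return nodes, parent_edges, spouse_edges

nodes, parent_edges, spouse_edges = gedcom_to_graph(GEDCOM)
print(nodes)
print(parent_edges)
print(spouse_edges)
```

Note that the family record disappears in the result: it exists in GEDCOM only to tie people together, so in graph form it is replaced by direct connections between the people themselves.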

Once in graph form, the tree can be expanded, refined, and analyzed in new ways. We can test hypotheses, overlay DNA match data, detect pedigree collapse, or automate lineage extension. This approach powers Othram Maps, a lightweight GEDCOM visualizer and tree editor purpose-built for forensic workflows. In the near future, this same infrastructure will support more advanced features including Y-DNA, mtDNA, and autosomal segment data integration.

Understanding Graphs

A graph is simply a network of connected points. In our case, those points are people, and the connections between them represent relationships. A parent-child relationship forms a directed connection. A spousal relationship becomes an undirected link. Rather than a top-down tree, you get a flexible structure that reflects the real-world complexity of human families.

Graphs are powerful because they allow us to move beyond rigid diagrams and into a computational model of family structure. In a graph, we can traverse relationships in any direction: across siblings, cousins, or generations. We can analyze the network’s shape to detect patterns. We can identify duplicate ancestors when multiple users have added the same individual under slightly different names or birth details. We can even find conflicts, like someone being listed as a parent despite being born decades after their child.
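Two of these operations, a sideways traversal and a date conflict check, are easy to sketch once parent-child connections are stored as directed pairs. The example below is illustrative only: the identifiers, birth years, and the MIN_PARENT_AGE threshold are assumptions chosen for the demonstration.

```python
# Hypothetical data: person id -> birth year, and directed (parent, child) edges.
birth_year = {"P1": 1850, "P2": 1990, "C1": 1880, "C2": 1884}
parent_edges = [("P1", "C1"), ("P1", "C2"), ("P2", "C1")]

MIN_PARENT_AGE = 12  # assumed threshold; adjust to your own review policy

def find_parent_age_conflicts(birth_year, parent_edges, min_age=MIN_PARENT_AGE):
    """Return (parent, child, gap) for edges where the parent is implausibly young."""
    conflicts = []
    for parent, child in parent_edges:
        if parent in birth_year and child in birth_year:
            gap = birth_year[child] - birth_year[parent]
            if gap < min_age:
                conflicts.append((parent, child, gap))
    return conflicts

def siblings(person, parent_edges):
    """People who share at least one parent with `person`."""
    parents = {p for p, c in parent_edges if c == person}
    return {c for p, c in parent_edges if p in parents and c != person}

print(find_parent_age_conflicts(birth_year, parent_edges))  # P2 is flagged
print(siblings("C1", parent_edges))
```

Here P2 is listed as a parent of C1 despite being born 110 years later, so that connection is flagged for review rather than silently trusted.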

In collaborative workflows, especially when multiple trees are merged or imported from different sources, it’s common to see duplicate ancestors introduced unintentionally. These might be individuals with slightly misspelled names, varied birthplaces, or incomplete source information. A graph structure makes it easier to detect and collapse these duplicates by identifying when two nodes share identical descendants or occupy overlapping positions in the network. Deduplicating these structures is essential for maintaining clarity in the tree and for avoiding logical errors when evaluating DNA match relationships.
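One simple way to surface such candidates is to combine a structural signal with a fuzzy name comparison. The sketch below is an assumption-laden illustration rather than a production method: it treats two people as possible duplicates when they have the same set of children and similar names, using Python’s standard difflib for the comparison. The names, identifiers, and the 0.8 threshold are hypothetical. The name check matters because two spouses also share identical children.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical merged tree: two imports each added the same father.
names = {
    "A1": "Jon Exampel",
    "A2": "John Example",
    "B1": "Mary Sample",
    "C1": "Ann Example",
}
parent_edges = [("A1", "C1"), ("A2", "C1"), ("B1", "C1")]

def children_of(person, parent_edges):
    """Set of direct children for one person."""
    return frozenset(c for p, c in parent_edges if p == person)

def duplicate_candidates(names, parent_edges, min_similarity=0.8):
    """Pairs with identical, non-empty child sets and similar names."""
    pairs = []
    for a, b in combinations(sorted(names), 2):
        kids_a = children_of(a, parent_edges)
        if not kids_a or kids_a != children_of(b, parent_edges):
            continue
        score = SequenceMatcher(None, names[a].lower(), names[b].lower()).ratio()
        if score >= min_similarity:
            pairs.append((a, b))
    return pairs

print(duplicate_candidates(names, parent_edges))
```

A result like this is best treated as a list of candidates for a human to confirm before merging, since similar names within one family are common.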

Graphs also allow us to model non-linear structures that arise frequently in real genealogical data. Real-world examples include pedigree collapse, where the same ancestors appear more than once in a person’s pedigree because that person’s parents are related (distant cousins, for example), and unknown parentage, where one half of a family structure is intentionally left incomplete. Graphs make it possible to manage and navigate these scenarios with minimal confusion and maximal computational control.
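Pedigree collapse has a direct graph interpretation: an ancestor who can be reached from the same person along more than one line of descent. The following sketch counts those lines. It assumes a small, acyclic pedigree stored as a child-to-parents mapping, and every identifier in it is hypothetical.

```python
from collections import Counter

# Hypothetical pedigree (child -> parents). X's parents, F and M, are first
# cousins: FF and MF are siblings who share the parents G1 and G2.
parents = {
    "X": ["F", "M"],
    "F": ["FF", "FM"],
    "M": ["MF", "MM"],
    "FF": ["G1", "G2"],
    "MF": ["G1", "G2"],
}

def ancestor_path_counts(person, parents):
    """Count the distinct lines of descent from each ancestor to `person`.

    Assumes the pedigree has no cycles; every path is walked explicitly,
    which is fine for a sketch but slow on very deep, heavily collapsed trees.
    """
    counts = Counter()
    stack = [person]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            counts[parent] += 1
            stack.append(parent)
    return counts

def collapsed_ancestors(person, parents):
    """Ancestors reached by more than one line of descent."""
    counts = ancestor_path_counts(person, parents)
    return sorted(a for a, n in counts.items() if n > 1)

print(collapsed_ancestors("X", parents))
```

In this example G1 and G2 each appear twice in X’s pedigree, once through the father’s side and once through the mother’s, which a strict top-down tree diagram cannot draw without duplicating them.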

This structure also enables privacy-aware features. Many forensic reports require redacting the names of living individuals. But not every tree has consistent death dates or living status indicators. In a graph, we can use descendant searches and generational depth to infer who is likely living based on the age of their grandchildren, or who might be misclassified. This helps automate redaction and flag ambiguous cases for further review.
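As a rough illustration of that inference, the sketch below bounds an undated person’s birth year from the birth years of their descendants and then applies a lifespan cutoff. The constants MIN_GENERATION_GAP and MAX_LIFESPAN, the status labels, and the sample data are all assumptions made for this example; a real redaction policy would set its own thresholds.

```python
MIN_GENERATION_GAP = 15   # assumed minimum years between parent and child
MAX_LIFESPAN = 110        # assumed upper bound on a person's age

# Hypothetical tree: parent -> children, with only some birth years known.
children = {"GP": ["P"], "P": ["GC"], "U": ["K"]}
birth_year = {"GC": 1895, "K": 2005}

def latest_possible_birth(person, children, birth_year):
    """Upper bound on a person's birth year, inferred from dated descendants."""
    if person in birth_year:
        return birth_year[person]
    bounds = []
    for child in children.get(person, []):
        child_bound = latest_possible_birth(child, children, birth_year)
        if child_bound is not None:
            # A parent must be born at least one generation gap earlier.
            bounds.append(child_bound - MIN_GENERATION_GAP)
    return min(bounds) if bounds else None

def living_status(person, children, birth_year, as_of_year):
    """Classify a person for redaction purposes."""
    bound = latest_possible_birth(person, children, birth_year)
    if bound is None:
        return "unknown"              # no evidence either way: flag for review
    if as_of_year - bound > MAX_LIFESPAN:
        return "presumed deceased"
    return "possibly living"          # redact unless other evidence exists

for person in ("GP", "U", "Z"):
    print(person, living_status(person, children, birth_year, as_of_year=2025))
```

The check is deliberately one-sided: it can support treating someone as deceased, but a person who passes it is only "possibly living" and should stay redacted by default.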

Just as important as what GEDCOM-powered graphs include is what they do not. GEDCOM files do not natively store DNA information. There is no built-in format for recording shared centimorgan values, triangulated segment boundaries, or match probabilities. While some platforms have introduced custom extensions, such as _DNA_MATCH or _FTDNA, these tags are not part of the official GEDCOM specification and may be lost or corrupted when files are transferred between third-party tools.
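Because user-defined GEDCOM tags conventionally begin with an underscore, it is straightforward to inventory them before a file passes through another tool. The sketch below does only that. The function name custom_tags is illustrative, and the values attached to the custom tags in the sample are placeholders, since the actual payload of these extensions varies by platform.

```python
from collections import Counter

# Sample with placeholder values on the custom tags (their real content
# is platform-specific and not assumed here).
sample = """\
0 @I1@ INDI
1 NAME Alex /Example/
1 _DNA_MATCH placeholder
1 _FTDNA placeholder
1 BIRT
2 DATE 1 JAN 1900
"""

def custom_tags(text):
    """Count tags that start with an underscore, the convention for extensions."""
    counts = Counter()
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 2:
            continue
        # Record headers look like "0 @ID@ TAG"; other lines like "1 TAG value".
        tag = parts[2] if parts[1].startswith("@") and len(parts) > 2 else parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts

print(custom_tags(sample))
```

Running the same inventory on a file before and after a round trip through another program is a quick way to see which extensions were dropped.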

That’s why it’s essential to pair the structured family relationships of GEDCOM with external tools and metadata layers that represent the full scope of forensic analysis. The Othram research team has built systems to accommodate both.
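One way to picture such a metadata layer is a separate structure keyed by the same person identifiers the tree uses, so DNA evidence never has to be squeezed into the GEDCOM file itself. The following is a hypothetical sketch, not a description of Othram’s systems: the DnaMatch class, the overlay_matches function, and the centimorgan values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DnaMatch:
    person_id: str      # identifier of the match, shared with the tree
    shared_cm: float    # shared centimorgans reported by the database

# Hypothetical tree nodes and match list; the cM values are made up.
tree_nodes = {
    "@I1@": {"name": "Alex Example"},
    "@I2@": {"name": "Blair Example"},
}
matches = [DnaMatch("@I1@", 212.0), DnaMatch("@I9@", 48.5)]

def overlay_matches(tree_nodes, matches):
    """Split matches into those placed in the tree and those still unplaced."""
    placed = {m.person_id: m for m in matches if m.person_id in tree_nodes}
    unplaced = [m for m in matches if m.person_id not in tree_nodes]
    return placed, unplaced

placed, unplaced = overlay_matches(tree_nodes, matches)
print(placed)
print(unplaced)
```

Keeping the two layers separate means the tree can still be exported as plain GEDCOM, while the unplaced matches become a worklist for further tree-building.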

Why Graphs Matter in Practice

Every GEDCOM file Othram works with becomes a graph. That graph can be visualized, edited, enriched with forensic data, and used as a foundation for deeper analysis. It’s a flexible platform for building and reasoning through investigative genealogical structures.

Graphs enable logic checks, pattern recognition, and automated discovery, all of which are essential for resolving identity quickly and accurately. As we move toward faster, more scalable solutions for forensic genetic genealogy, graph-based representations of GEDCOM data—enhanced with DNA, demographic, and temporal evidence—will be the engine behind AI-assisted resolution of even the most complex cases.

The Othram research team is building tools that make GEDCOM files more usable and extensible, and that integrate them as one data source within a larger system for identity resolution. You don’t have to be an expert in the GEDCOM format, but you do need systems that can interpret it, convert it, and extract value from it.

If you’d like to see what your GEDCOM file looks like as a graph, and preview what you can do with graph representations, you can try it right now at maps.othram.com.