Q scores are a key metric of data quality in DNA sequencing. Introduced in the 1990s for use with automated Sanger sequencing, Q scores provide a quantitative way to measure the confidence in sequenced DNA bases. While these scores remain important in certain genomic applications, their relevance in forensic genetic genealogy is more nuanced. Let’s dig into what Q scores represent and what they mean for forensic genetic genealogy.
A Brief History of Q Scores: Phred, PHRAP, and Consed
The concept of Q scores originated with the development of Phred, a program designed to interpret the fluorescence intensity plots generated by early automated sequencers that used the Sanger sequencing method. Before Phred, scientists relied on manual interpretation of radioactively labeled bands on gels—a method that was both time-consuming and prone to error.
Phred introduced the idea of base quality scores, now known as Q scores, to quantify the accuracy of each base call. These scores are logarithmic: each 10-point increase represents a tenfold reduction in the probability of an incorrect base call. The introduction of Q scores was a game-changer, allowing for more reliable sequencing and facilitating the rise of large-scale genomic projects.
For high-quality Sanger sequencing data, the typical accuracy is around 99.4%, which corresponds to a Q score of about Q22, within the Q20 to Q30 range. Across that range, the chance of an incorrect call runs from 1 in 100 (Q20) to 1 in 1,000 (Q30); at 99.4% accuracy it is roughly 1 in 170.
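The relationship between a Q score and the chance of an incorrect base call follows the standard Phred definition, Q = -10 × log10(P), where P is the error probability. The short Python sketch below is illustrative only; the function names are our own and are not part of Phred or any sequencing vendor's software.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred quality score Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Print error probability and accuracy for a few common Q scores.
for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.5f}, accuracy {100 * (1 - p):.3f}%")

# An accuracy of 99.4% means an error probability of 0.006.
print(f"99.4% accuracy is about Q{error_prob_to_phred(0.006):.1f}")
```

Because the scale is logarithmic, moving from Q20 to Q30 cuts the expected error rate from 1 in 100 to 1 in 1,000, and moving from Q30 to Q40 cuts it again to 1 in 10,000.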
Phred was soon complemented by PHRAP, a program that assembled these high-quality reads into contigs (contiguous sequences) by aligning overlapping reads. To visualize and edit these assemblies, scientists used a tool called Consed, which provided a graphical interface for reviewing sequence data and adjusting aligned contigs based on quality scores. Together, Phred, PHRAP, and Consed formed a powerful toolkit that made it possible to scale genome sequencing projects with Sanger sequencing technology.
Q Scores Today
Fast forward to today, and Sanger sequencing has largely been displaced by newer sequencing methods, most notably Illumina’s Sequencing by Synthesis (SBS) technology. These newer methods, often referred to as Massively Parallel Sequencing (MPS), have enabled fast and cost-effective analysis of entire genomes.
Although sequencing technology has changed over the last several decades, the Q score framework introduced during the Sanger sequencing era continues to be used by Illumina and others in the market. Illumina, the current market leader in MPS, advertises sequencing capabilities in which more than 85% of the bases achieve a Q30 or higher score. Remember, Q30 indicates a 99.9% confidence in a base call. Now, newer chemistries and platforms are promising even higher Q scores, but does it matter?
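To make a figure like "percent of bases at Q30 or higher" concrete, here is a minimal Python sketch of how it could be computed for a single read. It assumes the read's qualities are stored as a FASTQ quality string in the common Phred+33 ASCII encoding; the function names and the example quality string are hypothetical and purely illustrative, not real sequencing data or any vendor's tooling.

```python
def decode_phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 ASCII encoding) into Q scores."""
    return [ord(ch) - 33 for ch in quality_string]


def fraction_at_or_above(q_scores: list[int], threshold: int = 30) -> float:
    """Return the fraction of bases whose Q score meets or exceeds the threshold."""
    if not q_scores:
        return 0.0
    return sum(q >= threshold for q in q_scores) / len(q_scores)


# Hypothetical quality string for a 10-base read (illustrative, not real data).
example_quality = "IIIIIFFF5#"
scores = decode_phred33(example_quality)
print(scores)
print(f"Fraction of bases at Q30 or higher: {fraction_at_or_above(scores):.2f}")
```

In practice this fraction is reported across an entire run rather than a single read, but the per-base arithmetic is the same.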
There is a place for higher quality scores like Q40 and Q50. For example, in fields like oncology and rare disease research, where detecting low-frequency variants is critical, the ability to trust base calls is paramount, and investigators seek the highest scores possible for maximum confidence. However, in forensic genetic genealogy, where the focus is on common variants, Q40+ scores have less impact.
Sequencing Depth is More Important in Forensics
Forensic genetic genealogy relies on common Single Nucleotide Polymorphisms (SNPs) to build SNP profiles that can be matched against public or private databases. Historically, these databases were populated by SNP profiles developed using microarray technology, specifically the Illumina Global Screening Array (GSA). For common variants like those found on the GSA microarray, the value of pushing Q scores beyond Q30 is negligible compared to improving other aspects of the sequencing process, such as sequencing depth.
Sequencing depth refers to the number of times a particular base is read during the sequencing process. For forensic genetic genealogy in particular, achieving sufficient sequencing depth is far more critical than maximizing base quality scores. This is especially true when it comes to accurately detecting heterozygous variants, which are positions in the genome where the two alleles differ. Higher sequencing depth makes it more likely that both alleles are sufficiently represented in the data, reducing the risk of missing a heterozygous call.
Accurate detection of these heterozygous variants is crucial because when SNP profiles are searched in genetic genealogy databases, they are often compared using Identity-by-Descent (IBD) segment matching algorithms. IBD segments are stretches of DNA shared between individuals that indicate common ancestry, and, as it turns out, these segments are generally terminated by opposite homozygous calls, where one individual is homozygous for one allele and the other is homozygous for the alternative allele. If a true heterozygous variant is miscalled as homozygous, it can introduce a false mismatch that breaks or shortens a real IBD segment, leading to incorrect conclusions about genetic relationships.
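The link between sequencing depth and heterozygous detection can be illustrated with a simple binomial model. The Python sketch below assumes an idealized heterozygous site where each read samples either allele with probability 0.5, with no allelic bias or sequencing error, and it treats "at least 2 reads per allele" as a hypothetical detection threshold. Real variant callers use more sophisticated models, so the numbers are illustrative only.

```python
from math import comb


def prob_both_alleles_seen(depth: int, min_reads: int = 2) -> float:
    """Probability that each allele at a heterozygous site is covered by at
    least `min_reads` reads, assuming each read samples either allele with
    probability 0.5 (an idealized model with no bias or sequencing error)."""
    if depth < 2 * min_reads:
        return 0.0
    # Count outcomes where the first allele appears k times and the second
    # appears depth - k times, with both counts at or above min_reads.
    favorable = sum(comb(depth, k) for k in range(min_reads, depth - min_reads + 1))
    return favorable / 2 ** depth


for depth in (4, 8, 15, 30):
    p = prob_both_alleles_seen(depth)
    print(f"depth {depth:>2}: P(both alleles seen at least twice) = {p:.4f}")
```

Under these assumptions, the chance of capturing both alleles rises quickly with depth, which is why a low-depth profile is more likely to report a true heterozygous site as homozygous.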
DNA sequencing serves various applications, each with specific metrics to evaluate its performance. When selecting a sequencing method and its associated metrics, it’s essential to begin with the end goal in mind: What are we trying to achieve? In forensic genetic genealogy, the objective is clear—we need to create ultra-sensitive profiles optimized for detecting distant genetic relationships. If you aren't aiming to detect all relatives, you are doing it wrong. Inadequate, incomplete, or inaccurate DNA profiles can severely compromise the effectiveness of downstream genetic genealogy, making it difficult or even impossible to resolve complex cases.
If you are not ready to onboard this new technology in your own forensic setting yet, come to Othram. Our team operates the world's first purpose-built forensic laboratory for forensic genetic genealogy. We developed Forensic-Grade Genome Sequencing® or FGGS® to enable ultra-sensitive detection of distant relationships. It's part of our Multi Dimensional Forensic Intelligence (MDFI) platform.
More forensic genetic genealogy cases have been solved with Othram FGGS® than any other method. Let’s work together to unlock answers and bring justice to those who need it most.