DNA sequencing is the process of determining the order of nucleotides (A, T, C, and G) in a DNA molecule. This information is essential for understanding the genetic basis of life, health, and disease. However, not all DNA sequencing methods are the same. Depending on the technology used, a single sequencing read can be anywhere from under a hundred to hundreds of thousands of bases long. This difference has important implications for the accuracy and completeness of the genomic information that can be obtained.
Long-read sequencing allows scientists to sequence DNA fragments that are thousands of nucleotides or more in length. Unlike short-read sequencing, which produces reads typically 50–600 base pairs long, long-read sequencing captures a much longer stretch of contiguous sequence in each read. Here are some key points about long-read sequencing:
- Longer DNA Fragments: Long-read sequencing generates individual reads from single DNA molecules. These reads can span from 1,000 to 20,000 bases or more.
- Complex Structural Variants: While short-read sequencing is excellent for capturing most genetic variation, long-read sequencing excels at detecting complex structural variants. These variants include large inversions, deletions, or translocations that may be challenging to identify using short reads.
Illumina Complete Long Read Sequencing
One long-read sequencing option is Illumina Complete Long Read sequencing. Its key features are:
- Read Length: Illumina Complete Long Read technology produces contiguous long-read sequences with an N50 of 5–7 kb (that is, half of all sequenced bases lie in reads at least that long), with some reads exceeding 10 kb.
- Workflow Compatibility: It runs on NovaSeq systems, giving access to both long- and short-read data on a single instrument.
- DRAGEN Analysis: Data are analyzed with the DRAGEN secondary-analysis platform, and analysis apps are available through BaseSpace Sequence Hub.
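N50 is a length-weighted summary of read lengths: sort the reads from longest to shortest and walk down the list until the running total reaches half of all sequenced bases; the length of the read at that point is the N50. The Python sketch below computes it. The function name `n50` and the example read lengths are illustrative assumptions, not Illumina output or an Illumina API.

```python
from itertools import accumulate


def n50(read_lengths: list[int]) -> int:
    """Return the N50 of a set of read lengths (in bases).

    N50 is the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases.
    """
    if not read_lengths:
        raise ValueError("need at least one read length")
    lengths = sorted(read_lengths, reverse=True)  # longest reads first
    total = sum(lengths)
    # accumulate() yields the running base count as we move down the sorted list.
    for length, running_total in zip(lengths, accumulate(lengths)):
        if 2 * running_total >= total:  # reached half of all bases
            return length
    raise AssertionError("unreachable: the running total always reaches the full total")


# Hypothetical read lengths in bases (35,000 bases in total).
reads = [2_000, 3_000, 5_000, 6_000, 7_000, 12_000]
print(n50(reads))  # 7000: the 12 kb and 7 kb reads alone hold 19,000 of 35,000 bases
```

Because N50 is weighted by bases rather than by read count, a large number of very short reads pulls it down far less than it pulls down the mean read length.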
Short Read Sequencing: The current standard
The most widely used DNA sequencing technology today is short read sequencing (SRS), which typically generates reads 150 bp long. SRS is supported by a wide range of platforms, dominated by Illumina’s fleet of instruments, and is valued for its high accuracy and relative cost-effectiveness. SRS has been the gold standard for genetic profiling for nearly two decades and has enabled many breakthroughs in genomics, such as the 1000 Genomes Project and The Cancer Genome Atlas. (The earlier Human Genome Project relied on Sanger sequencing.)
However, SRS has some limitations that prevent it from capturing the full spectrum of human genetic variation. SRS depends on aligning short reads to a reference genome or assembling them into longer contigs from their overlaps, and both steps break down when a read is shorter than the repeated sequence it comes from, because the read then matches several locations equally well. As a result, complex, repetitive, or low-complexity regions of the genome are difficult to resolve. These regions include many clinically relevant protein-coding genes, such as those in segmental duplications, tandem repeats, or GC- or AT-rich DNA.
SRS also has difficulty detecting larger and more complex forms of genetic variation, such as structural variants (SVs): events larger than about 50 bp that involve insertions, deletions, inversions, duplications, or translocations of DNA segments. By some estimates, SVs account for more differing base pairs between two individuals' genomes than single-nucleotide variants do, and they are implicated in many diseases and traits. Moreover, SRS workflows amplify sequencing templates (by PCR during library preparation and by clonal amplification on the flow cell), which can introduce artefacts and bias. Because amplified copies do not carry the original base modifications, SRS cannot directly detect native modifications such as methylation, which can affect gene expression and function.
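The placement problem can be illustrated with a toy example. In the Python sketch below, exact string matching stands in for a real read aligner, and the reference and reads are made-up sequences: a read that lies entirely inside a tandem repeat matches several positions equally well, while a read that spans the repeat into unique flanking sequence matches exactly once.

```python
def count_occurrences(reference: str, read: str) -> int:
    """Count every (possibly overlapping) position where `read` matches `reference` exactly."""
    return sum(
        1
        for i in range(len(reference) - len(read) + 1)
        if reference[i : i + len(read)] == read
    )


# Toy reference: unique flank + 10 tandem copies of "CAG" (30 bp) + unique flank.
left_flank = "TTGACCGTAA"
right_flank = "GGCTTAACGT"
reference = left_flank + "CAG" * 10 + right_flank

short_read = reference[13:25]  # 12 bp lying entirely inside the repeat
long_read = reference[5:45]    # 40 bp spanning the whole repeat plus both flanks

print(count_occurrences(reference, short_read))  # 7: ambiguous placement
print(count_occurrences(reference, long_read))   # 1: uniquely placed
```

Real aligners tolerate mismatches and report a mapping quality rather than a raw count, but the underlying ambiguity is the same: a read shorter than a repeat cannot say which copy it came from.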
Long Read Sequencing: The next generation
Long read sequencing (LRS) is a DNA sequencing technology that enables the sequencing of much longer DNA fragments than SRS, ranging from thousands to hundreds of thousands of bases. LRS is also known as third-generation sequencing, and its two main platforms come from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio). LRS offers several advantages over SRS:
- The ability to span complex and repetitive regions of the genome, and to resolve haplotypes (the combination of alleles carried on each of the two copies of a chromosome).
- The ability to detect and characterize SVs with higher sensitivity and specificity, and to discover novel SVs that are missed by SRS.
- The ability to sequence DNA without PCR amplification, and to directly detect base modifications, such as methylation, without chemical conversion.
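Haplotype resolution follows directly from read length: a single read that covers two heterozygous sites shows which alleles sit on the same chromosome copy. The Python sketch below is a deliberately simplified, hypothetical version of read-based phasing: each read is reduced to the alleles it shows at known heterozygous positions, and the most common allele pair among reads that cover both sites is taken as one haplotype. Dedicated phasing tools handle sequencing errors and many sites at once; the positions, reads, and function name here are invented for illustration.

```python
from collections import Counter

# A read is reduced to {genomic position: observed base} at the
# heterozygous sites it covers (a simplification for this sketch).
Read = dict[int, str]


def phase_pair(reads: list[Read], site_a: int, site_b: int) -> tuple[tuple[str, str], int]:
    """Return the most common allele pair at two heterozygous sites and the
    number of informative reads, i.e. reads that cover both sites."""
    pairs = Counter(
        (read[site_a], read[site_b])
        for read in reads
        if site_a in read and site_b in read
    )
    if not pairs:
        raise ValueError("no read covers both sites, so their phase is unknown")
    best_pair, _count = pairs.most_common(1)[0]
    return best_pair, sum(pairs.values())


# Hypothetical heterozygous sites 8 kb apart: A/G at 1,000 and C/T at 9,000.
short_reads = [{1_000: "A"}, {1_000: "G"}, {9_000: "C"}, {9_000: "T"}]
long_reads = [
    {1_000: "A", 9_000: "T"},
    {1_000: "G", 9_000: "C"},
    {1_000: "A", 9_000: "T"},
    {1_000: "A", 9_000: "T"},
    {1_000: "G", 9_000: "C"},
]

# A and T share one copy; since both sites are heterozygous, G and C share the other.
print(phase_pair(long_reads, 1_000, 9_000))  # (('A', 'T'), 5)

try:
    phase_pair(short_reads, 1_000, 9_000)
except ValueError as err:
    print(err)  # no read covers both sites, so their phase is unknown
```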
LRS has evolved rapidly in recent years: its accuracy, throughput, and portability have improved while its costs have fallen. LRS has been applied in various research and clinical settings, including:
- Genome assembly: LRS can generate high-quality and complete genome assemblies for humans and other organisms, revealing novel genomic features and resolving gaps and errors in existing reference genomes; long reads were central to the first gapless telomere-to-telomere human assembly (T2T-CHM13).
- Genetic diagnosis: LRS can identify and characterize causal variants for rare and complex genetic disorders, especially those involving SVs or repeat expansions, such as Huntington’s disease and Fragile X syndrome.
- Cancer genomics: LRS can reveal the complex and dynamic genomic alterations that occur in cancer cells, such as chromosomal rearrangements, copy number variations, and microsatellite instability, and can inform personalized treatment strategies.
- Infectious diseases: LRS can provide rapid and accurate identification and characterization of pathogens, such as bacteria, viruses, and parasites, and can monitor their evolution, transmission, and drug resistance.
- Epigenomics: LRS can detect and quantify base modifications, such as methylation, and can reveal their impact on gene regulation and function in health and disease.
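Repeat expansions show the read-length limit concretely. Huntington’s disease is caused by an expanded CAG repeat in the HTT gene, and sizing such an expansion requires a read that spans the whole repeat. The Python sketch below counts the longest uninterrupted run of a repeat unit in a read; the function name and sequences are hypothetical, and real repeat-genotyping tools also handle sequencing errors and interruptions within the repeat.

```python
import re


def longest_repeat_run(read: str, unit: str = "CAG") -> int:
    """Return the copy number of the longest uninterrupted tandem run of `unit` in `read`."""
    pattern = f"(?:{re.escape(unit)})+"  # one or more back-to-back copies of the unit
    runs = re.findall(pattern, read.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Made-up flanks around an expanded repeat of 60 CAG copies (180 bp).
left_flank, right_flank = "TTGACCGTAA", "GGCTTAACGT"
long_read = left_flank + "CAG" * 60 + right_flank  # 200 bp read spanning the repeat
short_read = long_read[13:163]                     # 150 bp read lying inside the repeat

print(longest_repeat_run(long_read))   # 60: true size, bounded by unique flanks on both sides
print(longest_repeat_run(short_read))  # 50: only a lower bound; the read never leaves the repeat
```

A 150 bp read holds at most 50 CAG copies, so on its own it can only show that an allele has at least that many; a longer allele has to be sized from a read that reaches unique sequence on both sides.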
Conclusion
Long read sequencing is a powerful and promising technology that can provide more complete genomic information than short read sequencing, and more accurate information in repetitive regions. LRS can reveal much more of the spectrum of human genetic variation, and can lead to the discovery of novel disease mechanisms and new therapeutic targets. Portable nanopore devices with real-time analysis can also enable point-of-care testing and health care in remote settings. As LRS becomes more accessible and affordable, it is likely to find wider use in the clinic and to change how we study and understand the genome.