Long-read Whole Genome Sequencing


This article describes the use of PacBio HiFi technology for whole-genome sequencing of humans, plants, animals, or microorganisms.

Overview

Human, plant, animal, and microbial genomes contain a large number of genetic variations and somatic mutations. Genetic variation is associated with disease risk and with phenotypic differences between individuals. In recent years, whole-exome sequencing (WES) has been a popular method for identifying the genetic basis of disease in research and clinical settings. However, because the exome is limited to protein-coding regions (approximately 1.5% to 2% of the genome), WES cannot account for variation outside this narrow subset of genetic information. Whole-genome sequencing (WGS) based on traditional short-read technologies allows comprehensive testing and uncovers most types of clinically relevant genetic variants. Despite these benefits, short-read sequencing has inherent technical limitations and cannot unambiguously resolve certain variant types: structural variants whose breakpoints must be located precisely, expanded short tandem repeats whose size must be measured exactly, and the phasing of potentially compound heterozygous mutations.

Long-read sequencing technologies, namely PacBio SMRT sequencing and Nanopore sequencing, can be used for whole-genome sequencing of individuals or populations, with bioinformatic analysis at either level. They allow comprehensive detection of single nucleotide variants (SNVs), small insertions and deletions (InDels), copy number variations (CNVs), and large structural variations (SVs) in the genome. Long-read whole-genome sequencing is useful for studying genetic and rare diseases associated with structural variation, for diagnosis, and for developing targeted therapies and drugs to halt disease progression. The technique can also detect epigenetic modifications.
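To make the variant classes above concrete, here is a minimal Python sketch that sorts a variant into SNV, InDel, or SV from its reference and alternate alleles. The 50 bp cutoff between InDels and SVs is a commonly used convention and an assumption here, not something the article specifies. The function name is hypothetical, and CNVs are left out because they are usually called from read depth rather than from a single allele pair.

```python
# Assumed convention (not from the article): events of 50 bp or more are SVs.
SV_MIN_LENGTH = 50


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by comparing reference and alternate allele lengths."""
    if not ref or not alt:
        raise ValueError("ref and alt must be non-empty strings")
    if ref == alt:
        raise ValueError("ref and alt are identical, so there is no variant")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same length but more than one base: a multi-nucleotide substitution.
        return "MNV"
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    if size >= SV_MIN_LENGTH:
        return f"SV ({kind})"
    return f"InDel ({kind})"


print(classify_variant("A", "G"))             # SNV
print(classify_variant("A", "ATG"))           # InDel (insertion)
print(classify_variant("A" + "C" * 60, "A"))  # SV (deletion)
```

Real variant callers report SVs with dedicated fields rather than full allele strings, so this is only an illustration of the size-based distinction.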


Somatic SVs in liver cancers. (Fujimoto et al., 2021)

What is Whole Genome Sequencing (WGS)?

Whole-genome sequencing is the analysis of an organism's entire genome (whether human, plant, animal, or microbial), including both the intronic and exonic regions of its DNA. Long-read whole-genome sequencing performs this analysis with highly accurate long reads.

Advantages of PacBio HiFi Whole Genome Sequencing

Accurate long-read and unbiased coverage

HiFi sequencing delivers complete and accurate whole-genome sequences from unbiased single-molecule reads up to 25 kb long with 99.9% median read accuracy, and it can sequence through repeats and GC-rich regions.
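To put the accuracy figure in perspective, the sketch below converts a per-base accuracy into a Phred quality score using the standard relation Q = -10 * log10(1 - accuracy), and estimates the expected number of errors in a read of a given length. The function names are illustrative, and treating the median read accuracy as a uniform per-base accuracy is a simplifying assumption.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (strictly between 0 and 1) to a Phred score."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases, assuming uniform per-base accuracy."""
    return read_length * (1.0 - accuracy)


print(round(accuracy_to_phred(0.999), 1))        # 30.0
print(round(expected_errors(25_000, 0.999), 1))  # 25.0
```

Under these assumptions, 99.9% accuracy corresponds to roughly Q30, or about 25 expected errors across a 25 kb read.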

Complete, contiguous, correct, and phased genome assemblies

HiFi read length and accuracy help resolve even the most difficult repeats (such as segmental duplications and centromeres), resulting in assemblies with longer contigs, fewer fragmented genes, and correct sequence across all regions.
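Assembly contiguity of the kind described above is commonly summarized with the N50 statistic: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The following is a minimal sketch; the contig lengths are made-up toy values, not results from any real assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total past half is the N50.
        if running >= half_total:
            return length


# Two toy assemblies with the same total size (1000 bases).
fragmented = [100] * 10
contiguous = [700, 200, 100]

print(n50(fragmented))  # 100
print(n50(contiguous))  # 700
```

A higher N50 for the same total size indicates that more of the genome sits in long contigs, although N50 alone says nothing about whether the sequence is correct.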

Highest precision and recall for variant calling

HiFi sequencing combines high precision with high recall when identifying all types of variants, including SNVs, indels, and SVs.
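Precision and recall are computed by comparing variant calls against a trusted truth set. The sketch below shows the standard definitions; the counts are hypothetical and chosen only to illustrate the arithmetic, not taken from any benchmark.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 900 calls match the truth set, 100 calls are spurious,
# and 300 true variants were missed.
precision, recall, f1 = precision_recall_f1(tp=900, fp=100, fn=300)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.900 recall=0.750 f1=0.818
```

Precision answers "how many reported variants are real", while recall answers "how many real variants were reported"; a caller needs both to be high to be useful.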

Detection Range of PacBio HiFi Whole Genome Sequencing

With extremely accurate long reads and unbiased coverage, PacBio HiFi Sequencing provides a truly comprehensive view and is currently widely used for whole genome sequencing of humans, plants, animals, or microorganisms.

Human Whole Genome Sequencing

The complexity of the human genome, with its tens of thousands of genes and their introns, exons, and regulatory sequences, requires a sequencing method that captures every detail. PacBio HiFi rises to the challenge to help researchers better understand disease, identify genetic predisposition, and pave the way for personalized medicine.

Plant and Animal Whole Genome Sequencing

Plants and animals show huge genetic diversity. For work ranging from understanding crop genetics and improving yields to studying animal evolution, PacBio HiFi provides contiguous, complete, and correct de novo assemblies of even the most complex genomes.

Microbial Whole Genome Sequencing

Microorganisms often possess repeat-dense and GC-rich genomes, which are difficult to assemble completely with traditional short-read methods. PacBio HiFi reads can yield closed chromosomes and plasmids, providing researchers with detailed insights into microbial genetics, behavior, and evolution.

Applications of PacBio HiFi Whole Genome Sequencing

Unparalleled Variant Detection

The fidelity of genetic variant detection fundamentally determines the utility of genomic data. With PacBio HiFi sequencing, one can confidently catalog a plethora of genetic variants, from single nucleotide variants (SNVs) to intricate structural rearrangements.

Historically, variant detection was marred by short-read technologies' limitations, especially when discerning regions with high homology or low complexity. PacBio, however, with its single-molecule long reads, provides an unobstructed view of such intricate genomic landscapes. For instance, a recent study employing PacBio HiFi sequencing was able to detect rare structural variants in a clinically relevant gene, which conventional short-read sequencing had missed.

De Novo Sequencing

De novo sequencing, or sequencing without a reference genome, has always been a challenging endeavor. With the ability to produce HiFi reads up to 25 kilobases in length, PacBio has revolutionized this space. Complex genomes, laden with repetitive regions and intricate structural variants, can now be efficiently and affordably assembled.

As one example, a study of a particularly difficult plant genome reported that PacBio HiFi sequencing could reduce assembly ambiguities by over 90% compared to traditional methods. This allows researchers to reconstruct genomes with high accuracy, so that even small genetic differences are captured.

Decoding Genome Structure with Precision

Understanding the genome's structural intricacies is essential for various genomic endeavors, from evolutionary biology to disease research. PacBio HiFi sequencing, with its remarkable ability to generate long reads, can traverse even the most repetitive and complex regions of the genome.

In one exemplary study, researchers leveraged this technology to elucidate the genome structure of an ancient plant species. Where previous methods provided fragmented insights, PacBio offered a contiguous and holistic view of the genome. Such granular insights pave the way for a deeper understanding of genome evolution and functional genomics.

Epigenetics

The realm of epigenetics concerns itself with heritable changes that don't alter the DNA sequence but significantly impact gene expression. Remarkably, PacBio HiFi sequencing is not just limited to reading the base sequence. It is adept at capturing epigenetic modifications during a routine sequencing run.

Without necessitating specialized chemistry or sample preparation, researchers can now simultaneously glean information about DNA methylation patterns alongside the genetic sequence. A case in point is a recent study on a human cell line where PacBio's technology was pivotal in identifying critical epigenetic modifications associated with disease progression.
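As an illustration of how such methylation information can be consumed downstream, the sketch below assumes the per-base modification probabilities are stored as SAM/BAM `ML` tag values, where each integer N from 0 to 255 represents the probability interval from N/256 to (N+1)/256. That storage format is an assumption about the output files and is not described in the article. The function names, the interval-midpoint choice, and the 0.5 threshold are hypothetical, and the input values are made up.

```python
def ml_to_probability(ml_value: int) -> float:
    """Map an ML tag integer (0-255) to the midpoint of its probability interval."""
    if not 0 <= ml_value <= 255:
        raise ValueError("ML values must be integers in the range 0-255")
    return (ml_value + 0.5) / 256


def fraction_methylated(ml_values, threshold: float = 0.5) -> float:
    """Fraction of sites whose modification probability meets the threshold."""
    if not ml_values:
        raise ValueError("ml_values must not be empty")
    calls = [ml_to_probability(v) >= threshold for v in ml_values]
    return sum(calls) / len(calls)


# Made-up ML values for eight CpG sites on one read.
example_ml = [250, 240, 12, 3, 200, 130, 20, 255]
print(fraction_methylated(example_ml))  # 0.625
```

In practice the `ML` values are read together with the `MM` tag, which records which bases each probability refers to; that pairing is omitted here to keep the sketch short.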

References

  1. Fujimoto, Akihiro, et al. "Whole-genome sequencing with long reads reveals complex structure and origin of structural variation in human genetic variations and somatic mutations in cancer." Genome Medicine 13.1 (2021): 65.
  2. Kaplun, Ludmila, et al. "ONT long-read WGS for variant discovery and orthogonal confirmation of short read WGS derived genetic variants in clinical genetic testing." Frontiers in Genetics 14 (2023): 1145285.