Long-read sequencing, also called third-generation sequencing, is a DNA sequencing technique, still under active research, that can determine the nucleotide sequence of long stretches of DNA, typically between 10,000 and 100,000 base pairs at a time. This removes the need to cut DNA into short fragments and then amplify it, as most other DNA sequencing techniques require.
History of DNA sequencing
One of the most basic forms of DNA sequencing is Sanger sequencing. This method can sequence relatively small fragments of DNA of up to about 900 base pairs. The target DNA is copied many times, producing fragments of varying lengths, each with a fluorescent tag on one end. Ordering these tagged fragments by length reveals the exact sequence of the original DNA.
The more modern forms of DNA sequencing are called next-generation sequencing. These techniques are faster, cheaper and can much more efficiently determine long DNA sequences compared to Sanger sequencing. This is achieved through high-throughput analysis of many different DNA fragments at once.
These DNA fragments tend to range from 50-700 base pairs in length, but the techniques used can determine DNA sequences made up of millions of base pairs.
Long-read sequencing, sometimes also called third-generation sequencing, is a very recent DNA sequencing technique that can read much longer DNA fragments at a time. Reads normally range from 10,000 to 100,000 base pairs, but reads of 1-2 million base pairs have also been demonstrated.
How does long-read sequencing work?
Long-read sequencing has been described as solving a jigsaw puzzle with large pieces. The DNA fragments produced in this technique are easier to assemble into a complete DNA sequence than in other sequencing techniques.
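The jigsaw comparison can be made concrete with some simple arithmetic. The sketch below estimates how many "pieces" (reads) are needed to cover a genome once. It is an idealised calculation only: the 5 million base pair genome and the function name reads_needed are assumptions made for illustration, and real sequencing needs more reads because they land at random positions and overlap.

```python
def reads_needed(genome_length: int, read_length: int, coverage: int = 1) -> int:
    """Minimum number of reads of a fixed length that cover the genome `coverage` times."""
    total_bases = genome_length * coverage
    return -(-total_bases // read_length)  # ceiling division without floats


genome_length = 5_000_000  # hypothetical bacterial genome, in base pairs

for label, read_length in [("short reads (150 bp)", 150),
                           ("long reads (10,000 bp)", 10_000)]:
    print(label, reads_needed(genome_length, read_length))
```

Under these assumptions the short-read puzzle has tens of thousands of pieces while the long-read puzzle has a few hundred, which is why the long-read version is easier to put together.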
Two main long-read sequencing technologies are used in scientific research: Oxford Nanopore sequencing, and PacBio single-molecule real-time (SMRT) sequencing. These technologies use different methods, but both are capable of sequencing long lengths of DNA.
Nanopore sequencing measures changes in ionic current as single-stranded DNA fragments move through nanopores, which are very small pore-forming proteins embedded within a membrane. Different DNA sequences produce different levels of resistance as they pass through these pores, so the exact nucleotide sequence can be determined.
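The idea of reading a sequence from current measurements can be illustrated with a deliberately simplified toy model. In the sketch below, each base is assumed to produce one characteristic current level, and every measurement is assigned to the closest level. The current values and the names LEVELS and decode are hypothetical. Real nanopore signals depend on several neighbouring bases at once and are decoded by trained statistical models, so this is not how production basecalling software works.

```python
# Hypothetical mean current level (picoamps) for each base. These numbers are
# invented for illustration and do not come from any real nanopore device.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def decode(signal):
    """Assign each current measurement to the base with the closest level."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - value))
        for value in signal
    )


# Five noisy measurements, one per base passing through the pore.
signal = [81.2, 123.9, 96.4, 108.7, 79.1]
print(decode(signal))
```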
SMRT sequencing works by detecting different levels of fluorescence that are generated when a target DNA sequence is replicated with modified nucleotides. This occurs in a series of wells and is limited by the quality of the DNA polymerase in use.
Advantages of long-read sequencing
Long-read sequencing has several distinct advantages compared to next-generation sequencing technologies.
One of the major advantages is that long-read sequencing can much more accurately sequence DNA containing repeats, where the same sections of DNA are repeated within the genome. Sanger sequencing and next-generation sequencing often struggle with these repeats when assembling their DNA fragments.
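Why repeats cause trouble for short fragments can be shown with a small sketch. The reference sequence, the reads, and the function count_placements below are all made up for illustration. The code counts how many positions in a reference a read could have come from: a read that lies entirely inside a repeat fits in several places, while a read long enough to include the unique sequence on both sides fits in only one.

```python
def count_placements(reference: str, read: str) -> int:
    """Count every position (overlaps included) where the read matches exactly."""
    count = 0
    start = reference.find(read)
    while start != -1:
        count += 1
        start = reference.find(read, start + 1)
    return count


# Hypothetical reference: unique flank + five copies of a repeat + unique flank.
reference = "GATTACCA" + "ACGTTGCA" * 5 + "TTGGCCAT"

short_read = "ACGTTGCAACGT"   # lies entirely inside the repeat
long_read = reference[4:52]   # spans the whole repeat plus part of both flanks

print("short read placements:", count_placements(reference, short_read))
print("long read placements:", count_placements(reference, long_read))
```

An assembler cannot tell which of the several placements of the short read is correct, whereas the long read is anchored by the unique sequence at each end.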
These repeats, or copy number variations, are much easier to detect with long-read sequencing, which matters clinically. For example, in Huntington’s disease, the copy number of the DNA sequence ‘CAG’ dictates whether a person is likely to develop the disease. Determining this copy number can have large implications for the diagnosis or prediction of genetic disease.
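Once a single read spans the whole repeat region, counting the copies is straightforward. The following is a minimal sketch, not a clinical method: the read is invented, the function name longest_cag_run is hypothetical, and it assumes an error-free read in which the CAG units are uninterrupted.

```python
import re


def longest_cag_run(read: str) -> int:
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", read.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical read: flanking sequence on each side of 12 consecutive CAG units.
read = "ATGAAGGCCTTC" + "CAG" * 12 + "CAACAGCCGCCA"
print(longest_cag_run(read))
```

Real reads contain sequencing errors, so practical tools have to tolerate mismatches and gaps inside the repeat rather than rely on an exact pattern match.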
This sequencing technology can also more accurately detect larger-scale mutations, where long sections of DNA are deleted or moved. These structural variants often have roles in genetic disorders but have not been extensively studied in the past due to the lack of technology available.
What has been achieved with long-read sequencing?
In 2018, Jain et al. at the University of California used long-read sequencing to accurately map the human Y chromosome centromere. The centromere is a very important section of every chromosome that plays a vital role in cell division, and its dysregulation has been linked to cancer formation and to several genetic syndromes, such as Down’s Syndrome and Turner Syndrome.
Nanopore sequencing has been used to detect and identify pathogens within clinical environments in as little as 6 hours from when the samples were taken.
Nanopore sequencing was also used during the Ebola outbreak to rapidly and efficiently test blood samples for the presence of the virus. The equipment was flown into West Africa and used directly on-site to monitor the epidemic.
Heather, J. M., & Chain, B. (2016). The sequence of sequencers: The history of sequencing DNA. Genomics. https://doi.org/10.1016/j.ygeno.2015.11.003
PHG Foundation. Long read sequencing technologies. (2018). https://www.phgfoundation.org/
Koren, S., & Phillippy, A. M. (2015). One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Current opinion in microbiology. https://doi.org/10.1016/j.mib.2014.11.014
Amarasinghe, S. L., et al., (2020). Opportunities and challenges in long-read sequencing data analysis. Genome biology. https://doi.org/10.1186/s13059-020-1935-5
Eid, J., et al., (2009). Real-time DNA sequencing from single polymerase molecules. Science. https://doi.org/10.1126/science.1162986
Jain, M., et al., (2016). The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome biology. https://doi.org/10.1186/s13059-016-1103-0