In a recent study published in Nature Biotechnology, researchers developed a whole-genome sequencing (WGS) pipeline based on nanopore technology for the rapid identification of genetic variants.
WGS has benefited critically ill patients by enabling identification of the genetic causes of their disease. However, the time WGS requires limits its use in acute care. Although standard WGS turnaround times are measured in weeks, studies in neonates have reduced them to days, with the fastest reported time being 14.3 hours.
The authors of the present study hypothesized that the PromethION platform (Oxford Nanopore Technologies) could further shorten WGS turnaround time and enable near real-time base calling, alignment, variant calling, and variant filtration.
About the study and findings
In the present study, researchers developed a nanopore WGS pipeline and reported its characteristics and clinical efficacy in the rapid identification of causative genetic variants.
The nanopore WGS pipeline was designed to overcome the shortcomings of previous WGS approaches. It combined an optimized sample-preparation protocol that maximized deoxyribonucleic acid (DNA) quality while reducing preparation time, distribution of sequencing across 48 flow cells, cloud computing (Google Cloud) with graphics processing unit (GPU) acceleration, and multiple software tools and pipelines running in parallel.
Variant calling and variant filtration were also accelerated. Together, these improvements shortened the overall run time and enabled near real-time base calling and alignment, faster variant calling, and rapid variant filtration, making rapid WGS diagnosis feasible. The authors characterized each improvement using the Genome in a Bottle (GIAB) HG002 reference sample.
First, the standard sample-preparation protocol was used to distribute sample libraries across 48 flow cells. To improve the quality of DNA obtained from small volumes of blood, the DNA-extraction protocol was adapted, yielding 36 μg of high-molecular-weight DNA with fragments >60 kb from 1.6 ml of blood within 50 minutes.
To reduce library preparation time, the amount of input DNA was increased from 1 μg to 4 μg per reaction, and eight reactions were run in parallel rather than sequentially. This produced 16 μg of sequencing library, enough to load 333 ng onto each flow cell. Omitting sample barcoding shortened library preparation by a further 37 minutes.
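The DNA mass flow from extraction to flow-cell loading can be checked with a short Python sketch. All quantities come from the article; the function name `library_budget` and the derived library-yield fraction are illustrative additions, not part of the published protocol.

```python
def library_budget(extracted_ug: float, n_reactions: int, input_ug_per_reaction: float,
                   library_ug: float, n_flow_cells: int) -> dict:
    """Check DNA mass flow from extraction to per-flow-cell loading (illustrative)."""
    # Total DNA consumed by the parallel library-preparation reactions.
    input_ug = n_reactions * input_ug_per_reaction
    if input_ug > extracted_ug:
        raise ValueError("not enough extracted DNA for the planned reactions")
    return {
        "input_ug": input_ug,
        # Fraction of input DNA recovered as sequencing library.
        "library_yield_fraction": library_ug / input_ug,
        # Library mass available to each flow cell, converted from μg to ng.
        "ng_per_flow_cell": library_ug * 1000 / n_flow_cells,
    }

budget = library_budget(extracted_ug=36, n_reactions=8, input_ug_per_reaction=4,
                        library_ug=16, n_flow_cells=48)
print(budget["input_ug"])                    # → 32
print(budget["library_yield_fraction"])      # → 0.5
print(round(budget["ng_per_flow_cell"], 1))  # → 333.3
```

The 36 μg extract covers the 32 μg consumed by eight 4 μg reactions, and dividing the 16 μg library over 48 flow cells gives the reported ≈333 ng per flow cell.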
To make the pipeline cost-effective, flow cells were reused for multiple samples, with nuclease washes after every run to remove DNA from the previous sample. The maximum observed DNA carryover rate was 0.4%, and further validation showed that carryover of up to 1% between samples was tolerable.
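A minimal sketch of a carryover check against the 1% tolerance, assuming that reads from the previous sample on a reused flow cell can be counted (for example, by genotype mismatch). How the authors actually measured carryover is not described here, and the function names are hypothetical. The sketch also shows why low carryover matters little for germline calls: at a site where the current sample lacks an allele that the previous sample carried, the spurious allele fraction is roughly the carryover fraction, assuming similar coverage in both samples.

```python
MAX_TOLERABLE_CARRYOVER = 0.01  # 1% tolerance reported in the validation

def carryover_fraction(reads_from_previous: int, total_reads: int) -> float:
    """Fraction of reads on a reused flow cell attributed to the previous sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_from_previous / total_reads

def flow_cell_reusable(reads_from_previous: int, total_reads: int,
                       threshold: float = MAX_TOLERABLE_CARRYOVER) -> bool:
    """True if the observed carryover is within the tolerated limit."""
    return carryover_fraction(reads_from_previous, total_reads) <= threshold

def spurious_allele_fraction(carryover: float, previous_allele_fraction: float = 1.0) -> float:
    """Approximate allele fraction contributed by carryover at a site where the
    current sample lacks the allele (assumes similar coverage in both samples)."""
    return carryover * previous_allele_fraction

# Hypothetical counts: 4,000 of 1,000,000 reads from the previous sample (0.4%).
print(flow_cell_reusable(4_000, 1_000_000))   # → True
print(flow_cell_reusable(20_000, 1_000_000))  # → False
print(spurious_allele_fraction(0.004))        # → 0.004
```

A spurious fraction of 0.4% is far below the roughly 50% expected for a true heterozygous variant, which is one reason a small amount of carryover is tolerable.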
For near real-time base calling and alignment, the team used Guppy and Minimap2, respectively, with 218 Gb of sequence data from the HG002 sample. However, even at the maximum throughput (2.5 Gb min⁻¹), base calling and alignment would still have required an estimated 18.5 hours of overhead after sequencing ended. To remove this bottleneck, the team used Google Cloud to parallelize base calling and alignment across multiple GPUs, working from compressed FAST5 files.
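Why a post-sequencing overhead appears, and why parallelization removes it, can be seen with a deliberately simplified streaming model: data arrive at a constant rate during sequencing, and whatever the base callers cannot process in that time is left as a backlog. The function and all numbers below are hypothetical and are not meant to reproduce the 18.5-hour estimate; the model also ignores upload latency and per-batch startup costs.

```python
def post_sequencing_overhead_min(total_gb: float, sequencing_min: float,
                                 gb_per_min_per_worker: float, n_workers: int = 1) -> float:
    """Minutes of base calling/alignment remaining after sequencing ends.

    Assumes data arrive at a constant rate over `sequencing_min` and are processed
    as they arrive at a combined rate of n_workers * gb_per_min_per_worker.
    """
    capacity = n_workers * gb_per_min_per_worker
    if capacity <= 0:
        raise ValueError("processing capacity must be positive")
    # If capacity keeps up with the arrival rate, nothing is left over; otherwise
    # the backlog (total_gb - capacity * sequencing_min) needs backlog / capacity.
    return max(0.0, total_gb / capacity - sequencing_min)

# Hypothetical run: 200 Gb produced over 120 minutes.
print(post_sequencing_overhead_min(200, 120, gb_per_min_per_worker=0.5))                # → 280.0
print(post_sequencing_overhead_min(200, 120, gb_per_min_per_worker=0.5, n_workers=16))  # → 0.0
```

In this model, adding workers until their combined rate matches the sequencing rate drives the overhead toward zero; a real deployment still pays for the final upload and processing batch.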
In addition, a time-periodic upload model was used: raw data were uploaded at regular intervals and distributed across 16 compute instances running in parallel, each running Guppy and Minimap2 on the output of three flow cells (16 × 3 = 48 flow cells). This reduced the post-sequencing overhead to 25 minutes.
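The article specifies only 16 instances, three flow cells per instance, and periodic uploads. The sketch below assumes contiguous blocks of flow cells per instance, a fixed upload interval, and invented file names; `assign_flow_cells` and `route_uploads` are hypothetical helpers, not part of the authors' code.

```python
from collections import defaultdict

def assign_flow_cells(n_flow_cells: int = 48, n_instances: int = 16) -> dict[int, list[int]]:
    """Map each compute instance to an equal, contiguous block of flow cells."""
    if n_flow_cells % n_instances:
        raise ValueError("flow cells must divide evenly across instances")
    per_instance = n_flow_cells // n_instances
    return {i: list(range(i * per_instance, (i + 1) * per_instance))
            for i in range(n_instances)}

def route_uploads(files: list[tuple[float, int, str]], interval_min: float,
                  flow_cells_per_instance: int = 3) -> dict[tuple[int, int], list[str]]:
    """Group (minute_written, flow_cell, filename) records by upload window and instance."""
    routed: dict[tuple[int, int], list[str]] = defaultdict(list)
    for minute, flow_cell, name in files:
        window = int(minute // interval_min)           # which periodic upload batch
        instance = flow_cell // flow_cells_per_instance  # which compute instance
        routed[(window, instance)].append(name)
    return dict(routed)

assignment = assign_flow_cells()
print(assignment[0])   # → [0, 1, 2]
print(assignment[15])  # → [45, 46, 47]

files = [(1.0, 0, "fc00_part1.fast5"), (4.5, 4, "fc04_part1.fast5"), (11.0, 0, "fc00_part2.fast5")]
print(route_uploads(files, interval_min=10))
```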
To accelerate variant calling, the team used two pipelines: PEPPER–Margin–DeepVariant to call small variants and Sniffles to call structural variants (SVs) from the long reads. Running the DeepVariant and Sniffles pipelines in parallel, across 14 GPUs and two central processing units (CPUs), respectively, reduced the runtime to 29 minutes. Integrating NVIDIA Parabricks with the DeepVariant pipeline reduced it further to 23 minutes.
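The article does not say how small-variant calling was split across the 14 GPUs. As one plausible scheme, offered purely as an assumption, the sketch shards the genome by chromosome and balances shards with the greedy longest-processing-time heuristic, using chromosome length (approximate GRCh38 values) as a stand-in for runtime. `balance_shards` is a hypothetical helper.

```python
import heapq

# Approximate GRCh38 chromosome lengths in megabases (rounded to 0.1 Mb).
CHROM_MB = {
    "chr1": 249.0, "chr2": 242.2, "chr3": 198.3, "chr4": 190.2, "chr5": 181.5,
    "chr6": 170.8, "chr7": 159.3, "chr8": 145.1, "chr9": 138.4, "chr10": 133.8,
    "chr11": 135.1, "chr12": 133.3, "chr13": 114.4, "chr14": 107.0, "chr15": 102.0,
    "chr16": 90.3, "chr17": 83.3, "chr18": 80.4, "chr19": 58.6, "chr20": 64.4,
    "chr21": 46.7, "chr22": 50.8, "chrX": 156.0, "chrY": 57.2,
}

def balance_shards(lengths: dict[str, float], n_workers: int) -> list[list[str]]:
    """Greedy longest-processing-time assignment of chromosomes to workers.

    Chromosomes are taken largest first; each goes to the currently least-loaded worker.
    """
    heap = [(0.0, w) for w in range(n_workers)]  # (current load in Mb, worker id)
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(n_workers)]
    for chrom, size in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        shards[worker].append(chrom)
        heapq.heappush(heap, (load + size, worker))
    return shards

shards = balance_shards(CHROM_MB, n_workers=14)
for worker, chroms in enumerate(shards):
    print(worker, chroms, round(sum(CHROM_MB[c] for c in chroms), 1))
```

Because chr1 alone is assigned to one worker, the slowest shard can never finish faster than chr1; finer sharding (for example, by genomic interval) would be needed to balance further.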
To improve the accuracy of insertion and deletion (indel) detection, the team modified the images generated by DeepVariant to incorporate realignment of reads against the alternative alleles present at each indel. This improved indel detection and also reduced the curation time needed for variant assessment.
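DeepVariant's modification operates on its own images. As a much simpler, hypothetical illustration of the underlying idea, the sketch below builds reference and alternative haplotypes around a candidate deletion and assigns each read to whichever haplotype it matches with fewer edits. The window, coordinates, and reads are invented, and global edit distance over a short window is a simplification of real read realignment.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def build_haplotypes(ref_window: str, offset: int, ref_allele: str, alt_allele: str) -> tuple[str, str]:
    """Return (reference, alternative) haplotype sequences for one candidate indel."""
    if ref_window[offset:offset + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the window")
    alt_window = ref_window[:offset] + alt_allele + ref_window[offset + len(ref_allele):]
    return ref_window, alt_window

def assign_read(read: str, ref_hap: str, alt_hap: str) -> str:
    """Label a read by the haplotype it matches with fewer edits."""
    d_ref, d_alt = edit_distance(read, ref_hap), edit_distance(read, alt_hap)
    if d_alt < d_ref:
        return "alt"
    if d_ref < d_alt:
        return "ref"
    return "ambiguous"

# Hypothetical 2-bp deletion (AGG -> A) at offset 6 of a toy reference window.
ref_hap, alt_hap = build_haplotypes("GATTACAGGCCTTAGC", 6, "AGG", "A")
print(alt_hap)                                  # → GATTACACCTTAGC
print(assign_read(alt_hap, ref_hap, alt_hap))   # → alt
print(assign_read(ref_hap, ref_hap, alt_hap))   # → ref
```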
To make variant filtration more efficient, the team developed a customized classification-tree scheme based on the classification used by the Stanford Clinical Genomics Program (SCGP). Several criteria, based on patient phenotype and variant annotations, were scored independently, and only high-priority variants with a total score of ≥4 were manually reviewed. In the HG002 sample, this filtering reduced the number of prioritized variants from 101 to 20, narrowing the list of probable causative variants of genetic disease.
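A minimal sketch of threshold-based prioritization. Only the idea of independently scored criteria and the ≥4 review threshold come from the article; the criteria, weights, and allele-frequency cutoff below are hypothetical, and a simple additive score stands in for the authors' classification tree.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 4  # total score needed for manual review, per the article

@dataclass
class Variant:
    name: str
    phenotype_match: bool      # gene linked to the patient's phenotype (hypothetical criterion)
    population_af: float       # population allele frequency
    predicted_lof: bool        # predicted loss of function
    reported_pathogenic: bool  # previously reported as pathogenic

def score_variant(v: Variant) -> int:
    """Sum independently scored criteria (hypothetical weights)."""
    score = 0
    if v.phenotype_match:
        score += 2
    if v.population_af < 0.001:
        score += 1
    if v.predicted_lof:
        score += 1
    if v.reported_pathogenic:
        score += 2
    return score

def prioritize(variants: list[Variant]) -> list[tuple[int, Variant]]:
    """Return (score, variant) pairs at or above the threshold, highest score first."""
    scored = [(score_variant(v), v) for v in variants]
    kept = [sv for sv in scored if sv[0] >= REVIEW_THRESHOLD]
    return sorted(kept, key=lambda sv: sv[0], reverse=True)

candidates = [
    Variant("var_A", True, 0.0001, True, False),   # 2 + 1 + 1 = 4
    Variant("var_B", False, 0.0005, True, False),  # 1 + 1 = 2
    Variant("var_C", True, 0.00001, True, True),   # 2 + 1 + 1 + 2 = 6
]
for score, v in prioritize(candidates):
    print(v.name, score)
# → var_C 6
# → var_A 4
```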
The team then applied the nanopore WGS pipeline to two clinical cases: a 57-year-old man with severe COVID-19 and a 14-month-old girl in intensive care after respiratory failure and cardiac arrest. Within eight hours of blood collection, the pipeline identified probable causative variants in the TNNT2 gene in the adult and in the LZTR1 gene in the infant.
In summary, the nanopore WGS pipeline generated human WGS data within two hours and identified probable causative genetic variants within eight hours, an improvement of about 50% on the fastest previously reported WGS turnaround time.
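Taking the 14.3-hour figure cited earlier as the previous record and treating eight hours as an upper bound on the new pipeline's time, the reduction is at least about 44%, or roughly a 1.8-fold speedup. A small Python check of that arithmetic (`relative_reduction` is an illustrative helper):

```python
def relative_reduction(previous_hours: float, new_hours: float) -> float:
    """Fractional reduction in turnaround time relative to a previous record."""
    return 1 - new_hours / previous_hours

previous_record_h = 14.3  # fastest previously reported WGS time cited in the article
new_upper_bound_h = 8.0   # "within eight hours" is an upper bound
print(f"{relative_reduction(previous_record_h, new_upper_bound_h):.0%}")  # → 44%
print(f"{previous_record_h / new_upper_bound_h:.1f}x faster")            # → 1.8x faster
```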