In this interview, Rami Mehio, head of software and informatics at Illumina, shares his experiences and contributions to major genomic projects like the UK Biobank's whole genome sequencing. He discusses the challenges and innovations in genomic data analysis, highlighting Illumina's role in advancing genetic research and precision medicine.
Please could you introduce yourself and give us a brief description of your professional background?
My name is Rami Mehio. I lead software and informatics development at Illumina. I joined Illumina in 2018 as part of the Edico Genome acquisition and have since been responsible for overseeing bioinformatics, sequencer software, cloud data platforms, and clinical software across Illumina’s portfolio. Prior to joining Illumina, I was at Edico, where I spearheaded the development of the DRAGEN BioIT processor and aided in its commercialization.
Could you describe Illumina's specific role and contributions in the UK Biobank's whole genome sequencing project, especially in terms of the technology and expertise provided?
Illumina is the sequencing technology partner for the project, meaning that the whole-genome sequencing (WGS) was done with Illumina sequencers. Illumina was also chosen as a bioinformatics partner for the analysis of each genome and the joint calling of all genomes into a cohort. As such, the secondary analysis was performed using DRAGEN's award-winning germline pipeline with its multi-genome graph mapping and variant calling. To keep up with the computational and storage demands of 500,000 whole genomes, the aggregation was performed with the DRAGEN iterative gVCF genotyper (IGG) on the Illumina Connected Analytics (ICA) cloud platform, and it employed ML-based filtering to improve the sensitivity and precision of variant calls.
Handling and analyzing such an extensive dataset must have presented unique challenges. What were these challenges, and how did Illumina’s technology address them?
The main challenge was ensuring that we had the right computational infrastructure in place to support analyzing 500,000 genomes. The secondary analysis of the 500,000 genomes was done in about six weeks on Amazon Web Services (AWS). We had to put quality assurance processes in place to make sure the analysis jobs for the rest of our customers were not starved of compute nodes.
Another challenge that we experienced was with aggregation, particularly with the number of files, the number of API calls, the size of the data, and the cost. This exercise allowed us to architect and tune DRAGEN IGG and ICA into a product that is unparalleled and able to aggregate millions of genomes with high precision and low cost. The architecture also allowed us to solve the N+1 problem. This means that if we were to aggregate an additional 10,000 genomes, we would be able to do it incrementally rather than rerunning the full 510,000-genome job.
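To make the N+1 idea concrete, here is a minimal Python sketch of incremental aggregation. It is an illustration only and not how DRAGEN IGG is implemented: the `CohortSummary` class, the `add_batch` and `allele_frequency` methods, the toy genotype dictionaries, and the assumption of diploid samples are all hypothetical. The point it shows is that a cohort-level summary can be updated from a new batch alone, without revisiting the samples already aggregated.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical toy model: each sample is a dict mapping a variant site
# (chrom, pos, ref, alt) to its alternate-allele count (0, 1, or 2).
# Diploid samples are assumed throughout.


@dataclass
class CohortSummary:
    n_samples: int = 0
    alt_allele_counts: Counter = field(default_factory=Counter)

    def add_batch(self, batch):
        """Fold a new batch into the summary without touching earlier samples."""
        for genotypes in batch:
            self.n_samples += 1
            for site, alt_count in genotypes.items():
                self.alt_allele_counts[site] += alt_count

    def allele_frequency(self, site):
        """Alternate-allele frequency across all samples seen so far."""
        if self.n_samples == 0:
            return 0.0
        return self.alt_allele_counts[site] / (2 * self.n_samples)


# Made-up example data: an existing cohort and a later batch.
SITE_A = ("chr1", 100, "A", "G")
SITE_B = ("chr2", 5, "C", "T")
existing = [{SITE_A: 1}, {SITE_A: 2}]
new_batch = [{SITE_A: 0, SITE_B: 1}]

# Incremental path: only the new batch is processed in the second step.
incremental = CohortSummary()
incremental.add_batch(existing)
incremental.add_batch(new_batch)

# Full recomputation over every sample, for comparison.
full = CohortSummary()
full.add_batch(existing + new_batch)
```

In this toy model the incremental and full summaries are identical, so the cost of adding a batch depends on the size of the batch rather than the size of the cohort. Real joint genotyping has to carry far more per-site state than allele counts, so this should be read as a sketch of the principle only.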
How does Illumina’s technology improve the identification of less frequent genetic variants, and what impact does this have on genetic research?
The DRAGEN pipeline has unique features that improve the sensitivity and precision of the data, meaning we can detect variants that other pipelines have difficulty identifying. DRAGEN does this by using multireference genome technology that better matches the reference to the samples. This allows for accurate detection and mapping in difficult and highly polymorphic regions of the genome. We also introduced machine learning into later versions of DRAGEN, enabling us to significantly reduce false positives while improving sensitivity. DRAGEN’s precision and sensitivity have been put to the test and corroborated with two PrecisionFDA awards in germline disease, inherited disease, and oncology.
In what ways does your technology ensure that the data from this project is compatible and comparable with other large-scale population health studies?
Credit for this goes to the UK Biobank and its pharma consortium members, some of the leaders in the All of Us program and its associated sequencing centers, and the leadership at Genomics England. They agreed to adopt the same version of the DRAGEN pipeline, and Illumina was able to support them and remove obstacles. We provided details of the pipeline and its configurations in a centralized location and worked closely with each program to ensure consistency across the groups. A common pipeline is a key necessity for making the data compatible and increasing the statistical power of the cohorts.
What advancements in software and informatics have emerged from this project, and how do they push the boundaries of genomic research?
This is probably the biggest aggregation of whole-genome sequencing data in the world at this time. Usually, aggregating large cohorts is quite difficult. In our experience, projects often struggle when dealing with more than 10,000 samples. DRAGEN IGG on ICA is now able to scale to hundreds of thousands of samples while also solving the N+1 problem: adding another 10,000 samples to the cohort of 500,000 does not require the user to restart the joint calling from the beginning.
Based on the outcomes of this project, what are the broader implications for future research and healthcare, particularly in the context of precision medicine?
WGS data will enable researchers to identify rare non-coding variants that contribute to disease onset and progression. It will also identify mutations that protect against disease. By combining the WGS data with the rich clinical and lifestyle data of UK Biobank participants, researchers are now uniquely equipped to answer questions about why some individuals develop particular diseases but others do not and why certain conditions worsen in some individuals over time.
It will also help accelerate drug discovery and development by allowing researchers to identify new drug targets. This is important because pharmaceutical companies have found that potential drug targets supported by clear genetic evidence are twice as likely to result in effective medicines.
Can you discuss the importance of collaboration and partnership, like that seen in the UK Biobank project, in advancing genomic research?
Through collaboration, this partnership has enabled the dream of sequencing and analyzing a large number of genomes for the purpose of improving healthcare to become a reality.
The UK Biobank's vision for producing and making these cohorts of data publicly available is commendable. It opens the door for polygenic risk score evaluations and more precise drug discoveries.
Through this collaboration, Illumina’s software has matured, and our capabilities have grown. We’ve established our capabilities in the informatics space, which has enabled us to bring more precise meaning to data.
About Rami Mehio
Rami is the global head of software and informatics development at Illumina. He joined Illumina in 2018 as part of the Edico Genome acquisition and has continuously expanded his leadership, which now includes overseeing all instrument software, cloud platforms, bioinformatics, and clinical software across Illumina’s entire portfolio. Over the past few years, Rami’s organization has helped establish Illumina as a leading provider in informatics, delivering innovative, reliable software products developed in deep collaboration with key opinion leaders (KOLs).