Combining health data with whole genome sequence data in cancer patients can help doctors provide more tailored care

In a recent study published in Nature Medicine, researchers evaluated the impact of integrating whole-genome sequencing data with clinical outcomes across 13,880 tumors from 33 cancer types, assessing the potential for precision care within the United Kingdom (UK) National Health Service (NHS) through the 100,000 Genomes Cancer Programme.

Study: Insights for precision oncology from the integration of genomic and clinical data of 13,880 tumors from the 100,000 Genomes Cancer Programme. Image Credit: namtipStudio/

In the past decade, the UK has seen a 4% rise in cancer cases, highlighting the necessity for advanced molecular cancer testing, including tests for hereditary cancer genes and pharmacogenomics. The 100,000 Genomes Project, a major initiative by the UK Government within the NHS, aimed to standardize whole-genome sequencing (WGS) for cancer and rare diseases using a high-throughput, International Organization for Standardization (ISO)-accredited bioinformatics pipeline.

This project evaluated the role of WGS for cancer patients in the NHS. Patients consented to link their genomic data with anonymized health records in a secure Research Environment, promoting cancer research. The data contributed to the National Genomic Research Library, linked to longitudinal health data, facilitating genomic research and healthcare integration. The NHS Genomic Medicine Service, established in October 2018, leverages this knowledge to deliver genomic testing and care, ensuring equitable access and comprehensive testing through the National Genomic Test Directory. This directory standardized test methods, gene targets, and criteria across England.

Further research is needed to deepen our understanding of cancer genomics and enhance personalized treatment strategies, ultimately improving patient outcomes.

About the study

The methods employed in the present study followed stringent guidelines to ensure quality and accuracy. Deoxyribonucleic acid (DNA) was extracted from samples as per the Sample Handling Guidance, requiring 10 µg of germline DNA and a minimum of 1.3 µg tumor DNA for Illumina TruSeq Polymerase Chain Reaction (PCR)-free library preparation. PCR-based preparation was an alternative for insufficient DNA quantities. In certain scenarios, formalin-fixed tumor tissue was used for WGS.

Sequencing was performed on the High-Throughput Sequencing (HiSeq) platform, achieving 100× coverage for tumor samples and 30× for normal samples. Quality checks confirmed adequate high-quality sequencing data, sufficient genome coverage, low cross-patient contamination, and consistent sequencing data quality, which was monitored using principal component analysis.

The Illumina North Star pipeline was used for primary WGS analysis, with ISAAC software for read alignment. Because of ISAAC's limitations, all genomes were realigned with the Illumina Dragen platform for improved accuracy. Variant calling involved multiple tools and filters to minimize false positives and improve the reliability of the final data set.

Copy number aberrations (CNAs) were identified using Canvas, while Manta was employed for structural variants (SVs) and long indels. The accuracy of somatic variant calling was verified for ISO accreditation.

Annotation and reporting involved aligning and trimming single-nucleotide variants (SNVs) and small indels, with annotation through databases such as Ensembl, the Catalogue Of Somatic Mutations In Cancer (COSMIC), and ClinVar. Interpretation of CNAs considered the gene's role in cancer, and only significant changes were reported. Co-occurrence analysis of somatic small variants and CNAs was also conducted.

For germline variant reporting, only variants classified as pathogenic or likely pathogenic in ClinVar were considered. These variants were reviewed by Genomic Tumor Advisory Boards (GTABs) for clinical relevance.
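The germline filter described above can be illustrated with a minimal Python sketch. The record layout, the `clinical_significance` field name, and the example variants are hypothetical and are not taken from the study's pipeline; the sketch only shows the idea of keeping variants whose ClinVar-style classification is pathogenic or likely pathogenic.

```python
# Classifications treated as reportable in this sketch (an assumption
# modeled on the ClinVar terms "Pathogenic" and "Likely pathogenic").
REPORTABLE = {"pathogenic", "likely pathogenic"}


def reportable_germline_variants(variants):
    """Keep variants whose classification is pathogenic or likely pathogenic."""
    return [
        v for v in variants
        if v.get("clinical_significance", "").strip().lower() in REPORTABLE
    ]


# Hypothetical example records, for illustration only.
variants = [
    {"gene": "BRCA1", "clinical_significance": "Pathogenic"},
    {"gene": "BRCA2", "clinical_significance": "Likely pathogenic"},
    {"gene": "TP53", "clinical_significance": "Uncertain significance"},
]
kept = reportable_germline_variants(variants)
print([v["gene"] for v in kept])
```

In practice, a filtered list like this would still go to expert review, as the study describes, rather than being reported automatically.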

The study also analyzed mutational signatures and tumor mutational burden (TMB), employing tools like SigProfiler and algorithms like Homologous Recombination Deficiency Detection (HRDetect) and CHORD for comprehensive assessment.
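TMB is commonly expressed as the number of somatic mutations per megabase (Mb) of sequence. The sketch below shows that arithmetic in Python. The function name, the choice of denominator (callable bases), and the example numbers are assumptions for illustration; the article does not state which variant classes or genome regions the study counted.

```python
def tumor_mutational_burden(n_somatic_mutations: int, callable_bases: int) -> float:
    """Return somatic mutations per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_somatic_mutations / (callable_bases / 1_000_000)


# Hypothetical tumor: 28,000 somatic mutations over 2.8 billion callable bases.
tmb = tumor_mutational_burden(28_000, 2_800_000_000)
print(f"TMB = {tmb:.1f} mutations/Mb")
```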

Clinical data resources included patient and sample data collected via OpenClinica, supplemented by data from NHS England, Public Health England, and Office for National Statistics. This data was linked to genomic information to corroborate clinical submissions and determine tumor stage and type.

Survival analysis utilized R software, employing Kaplan–Meier plots and Cox proportional-hazards models. The date of death was obtained, with additional health event data used for right-censoring. Ethically, the research adhered to all relevant regulations, with approval from the East of England-Cambridge South Research Ethics Committee. Participants, identified by NHS professionals, provided written informed consent.
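To make the survival method concrete, here is a minimal sketch of the Kaplan–Meier estimator with right-censoring. It is written in plain Python rather than the R software the study used, and the follow-up times are hypothetical. At each time with observed deaths, the survival probability is multiplied by one minus the fraction of at-risk patients who died; censored patients leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with a death.

    durations: follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was right-censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # deaths and censored patients both leave
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times in months; 0 marks a censored patient.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.2f}")
```

A patient censored at the same time as a death is still counted as at risk for that death, which is the usual convention.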

Study results 

The Cancer Programme of the 100,000 Genomes Project, an initiative under the NHS, sequenced 16,358 tumor-normal pairs from 15,241 cancer patients between 2015 and 2019. This extensive whole-genome analysis (WGA) covered 33 tumor types, predominantly fresh-frozen samples, with matched normal samples mostly derived from blood.

The study achieved 100× coverage for tumor samples and 30× for normal samples, surpassing the coverage of the Cancer Genome Atlas (TCGA) cohort. Certain tumor categories, like hematological and pediatric cancers, were excluded. The sample collection was confirmed by linking genomic data with the National Cancer Registration and Analysis Service (NCRAS) and Hospital Episode Statistics (HES) datasets.

Notably, breast invasive carcinoma, sarcoma, colon adenocarcinoma, and kidney renal clear cell carcinoma were among the most sequenced tumor types. The distribution of samples varied across the 13 NHS Genomic Medicine Centres (GMCs) in England, with significant variations in age and biological sex across tumor types. Staging information was available for most tumors, revealing a high percentage of advanced-stage cancers in certain types, such as ovarian high-grade serous carcinoma and skin cutaneous melanoma.

In terms of clinical actionability, WGS enabled the detection of a wide range of genetic alterations, including somatic and germline variants. These findings were integrated into standardized genomic results and reviewed by Genomic Tumor Advisory Boards (GTABs). The analysis revealed a high percentage of tumors harboring mutations in genes recommended by the National Genomic Test Directory for Cancer (NGTDC), though variability was observed across cancer types. This variability underscores the need for personalized clinical interpretation.

Furthermore, the analysis highlighted the presence of mutations in cancer types where they are not currently indicated for testing, suggesting new avenues for clinical trial recruitment and review.

The landscape of somatic small variants was dominated by Tumor Protein P53 (TP53) mutations, with varying frequencies across different cancer types. PIK3CA was the second most frequently altered gene, with mutations spanning multiple tumor types. The study also observed a high prevalence of amplifications or losses in key genes across all cancer types. The inclusion of fusions in the NGTDC, particularly in lung cancers, has become a standard of care. Additionally, the study highlighted the importance of germline findings, particularly in ovarian high-grade serous carcinoma, where a significant number of patients harbored Breast Cancer Type 1 Susceptibility Protein (BRCA1) and BRCA2 variants.

Pangenomic markers such as TMB and homologous recombination deficiency (HRD) status were also evaluated, showing significant variation across cancer types. These markers are becoming increasingly important in predicting treatment outcomes and guiding clinical decisions. The study's ability to link WGS data with real-world clinical data allowed for a detailed assessment of treatment outcomes based on pangenomic markers. For example, HRD status was associated with better outcomes in patients treated with platinum therapies, particularly in invasive breast carcinomas and ovarian high-grade serous carcinomas.

The co-occurrence of different types of mutations was also explored, revealing significant relationships between copy gains and specific oncogenes. Survival analysis using real-world data highlighted the impact of mutations in certain genes on overall survival, with Cyclin-Dependent Kinase Inhibitor 2A (CDKN2A) mutations notably associated with poorer outcomes. This comprehensive analysis underscores the value of integrating genomic and clinical data in understanding cancer genomics and improving patient care.
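One common way to test whether two alterations co-occur more often than chance would predict is a one-sided Fisher's exact test on a 2×2 table of tumors. The article does not say which test the study used, so the Python sketch below is an illustration under that assumption, and the counts are hypothetical.

```python
from math import comb


def cooccurrence_p_value(both, only_a, only_b, neither):
    """One-sided Fisher's exact test for co-occurrence.

    Returns the probability of seeing at least `both` tumors carrying
    both alterations if the two alterations were independent.
    """
    n = both + only_a + only_b + neither
    a_total = both + only_a  # tumors with alteration A
    b_total = both + only_b  # tumors with alteration B
    max_overlap = min(a_total, b_total)
    favorable = sum(
        comb(a_total, k) * comb(n - a_total, b_total - k)
        for k in range(both, max_overlap + 1)
    )
    return favorable / comb(n, b_total)


# Hypothetical counts: 3 tumors with both a copy gain and a small variant,
# 1 with only the gain, 1 with only the variant, 3 with neither.
p = cooccurrence_p_value(both=3, only_a=1, only_b=1, neither=3)
print(f"p = {p:.4f}")
```

With many gene pairs tested at once, p-values like this would normally be corrected for multiple testing before any pair is called significant.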

Written by

Vijay Kumar Malesu

Vijay holds a Ph.D. in Biotechnology and possesses a deep passion for microbiology. His academic journey has allowed him to delve deeper into understanding the intricate world of microorganisms. Through his research and studies, he has gained expertise in various aspects of microbiology, which includes microbial genetics, microbial physiology, and microbial ecology. Vijay has six years of scientific research experience at renowned research institutes such as the Indian Council for Agricultural Research and KIIT University. He has worked on diverse projects in microbiology, biopolymers, and drug delivery. His contributions to these areas have provided him with a comprehensive understanding of the subject matter and the ability to tackle complex research challenges.    


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Kumar Malesu, Vijay. (2024, January 16). Combining health data with whole genome sequence data in cancer patients can help doctors provide more tailored care. News-Medical. Retrieved on April 19, 2024 from

  • MLA

    Kumar Malesu, Vijay. "Combining health data with whole genome sequence data in cancer patients can help doctors provide more tailored care". News-Medical. 19 April 2024. <>.

  • Chicago

    Kumar Malesu, Vijay. "Combining health data with whole genome sequence data in cancer patients can help doctors provide more tailored care". News-Medical. (accessed April 19, 2024).

  • Harvard

    Kumar Malesu, Vijay. 2024. Combining health data with whole genome sequence data in cancer patients can help doctors provide more tailored care. News-Medical, viewed 19 April 2024,


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.