Harnessing Pediatric Cancer Genomic Data in the Cloud

Thought Leaders: Dr. Jinghui Zhang, Chair, Department of Computational Biology, St. Jude Children's Research Hospital

An interview with Dr. Jinghui Zhang, conducted by Kate Anderton, BSc

What is the history of genomics research at St. Jude Children’s Research Hospital?

In 2007, before the advent of next-generation sequencing, St. Jude used microarrays to characterize gene expression and copy number variation, and Sanger sequencing to identify sequence mutations in leukemia. The experiments, led by Dr. James Downing and Dr. Charles Mullighan, were considered a major breakthrough at the time.

In 2010, genome studies really started to take off around the world. St. Jude launched a US$60 million project in collaboration with Washington University, called the Pediatric Cancer Genome Project (PCGP), to investigate major pediatric cancers with very poor outcomes using whole genome sequencing.

At the time, only one human cancer genome had been sequenced using next-generation sequencing (NGS), and that was an acute myeloid leukemia (AML) genome. When we first started the project, the aim was to have Washington University do the sequencing, as they were the ones who had sequenced the AML genome and thus had the technology to carry out more analyses.

However, we now have a very integrative team who come from all areas of cancer research, from the clinical leads at St. Jude working on leukemia, solid tumors and brain tumors, to scientific leaders in these areas and computational biologists who work alongside them.

The big push at the moment is getting the Pediatric Cancer Genome Project completed. Phase I of the project, which focused on characterizing the genomic landscapes of major pediatric cancers, took three years to complete, from 2010 to 2013. The project then transitioned to a focus on germline predisposition analysis and the epigenomic landscape between 2013 and 2015.

The project is now in its third phase, and the real clinical genomics efforts have begun. The first two years (2015-2017) involved a lot of planning, the development of clinical sequencing infrastructure, and the expansion of the germline research to include long-term survivors of pediatric cancer. We are now starting real-time clinical testing for every patient at St. Jude.

What is St. Jude Cloud and what does it provide to the global research community?

St. Jude research has generated significant data on pediatric cancer, and our CEO Dr. Downing and the institution as a whole are committed to sharing data with the broader research community. All PCGP data have been uploaded to public centralized repositories.

Scientists who want the data make a request, get approved, and download the data to their own local computing infrastructure. Data access has been granted to 300 research laboratories across the world.

What motivated us to develop St. Jude Cloud was the realization that downloading data from a public repository to a local computing infrastructure is a time-consuming process and only feasible for researchers with access to large computing infrastructure.

Hosting data on St. Jude Cloud will democratize data access and enable scientists to focus on carrying out innovative analysis instead of downloading data. Scientists can access data using our visualization tools and bring their own tools to the cloud to carry out their analyses.

What impact do you think St. Jude Cloud will have on patient care?

St. Jude Cloud attracts two main types of user. One is computer scientists and computational biologists who are interested in applying their innovative analysis tools to our data sets and using the data to make new discoveries. The other is translational scientists pursuing new treatments and diagnostic tests.

Our data sets cover the entire genome, so there are inevitably components of the genome that we haven’t explored in great detail yet. Someone with a particular research interest may use the data St. Jude has gathered as a starting point for a new research investigation.

Also, if another lab finds the same mutations as we do, they can combine their data with ours on St. Jude Cloud, and the joint data may provide sufficient statistical power. A statistically significant finding could guide the development of new treatments and diagnostic techniques.

My dream is that researchers will start to share data more routinely and the scientific community can work as a whole. As I said, pediatric cancer is a very rare disease, and if we don’t share data, it will be very difficult to move forward with treatment protocols and classify cancers in more detail for precision medicine.

Please describe your recent research in the field of pediatric cancer at St. Jude.

My most recent paper, published in Nature, focused on results of a pan-cancer analysis study, which was carried out in collaboration with the National Cancer Institute (NCI). We carried out genome-wide mutation signature analyses on almost 1,700 pediatric leukemias and solid tumors.

Before the study, we didn't expect to see a striking pattern in terms of the mutation signatures, because unlike adults who get cancer, children are not exposed to many environmental risk factors for cancer, such as cigarette smoke or UV radiation. We therefore didn’t expect the environment to be a major contributor.

However, we were surprised to find a UV signature in eight aneuploid leukemias. We then wondered: is this observation real, or is it just an incidental finding that is specific to the group of patients from the Children's Oncology Group (COG) that we sampled?

As we only have eight patients coming out of the study of about 300 cases, we will need to replicate the study in an independent cohort. However, Dr. Scott Newman, group lead for bioinformatics analysis at St. Jude, who is leading the scientific aspect of St. Jude Cloud, explored whether we can use the data already in St. Jude Cloud to replicate the UV signature we found in the COG cohort.

We're very satisfied to see that the UV signature was replicated. Furthermore, the replication was made in a cohort analyzed on a sequencing platform that is different from that used in the NCI study, further confirming that the pattern is not an artifact of sequencing.

Hence, we have now replicated the findings in two independent cohorts, using two different sequencing platforms, so we are very confident about the results. Typically, a study like that would take at least half a year if you had to download the data first, whereas computing on St. Jude Cloud takes a fraction of that time.

What technology did you use to carry out your research?

There are different aspects to the technology. We have our own local high-performance computing cluster, but a cloud platform allows us to tap into even larger computing resources for managing intensive computing.

As St. Jude starts to use the results from clinical genomics for therapy, we need to complete data analysis in a fixed amount of time. Therefore, we need to overcome the computational bottleneck that holds us back at specific stages of data processing, and one of the ways this can be done is through cloud technology.

We are also starting to bring epigenetic profiling data sets into St. Jude Cloud. We think this will be very helpful in interpreting the regulatory variants in non-coding regions. These are areas of the human genome that have rarely been characterized to date.

Another area is 3D genome architecture, looking at how different regions on the genome interact. This really helped us gain an insight into the regulatory variants that impact transcriptional networks in cancer.

How is technology changing the way that research is carried out at St. Jude and across the world?

Over the past decade, technology has become a lot more accessible. Take, for example, genome sequencing. In the past, only big genome centers could carry out sequencing analyses.

Now, using St. Jude Cloud, small labs can access huge datasets and computing power. If they send a specimen to a reference lab, they can check the quality of the data and ask questions about the validity of the results. I think, in a way, technology has democratized these genome-wide assays in a similar way to how Apple and Microsoft made computers accessible to the general public.

We are facilitating the ability of bench scientists to access complex data in a way that allows them to apply domain knowledge and their scientific insight to the data that is already out there. This allows us to connect the dots between computational scientists and bench scientists, who are currently considered to be two separate entities.

We hope that, with the increased accessibility of these assays and computing platforms, computational biology will become integrated and bench scientists can contribute to data analysis, even though they may not completely understand the computing technology behind it.

What does the future hold for your research team?

The pan-cancer study focused on cancers at diagnosis. Relapsed cancers, and how tumors evolve from diagnosis to relapse, have not yet been fully elucidated.

What we'd like to do now is focus more on how tumors evolve under drug treatment. Hopefully, we can use these data to develop new cures for high-risk patients who relapse, because relapse is the bottleneck in treating pediatric cancer patients.

The second area that we’re working on is studying the general health of long-term survivors of pediatric cancer. This is part of an initiative called St. Jude LIFE, which is led by Dr. Les Robison at St. Jude.

The St. Jude LIFE program longitudinally follows patients from St. Jude throughout their lifetimes to better understand the risks for side effects later in life. The goal is not just to cure patients with pediatric cancer, but also to ensure that they have a healthy and productive life.

We want to understand the long-term effects of cancer therapies, and whether it is possible to use genomics data to identify patients at high risk for late-stage toxic side effects. We hope that the findings can inform future research, so we can develop less toxic therapies that will improve the quality of life for kids in the future.

Finally, in our study, we only looked at the coding region of the genome, which equates to around 3 percent of the total genome. Now we need to look at the 97 percent non-coding DNA and identify variants.

This is more challenging than looking at coding mutations, because you need to understand the epigenetic landscape of pediatric cancer to be able to have a good interpretation of the non-coding variant that you're looking at.

Where can readers find more information?

About Dr. Jinghui Zhang

Jinghui Zhang, PhD, is a computational biologist whose work focuses on the integrative analysis of large-scale, multi-dimensional genomics data to understand the initiation and progression of diseases.

In her early career, Dr. Zhang participated in the development of the widely used Basic Local Alignment Search Tool (BLAST) algorithm and led the genetic variation analysis of the first assembled human genome.

Dr. Zhang’s lab has developed innovative computational tools for analyzing and visualizing genetic variations and somatic mutations and has led the largest pediatric pan-cancer study in collaboration with the National Cancer Institute.

Dr. Zhang’s research group is primarily responsible for analyzing the whole-genome sequencing data generated from the St. Jude Children’s Research Hospital-Washington University Pediatric Cancer Genome Project (PCGP).

The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News-Medical.Net.