A Belgian team of computational biologists led by Stein Aerts (VIB-KU Leuven) has developed a new bioinformatics method called cisTopic. Inspired by text-mining methods, cisTopic helps scientists to gain insight into the mechanisms underlying the differences in gene regulation across and within the cells in our body by looking for common topics. In a new publication in Nature Methods, Aerts and his team demonstrate the broad range of applications of this method, from brain research to cancer biology.
Our genomes are controlled by combinations of regulatory molecules that "switch on" target genes in our DNA. These regulatory molecules bind to so-called enhancer and promoter regions in our chromosomes. Understanding when and how these regions are activated can teach us a lot about the cellular diversity in our bodies.
"All the cells in our body essentially contain the same DNA," explains Prof. Stein Aerts, who heads the lab for computational biology at VIB and KU Leuven. "What makes every cell type unique is which genes are active at any given time."
Recent advances in single-cell technology have enabled scientists like Aerts to look at gene activity and the accessibility of regulatory DNA regions for thousands of individual cells. But this information has not yet solved the challenge of reverse engineering the genomic regulatory code.
Carmen Bravo González-Blas and Liesbeth Minnoye, two young researchers in Aerts' lab, set out to tackle this problem. "The data we can obtain from a single cell, regarding accessibility of specific regulatory regions in its DNA, is very sparse. Yet, we wanted to group individual cells into clusters based on similarities of these accessible regions."
To do so, Bravo González-Blas borrowed a computational technique from the text-mining field, called "topic modelling". She explains: "In text mining, computers can discover 'topics' from large collections of text, as well as terms that are important for each topic. When applied to our problem of gene control, the computer discovers topics that are important for each cell type in our body. It also allowed us to identify regulatory regions for each topic."
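To make the analogy concrete, the sketch below treats each cell as a "document" and each accessible regulatory region as a "word", then fits a topic model to a small simulated data set. It is an illustration only, not cisTopic's own code: it assumes Latent Dirichlet Allocation from scikit-learn as the topic model, and the simulated matrix, the function names and all parameter values are made up for this example.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation


def simulate_accessibility(n_cells_per_type=100, n_regions=200, seed=0):
    """Simulate a sparse binary cell-by-region matrix for two cell types.

    Hypothetical toy data: each cell type has its own half of the regions
    open with probability 0.2, and the other half with probability 0.01.
    """
    rng = np.random.default_rng(seed)
    half = n_regions // 2
    probs_a = np.where(np.arange(n_regions) < half, 0.2, 0.01)
    probs_b = probs_a[::-1]  # second cell type prefers the other half
    cells_a = rng.random((n_cells_per_type, n_regions)) < probs_a
    cells_b = rng.random((n_cells_per_type, n_regions)) < probs_b
    matrix = np.vstack([cells_a, cells_b]).astype(int)
    labels = np.repeat([0, 1], n_cells_per_type)
    return matrix, labels


def fit_topics(matrix, n_topics=2, seed=0):
    """Fit a topic model with cells as documents and regions as words."""
    lda = LatentDirichletAllocation(
        n_components=n_topics, max_iter=50, random_state=seed
    )
    cell_topic = lda.fit_transform(matrix)  # one topic distribution per cell
    region_topic = lda.components_          # one weight per region and topic
    return cell_topic, region_topic


def top_regions(region_topic, n_top=5):
    """Return the indices of the highest-weighted regions for each topic."""
    order = np.argsort(region_topic, axis=1)[:, ::-1]
    return order[:, :n_top]


matrix, labels = simulate_accessibility()
cell_topic, region_topic = fit_topics(matrix)

# Group cells by their dominant topic, then list the regions that define it.
clusters = cell_topic.argmax(axis=1)
print("cells per cluster:", np.bincount(clusters, minlength=2))
print("top regions per topic:", top_regions(region_topic).tolist())
```

Assigning each cell to its dominant topic gives a simple clustering, and the highest-weighted regions of a topic play the role of the "important terms" in text mining. Real single-cell accessibility data involve far more cells, regions and topics than this toy example.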
"We evaluated our new method on a variety of data sets, and found that it allows us to accurately recover both expected and new cell types," adds Minnoye. "Particularly on very sparse data, our method is more robust than previously developed approaches."
Learning more about complex tissues
The researchers applied cisTopic to cell populations that are biologically complex, such as the cells present in the mammalian brain. Not only did cisTopic allow them to recover the major cell types in the brain, but the team was also able to identify new subpopulations and master regulators of neuronal cell types.
"In addition to the brain, we also used cisTopic to investigate dynamic changes in gene accessibility in melanoma cell cultures from patients," adds Aerts. "When we modulated one of the known important regulators in these cancer cells, we could for the first time track changes in the accessibility of different DNA regions over time. Such approaches will finally allow us to better understand what these master regulators actually do in cancer cells, and which genes they control."
These different applications illustrate the value of the team's new method for studying the players and mechanisms that orchestrate gene regulation in our cells. According to computational biologists like Aerts, this is an important step towards real-time and personalized monitoring of cell states in health and disease.