Artificial intelligence can solve problems at remarkable speed, but it's the people developing the algorithms who are truly driving discovery. At The University of Texas at Arlington, data scientists are creating sophisticated formulas that enable AI to interpret massive biological datasets to uncover how diseases start, how the immune system responds, and what treatments might work best.
Xinlei (Sherry) Wang, Jenkins Garrett professor of statistics and data science in UT Arlington's Department of Mathematics, has received a four-year, $1.28 million federal grant to advance her study, "Statistical and Deep Generative Modeling for Enhanced CyTOF Data Interpretation and Discovery."
Dr. Wang, who also serves as the founding director for research in the Division of Data Science, is leading efforts to create AI models that can analyze complex biomedical data. In simple terms, CyTOF is a cutting-edge lab technology that scans thousands of individual cells at once and measures dozens of proteins within them.
The challenge, Wang said, is presenting that data in a way other scientists can easily use. That's where her team of Bayesian statisticians and data scientists comes in. They develop bioinformatic and statistical tools that serve as a "one-stop shop" for analyzing CyTOF data.
Using a Bayesian framework, the researchers are building a single statistical model that produces clear, interpretable results. This model is designed to show how CyTOF data (highly detailed information from single-cell analysis) is generated, revealing the underlying patterns more accurately. AI can uncover hidden relationships in the data that people might miss, and it delivers results much faster.
"Without AI integrated into our Bayesian framework, you couldn't scale and it would take several days or even longer to get results. With AI, you get reliable, rigorous results within seconds, even for millions of cells. We model what we know about the data using transparent Bayesian models so the parameters are interpretable. For example, a parameter might indicate increased protein expression in the disease group compared to the control group."
Xinlei (Sherry) Wang, Jenkins Garrett professor of statistics and data science, UT Arlington's Department of Mathematics
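To make the idea of an interpretable parameter concrete, here is a minimal sketch in Python. It is not the team's model. It assumes simulated expression values for one hypothetical protein marker, independent normal likelihoods, flat priors on the two group means, and plug-in sample variances, so the posterior of the difference between groups is approximately normal. The function name and all numbers are invented for illustration.

```python
import math
import random
import statistics


def posterior_difference(control, disease):
    """Approximate posterior of delta = mean(disease) - mean(control).

    Assumes independent normal likelihoods with flat priors on the group
    means and plug-in sample variances (a large-sample approximation).
    """
    mean = statistics.fmean(disease) - statistics.fmean(control)
    sd = math.sqrt(
        statistics.variance(disease) / len(disease)
        + statistics.variance(control) / len(control)
    )
    # P(delta > 0) under Normal(mean, sd), via the standard normal CDF.
    prob_increase = 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))
    # Central 95% credible interval of the approximate normal posterior.
    interval = (mean - 1.96 * sd, mean + 1.96 * sd)
    return mean, interval, prob_increase


# Simulated expression values for one hypothetical protein marker.
rng = random.Random(0)
control = [rng.gauss(2.0, 0.5) for _ in range(500)]
disease = [rng.gauss(2.3, 0.5) for _ in range(500)]

delta, (low, high), prob = posterior_difference(control, disease)
print(f"delta = {delta:.2f}, 95% interval = ({low:.2f}, {high:.2f}), "
      f"P(delta > 0) = {prob:.3f}")
```

Here the single parameter delta reads directly as "how much higher expression is in the disease group," and the interval and probability express the uncertainty around it. A real CyTOF analysis would involve many markers, many cell types, and a far richer model.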
The algorithms combine data from single-cell transcriptomics (next-generation gene sequencing) with CyTOF, a detailed single-cell protein analysis. Together, they provide a fuller picture of what's happening inside cells. Each cell carries clues about health and disease that could ultimately lead to better treatments for diseases such as cancer. The system can analyze millions of cells at once, each with 40 to 100 protein expressions or tens of thousands of gene expressions, identifying different cell types and comparing healthy and diseased cells.
Wang and her team's work is already gaining attention. Kevin Wang, a recent doctoral graduate now serving as a tenure-track assistant professor at Davidson University, won the Best PhD Poster Award last spring for presenting the group's preliminary results at the 2025 Conference of Texas Statisticians.
In addition, a study recently published in Nature Communications, co-authored by Wang, postdoctoral researcher Zeyu Lu, and colleague Lin Xu, introduced a tool called Bayesian Identification of Transcriptional Regulators from Epigenomics-Based Query Region Sets, or BIT, to improve the accuracy of gene research.
Other members of Wang's team include researchers from UTA's Division of Data Science; Li Wang, associate professor of mathematics; Yike Shen, assistant professor of earth and environmental sciences; and UT Southwestern researchers Yuqiu Yang and Andy Xiao.
"AI is powerful, but it's often a black box," Wang said. "We are designing user-friendly, open-source software so end users can run it on their laptops. Existing algorithms can't handle big data this efficiently. We combine statistical rigor, uncertainty quantification, and scalability, all in one framework."
Journal reference:
Lu, Z., et al. (2025). BIT: Bayesian Identification of Transcriptional regulators from epigenomics-based query region sets. Nature Communications. doi.org/10.1038/s41467-025-60269-4