March 5, 2026

Researchers from the Johns Hopkins Kimmel Cancer Center and The Johns Hopkins University have created a novel database structure that allows investigators anywhere to more easily study multiple types of cancer data, including laboratory results, genetic sequencing and imaging data, in one setting.

Called AstroID, the resource organizes clinical and correlating blood and tissue specimen information in six tiers, including information from the patient (deidentified to protect privacy); diagnosis; clinical events such as treatment or a blood draw; specimens such as material from a biopsy or serology; and then details about how those are processed by the lab into tissue blocks and vials, down to individual slides or aliquots.
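As a rough illustration of how six nested tiers can be linked, the sketch below builds one table per tier in an in-memory SQLite database, with each record pointing to its parent in the tier above. The table names, the `label` and `parent_id` columns, and the sample records are hypothetical and chosen only for illustration; they are not AstroID's actual REDCap fields.

```python
import sqlite3

# Hypothetical tier names, ordered from the top of the hierarchy to the bottom.
# These are illustrative assumptions, not AstroID's actual instrument names.
TIERS = [
    "patient",
    "diagnosis",
    "clinical_event",
    "specimen",
    "block_or_vial",
    "slide_or_aliquot",
]


def build_schema(conn):
    """Create one table per tier; every tier below the first references its parent."""
    parent = None
    for tier in TIERS:
        if parent is None:
            conn.execute(f"CREATE TABLE {tier} (id INTEGER PRIMARY KEY, label TEXT)")
        else:
            conn.execute(
                f"CREATE TABLE {tier} (id INTEGER PRIMARY KEY, label TEXT, "
                f"parent_id INTEGER NOT NULL REFERENCES {parent}(id))"
            )
        parent = tier


def add_record(conn, tier, label, parent_id=None):
    """Insert a record into a tier and return its new row id."""
    if parent_id is None:
        cur = conn.execute(f"INSERT INTO {tier} (label) VALUES (?)", (label,))
    else:
        cur = conn.execute(
            f"INSERT INTO {tier} (label, parent_id) VALUES (?, ?)", (label, parent_id)
        )
    return cur.lastrowid


def trace_to_patient(conn, slide_id):
    """Walk from a slide or aliquot up through every tier to the patient."""
    labels = []
    current_id = slide_id
    for tier in reversed(TIERS):
        if tier == TIERS[0]:
            # The top tier has no parent column.
            row = conn.execute(
                f"SELECT label, NULL FROM {tier} WHERE id = ?", (current_id,)
            ).fetchone()
        else:
            row = conn.execute(
                f"SELECT label, parent_id FROM {tier} WHERE id = ?", (current_id,)
            ).fetchone()
        labels.append(row[0])
        current_id = row[1]
    return labels[::-1]


conn = sqlite3.connect(":memory:")
build_schema(conn)

# Made-up sample chain: one record per tier, each linked to the one above.
sample_labels = ["P001", "melanoma", "biopsy visit", "tumor biopsy", "tissue block A", "slide 1"]
last_id = None
for tier, label in zip(TIERS, sample_labels):
    last_id = add_record(conn, tier, label, last_id)

print(trace_to_patient(conn, last_id))
```

Because every record carries a link to its parent, any slide or aliquot can be traced back to the deidentified patient it came from, which is the property a tiered layout of this kind is meant to provide.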

The structure, built in a web-based application called REDCap, can then be scaled to accommodate thousands of patients and the spatial characterization of billions of cancer cells. A description of AstroID, which has been made available for any researcher to use, was published Dec. 25, 2025, in the Journal for Immunotherapy of Cancer. The work was supported in part by the National Institutes of Health.

Researchers at Johns Hopkins Medicine have now deployed this structure in their laboratories for 16 different patient groups with multiple tumor types, and they have spatially mapped more than 1 billion cells and tagged them with clinical information from patient experiences.

Typically in oncology, each patient's course includes multiple visits, treatments and outcome measures, explains Janis M. Taube, M.D., director of the Division of Dermatopathology and co-director of the Tumor Microenvironment Laboratory at the Bloomberg~Kimmel Institute for Cancer Immunotherapy. To identify and characterize biomarkers, these parameters need to be linked to multiple tests and assessments, including blood-based laboratory values, tissue-based pathology, radiography, genomic studies and more.

"What this structure does is allow me to ask questions across all of this data that's already been gathered, and across tumor types, and combine it all together in the context of the longitudinal patient experience," Taube says.

For example, her lab often conducts studies of patients with melanoma. If she conducted a study 10 years ago looking at patients' age at diagnosis and what therapies they received, and then later wanted to do another study of this patient population and survival, she might have had to repeat some of the steps to compile a new cohort and regather information about treatments received, what specimens were collected, and clinical outcomes. "Investigators across the whole institution are also trying to tap into these patients and collect this information," she says. "There were really huge inefficiencies across how we were working, and duplicating efforts."

It had been painstaking for researchers to manually enter data, so cancer studies typically were designed around relatively small cohorts, adds Alexander Szalay, Ph.D., Bloomberg Distinguished Professor and professor in the Department of Computer Science at The Johns Hopkins University.

"What we are trying to do is to scale out so we can handle patients on the order of hundreds or thousands of patients in a study," says Szalay, who also is the director of the Institute for Data Intensive Science at Johns Hopkins. "One of our postdoctoral students, Elizabeth Will, in partnership with graduate student Benjamin Green, came up with this wonderful idea of how to organize all the medical and specimen data into multiple hierarchical tiers, which then can be easily translated to a query-oriented platform based on a large relational database."

While the Johns Hopkins team is for now using this platform for cancer studies, the structure could be adapted to characterize longitudinal biospecimens from any disease process, the researchers say.

Publicly available code for AstroID is at github.com/IUREDCap/redcap-etl-module. Additional information about AstroID is available at github.com/AstroPathJHU/AstroID/releases/tag/v0.0.1. Exported data can be explored on their own for research on clinical outcomes, independent of additional biomarker correlates, or merged and queried with a variety of scientific correlates.
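To give a sense of what merging exported clinical data with a scientific correlate might look like, here is a minimal sketch using only the Python standard library. The two CSV snippets, their column names (`specimen_id`, `tumor_type`, `responded`, `cd8_density`) and all values are made-up placeholders, not real AstroID exports or study results.

```python
import csv
import io

# Toy stand-ins for two exports. All column names and values are invented
# for illustration and do not come from AstroID or any study.
CLINICAL_CSV = """specimen_id,patient_id,tumor_type,responded
S1,P1,melanoma,yes
S2,P2,melanoma,no
S3,P3,lung,yes
"""

CORRELATE_CSV = """specimen_id,cd8_density
S1,410.0
S2,95.5
S3,220.0
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_specimen(clinical, correlates):
    """Join clinical rows to correlate rows that share a specimen_id."""
    by_specimen = {row["specimen_id"]: row for row in correlates}
    merged = []
    for row in clinical:
        match = by_specimen.get(row["specimen_id"])
        if match is not None:
            merged.append({**row, **match})
    return merged


def mean_by_response(rows, tumor_type, field):
    """Average a numeric field per response group within one tumor type."""
    groups = {}
    for row in rows:
        if row["tumor_type"] == tumor_type:
            groups.setdefault(row["responded"], []).append(float(row[field]))
    return {group: sum(values) / len(values) for group, values in groups.items()}


merged = merge_on_specimen(read_rows(CLINICAL_CSV), read_rows(CORRELATE_CSV))
print(mean_by_response(merged, "melanoma", "cd8_density"))
```

The clinical rows remain usable by themselves, and the join on a shared specimen identifier is what lets a biomarker measurement be read in the context of a patient's outcome.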

Study coauthors were Scott Carey, Govind Warrier, Aasheen Qadri, Andrew Jorquera, Sigfredo Soto-Diaz, Daphne Wang, Joel C. Sunshine, Julie Stein Deutsch, Robert A. Anders, Qingfeng C. Zhu, Ludmila Danilova, Leslie Cope, Evan J. Lipson and Logan L. Engle of Johns Hopkins, and Tricia R. Cottrell of Queen's University in Ontario, Canada.

The work was supported by The Mark Foundation for Cancer Research, the Melanoma Research Alliance, the Marilyn and Michael Glosserman Fund for Basal Cell Carcinoma and Melanoma Research, the Bloomberg~Kimmel Institute for Cancer Immunotherapy and the National Cancer Institute (grant #s R01CA142779 and T32CA009071).