Autism is classified as a 'spectrum' for a reason: Each case is different. Scientists have struggled to parse through the many ways autism can manifest, much less to link these varying observable traits (called phenotypes) to underlying genetics.
A new study in Nature Genetics from researchers at the Flatiron Institute's Center for Computational Biology (CCB) and their collaborators leverages data from SPARK, the largest-ever study of autism, to analyze phenotypic and genotypic data from more than 5,000 participants with autism ages 4 to 18. The study identifies four groups of individuals with autism who share similar traits and links those groups to biological processes associated with specific genetic variants. With these classifications and information about the mechanisms that drive them, scientists can work toward more precise and personalized support, such as counseling or physical therapy, and help individuals access appropriate interventions earlier.
"A clinically grounded, data-driven subtyping of autism would really help kids get the support they need early on," says study co-lead author Natalie Sauerwald, a CCB associate research scientist. "If you know that a person's subtype often co-occurs with ADHD or anxiety, for example, then caregivers can get support resources in place and maybe gain additional understanding of their experience and needs."
"Our study takes a 'person-centered' approach, in which we focus on the full spectrum of traits that an individual might exhibit rather than just one trait, like IQ. This approach was key to our discovery of these clinically relevant autism classes and to deciphering the biology that underlies them."
Olga Troyanskaya, senior research scientist and deputy director for genomics at CCB, senior author of the study
"This study is a powerful demonstration of how data from SPARK can lead to new, clinically-relevant insights, and it also underscores the power of leveraging machine learning approaches to analyze the large amount of phenotypic and genotypic data available in SPARK," says Kelsey Martin, executive vice president of autism and neuroscience at the Simons Foundation. "Participants in SPARK volunteer this data, and we are incredibly grateful for their generosity and their commitment to accelerating research."
The study was co-led by Aviya Litman of Princeton University, Sauerwald of the Flatiron Institute, and Troyanskaya, who holds joint appointments at Princeton and the Flatiron Institute, along with Christopher Y. Park and Yun Hao of the Flatiron Institute; LeeAnne Green Snyder and Jennifer Foss-Feig of the Simons Foundation; Chandra L. Theesfeld of Princeton; and Ilan Dinstein of Ben Gurion University in Israel.
Navigating a treasure trove of data
The project began after Sauerwald, one of the study's first authors, spoke with autism researchers about leveraging CCB's computational tools to analyze phenotypic and genotypic data from SPARK. SPARK, a landmark study supported by the Simons Foundation Autism Research Initiative (SFARI), is dedicated to improving the lives of people with autism by identifying the causes of autism and supporting research that informs more effective therapies, treatments, services and support. To date, the study has engaged over 150,000 people with autism and more than 200,000 of their family members.
"I think [SPARK] is the only cohort that has this combination of extensive phenotypic data as well as genetic data," says Sauerwald.
But finding the best way to analyze the data would be a challenge: It includes lots of different measures collected in lots of different ways.
"Some of our data is simple yes-or-no - does a participant have a particular trait or not?" says Sauerwald. "Other data is more nuanced, like questions that have categorical responses such as language levels, or still others that vary along a spectrum, such as the age at which a child reaches a developmental milestone."
The team tried many types of models to see which could best integrate the data and landed on a type called general finite mixture modeling. Mixture modeling suits this kind of data because it can handle each data type individually and then integrate them into a single probability for each person, describing how likely that person is to belong to a particular class.
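To make the idea concrete, here is a minimal sketch of how a finite mixture model can combine a yes-or-no trait, a categorical response and a continuous measure into one class probability per person. It is an illustration only, not the study's model: the two classes, every parameter value, the field names and the choice of Bernoulli, categorical and Gaussian distributions are assumptions made for this example, as is treating the features as independent within a class. Fitting the parameters from data (typically with expectation-maximization) is not shown.

```python
import math

# Hypothetical parameters for two illustrative classes. These are NOT the
# study's four classes and were not fitted to any real data.
CLASSES = {
    "class_a": {
        "weight": 0.6,               # mixing proportion of the class
        "p_trait": 0.8,              # P(yes-or-no trait is present)
        "p_level": [0.1, 0.3, 0.6],  # P(categorical level = 0, 1, 2)
        "milestone_mean": 12.0,      # mean milestone age, in months
        "milestone_sd": 2.0,
    },
    "class_b": {
        "weight": 0.4,
        "p_trait": 0.2,
        "p_level": [0.5, 0.3, 0.2],
        "milestone_mean": 18.0,
        "milestone_sd": 3.0,
    },
}


def log_bernoulli(present, p):
    """Log-likelihood of a yes-or-no observation."""
    return math.log(p if present else 1.0 - p)


def log_categorical(level, probs):
    """Log-likelihood of a categorical observation (level is an index)."""
    return math.log(probs[level])


def log_gaussian(x, mean, sd):
    """Log-density of a continuous observation under a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def class_posteriors(person, classes=CLASSES):
    """Return P(class | all of the person's traits) for every class."""
    log_joint = {}
    for name, c in classes.items():
        # Each data type gets its own likelihood; the log terms are summed
        # (assuming the features are independent given the class).
        log_joint[name] = (
            math.log(c["weight"])
            + log_bernoulli(person["has_trait"], c["p_trait"])
            + log_categorical(person["language_level"], c["p_level"])
            + log_gaussian(person["milestone_age"],
                           c["milestone_mean"], c["milestone_sd"])
        )
    # Normalize with log-sum-exp so the probabilities sum to 1.
    top = max(log_joint.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    return {name: math.exp(v - log_norm) for name, v in log_joint.items()}


# One hypothetical participant, described by all three data types at once.
person = {"has_trait": True, "language_level": 2, "milestone_age": 13.0}
posteriors = class_posteriors(person)
```

Each feature contributes a likelihood suited to its own data type, and the result is one probability per class that reflects all of the person's traits together.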
A mixture model also allowed the team to take what they call a 'person-centered' approach to the data. Most studies take a 'trait-centered' approach, in which scientists pick a trait and examine everyone who exhibits it. A person-centered approach starts with a person and examines all their traits together, much like a clinician would provide care by attending to the whole individual.
"Our goal with the person-centered approach is to maintain representation of the whole individual so that we can more fully model their complex spectrum of traits together," says Litman, the study's other lead author. "Our model allowed us to do this, and to define groups of individuals with shared phenotypic profiles, which translated to clinically similar presentations."
Four distinct classes
Based on the results of their model, the scientists were able to classify SPARK participants into four main groups.
- Individuals in the first group, Social and Behavioral Challenges, have many co-occurring traits such as ADHD, anxiety disorders, depression and mood dysregulation. They also tend to display restricted or repetitive behaviors and challenges with communication. However, these individuals don't show many developmental delays: They tend to hit their developmental milestones at the same pace as children without autism. One of the larger groups, it constitutes around 37% of the participants.
- The second group, Mixed ASD with Developmental Delay, is the inverse of the Social and Behavioral Challenges group. While these individuals hit many of their milestones later in development than their peers without autism, they typically don't have the same kinds of issues with anxiety, depression, mood dysregulation or disruptive behaviors. This group represents approximately 19% of the participants.
- The third group, Moderate Challenges, includes individuals who show challenges in the areas laid out in the Social and Behavioral Challenges group, but typically not all of them, and to a lesser degree. This group also does not show developmental delays. Roughly 34% of participants fall into this category.
- The fourth and final group, Broadly Affected, is characterized by widespread challenges, including restricted and repetitive behaviors, social communication, developmental delays, mood dysregulation, anxiety and depression. This is the smallest group, accounting for around 10% of the participants.
Importantly, the researchers stress that these classes likely aren't a definitive, comprehensive grouping, but rather a place to start. "This doesn't mean that there's necessarily only four classes," says Troyanskaya. "I think what this demonstrates is that there are at least four classes. But having the four, which are clinically and biologically relevant, is significant."
Uncovering pathways at play
The classes were established by phenotype; that is, looking only at traits and not at genetics. Then, when the scientists started to study the genetics within each class, they were surprised at the results. Specifically, the genetic variants found in individuals within each class affected biological processes in very distinct ways.
In one analysis, the team traced how specific genetic changes affect certain genes, and then looked at what those genes actually do by studying which molecular circuits, or pathways, they act in.
Researchers found that each autism subtype had its own biological signature.
"There was little to no overlap in the impacted pathways between the classes," says Litman. "And what was even more interesting is that while the impacted pathways - things like neuronal action potentials or chromatin organization - were all previously implicated in autism, each one was largely associated with a different class."
Remarkably, the team discovered that not just which genes were impacted by mutations-but when they were activated-differed by class.
"In the Social and Behavioral Challenges class, quite surprisingly, the impacted genes were mostly active after birth, and this group also experienced very few developmental delays and the latest average age of diagnosis," says Litman. "We found the opposite to be true for the ASD with Developmental Delays class, where impacted genes were mostly active prenatally."
Big data leads to big insights
The team hopes this work underscores the importance of large datasets that contain many types of data.
"I think this work highlights how important it is to have large cohorts with matched phenotypic and genetic data," says Litman. "This way, we can connect across them and make discoveries that are not apparent by just looking at one modality alone."
In the future, the team would like to examine even more types of data through this lens, including the 'non-coding' portion of the genome. These regions constitute more than 98 percent of the genome but are less studied because they do not go on to create proteins. They still play very important roles in regulating gene expression and other cellular processes implicated in autism.
"The more data, the more discovery," says Sauerwald. "We know there's a lot of contribution from the non-coding genome in autism, but we haven't been able to study it yet in the context of these classes. So a big next step is going to be adding in this other 98 percent."
Journal reference:
Litman, A., et al. (2025). Decomposition of phenotypic heterogeneity in autism reveals underlying genetic programs. Nature Genetics. doi.org/10.1038/s41588-025-02224-z.