Using artificial intelligence, scientists have discovered mutations in parts of non-coding DNA, often called “junk DNA,” that can contribute to autism. This is the first study of its kind to link autism to spontaneous mutations in the non-coding portion of the human genome. The study, titled “Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk,” was published this week in the journal Nature Genetics.
The research was led by Olga Troyanskaya, deputy director for genomics at the Flatiron Institute's Center for Computational Biology (CCB) in New York City and a professor of computer science at Princeton University. Troyanskaya worked alongside Robert Darnell, a Professor of Cancer Biology at Rockefeller University and an investigator at the Howard Hughes Medical Institute.
The scientists used artificial intelligence software to scan the genome sequences of 1,790 people with autism, along with those of their siblings and parents, who did not have autism. Inherited mutations were excluded from the results.
This meant that only spontaneous mutations, those that arose in the child rather than being inherited from a parent, were counted in the analysis.
The AI component of the study allowed the team to predict how each detected mutation would affect gene regulation and to link those predicted effects to autism.
“This is the first clear demonstration of non-inherited, non-coding mutations causing any complex human disease or disorder.”
Olga Troyanskaya, Lead Author
Study co-author Jian Zhou said that many other diseases, such as cancers and heart disease, could be evaluated using these techniques: “This enables a new perspective on the cause of not just autism, but many human diseases.”
Only around 1 to 2 percent of the genome is made up of genes that encode proteins. These proteins regulate the various functions of cells throughout the body. Much of the remaining non-coding DNA serves to regulate gene expression. The scientists noted that some participants had mutations in regions that do not code for any proteins, while others had mutations in protein-coding regions. Both types were similarly associated with autism.
Why artificial intelligence?
Mutations in the protein-coding regions of genes had been associated with around 30 percent of autism cases. For the remaining cases, in people with no family history of autism, the cause remained unclear. This prompted the team to explore the non-coding regions of the genome for possible links to autism.
The team quickly realized that finding disease-relevant mutations in non-coding DNA is like looking for a needle in a haystack. There are often dozens of mutations in the non-coding regions of a person's DNA, and most of them will not cause that person to develop a particular disease. Traditional genomics tools could not reliably single out the mutations that matter, so the scientists needed a different approach.
Troyanskaya and her colleagues decided to use artificial intelligence to predict the effects of mutations in the non-coding regions of the genome.
“This is a shift in thinking about genetic studies that we're introducing with this analysis. In addition to scientists studying shared genetic mutations across large groups of individuals, here we're applying a set of smart, sophisticated tools that tell us what any specific mutation is going to do, even those that are rare or never observed before.”
Chandra Theesfeld, Co-author
The researchers also noted that specific mutations in the non-coding regions could be linked to differences in IQ among children on the autism spectrum.
The machine learning model used data from the Simons Simplex Collection, maintained by the Simons Foundation. This collection contains the whole-genome sequences of nearly 2,000 “quartets”: families made up of a child with autism, an unaffected sibling, and two unaffected parents.
Sequencing all four family members allows researchers to identify mutations that arose spontaneously in the child with autism rather than being inherited from either parent. The team compared the predicted effects of these mutations with those of the spontaneous mutations carried by the unaffected sibling. Zhou explains, “The design of the Simons Simplex Collection is what allowed us to do this study. The unaffected siblings are a built-in control.”
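The quartet logic can be illustrated with a short Python sketch. This is a simplified illustration, not the study's actual pipeline: it assumes variants have already been called and are represented as hypothetical (chromosome, position, reference, alternate) tuples, treats a variant as spontaneous (de novo) when it appears in a child but in neither parent, and uses a hypothetical `score` function as a stand-in for the deep-learning model's predicted effect. Real analyses also apply sequencing-quality filters, which are omitted here.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical variant representation: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


@dataclass
class Quartet:
    """One simplex family: two unaffected parents, a child with autism, an unaffected sibling."""
    mother: set[Variant]
    father: set[Variant]
    proband: set[Variant]  # child with autism
    sibling: set[Variant]  # unaffected sibling, the built-in control


def de_novo(child: set[Variant], mother: set[Variant], father: set[Variant]) -> set[Variant]:
    """Variants present in the child but in neither parent, i.e. not inherited."""
    return child - mother - father


def mean_effect(variants: set[Variant], score) -> float:
    """Average predicted effect over a set of variants; 0.0 if the set is empty."""
    return mean(score(v) for v in variants) if variants else 0.0


def compare(q: Quartet, score) -> tuple[float, float]:
    """Return (proband, sibling) mean predicted effect of each child's de novo variants."""
    proband_dn = de_novo(q.proband, q.mother, q.father)
    sibling_dn = de_novo(q.sibling, q.mother, q.father)
    return mean_effect(proband_dn, score), mean_effect(sibling_dn, score)


# Toy example with made-up variants and scores (illustrative only).
shared = {("chr1", 100, "A", "G")}
q = Quartet(
    mother=shared | {("chr2", 200, "C", "T")},
    father={("chr3", 300, "G", "A")},
    proband=shared | {("chr3", 300, "G", "A"), ("chr5", 500, "T", "C")},
    sibling=shared | {("chr2", 200, "C", "T"), ("chr7", 700, "A", "T")},
)
toy_scores = {("chr5", 500, "T", "C"): 0.9, ("chr7", 700, "A", "T"): 0.1}
print(compare(q, lambda v: toy_scores.get(v, 0.0)))  # (0.9, 0.1)
```

In this toy family, the only variant unique to the affected child is the chr5 one and the only variant unique to the sibling is the chr7 one, so the comparison isolates spontaneous mutations while the sibling's result serves as the control.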
Co-author Christopher Park, a research scientist at CCB, said in a statement, “… This is consistent with how autism most likely manifests in the brain. It's not just the number of mutations occurring, but what kind of mutations are occurring.”
In laboratory experiments, the team inserted certain mutations into cells and found that they altered the way genes were expressed, as the model had predicted.
Troyanskaya added that many diseases could be explored in the roughly 98 percent of the genome that does not code for proteins, saying, “… Right now, 98 percent of the genome is usually being thrown away. Our work allows you to think about what we can do with the 98 percent.”
The authors concluded that their “predictive genomics framework illuminates the role of noncoding mutations in ASD and prioritizes mutations with high impact for further study, and is broadly applicable to complex human diseases.”
Simons Foundation. (2019, May 27). Press release. simonsfoundation.org.
Zhou, J., et al. (2019). Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nature Genetics. doi.org/10.1038/s41588-019-0420-0.