Scientists use machine learning to interpret mosquito genome

Scientists are using machine learning to identify important sequences of DNA within the mosquito genome that regulate how the insect's cells develop and behave.

The research project, funded by the National Institutes of Health (NIH), could have implications for disease control. It may help efforts to use genetic engineering to control mosquito populations, or to create mosquitoes with a reduced ability to transmit diseases such as malaria to humans.

"Our work will break new ground in the field of mosquito genomics and genetics," says Marc Halfon, PhD, professor of biochemistry in the Jacobs School of Medicine and Biomedical Sciences at the University at Buffalo. "Mosquitoes are responsible for hundreds of thousands of deaths each year. Although we know the sequence of the mosquito genome, we have little functional information about what much of that genome sequence does.

"Our work will take important steps toward filling in this crucial missing information. It will demonstrate our ability to functionally annotate the regulatory elements within genomes of various insect disease vectors without requiring extensive -- and expensive -- new genome-scale experimental data for each."

The project is funded by a $449,000 grant from the National Institute of Allergy and Infectious Diseases, part of the NIH. It focuses on Anopheles gambiae, an important vector for malaria transmission.

Using machine learning to interpret the mosquito genome

Within the genome of every plant and animal, there are regulatory switches -- strings of DNA that control the behavior of genes, dictating when and where in the body different genes are turned on and off.

These regulatory sequences matter because they can affect a species' mating success and its resistance to insecticides, Halfon says. In addition, regulatory mechanisms are crucial to the genetic engineering of mosquitoes, in which researchers seek to control the expression of foreign or mutated genes introduced into a target animal.

For over a decade, Halfon has worked with UB's Center for Computational Research to build a database called REDfly, which contains more than 5,600 regulatory sequences from a different insect species, the fruit fly Drosophila melanogaster. Now, his team is leveraging this trove of information to learn more about regulatory mechanisms within the mosquito genome.

With Saurabh Sinha, a computer scientist at the University of Illinois at Urbana-Champaign, Halfon developed software called SCRMshaw that learns the features of the regulatory sequences in REDfly, then searches the genomes of other insects for stretches of DNA with similar features. The software has successfully identified regulatory sequences in mosquitoes that look nothing like Drosophila sequences to the human eye but that share underlying traits, such as containing a similar assortment of short DNA subsequences 3 to 6 letters long.
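The general idea described above, summarizing known regulatory sequences by their short-subsequence (k-mer) composition and then scanning another genome for stretches with a similar composition, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SCRMshaw's implementation: the function names, the single fixed k-mer length, the pseudocount smoothing, and the simple log-odds score against a background set are all choices made here for clarity.

```python
from collections import Counter
from itertools import product
import math

# Illustrative sketch only; not SCRMshaw's actual model.
ALPHABET = "ACGT"
VALID = frozenset(ALPHABET)


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any that contain non-ACGT letters (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:
            counts[kmer] += 1
    return counts


def kmer_profile(seqs, k, pseudocount=1.0):
    """Pool k-mer counts over training sequences into smoothed frequencies.

    The pseudocount gives every possible k-mer a nonzero frequency,
    so the log-ratio below is always defined.
    """
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s, k))
    all_kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    denom = sum(total.values()) + pseudocount * len(all_kmers)
    return {km: (total[km] + pseudocount) / denom for km in all_kmers}


def log_odds_score(seq, fg, bg, k):
    """Positive score: seq's k-mers are more typical of fg (regulatory) than bg."""
    return sum(n * math.log(fg[km] / bg[km]) for km, n in kmer_counts(seq, k).items())


def scan(genome, fg, bg, k, window, step):
    """Slide a fixed-size window along the genome and score each window."""
    return [
        (start, log_odds_score(genome[start:start + window], fg, bg, k))
        for start in range(0, len(genome) - window + 1, step)
    ]


# Toy usage: hypothetical "regulatory" and "background" training sets.
K = 4
fg = kmer_profile(["GATAGATAGATAGATA"], K)
bg = kmer_profile(["CCCCCCCCCCCCCCCC"], K)
genome = "CCCCCCCC" + "GATAGATA" + "CCCCCCCC"
hits = scan(genome, fg, bg, K, window=8, step=4)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In this toy example, the highest-scoring window is the one covering the GATA-rich stretch, even though the scoring never aligns letters position by position. That property mirrors the point above: sequences that look dissimilar to the eye can still share a characteristic mix of short subsequences.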

"Finding regulatory elements is hard -- traditionally, it has been done by tedious experimental work that examines one gene at a time," Halfon says. "We wanted to know how you can do this faster: Just by looking at a DNA sequence, can you tell where the regulatory elements are? In at least some cases, the answer appears to be, 'Yes'."

Early implementation of SCRMshaw

Applying SCRMshaw to mosquitoes, Halfon, Sinha and colleagues identified some of the regulatory sequences that may cause the activity of a network of genes to shift from the midline of the ventral nerve cord (analogous to the human spinal cord) to its lateral regions during embryonic development in Aedes aegypti, a mosquito that transmits Zika, dengue fever and chikungunya.

This work, published online June 21 in the journal Developmental Biology, highlights how SCRMshaw can pinpoint regulatory sequences in non-Drosophila species.

"It shows how we can use SCRMshaw to address interesting biological questions of development and evolution," Halfon says.

The next step is to use the new NIH funding to carry out a large-scale search for regulatory elements in the Anopheles gambiae genome.

"We will focus on trying to identify regulatory sequences most useful for understanding aspects of mosquito biology that are relevant to its role as a disease vector -- for instance, development of the salivary glands or the midgut, or olfaction -- or that could be useful for biocontrol methods, such as genes affecting reproduction," Halfon says. "Once we have generated a high-confidence set of regulatory element predictions, we will test them in transgenic mosquitoes."


University at Buffalo


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.