Neighbor clustering better identifies functionally related genes on the bacterial chromosome


Aug 1 2007

The moment a bacterial pathogen makes contact with its host, its goal is simple: to infect.

To do the job, it has to turn a specific array of genes on and off and show a little know-how in adapting to its new environment. A new tool developed at Rockefeller University allows scientists to identify, more precisely than before, the specific array of genes (known and unknown) that are expressed as a result of this interaction, and to determine what functions they may perform.

The new technique takes advantage of information already stored within the structure of all bacterial chromosomes: neighboring genes are more likely to share similar functions and regulatory roles than distant ones. It improves upon previous methods that scientists have used to survey the genes expressed by a bacterial pathogen and to group them in biologically meaningful ways. The tool comes at a time when, more than ever, biologists need statistically savvy ways to organize the unprecedented body of data generated by microarray techniques, which are used to track the transcription of nearly every gene.

The researchers, whose findings appear in the July 2007 online issue of PLoS Computational Biology, tested their new method, called neighbor clustering, on group A Streptococci (Streptococcus pyogenes), a ubiquitous organism that has a predilection for the upper respiratory tract and causes strep throat. Specifically, they wanted to figure out a more reliable way to identify which genes are expressed when Streptococcus initially latches on to its host, determine the function and regulatory roles of these genes, and group them in clusters whose members are functionally related.

"By figuring out which streptococcal genes are expressed during the first overt steps of infection, researchers can better understand ways to prevent it," says Vincent Fischetti, co-head of the Laboratory of Bacterial Pathogenesis and Immunology, whose research associate, Patricia Ryan, is the lead author of the study.

For years, scientists tried to reliably group genes into functionally meaningful clusters by looking at similar expression patterns. But despite using rigorous statistical approaches, Ryan and Fischetti, like others, consistently found that gene clusters were not reliably organized by function or regulation. Perhaps most problematic, genes of unknown function usually clustered together, providing scientists few clues as to what their functions might be.

To overcome these limitations, Ryan and Fischetti developed GenomeCrawler, a computer program written by Brian Kirk, a computational biologist at Rockefeller. It looks at microarray data and then uses neighbor clustering to identify statistically significant clusters of neighboring genes that are co-expressed. When GenomeCrawler systematically walked through the bacterial chromosome, it identified many more gene clusters than the researchers' initial analysis had, and the genes within these clusters were more closely related to one another.
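The general idea of walking along a chromosome and testing runs of adjacent genes for co-expression can be illustrated with a short sketch. This is not GenomeCrawler and should not be read as the published method: the fixed window size, the use of mean pairwise Pearson correlation as the score, the permutation-based null distribution, and the names `neighbor_clusters` and `toy_profiles` are all assumptions made for illustration. The sketch also makes no correction for testing many windows, which a real analysis would need.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def mean_pairwise_corr(profiles):
    """Average correlation over every pair of genes in a group."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def neighbor_clusters(profiles, window=3, n_perm=1000, alpha=0.05, seed=0):
    """Slide a window over genes listed in chromosomal order.

    Each window of adjacent genes is scored by its mean pairwise
    correlation and compared with scores of randomly drawn gene sets
    of the same size (an assumed null model, chosen for simplicity).
    Returns (start, end, score, p_value) for windows with p <= alpha.
    """
    rng = random.Random(seed)
    null = [mean_pairwise_corr(rng.sample(profiles, window))
            for _ in range(n_perm)]
    hits = []
    for start in range(len(profiles) - window + 1):
        score = mean_pairwise_corr(profiles[start:start + window])
        # Empirical p-value with the usual +1 correction.
        p = (1 + sum(s >= score for s in null)) / (1 + n_perm)
        if p <= alpha:
            hits.append((start, start + window, score, p))
    return hits


def toy_profiles(n_genes=30, n_conditions=8, cluster=(10, 13), seed=1):
    """Made-up data: random profiles, with one run of co-expressed neighbors."""
    rng = random.Random(seed)
    profiles = [[rng.gauss(0, 1) for _ in range(n_conditions)]
                for _ in range(n_genes)]
    base = [float(c) for c in range(n_conditions)]
    for g in range(*cluster):
        profiles[g] = [v + rng.gauss(0, 0.05) for v in base]
    return profiles
```

Calling `neighbor_clusters(toy_profiles())` reports the window covering the planted genes at positions 10 to 12 among its hits. Because no multiple-testing correction is applied, a few chance windows may be reported as well.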

Even more surprising, GenomeCrawler identified clusters that contained genes of both known and unknown function. The statistical significance of these clusters suggests that the unknown genes are functionally related to the known genes, a finding that Ryan and Fischetti confirmed experimentally. "Our method, unlike the one before, gives us preliminary clues as to what the functions of these unknown genes might be," says Ryan. "Rather than looking for a needle in a haystack, you now have a smaller pile of hay to work with."