By comparing 140 sequenced bacterial genomes, researchers have uncovered a system for regulating genes essential to bacterial replication - and they did it solely by computer keystrokes and mouse clicks.
Mikhail Gelfand, a Howard Hughes Medical Institute international research scholar at the Institute for Information Transmission Problems (IITP) in Moscow, and his postdoctoral fellow, Dmitry Rodionov, used comparative genomics to identify a new transcription factor system in bacteria that represses expression of genes involved in DNA replication. They scanned gene sequences and proteomes of several taxonomic groups of bacteria, identifying not only a highly conserved signal sequence, but also the regulatory transcription factor that bound it, the repressor nature of the signal, and other genes also regulated by this system.
"We provided a very detailed description of a system just by doing bioinformatics alone," says Gelfand, director of the IITP's research and training center of bioinformatics. "It's a proof of principle that you can go a very long way by comparative genomics now." Their findings will be published in the July issue of Trends in Genetics, with early publication now online. Gelfand is presenting the work on June 24, 2005, at the annual meeting of HHMI international research scholars in Mérida, Mexico.
Gelfand and Rodionov started their search using a technique called phylogenetic footprinting to review the upstream DNA sequences of a group of genes that code for ribonucleotide reductase enzymes. These enzymes convert the ribonucleotide building blocks of RNA into the deoxyribonucleotides used to build DNA. This conversion is critical for duplicating a bacterium's entire genome before it divides to reproduce.
The search revealed a conserved palindromic sequence occurring upstream of many ribonucleotide reductase (Nrd) genes. A genetic palindrome is a sequence of nucleotides on one strand of DNA that reads the same as the sequence on the opposite strand, only backwards - a common feature of DNA sequences that are recognized by regulatory molecules. They designated the sequence the NrdR-box.
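As a rough illustration of what "palindromic" means here, the check can be sketched in a few lines of Python. This is a minimal sketch, not the researchers' software. The function names are hypothetical, and the test sequence `GAATTC` is a well-known palindromic restriction site used only as an example, not the NrdR-box itself.

```python
# Hypothetical helpers illustrating a DNA palindrome check.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the opposite strand, read in the reverse direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    # GAATTC is an illustrative palindrome, not the NrdR-box sequence.
    print(is_palindromic("GAATTC"))
    print(is_palindromic("GATTACA"))
```

Real regulatory sites are often imperfect palindromes, so a practical search would likely tolerate a few mismatches rather than demand an exact match.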
Because the signal was found in so many diverse groups of bacteria, the researchers thought it might represent a universal regulatory mechanism. The next question was whether the signal was promoting or repressing expression of Nrd genes.
The team observed that their signal always overlapped the promoter, the region of DNA required to initiate transcription, the first step in converting a gene to protein. Molecules that promote transcription recognize and bind to this sequence, which lies just outside of the gene. Repressor signals commonly work by letting a protein bind on top of the promoter sequence and physically block access to it. Therefore, the duo predicted that the NrdR-box functioned as a repressor sequence.
Next, the researchers identified the transcription factor protein that binds to the NrdR-box. To do this, they used a bioinformatics approach they call phylogenetic profiling, compiling a list of genomes that clearly contained the NrdR-box and a list of genomes that clearly lacked it. Then they searched the proteomes of 63 bacterial species, looking for proteins that strictly followed the same present-or-absent pattern as the NrdR-box. Only one protein cluster matched the pattern, and it represented a family of proteins that shared traits of transcription factors.
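The matching step in phylogenetic profiling can be pictured with a small Python sketch. It assumes a simplified setup that the article does not spell out: each genome is marked as having or lacking the signal, and each protein family is listed with the genomes that encode it. All names and the toy data below are hypothetical and are not taken from the study.

```python
def matching_protein_families(signal_profile, family_profiles):
    """Return protein families whose presence/absence pattern strictly
    matches that of the regulatory signal across all listed genomes.

    signal_profile:  dict mapping genome name -> True/False (signal present)
    family_profiles: dict mapping family name -> set of genomes encoding it
    """
    matches = []
    for family, genomes_with_family in family_profiles.items():
        # The family must be present exactly where the signal is present.
        if all((genome in genomes_with_family) == has_signal
               for genome, has_signal in signal_profile.items()):
            matches.append(family)
    return matches


if __name__ == "__main__":
    # Toy data: made-up genomes and protein families.
    signal = {"genome_A": True, "genome_B": True, "genome_C": False}
    families = {
        "family_1": {"genome_A", "genome_B"},              # same pattern
        "family_2": {"genome_A", "genome_B", "genome_C"},  # present everywhere
        "family_3": {"genome_A"},                          # missing in genome_B
    }
    print(matching_protein_families(signal, families))
```

Genomes in which the signal is ambiguous are simply left out of `signal_profile` in this sketch, mirroring the article's focus on genomes that clearly had or clearly lacked the NrdR-box.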