Finding out where gene-regulator proteins bind to DNA and identifying the genes they regulate just got a step easier thanks to a new technique developed by scientists at the U.S. Department of Energy’s Brookhaven National Laboratory.
The technique could greatly speed the process of unraveling the role these proteins play in turning on and off the genes that establish the very identity of cells — be they brain cells, liver, or blood — as well as what might go awry in certain conditions like cancer.
The Brookhaven scientists, together with collaborators from Oregon Health & Science University, Emory School of Medicine, and Stony Brook University, have published the first results using their technique in the December 29, 2004 issue of Cell, where they describe the human-genome binding sites of a regulator protein known as CREB.
“Though scientists have now decoded almost all of the entire human genome — the series of nucleotide bases (labeled A, T, G, and C) that make up the source code for running the machinery of the cell — we are just beginning to decipher it,” said biologist John Dunn, who led Brookhaven’s role in the research. “It’s as if we have in our hands a giant book of life, but we are barely beginning to learn how to read it.”
“Our technique gives us a new way to index the code, to find the places where regulators act — where the on/off switches are that determine which genes are at work in different types of cells under different conditions,” he said.
Previous, individual experiments have identified about 100 places where CREB binds to DNA and regulates genes in humans, so scientists know it is important, particularly in regulating cell differentiation, survival, and the function of nerve cells. But there has been no easy way to screen the entire genome. “We are the first to do a genome-wide survey,” says Dunn.
The problem has been the sheer magnitude of information in the genome: three billion nucleotides and many tens of thousands of genes. Trying to ascertain which of these genes CREB regulates by more traditional methods, evaluating one gene at a time, would be too labor-intensive, expensive, and time-consuming.
Scientists have been working on shortcuts, but all so far have limits. For example, in one recent technique, scientists mix the regulator protein of interest (let’s say, CREB) with the entire genome of a cell and let it bind. Then they fragment the genome into smaller pieces, 500 to 1,500 nucleotide bases long. Using antibodies that specifically recognize and bind to CREB, they then isolate the pieces that have CREB attached, and wash all the others away. Scientists can use traditional gene-sequencing methods to decipher the sequence of A, T, G, and C on these fragments and then locate their original positions on the genome, but it is a slow and somewhat expensive process. Other methods, such as matching the pieces to their complements on microarrays, are limited by the size of genome they can analyze.
Dunn’s team has come up with a technique to determine the positions of these 500-to-1,500-base-long pieces on an entire genome — even one as large as the human genome — relatively quickly and in very large numbers.
After isolating the CREB-bound fragments, they release the CREB and then cut the DNA with a “restriction” enzyme that recognizes a particular nucleotide base sequence, CATG. Further manipulation of the cut ends allows them to isolate 20-base-long fragments, or “tags,” in each direction from these cut sites. This gives the scientists a large number of very short DNA “tag” sequences, which all have the same start sequence of CATG plus 16 additional unknown bases. The scientists then “glue” the tags together into chains and sequence them.
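To make the tag step concrete, here is a minimal computational sketch of it in Python. It is an illustration only, not the team's laboratory protocol or software: the function names (extract_tags, reverse_complement) are hypothetical, and the sketch assumes that the tag reaching "in the other direction" from a cut site is read off the opposite DNA strand. Because CATG reads the same on both strands, both tags begin with CATG.

```python
TAG_LEN = 20   # 4 bases of CATG plus 16 additional bases, as described above
SITE = "CATG"  # recognition sequence of the restriction enzyme

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(fragment):
    """Collect the 20-base tags on both sides of every CATG site in a fragment.

    Hypothetical helper: tags that would run off either end of the
    fragment are skipped, since they cannot be a full 20 bases long.
    """
    fragment = fragment.upper()
    tags = []
    start = fragment.find(SITE)
    while start != -1:
        # Tag reading forward from the cut site: CATG + the next 16 bases.
        downstream = fragment[start:start + TAG_LEN]
        if len(downstream) == TAG_LEN:
            tags.append(downstream)
        # Tag reading backward from the cut site, taken from the opposite
        # strand so that it also starts with CATG.
        up_start = start + len(SITE) - TAG_LEN
        if up_start >= 0:
            tags.append(reverse_complement(fragment[up_start:start + len(SITE)]))
        start = fragment.find(SITE, start + 1)
    return tags


if __name__ == "__main__":
    toy_fragment = "A" * 16 + "CATG" + "C" * 16
    for tag in extract_tags(toy_fragment):
        print(tag)
```

For the toy fragment in the example, the sketch yields two tags, one for each direction from the single CATG site.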
While small in length, the tags are large enough to allow computer searches to locate their specific positions on the complete genome in a manner that is analogous to a laser scanner reading barcodes on items in the grocery store. In this case the inventory isn’t cans of food but the database of the human, mouse, or rat genome sequences. And because the scientists know that each of these tags lies within, at most, 500 to 1,500 bases of a CREB-binding site, they are able to search the region around the specific locations for CREB-binding sequences or genes that might be regulated by CREB.
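The barcode-style lookup can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the search software the team used: it assumes the reference genome fits in memory as a single string, that only tags matching exactly one position are kept, and that the search region extends 1,500 bases (the longest fragment length mentioned above) to either side of the matched site. The names build_tag_index, locate_tag, and search_window are hypothetical.

```python
from collections import defaultdict

TAG_LEN = 20          # CATG plus 16 bases
SITE = "CATG"
MAX_FRAGMENT = 1500   # longest fragment length quoted in the article

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_tag_index(genome):
    """Map every possible CATG-anchored 20-base tag to the site positions producing it."""
    index = defaultdict(list)
    pos = genome.find(SITE)
    while pos != -1:
        downstream = genome[pos:pos + TAG_LEN]
        if len(downstream) == TAG_LEN:
            index[downstream].append(pos)
        up_start = pos + len(SITE) - TAG_LEN
        if up_start >= 0:
            upstream = reverse_complement(genome[up_start:pos + len(SITE)])
            index[upstream].append(pos)
        pos = genome.find(SITE, pos + 1)
    return index


def locate_tag(tag, index):
    """Return the genome position of a tag, or None if it is absent or ambiguous."""
    hits = index.get(tag, [])
    return hits[0] if len(hits) == 1 else None


def search_window(site_pos, genome_len, reach=MAX_FRAGMENT):
    """Return the (start, end) region to scan for binding sequences or nearby genes."""
    return max(0, site_pos - reach), min(genome_len, site_pos + len(SITE) + reach)


if __name__ == "__main__":
    toy_genome = "G" * 30 + "CATG" + "ACGT" * 4 + "T" * 30
    index = build_tag_index(toy_genome)
    position = locate_tag("CATG" + "ACGT" * 4, index)
    print(position, search_window(position, len(toy_genome)))
```

Building the index once and then looking up each tag keeps every search a single dictionary access, which is what makes it practical to place very large numbers of tags. A real genome would call for a disk-backed or compressed index rather than an in-memory dictionary.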
Using this technique, the scientists have identified 6,302 genome-binding sites for CREB, including many that are located near known genes. Genes identified by this method as being regulated by CREB include the gene responsible for causing Huntington’s disease in mice, which is important for making advances in understanding Huntington’s disease in humans, and genes that may play an important role in certain cancers.
“This technique can be applied to any protein that binds to specific sequences in the DNA and promises to be a very useful tool,” says Dunn.
This research was funded by the Office of Biological and Environmental Research within the U.S. Department of Energy’s Office of Science, the National Institutes of Health, and the Howard Hughes Medical Institute. The Department of Energy’s Office of Science was a founder of the Human Genome Project, a nationwide effort to generate the instrumentation and biological and computational resources necessary to sequence the entire human genome, identify all functional genes, and help transfer this information and related technology to the private sector for the benefit of society (see DOEgenomes.org).