Since the genome sequence of the bacterial pathogen Haemophilus influenzae was published in 1995, the genome sequences of many other large, complex, and medically or commercially significant organisms, including humans, have also been determined.
However, the techniques used to derive these sequences are imperfect, and many researchers may be unaware of errors lurking within the publicly available, published, or "canonical" sequence. Moreover, if an organism's genome is unstable and variable, containing rearrangements within a population or between strains, there may be no single true linear structure that is valid for that organism, and imposing one linear sequence may not be biologically meaningful.
Now, researchers at Cold Spring Harbor Laboratory and New York University describe a high-throughput microarray technique that tests many samples simultaneously and can be used both to assemble physical maps and to validate genome sequence assemblies. The findings appear in the latest issue of the Journal of Computational Biology.
The research was conducted by Joseph West, John Healy, and Michael Wigler of Cold Spring Harbor Laboratory, and William Casey and Bud Mishra of NYU's Courant Institute of Mathematical Sciences. Mishra is a Professor of Computer Science and Mathematics at the Courant Institute and also has an appointment in the Department of Cell Biology at NYU's School of Medicine.
In their microarray hybridization method, the researchers fluorescently labeled fragments of the genome of the fission yeast S. pombe and examined how these fragments bound to probes arrayed on a glass slide. From the binding patterns, they computationally derived the "distance" between probes in the genome and arranged the probes in order along it. They then compared the resulting physical map of the S. pombe genome with the corresponding map computed from the publicly available S. pombe sequence released in 2002. The comparison revealed a small number of significant discrepancies between the two maps. S. pombe's genome is only about 14 million bases long (less than one two-hundredth the size of the human genome), and its sequence is widely considered a gold standard in whole-genome assembly.
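The article does not spell out how probe "distances" were computed or how the probes were ordered, so the following is only a rough, hypothetical illustration of the general idea, not the researchers' algorithm. It assumes that each hybridization sample is a set of labeled genomic fragments and that probes lying near each other in the genome tend to light up in the same samples. Under that assumption it scores probe pairs with a Jaccard distance (our choice), orders probes with a greedy nearest-neighbour chain (also our choice), and compares two maps by counting probe pairs that appear in opposite order, allowing for the fact that a map may be read in either orientation. All function names and the toy data are made up for illustration.

```python
from itertools import combinations


def jaccard_distance(a: set[int], b: set[int]) -> float:
    """1 - |A & B| / |A | B|: probes hit by the same samples come out 'close'."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def order_probes(hits: dict[str, set[int]]) -> list[str]:
    """Greedy nearest-neighbour ordering of probes (hypothetical heuristic).

    `hits` maps each probe name to the set of sample IDs in which it lit up.
    """
    probes = sorted(hits)
    if not probes:
        return []

    def dist(p: str, q: str) -> float:
        return jaccard_distance(hits[p], hits[q])

    # Heuristic: the probe with the largest total distance to all others
    # is assumed to sit at one end of the chromosome.
    current = max(probes, key=lambda p: sum(dist(p, q) for q in probes if q != p))
    order = [current]
    remaining = set(probes) - {current}
    while remaining:
        # Extend the chain with the closest unplaced probe (ties broken by name).
        current = min(remaining, key=lambda q: (dist(current, q), q))
        order.append(current)
        remaining.remove(current)
    return order


def discordant_pairs(map_a: list[str], map_b: list[str]) -> int:
    """Count probe pairs whose relative order differs between two maps."""
    pos = {p: i for i, p in enumerate(map_b)}
    # combinations() keeps map_a's order, so p precedes q in map_a.
    return sum(1 for p, q in combinations(map_a, 2) if pos[p] > pos[q])


def map_disagreement(map_a: list[str], map_b: list[str]) -> int:
    """Discordant pairs, ignoring orientation (both maps must hold the same probes)."""
    return min(discordant_pairs(map_a, map_b), discordant_pairs(map_a, map_b[::-1]))


# Toy data: five probes whose true order is A-B-C-D-E; each sample
# happens to cover a contiguous window of neighbouring probes.
TOY_HITS = {
    "A": {0, 1, 7},
    "B": {0, 1, 2, 3},
    "C": {1, 2, 3, 4, 5},
    "D": {3, 4, 5, 6},
    "E": {5, 6},
}

if __name__ == "__main__":
    experimental = order_probes(TOY_HITS)
    print(experimental)  # ['A', 'B', 'C', 'D', 'E']
    # A hypothetical sequence-derived map in which B and C are swapped:
    print(map_disagreement(experimental, ["A", "C", "B", "D", "E"]))  # 1
```

In this sketch a nonzero disagreement count flags regions where the hybridization-derived order and the sequence-derived order conflict, which is the kind of discrepancy the comparison is meant to expose. Real data would be noisy, and a practical method would need a more robust ordering procedure than this greedy chain.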