Bioinformatics is the application of computational techniques to the storage, retrieval, and analysis of large quantities of biological data.
Through the development of algorithms and statistical tests, research can be carried out faster and more accurately. Bioinformatics is used in many fields of research; however, it is especially important in genomics, where it supports genome analysis, gene identification, genome-wide association studies, and evolutionary studies.
Bioinformatics and genomic analysis
Both traditional and next-generation sequencing methods aim to read the genome so that its DNA sequence can be analyzed. However, these methods produce many fragments of DNA, like pieces of a jigsaw puzzle, which need to be aligned and assembled into a final complete sequence. Bioinformatics tools can align these fragments quickly and cheaply, aiding genome sequencing.
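The jigsaw idea can be illustrated with a toy example. The sketch below is a minimal greedy overlap merge in Python, assuming error-free reads that all come from the same strand; the function names and the reads are made up for illustration, and real assemblers rely on far more robust graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    # Drop exact duplicates and fragments fully contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            # No remaining pair overlaps enough; return the contigs found.
            return frags
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


# Hypothetical reads covering one short stretch of DNA.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
contigs = greedy_assemble(reads)
print(contigs)
```

A greedy merge is easy to follow but can mis-assemble repetitive regions, which is one reason production tools use more elaborate approaches.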
The human genome was initially sequenced between 1990 and 2003, and has since been made available online and extensively annotated. Annotation is the process of labeling genes and their protein products at their locations on the genome.
The volume and complexity of the produced data would have taken many years to compile manually. However, with the advent of bioinformatics, scientists have the capacity to carry out the compilation and annotation processes quickly and with better precision.
Bioinformatics and identification of mutations
Bioinformatics is vital in research on de novo mutations. One method used to identify these mutations is whole exome sequencing, which reads only the protein-coding regions of DNA (the exome). Because the exome makes up only about 1% of the genome, this is much faster than whole genome sequencing.
However, it still produces large quantities of data, so bioinformatics becomes vital for data curation, sequence alignment, and analysis.
An example of the application of bioinformatics in genome sequencing is the diagnosis of Cantu syndrome. This syndrome is characterized by cardiac defects, distinctive facial features, and excessive hair growth. One study compared the exome of a child with the condition to the exomes of the unaffected parents, which identified five candidate genes that differed between them.
These genes were then sequenced, allowing the identification of a causative dominant missense mutation in the ABCC9 gene. The ABCC9 protein is part of an ATP-sensitive potassium channel, which transports potassium ions across the cell membrane.
Mutations in this gene have also been identified in many other patients with the condition, and it has therefore been suggested that altered function of this potassium channel results in Cantu syndrome.
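The child-versus-parents comparison described above amounts to a filtering step: keep the variants seen in the child but in neither parent. The following Python sketch shows the idea, assuming each variant is represented as a (chromosome, position, reference allele, alternate allele) tuple. The function name and all variants are hypothetical placeholders, not data from the study.

```python
# A variant is (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def de_novo_candidates(
    child: list[Variant], mother: list[Variant], father: list[Variant]
) -> list[Variant]:
    """Return variants present in the child but absent from both parents."""
    return sorted(set(child) - set(mother) - set(father))


# Hypothetical variant calls for a child and both parents.
child_variants = [("chr1", 1000, "A", "G"), ("chr2", 2500, "C", "T"), ("chr3", 300, "G", "A")]
mother_variants = [("chr1", 1000, "A", "G")]
father_variants = [("chr3", 300, "G", "A")]

candidates = de_novo_candidates(child_variants, mother_variants, father_variants)
print(candidates)
```

In practice, candidates found this way still need quality filtering and confirmation, because sequencing errors can look like new mutations.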
Using whole exome sequencing and bioinformatics, about 50% of rare disease genes have so far been identified, with the rest expected to be identified by 2020.
Another use of bioinformatics is in the identification of cancer-causing mutations. Through the development of automated systems, large volumes of sequence data can be produced and used to identify previously unknown point mutations.
Bioinformaticians also develop new algorithms that compare different sequences, aiding the identification of mutations.
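At its simplest, comparing sequences to find point mutations means walking along two aligned sequences and recording where they differ. The Python sketch below assumes the two sequences are already aligned and of equal length; the function name and the sequences are invented for illustration.

```python
def point_mutations(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List substitutions as (1-based position, reference base, sample base).

    Assumes the two sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Hypothetical reference and tumor sequences differing at one base.
mutations = point_mutations("ATGGCGTACG", "ATGGAGTACG")
print(mutations)
```

Real comparisons must also handle insertions and deletions, which is why alignment algorithms are applied before a base-by-base check like this one.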
Bioinformatics and genome-wide association studies
Genome-wide association studies (GWAS) scan the genome in an attempt to identify specific markers that indicate an individual’s susceptibility to a genetic disease. A genetic association between a specific marker and the disease can improve detection and treatment. If used on a large scale, this can also aid in the development of prophylactic treatments.
To carry out a GWAS, the genomes of individuals with a disease are compared with those of individuals without it. The development of highly automated systems has led to the high-throughput identification of single nucleotide polymorphisms (SNPs).
By comparing SNPs, those that are more common in individuals with the disease can be identified and used as disease markers. This information is then stored online and made available to scientists across the globe.
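One common way to ask whether a SNP allele is more frequent in cases than in controls is a chi-square test on a 2x2 table of allele counts. The Python sketch below is one illustrative approach rather than the method of any particular study; the function name and the counts are hypothetical.

```python
import math


def allelic_association(
    case_alt: int, case_ref: int, control_alt: int, control_ref: int
) -> tuple[float, float, float]:
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi-square statistic, p-value, odds ratio).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0 or b * c == 0:
        raise ValueError("table has an empty row, column, or odds-ratio cell")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts for a single SNP.
chi2, p_value, odds_ratio = allelic_association(
    case_alt=60, case_ref=40, control_alt=40, control_ref=60
)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, OR={odds_ratio:.2f}")
```

Because a GWAS repeats such a test for a very large number of SNPs, the significance threshold has to be corrected for multiple testing.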
One of the first published GWAS investigated age-related macular degeneration (AMD). Out of 116,204 SNPs that were genotyped, the study observed a link between a variant of the complement factor H (CFH) gene and AMD. Therefore, individuals can be screened for this CFH risk variant to assess their susceptibility to AMD.
Several other disease genes have since been characterized, with the aim of helping doctors and other health care professionals to identify the risk of a genetic disease and to manage it appropriately.
Bioinformatics and evolutionary studies
By studying the changes in DNA within organisms and comparing them to other species, the genetic changes associated with evolution can be classified. Evolution is a process of small, cumulative changes in DNA that eventually lead to the formation of novel species.
Bioinformatics has aided research into evolutionary processes by allowing the comparison of DNA sequences, the sharing of data, the prediction of future evolution, and the classification of complex evolutionary processes.
When put together, the data can be used to create a phylogenetic tree that traces several species back to their common ancestor.
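A rough sketch of how sequence comparisons can be turned into a tree is shown below. It computes the fraction of differing positions between aligned sequences and then clusters them with UPGMA, a simple method chosen here for brevity that assumes all lineages change at the same rate. The function names, the species labels, and the sequences are all made up for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(sequences: dict[str, str]) -> str:
    """Cluster sequences with UPGMA; return the topology in Newick format."""
    sizes = {name: 1 for name in sequences}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(sequences[a], sequences[b])
        for a, b in combinations(sequences, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        for other in sizes:
            if other not in (a, b):
                # Size-weighted average distance from the new cluster to `other`.
                dist[frozenset((merged, other))] = (
                    dist[frozenset((a, other))] * sizes[a]
                    + dist[frozenset((b, other))] * sizes[b]
                ) / (sizes[a] + sizes[b])
        # Forget distances that involve the two clusters just merged.
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Hypothetical aligned sequences from four species labeled A to D.
toy_sequences = {
    "A": "ATGCATGCAT",
    "B": "ATGCATGCAA",
    "C": "ATGGATCCAA",
    "D": "TTGGACCCTA",
}
tree = upgma(toy_sequences)
print(tree)
```

The output groups the most similar sequences first; branch lengths are omitted to keep the sketch short.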
These are only a few of the myriad applications of bioinformatics within genetics. Overall, bioinformatics has opened up enormous opportunities in the fields of genomics and targeted gene therapy.