Recent advances in sequencing techniques have made whole genome sequencing faster and more common. Whole genome bisulfite sequencing (WGBS) is one such next-generation sequencing technique, and it allows users to analyze DNA methylation at single-base resolution.
DNA methylation is an epigenetic mechanism that regulates gene expression, and its study has applications in several areas of biology and medicine, including research into diseases such as cancer.
Image: Scientist analyzes DNA gel used in genetics, forensics, drug discovery, biology and medicine. Image Credit: Gopixa / Shutterstock
Methodology
Methylation is critical to several biological functions and has a role in certain disease states. DNA methylation regulates gene expression by recruiting proteins or by inhibiting the binding of transcription factors to DNA. It occurs during development, where it helps regulate tissue-specific transcription as cells differentiate.
WGBS combines sodium bisulfite conversion of DNA with high-throughput sequencing. Methylcytosines are cytosine bases that carry a methyl group at the C5 position. Sodium bisulfite treatment converts unmethylated cytosines into uracil, whereas methylcytosines are protected from conversion. During PCR, the uracils are replaced by thymines, so unmethylated cytosines are read as thymines while methylated cytosines still appear as cytosines.
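The conversion logic can be illustrated with a minimal Python sketch. It is a toy model of a single strand, not a real analysis pipeline: the function names and the example sequence are hypothetical, and it assumes complete conversion with no sequencing errors.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate how one strand reads after bisulfite treatment and PCR.

    Unmethylated cytosines end up as thymines; methylated cytosines
    (0-based positions in `methylated_positions`) stay as cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")  # unmethylated C -> U -> read as T
        else:
            converted.append(base)  # methylated C and all other bases unchanged
    return "".join(converted)


def call_methylation(reference, read):
    """For every cytosine in the reference, report whether the aligned read
    still shows a C (interpreted as methylated) or a T (unmethylated)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


reference = "ACGTCCGA"  # hypothetical example sequence
read = bisulfite_convert(reference, methylated_positions={1})
print(read)
print(call_methylation(reference, read))
```

Comparing the read with the untreated reference is what turns the chemical conversion into a per-base methylation call: a cytosine that survives is taken as methylated, and one that reads as thymine is taken as unmethylated.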
Single nucleotide polymorphism (SNP) genotyping or next-generation sequencing is often used to interrogate DNA methylation at individual CpG sites. A CpG site is a cytosine followed directly by a guanine, with the two nucleotides joined by a single phosphate group; these are also known as CG sites. The human methylome contains around 28 million CG sites. Methylation also occurs at non-CG sites, CHG or CHH, where H represents adenine, thymine, or cytosine.
Genomic libraries can be constructed after bisulfite conversion. One of the first experiments using WGBS found that, after filtering the genomic library to ensure accuracy, the reads covered around 93% of all cytosines that could theoretically be covered. This was similar to the coverage typically seen in a conventional bisulfite-sequencing experiment for a single locus.
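The three sequence contexts can be told apart by looking at the two bases that follow a cytosine. The Python sketch below shows one way to do this; the function name is hypothetical, and it only considers the forward strand, whereas a real analysis would also examine the reverse complement.

```python
H_BASES = {"A", "C", "T"}  # H stands for any base except guanine


def cytosine_context(seq, i):
    """Classify the cytosine at 0-based position i as 'CG', 'CHG' or 'CHH'.

    Returns None when too little downstream sequence is available or when
    an ambiguous base (for example N) prevents classification.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError(f"position {i} is {seq[i]!r}, not a cytosine")
    following = seq[i + 1:i + 3]  # up to two downstream bases
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or following[0] not in H_BASES:
        return None
    if following[1] == "G":
        return "CHG"
    if following[1] in H_BASES:
        return "CHH"
    return None


sequence = "ACGCAGCTT"  # hypothetical example sequence
for pos, base in enumerate(sequence):
    if base == "C":
        print(pos, cytosine_context(sequence, pos))
```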
However, bisulfite-converted sequencing libraries show very low diversity because the bases that appear are predominantly adenine, guanine, and thymine, with only a very small fraction of cytosines. This is because the only cytosines that remain are the methylated ones, whereas the unmethylated cytosines have been converted into thymine.
Sequencing the whole genome is generally quite expensive. Therefore, although WGBS has been applied to large genomes such as the human genome, large numbers of individual samples are seldom sequenced. Reduced representation bisulfite sequencing (RRBS) was developed to address this: the bisulfite reaction is still carried out, but sequencing is limited to around 1% of the genome. This makes it feasible to sequence the genomes of several individuals.
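The drop in cytosine content can be seen by comparing base fractions before and after a simulated conversion. This is a toy Python sketch with a hypothetical sequence and helper names, assuming complete conversion of every unmethylated cytosine.

```python
from collections import Counter


def bisulfite_convert(seq, methylated_positions=()):
    """Turn every unmethylated C into T; methylated positions are 0-based."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def base_fractions(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: counts[b] / total for b in "ACGT"}


original = "ACGTACGTCCGG"  # hypothetical example sequence
converted = bisulfite_convert(original, methylated_positions={1})

print("before:", base_fractions(original))
print("after: ", base_fractions(converted))
```

In this example only one of the four cytosines is methylated, so the cytosine fraction falls from a third of the bases to a twelfth while the thymine fraction rises by the same amount.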
Issues with Current Methodology
PCR amplification of DNA fragments for high-throughput sequencing has been found to show a GC content bias. After bisulfite conversion and PCR, methylated DNA has a higher GC content than unmethylated DNA, which means methylated DNA may be overrepresented when sequencing libraries are built.
Overrepresentation of methylated DNA has also been observed to increase with the number of PCR cycles, and to depend on the enzyme used for amplification. The study recommends limiting the number of PCR cycles and using an optimal polymerase to amplify the library.
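The difference in GC content is easy to reproduce with a toy calculation. The Python sketch below compares the same hypothetical sequence in a fully methylated and a fully unmethylated state after simulated conversion; it says nothing about how strongly any particular polymerase responds to that difference.

```python
def bisulfite_convert(seq, fully_methylated):
    """Simulate conversion of a strand that is either fully methylated
    (all cytosines protected) or fully unmethylated (all C read as T)."""
    seq = seq.upper()
    return seq if fully_methylated else seq.replace("C", "T")


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


fragment = "ACGTCCGA"  # hypothetical example fragment

methylated_read = bisulfite_convert(fragment, fully_methylated=True)
unmethylated_read = bisulfite_convert(fragment, fully_methylated=False)

print(methylated_read, gc_content(methylated_read))
print(unmethylated_read, gc_content(unmethylated_read))
```

Here the methylated copy keeps a GC content of 0.625, while the unmethylated copy drops to 0.25, so two molecules from the same locus can look very different to an amplification step that is sensitive to GC content.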
Some methods have attempted to avoid PCR amplification when building DNA sequencing libraries. However, this does not work for bisulfite-treated DNA because the uracil would inhibit cluster formation: the reagents involved include a DNA polymerase that is sensitive to uracil. This is why uracil is replaced by thymine during PCR in WGBS.
The uracils in the bisulfite-treated DNA are replaced by amplifying the DNA over a varying number of PCR cycles. In addition, most protocols need larger amounts of input DNA because sodium bisulfite treatment can degrade DNA, which limits the samples that can be sequenced.
To enhance the accuracy of WGBS, researchers are developing programs that use more than one bisulfite-read mapper. This has been shown to increase the accuracy of detection by reducing the effects of read heterogeneity on detecting methylated DNA, thereby supporting more comprehensive analysis of WGBS data.