Studying mRNAs, or the transcriptome, can yield important information about how genes work in an organism.
Gene expression refers to the conversion of the information encoded in DNA into a functional protein. The intermediary in that conversion is messenger RNA, or mRNA. Genes are transcribed into mRNA, and the ribosome translates the mRNA to build a protein.
Serial analysis of gene expression (SAGE) uses mRNA from a particular sample to create complementary DNA (cDNA) fragments which are then amplified and sequenced using high-throughput sequencing technology.
The mechanism behind SAGE is based on tags which can identify the original transcript, and rapid sequencing of chains of tags linked together. The procedure essentially simplifies sequencing by linking the cDNA segments together in a long chain.
The resulting analysis gives a snapshot of the transcriptome of the sample, including the identity and abundance of each mRNA.
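As a rough illustration of how a list of sequenced tags becomes a table of identities and abundances, here is a minimal Python sketch. The tag sequences, the gene names and the `TAG_TO_GENE` lookup are invented for the example; a real analysis would match tags against a curated reference.

```python
from collections import Counter

# Hypothetical tag-to-gene lookup (invented sequences and names).
TAG_TO_GENE = {
    "AAACCCGGGT": "geneA",
    "TTTGGGCCCA": "geneB",
}

def tag_abundance(tags):
    """Count each tag and report its gene (or 'unknown') and its share of all tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return [
        (tag, TAG_TO_GENE.get(tag, "unknown"), n, n / total)
        for tag, n in counts.most_common()
    ]

# Toy library: five sequenced tags, one of which matches no known gene.
observed = ["AAACCCGGGT"] * 3 + ["TTTGGGCCCA"] + ["ACGTACGTAC"]
rows = tag_abundance(observed)
for row in rows:
    print(row)
```

The more often a tag appears, the more abundant its mRNA is assumed to have been in the sample. Tags with no match are reported as unknown rather than discarded.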
Steps of SAGE
SAGE is a complex protocol with many steps.
- Step 1: mRNA is isolated from the sample and reverse transcribed using biotinylated primers to generate cDNA
- Step 2: cDNA is bound via biotin to streptavidin microbeads
- Step 3: Bound cDNA is cleaved with a restriction enzyme, which frees part of each molecule from the beads
- Step 4: The freed fragments are washed away, leaving truncated cDNA bound to the beads
- Step 5: The truncated cDNA is split into two separate samples, and a different oligonucleotide adapter with sticky ends is added to each
- Step 6: A second restriction enzyme cuts a short distance into the cDNA, releasing a short “tag” attached to its adapter from the beads
- Step 7: Sticky ends are repaired with DNA polymerase
- Step 8: Blunt-ended tags from the two separate samples are ligated together, generating ditags with two different oligonucleotide adapter ends
- Step 9: Ditags are cleaved to remove the oligonucleotide adapters, then ligated to one another to form long cDNA chains, or concatemers
- Step 10: Concatemers are transformed into bacteria for replication
- Step 11: Concatemers are isolated from the bacteria and sequenced
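Once a concatemer has been sequenced, the individual tags have to be recovered from it in software. The sketch below shows one simplified way this could look. It makes three assumptions: ditags are separated by the site CATG (the recognition site of NlaIII, a commonly used enzyme for the first cleavage), each tag is 10 bases long excluding that site, and the second tag in each ditag is read from the opposite strand. The sequences are invented, and real pipelines also handle sequencing errors and duplicate ditags.

```python
ANCHOR = "CATG"  # assumed cleavage site separating ditags
TAG_LEN = 10     # assumed tag length, excluding the anchor site

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def tags_from_concatemer(concatemer):
    """Split a concatemer at anchor sites and return the two tags in each ditag."""
    tags = []
    for ditag in concatemer.split(ANCHOR):
        if len(ditag) < 2 * TAG_LEN:
            continue  # skip empty pieces and fragments too short to hold two tags
        tags.append(ditag[:TAG_LEN])                       # tag on the forward strand
        tags.append(reverse_complement(ditag[-TAG_LEN:]))  # tag on the opposite strand
    return tags

# Toy concatemer built from two ditags (four invented tags).
ditag1 = "AAACCCGGGT" + reverse_complement("TTTGGGCCCA")
ditag2 = "ACGTACGTAC" + reverse_complement("GGATCCTTAA")
concatemer = ANCHOR + ditag1 + ANCHOR + ditag2 + ANCHOR

print(tags_from_concatemer(concatemer))
```

Because every ditag is flanked by the same short site, one sequencing read of a concatemer yields many tags at once, which is what makes the approach efficient.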
Challenges when using SAGE
One challenge is that the tags are only about 13 or 14 base pairs long. Such a short tag can be difficult to identify if it comes from an unknown gene.
The flip side of that problem is that SAGE can be used to find unknown genes, and in some studies it’s an advantage to be able to measure gene expression quantitatively without prior sequence information.
Tags may also have issues with specificity; multiple genes could share the same tag if there is an overlap in sequence. There also can be inconsistencies with the restriction enzymes, and incompatibilities for certain species.
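The specificity problem can be checked in advance by predicting the tag for each known transcript and looking for duplicates. The following Python sketch assumes the tag is the 10 bases immediately after the last CATG site in the cDNA (the site nearest the 3' end). That site, the tag length and all sequences and gene names here are assumptions made for illustration.

```python
from collections import defaultdict

ANCHOR = "CATG"  # assumed restriction site used for the first cleavage
TAG_LEN = 10     # assumed tag length, excluding the anchor site

def extract_tag(cdna):
    """Return the tag next to the 3'-most anchor site, or None if no full tag exists."""
    pos = cdna.rfind(ANCHOR)
    if pos == -1:
        return None  # transcript has no site, so it yields no tag
    start = pos + len(ANCHOR)
    tag = cdna[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None

def shared_tags(transcripts):
    """Map each predicted tag to the genes that produce it, keeping only tags with more than one gene."""
    by_tag = defaultdict(list)
    for gene, cdna in transcripts.items():
        tag = extract_tag(cdna)
        if tag is not None:
            by_tag[tag].append(gene)
    return {tag: genes for tag, genes in by_tag.items() if len(genes) > 1}

# Invented transcripts: geneA and geneB share a tag, geneD has no site at all.
TRANSCRIPTS = {
    "geneA": "GGCTTACATGAAACCCGGGTTTACGAAAAA",
    "geneB": "TTGACCGTCATGAAACCCGGGTCCGGAAAAA",
    "geneC": "ACGGTCATGTTTGGGCCCAGGAAAAA",
    "geneD": "ACGTTTGGCCAAAAAA",
}

print(shared_tags(TRANSCRIPTS))
```

In this toy data set, two genes produce the same tag and cannot be told apart, while a transcript with no recognition site produces no tag and would be missed entirely.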
SAGE and DNA microarray
SAGE is similar in many ways to a DNA microarray; however, in a DNA microarray, the mRNAs hybridize to cDNA probes on the array. In SAGE, the data output is based on sequencing. That means SAGE analysis is more quantitative and it does not depend on the use of known genes.
Microarray experiments are generally less costly, and so are used more often in larger-scale studies.
A study of new markers in cancer illustrates how SAGE can be used in biomedical research.
Researchers compared gene expression levels in cancerous tissues with those in non-cancerous tissues to search for markers that could help diagnose pancreatic cancer at an early stage.
Because the results of a SAGE analysis of a large number of representative tissues had already been published online, the scientists were able to search the database for genes preferentially expressed in pancreatic cancer.
From this, they identified a gene called prostate stem cell antigen (PSCA) that had not previously been associated with pancreatic cancer.
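A search of this kind amounts to comparing tag counts between two sets of libraries. The Python sketch below is not the method used in the study; it is a simplified illustration that scales each library to tags per million and flags tags that are several-fold more frequent in one library than in the other. The counts, the `min_fold` threshold and the `pseudocount` are invented, and a real analysis would also apply a statistical test.

```python
def tags_per_million(counts):
    """Scale raw tag counts so that libraries of different sizes can be compared."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}

def enriched_tags(case_counts, control_counts, min_fold=5.0, pseudocount=1.0):
    """Return (tag, fold) pairs for tags at least min_fold higher in the case library."""
    case = tags_per_million(case_counts)
    control = tags_per_million(control_counts)
    hits = []
    for tag, value in case.items():
        # The pseudocount avoids division by zero for tags absent from the control.
        fold = (value + pseudocount) / (control.get(tag, 0.0) + pseudocount)
        if fold >= min_fold:
            hits.append((tag, fold))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Invented counts for two tags in a tumor library and a normal-tissue library.
tumor = {"AAACCCGGGT": 60, "TTTGGGCCCA": 40}
normal = {"AAACCCGGGT": 5, "TTTGGGCCCA": 95}

hits = enriched_tags(tumor, normal)
for tag, fold in hits:
    print(tag, round(fold, 1))
```

Tags that pass the threshold are only candidates. Each one still has to be mapped back to a gene and confirmed experimentally before it can be treated as a marker.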