Almost 98% of the human genome does not code for proteins, and much of it is transcribed into non-coding RNAs. Long non-coding RNAs (lnc RNAs) form a large class within this family: transcripts that are more than 200 nucleotides long and do not encode proteins.
Classification of Lnc RNA
Lnc RNAs can be further classified by their position in the genome relative to protein-coding genes. They are broadly divided into five groups. This classification is purely position-based and provides no information about the functions of lnc RNAs.
Stand-Alone Lnc RNA
These sequences lie between protein-coding genes and are also called “linc RNAs” or “large intergenic non-coding RNAs”. Lnc RNAs in this category are typically about 1 kb long and are often transcribed by RNA Pol II. Xist, HOTAIR, and MALAT1 are some of the well-known stand-alone lnc RNAs.
Natural Antisense Transcripts (NATs)
Transcription occurs from both the sense and antisense strands, and almost 70% of sense transcripts have been reported to have antisense counterparts. Sense-antisense (SAS) pairs can be formed by two coding mRNAs, by two lnc RNAs, or by one coding and one non-coding transcript. For example, Xist/Tsix is a lnc RNA SAS pair that controls X chromosome inactivation. However, the biological function of most NATs remains unclear.
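As a rough illustration of how sense-antisense pairs could be picked out of a transcript annotation, the Python sketch below flags transcripts that overlap on opposite strands of the same chromosome and labels each pair as coding/coding, coding/non-coding, or non-coding/non-coding. The `Transcript` record, the transcript names, and all coordinates are hypothetical and chosen only for illustration; real annotation pipelines apply further criteria.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    strand: str   # "+" or "-"
    coding: bool  # True for protein-coding transcripts


# Label for a pair, keyed by how many of its two members are coding.
PAIR_KINDS = {
    0: "non-coding/non-coding",
    1: "coding/non-coding",
    2: "coding/coding",
}


def overlaps(a, b):
    """True if the two transcripts share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def sense_antisense_pairs(transcripts):
    """Return (name, name, kind) for every overlapping pair on opposite strands."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        if a.strand != b.strand and overlaps(a, b):
            pairs.append((a.name, b.name, PAIR_KINDS[a.coding + b.coding]))
    return pairs


# Toy annotation: names and coordinates are made up for this example.
toy = [
    Transcript("geneA", "chr1", 100, 500, "+", True),
    Transcript("asA", "chr1", 400, 900, "-", False),
    Transcript("lncB", "chrX", 1000, 3000, "-", False),
    Transcript("lncC", "chrX", 2500, 4000, "+", False),
    Transcript("geneD", "chr1", 2000, 2600, "+", True),
]

for pair in sense_antisense_pairs(toy):
    print(pair)
```

The pairwise comparison is quadratic in the number of transcripts, which is acceptable for a toy list; a genome-wide annotation would normally be sorted by position or indexed with an interval tree first.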
Pseudogenes
Pseudogenes do not code for proteins because of nonsense, frameshift, or other mutations. Most pseudogenes are not transcribed and are considered “dead”; however, 2 to 20% of pseudogenes are transcribed, and some may even be translated. A few of these pseudogenes have been shown to regulate gene expression post-transcriptionally. Some studies hypothesize that lnc RNAs such as Xist may have evolved by pseudogenization of protein-coding genes.
Long Intronic ncRNAs
Apart from small non-coding RNAs such as snoRNAs and miRNAs, introns have recently been shown to contain longer non-coding sequences as well. These sequences show specific expression patterns and respond to certain stimuli. However, their effects need further study.
Divergent Transcripts and Enhancer RNAs
Several non-coding transcripts ranging from 20 to 2,500 nucleotides have been described; they may be associated with transcription start sites, with degradation, or with longer antisense RNAs. These transcripts, along with enhancer RNAs and bidirectional transcripts, have no known biological function yet.
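Because the classification above depends only on position, it can be sketched in a few lines of Python. The example below assigns a lnc RNA to a positional class relative to a single protein-coding gene. The `Gene` and `Lnc` records, the 2,000-nucleotide `DIVERGENT_WINDOW`, the order in which the rules are tested, and all coordinates are assumptions made for illustration, not established definitions. Pseudogenes are left out because they cannot be recognized from position alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    strand: str           # "+" or "-"
    introns: tuple = ()   # (start, end) intervals inside the gene


@dataclass(frozen=True)
class Lnc:
    chrom: str
    start: int
    end: int
    strand: str


# Assumed cutoff: maximum distance between the two start sites
# for a transcript to be called divergent.
DIVERGENT_WINDOW = 2000


def tss(x):
    """Approximate transcription start site: the 5' end on the given strand."""
    return x.start if x.strand == "+" else x.end


def classify(lnc, gene):
    """Assign a position-based class to a lnc RNA relative to one coding gene."""
    if lnc.chrom != gene.chrom:
        return "stand-alone"
    if lnc.start < gene.end and gene.start < lnc.end:
        # The lnc RNA overlaps the gene body.
        if any(s <= lnc.start and lnc.end <= e for s, e in gene.introns):
            return "intronic"
        if lnc.strand != gene.strand:
            return "natural antisense"
        return "sense-overlapping"
    # No overlap: divergent only if it lies upstream of the gene's start
    # site, on the opposite strand, and within the assumed window.
    upstream = lnc.end <= gene.start if gene.strand == "+" else lnc.start >= gene.end
    if lnc.strand != gene.strand and upstream and abs(tss(lnc) - tss(gene)) <= DIVERGENT_WINDOW:
        return "divergent"
    return "stand-alone"


# Toy gene on the plus strand with one intron; coordinates are made up.
gene = Gene("chr1", 10_000, 20_000, "+", introns=((12_000, 15_000),))

examples = {
    "far downstream": Lnc("chr1", 50_000, 51_200, "+"),
    "opposite strand, overlapping": Lnc("chr1", 18_000, 21_000, "-"),
    "inside the intron": Lnc("chr1", 12_500, 13_500, "+"),
    "upstream, opposite strand": Lnc("chr1", 9_000, 9_800, "-"),
}

for label, lnc in examples.items():
    print(label, "->", classify(lnc, gene))
```

In this sketch a transcript lying entirely within an intron is called intronic regardless of strand; testing the antisense rule first would be an equally defensible choice.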
Functions of Long Non-Coding RNA
Lnc RNAs are increasingly being associated with several developmental processes and disease states.
Regulation of Allelic Expression
Lnc RNAs have been shown to be involved in dosage compensation and genomic imprinting. The difference in the dosage of X-linked genes between XX females and XY males is balanced by a mechanism called dosage compensation, in which one of the two X chromosomes in females is inactivated. Lnc RNAs have been shown to regulate this X chromosome inactivation. They are also involved in genomic imprinting, in which a gene is expressed according to its parent of origin.
Role in Development
Lnc RNAs have several roles during development, ranging from regulating pluripotency to specifying lineage. Pluripotency transcription factors localize to the introns of lnc RNA genes to regulate them, and lnc RNAs in turn control the pairing of X chromosomes. Lnc RNAs also regulate the expression of Hox genes, which are involved in anterior-posterior pattern formation.
Role in Cancer
Some lnc RNAs have been shown to promote cell proliferation and repress tumor suppressors. They also affect the transcription of genes encoding cytoskeletal and extracellular matrix proteins.