The Dark Matter of the Genome

Only about 1–2% of the human genome consists of protein-coding sequence. The rest is made up of non-coding RNA genes, untranslated regions, introns and their splice sites, regulatory elements, and transposable elements. The functions of most of these elements are still unknown.

Dark Matter of the genome. Image Credit: Black Prometheus / Shutterstock

Non-coding RNAs

Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. MicroRNAs (miRNAs) are small ncRNAs, about 18–25 nucleotides long. They bind to complementary regions in messenger RNAs (mRNAs), blocking their translation and reducing their stability.

Deletion of miRNA genes has been associated with cancer progression. Point mutations in miRNAs can also affect miRNA processing and the recognition of target mRNA sequences.
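To make the idea of complementary binding concrete, the short Python sketch below scans a stretch of mRNA for sites that pair with a miRNA's "seed" region, and shows how a single point mutation in that seed can abolish recognition. It rests on assumptions that are not part of this article: it treats miRNA nucleotides 2–8 as the seed (a common simplification in miRNA research), it looks only for perfect base pairing, and both sequences are hypothetical. Real target prediction also weighs pairing outside the seed and the context of the site.

```python
# Minimal sketch (hypothetical sequences): find mRNA sites that pair perfectly
# with a miRNA "seed", taken here as miRNA nucleotides 2-8 (an assumption).

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, mrna: str) -> list[int]:
    """Return 0-based positions in `mrna` that pair perfectly with the miRNA seed."""
    seed = mirna.upper()[1:8]        # nucleotides 2-8 of the miRNA (1-based)
    site = reverse_complement(seed)  # mRNA sequence that would pair with the seed
    target = mrna.upper()
    return [
        i for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    mirna = "UCCAGUAGGAUCAGUUCAGCAU"   # hypothetical 22-nt miRNA
    mutant = "UCCCGUAGGAUCAGUUCAGCAU"  # same miRNA with one point mutation in the seed
    mrna = "AAACUACUGGAAA"             # hypothetical stretch of target mRNA

    print(seed_sites(mirna, mrna))   # [3]
    print(seed_sites(mutant, mrna))  # []
```

The mutant finds no site because its altered seed would now pair with a different mRNA sequence, a toy version of how a point mutation can change target recognition.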

Long non-coding RNA

More than 50,000 RNAs transcribed from the human genome are not translated into proteins. Most of these non-coding transcripts are longer than 200 nucleotides and are therefore termed long non-coding RNAs (lncRNAs).

Although they do not code for proteins, studies have uncovered critical roles for them in regulating several cellular processes. LncRNAs are found in both the nucleus and the cytoplasm, where they have been shown to regulate the cell cycle, cell differentiation, proliferation, and the transcription of genes. Some lncRNAs act by recruiting epigenetic effectors, which modify the expression of protein-coding genes without altering the DNA sequence.

Transposable elements

A large portion of the non-coding genome consists of transposable elements, also called “jumping genes”. These sequences can move, or “jump”, from one location in the genome to another. Several functions have been attributed to them; for example, they can carry regulatory sequences that control the expression of protein-coding genes.

Because transposable elements can insert themselves into different regions, they can enhance, reduce, or completely silence the expression of coding sequences, depending on where they insert. Some transposable elements have also been implicated in the neurodegenerative disease amyotrophic lateral sclerosis (ALS).

Regulatory elements

Although regulatory regions do not code for proteins, they contain promoters and enhancers that influence the expression of coding genes. Structural alterations in these regions, such as translocations, deletions, insertions, or duplications, can change how regulatory elements interact with the genes they control. Many regulatory elements also lie near oncogenes and regulate their activation or repression.

5’-Untranslated regions (5’-UTR)

As the name suggests, 5’-UTRs are sequences that are not translated; they lie immediately upstream of the coding region in an mRNA. Although the functions of many 5’-UTRs are not yet known, several have been found to regulate translation or mRNA stability through different mechanisms.

5’-UTRs can also influence translation by reducing the access of the translational machinery to the coding region. Mutations in this region can also create new initiation (start) codons; for example, mutations in the 5’-UTR that generate premature start codons have been linked to melanoma. However, the functional characterization of 5’-UTRs and their mutations is still incomplete.
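One simplified way to picture how a 5’-UTR mutation creates a premature start codon: in the scanning model of translation initiation, ribosomes move along the mRNA from its 5’ end and usually begin at the first AUG they meet, so a new AUG in the 5’-UTR can divert them. The Python sketch below, using a hypothetical 5’-UTR and a hypothetical single-nucleotide change, lists any AUG codons that the change creates. It ignores the sequence context around each AUG and its reading frame, both of which matter in real cells.

```python
# Minimal sketch (hypothetical sequence): report AUG codons created in a
# 5'-UTR by a single-nucleotide substitution.


def aug_positions(utr5: str) -> list[int]:
    """Return 0-based positions of every AUG triplet in a 5'-UTR (DNA or RNA)."""
    seq = utr5.upper().replace("T", "U")  # accept DNA input by converting T to U
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return `seq` with the base at 0-based `pos` replaced by `alt`."""
    if not 0 <= pos < len(seq):
        raise ValueError("position outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def new_upstream_augs(utr5: str, pos: int, alt: str) -> list[int]:
    """Return AUG positions present after the substitution but not before it."""
    before = set(aug_positions(utr5))
    after = aug_positions(substitute(utr5, pos, alt))
    return [p for p in after if p not in before]


if __name__ == "__main__":
    utr5 = "GCCACGGCAUCCGCC"  # hypothetical 5'-UTR containing no AUG
    print(aug_positions(utr5))               # []
    print(new_upstream_augs(utr5, 10, "G"))  # [8]: "AUC" becomes "AUG"
    print(new_upstream_augs(utr5, 0, "A"))   # []
```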

Introns and splice sites

Introns are also non-coding regions, and mutations in introns and intronic splice sites often receive little attention. However, changes at splice sites can cause adjacent exons to be skipped or neighboring introns to be retained in the mature mRNA.
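Splice-site recognition can be sketched with the well-known GT–AG rule: most introns begin with GT at the 5’ (donor) splice site and end with AG at the 3’ (acceptor) splice site. The Python sketch below, using a hypothetical intron sequence, flags when either dinucleotide is lost. This is a deliberate simplification: rarer intron classes (such as GC–AG) exist and would be flagged, and real splicing also depends on longer consensus sequences, a branch point, and regulatory elements, so a check like this cannot by itself predict whether an exon will be skipped.

```python
# Minimal sketch (hypothetical sequence): check the canonical GT...AG
# dinucleotides at the two ends of an intron, written 5'->3' as DNA.


def disrupted_splice_sites(intron: str) -> list[str]:
    """List which canonical splice-site dinucleotides are missing from an intron."""
    seq = intron.upper()
    problems = []
    if seq[:2] != "GT":
        problems.append("5' donor site (expected GT)")
    if seq[-2:] != "AG":
        problems.append("3' acceptor site (expected AG)")
    return problems


if __name__ == "__main__":
    intron = "GTAAGTCCTTACTTTCCCAG"        # hypothetical intron
    mutant = intron[:1] + "A" + intron[2:]  # point mutation: donor GT becomes GA
    print(disrupted_splice_sites(intron))  # []
    print(disrupted_splice_sites(mutant))  # ["5' donor site (expected GT)"]
```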

Many cancers are associated with mutations in intronic splice sites that cause essential exons to be skipped. Introns may also contain regulatory elements, and mutations that disrupt these sequences can change gene expression.

Non-coding DNA makes up almost 98% of our genome, and it may contain important regulatory elements that control the expression of the remaining 2% that codes for proteins.

Last Updated: Feb 26, 2019

Written by

Dr. Surat P

Dr. Surat graduated with a Ph.D. in Cell Biology and Mechanobiology from the Tata Institute of Fundamental Research (Mumbai, India) in 2016. Prior to her Ph.D., Surat studied for a Bachelor of Science (B.Sc.) degree in Zoology, during which she was the recipient of an Indian Academy of Sciences Summer Fellowship to study the proteins involved in AIDS. She produces feature articles on a wide range of topics, such as medical ethics, data manipulation, pseudoscience and superstition, education, and human evolution. She is passionate about science communication and writes articles covering all areas of the life sciences.


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.