Only about 2% of the DNA sequence in the human genome encodes proteins. The rest is non-coding DNA whose function is not fully understood, colloquially known as "junk DNA". Over the course of evolution, the non-coding portion of eukaryotic genomes has changed in size through various mechanisms, such as insertions and deletions of DNA sequences, and has been expanded by whole-genome duplication.
Unlike the coding portion, non-coding regions can vary tremendously in size, even between closely related species. When the non-coding genomic DNA of human and mouse is aligned, the amount that aligns varies considerably from one genomic region to another, and so does the fraction of repetitive DNA. How much of this abundant non-coding DNA is actually functional remains a contentious question.
Functional fraction of the genome
Many DNA sequences have important functional roles even though they do not encode proteins. A recent study using a high-resolution evolutionary approach estimated that 8.2% (7.1–9.2%) of the human genome is functional, roughly three times more than the functional fraction it shares with the mouse genome.
Many non-coding sequences produce RNA molecules that regulate gene expression by switching genes on and off. The DNA from which such a regulatory RNA is transcribed may lie far from the genes it controls, sometimes even on a different chromosome. Other non-coding sequences contain enhancer or inhibitory (silencer) elements.
A significant proportion of the mammalian genome (13.6%) may function through highly conserved, specific RNA secondary structures. Because many of these structures are functionally important, they are often used to study evolutionary selection in higher eukaryotes.
Three major fractions of eukaryotic DNA
A large fraction of the non-coding DNA in eukaryotic cells consists of sequences present in multiple copies in the genome, generally referred to as repetitious DNA. Some of these sequences are quite short, while others are considerably longer and interspersed at many locations throughout the genome. Repetitive sequences were first recognized in experiments showing that denatured eukaryotic DNA renatures nonuniformly: some of it reassociates much more rapidly than the bulk of cellular DNA.
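Why repeated sequences renature faster can be seen from a simple kinetic model. Assuming ideal second-order kinetics (two complementary single strands must collide to re-pair), with C the concentration of single-stranded DNA, C_0 its initial concentration and k a rate constant, integration gives the classic "C0t" relation:

```latex
\frac{dC}{dt} = -k\,C^{2}
\quad\Longrightarrow\quad
\frac{1}{C} - \frac{1}{C_0} = k\,t
\quad\Longrightarrow\quad
\frac{C}{C_0} = \frac{1}{1 + k\,C_0 t},
\qquad
C_0 t_{1/2} = \frac{1}{k}
```

In this idealized picture, a sequence present in r copies per genome has an individual-sequence concentration r times higher than a single-copy sequence in the same DNA sample, so it reaches half-reassociation at a C0t roughly r times lower.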
Approximately 50-60% of mammalian DNA reassociates slowly, indicating that it consists primarily of single-copy DNA. Because a haploid genome contains only one copy of most genes, the single-copy fraction contains practically all of the genes that encode mRNA (and thus proteins).
Another 25-40% of mammalian DNA reassociates at an intermediate rate. This fraction consists mainly of a relatively small number of sequence families characteristic of a given organism, each present in many copies. Because some of these sequences can be copied and reinserted at new positions in the genome, they are also known as mobile DNA elements.
Lastly, approximately 10-15% of mammalian DNA reassociates very rapidly. This rapidly reassociating repetitious DNA, also called simple-sequence DNA, is composed predominantly of several different sets of short sequences (up to about 10 base pairs) arranged in long arrays of tandem, adjacent repeats.
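The three fractions above can be illustrated as the sum of three ideal second-order components, each contributing f / (1 + k·C0t) to the single-stranded fraction. The sketch below is illustrative only: the fractions f are the midpoints of the ranges quoted in the text, while the rate constants k are hypothetical values chosen to span several orders of magnitude, not measured data.

```python
"""Illustrative three-component C0t reassociation curve (hypothetical rate constants)."""

# name -> (fraction of genome, rate constant k in M^-1 s^-1)
# Fractions: midpoints of the ranges in the text. Rate constants: HYPOTHETICAL.
COMPONENTS = {
    "simple-sequence": (0.125, 1e3),
    "intermediate-repeat": (0.325, 1.0),
    "single-copy": (0.55, 1e-3),
}


def fraction_single_stranded(c0t, components=COMPONENTS):
    """Fraction of DNA still single-stranded at a given C0t (mol*s/L).

    Each component is modelled with ideal second-order kinetics,
    contributing f / (1 + k * C0t).
    """
    return sum(f / (1.0 + k * c0t) for f, k in components.values())


def half_reassociation_c0t(k):
    """C0t at which a single ideal component is half reassociated (1/k)."""
    return 1.0 / k


if __name__ == "__main__":
    # Fast components finish at low C0t; single-copy DNA needs far higher C0t.
    for c0t in (1e-4, 1e-2, 1.0, 1e2, 1e4, 1e6):
        reassociated = 1.0 - fraction_single_stranded(c0t)
        print(f"C0t = {c0t:>8g}   reassociated = {reassociated:.3f}")
    for name, (_, k) in COMPONENTS.items():
        print(f"{name}: C0t_1/2 = {half_reassociation_c0t(k):g}")
```

Plotting the reassociated fraction against log(C0t) with these parameters gives a stepped curve, one step per component, which is the qualitative shape that revealed the three fractions.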