By Dr Tomislav Meštrović, MD, PhD
MicroRNAs (miRNAs) are evolutionarily conserved small non-coding RNA molecules that post-transcriptionally regulate gene expression by base-pairing to mRNAs. Recent evidence shows that miRNA-mediated gene regulation is pivotal for normal cellular functions, and as many as one-third of human mRNAs may be miRNA targets.
Tens of thousands of miRNAs in over 150 species have been discovered to date. When working with a vast number of protein-coding genes, proper nomenclature is important to distinguish between gene loci, transcripts and products. Correspondingly, there is a specific nomenclature for miRNAs as well.
Rules of nomenclature
Experts have agreed on the criteria that must be met for a sequence to be considered a miRNA. These include the detection of an RNA molecule of about 22 nucleotides, identification of that molecule in a pool of complementary DNA made from size-fractionated RNA, phylogenetic conservation of the molecule and the presence of a hairpin which has important regulatory roles.
Typically, both the gene locus and the precursor miRNA (pre-miRNA) are referred to as "mir", while the mature miRNA product is designated "miR". Each miRNA is given a numerical identifier (e.g. mir-33 or mir-4990), and when miRNAs are closely related in terms of their sequence, additional suffixes in the form of letters and numbers are used to distinguish them.
In addition, each name is preceded by a three-letter prefix specific to the species. For humans (Homo sapiens) the prefix is "hsa" (e.g. hsa-mir-367) and for the common rat (Rattus norvegicus) it is "rno" (e.g. rno-mir-1), while ggo-mir-155 is an example of a gorilla miRNA name.
Multiple miRNAs can be evolutionarily related, so a letter after the number is used to differentiate among members of the same family (e.g. hsa-mir-451a and hsa-mir-451b). If two distinct loci produce identical mature products, an additional number is appended to the full name. For instance, ggo-mir-515-1 and ggo-mir-515-2 produce the same final microRNA product: ggo-miR-515.
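The relationship between locus names and their shared mature product can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the pattern `PRECURSOR_RE` and the helpers `mature_name` and `group_by_mature` are hypothetical names introduced here, the species prefix is assumed to be exactly three lowercase letters as described above, and names that do not follow the sequential scheme are simply rejected.

```python
import re
from collections import defaultdict

# Simplified, illustrative pattern for locus/precursor names:
#   <3-letter species>-mir-<number><optional family letter>[-<locus index>]
# This is an assumption for the sketch, not an official grammar.
PRECURSOR_RE = re.compile(r"^([a-z]{3})-mir-(\d+)([a-z]?)(?:-(\d+))?$")


def mature_name(precursor: str) -> str:
    """Derive the mature product name ("miR") from a locus name ("mir").

    The trailing locus index (e.g. the "-1" in ggo-mir-515-1) is dropped,
    because distinct loci with that suffix yield the same mature product.
    """
    match = PRECURSOR_RE.match(precursor)
    if match is None:
        raise ValueError(f"unrecognised precursor name: {precursor!r}")
    species, number, letter, _locus_index = match.groups()
    return f"{species}-miR-{number}{letter}"


def group_by_mature(precursors):
    """Group locus names by the mature product they are expected to share."""
    groups = defaultdict(list)
    for name in precursors:
        groups[mature_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    loci = ["ggo-mir-515-1", "ggo-mir-515-2", "hsa-mir-451a"]
    for product, sources in group_by_mature(loci).items():
        print(product, "<-", ", ".join(sources))
```

Note that the family letter is kept (hsa-mir-451a maps to hsa-miR-451a), since it distinguishes related but different sequences, whereas the locus index is discarded.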
Names referring to genomic loci should be written in italics for easier differentiation from mature sequences. It is also good practice to add a tag to the name indicating which arm of the double-stranded precursor hairpin the mature sequence comes from (e.g. dme-miR-1-5p from the 5' arm of the precursor and dme-miR-1-3p from the 3' arm of the precursor).
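To make the components of a name concrete, the following Python sketch splits a name into species prefix, locus/mature designation, number, family letter, locus index and arm tag. It is a simplified illustration rather than an official parser: `NAME_RE`, `MirnaName` and `parse_mirna_name` are hypothetical names chosen for this example, and the pattern assumes a three-letter species prefix and only the naming elements described above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative pattern covering only the elements described in the text.
NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3})"      # species prefix, e.g. hsa, rno, dme
    r"-(?P<tag>mir|miR)"           # "mir" = locus/precursor, "miR" = mature
    r"-(?P<number>\d+)"            # numerical identifier
    r"(?P<letter>[a-z]?)"          # optional family-member letter
    r"(?:-(?P<locus>\d+))?"        # optional index for distinct loci
    r"(?:-(?P<arm>[35]p))?$"       # optional arm tag: 5p or 3p
)


@dataclass(frozen=True)
class MirnaName:
    species: str
    is_mature: bool
    number: int
    family_letter: Optional[str]
    locus_index: Optional[int]
    arm: Optional[str]


def parse_mirna_name(name: str) -> MirnaName:
    """Split a sequentially numbered miRNA name into its components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a sequentially numbered miRNA name: {name!r}")
    locus = match.group("locus")
    return MirnaName(
        species=match.group("species"),
        is_mature=(match.group("tag") == "miR"),
        number=int(match.group("number")),
        family_letter=match.group("letter") or None,
        locus_index=int(locus) if locus is not None else None,
        arm=match.group("arm"),
    )


if __name__ == "__main__":
    for example in ("dme-miR-1-5p", "hsa-mir-451a", "ggo-mir-515-2"):
        print(example, "->", parse_mirna_name(example))
```

Because the pattern only accepts "mir" or "miR" after the species prefix, a name built on a non-sequential identifier (for instance a hypothetical input such as "cel-let-7") raises `ValueError` instead of being silently misparsed.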
A few exceptions exist for miRNAs discovered before the naming system described here became standard. Their names, such as lin-4 and let-7 from the model roundworm Caenorhabditis elegans, remain in common use from earlier mutation screens. Such non-sequential identifiers are discouraged for newly discovered miRNAs.
A database of miRNA sequences
The miRBase database (formerly known as the microRNA Registry) represents the primary online repository for miRNA sequences and annotation. The initial goal was to maintain consistent gene nomenclature and nucleotide sequence for both published and unpublished miRNA sequences. Today it provides comprehensive access to all published miRNAs.
As a result of the increasing number of small RNA sequencing efforts, the database has grown exponentially since its initial publication in 2002. Furthermore, technical advances in the field of massively parallel sequencing have led not only to the detection of more miRNAs, but also to more reliable sequence annotation.
The database contains miRNAs from two principally different sources. Experimentally verified mature miRNAs are annotated with the experimental method used for discovery and primary literature references. Sequences that are proposed homologs of miRNAs verified in a related organism can also be found in the database. Unique names for novel miRNA genes are also provided to gene hunters prior to publication of their results.
- Enright AJ, Griffiths-Jones S. miRBase: a database of microRNA sequences, targets and nomenclature. In: Appasani K. MicroRNAs: From Basic Science to Disease Biology. Cambridge University Press, 2008; pp. 157-171.
Last Updated: Feb 23, 2015