Compound Identification for Nontargeted Metabolomics

In an untargeted metabolomic approach, thousands of metabolites are measured in a single sample. Unlike in the targeted approach, however, the chemical identities of the compounds are not known in advance.

Data about each compound are acquired during the study or after its completion, and this information is used to annotate or identify the metabolites.

Robust compound identification is essential in a non-targeted approach because it maximizes what can be interpreted from a sample and the impact of that interpretation. Robust identification or annotation supports the biological interpretation of data within a single study and allows data from different research groups to be compared.

Metabolite annotation and identification standards

The Chemical Analysis Working Group (CAWG) of the Metabolomics Standards Initiative (MSI) and Core Information for Metabolomics Reporting (CIMR) has defined four levels for reporting metabolites:

Level 1: identified metabolites
Level 2: putatively annotated compounds
Level 3: putatively characterized compound classes
Level 4: unknown compounds

Researchers need to state the level clearly when reporting metabolites.

Most metabolites have already been identified, characterized, and reported, and are therefore non-novel. Metabolites that are not being identified for the first time are identified by co-characterization with authentic samples. A minimum reporting standard is available for identified Level 1 non-novel compounds.

Metabolites for which no chemical reference standard is available, for example because none is sold commercially, are reported as putatively annotated compounds (Level 2). They are annotated on the basis of physicochemical properties that agree with entries in spectral databases.

For metabolites reported as putatively characterized compound classes (Level 3), the spectral and chemical databases propose no structure. These compounds are assigned to a chemical class on the basis of physicochemical or spectral properties that are characteristic of that class.

Compounds are termed unknown compounds (Level 4) when they are neither identified nor classified, yet can still be distinguished and quantified from the available spectral data.
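
The four levels amount to a simple decision rule: the strongest available evidence determines the level. The sketch below is illustrative only. The `Evidence` class, its field names, and the `msi_level` function are hypothetical and are not part of any MSI tool; they reduce each level to a single yes/no flag, which is a simplification of the published criteria.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the evidence gathered for one detected feature."""
    matched_authentic_standard: bool = False  # co-characterized with a reference standard
    matched_spectral_library: bool = False    # properties agree with a spectral database entry
    matched_compound_class: bool = False      # properties typical of a chemical class only


LEVEL_NAMES = {
    1: "identified metabolite",
    2: "putatively annotated compound",
    3: "putatively characterized compound class",
    4: "unknown compound",
}


def msi_level(evidence: Evidence) -> int:
    """Return the reporting level (1-4) implied by the strongest evidence."""
    if evidence.matched_authentic_standard:
        return 1
    if evidence.matched_spectral_library:
        return 2
    if evidence.matched_compound_class:
        return 3
    return 4  # not identified or classified, but still a distinct feature


if __name__ == "__main__":
    feature = Evidence(matched_spectral_library=True)
    level = msi_level(feature)
    print(f"Level {level}: {LEVEL_NAMES[level]}")
```

Checking the flags from strongest to weakest means a feature that matches both a reference standard and a library entry is still reported at Level 1.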

Nomenclature for the metabolites

CAWG supports the traditional principles for identifying novel metabolites and recommends that researchers name novel compounds in line with the International Union of Pure and Applied Chemistry (IUPAC) nomenclature.

For non-novel metabolites, IUPAC provides the standards for compound classification. Because IUPAC names are often long and complex, they are commonly replaced with short, common names.

The identifiers and structural codes listed below can be used to reference compounds.

IUPAC International Chemical Identifier (InChI)
Chemical Entities of Biological Interest (ChEBI)
Simplified Molecular Input Line Entry System (SMILES)
Chemical Abstracts Service (CAS) registry number
PubChem compound identifier (CID)
Molfile

CAS numbers are proprietary, which makes them less favorable. InChI, CID, and SMILES codes are most often preferred. CAWG suggests using InChI codes because they are convenient for exchanging data and communicating with databases. It also recommends that, when publishing identified metabolites, researchers report at least one IUPAC chemical name or common name together with a structural code.

Minimum reporting standards have also been established for metabolites that are not identified. For all unknown compounds, for example, exogenous substances such as pesticides or herbicides need to be clearly differentiated from endogenous metabolites.
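
The recommendation to report at least one name together with a structural code can be pictured as a simple record check. The sketch below is a minimal illustration, not a CAWG tool: the `MetaboliteReport` class and its methods are hypothetical, and treating only InChI and SMILES as structural codes is an assumption made for this example. The ethanol identifiers are used purely as sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaboliteReport:
    """Hypothetical record for one identified metabolite."""
    iupac_name: Optional[str] = None
    common_name: Optional[str] = None
    inchi: Optional[str] = None
    smiles: Optional[str] = None
    pubchem_cid: Optional[int] = None

    def has_name(self) -> bool:
        # At least one IUPAC or common name is present.
        return bool(self.iupac_name or self.common_name)

    def has_structural_code(self) -> bool:
        # Assumption: InChI and SMILES count as structural codes here.
        return bool(self.inchi or self.smiles)

    def meets_minimum(self) -> bool:
        return self.has_name() and self.has_structural_code()


ethanol = MetaboliteReport(
    iupac_name="ethanol",
    inchi="InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    smiles="CCO",
    pubchem_cid=702,
)
name_only = MetaboliteReport(common_name="ethanol")

if __name__ == "__main__":
    for record in (ethanol, name_only):
        print(record.common_name or record.iupac_name, record.meets_minimum())
```

A record that carries only a name fails the check, which mirrors the point that a name alone does not let other groups or databases resolve the structure.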

Key initiatives

Several initiatives and groups are working to develop and apply these standards further.

MetaboLights is the first general-purpose public metabolomics repository to meet MSI standards on reporting metadata. Authors can report the intended name of a compound, map it to an available metabolite database such as ChEBI, and state the level of confidence. The analytical metadata helps to track compounds that are not known.
METLIN is a curated database that offers multiple search capabilities, including a similarity search for characterizing unknown compounds.
The isoMETLIN metabolite database is intended for isotope-based metabolomics and can be used to identify metabolites that incorporate isotopic labels. With isoMETLIN, users can search more than 1 million computed isotopologues derived from the METLIN database. Searches are carried out by mass-to-charge value and by specific isotope, such as 13C or 15N.
Coordination of Standards in Metabolomics (COSMOS) assists data providers across the European Union to establish and promote standards for easier distribution of metabolomic data.
Metabolomics Workbench assists in updating available standards and in creating standards that are missing in metabolomics.
The Metabolomics Society has initiated the Metabolite Identification and Data Standards Task Groups to ensure that standards continue to evolve to meet future requirements. The society's goal is to provide effective communication and coordination between the people involved in developing standards and other stakeholders. The role of the Metabolite Identification Task Group is to engage members of the community on the use of reporting standards for compound identification.
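
The isotope-based search described for isoMETLIN rests on simple arithmetic: each incorporated heavy isotope shifts the mass-to-charge value by a fixed amount. The sketch below illustrates that arithmetic only. It is not isoMETLIN's implementation or API; the function names, the 5 ppm tolerance, and the example m/z value are assumptions chosen for illustration.

```python
# Mass gained when one light isotope is replaced by its heavy counterpart (Da).
ISOTOPE_SHIFT = {
    "13C": 13.0033548 - 12.0,        # 13C minus 12C
    "15N": 15.0001089 - 14.0030740,  # 15N minus 14N
}


def isotopologue_mz(unlabeled_mz, isotope, n_labels, charge=1):
    """m/z of an ion carrying n_labels heavy isotopes of the given type."""
    return unlabeled_mz + n_labels * ISOTOPE_SHIFT[isotope] / abs(charge)


def match_labels(observed_mz, unlabeled_mz, isotope, max_labels, ppm=5.0, charge=1):
    """Return the label counts whose computed m/z lies within ppm of the observation."""
    hits = []
    for n in range(max_labels + 1):
        expected = isotopologue_mz(unlabeled_mz, isotope, n, charge)
        if abs(observed_mz - expected) / expected * 1e6 <= ppm:
            hits.append(n)
    return hits


if __name__ == "__main__":
    base_mz = 181.0707  # illustrative unlabeled m/z for a singly charged ion
    observed = isotopologue_mz(base_mz, "13C", 3)
    print(match_labels(observed, base_mz, "13C", max_labels=6))
```

Because adjacent isotopologues of a singly charged ion are about 1 Da apart, a tolerance of a few ppm is narrow enough that at most one label count matches a given observation.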