Analyzing Metabolomics Data

Access to data is the first requirement for any analysis. Several metabolomic databases are available, each serving a different purpose. Their goal is to organize metabolites so that researchers can easily locate and analyze the data.

Image Credit: toeytoey / Shutterstock

The Human Metabolome Database (HMDB) has over 40,000 metabolite entries and aims to catalog all the metabolites present in humans. MassBank is a spectral database with over 39,000 entries. METLIN, a database covering bacteria, animals, and plants, has over 75,000 entries. Lipid Metabolites and Pathways Strategy (LIPID MAPS) is the largest repository of lipid molecular structures. The Madison Metabolomics Consortium Database has over 20,000 entries and is a resource for mass spectrometry and nuclear magnetic resonance-based metabolomics research. Metabolic pathways are collected in the Kyoto Encyclopedia of Genes and Genomes (KEGG).

Types of Data Analysis Techniques

The two important factors to consider while analyzing data are their organization and visualization, so that the data can be interpreted or hypotheses can be devised. There are four major techniques available to analyze metabolomics data. They are:

  • Unsupervised learning method
  • Supervised learning methods
  • Pathway analysis methods
  • Time course data methods

Unsupervised Learning Method

During the analysis stage, we may want to get an idea of the data structure.

Unsupervised learning helps to explore the data; more precisely, it helps to discover trends in the data. The data are not labeled with any class, and the unsupervised method finds structure in them on its own. It therefore suits situations in which the researcher has little information or few assumptions about the data under analysis. As the first step in the analysis process, unsupervised learning also assists in visualizing the data. The following methods are the most frequently used for analyzing metabolites.

  • Principal component analysis (PCA) reduces the dimension of the data: when the number of metabolites is large, it finds a few linear combinations of the variables that capture as much of the total variation in the original dataset as possible. These principal components replace the correlated variables while retaining most of the information in the dataset. A score plot is used to find groups of samples, while a loading plot is used to discover the variables that separate the groups from each other.
  • Clustering groups similar data, so that the data in one cluster are more alike than the data in another cluster. The two most widely used clustering techniques in metabolomics are k-means clustering and hierarchical clustering. In k-means clustering, the data are divided into k clusters that do not overlap. Unlike k-means clustering, hierarchical clustering does not stop at a specified number of clusters, but continues to split the data until a hierarchy of clusters is formed. It is often combined with a heat map to visualize the data matrix.
  • Self-organizing map (SOM) is a visualization tool that assists in visual discovery of the clusters present in data.
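The PCA and k-means steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the six samples and three metabolite columns are invented, the first principal component is found by power iteration rather than a full decomposition, and k-means is seeded with the first and last sample instead of random starts. A real analysis would normally rely on an established statistics library.

```python
import math

# Hypothetical data: 6 samples (rows) x 3 metabolites (columns).
# Samples 0-2 and 3-5 are assumed to come from two different groups.
samples = [
    [1.0, 2.0, 5.0],
    [1.2, 2.1, 5.1],
    [0.9, 1.9, 4.9],
    [3.0, 4.0, 5.0],
    [3.1, 4.2, 5.1],
    [2.9, 3.9, 4.9],
]

def center_columns(rows):
    """Subtract each metabolite's mean so every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(cov, iterations=200):
    """Power iteration: returns the leading eigenvector and its eigenvalue."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return v, eigenvalue

def kmeans(points, initial_centers, iterations=20):
    """Plain k-means: assign each point to its nearest center, then update."""
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

centered = center_columns(samples)
cov = covariance_matrix(centered)
loadings, eigenvalue = first_principal_component(cov)

# Scores: projection of each sample onto the first component (score plot axis).
scores = [sum(v * w for v, w in zip(row, loadings)) for row in centered]
# Share of the total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# k = 2 clusters, seeded with the first and last sample (an assumption).
labels, centers = kmeans(samples, [samples[0], samples[-1]])

print("loadings:", loadings)
print("scores:", scores)
print("explained variance fraction:", explained)
print("k-means labels:", labels)
```

In this toy dataset the scores separate the two groups of samples, and the loadings show that the first two metabolites drive the separation while the third contributes almost nothing, which is how score and loading plots are read in practice.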

Supervised Learning Method

Widely used in biomarker discovery, categorization, and prediction, supervised learning methods deal with datasets that have response variables, which may be either continuous or discrete. These methods model the association between covariates and response variables so that accurate predictions can be made.

Partial least squares (PLS) is commonly used in metabolomics research, particularly for identifying biomarkers and classifying diseases, while the support vector machine (SVM) is used in cancer research.
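As a rough illustration of how PLS links metabolite levels to a response, the sketch below extracts a single PLS component for one response variable. The data, the 0/1 coding of control versus disease, and the function name `pls1_one_component` are all assumptions made for the example; practical analyses use several components, cross-validation, and a tested library implementation.

```python
import math

# Hypothetical data: 6 samples x 3 metabolites, with a 0/1 response
# (0 = control, 1 = disease). The coding is an assumption for illustration.
X = [
    [1.0, 2.0, 5.0],
    [1.2, 2.1, 5.1],
    [0.9, 1.9, 4.9],
    [3.0, 4.0, 5.0],
    [3.1, 4.2, 5.1],
    [2.9, 3.9, 4.9],
]
y = [0, 0, 0, 1, 1, 1]

def pls1_one_component(X, y):
    """First PLS component for a single response variable."""
    n, p = len(X), len(X[0])
    x_means = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_means)] for row in X]
    yc = [v - y_mean for v in y]

    # Weights: the direction in metabolite space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores: projection of each sample onto the weight vector.
    t = [sum(row[j] * w[j] for j in range(p)) for row in Xc]

    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((v - m) * wj for v, m, wj in zip(row, x_means, w))
        return y_mean + q * score

    return w, t, predict

weights, component_scores, predict = pls1_one_component(X, y)

print("weights:", weights)
print("prediction for a control-like sample:", predict([1.0, 2.0, 5.0]))
print("prediction for a disease-like sample:", predict([3.0, 4.0, 5.0]))
```

Metabolites with large absolute weights are the ones most associated with the response, which is the sense in which PLS is used to flag candidate biomarkers. Thresholding the prediction at 0.5 turns the same model into a simple two-class classifier.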

Pathway Analysis Methods

Pathway analysis helps to find the biological mechanisms in the list of identified metabolites. The two most common methods are 1) over-representation analysis (ORA) and 2) functional class scoring (FCS).

ORA is the simplest method, and is performed when the pathways differ considerably between the two study groups. Some limitations of ORA are addressed by the functional class scoring (FCS) method. Single-metabolite statistics are obtained first, and these are aggregated into a pathway-level statistic, either univariate or multivariate. The enrichment score, mean, and median are most often used as univariate pathway-level statistics. Hotelling's T² statistic is widely used as a multivariate statistic.
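Both ideas can be made concrete with a short sketch. ORA is commonly carried out as a one-sided hypergeometric test, which is the approach assumed here; the article itself does not specify the test. The counts, metabolite names, and per-metabolite statistics below are invented for illustration.

```python
from math import comb
from statistics import mean, median

def ora_p_value(background_size, pathway_size, hits_total, hits_in_pathway):
    """One-sided hypergeometric p-value: the probability of seeing at least
    `hits_in_pathway` pathway members among `hits_total` selected metabolites."""
    upper = min(pathway_size, hits_total)
    total = comb(background_size, hits_total)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total

def pathway_level_statistics(metabolite_stats, pathway_members):
    """FCS-style aggregation: mean and median of single-metabolite statistics
    over the metabolites that belong to one pathway."""
    values = [metabolite_stats[m] for m in pathway_members if m in metabolite_stats]
    return mean(values), median(values)

# Hypothetical ORA input: 100 measured metabolites, a pathway with 10 members,
# 10 metabolites flagged as significant, 5 of which fall in the pathway.
p_enriched = ora_p_value(100, 10, 10, 5)

# Hypothetical single-metabolite statistics (for example, t-statistics).
stats = {"m1": 2.5, "m2": 3.1, "m3": 0.4, "m4": -0.2, "m5": 2.8}
pathway_mean, pathway_median = pathway_level_statistics(stats, ["m1", "m2", "m5"])

print("ORA p-value:", p_enriched)
print("pathway mean and median:", pathway_mean, pathway_median)
```

ORA only sees which metabolites passed a significance cutoff, whereas the FCS-style aggregation uses every metabolite's statistic, which is one of the limitations of ORA that FCS is meant to address.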

Time Course Data Methods

Metabolite concentrations may vary with time, which adds a time dimension to the dataset. With this dimension included, unsupervised learning methods and visualization tools such as PCA and SOM can still be used. Additionally, profile graphs can be drawn to check the profiles of the metabolites in the various clusters. Statistical techniques such as analysis of variance (ANOVA)-based models are used to compare the different change patterns of the metabolites.
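The simplest ANOVA-based check asks whether a metabolite's mean concentration differs across time points. The sketch below computes the one-way ANOVA F statistic for one metabolite, assuming independent samples at each time point (so it ignores repeated measurements on the same subject). The concentrations are invented for illustration.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; each group holds the measurements
    of one metabolite at one time point."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between time points versus variation within each time point.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical concentrations at three time points, three samples each.
rising_metabolite = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 3.1, 2.9]]
stable_metabolite = [[1.0, 1.1, 0.9], [1.1, 0.9, 1.0], [0.9, 1.0, 1.1]]

f_rising = one_way_anova_f(rising_metabolite)
f_stable = one_way_anova_f(stable_metabolite)

print("F for the rising metabolite:", f_rising)
print("F for the stable metabolite:", f_stable)
```

A large F value indicates that the metabolite changes across time points by more than the within-time-point scatter would explain. Turning F into a p-value requires the F distribution, which the Python standard library does not provide.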

In metabolomics studies, metabolites within the same cluster can still differ from one another. When this variation is large, the repeated measures (RM) model is used. When many metabolites must be analyzed in parallel while accounting for the structure of their correlation, a generalization of ANOVA called ANOVA-simultaneous component analysis (ASCA) is used.

Other methods have also been suggested, including time series data analysis, functional-based methods, the smoothing splines mixed effects model, and the hierarchical linear model.

Last Updated: Jul 19, 2023

Afsaneh Khetrapal

Afsaneh graduated from Warwick University with a First class honours degree in Biomedical science. During her time there, her love for neuroscience and scientific journalism only grew and has now steered her into a career with the journal Scientific Reports under Springer Nature. Of course, she isn’t always immersed in all things science and literary; her free time involves a lot of oil painting and beach-side walks too.


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.