Development and Application of Statistical Methods for Analyzing Metabolomics Data

A special issue of Metabolites (ISSN 2218-1989). This special issue belongs to the section "Bioinformatics and Data Analysis".

Deadline for manuscript submissions: closed (31 January 2021) | Viewed by 37698

Special Issue Editors


Dr. Jos Hageman
Guest Editor
Biometris, Wageningen University and Research Centre, 6708 PB Wageningen, The Netherlands
Interests: metabolomics analytics; data mining statistics; food science; data science; data analysis; data visualization; exploratory data analysis; computational statistics

Dr. Jasper Engel
Guest Editor
Biometris, Wageningen University & Research, Wageningen, The Netherlands
Interests: metabolomics; high-dimensional data analysis; statistics; regularization

Special Issue Information

Dear Colleagues,

In the last decade, the field of metabolomics has developed tremendously: it is now possible to routinely measure a wide range of metabolites for many specimens at reduced costs. This opens the door to many exciting experiments such as time-resolved metabolomics, multi-sample and multi-species metabolomics, or cross-omics experiments, to name but a few. Data analysis is a crucial step in extracting meaningful information from the complex data thus acquired. The rapid developments in powerful metabolomics experiments therefore have to be matched by developments in statistical methodology for analyzing these experiments.

This Special Issue is dedicated to the development or application of statistical methods for analyzing metabolomics data. We invite researchers to submit manuscripts outlining novel data processing and data analysis methods for metabolomics. The scope of this Special Issue is not limited to this topic, however: it also includes experimental design, data acquisition methods, and applied metabolomics studies in which data analysis played an especially prominent role.

Dr. Jos Hageman
Dr. Jasper Engel
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Univariate/multivariate statistics
  • Chemometrics
  • Data analysis
  • Experimental design

Published Papers (11 papers)

Editorial

2 pages, 151 KiB  
Editorial
Special Issue: Development and Application of Statistical Methods for Analyzing Metabolomics Data
by Jos Hageman and Jasper Engel
Metabolites 2021, 11(7), 451; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo11070451 - 13 Jul 2021
Viewed by 1899
Abstract
In the last decade, the field of metabolomics has developed tremendously: it is now possible to routinely measure a wide range of metabolites for many specimens at reduced costs, opening the door to many exciting experiments [...]

Research

20 pages, 1640 KiB  
Article
Improved One-Class Modeling of High-Dimensional Metabolomics Data via Eigenvalue-Shrinkage
by Alberto Brini, Vahe Avagyan, Ric C. H. de Vos, Jack H. Vossen, Edwin R. van den Heuvel and Jasper Engel
Metabolites 2021, 11(4), 237; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo11040237 - 13 Apr 2021
Cited by 3 | Viewed by 2334
Abstract
One-class modeling is a useful approach in metabolomics for the untargeted detection of abnormal metabolite profiles when information from a set of reference observations is available to model “normal” or baseline metabolite profiles. Such outlying profiles are typically identified by comparing the distance between an observation and the reference class to a critical limit. Often, multivariate distance measures such as the Mahalanobis distance (MD) or principal component-based measures are used. These approaches, however, are either not applicable to untargeted metabolomics data, or their results are unreliable. In this paper, five distance measures for one-class modeling in untargeted metabolomics are proposed. They are based on a combination of the MD and five so-called eigenvalue-shrinkage estimators of the covariance matrix of the reference class. A simple cross-validation procedure is proposed to set the critical limit for outlier detection. Simulation studies are used to identify which distance measure provides the best performance for one-class modeling, in terms of type I error and power to identify abnormal metabolite profiles. Empirical evidence demonstrates that this method has a better type I error (false positive rate) and greater outlier detection power than the standard (principal component-based) one-class models. The method is illustrated by its application to liquid chromatography coupled to mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy untargeted metabolomics data from two studies on food safety assessment and diagnosis of rare diseases, respectively.
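The idea of combining the Mahalanobis distance with a shrunken covariance estimate can be illustrated with a minimal Python sketch. It does not reproduce any of the five estimators from the paper: it assumes a simple linear shrinkage of the sample eigenvalues toward their mean, and the function names, the `alpha` weight, and the simulated data are all hypothetical. The point is only that shrinkage makes the covariance invertible when there are fewer reference samples than features.

```python
import numpy as np

def shrunk_covariance(X, alpha):
    """Shrink the sample covariance eigenvalues linearly toward their mean.

    Illustrative assumption only; not one of the paper's five estimators.
    """
    S = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    shrunk = (1.0 - alpha) * eigvals + alpha * eigvals.mean()
    # Rebuild the covariance matrix from the shrunken eigenvalues.
    return (eigvecs * shrunk) @ eigvecs.T

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of observation x to the reference class."""
    d = x - mean
    return float(d @ np.linalg.solve(cov, d))

rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 50))   # 20 reference samples, 50 features
mean = reference.mean(axis=0)
cov = shrunk_covariance(reference, alpha=0.5)

typical = mahalanobis_sq(rng.normal(size=50), mean, cov)
shifted = mahalanobis_sq(rng.normal(size=50) + 5.0, mean, cov)
print(f"typical profile: {typical:.1f}, shifted profile: {shifted:.1f}")
```

In practice the distance would be compared to a critical limit; the abstract describes setting that limit by cross-validation, which this sketch omits.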

12 pages, 2865 KiB  
Article
OS-PCA: Orthogonal Smoothed Principal Component Analysis Applied to Metabolome Data
by Hiroyuki Yamamoto, Yasumune Nakayama and Hiroshi Tsugawa
Metabolites 2021, 11(3), 149; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo11030149 - 05 Mar 2021
Cited by 4 | Viewed by 2671
Abstract
Principal component analysis (PCA) has been widely used in metabolomics. However, it is not always possible to detect phenotype-associated principal component (PC) scores. Previously, we proposed a smoothed PCA for samples acquired with a time course or rank order, but hypothesis testing to select significant metabolite candidates was not possible. Here, we modified the smoothed PCA as an orthogonal smoothed PCA (OS-PCA) so that statistical hypothesis testing in OS-PC loadings could be performed with the same PC projections provided by the smoothed PCA. Statistical hypothesis testing is especially useful in metabolomics because biological interpretations are made based on statistically significant metabolites. We applied the OS-PCA method to two real metabolome datasets, one for metabolic turnover analysis and the other for evaluating the taste of Japanese green tea. The OS-PCA successfully extracted PC scores similar to those of the smoothed PCA; these scores reflected the expected phenotypes. The significant metabolites that were selected using statistical hypothesis testing of OS-PC loadings facilitated biological interpretations that were consistent with the results of our previous study. Our results suggest that OS-PCA combined with statistical hypothesis testing of OS-PC loadings is a useful method for the analysis of metabolome data.

15 pages, 1901 KiB  
Article
Ranking Metabolite Sets by Their Activity Levels
by Karen McLuskey, Joe Wandy, Isabel Vincent, Justin J. J. van der Hooft, Simon Rogers, Karl Burgess and Rónán Daly
Metabolites 2021, 11(2), 103; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo11020103 - 11 Feb 2021
Cited by 12 | Viewed by 3350
Abstract
Related metabolites can be grouped into sets in many ways, e.g., by their participation in series of chemical reactions (forming metabolic pathways), or based on fragmentation spectral similarities or shared chemical substructures. Understanding how such metabolite sets change in relation to experimental factors can be incredibly useful in the interpretation and understanding of complex metabolomics data sets. However, many of the available tools that are used to perform this analysis are not entirely suitable for the analysis of untargeted metabolomics measurements. Here, we present PALS (Pathway Activity Level Scoring), a Python library, command line tool, and Web application that performs the ranking of significantly changing metabolite sets over different experimental conditions. The main algorithm in PALS is based on the pathway level analysis of gene expression (PLAGE) factorisation method and is denoted as mPLAGE (PLAGE for metabolomics). As an example of an application, PALS is used to analyse metabolites grouped as metabolic pathways and by shared tandem mass spectrometry fragmentation patterns. A comparison of mPLAGE with two other commonly used methods (overrepresentation analysis (ORA) and gene set enrichment analysis (GSEA)) is also given and reveals that mPLAGE is more robust to missing features and noisy data than the alternatives. As further examples, PALS is also applied to human African trypanosomiasis, Rhamnaceae, and American Gut Project data. In addition, normalisation can have a significant impact on pathway analysis results, and PALS offers a framework to further investigate this. PALS is freely available from our project Web site.
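The PLAGE-style factorisation mentioned above can be sketched in a few lines of Python. This is not the PALS API: the function name `plage_activity` and the simulated data are hypothetical, and the sketch only assumes the commonly described PLAGE recipe of standardising each feature in a set and taking the first right singular vector as the per-sample activity level.

```python
import numpy as np

def plage_activity(intensities):
    """Return one activity value per sample for a single metabolite set.

    intensities: array of shape (metabolites, samples).
    """
    centered = intensities - intensities.mean(axis=1, keepdims=True)
    z = centered / intensities.std(axis=1, ddof=1, keepdims=True)
    # The first right singular vector summarises the set across samples.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(1)
n_case, n_control = 6, 6
shift = np.concatenate([np.full(n_case, 3.0), np.zeros(n_control)])
# Five metabolites in one set that all respond in the case group.
pathway = 0.5 * rng.normal(size=(5, n_case + n_control)) + shift

activity = plage_activity(pathway)
# The sign of a singular vector is arbitrary, so compare group means by size.
gap = abs(activity[:n_case].mean() - activity[n_case:].mean())
print(f"activity gap between groups: {gap:.2f}")
```

A ranking of sets would then follow from a two-group test on `activity` for each set; that step is left out here.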

10 pages, 1509 KiB  
Article
Variable Selection in Untargeted Metabolomics and the Danger of Sparsity
by Gerjen H. Tinnevelt, Udo F.H. Engelke, Ron A. Wevers, Stefanie Veenhuis, Michel A. Willemsen, Karlien L.M. Coene, Purva Kulkarni and Jeroen J. Jansen
Metabolites 2020, 10(11), 470; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo10110470 - 17 Nov 2020
Cited by 4 | Viewed by 2552
Abstract
The goal of metabolomics is to measure as many metabolites as possible in order to capture biomarkers that may indicate disease mechanisms. Variable selection in chemometric methods can be divided into the following two groups: (1) sparse methods that find the minimal set of variables to discriminate between groups and (2) methods that find all variables important for discrimination. Such important variables can be summarized into metabolic pathways using pathway analysis tools like Mummichog. As a test case, we studied the metabolic effects of treatment with nicotinamide riboside, a form of vitamin B3, in a cohort of patients with ataxia–telangiectasia. Vitamin B3 is an important co-factor for many enzymatic reactions in the human body. Thus, the variable selection method was expected to find vitamin B3 metabolites and also other secondary metabolic changes during treatment. However, sparse methods did not select any vitamin B3 metabolites despite the fact that these metabolites showed a large difference when comparing intensity before and during treatment. Univariate analysis or significance multivariate correlation (sMC) in combination with pathway analysis using Mummichog were able to select vitamin B3 metabolites. Moreover, sMC analysis found additional metabolites. Therefore, in our comparative study, sMC displayed the best performance for selection of relevant variables.

27 pages, 3320 KiB  
Article
Extraction and Integration of Genetic Networks from Short-Profile Omic Data Sets
by Jacopo Iacovacci, Alina Peluso, Timothy Ebbels, Markus Ralser and Robert C. Glen
Metabolites 2020, 10(11), 435; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo10110435 - 29 Oct 2020
Cited by 5 | Viewed by 2336
Abstract
Mass spectrometry technologies are widely used in the fields of ionomics and metabolomics to simultaneously profile the intracellular concentrations of, e.g., amino acids or elements in genome-wide mutant libraries. These molecular or sub-molecular features are generally non-Gaussian and their covariance reveals patterns of correlations that reflect the system nature of the cell biochemistry and biology. Here, we introduce two similarity measures, the Mahalanobis cosine and the hybrid Mahalanobis cosine, that enforce information from the empirical covariance matrix of omics data from high-throughput screening and that can be used to quantify similarities between the profiled features of different mutants. We evaluate the performance of these similarity measures in the task of inferring and integrating genetic networks from short-profile ionomics/metabolomics data through an analysis of experimental data sets related to the ionome and the metabolome of the model organism S. cerevisiae. The study of the resulting ionome–metabolome Saccharomyces cerevisiae multilayer genetic network, which encodes multiple omic-specific levels of correlations between genes, shows that the proposed measures can provide an alternative description of relations between biological processes when compared to the commonly used Pearson’s correlation coefficient and have the potential to guide the construction of novel hypotheses on the function of uncharacterised genes.

13 pages, 2076 KiB  
Article
Towards Standardization of Data Normalization Strategies to Improve Urinary Metabolomics Studies by GC×GC-TOFMS
by Seo Lin Nam, A. Paulina de la Mata, Ryan P. Dias and James J Harynuk
Metabolites 2020, 10(9), 376; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo10090376 - 19 Sep 2020
Cited by 25 | Viewed by 3908
Abstract
Urine is a popular biofluid for metabolomics studies due to its simple, non-invasive collection and its availability in large quantities, permitting frequent sampling, replicate analyses, and sample banking. The biggest disadvantage with using urine is that it exhibits significant variability in concentration and composition within an individual over relatively short periods of time (arising from various external factors and internal processes regulating the body’s water and solute content). In treating the data from urinary metabolomics studies, one must account for the natural variability of urine concentrations to avoid erroneous data interpretation. Amongst various proposed approaches to account for broadly varying urine sample concentrations, normalization to creatinine has been widely accepted and is most commonly used. MS total useful signal (MSTUS) is another normalization method that has been recently reported for mass spectrometry (MS)-based metabolomics studies. Herein, we explored total useful peak area (TUPA), a modification of MSTUS that is applicable to GC×GC-TOFMS (and data from other separations platforms), for sample normalization in urinary metabolomics studies. Performance of TUPA was compared to the two most common normalization approaches, creatinine adjustment and Total Peak Area (TPA) normalization. Each normalized dataset was evaluated using Principal Component Analysis (PCA). The results showed that TUPA outperformed alternative normalization methods to overcome urine concentration variability. The results also conclusively demonstrated the risks in normalizing data to creatinine.
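The difference between normalizing to total peak area and to a "useful" subset of peaks can be shown with a small Python sketch. The paper's exact criteria for a useful peak are not given in the abstract, so the sketch assumes that a useful peak is one detected (non-zero) in every sample; the function names and the toy peak table are hypothetical.

```python
import numpy as np

def tpa_normalize(peaks):
    """Divide each sample (row) by the sum of all of its peak areas."""
    return peaks / peaks.sum(axis=1, keepdims=True)

def tupa_normalize(peaks):
    """Divide each sample by the summed area of peaks detected in all samples.

    Assumption: "useful" means non-zero in every sample.
    """
    useful = (peaks > 0).all(axis=0)
    return peaks / peaks[:, useful].sum(axis=1, keepdims=True)

# Rows are urine samples, columns are peaks; 0 means not detected.
# Sample 2 is sample 1 at half the concentration plus one extra peak.
peaks = np.array([
    [10.0, 20.0,  0.0, 30.0],
    [ 5.0, 10.0, 40.0, 15.0],
])

tpa = tpa_normalize(peaks)
tupa = tupa_normalize(peaks)
shared = [0, 1, 3]
print("TPA, shared peaks: ", tpa[:, shared])
print("TUPA, shared peaks:", tupa[:, shared])
```

In this toy table the extra peak in the second sample distorts the TPA-normalized values of the shared peaks, while the TUPA-style values of the shared peaks agree across the two samples.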

19 pages, 2105 KiB  
Article
A Multilevel Bayesian Approach to Improve Effect Size Estimation in Regression Modeling of Metabolomics Data Utilizing Imputation with Uncertainty
by Christopher E. Gillies, Theodore S. Jennaro, Michael A. Puskarich, Ruchi Sharma, Kevin R. Ward, Xudong Fan, Alan E. Jones and Kathleen A. Stringer
Metabolites 2020, 10(8), 319; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo10080319 - 06 Aug 2020
Cited by 9 | Viewed by 2779
Abstract
To ensure scientific reproducibility of metabolomics data, alternative statistical methods are needed. A paradigm shift away from the p-value toward an embracement of uncertainty and interval estimation of a metabolite’s true effect size may lead to improved study design and greater reproducibility. Multilevel Bayesian models are one approach that offers the added opportunity of incorporating imputed value uncertainty when missing data are present. We designed simulations of metabolomics data to compare multilevel Bayesian models to standard logistic regression with corrections for multiple hypothesis testing. Our simulations altered the sample size and the fraction of significant metabolites truly different between two outcome groups. We then introduced missingness to further assess model performance. Across simulations, the multilevel Bayesian approach more accurately estimated the effect size of metabolites that were significantly different between groups. Bayesian models also had greater power and mitigated the false discovery rate. In the presence of increased missing data, Bayesian models were able to accurately impute the true concentration, and incorporating the uncertainty of these estimates improved overall prediction. In summary, our simulations demonstrate that a multilevel Bayesian approach accurately quantifies the estimated effect size of metabolite predictors in regression modeling, particularly in the presence of missing data.

7 pages, 3032 KiB  
Communication
Can We Trust Score Plots?
by Marta Bevilacqua and Rasmus Bro
Metabolites 2020, 10(7), 278; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo10070278 - 08 Jul 2020
Cited by 19 | Viewed by 2992
Abstract
In this paper, we discuss the validity of using score plots of component models such as partial least squares regression, especially when these models are used for building classification models, and models derived from partial least squares regression for discriminant analysis (PLS-DA). Using examples and simulations, it is shown that the currently accepted practice of showing score plots from calibration models may give misleading interpretations. It is suggested and shown that the problem can be solved by replacing the currently used calibrated score plots with cross-validated score plots.

26 pages, 7385 KiB  
Article
On the Use of Correlation and MI as a Measure of Metabolite—Metabolite Association for Network Differential Connectivity Analysis
by Sanjeevan Jahagirdar and Edoardo Saccenti
Metabolites 2020, 10(4), 171; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo10040171 - 24 Apr 2020
Cited by 19 | Viewed by 4242
Abstract
Metabolite differential connectivity analysis has been successful in investigating potential molecular mechanisms underlying different conditions in biological systems. Correlation and Mutual Information (MI) are two of the most common measures used to quantify association, to build metabolite-metabolite association networks, and to calculate differential connectivity. In this study, we investigated the performance of correlation and MI to identify significantly differentially connected metabolites. These association measures were compared on (i) 23 publicly available metabolomic data sets and 7 data sets from other fields, (ii) simulated data with known correlation structures, and (iii) data generated using a dynamic metabolic model to simulate real-life observed metabolite concentration profiles. In all cases, we found more differentially connected metabolites when using correlation indices as a measure for association than MI. We also observed that different MI estimation algorithms resulted in differences in performance when applied to data generated using a dynamic model. We concluded that there is no significant benefit in using MI as a replacement for standard Pearson’s or Spearman’s correlation when the application is to quantify and detect differentially connected metabolites.
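A correlation-based differential connectivity score can be sketched as follows in Python. The paper's exact definition and its significance testing are not given in the abstract, so this sketch assumes one simple definition: for each metabolite, the sum of absolute differences between its Pearson correlations in two conditions. The function name and the simulated data are hypothetical.

```python
import numpy as np

def differential_connectivity(cond_a, cond_b):
    """Return one score per metabolite from two (samples x metabolites) arrays.

    Assumed definition: row sums of |R_a - R_b| with the diagonal excluded.
    """
    r_a = np.corrcoef(cond_a, rowvar=False)
    r_b = np.corrcoef(cond_b, rowvar=False)
    diff = np.abs(r_a - r_b)
    np.fill_diagonal(diff, 0.0)
    return diff.sum(axis=1)

rng = np.random.default_rng(2)
n = 200
base = rng.normal(size=(n, 1))
# Condition A: metabolites 0 and 1 are tightly coupled, metabolite 2 is not.
cond_a = np.hstack([
    base + 0.1 * rng.normal(size=(n, 1)),
    base + 0.1 * rng.normal(size=(n, 1)),
    rng.normal(size=(n, 1)),
])
# Condition B: all three metabolites vary independently.
cond_b = rng.normal(size=(n, 3))

scores = differential_connectivity(cond_a, cond_b)
print("differential connectivity per metabolite:", np.round(scores, 2))
```

Deciding which scores are significant would need a permutation test or a similar procedure, and a mutual-information version would replace `np.corrcoef` with an MI estimator; both are left out of this sketch.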

Review

18 pages, 552 KiB  
Review
Approaches to Integrating Metabolomics and Multi-Omics Data: A Primer
by Takoua Jendoubi
Metabolites 2021, 11(3), 184; https://0-doi-org.brum.beds.ac.uk/10.3390/metabo11030184 - 21 Mar 2021
Cited by 37 | Viewed by 7430
Abstract
Metabolomics deals with multiple and complex chemical reactions within living organisms and how these are influenced by external or internal perturbations. It lies at the heart of omics profiling technologies not only as the underlying biochemical layer that reflects information expressed by the genome, the transcriptome and the proteome, but also as the closest layer to the phenome. The combination of metabolomics data with the information available from genomics, transcriptomics, and proteomics offers unprecedented possibilities to enhance current understanding of biological functions, elucidate their underlying mechanisms and uncover hidden associations between omics variables. As a result, a vast array of computational tools have been developed to assist with integrative analysis of metabolomics data with different omics. Here, we review and propose five criteria (hypothesis, data types, strategies, study design and study focus) to classify statistical multi-omics data integration approaches into state-of-the-art classes under which all existing statistical methods fall. The purpose of this review is to look at various aspects that guide the choice of the statistical integrative analysis pipeline in terms of the different classes. We will draw particular attention to metabolomics and genomics data to assist those new to this field in the choice of the integrative analysis pipeline.