Models and Methods in Bioinformatics: Theory and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: closed (30 April 2022) | Viewed by 20035

Special Issue Editors


Guest Editor
Department of Statistics and Operations Research, University of Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, 30100 Murcia, Spain
Interests: lifetime distributions; reliability systems; biostatistics; data analysis; stochastic models; statistical inference; classification; statistical software

Guest Editor
Department of Statistics and Operations Research, University of Murcia, IMIB-Arrixaca, 30100 Murcia, Spain
Interests: Statistical distributions; survival models; biostatistics; data analysis; statistical modelling; estimation methods; clustering methods; meta-analysis; feature selection

Special Issue Information

Dear Colleagues,

You are kindly invited to contribute to this Special Issue on "Models and Methods in Bioinformatics: Theory and Applications" with an original research article or review paper.

The purpose of this Special Issue is to provide a collection of articles on all aspects of theoretical research and novel applications of computational and statistical methods in bioinformatics, including the modelling and analysis of all kinds of biological data, as well as other areas of this multidisciplinary field.

Topics of interest include but are not limited to the following: computational methods, mathematical modelling, data analysis, big data, stochastic models, classification, algorithms in bioinformatics, estimation methods, hypotheses testing, machine learning, variable selection methods, analysis of biomarkers, and applications in bioinformatics.

Prof. Dr. Manuel Franco
Prof. Dr. Juana María Vivo
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computational methods
  • mathematical modelling
  • data analysis
  • big data
  • stochastic models
  • classification
  • algorithms in bioinformatics
  • estimation methods
  • hypotheses testing
  • machine learning
  • variable selection methods
  • analysis of biomarkers
  • applications in bioinformatics

Published Papers (9 papers)

Research

18 pages, 803 KiB  
Article
Thermodynamic Modelling of Transcriptional Control: A Sensitivity Analysis
by Manuel Cambón and Óscar Sánchez
Mathematics 2022, 10(13), 2169; https://0-doi-org.brum.beds.ac.uk/10.3390/math10132169 - 22 Jun 2022
Viewed by 1214
Abstract
Modelling is a tool used to decipher the biochemical mechanisms involved in transcriptional control. Experimental evidence in genetics is usually supported by theoretical models in order to evaluate the effects of all the possible interactions that can occur in these complicated processes. Models derived from the thermodynamic method are critical in this labour because they are able to take into account multiple mechanisms operating simultaneously at the molecular micro-scale and relate them to transcriptional initiation at the tissular macro-scale. This work is devoted to adapting computational techniques to this context in order to theoretically evaluate the role played by several biochemical mechanisms. The interest of this theoretical analysis relies on the fact that it can be contrasted against those biological experiments where the response to perturbations in the transcriptional machinery environment is evaluated in terms of genetically activated/repressed regions. The theoretical reproduction of these experiments leads to a sensitivity analysis whose results are expressed in terms of the elasticity of a threshold function determining those activated/repressed regions. The study of this elasticity function in thermodynamic models already proposed in the literature reveals that certain modelling approaches can alter the balance between the biochemical mechanisms considered, and this can cause false/misleading outcomes. The reevaluation of classical thermodynamic models gives us a more accurate and complete picture of the interactions involved in gene regulation and transcriptional control, which enables more specific predictions. This sensitivity approach provides a definite advantage in the interpretation of a wide range of genetic experimental results.
(This article belongs to the Special Issue Models and Methods in Bioinformatics: Theory and Applications)

23 pages, 4470 KiB  
Article
Explainable Machine Learning for Longitudinal Multi-Omic Microbiome
by Paula Laccourreye, Concha Bielza and Pedro Larrañaga
Mathematics 2022, 10(12), 1994; https://0-doi-org.brum.beds.ac.uk/10.3390/math10121994 - 09 Jun 2022
Cited by 3 | Viewed by 2556
Abstract
Over the years, research studies have shown there is a key connection between the microbial community in the gut, genes, and immune system. Understanding this association may help discover the cause of complex chronic idiopathic disorders such as inflammatory bowel disease. Even though important efforts have been put into the field, the functions, dynamics, and causation of dysbiosis state performed by the microbial community remain unclear. Machine learning models can help elucidate important connections and relationships between microbes in the human host. Our study aims to extend the current knowledge of associations between the human microbiome and health and disease through the application of dynamic Bayesian networks to describe the temporal variation of the gut microbiota and dynamic relationships between taxonomic entities and clinical variables. We develop a set of preprocessing steps to clean, filter, select, integrate, and model informative metagenomics, metatranscriptomics, and metabolomics longitudinal data from the Human Microbiome Project. This study accomplishes novel network models with satisfactory predictive performance (accuracy = 0.648) for each inflammatory bowel disease state, validating Bayesian networks as a framework for developing interpretable models to help understand the basic ways the different biological entities (taxa, genes, metabolites) interact with each other in a given environment (human gut) over time. These findings can serve as a starting point to advance the discovery of novel therapeutic approaches and new biomarkers for precision medicine.

26 pages, 605 KiB  
Article
Analysing the Protein-DNA Binding Sites in Arabidopsis thaliana from ChIP-seq Experiments
by Ginés Almagro-Hernández, Juana-María Vivo, Manuel Franco and Jesualdo Tomás Fernández-Breis
Mathematics 2021, 9(24), 3239; https://0-doi-org.brum.beds.ac.uk/10.3390/math9243239 - 14 Dec 2021
Cited by 1 | Viewed by 1996
Abstract
Computational genomics aims at supporting the discovery of how the functionality of the genome of the organism under study is affected both by its own sequence and structure, and by the network of interaction between this genome and different biological or physical factors. In this work, we focus on the analysis of ChIP-seq data, for which many methods have been proposed in recent years. However, to the best of our knowledge, those methods lack an appropriate mathematical formalism. We have developed a method based on multivariate models for the analysis of the set of peaks obtained from a ChIP-seq experiment. This method can be used to characterize an individual experiment and to compare different experiments regardless of where and when they were conducted. The method is based on a multivariate hypergeometric distribution, which fits the complexity of the biological data and is better suited to deal with the uncertainty generated in this type of experiment than the dichotomous models used by the state-of-the-art methods. We have validated this method with Arabidopsis thaliana datasets obtained from the Remap2020 database, obtaining results in accordance with the original study of these samples. Our work shows a novel way of analyzing ChIP-seq data.

22 pages, 544 KiB  
Article
Techniques to Deal with Off-Diagonal Elements in Confusion Matrices
by Inmaculada Barranco-Chamorro and Rosa M. Carrillo-García
Mathematics 2021, 9(24), 3233; https://0-doi-org.brum.beds.ac.uk/10.3390/math9243233 - 14 Dec 2021
Cited by 8 | Viewed by 2075
Abstract
Confusion matrices are numerical structures that deal with the distribution of errors between different classes or categories in a classification process. From a quality perspective, it is of interest to know if the confusion between the true class A and the class labelled as B is not the same as the confusion between the true class B and the class labelled as A. Otherwise, a problem with the classifier, or of identifiability between classes, may exist. In this paper two statistical methods are considered to deal with this issue. Both of them focus on the study of the off-diagonal cells in confusion matrices. First, McNemar-type tests to test the marginal homogeneity are considered, which must be followed from a one versus all study for every pair of categories. Second, a Bayesian proposal based on the Dirichlet distribution is introduced. This allows us to assess the probabilities of misclassification in a confusion matrix. Three applications, including a set of omic data, have been carried out by using the software R.
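
As an illustration of the off-diagonal comparison this abstract describes, the following is a minimal sketch of an exact McNemar test applied to each pair of off-diagonal cells of a confusion matrix. It is not the authors' procedure (their applications were carried out in R, and their McNemar-type tests and Dirichlet-based proposal are not reproduced here); the function names `mcnemar_exact` and `pairwise_symmetry_tests` and the example counts are hypothetical.

```python
from math import comb


def mcnemar_exact(n_ab: int, n_ba: int) -> float:
    """Two-sided exact McNemar p-value for two discordant counts.

    n_ab: items of true class A labelled as B
    n_ba: items of true class B labelled as A
    Under the null hypothesis of symmetric confusion, each discordant
    item falls in either cell with probability 0.5.
    """
    n = n_ab + n_ba
    if n == 0:
        return 1.0  # no discordant items, nothing to test
    k = min(n_ab, n_ba)
    # Lower binomial tail P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def pairwise_symmetry_tests(matrix, labels):
    """Run the exact test on every pair of off-diagonal cells.

    matrix[i][j] is the count of items of true class i labelled as j.
    """
    results = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            results[(labels[i], labels[j])] = mcnemar_exact(matrix[i][j], matrix[j][i])
    return results


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows: true class, columns: label)
    confusion = [
        [50, 2, 12],
        [9, 40, 3],
        [11, 4, 45],
    ]
    for pair, p in pairwise_symmetry_tests(confusion, ["A", "B", "C"]).items():
        print(pair, round(p, 4))
```

Because one test is run per pair of classes, a multiple-comparison correction would normally be applied to the resulting p-values; that step is left out of this sketch.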

24 pages, 3816 KiB  
Article
Deciphering Genomic Heterogeneity and the Internal Composition of Tumour Activities through a Hierarchical Factorisation Model
by José Carbonell-Caballero, Antonio López-Quílez, David Conesa and Joaquín Dopazo
Mathematics 2021, 9(21), 2833; https://0-doi-org.brum.beds.ac.uk/10.3390/math9212833 - 08 Nov 2021
Viewed by 1409
Abstract
Genomic heterogeneity constitutes one of the most distinctive features of cancer diseases, limiting the efficacy and availability of medical treatments. Tumorigenesis emerges as a strongly stochastic process, producing a variable landscape of genomic configurations. In this context, matrix factorisation techniques represent a suitable approach for modelling such complex patterns of variability. In this work, we present a hierarchical factorisation model conceived from a systems biology point of view. The model integrates the topology of molecular pathways, allowing the simultaneous factorisation of gene and pathway activity matrices. The protocol was evaluated by using simulations, showing a high degree of accuracy. Furthermore, the analysis with a real cohort of breast cancer patients depicted the internal composition of some of the most relevant altered biological processes in the disease, describing gene and pathway level strategies and their observed combinations in the population of patients. We envision that this kind of approach will be essential to better understand the hallmarks of cancer.

20 pages, 617 KiB  
Article
Evaluating the Performances of Biomarkers over a Restricted Domain of High Sensitivity
by Manuel Franco and Juana-María Vivo
Mathematics 2021, 9(21), 2826; https://0-doi-org.brum.beds.ac.uk/10.3390/math9212826 - 07 Nov 2021
Cited by 2 | Viewed by 2922
Abstract
The burgeoning advances in high-throughput technologies have posed a great challenge to the identification of novel biomarkers for diagnosing, by contemporary models and methods, through bioinformatics-driven analysis. Diagnostic performance metrics such as the partial area under the ROC (pAUC) indexes exhibit limitations when analysing genomic data. Among other issues, the inability to differentiate between biomarkers whose ROC curves cross each other with the same pAUC value, the inappropriate expression of non-concave ROC curves, and the lack of a convenient interpretation, restrict their use in practice. Here, we have proposed the fitted partial area index (FpAUC), which is computable through an algorithm valid for any ROC curve shape, as an alternative performance summary for the evaluation of highly sensitive biomarkers. The proposed approach is based on fitter upper and lower bounds of the pAUC in a high-sensitivity region. Through variance estimates, simulations, and case studies for diagnosing leukaemia, and ovarian and colon cancers, we have proven the usefulness of the proposed metric in terms of restoring the interpretation and improving diagnostic accuracy. It is robust and feasible even when the ROC curve shows hooks, and solves performance ties between competitive biomarkers.
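
For readers unfamiliar with the quantity this abstract builds on, the sketch below computes the plain partial area of a piecewise-linear ROC curve over a high-sensitivity band (TPR between a chosen minimum and 1). It is only the unadjusted pAUC, not the fitted index FpAUC proposed in the paper, whose bounds and algorithm are not reproduced here; the function name `partial_area_high_sensitivity` and the example ROC points are hypothetical.

```python
def partial_area_high_sensitivity(roc_points, min_tpr):
    """Partial area of a piecewise-linear ROC curve for min_tpr <= TPR <= 1.

    roc_points: iterable of (fpr, tpr) pairs running from (0, 0) to (1, 1),
                with TPR non-decreasing in FPR (assumed, not checked).
    The area is measured between the curve and the vertical line FPR = 1,
    inside the horizontal band, so a perfect classifier scores 1 - min_tpr.
    """
    if not 0.0 <= min_tpr < 1.0:
        raise ValueError("min_tpr must lie in [0, 1)")
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y1 <= min_tpr or y1 == y0:
            continue  # segment below the band, or horizontal (no height)
        # Clip the segment to the band and interpolate FPR at the clip point
        ya = max(y0, min_tpr)
        xa = x0 + (x1 - x0) * (ya - y0) / (y1 - y0)
        # Trapezoid in the TPR direction with widths (1 - FPR)
        area += (y1 - ya) * (1.0 - (xa + x1) / 2.0)
    return area


if __name__ == "__main__":
    # Hypothetical empirical ROC points
    roc = [(0.0, 0.0), (0.1, 0.7), (0.4, 0.9), (1.0, 1.0)]
    print(partial_area_high_sensitivity(roc, min_tpr=0.8))
```

Two different curves can yield the same value of this area within the band, which is the kind of tie between crossing ROC curves that the abstract mentions as a limitation of the plain pAUC.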

18 pages, 5215 KiB  
Article
Ordering of Omics Features Using Beta Distributions on Montecarlo p-Values
by Angela L. Riffo-Campos, Guillermo Ayala and Juan Domingo
Mathematics 2021, 9(11), 1307; https://0-doi-org.brum.beds.ac.uk/10.3390/math9111307 - 07 Jun 2021
Viewed by 1653
Abstract
The current trend in genetic research is the study of omics data as a whole, either combining studies or omics techniques. This raises the need for new robust statistical methods that can integrate and order the relevant biological information. A good way to approach the problem is to order the features studied according to the different kinds of data so a key point is to associate good values to the features that permit us a good sorting of them. These values are usually the p-values corresponding to a hypothesis which has been tested for each feature studied. The Montecarlo method is certainly one of the most robust methods for hypothesis testing. However, a large number of simulations is needed to obtain a reliable p-value, so the method becomes computationally infeasible in many situations. We propose a new way to order genes according to their differential features by using a score defined from a beta distribution fitted to the generated p-values. Our approach has been tested using simulated data and colorectal cancer datasets from Infinium methylationEPIC array, Affymetrix gene expression array and Illumina RNA-seq platforms. The results show that this approach allows a proper ordering of genes using a number of simulations much lower than with the Montecarlo method. Furthermore, the score can be interpreted as an estimated p-value and compared with Montecarlo and other approaches like the p-value of the moderated t-tests. We have also identified a new expression pattern of eighteen genes common to all colorectal cancer microarrays, i.e., 21 datasets. Thus, the proposed method is effective for obtaining biological results using different datasets. Our score shows a slightly smaller type I error for small sizes than the Montecarlo p-value. The type II error of Montecarlo p-value is lower than the one obtained with the proposed score and with a moderated p-value, but these differences are highly reduced for larger sample sizes and higher false discovery rates. Similar performances from type I and II errors and the score enable a clear ordering of the features being evaluated.
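
The remark that a large number of simulations is needed for a reliable Monte Carlo p-value can be illustrated with a minimal permutation-based sketch. This is a generic Monte Carlo test, not the beta-distribution score proposed in the paper; the test statistic (absolute difference of group means), the function names and the example data are assumptions made for illustration.

```python
import random


def mean_diff(x, y):
    """Absolute difference between the means of two samples."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def monte_carlo_p_value(x, y, n_sim=999, seed=0):
    """Monte Carlo permutation p-value for a difference in group means.

    Group labels are shuffled n_sim times; the p-value is
    (1 + number of permuted statistics >= observed) / (n_sim + 1),
    so it can never be smaller than 1 / (n_sim + 1).
    """
    rng = random.Random(seed)
    observed = mean_diff(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        stat = mean_diff(pooled[:len(x)], pooled[len(x):])
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            hits += 1
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical measurements of one feature in two groups
    group_a = [0.0, 1.0, 2.0, 3.0, 4.0]
    group_b = [10.0, 11.0, 12.0, 13.0, 14.0]
    print(monte_carlo_p_value(group_a, group_b, n_sim=999))
    print(monte_carlo_p_value(group_a, group_b, n_sim=19))
```

With `n_sim=19` the smallest attainable p-value is 1/20, which shows why ranking thousands of features by such p-values requires many simulations per feature, the cost that motivates the cheaper score described in the abstract.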

Review

26 pages, 2838 KiB  
Review
In Search of Complex Disease Risk through Genome Wide Association Studies
by Lorena Alonso, Ignasi Morán, Cecilia Salvoro and David Torrents
Mathematics 2021, 9(23), 3083; https://0-doi-org.brum.beds.ac.uk/10.3390/math9233083 - 30 Nov 2021
Viewed by 2212
Abstract
The identification and characterisation of genomic changes (variants) that can lead to human diseases is one of the central aims of biomedical research. The generation of catalogues of genetic variants that have an impact on specific diseases is the basis of Personalised Medicine, where diagnoses and treatment protocols are selected according to each patient’s profile. In this context, the study of complex diseases, such as Type 2 diabetes or cardiovascular alterations, is fundamental. However, these diseases result from the combination of multiple genetic and environmental factors, which makes the discovery of causal variants particularly challenging at a statistical and computational level. Genome-Wide Association Studies (GWAS), which are based on the statistical analysis of genetic variant frequencies across non-diseased and diseased individuals, have been successful in finding genetic variants that are associated with specific diseases or phenotypic traits. But GWAS methodology is limited when considering important genetic aspects of the disease and has not yet resulted in meaningful translation to clinical practice. This review presents an outlook on the study of the link between genetics and complex phenotypes. We first present an overview of the past and current statistical methods used in the field. Next, we discuss current practices and their main limitations. Finally, we describe the open challenges that remain and that might benefit greatly from further mathematical developments.

13 pages, 956 KiB  
Review
The FMM Approach to Analyze Biomedical Signals: Theory, Software, Applications and Future
by Cristina Rueda, Itziar Fernández, Yolanda Larriba and Alejandro Rodríguez-Collado
Mathematics 2021, 9(10), 1145; https://0-doi-org.brum.beds.ac.uk/10.3390/math9101145 - 19 May 2021
Cited by 2 | Viewed by 2452
Abstract
Oscillatory systems arise in the different biological and medical fields. Mathematical and statistical approaches are fundamental to deal with these processes. The Frequency Modulated Möbius approach (FMM), reviewed in this paper, is one of these approaches. Little known as it has been recently developed, it solves a variety of exciting questions with real data; some of them, such as the decomposition of the signal into components and their multiple uses, are of general application, others are specific. Among the exciting specific applications is the automatic interpretation of the electrocardiogram signal. In this paper, the theoretical, statistical and computational properties of the FMM approach are summarised. Additionally, as a novelty, the FMM approach’s usefulness for the analysis of blood pressure signals is shown. For the latter, a new robust estimation algorithm is proposed using FMM models with restrictions. The paper ends with a view about challenges for the future.