Review

Big Data in Gastroenterology Research

by Madeline Alizadeh 1, Natalia Sampaio Moura 2, Alyssa Schledwitz 2, Seema A. Patil 2, Jacques Ravel 1 and Jean-Pierre Raufman 2,3,4,5,*

1 The Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
2 Department of Medicine, Division of Gastroenterology and Hepatology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
3 Veterans Affairs Maryland Healthcare System, Baltimore, MD 21201, USA
4 Marlene and Stewart Greenebaum Cancer Center, University of Maryland Medical Center, Baltimore, MD 21201, USA
5 Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(3), 2458; https://doi.org/10.3390/ijms24032458
Submission received: 30 December 2022 / Revised: 18 January 2023 / Accepted: 20 January 2023 / Published: 27 January 2023
(This article belongs to the Special Issue Data Science in Cancer Genomics and Precision Medicine)

Abstract

Studying individual data types in isolation provides only limited and incomplete answers to complex biological questions and particularly falls short in revealing sufficient mechanistic and kinetic details. In contrast, multi-omics approaches to studying health and disease permit the generation and integration of multiple data types on a much larger scale, offering a comprehensive picture of biological and disease processes. Gastroenterology and hepatobiliary research are particularly well-suited to such analyses, given the unique position of the luminal gastrointestinal (GI) tract at the nexus between the gut (mucosa and luminal contents), brain, immune and endocrine systems, and GI microbiome. The generation of ‘big data’ from multi-omic, multi-site studies can enhance investigations into the connections between these organ systems and organisms and more broadly and accurately appraise the effects of dietary, pharmacological, and other therapeutic interventions. In this review, we describe a variety of useful omics approaches and how they can be integrated to provide a holistic depiction of the human and microbial genetic and proteomic changes underlying physiological and pathophysiological phenomena. We highlight the potential pitfalls and alternatives to help avoid the common errors in study design, execution, and analysis. We focus on the application, integration, and analysis of big data in gastroenterology and hepatobiliary research.

1. Introduction

Unquestionably, the availability of massive datasets, so-called ‘big data’, has revolutionized biomedical research. While the contents of large datasets vary greatly, their defining characteristic is size: they contain anywhere from hundreds of thousands to millions of entries. As described herein, handling such large, diverse, high-throughput datasets requires a set of analytical and biostatistical skills that have developed as separate disciplines in concert with the growth and complexity of such datasets. Understanding the utility, applications, and drawbacks of each type of dataset is crucial to their proper analysis and integration, and necessary to avoid pitfalls in their handling, storage, and interpretation.
In this comprehensive review, we focus on understanding how the use of big data has become an indispensable feature of gastroenterology and hepatobiliary research. These fields are particularly well-suited to the use of big data because of their unique positioning at the nexus between the gut (mucosa and luminal contents), the brain (enteric nervous system), the immune and endocrine (enteroendocrine cells) systems, and the vast GI microbiome. We believe such an analysis is timely given the emergence of novel, more powerful, less costly tools that provide exquisite genetic, epigenetic, and proteomic detail at the cellular and subcellular level. Of course, as investigators learn to navigate this complex technological landscape, the potential for misapplication, misinterpretation, and misuse of such techniques, particularly by those unsophisticated or untutored in their practice, is a concern. Thus, our goal is to review the use and analysis of common types of research datasets comprising genetic, genomic, epigenetic, epigenomic, transcriptomic, proteomic, and metabolomic information. In this context, we also address specific issues concerning the analysis of the gut microbiome and the use of clinical data derived from electronic health records and medical imaging. We describe and illustrate a representative study design that integrates these features in the context of inflammatory bowel disease (IBD) research.
Although the reader is encouraged to delve more deeply into these issues, perhaps using the citations as a starting point, we discuss the relevant aspects of the data distribution, preferred analytical approaches, common pitfalls, and advanced statistical methods that, in many cases, were developed specifically to address the complexity and integration of big data. Avoiding the common pitfalls and mistakes in the use of such complex information requires the standardization of analytical and statistical methods and the expanded training of a cadre of bioinformaticians adept in their use; this is clearly team science.

2. Common Large and High-Throughput Datasets

Multiple types of large datasets exist and are used within the realm of GI and hepatobiliary research. An overview of the most common data types is discussed below, and the prominent benefits and limitations to their use are addressed in Table 1.

2.1. Human Genetics and Genomics

The traditionally heritable component of GI diseases can be explored in many ways, which are largely characterized as genetic or genomic approaches. Genetic approaches focus on single genes, whereas genomic approaches take a broader view to study the interplay between large swaths of the genome. Commonly, genetic studies focus on monogenic contributions to disease, involving in vitro [16,17] and/or animal models [18]. These approaches are used to characterize rare diseases, such as monogenic autoinflammatory disorders [19], as well as common diseases with a monogenic etiology, such as Maturity Onset Diabetes of the Young (MODY) and familial hypercholesterolemia [20]. The impact of this research extends beyond individual diseases. For example, the identification and characterization of the MEFV gene and its role in Familial Mediterranean Fever (an autoinflammatory disorder) elucidated the roles of the inflammasome and dysregulated innate immunity in health and disease [21].
Genome-wide association studies (GWAS) enroll tens to hundreds of thousands of participants and apply wide-scale genomic analysis to seek variations in the genome associated with outcomes or diseases in a predictive (as opposed to causative) manner [1]. Several bioinformatic and statistical steps are employed to assess these variations, including prioritization of the single nucleotide polymorphisms (SNPs) that may be related to the process under consideration [22] and corrections for multiple comparisons [23,24]. The additional steps required to pool data for meta-analysis [25] offer the ability to study specific SNPs in more diverse contexts [26]. GWAS have been conducted to seek the links between SNPs and atherosclerotic disease [27], the monocyte–leukocyte ratio (often used as a predictor of the response to infectious disease) [28], and even suicidal behaviors [29,30]. GWAS are also useful in characterizing unknown mechanisms by identifying which genes play a role in a given process or condition [31,32] and, in this context, can even be used to identify drug targets [33].
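The correction for multiple comparisons mentioned above can be illustrated with a minimal sketch. The text does not specify which correction the cited studies used, so the two procedures shown here (Bonferroni and Benjamini-Hochberg) are assumptions chosen because they are widely used; the function names and the five p-values are hypothetical.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag tests whose p-value passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag tests declared significant by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical association p-values for five SNPs (illustrative only).
snp_p_values = [0.001, 0.012, 0.025, 0.045, 0.2]
bonferroni_hits = bonferroni(snp_p_values)
bh_hits = benjamini_hochberg(snp_p_values)
print(bonferroni_hits)
print(bh_hits)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two, whereas Benjamini-Hochberg controls the false discovery rate and typically retains more candidate SNPs for follow-up.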

2.2. Epigenetics and Epigenomics

Epigenetic modifications are non-heritable changes caused by covalent “tags” or other modifications of the genome. Similar to genetics and genomics, the fields of epigenetics and epigenomics examine specific and broader patterns of epigenome changes, respectively [34]. Epigenetic tags include methylation, which commonly suppresses gene expression, and acetylation, which decreases histone binding, thus freeing portions of the genome to increase expression; epigenetic tags modulate cell type-specific patterns of gene regulation and expression. CpG islands, or regions with multiple CG repeats, are common sites of gene methylation, often guiding development and cell differentiation; heavily condensed regions are referred to as heterochromatin. A well-described developmental example is X chromosome inactivation [35,36]. More changes occur in somatic cells than initially believed, and various exposures and diseases can also induce changes in these patterns [36,37]. Histone acetylation often correlates with demethylation; the resulting open or free DNA is referred to as euchromatin [35].
Both epigenetics and epigenomics are studied using a variety of techniques which take advantage of covalent modifications to characterize methylation and acetylation patterns. These include sequencing-based methods, such as bisulfite sequencing, ChIP-seq, and ATAC-seq [38]. The role of epigenetics and epigenomics in understanding disease causation and identifying potential therapeutic targets has been explored in cancer [39,40], aging [41], and weight regulation and metabolism [42,43], as first demonstrated by the Dutch famine study [44]. Understanding the role of epigenetic modifications in weight gain and weight-related disorders (such as metabolic syndrome) is of particular importance to the fields of endocrinology and metabolism, as well as GI and hepatology; over the past 20 years, obesity-induced liver steatosis, fibrosis, cirrhosis, and cancer have reached alarming proportions.

2.3. Transcriptomics

Transcriptomics describes the study of RNA expression, which includes messenger RNAs (mRNAs), non-coding RNAs (small and long ncRNAs), interfering RNAs, and small nuclear and nucleolar RNAs (sn/snoRNAs), amongst others. RNA-seq or RNA sequencing can be used to sequence specific subsets of RNA; while previously popular array-based assays targeted expression analysis, most high-throughput methods are RNA-seq-based (e.g., mRNA-seq, total RNA-seq, targeted RNA-seq, single cell (sc) and single nucleus (sn) RNA-seq, and small RNA-seq) and can capture all expression data in a sample, permitting the exploration of new and unknown RNAs [45,46]. These approaches can be used to answer different questions and are unique in their strengths and weaknesses: for example, unlike scRNA-seq, snRNA-seq can be applied to frozen samples; this avoids dissociation bias, but it is limited to providing information about nuclear transcripts [47].
Bulk or total RNA-seq is the most common transcriptome sequencing method and can cost-effectively offer great insight into physiological processes. The technique is performed on a homogenized sample (tissue, serum, or another specimen type), and the entire sample is sequenced in a non-selective, unbiased manner. This technique is often used to compare physiologic differences under varying conditions, such as exposures and treatments, disorders and diseases, and in the context of genetic variation. However, the presence of multiple cell types often complicates analysis; cell-type markers have been used to estimate changes in cell-type abundance [48]. Changes in non-marker genes, however, cannot be attributed to a specific cell type and thus require further mechanistic characterization to identify cellular involvement.
Bulk RNA-seq, the oldest and least costly RNA-seq approach, is quickly being supplanted by more sophisticated approaches. scRNA-seq provides cell-specific gene expression data. Many investigators have now shifted to the scRNA-seq-based interrogation of transcriptomes, but cost may be limiting—reduced sample size requirements for accurate analysis may help offset these costs. Furthermore, the access to machines for data generation is somewhat limited, and despite the requirement for fresh samples, the options for preservation media may be inadequate. In scRNA-seq, individual cells from a tissue sample are separated and, after sequencing, reads are mapped to a reference transcriptome to estimate the RNA expression levels; following normalization, the expression levels of marker genes are analyzed for each cell type, though this process can be thorny [49]. Multiple attribution tools and the increasing use of machine learning-based methods can help to optimize cell type assignments [50,51]. Common concerns include the limited identification of rarer cell types [52,53], the impact of high dimensionality on unsupervised clustering [54], and the mixed expression patterns displayed by some cells (i.e., multiple responses to the same signal) [55]. A lack of adequate quality control at every step can make the results untrustworthy [56,57]. Furthermore, the inappropriate input and use of classifiers can adversely affect the downstream results and interpretation [58,59]. Examples of scRNA-seq versatility are the finding of new neuronal subtypes [60] and hormone-producing cell types [61] and the characterizing of sex-specific transcription patterns at the cellular level [62]. Relevant to GI research, this approach has identified the axial spondylarthritis immune cell changes specific to IBD [63].
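The normalization and marker-based annotation steps described above can be sketched as follows. This is a deliberately naive illustration, not the method of any cited tool: the library-size scaling to 10,000 counts followed by a log(1 + x) transform is one common convention assumed here, the marker lists are illustrative assumptions, and the counts for the single cell are invented.

```python
import math
from typing import Dict, List


def normalize_cell(counts: Dict[str, int], scale: float = 10_000.0) -> Dict[str, float]:
    """Library-size normalize one cell's raw counts, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        # A cell with no reads carries no expression information.
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(count / total * scale) for gene, count in counts.items()}


def assign_cell_type(normalized: Dict[str, float], markers: Dict[str, List[str]]) -> str:
    """Pick the cell type whose marker genes have the highest mean expression."""
    scores = {
        cell_type: sum(normalized.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in markers.items()
    }
    return max(scores, key=lambda cell_type: scores[cell_type])


# Illustrative marker lists (assumptions for this sketch, not a curated reference).
MARKERS = {"goblet": ["MUC2", "TFF3"], "colonocyte": ["SLC26A3", "CA1"]}

# Invented raw counts for a single cell.
cell_counts = {"MUC2": 50, "TFF3": 30, "SLC26A3": 2, "CA1": 1, "ACTB": 117}
normalized = normalize_cell(cell_counts)
predicted_type = assign_cell_type(normalized, MARKERS)
print(predicted_type)
```

A simple rule of this kind also shows why annotation can be thorny: cells with mixed expression patterns or rare cell types without established markers will be forced into the nearest label, which is one reason dedicated attribution tools and quality control at every step matter.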

2.4. Proteomics

Large-scale, protein-level proteomics can complement DNA- and RNA-level studies or be used independently to seek the potentially targetable mechanisms underlying disease. The many steps between the translation and the production of proteins include folding, signal sequence cleavage, functional group additions such as glycosylation, and protein–protein binding to form complexes; this makes it difficult to predict the biological consequences of transcriptional changes alone without protein-level confirmation. Nonetheless, it is important to consider the potential reasons for the discordance between the RNA and protein expression patterns [64]. Instead of protein microchips with limited identification abilities [65], chromatography and mass spectrometry methods are often used to explore protein presence and abundance [66]. Protein folding and interactions can be assessed using chemical crosslinking, affinity purification, protein correlation profiling, and proximity labeling among others [66]; this characterization permits the recognition of protein networks, with pathway- and context-dependent functional annotation [67] that may identify disease biomarkers and drug targets. The popularity of proteomics studies has been especially evident in oncology, where biomarkers with high specificity and sensitivity are difficult to identify [68]; an example is the recategorization and subtyping of breast cancer, which impacts treatment stratification [69,70,71,72].

2.5. Microbiome

The microbiome represents a collection of microbial (primarily bacterial) species that inhabit the human body in a largely tissue-specific manner, as elucidated by the Human Microbiome Project [73]. Individual taxa can be assessed using 16S rRNA amplicon sequencing or shotgun metagenomics; while 16S amplicon sequencing amplifies variable regions of the 16S rRNA gene to assess taxa abundance, metagenomic sequencing relies on next-generation sequencing, and read assignments are used to assess taxa abundance and genetic functionality. The intestinal microbiome has a deeply intertwined and dynamic relationship with several GI disorders. Other sites of microbial colonization include the oral cavity, skin, nasopharynx, and vagina, which have site-specific characteristics. A “healthy” vaginal microenvironment is characterized by Lactobacillus predominance with one of four Lactobacillus species [74], whereas the colon microbiome is considered to be ‘healthier’ when more diverse [75,76,77].
Defining a “normal” microbiome can be difficult [8] and depends on the technique used to assess composition [7]; for example, swabs or stool samples may yield varying results when compared to tissue biopsy [9], and metagenomic sequencing approximates compositionality differently than 16S rRNA amplicon sequencing [78]. Yet, many studies have successfully identified and distinguished differences in the microbiome in human disease [79]; the intestinal microbiome in particular has been associated with many diseases, including arthritis, IBD, autism, and cancer, to name only a few [9,80,81,82]. Some causative and mechanistic studies have been performed, demonstrating the physiologic changes that occur in response to the introduction of certain bacteria; however, these studies are largely limited to animal models, which differ in microbiota composition and the potential systemic impact in comparison to humans [7]. Much work in humans is correlative rather than mechanistic [83]; to define the precise role of the microbiome in health and disease, longitudinal, interventional, and mechanistic studies are needed.
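The idea of a "more diverse" colon microbiome is usually made quantitative with an alpha-diversity metric. The sketch below computes the Shannon index from per-taxon read counts; the choice of the Shannon index is an assumption (the text does not name a metric), and the genus-level counts for the two samples are invented for illustration.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: count / total for taxon, count in counts.items()}


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Invented genus-level read counts for two stool samples.
even_sample = {"Bacteroides": 25, "Faecalibacterium": 25, "Roseburia": 25, "Escherichia": 25}
dominated_sample = {"Bacteroides": 1, "Faecalibacterium": 1, "Roseburia": 1, "Escherichia": 97}

even_diversity = shannon_index(even_sample)
dominated_diversity = shannon_index(dominated_sample)
print(round(even_diversity, 3), round(dominated_diversity, 3))
```

Because the index is computed from proportions, it inherits the compositional nature of sequencing data: the value depends on sequencing depth, sample type, and whether abundances come from 16S amplicon or metagenomic reads, so comparisons are most meaningful within a single protocol.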

2.6. Metabolomics

Metabolomics, or the high-throughput study of metabolites present in a sample, is an important potential addition to any multi-omic study of GI health and disease. Metabolites include both endogenous and exogenous compounds with molecular mass < 1500 Da [84]. Similar to proteomics, metabolic profiles offer a deeper glimpse into the mechanistic underpinnings of biological processes and can be used to study the impact of genome and epigenome, microbiome, transcriptome, and even proteome differences and compositional changes for low molecular weight molecules [85]. Given its large size and the uncountable number of different compound classes, measuring the metabolome requires a range of mass spectrometry methods, including LC-MS, GC-MS, CE-MS, and IMS-MS, as well as NMR, which is less sensitive but provides nondestructive sample measurements [85,86]. Commonly, third-party companies generate the data used by researchers; this may complicate the analysis and interpretation since the raw data are usually not released and missing values are more difficult to interpret [87]. This is a particular concern for investigative teams who lack the training and knowledge to interpret and critically analyze such outsourced data.
Metabolomics studies have provided useful insights into rheumatoid arthritis [88], ischemic heart disease [89], infectious diseases [90], and cancer [91]. Metabolomics research has also been used to explore the dynamic relationship between host and microbiome metabolism [92], an ideal approach to gaining mechanistic insights into digestive diseases.

2.7. Medical Informatics

Medical informatics describes the study of patient clinical data, including laboratory and other clinical results, using a variety of statistical methods. Modeling is an umbrella term used to describe some of these processes, including various forms of regression [93,94,95], whereby a mathematical description (i.e., an equation) is derived to predict relationships among a set of variables, in this case clinical variables. For example, modeling clinical and demographic data can be used to predict outcomes. Depending on the outcome of interest and the format and type of data available, different methodologies and techniques can be used to predict associations. While smaller datasets can be used for these analyses, the robustness and accuracy of the model, as well as its predictive power, tend to increase with more input data. Thus, large datasets are especially useful in finding trends that are distinguished by small but potentially meaningful differences. Large datasets of clinical and demographic data also allow for the use of multiple predictor variables in tandem [96], identifying temporal associations in clinical outcomes [97], and applying machine learning tools to enhance predictive power [98]. In recent years, such datasets have become more accessible as a result, primarily, of medical institutions mandating the use of integrated electronic health records (EHR) [99] and the development of large national and other databases geared towards collecting extensive human data [100]. However, EHR do not generally provide a comprehensive assessment and are, therefore, susceptible to a broad range of biases, which must be kept in mind when analyzing these data. Nevertheless, in tandem with enhanced biostatistical approaches, the exploitation of these extensive datasets can enhance confidence in the generalizability and potential applicability of the resulting findings [101], which are crucial to achieving meaningful clinical impact.
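As a minimal sketch of what "deriving an equation" means in this context, the example below fits an ordinary least-squares line to one predictor and one outcome. Real clinical models typically involve many predictors and other regression families (for example, logistic regression for binary outcomes); the single-predictor case, the variable names, and the data values are assumptions made purely for illustration.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: a hypothetical laboratory value recorded at four ages.
age_years = [30.0, 40.0, 50.0, 60.0]
lab_value = [1.0, 1.5, 2.0, 2.5]

slope, intercept = fit_line(age_years, lab_value)
predicted_at_45 = slope * 45.0 + intercept
print(slope, intercept, predicted_at_45)
```

With only four points the fitted coefficients are fragile, which mirrors the point made above: robustness and predictive power tend to improve as more input data become available.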

2.8. Imaging Data

Diagnostic imaging, using multiple modalities, is widespread and provides another data-rich resource. As with any tool requiring human interpretation, there is potential for error; efforts are underway to increase diagnostic accuracy using machine learning and artificial intelligence to correctly identify anomalies [102,103], reduce noise, and enhance the visualization of data prone to motion sensitivity [104]. Bioinformatic tools have been developed for these purposes [105,106,107] and will be key for the widespread use and incorporation of large bioimaging datasets.

3. Big Data in Gastroenterology and Hepatobiliary Research

GI disorders, influenced by genetic and environmental factors, are difficult to treat because of their long duration, recurrence, and persistence. Outstanding questions remain within the field, such as: what are the genetic underpinnings of different GI pathologies, such as colorectal cancer (CRC)? How do our internal and external environments impact their development? How can we use omics studies to understand these diseases and identify novel biomarkers to inform clinical decision making? Such questions need to be addressed with a multi-faceted approach. The use of high-throughput omics stands at the forefront of such discoveries (Figure 1). In this section, we examine how big data derived from various omics techniques are applied to gain a more comprehensive understanding and to advance the field of gastroenterology. We explore the application of human genetics and genomics, epigenomics, transcriptomics, proteomics, microbiome, and metabolomics to characterize multifactorial diseases such as IBD, to advance biomarker discovery, to generate novel ways to detect GI cancers and their progression earlier, and to develop precision medicine. Furthermore, we examine examples of how different technologies within medical informatics can integrate big data for useful analysis, and how imaging data can revolutionize procedures by assisting with diagnosis and outcome prediction.

3.1. Human Genetics and Genomics

Genetic and genomic studies can enhance the mechanistic understanding of GI and hepatobiliary diseases, provide novel diagnostic tools, and pave the way to targeted interventions. Gastroenterologists generally obtain a basic understanding of the genetics of GI disorders at the start of their training from textbooks and lectures that focus on, among other topics, the role of genetic mismatch repair defects in Lynch syndrome [108], the importance of KRAS and BRAF mutations in colorectal cancer [109], and HFE testing for hereditary hemochromatosis. However, the impact of genetics in the GI field is not restricted to these diseases. Whole exome sequencing and GWAS can identify possible causal genes for other, sometimes undefined, syndromes and can help to elucidate their pathogenesis. For example, genomic studies associated MEFV mutations with recurrent abdominal pain and fever in FMF, a rare monogenic disease [110]. A more common condition, celiac disease, has considerable genetic influence; compared to 30% of the general population, 90–100% of patients possess either class II HLA-DQ2 or -DQ8 [111,112]. Serology testing for tissue transglutaminase-immunoglobulin A (TG2-IgA) remains a crucial part of celiac disease diagnosis, but HLA typing has value in excluding celiac disease in seronegative persons [111].
GWAS have advanced the understanding of the susceptibility alleles in multifactorial diseases such as IBD and Behcet’s disease. Over a decade ago, Jostins et al. reported 163 genetic loci associated with susceptibility to the two major forms of IBD, ulcerative colitis (UC) and Crohn’s disease (CD) [113]; since then, at least an additional 80 IBD-associated genetic loci have been reported. Of these, approximately two-thirds are shared with other complex diseases or traits, including primary immunodeficiencies and mycobacterial disease. As an example, NOD2, the first susceptibility gene for CD, is also associated with Blau syndrome, an autoinflammatory disease [114,115], highlighting the observation that genomics can unravel mechanisms shared by seemingly distinct pathologies. Behcet’s disease, in turn, is an autoinflammatory disease characterized by painful mucocutaneous ulceration that can be associated with intestinal inflammation [116,117] and can even mimic CD.
Genomics also paved the way for genotype–phenotype correlations, improved diagnostic testing and disease screening, and targeted treatments. Recently, the International Society for Gastrointestinal Hereditary Tumors (InSiGHT) applied a standardized classification scheme to a database containing variants of the MLH1, MSH2, MSH6, and PMS2 genes associated with Lynch Syndrome and tied this information to clinical recommendations [108]. Furthermore, after Samadder et al. reported increased heritable variants in solid cancers by multigene vs. targeted testing [118] and Uson et al. reported that universal multigene panel testing increased the detection of the heritable germline mutations associated with CRC (15.5% harbored 62 pathogenic variants with MSH2 among the most common genes) [119], the National Comprehensive Cancer Network recommended germline multigene panel testing for all individuals younger than 50 years with CRC. This highlights the contribution of genetics and genomics to understanding disease etiology, recognizing susceptibility alleles, and enhancing screening and early detection.

3.2. Epigenomics

Epigenetic modifications such as DNA methylation are characteristically stable, making them a useful sensor of disease risk and progression. Their impact on the pathogenesis of diseases of the GI tract, such as colorectal and gastric cancer and IBD, highlights their complexity. Epigenomics can also be leveraged for the early detection of colorectal adenomas and cancers. In GI neoplasia, tumor suppressor genes may be inactivated by mutation or promoter and/or CpG island hypermethylation. Hypermethylated genes can be involved in cell cycle regulation, DNA repair, growth regulation, apoptosis, cell attachment, and signal transduction. MLH1, a mismatch repair gene, is an important example. In sporadic colorectal and gastric cancer, altered MLH1 leads to microsatellite instability and can confer resistance to chemotherapeutics [120,121]. Arnold et al. demonstrated that the demethylation of hMLH1 in hypermethylated cell lines led to the re-expression of the protein and reversed resistance to 5-fluorouracil [121]. Less commonly, mutation-negative families with suspected Lynch syndrome due to silenced MLH1 expression in tumors may harbor constitutional epimutation, wherein hypermethylation at the promoter of one allele silences its expression in major somatic tissues [122]. Aberrant patterns of histone modification play a role in CRC by leading to transcriptional changes. In gastric cancer, the genes involved in cancer-related pathways are more frequently affected by epigenetic rather than genetic alterations [120].
Genome-wide DNA methylation profiling revealed changes in early neoplasia that can be leveraged as biomarkers using blood and feces [123]. Because they account for tumor heterogeneity, biomarker panels derived from genome-wide studies may be more advantageous than single markers. Two biomarker panels with high sensitivity and specificity for the detection of CRC have been reported. One panel screens for promoter hypermethylation of CNRIP1, FBN1, INA, MAL, SNCA, and SPG20. For CRC, the combined sensitivity and specificity of at least two positives among these six genes are 94% and 98%, respectively [124]. The second panel comprises CDO1, DCLK1, SFRP1, and ZSCAN18, which were methylated in 71–92% of colorectal, gastric, and pancreatic cancers [125], with a combined sensitivity and specificity of 95% and 98%, respectively.
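The "at least two positives among six genes" rule can be made concrete with a short sketch that applies the rule to each sample and then computes sensitivity and specificity against known disease status. The six gene names come from the panel described above; the function names and the eight samples are hypothetical and do not reproduce the cited study's data or results.

```python
from typing import List, Set, Tuple

PANEL = {"CNRIP1", "FBN1", "INA", "MAL", "SNCA", "SPG20"}


def panel_positive(methylated_genes: Set[str], min_positive: int = 2) -> bool:
    """Call a sample positive when at least min_positive panel genes are hypermethylated."""
    return len(PANEL & methylated_genes) >= min_positive


def sensitivity_specificity(calls: List[bool], truth: List[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased sample")
    return tp / (tp + fn), tn / (tn + fp)


# Invented samples: (hypermethylated genes detected, has CRC).
samples = [
    ({"CNRIP1", "FBN1", "MAL"}, True),
    ({"INA", "SNCA"}, True),
    ({"SPG20"}, True),             # missed: only one positive gene
    (set(PANEL), True),
    (set(), False),
    ({"MAL"}, False),
    (set(), False),
    ({"FBN1", "SNCA"}, False),     # false positive: two positive genes
]

calls = [panel_positive(genes) for genes, _ in samples]
truth = [has_crc for _, has_crc in samples]
sensitivity, specificity = sensitivity_specificity(calls, truth)
print(sensitivity, specificity)
```

Raising `min_positive` generally trades sensitivity for specificity, which is the kind of threshold choice a multi-gene panel allows and a single marker does not.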
Altered DNA methylation in IBD identified potential intestinal-specific inflammation biomarkers and predictors of colitis-associated CRC [126,127]. Current research on the DNA methylome in IBD distinguished the methylation changes due to environmental cues from those dependent on genetic susceptibility. Agliata et al. discovered that IBD intestinal methylome abnormalities are related to upstream genetic variants [128], suggesting an effector role for DNA methylation. Differentially methylated positions were also enriched in inflammation-related pathways downstream of cytokine signaling. This suggests that DNA methylation enhances the response to anti- and pro-inflammatory signals and that the IBD methylome comprises combined environmental and genetic cues [128]. Altered methylation can be used as a biomarker to predict the risk of CRC prior to the presence of dysplasia in colonic biopsies of high-risk IBD patients. When performing a genome-wide DNA methylation analysis via Illumina Human Methylation 450K BeadChip in biopsies from high-risk vs. low-risk patients, the prevalence of methylation was higher in the high-risk IBD patient population. To complement the current standard of care, a DNA methylation signature focused on five genes (SLIT2, EYA4, FLI1, SND1, USP44) may enhance the predictive value of surveillance biopsies for colitis-associated colon cancer [129]. This highlights how epigenomic studies may unravel GI disease pathogenesis and impact clinical practice.

3.3. Transcriptomics

In recent years, the advent of diverse transcriptomics methods such as scRNA-seq has enabled enormous advances in GI and liver research. One area that has flourished in the wake of the evolving omics techniques is the study of GI and liver cancers. Transcriptomics was vital to uncovering four consensus molecular subtypes (CMS) of CRC. This is perhaps the most theoretically sound classification of CRC to date and its development was highly dependent on genomic, epigenomic, transcriptomic, and proteomic analyses [130]. Using RNA-seq in conjunction with available CMS data, others identified correlations between these CRC subtypes and the presence of bacterial species (identified by 16S rRNA amplicon sequencing) [131]. Similarly, molecular subtypes were identified and validated for gastric cancer, a malignancy known for high levels of heterogeneity even within the same patient [132,133]. Clinical applications to exploit these recent advances are under investigation, with investigators using transcriptomic methods, particularly scRNA-seq, to identify markers of prognosis and therapeutic response [4,5,131].
scRNA-seq profiling of colon epithelial cells permitted the identification of previously unknown epithelial cell subtypes relevant to inflammation, as well as potential therapeutic targets. Parikh et al. compared colon epithelial cells from immunotherapy-naïve IBD patients and healthy controls. Among other findings, scRNA-seq identified spatially segregated cells in different stages of differentiation and revealed a novel absorptive colonocyte that regulates luminal pH. These investigators also observed differences in goblet cell location and transcriptional activity between samples from UC patients and normal control colon tissue; additional types of inflammation-associated goblet cells were found in colon tissue from subjects with IBD. Notably, WFDC2, an anti-protease expressed by goblet cells, was downregulated in those with active UC, suggesting that it plays a role in maintaining mucosal barrier integrity [134]. Whereas there has long been support for the “leaky gut” hypothesis of IBD (i.e., the idea that the primary insult of an inappropriately permeable intestinal epithelial barrier introduces luminal antigens to the innate immune system and a cascading series of events leads to intestinal autoimmunity and inflammation), this study identified a potential therapeutic target in the pathogenesis leading to an inflamed intestinal barrier [135]. Others implemented combined GWAS and scRNA-seq to identify potential risk genes and infer their functions based on cell type-specific transcriptomes in UC [136,137]. Similarly, a meta-analysis comparing non-alcoholic fatty liver disease and alcoholic liver disease (NAFLD and ALD) was able to identify a subset of similarities in differentially regulated pathways, such as the regulation of fibrosis via COL1A1 and COL3A1, which may be responsible for their similar and highly overlapping clinical phenotype [138].
Such approaches leveraging both genomic and transcriptomic methods may be particularly useful in the study of polygenic diseases such as IBD and fatty liver disease (NAFLD and ALD), which have many clinical phenotypes.
Perhaps the most revolutionary aspect of transcriptomics’ growing presence in GI research is its potential for personalized or precision medicine. As the barriers to affordability and implementation of RNA-seq are addressed, we can begin to uncover the extent of the pathophysiological diversity in GI and hepatobiliary disorders and discover new therapeutic targets.

3.4. Proteomics

Because the protein makeup of a system is so closely linked to its physiologic function, the clinical importance of proteomics is evident as we progress toward personalized medicine. The use of proteomics technologies has led to great advances in GI research, especially in IBD [139]. One great challenge in the treatment of IBD in the era of biologic drugs is finding the most suitable therapy for a particular patient, while avoiding the adverse events related to medications and the overtreatment of patients whose disease may be milder. The application of proteomics to predicting and monitoring treatment response in IBD is new, and most of the proposed predictive markers for more commonly used biologics are of limited clinical value [139,140]. Generally, the monitoring of patient response to a newly initiated therapy is accomplished by assessing trends in inflammatory markers in serum, such as C-reactive protein, and in stool, such as fecal calprotectin [141]. Although the trending of objective biomarkers alongside patient symptoms has been shown to be superior to clinical management alone in achieving the timely escalation of care to help IBD patients achieve mucosal healing, there is a need for more specific indicators of treatment response [142]. An ongoing, large multicenter prospective cohort study may shed light on this topic [143]. Pierre et al. used a shotgun approach to identify candidate proteomic biomarkers for CD relapse either less or more than 6 months after discontinuing infliximab, as short-term relapse occurs by a different mechanism than mid/long-term relapse [144]. The panels of 15 and 17 proteins, respectively, had higher predictive capability of disease relapse than C-reactive protein or fecal calprotectin.
As more predictors of disease activity and response to specific drugs are identified by proteomics methods, information like this could be applied in the clinical setting to aid in the decision making regarding the choice of treatment and its continuation or withdrawal. Because of its high-throughput capability, proteomics is also likely to be useful in drug discovery and mechanistic research, as evidenced by a recent study on potential drug targets for infliximab-resistant UC, the first reported use of proteomics in drug discovery [145].
In treating CD, it is important to be mindful of the disease phenotype, including the presence of perianal disease and stricturing or penetrating behavior, as this can impact the disease progression and decision making related to medical treatment and surgical intervention [146]. Several recent groundbreaking studies have shown that the application of proteomics may help to uncover the mechanisms underlying the presentation of these disease phenotypes and also to predict which CD patients are at risk of developing related complications. In a pilot study by Townsend et al., eight serum protein biomarkers were identified which could differentiate the CD patients who had a surgically resected intestinal stricture from the CD patients without stricturing behavior, with an area under the curve (AUC) > 0.9; additional biomarkers distinguished them from patients with UC. Those proteins differentiating stricturing from non-stricturing CD included those involved in complement activation, fibrinolytic pathways, and lymphocyte adhesion, although correlating the proteins with a mechanism of stricturing was beyond the scope of the study [147]. In a multicenter prospective observational study, Wu et al. implemented both proteomic analysis and the conventional enzyme-linked immunosorbent assay (ELISA) to identify the biomarkers that may help predict whether a pediatric patient with the inflammatory phenotype of CD will convert to a stricturing phenotype within three years. Of the 10 candidate serum biomarkers, 4 were significantly different in abundance at the time of CD diagnosis in patients who would convert to stricturing behavior within three years versus those who would not: extracellular matrix protein 1 (ECM1), cartilage oligomeric matrix protein, matrix metalloproteinase 9, and fibronectin.
High levels of ECM1 (>3900 ng/mL) at the time of diagnosis proved to be the strongest predictor of conversion to the stricturing phenotype, with an accuracy of 75.4% and a sensitivity and specificity of 80.0% and 70.7%, respectively [148]. The rapid evolution of technologies such as machine learning also benefits the application of big data techniques in GI. In another prospective cohort of pediatric CD patients, Ungaro et al. used ensemble machine learning methodology to identify a panel of serum protein biomarkers measured at diagnosis which together could predict the time to the development of a stricturing or penetrating phenotype of CD with greater reliability than serologic studies [149].
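For readers less familiar with the performance measures quoted above, the following minimal Python sketch shows how sensitivity, specificity, and accuracy are derived from a two-by-two confusion table for a biomarker cutoff. The function name and the counts are hypothetical and chosen only for illustration; they are not the data reported by Wu et al.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-table counts.

    tp: converters correctly flagged by the biomarker cutoff
    fn: converters missed by the cutoff
    tn: non-converters correctly not flagged
    fp: non-converters incorrectly flagged
    """
    sensitivity = tp / (tp + fn)          # fraction of true converters detected
    specificity = tn / (tn + fp)          # fraction of non-converters correctly excluded
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not taken from any cited study).
sens, spec, acc = diagnostic_metrics(tp=40, fp=15, tn=35, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because accuracy depends on the proportion of converters in the cohort, sensitivity and specificity are usually the more transferable quantities when comparing cutoffs across studies.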
Although endoscopy with histologic evaluation of mucosal biopsy samples provides the gold standard for diagnosing IBD, non-invasive proteomic biomarkers may become useful in the classification and stratification of IBD, and potentially even diagnosis. Many studies have examined differences in protein expression profiles in the intestinal tissue and serum of patients with IBD vs. healthy controls [150,151] and of UC vs. CD [147,150]. Starr et al. studied IBD in a pediatric population, where the traditional diagnostic features are more difficult to discern than in adult patients and found that a panel of five proteins could be used to differentiate active IBD from healthy colon tissue samples with an AUC of 1.0. In the same study, a panel of 12 proteins could differentiate pediatric patients with UC from those with CD with an AUC of 0.95 in the discovery cohort; in the validation cohort of patients, the panel accurately classified 80% of patients [152]. The additional validation of proteome findings is required before they can be used as diagnostic biomarkers or to distinguish between IBD and other potential causes of symptoms that may be mistakenly attributed to IBD, particularly for those with IBD in remission. Cytomegalovirus (CMV)-induced colitis is an example of such a potential confounder. In the future, these methods may become clinically useful when it is difficult to differentiate between IBD and another form of colitis or when endoscopy is contraindicated.
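The AUC values cited for these protein panels can be read as the probability that a randomly chosen case receives a higher panel score than a randomly chosen control. The sketch below computes the AUC directly from that pairwise definition; the function name and the scores are hypothetical, and real analyses would typically rely on an established statistics package and report confidence intervals.

```python
def auc_from_scores(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; ties contribute one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical panel scores for illustration only.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
panel_auc = auc_from_scores(cases, controls)
print(panel_auc)
```

An AUC of 1.0 corresponds to complete separation of the two groups, which is one reason why such a result in a small discovery cohort calls for validation in independent patients.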
In addition to IBD, proteomics may be applied to other areas of GI research. Many studies have identified various panels of proteins relevant to GI cancer progression. Li et al. examined tissue samples of precancerous gastric lesions and gastric cancer, creating a model that could predict the progression of a gastric lesion to cancer with an AUC of 0.88, outperforming the prediction based on clinical risk factors alone [153]. Research is ongoing to identify the appropriate biomarkers in irritable bowel syndrome (IBS), as the vast range of clinical presentations and the overlap with other functional disorders presents challenges to the diagnosis, treatment, and outcome monitoring of patients with IBS [154]. Some studies reported differences in the serum or intestinal proteome composition of normal controls vs. IBS patients with diarrhea, constipation, or alternating diarrhea and constipation as the predominant symptoms; these studies were limited by small sample sizes [155,156].

3.5. Microbiome

The gut microbiome represents the trillions of symbiotic bacteria, archaea, viruses, and eukaryotes, as well as their genes, residing in the GI tract that interact with the host to impact human physiology and disease [7]. The mechanisms whereby the microbiome influences host physiology are a topic of ongoing investigation, but likely include activation of the innate immune system by microbial antigens and epigenetic modifications of host cell DNA and histones [157,158]. Because there is great inter- and intrapersonal diversity in gut microbiome composition, and it changes throughout life, a useful definition for the “healthy gut microbiome” remains elusive [7,158,159]. The key to understanding the scope of the microbiome’s role in GI physiology lies in the use of big data, which complements culture-based methods. In monozygotic twins of concordant body size (i.e., body mass index > 30 kg/m² or 18.5–24.9 kg/m²) and shared childhood environment, fecal 16S rRNA sequencing identified a common set of gut microbial genes shared between the twins rather than a common set of species [160].
Dysbiosis is one of many factors implicated in the etiology of IBD, but a direct causal relationship has not yet been elucidated. Proinflammatory gut microbes are associated with increased mucosal permeability, possibly through the release of toxins or the disruption of the pre-existing microbial community, thereby activating the innate immune system and inflammation [161]. Individuals with IBD have a less diverse gut microbiome compared to unaffected but otherwise matched controls, although there are conflicting reports regarding which species demonstrate altered abundance [9,161]. There is evidence that those with IBD have a lower abundance of Firmicutes in both stool and mucosal samples, which is an interesting finding given the phylum’s ability to metabolize dietary fiber to produce anti-inflammatory short-chain fatty acids [9,161,162,163]. Fewer studies found a higher abundance of mucolytic bacteria in IBD, including some Ruminococcus species, although this has been contested by others [164,165]. The roles of non-bacterial microbes in the gut should not be overlooked. The overgrowth of Candida albicans and other fungal species has been observed in IBD, particularly during a symptomatic flare; this is consistent with the proposed mechanism that C. albicans causes a Th17-mediated immune response during a disease flare [166,167]. Gut bacteriophages appear to drive both innate and adaptive immunocyte expansion through IFN-γ and Toll-like receptor signaling, and bacteriophages isolated from the feces of patients with active UC are more adept at inducing the CD4+ T cell production of IFN-γ compared to those from persons with inactive UC or healthy controls, suggesting the complicity of phages [168].
The gut microbiome has been implicated in the development of diverticular disease, but the mechanism remains unclear. Metagenomic studies attempted to profile the differences in resident microbiota in diverticular disease, but the results vary widely regarding which taxa, if any, are over- or under-represented in patients (reviewed by Ticinesi et al. [169]) [170,171,172,173]. With the added use of 1H nuclear magnetic resonance to examine metabolomes, Barbara et al. identified potential microbiota-related biomarkers relevant to diverticular disease, including higher urinary concentrations of kynurenine, quinolate, and certain carbohydrates in individuals with asymptomatic diverticular disease versus controls, and lower urinary concentrations of hippurate in symptomatic diverticular disease [172]. Although probiotics are commonly prescribed to treat diverticular disease, there are insufficient data to support their use [174,175,176].
The liver also interacts with the gut microbiome as it receives microbiota-derived nutrients and toxins via hepatic circulation and regulates the innate immune response (reviewed by Wang et al. [177]). In an international, multi-center observational study, the bacterial compositions of the gut microbiomes of patients with alcohol-associated hepatitis differed from those of the healthy controls, with decreased diversity at the genus level based on the 16S rRNA sequencing of fecal samples. The investigators also found disproportionately low levels of Akkermansia and high abundances of Veillonella, both of which correlated with the model for end-stage liver disease (MELD) scores [178]. Other studies found increased relative abundances of Veillonella species in alcohol-associated cirrhosis compared to other causes of cirrhosis and in cirrhotics with hepatic encephalopathy (HE) versus those without HE [179,180]. Species of the genus Akkermansia, which support intestinal mucosal integrity and reduce hepatic fatty acid synthesis, were depleted in persons with alcohol use disorder even without cirrhosis and their quantity negatively correlated with the serum levels of pro-inflammatory cytokines and chemokines [181,182]. In mouse models, the ethanol consumption-induced depletion of Akkermansia was prevented by oral Akkermansia supplementation, leading to reductions in serum alanine transaminase, hepatic IL-1β, and hepatic neutrophil infiltration compared to the non-supplemented mice with the same ethanol intake [183]. These studies suggest that gut microbiota play an early role in the pathogenesis of alcohol-associated liver disease (ALD) and that these microbes might be harnessed for therapeutic purposes.
One of the most well-known examples of the use of microbial supplementation as therapy is that of fecal microbiota transplant (FMT) for recurrent pseudomembranous colitis caused by Clostridioides difficile infection (CDI). In FMT, microbes are donated by transplanting the stool of a healthy donor into the intestinal tract of a recipient. There are multiple known mechanisms by which the donor microbiota restore eubiosis, including promoting the bacterial conversion of primary bile acids into secondary bile acids and re-introducing the bacterial species that produce anti-inflammatory short-chain fatty acids (reviewed by Soveral et al. [184]) [185,186]. The first case report of FMT for the treatment of CDI was published in 1983, although historical documents describe the use of FMT as early as the 4th century C.E.; with the U.S. Food and Drug Administration’s first approval of an FMT product in November 2022, we will likely experience an increase in the clinical use of and research regarding FMT [187,188]. FMT is being studied for potential application in other GI illnesses, including IBD, IBS, nonalcoholic steatohepatitis, and hepatic encephalopathy, as well as several metabolic, autoimmune, autoinflammatory, neurodegenerative, and psychiatric diseases [189,190].

3.6. Metabolomics

Metabolomic studies can provide insights into the effects of diet and the environment on GI diseases. The imbalance of intestinal microorganisms due to GI disease can lead to inflammation and the consequent metabolic disturbances, primarily mediated by intestinal microbial metabolites such as bile acids, short chain fatty acids (SCFAs), and amino acids. Metabolomics integrates these diverse signals to elucidate pathway interactions, allowing metabolites to serve as molecular readouts of cell status [191]. To understand the differences and changes in the metabolome of host cells and intestinal microorganisms and to inform potential biomarkers, metabolic profiling can be applied to an array of diseases such as IBD, IBS, gastritis, celiac disease, and fatty liver disease (NAFLD and ALD). Metabolomics can also be leveraged to understand why those with IBD and IBS continue to experience visceral hypersensitivity despite disease remission. A shift in the microbiome in IBD results in the predominance of Gram-negative organisms and a decrease in colon microbiome diversity. In mice, compared to the controls, elevated SCFA levels were detected after the induction and recovery of colitis [192]. Thus, metabolic profiling can provide insights into these diseases. In individuals with IBD vs. the healthy controls, studies have identified aberrant metabolites associated with amino acids and bile acids, creatinine, alpha-glucose, and membrane components that can serve as biomarkers in plasma, colon biopsies, or feces [193,194]. There is less research on IBS metabolomic profiles, but some studies identified differences in volatile organic compounds in fecal samples and increased esters in diarrhea-predominant IBS [195,196]. Differences in the fecal profiles of those with ALD, including decreased SCFA production due to changes in microbiome composition [197], also reveal possible disease targets.
Visceral hypersensitivity is characterized by chronic pain in the absence of inflammation. Microbial metabolites can mimic neurotransmitters by binding nociceptors in neurons, thereby modulating visceral hypersensitivity [192,198]; microbial metabolites can directly stimulate nociceptors via TRP channels [199] and increased transient receptor potential vanilloid-1 receptor (TRPV1) expression in the rectum correlates with the severity of abdominal pain [192,200]. After incubation with cultured naïve dorsal root ganglion neurons, increased SCFAs increase TRPV1 sensitization, suggesting a pro-nociceptive role for microbial metabolites [192]. Leveraging metabolomics to study disease pathogenesis and identify potential biomarkers in diseases of the GI system is likely to enhance the mechanistic understanding of these disorders. A study of the effects of exercise in NAFLD took this a step further and examined the metabolomic profiles of adipose tissue, plasma, urine, and fecal samples, allowing for tissue-specific pattern identification. Babu et al. found that adipose tissue profile shifts correlated most closely with changes in the clinical parameters, offering an example of the type of finding necessary for therapeutic targeting [201].

3.7. Medical Informatics

Medical informatics utilizes the intricacies of biomedical data derived from multi-omic studies and electronic health records (EHRs) to answer specific scientific questions relevant to the gastroenterology field. The tools within medical informatics, which include data modeling, text mining and natural language processing, and machine learning [202], can be used to create disease-based cohorts, draw associations, and conduct quality-of-care standard assessments. The EHR may contain extremely useful patient information meant to provide well-rounded patient care, allow communication between physicians, and serve as a data reservoir. It comprises structured data elements, such as patient demographics and lab values, interpreted by common data models, and unstructured data elements, such as free text or narratives [203]. Through text mining and natural language processing, the unstructured data can be computerized and interpreted. The use of natural language processing has the potential to increase the accuracy of EHR case definitions in IBD, for example, because it integrates the disease information (i.e., pathologic findings) buried in narrative reports rather than solely relying on billing codes for diagnostic purposes [204]. This allows for the more efficient development of disease cohorts and opportunities for translational research [204,205]. Furthermore, machine learning approaches can be applied to EHR and omics data to unravel the underlying factors in multifactorial diseases. Machine learning has been used to predict genetic markers in IBD through GWAS [206], to classify CD patients based on their genetic signatures [207], and to integrate multi-omics data to unravel the factors that mediate intestinal dysbiosis in IBD [208].
Nonetheless, as pointed out in Section 2.7, physicians and others documenting in the EHRs commonly note typical symptoms or exposures and do not provide a comprehensive assessment of potential exposures (the ‘exposome’); this assessment is therefore subject to a broad range of biases, and the results must be interpreted carefully with a large measure of skepticism (i.e., ‘garbage in, garbage out’).
Medical informatics can also inform procedural quality standards and can create predictive risk severity models for different gastrointestinal pathologies. Mehrotra et al. used natural language processing to analyze a total of 24,157 free-text colonoscopy reports to determine provider performance based on different quality measures. Interestingly, they found high variation in colonoscopy performance in an academic hospital, highlighting the need for better methods to assess the quality of colonoscopy procedures and consequently improve institutional quality [209]. Additionally, machine learning can be leveraged to create predictive scoring models for complex syndromes currently lacking a well-rounded clinical risk scoring system, as is the case for acute pancreatitis and upper gastrointestinal bleeding. By examining 15 markers routinely assessed in the blood from 300 patients with acute pancreatitis, Jin et al. demonstrated that a multilayer perceptron, a type of artificial neural network, reliably predicted disease severity and informed physicians in early management decisions [210]. Similarly, a machine learning model was validated and deemed superior to existing predictive risk scores in identifying the need for hospital-based interventions in very low-risk patients with upper gastrointestinal bleeding [211]. All in all, medical informatics allows for the improvement and integration of clinical with multi-omics data, paving the way for the creation of GI-based large cohort studies that can improve clinical care.

3.8. Imaging Data

As large collections of medical imaging data become available, there is much interest in the applications of machine learning to the development of computer-assisted diagnostic tools. The computer-assisted detection of abnormalities in screening colonoscopy or endoscopy has the potential to impact the practice of gastroenterology, given the wide range of adenoma detection rates dependent on endoscopist and situational factors [212,213]. In an early report of real-time, video-based automated polyp detection during colonoscopy published in 2016, Tajbakhsh et al. used a machine learning system to detect polyps based on their shape and by using visual context to filter out non-polyp background structures in the images, doing so with a latency of 0.3 s, a sensitivity ranging from 48% to 88%, and 0.05 to 0.1 false positives per frame of video [214]. More recently, Klare et al. conducted a clinical trial to determine the feasibility of real-time computer-assisted polyp detection during 55 screening colonoscopies and reported no adverse clinical events. In that trial, the detection rates of the polyps and adenomas by the computer-assisted system were approximately 51% and 29%, respectively, compared to 56% and 31% by routine colonoscopy. However, the study was limited by poor detection of small and flat polyps due to the system’s training being primarily on images of large polyps [215].
The literature on real-time, computer-assisted polyp detection in colonoscopy has flourished. Wittenberg and Raithel propose two main driving forces: the growing number of publicly available colonoscopy image and video datasets and the advent of “deep learning” capabilities by artificial neural networks that allow them to be trained using those datasets [216]. For example, Billah et al. used a linear support vector machine trained to detect polyps using over 14,000 images from screening colonoscopy videos, making it one of the largest studies of its kind at the time, achieving impressive accuracy, sensitivity, and specificity (each of approximately 99%) [217]. In general, deep neural networks trained using more colonoscopy sequence images appeared to achieve higher polyp detection rates, many with over 90% accuracy [218,219]. Systematic reviews of randomized controlled trials comparing computer-assisted polyp detection systems to the standard of care found that using computer-assisted detection significantly increased the detection rates for adenomas of diverse sizes and morphologies, and more adenomas and polyps were detected [220,221,222]. Although withdrawal time during computer-assisted colonoscopies is prolonged compared to routine colonoscopy, this difference appears to be due to the time spent on mucosal biopsies [222]. Such robust evidence supporting the use of these approaches has resulted in their rapid commercialization and adoption by many clinical practices.

4. Multi-Omics Data Integration

In contrast to descriptive, single data-type approaches, integrating multi-omics data may uncover more detailed mechanistic information that can be validated with wet laboratory experiments (Figure 1). This is especially important when studying complex diseases where a singular etiology may not be responsible, and a goal is to apply the findings to precision medicine. IBD offers a quintessential, oft-studied example of how this can apply to a prevalent digestive disease. Several studies applied transcriptomics, metagenomics, and metabolomics, in the context of genetic variants associated with IBD [223,224,225,226,227], and identified potential therapeutic targets. Non-genetic associations between microbes, metabolites, and transcriptomics [228,229,230] uncovered promising diagnostic and therapeutic targets. Multi-omics approaches are an important aid for diseases involving multiple organ systems. For example, rat models of non-bacterial prostatitis revealed microbial changes associated with transcriptome and methylome changes at a single timepoint that indicated a potential role for intestinal immunity and inflammation [231]. Another example is diabetic kidney disease (DKD), where fecal microbes and metabolites correlated with increased levels of specific serum metabolites are associated with more rapid disease progression [232].
To ensure multi-omic studies are appropriately designed and conducted, multiple considerations must be weighed, most importantly the data type and sample size. Disease subtyping may have different requirements than those of biomarker identification [233], and these may differ based on the sample type characterized. Sample collection methods can impact sample size—for example, within the microbiome realm, swabs will have fewer human cells than mucosal biopsies, impacting the requirements to attain similar sequencing depth; increased depth permits smaller sample sizes. Sample collection for microbiome analysis may also influence results—microbial profiles from mucosa-based samples differ from those in stool [10,234], particularly in disease [10,235], and metabolome profiles may differ based on stool vs. serum assessment [236]. Likewise, based on the time of collection and other features, the results from rectal swabs differ from those of stool samples [237]. While these considerations apply to single-data type studies, they are particularly important for multi-omic studies, where the relationships are complex and often non-linear [238,239].
Several methods exist to integrate multi-omics data, and the sample size, data type, and other study characteristics should be considered when deciding which tools to use [233,240]. Most straightforward are simple correlation analyses, while more complex distance metrics and associated ordination methodologies are also commonly used. Network analyses take associations a step further and permit the linkage of whole pathways and groups of elements (microbes, genes, etc.), as opposed to identifying relationships at the individual level. This allows for deeper characterization, which is more likely to result in pathway identification [241,242]. Various mathematical methods exist for implementing these analyses, and each has strengths and biases. Matrix factorization methods (e.g., k-means clustering) are increasingly popular [243], with heavy reliance on artificial intelligence to tease out complex relationships; however, large sample sizes are required for this to be an effective tool [244,245]. Other clustering methods, such as hierarchical clustering, are often employed, particularly for taxonomic data, though other data types such as metabolome data are also amenable to this analysis [246]. As described by Subramanian et al. [233], many tools have been designed for multi-omic integration, such as an integrated multi-omic pipeline developed at MIT [247], and others will likely be developed and refined as datasets become increasingly complex.
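As an illustration of the simplest integration strategy mentioned above, the following Python sketch computes Spearman rank correlations between every feature in one data type (here, microbial relative abundances) and every feature in another (here, metabolite concentrations) measured in the same samples. All feature names and values are hypothetical, and the helper functions are written out only to keep the example self-contained; this is a sketch under those assumptions rather than a recommended pipeline.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cross_correlate(block_a, block_b):
    """Correlate every feature in block_a with every feature in block_b."""
    return {
        (name_a, name_b): spearman(values_a, values_b)
        for name_a, values_a in block_a.items()
        for name_b, values_b in block_b.items()
    }


# Hypothetical data: five samples, same sample order in both blocks.
microbes = {
    "taxon_A": [0.10, 0.25, 0.05, 0.40, 0.20],
    "taxon_B": [0.30, 0.10, 0.45, 0.05, 0.20],
}
metabolites = {"butyrate": [1.2, 2.9, 0.8, 4.1, 2.0]}

associations = cross_correlate(microbes, metabolites)
for pair, rho in associations.items():
    print(pair, round(rho, 3))
```

With realistic numbers of features, the many pairwise tests require correction for multiple comparisons, and such pairwise correlations do not capture the pathway-level structure that network approaches aim to recover.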

5. Challenges to Employing Big Data Research in GI

While the generation of large datasets for gastroenterology and hepatobiliary research increases, several hurdles must still be overcome for translational relevance. Broadly, these include the appropriate study design, analysis, and integration of data and the production of technologies that can aid in data generation, particularly at the multi-omic level. In the following sections, we provide suggested approaches to overcoming the remaining challenges, and in Figure 2, we offer an example of how a thoughtful study design can help mitigate some of these issues.

5.1. Working with Sparse Datasets

While the mathematical and interpretive implications differ across data types, sparse data are a potential concern for nearly every data type discussed in this review. Many omics data are referred to as sparse datasets due to gaps in non-zero intervals of data, and these data require tailored statistical approaches to analysis, including normalizing data to relative abundance [248]. The compositional nature of the data requires methods of analysis such as sparse CCA for correlational [249,250] and PERMANOVA for multivariate [251] comparisons. However, sparsity is not limited to omics data and can occur in clinical data too, particularly when the descriptor variables are not well distributed across categories, or there are few occurrences of the outcome of interest [252]. Across all data types, bias often overestimates likelihood (such as risk or effect modification) [252], but steps can be taken to adjust for this—the most straightforward is to simplify a model and avoid overfitting or to utilize conditional logistic regression when appropriate [253,254]. Additionally, penalized likelihood estimators can adjust for bias that may be difficult to avoid otherwise [255,256,257], though lower convergence rates may pose a problem [258]. Furthermore, a sparse data bias can be propagated by combinatory methods used for meta-analysis [259], but adjustments such as continuity correction can be applied to overcome these biases [260,261].
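To make the idea of a continuity correction concrete, the sketch below computes an odds ratio from a two-by-two table and, when any cell is zero, adds a constant to every cell before the calculation. Adding 0.5 is one common choice (often called the Haldane-Anscombe correction); the function name, the choice to correct only tables containing a zero, and the counts are assumptions made for illustration and are not drawn from the cited studies.

```python
def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases, correction=0.5):
    """Odds ratio with an optional continuity correction for sparse tables.

    The correction is applied to all four cells, and only when at least one
    cell is zero (an assumption of this sketch; other conventions exist).
    """
    cells = (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)
    if 0 in cells:
        cells = tuple(count + correction for count in cells)
    a, b, c, d = cells
    return (a * d) / (b * c)


# Hypothetical counts for illustration only.
well_populated = odds_ratio(10, 20, 5, 40)   # no zero cells, no correction
sparse_table = odds_ratio(3, 0, 1, 4)        # zero cell, 0.5 added to each cell
print(well_populated, sparse_table)
```

Without the correction, the second table would yield an undefined (infinite) estimate, which illustrates how sparse cells can produce extreme effect estimates; the corrected value remains imprecise and should be interpreted accordingly.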

5.2. Accounting for Missing Data

Missing data pose a separate challenge—most statistical methodologies require complete datasets, and large datasets are often missing data. Both omics and clinical data commonly deal with this by filtration of low data variables from the dataset and missing value imputation [262]; filtration is straightforward, whereas imputation can be more difficult. Some datasets, such as microbiome data, rely so heavily on relative abundance normalization that zero-based imputation for microbes that are not expected to be present or abundant is reasonable. This is not the case for metabolomics and other datasets, where missing data can imply immeasurability rather than absence. Difficult to measure factors include volatile compounds, which are thought to play a strong role in the mechanism underlying many digestive disorders and are often missing from intestinal metabolomic data due to rapid degradation [11,12]. Others include compounds with a wide range of endogenous availability, where either side of the range may not be appropriately measured (i.e., the levels are too high or too low and a blank value is returned) [263]. Local imputation methods, such as k-nearest neighbors (KNN) and regression-based algorithms [262], as well as global clustering-based approaches [264], are common, and multiple R and Python packages can provide assistance [265,266]. Using data-appropriate algorithms is key to obtaining useful, reliable results [267], and care must be taken to consider which approaches are suitable, not only for a given data type, but also for specific datasets.
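The local imputation idea behind k-nearest neighbors can be sketched in a few lines of Python. In this simplified version, each missing value is replaced by the mean of that feature in the k most similar samples, where similarity is a root-mean-square distance over the features observed in both samples. The function name, the distance definition, and the data are assumptions made for illustration; established R and Python packages implement more complete variants, and features on different scales would need to be standardized first.

```python
import math


def knn_impute(samples, k=2):
    """Fill None entries using the mean of the k nearest samples.

    samples: list of equal-length rows (one row per sample); None marks a
    missing value. Distances are computed on the original, unimputed rows.
    """
    imputed = [list(row) for row in samples]
    for i, row in enumerate(samples):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(samples):
                # A neighbor must be a different sample with feature j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                distance = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((distance, other[j]))
            candidates.sort(key=lambda item: item[0])
            nearest = [neighbor_value for _, neighbor_value in candidates[:k]]
            if nearest:  # leave the gap if no usable neighbor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical metabolite intensities: 4 samples x 3 features.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.8],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

This approach assumes that values are missing for reasons unrelated to their magnitude. For metabolites that are missing because they fall below a detection limit, as discussed above, neighbor-based averages may overestimate the true value, and a different strategy may be more appropriate.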

5.3. The Role of Technological Advances

Over the last decade, sequencing and other data generation technologies have rapidly improved, thereby transforming the omics and big data landscape. These developments have made the generation of large swaths of data feasible, allowing more data to be analyzed in a high-throughput fashion. To address gaps in the quality of the data produced, the analytical capability can be improved with technological advances, including improvements in the affordability of higher-throughput methods, multi-omics data generation methods, and portable devices permitting onsite data acquisition and analysis. Balancing the cost of data generation has always been of concern, particularly with high-throughput technologies [268]. As technology advances, a cost-prohibitive barrier often accompanies it. This is, in large part, to recover the costs of development. However, as technologies progress from being considered state-of-the-art to routine and widespread, they tend to become more affordable. The costs of higher-throughput data with increased depth are increasingly affordable, allowing choices such as metagenomic over 16S rRNA amplicon sequencing to become more common in microbiome analysis. Still, the importance of funding research that proposes newer, more innovative, and more expensive techniques cannot be overstated. A recent “Request for Applications” (RFA) from the National Human Genome Research Institute (NHGRI; RFA-HG-22-008) seeks to fund the exploration of some of these questions more generally, while other RFAs are targeted to specific areas (e.g., RFA-AI-22-038 and multi-omics in HIV treatment and vaccination). Although academic biomedical research relies heavily on NIH support, industry funding provides an alternative revenue source, and several big biotechnology companies have shifted their funding towards projects exploring multi-omics as a means of diagnostic and therapeutic target identification and characterization. This can be seen both in the direct funding opportunities [269,270] and in the development of analysis platforms [271] and masterclasses for multi-omics analysis [272], geared towards encouraging multi-omics research in the broader biomedical community.
Noise introduced by sample-site variation represents another barrier. Traditionally, multiple individual samples were used for varying types of data generation, and even when one sample was split into multiple components, different portions of the sample were used to generate a specific data type. When different types of data are pooled in multi-omics analysis, the use of different sampling locations can be a confounder due to differences in immediately adjacent tissues or sample sites such as blood. A well-known example in GI research is the datasets based on tissue obtained from liver biopsy; differences in liver biopsy histopathology not due to interobserver variation are common across multiple types of liver disease, and especially in cirrhosis [273,274,275]. Biopsies taken from different parts of the same cirrhotic liver may reveal substantial differences in histopathology; they can also display different transcriptional, epigenomic, metabolomic, and proteomic profiles. Another example is single vs. multi-site blood culture, where sampling methods can affect factors such as the infection positivity rate [276]. Thus, being able to utilize the same sample site for multiple types of data analysis is key to removing site variation-related noise and improving data resolution and statistical power.
Fortunately, the technologies under development will permit the collection and generation of multiple types of data at once from a single sample. Visium spatial gene expression analyzers utilize hybridization for RNA and protein co-detection, while their single-cell technologies permit simultaneous cell-surface protein analysis [277]. NanoString offers similar services with CosMx and GeoMx spatial molecular imager technologies [278]. These machines, which are compatible with several tissue types, permit flexibility in sample choice as opposed to requiring fresh samples preserved with specific protocols. However, the development of more technologies that permit co-analysis of multiple data types is needed, and this poses a hurdle that, once overcome, will revolutionize the omics world.
Lastly, sample processing techniques strongly impact data output, and delayed processing can substantially influence the data types prone to rapid fluctuations, including RNA-based, metabolite, protein, and even microbial data. RNA is prone to rapid degradation, as are certain volatile metabolic compounds, and while preservatives are frequently used, they may also prevent the use of techniques downstream. Portable devices, such as the MinION Oxford Nanopore Sequencer, permit the immediate analysis of data, thus removing the need for preservation [279] and allowing field sequencing. Swallowed “Smart Pills” can collect images or be used for sensory data collection in the GI tract [280]. Unfortunately, accuracy is often traded for convenience. Nanopore sequencers still have a high error rate [281], especially for RNA sequencing [282], making reference-free transcriptome analysis and strain detection very difficult. Furthermore, “Smart Pills” are limited in what and how many compounds they can sample, making it difficult to collect large volumes of data. Further development of portable devices will permit expanded capabilities and accuracy.

5.4. The Importance of Infrastructure and Multi-Center Initiatives

The context and potential application of findings using these methods and tools also warrant close consideration. The expansion of the infrastructure and computational ability inherent in electronic health records (EHRs) promises a new potential to use big data to augment the understanding of disease, but fragmented health care, numerous platforms, and a lack of standardization of EHR data place significant limitations on this possibility. EHR platforms vary between services, clinical settings, institutions, regions, and countries, and are rarely linked in a functional manner that allows the effective collection and communication of information. Integrating data from multiple independent systems often requires labor-intensive and error-prone methods of manual data mining. In addition, bias can be introduced in subject inclusion and in the selection of relevant data points within these restrictions. Gastroenterologists assess disease activity and response to therapy in IBD using patient reports of symptoms, serum and fecal biochemical markers, endoscopic appearance, histopathology, and diagnostic imaging. The ability to analyze data points from distinct clinical, endoscopic, and radiologic systems is an essential precursor to employing methods such as machine learning to guide personalized and precise prognosis predictions and therapy selection [283]. The efforts to maximize cross-platform communication should be championed but will require notable investment in infrastructure and attention to security and privacy.
In parallel to improving the accessibility and links between health information platforms, we must endeavor to standardize the quality of data entered into these systems. This encompasses the definition of relevant data points, the frequency at which they are collected, and the accuracy of their assessment. For example, in the aforementioned multifactorial assessment of IBD disease activity, visual images are interpreted by human operators to determine the degree of inflammation present, which is subject to significant interobserver variability. Computer-aided scoring systems can offer a more objective, reliable, and reproducible assessment of endoscopic, radiologic, and histopathologic images [284,285,286]. Adhering to the collection of information in a standardized manner is another step in the direction of optimizing data across the sprawling infrastructure of health information systems.
Critical to our understanding of the power of these methods of research is awareness of the potential shortcomings. It is established that >75% of included subjects in GWAS are of European ancestry [287]. Lack of diversity limits our understanding of the genetic basis of disease, interferes with the generalizability of the findings, and can exacerbate health inequities. Differences in linkage disequilibrium, specific gene effects within certain populations, and gene–environmental interactions greatly impact the ability to replicate identified genetic associations across diverse populations [288]. Genetic variation among populations can affect the efficacy and safety of drugs. Thiopurines have been used to treat IBD for decades, and assessing thiopurine methyltransferase (TPMT) enzyme activity prior to initiation has long been standard practice. TPMT enzyme mutants with reduced activity are associated with life-threatening thiopurine-induced leukopenia in approximately 5% of patients of European descent [289]. However, despite lower TPMT variant frequency, a higher prevalence of thiopurine-induced leukopenia was observed in Asian populations [290,291]. A GWAS conducted in the Korean population identified a NUDT15 polymorphism associated with thiopurine-induced leukopenia. This variant had a greater effect size than the TPMT variants in Koreans but is found in <1% of Caucasians [292]. This illustrates the importance of including subjects diverse in ethnicity and ancestry. In addition, dissimilarities in phenotype and cultural biases across global populations can impact the presentation and measurement of complex diseases. This necessitates diverse and multidisciplinary teams in the design and implementation of these studies, as well as the interpretation and application of their findings.

5.5. The Use (or Misuse) of Longitudinal Data

The popularization of longitudinal sampling is an important next step in the big data and omics realm. Longitudinal sampling allows for the minimization of clinical heterogeneity, the characterization of sequential events, and the identification of dynamic relationships that cannot be observed in single-timepoint studies. Altogether, this fosters research that is more insightful and mechanistically oriented and less generally descriptive. The ability to meaningfully integrate multiple data types is key to tracking their relationships over time. Longitudinal data have been used to better characterize GI [293] and non-GI [294,295] disorders, and their use is likely to become more widespread.
Appropriate study design and integration of longitudinal multi-omics data are required for proper interpretation [296]. Bodein et al. [297] pointed out that having access to longitudinal data does not guarantee insights that are immediately obvious from superficial analyses. By re-analyzing three previously published longitudinal datasets, they demonstrated how longitudinal data can be used to characterize dynamic and even potentially causal relationships (albeit requiring mechanistic validation). Furthermore, they recommend the use of pipelines that can discern functionality using multi-layer relationships from the data, as described in their “timeOmics” package [298]. As an organ, the luminal GI tract is especially amenable to the collection of multiple, longitudinal samples. Stool samples can be collected daily or as frequently as bowel movements, and samples can be used for multiple purposes; equivalent serum samples can also be collected daily. When applicable, multiple colon biopsies can be taken on repeat colonoscopies with minimal to no increase in adverse events [299]. If designed with foresight and using questions that are well defined, studies of the GI tract are quite amenable to longitudinal multi-omic studies to answer the complex questions described above.

5.6. Technique and Pipeline Standardization in Big Data Analysis

Lack of standardization and consistency in both technique and analytical pipelines is perhaps one of the most difficult hurdles to overcome in big data analysis. Non-standardized protocols often increase variance between study results based on assay type [300], sampling method and population [301], and research group experience [302]. Variance can arise even from relatively minor differences in experimental conditions [303] that may be lab-specific [304], and even from chance, both in the big data realm [305] and in broader biomedical research [306]. For instance, intestinal microbiome and transcriptome composition assessed using stool samples can vary greatly depending on lysis methods [307]. Thus, inconsistent data collection and generation techniques and pipelines may prevent reproducible results from being attained [240,308] and, in fact, might play a larger role than currently appreciated; Xuan et al. reported that consistent results in data generation across multiple centers could be achieved when technical and analytical pipelines were heavily standardized [309]. The challenge, however, is that some degree of variability is needed to tailor experimental designs to the sample. For instance, while one set of kits may work best for samples stored in one buffer, another kit may be compatible only with other buffers, and the buffers may have varying stabilities at room temperature (prior to freezing); depending on study constraints, this may impact storage choices and thus kit choice. Another example is infection, which manifests differently in each animal model and even individual host and can make experimental reproducibility difficult [310]. Analytical pipeline choice is almost entirely dependent on dataset characteristics, study design, and other study-specific factors, making it difficult, if not impossible, to obtain complete formal standardization. Nonetheless, some aspects can be more formally standardized than at present. As a field, GI and hepatology investigators can standardize several steps in the pipeline, including the following.
Metadata and clinical details, when collected and specified, allow confounders to be controlled for and differences in study populations to be compared with a greater degree of precision [311]. For instance, always including a Bristol stool score when stool samples are used or consistently including information on the presence or absence of common disorders improves comparability between studies and the ability to control for confounding variables in meta-analyses.
The standardization of serum, tissue, and other sample collection protocols can greatly improve reproducibility. For example, the sampling location can greatly affect both the microbiome [312] and the metabolome [313], depending on factors such as the storage conditions and analytical methods (e.g., 16S rRNA sequencing vs. metagenomic sequencing), and homogenization of the stool should always be performed prior to sample processing and data generation. This is particularly pertinent to GI research, where stool is commonly collected and assessed.
For omics data, quality control of the data is fundamental to their reproducibility and trustworthiness [314]. Minimum standards must be more universally emphasized and enforced for publishing omics data.
The use of workflow management systems allows for the semi-standardization of analytical pipelines and decreases the major analytical variability that can drastically change interpretation [315,316,317]. These are useful both for omics [318] and clinical datasets [319].
The sample storage conditions and the technical and analytical processing steps should always be recorded in abundant detail to allow comparison between experiments. Minimizing the confounding introduced by other variables reduces dataset variability overall, thus augmenting the accuracy of the results. Large datasets or experiments with multiple replicates reduce noise while multicenter studies improve the ability to home in on true signals, for both clinical and omics data. The standardization of research pipelines can greatly improve the quality of big data in GI and hepatobiliary research [320] and is an important step in facilitating the widespread usability and reproducibility of omics and large dataset-based research.
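
As one hedged illustration of how consistent metadata recording might be enforced in practice, the sketch below checks a stool sample record against a minimum set of fields before it enters a dataset. The field names, their types, and the function name are hypothetical choices made for this example and do not represent an established standard; the only external fact relied upon is that the Bristol stool scale runs from 1 to 7.

```python
# Hypothetical minimum metadata for one stool sample (illustrative only).
REQUIRED_FIELDS = {
    "sample_id": str,
    "collection_date": str,
    "storage_buffer": str,
    "bristol_stool_score": int,
    "homogenized": bool,
}

def validate_sample_metadata(record):
    """Return a list of problems found in one sample's metadata record."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record or record[field] is None:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    # The Bristol stool scale has seven categories, numbered 1 to 7.
    score = record.get("bristol_stool_score")
    if isinstance(score, int) and not isinstance(score, bool) and not 1 <= score <= 7:
        problems.append("bristol_stool_score must be between 1 and 7")
    return problems

# Hypothetical record with the stool score left unrecorded.
example = {
    "sample_id": "S001",
    "collection_date": "2023-01-05",
    "storage_buffer": "buffer A",
    "homogenized": True,
}
issues = validate_sample_metadata(example)
```

Running such a check at the point of data entry, rather than at analysis time, makes omissions visible while the missing detail can still be recovered.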

5.7. Recognizing the Shifting Role of Clinical Trials

The use of big data has the potential to revolutionize the design of clinical research. Although randomized controlled trials (RCTs) are considered the gold standard and are able to control for non-randomized confounders, they are time- and manpower-intensive, costly, and have limited ability to detect rare or long-term effects [321]. In addition, they may not reflect the real-life effects of diseases or therapeutics and cannot be conducted in some circumstances due to ethical limitations [322]. The rigors and restrictions in recruiting subjects can limit generalizability and present challenges in studying populations that are vulnerable and/or underrepresented in clinical research. Studies using pre-existing swaths of data have the advantage of analyzing datasets that are easily and rapidly accessible, depend more upon computer software than manpower, and can take into account comorbidities and medical diversity that may impact outcomes [323]. The volume and variety of data enable the investigation of uncommon events over long periods of time and can include observations in diverse populations. Additionally, the capacity to simultaneously investigate a multitude of variables and conduct multiple sensitivity analyses allows for a more robust examination of the secondary endpoints and sub-groups that are often ignored in RCTs [324].
There are concerns regarding the residual or unmeasured confounders that can result in the inability to distinguish between association and causation [323]. In this context, considering the distribution of missing data and the potential for the misclassification of confounders or even variables of interest is important [323]. The inclusion of RCT datasets, the application of the Bradford Hill criteria, and the analysis of unstructured data within EHRs with natural language processing have been proposed as potential solutions [325]. The optimal approach to harnessing the advantages and minimizing the disadvantages of each approach to research may be to apply them sequentially. RCTs can confirm or validate hypotheses generated from the analysis of big data, and large clinical dataset analysis can explore RCT findings more deeply [324].

6. Conclusions and Future Directions

A variety of holistic multi-omics approaches can be used to depict the range of human and microbial genetic, transcriptomic, proteomic, and metabolomic events that underlie physiological and pathophysiological phenomena. Thus, generating big data from multi-omic, multi-site studies has the extraordinary capability to enhance comprehensive investigations into normal and diseased GI and hepatobiliary function and, thereby, uncover important connections between these organ systems and neural, immune, and endocrine cells and gut microbial organisms. We provided specific examples to show how these approaches can be integrated to study the longitudinal effects of dietary, pharmacological, and other interventions.
Although the power of these approaches to advancing our knowledge is unprecedented, the complexity of big data use and analysis lends itself to numerous potential pitfalls and the introduction of errors in study design, execution, and analysis that can compromise both the study results and their interpretation. Although we focus on the application, integration, and analysis of big data in gastroenterology and hepatobiliary research, these issues have widespread implications. In addition to expanding and improving the technology and computerization needed to collect, curate, and interpret the reams of multi-omics data with even greater precision, accuracy, and detail, future directions must include approaches that prevent mistakes and the misuse of such complex information. Such approaches include formalizing the standardization of big data derivation and analysis—even small changes in the study protocols and research and analytical designs can result in major differences in outcome. In this context, for example, we should encourage efforts to benchmark and improve the consistency of the approaches to performing single-cell genomic studies [326,327]. The expanded training of a large cadre of bioinformaticians adept in handling, storing, analyzing, and interpreting big data must be prioritized.

Author Contributions

Conceptualization, M.A. and J.-P.R.; writing—original draft preparation, M.A., N.S.M., A.S., S.A.P., J.R. and J.-P.R.; writing—review and editing, M.A. and J.-P.R.; supervision, M.A. and J.-P.R.; project administration, M.A. and J.-P.R.; funding acquisition, J.-P.R. All authors read and agreed to the published version of the manuscript.

Funding

M.A. and N.S.M. were supported by the National Institutes of Health and the National Institute of Diabetes and Digestive and Kidney Diseases, grant number T32 DK067872-18.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pierce, S.E.; Booms, A.; Prahl, J.; van der Schans, E.J.C.; Tyson, T.; Coetzee, G.A. Post-GWAS knowledge gap: The how, where, and when. NPJ Park. Dis. 2020, 6, 23.
  2. Hullar, M.A.; Fu, B.C. Diet, the gut microbiome, and epigenetics. Cancer J. 2014, 20, 170–175.
  3. Bashiardes, S.; Zilberman-Schapira, G.; Elinav, E. Use of Metatranscriptomics in Microbiome Research. Bioinform. Biol. Insights 2016, 10, 19–25.
  4. Zhou, Y.; Guo, Y.; Wang, Y. Identification and validation of a seven-gene prognostic marker in colon cancer based on single-cell transcriptome analysis. IET Syst. Biol. 2022, 16, 72–83.
  5. Zheng, L.; Yang, Y.; Cui, X. Establishing and Validating an Aging-Related Prognostic Four-Gene Signature in Colon Adenocarcinoma. Biomed. Res. Int. 2021, 2021, 4682589.
  6. Brücher, B.L.D.M.; Li, Y.; Schnabel, P.; Daumer, M.; Wallace, T.J.; Kube, R.; Zilberstein, B.; Steele, S.; Voskuil, J.L.A.; Jamall, I.S. Genomics, microRNA, epigenetics, and proteomics for future diagnosis, treatment and monitoring response in upper GI cancers. Clin. Transl. Med. 2016, 5, 13.
  7. Ursell, L.K.; Metcalf, J.L.; Parfrey, L.W.; Knight, R. Defining the human microbiome. Nutr. Rev. 2012, 70, S38–S44.
  8. Lloyd-Price, J.; Abu-Ali, G.; Huttenhower, C. The healthy human microbiome. Genome Med. 2016, 8, 51.
  9. Lloyd-Price, J.; Arze, C.; Ananthakrishnan, A.N.; Schirmer, M.; Avila-Pacheco, J.; Poon, T.W.; Andrews, E.; Ajami, N.J.; Bonham, K.S.; Brislawn, C.J.; et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 2019, 569, 655–662.
  10. Lo Presti, A.; Zorzi, F.; Del Chierico, F.; Altomare, A.; Cocca, S.; Avola, A.; De Biasio, F.; Russo, A.; Cella, E.; Reddel, S.; et al. Fecal and Mucosal Microbiota Profiling in Irritable Bowel Syndrome and Inflammatory Bowel Disease. Front. Microbiol. 2019, 10, 1655.
  11. Batterman, S.; Su, F.C.; Li, S.; Mukherjee, B.; Jia, C. Personal exposure to mixtures of volatile organic compounds: Modeling and further analysis of the RIOPA data. Res. Rep. Health Eff. Inst. 2014, 181, 3–63.
  12. Smirnov, K.S.; Maier, T.V.; Walker, A.; Heinzmann, S.S.; Forcisi, S.; Martinez, I.; Walter, J.; Schmitt-Kopplin, P. Challenges of metabolomics in human gut microbiota research. Int. J. Med. Microbiol. 2016, 306, 266–279.
  13. Imperiale, T.F.; Imler, T.D. Gastroenterology and medical informatics: An evolving collaboration for quality improvement. Clin Gastroenterol. Hepatol. 2013, 11, 79–80.
  14. Catlow, J.; Bray, B.; Morris, E.; Rutter, M. Power of big data to improve patient care in gastroenterology. Frontline Gastroenterol. 2022, 13, 237–244.
  15. Olivera, P.; Danese, S.; Jay, N.; Natoli, G.; Peyrin-Biroulet, L. Big data in IBD: A look into the future. Nat. Rev. Gastroenterol. Hepatol. 2019, 16, 312–321.
  16. Haseman, J.K.; Clark, A.-M.; Holden, H.E. Carcinogenicity results for 114 laboratory animal studies used to assess the predictivity of four in vitro genetic toxicity assays for rodent carcinogenicity. Environ. Mol. Mutagen. 1990, 16, 15–31.
  17. Scott, M.M.; Wylie, C.J.; Lerch, J.K.; Murphy, R.; Lobur, K.; Herlitze, S.; Jiang, W.; Conlon, R.A.; Strowbridge, B.W.; Deneris, E.S. A genetic approach to access serotonin neurons for in vivo and in vitro studies. Proc. Natl. Acad. Sci. USA 2005, 102, 16472–16477.
  18. Chae, J.J.; Cho, Y.-H.; Lee, G.-S.; Cheng, J.; Liu, P.P.; Feigenbaum, L.; Katz, S.I.; Kastner, D.L. Gain-of-Function Pyrin Mutations Induce NLRP3 Protein-Independent Interleukin-1β Activation and Severe Autoinflammation in Mice. Immunity 2011, 34, 755–768.
  19. Henderson, C.; Goldbach-Mansky, R. Monogenic IL-1 mediated autoinflammatory and immunodeficiency syndromes: Finding the right balance in response to danger signals. Clin. Immunol. 2010, 135, 210–222.
  20. Peltonen, L.; Perola, M.; Naukkarinen, J.; Palotie, A. Lessons from studying monogenic disease for common disease. Hum. Mol. Genet. 2006, 15, R67–R74.
  21. Manthiram, K.; Zhou, Q.; Aksentijevich, I.; Kastner, D.L. The monogenic autoinflammatory diseases define new pathways in human innate immunity and inflammation. Nat. Immunol. 2017, 18, 832–842.
  22. Lutz, M.W.; Chiba-Falek, O. Bioinformatics pipeline to guide late-onset Alzheimer’s disease (LOAD) post-GWAS studies: Prioritizing transcription regulatory variants within LOAD-associated regions. Alzheimer Dement. Transl. Res. Clin. Interv. 2022, 8, e12244.
  23. Johnson, R.C.; Nelson, G.W.; Troyer, J.L.; Lautenberger, J.A.; Kessing, B.D.; Winkler, C.A.; O’Brien, S.J. Accounting for multiple comparisons in a genome-wide association study (GWAS). BMC Genom. 2010, 11, 724.
  24. Fadista, J.; Manning, A.K.; Florez, J.C.; Groop, L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur. J. Hum. Genet. 2016, 24, 1202–1205.
  25. Mitha, F.; Herodotou, H.; Borisov, N.; Jiang, C.; Yoder, J.; Owzar, K. SNPpy—Database Management for SNP Data from GWAS Studies. Duke Biostat. Bioinform. BB No. Pap. 2011, 14, 19.
  26. Begum, F.; Ghosh, D.; Tseng, G.C.; Feingold, E. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res. 2012, 40, 3777–3784.
  27. Zhang, X.; Johnson, A.D.; Hendricks, A.E.; Hwang, S.-J.; Tanriverdi, K.; Ganesh, S.K.; Smith, N.L.; Peyser, P.A.; Freedman, J.E.; O’Donnell, C.J. Genetic associations with expression for genes implicated in GWAS studies for atherosclerotic cardiovascular disease and blood phenotypes. Hum. Mol. Genet. 2013, 23, 782–795.
  28. Lin, B.D.; Willemsen, G.; Fedko, I.O.; Jansen, R.; Penninx, B.; de Geus, E.; Kluft, C.; Hottenga, J.; Boomsma, D.I. Heritability and GWAS Studies for Monocyte–Lymphocyte Ratio. Twin Res. Hum. Genet. 2017, 20, 97–107.
  29. González-Castro, T.B.; Tovilla-Zárate, C.A.; Genis-Mendoza, A.D.; Juárez-Rojop, I.E.; Nicolini, H.; López-Narváez, M.L.; Martínez-Magaña, J.J. Identification of gene ontology and pathways implicated in suicide behavior: Systematic review and enrichment analysis of GWAS studies. Am. J. Med. Genet. Part B Neuropsychiatr. Genet. 2019, 180, 320–329.
  30. Voskarides, K.; Chatzittofis, A. GWAS studies reveal a possible genetic link between cancer and suicide attempt. Sci. Rep. 2019, 9, 18290.
  31. Farashi, S.; Kryza, T.; Clements, J.; Batra, J. Post-GWAS in prostate cancer: From genetic association to biological contribution. Nat. Rev. Cancer 2019, 19, 46–59.
  32. Freedman, M.L.; Monteiro, A.N.A.; Gayther, S.A.; Coetzee, G.A.; Risch, A.; Plass, C.; Casey, G.; De Biasi, M.; Carlson, C.; Duggan, D.; et al. Principles for the post-GWAS functional characterization of cancer risk loci. Nat. Genet. 2011, 43, 513–518.
  33. Cao, C.; Moult, J. GWAS and drug targets. BMC Genom. 2014, 15, S5.
  34. Bernstein, B.E.; Stamatoyannopoulos, J.A.; Costello, J.F.; Ren, B.; Milosavljevic, A.; Meissner, A.; Kellis, M.; Marra, M.A.; Beaudet, A.L.; Ecker, J.R.; et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 2010, 28, 1045–1048.
  35. Handy, D.E.; Castro, R.; Loscalzo, J. Epigenetic modifications: Basic mechanisms and role in cardiovascular disease. Circulation 2011, 123, 2145–2156.
  36. Jirtle, R.L.; Skinner, M.K. Environmental epigenomics and disease susceptibility. Nat. Rev. Genet. 2007, 8, 253–262.
  37. Suzuki, M.M.; Bird, A. DNA methylation landscapes: Provocative insights from epigenomics. Nat. Rev. Genet. 2008, 9, 465–476.
  38. Callinan, P.A.; Feinberg, A.P. The emerging science of epigenomics. Hum. Mol. Genet. 2006, 15, R95–R101.
  39. Florean, C.; Schnekenburger, M.; Grandjenette, C.; Dicato, M.; Diederich, M. Epigenomics of leukemia: From mechanisms to therapeutic applications. Epigenomics 2011, 3, 581–609.
  40. Fraga, M.F.; Ballestar, E.; Villar-Garea, A.; Boix-Chornet, M.; Espada, J.; Schotta, G.; Bonaldi, T.; Haydon, C.; Ropero, S.; Petrie, K. Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer. Nat. Genet. 2005, 37, 391–400.
  41. Boyd-Kirkup, J.D.; Green, C.D.; Wu, G.; Wang, D.; Han, J.-D.J. Epigenomics and the regulation of aging. Epigenomics 2013, 5, 205–227.
  42. Huang, Y.T.; Maccani, J.Z.J.; Hawley, N.L.; Wing, R.R.; Kelsey, K.T.; McCaffery, J.M. Epigenetic patterns in successful weight loss maintainers: A pilot study. Int. J. Obes. 2015, 39, 865–868.
  43. Contreras, R.E.; Schriever, S.C.; Pfluger, P.T. Physiological and Epigenetic Features of Yoyo Dieting and Weight Control. Front. Genet. 2019, 10, 1015.
  44. Heijmans, B.T.; Tobi, E.W.; Stein, A.D.; Putter, H.; Blauw, G.J.; Susser, E.S.; Slagboom, P.E.; Lumey, L.H. Persistent epigenetic differences associated with prenatal exposure to famine in humans. Proc. Natl. Acad. Sci. USA 2008, 105, 17046–17049.
  45. Wang, Z.; Gerstein, M.; Snyder, M. RNA-Seq: A revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009, 10, 57–63.
  46. Wilhelm, B.T.; Landry, J.-R. RNA-Seq—Quantitative measurement of expression through massively parallel RNA-sequencing. Methods 2009, 48, 249–257.
  47. Koenitzer, J.R.; Wu, H.; Atkinson, J.J.; Brody, S.L.; Humphreys, B.D. Single-nucleus RNA-sequencing profiling of mouse lung. Reduced dissociation bias and improved rare cell-type detection compared with single-cell RNA sequencing. Am. J. Respir. Cell Mol. Biol. 2020, 63, 739–747.
  48. Clark, J.Z.; Chen, L.; Chou, C.-L.; Jung, H.J.; Lee, J.W.; Knepper, M.A. Representation and relative abundance of cell-type selective markers in whole-kidney RNA-Seq data. Kidney Int. 2019, 95, 787–796.
  49. Diaz-Mejia, J.J.; Meng, E.C.; Pico, A.R.; MacParland, S.A.; Ketela, T.; Pugh, T.J.; Bader, G.D.; Morris, J.H. Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data. F1000Res 2019, 8, 296.
  50. Yap, M.; Johnston, R.L.; Foley, H.; MacDonald, S.; Kondrashova, O.; Tran, K.A.; Nones, K.; Koufariotis, L.T.; Bean, C.; Pearson, J.V.; et al. Verifying explainability of a deep learning tissue classifier trained on RNA-seq data. Sci. Rep. 2021, 11, 2641.
  51. Aevermann, B.; Zhang, Y.; Novotny, M.; Keshk, M.; Bakken, T.; Miller, J.; Hodge, R.; Lelieveldt, B.; Lein, E.; Scheuermann, R.H. A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing. Genome Res. 2021, 31, 1767–1780.
  52. Schmid, M.W.; Schmidt, A.; Klostermeier, U.C.; Barann, M.; Rosenstiel, P.; Grossniklaus, U. A Powerful Method for Transcriptional Profiling of Specific Cell Types in Eukaryotes: Laser-Assisted Microdissection and RNA Sequencing. PLoS ONE 2012, 7, e29685.
  53. Sicherman, J.; Newton, D.F.; Pavlidis, P.; Sibille, E.; Tripathy, S.J. Estimating and Correcting for Off-Target Cellular Contamination in Brain Cell Type Specific RNA-Seq Data. Front. Mol. Neurosci. 2021, 14, 637143.
  54. Kiselev, V.Y.; Andrews, T.S.; Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 2019, 20, 273–282. [Google Scholar] [CrossRef] [PubMed]
  55. Kotliar, D.; Veres, A.; Nagy, M.A.; Tabrizi, S.; Hodis, E.; Melton, D.A.; Sabeti, P.C. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. eLife 2019, 8, e43803. [Google Scholar] [CrossRef]
  56. Sheng, Q.; Vickers, K.; Zhao, S.; Wang, J.; Samuels, D.C.; Koues, O.; Shyr, Y.; Guo, Y. Multi-perspective quality control of Illumina RNA sequencing data analysis. Brief. Funct. Genom. 2017, 16, 194–204. [Google Scholar] [CrossRef] [Green Version]
  57. Conesa, A.; Madrigal, P.; Tarazona, S.; Gomez-Cabrero, D.; Cervera, A.; McPherson, A.; Szcześniak, M.W.; Gaffney, D.J.; Elo, L.L.; Zhang, X.; et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016, 17, 13. [Google Scholar] [CrossRef] [Green Version]
  58. Ma, W.; Su, K.; Wu, H. Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: Classifier, feature selection, and reference construction. Genome Biol. 2021, 22, 264. [Google Scholar] [CrossRef]
  59. Wang, C.; Gao, X.; Liu, J. Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data. BMC Bioinform. 2020, 21, 440. [Google Scholar] [CrossRef]
  60. Usoskin, D.; Furlan, A.; Islam, S.; Abdo, H.; Lönnerberg, P.; Lou, D.; Hjerling-Leffler, J.; Haeggström, J.; Kharchenko, O.; Kharchenko, P.V.; et al. Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nat. Neurosci. 2015, 18, 145–153. [Google Scholar] [CrossRef]
  61. Cheung, L.Y.M.; George, A.S.; McGee, S.R.; Daly, A.Z.; Brinkmeier, M.L.; Ellsworth, B.S.; Camper, S.A. Single-Cell RNA Sequencing Reveals Novel Markers of Male Pituitary Stem Cells and Hormone-Producing Cell Types. Endocrinology 2018, 159, 3910–3924. [Google Scholar] [CrossRef] [Green Version]
  62. Lu, T.; Mar, J.C. Investigating transcriptome-wide sex dimorphism by multi-level analysis of single-cell RNA sequencing data in ten mouse cell types. Biol. Sex Differ. 2020, 11, 61. [Google Scholar] [CrossRef] [PubMed]
  63. Lefferts, A.R.; Regner, E.H.; Stahly, A.; O’Rourke, B.; Gerich, M.E.; Fennimore, B.P.; Scott, F.I.; Freeman, A.E.; Jones, K.; Kuhn, K.A. Circulating mature granzyme B+ T cells distinguish Crohn’s disease-associated axial spondyloarthritis from axial spondyloarthritis and Crohn’s disease. Arthritis Res. Ther. 2021, 23, 147. [Google Scholar] [CrossRef] [PubMed]
  64. Liu, Y.; Beyer, A.; Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 2016, 165, 535–550. [Google Scholar] [CrossRef] [Green Version]
  65. Aslam, B.; Basit, M.; Nisar, M.A.; Khurshid, M.; Rasool, M.H. Proteomics: Technologies and Their Applications. J. Chromatogr. Sci. 2017, 55, 182–196. [Google Scholar] [CrossRef] [Green Version]
  66. Schubert, O.T.; Röst, H.L.; Collins, B.C.; Rosenberger, G.; Aebersold, R. Quantitative proteomics: Challenges and opportunities in basic and applied research. Nat. Protoc. 2017, 12, 1289–1294. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  67. Larance, M.; Lamond, A.I. Multidimensional proteomics for cell biology. Nat. Rev. Mol. Cell Biol. 2015, 16, 269–280. [Google Scholar] [CrossRef] [Green Version]
  68. Shruthi, B.S.; Vinodhkumar, P.; Selvamani. Proteomics: A new perspective for cancer. Adv. Biomed. Res. 2016, 5, 67. [Google Scholar] [CrossRef]
  69. Yanovich, G.; Agmon, H.; Harel, M.; Sonnenblick, A.; Peretz, T.; Geiger, T. Clinical Proteomics of Breast Cancer Reveals a Novel Layer of Breast Cancer Classification. Cancer Res. 2018, 78, 6001–6010. [Google Scholar] [CrossRef] [Green Version]
  70. Tyanova, S.; Albrechtsen, R.; Kronqvist, P.; Cox, J.; Mann, M.; Geiger, T. Proteomic maps of breast cancer subtypes. Nat. Commun. 2016, 7, 10259. [Google Scholar] [CrossRef] [Green Version]
  71. Beretov, J.; Wasinger, V.C.; Millar, E.K.A.; Schwartz, P.; Graham, P.H.; Li, Y. Proteomic Analysis of Urine to Identify Breast Cancer Biomarker Candidates Using a Label-Free LC-MS/MS Approach. PLoS ONE 2015, 10, e0141876. [Google Scholar] [CrossRef] [PubMed]
  72. Lawrence, R.T.; Perez, E.; Hernández, D.; Miller, C.P.; Haas, K.M.; Irie, H.Y.; Lee, S.-I.; Blau, C.A.; Villén, J. The Proteomic Landscape of Triple-Negative Breast Cancer. Cell Rep. 2015, 11, 630–644. [Google Scholar] [CrossRef]
  73. Methé, B.A.; Nelson, K.E.; Pop, M.; Creasy, H.H.; Giglio, M.G.; Huttenhower, C.; Gevers, D.; Petrosino, J.F.; Abubucker, S.; Badger, J.H.; et al. A framework for human microbiome research. Nature 2012, 486, 215–221. [Google Scholar] [CrossRef] [Green Version]
  74. Ravel, J.; Gajer, P.; Abdo, Z.; Schneider, G.M.; Koenig, S.S.K.; McCulle, S.L.; Karlebach, S.; Gorle, R.; Russell, J.; Tacket, C.O.; et al. Vaginal microbiome of reproductive-age women. Proc. Natl. Acad. Sci. USA 2011, 108, 4680–4687. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  75. Shreiner, A.B.; Kao, J.Y.; Young, V.B. The gut microbiome in health and in disease. Curr. Opin. Gastroenterol. 2015, 31, 69–75. [Google Scholar] [CrossRef] [PubMed]
  76. Turnbaugh, P.J.; Hamady, M.; Yatsunenko, T.; Cantarel, B.L.; Duncan, A.; Ley, R.E.; Sogin, M.L.; Jones, W.J.; Roe, B.A.; Affourtit, J.P.; et al. A core gut microbiome in obese and lean twins. Nature 2009, 457, 480–484. [Google Scholar] [CrossRef] [Green Version]
  77. Xu, Z.; Knight, R. Dietary effects on human gut microbiome diversity. Br. J. Nutr. 2015, 113, S1–S5. [Google Scholar] [CrossRef] [Green Version]
  78. Durazzi, F.; Sala, C.; Castellani, G.; Manfreda, G.; Remondini, D.; De Cesare, A. Comparison between 16S rRNA and shotgun sequencing data for the taxonomic characterization of the gut microbiota. Sci. Rep. 2021, 11, 3030. [Google Scholar] [CrossRef]
  79. Proctor, L.M.; Creasy, H.H.; Fettweis, J.M.; Lloyd-Price, J.; Mahurkar, A.; Zhou, W.; Buck, G.A.; Snyder, M.P.; Strauss, J.F.; Weinstock, G.M.; et al. The Integrative Human Microbiome Project. Nature 2019, 569, 641–648. [Google Scholar] [CrossRef] [Green Version]
  80. Yap, C.X.; Henders, A.K.; Alvares, G.A.; Wood, D.L.A.; Krause, L.; Tyson, G.W.; Restuadi, R.; Wallace, L.; McLaren, T.; Hansell, N.K.; et al. Autism-related dietary preferences mediate autism-gut microbiome associations. Cell 2021, 184, 5916–5931.e17. [Google Scholar] [CrossRef]
  81. Scher, J.U.; Abramson, S.B. The microbiome and rheumatoid arthritis. Nat. Rev. Rheumatol. 2011, 7, 569–578. [Google Scholar] [CrossRef] [PubMed]
  82. Elinav, E.; Garrett, W.S.; Trinchieri, G.; Wargo, J. The cancer microbiome. Nat. Rev. Cancer 2019, 19, 371–376. [Google Scholar] [CrossRef]
  83. Gilbert, J.A.; Blaser, M.J.; Caporaso, J.G.; Jansson, J.K.; Lynch, S.V.; Knight, R. Current understanding of the human microbiome. Nat. Med. 2018, 24, 392–400. [Google Scholar] [CrossRef]
  84. Wishart, D.S.; Tzur, D.; Knox, C.; Eisner, R.; Guo, A.C.; Young, N.; Cheng, D.; Jewell, K.; Arndt, D.; Sawhney, S. HMDB: The human metabolome database. Nucleic Acids Res. 2007, 35, D521–D526. [Google Scholar] [CrossRef] [PubMed]
  85. Wishart, D.S. Metabolomics for Investigating Physiological and Pathophysiological Processes. Physiol. Rev. 2019, 99, 1819–1875. [Google Scholar] [CrossRef] [PubMed]
  86. Emwas, A.-H.; Roy, R.; McKay, R.T.; Tenori, L.; Saccenti, E.; Gowda, G.A.N.; Raftery, D.; Alahmari, F.; Jaremko, L.; Jaremko, M.; et al. NMR Spectroscopy for Metabolomics Research. Metabolites 2019, 9, 123. [Google Scholar] [CrossRef] [Green Version]
  87. Alseekh, S.; Aharoni, A.; Brotman, Y.; Contrepois, K.; D’Auria, J.; Ewald, J.; Ewald, J.C.; Fraser, P.D.; Giavalisco, P.; Hall, R.D.; et al. Mass spectrometry-based metabolomics: A guide for annotation, quantification and best reporting practices. Nat. Methods 2021, 18, 747–756. [Google Scholar] [CrossRef]
  88. van Wietmarschen, H.A.; Dai, W.; van der Kooij, A.J.; Reijmers, T.H.; Schroën, Y.; Wang, M.; Xu, Z.; Wang, X.; Kong, H.; Xu, G.; et al. Characterization of Rheumatoid Arthritis Subtypes Using Symptom Profiles, Clinical Chemistry and Metabolomics Measurements. PLoS ONE 2012, 7, e44331. [Google Scholar] [CrossRef] [Green Version]
  89. Rasmiena, A.A.; Ng, T.W.; Meikle, P.J. Metabolomics and ischaemic heart disease. Clin. Sci. 2012, 124, 289–306. [Google Scholar] [CrossRef] [Green Version]
  90. Tounta, V.; Liu, Y.; Cheyne, A.; Larrouy-Maumus, G. Metabolomics in infectious diseases and drug discovery. Mol. Omics 2021, 17, 376–393. [Google Scholar] [CrossRef]
  91. Schmidt, D.R.; Patel, R.; Kirsch, D.G.; Lewis, C.A.; Vander Heiden, M.G.; Locasale, J.W. Metabolomics in cancer research and emerging applications in clinical oncology. CA Cancer J. Clin. 2021, 71, 333–358. [Google Scholar] [CrossRef]
  92. Vernocchi, P.; Del Chierico, F.; Putignani, L. Gut Microbiota Profiling: Metabolomics Based Approach to Unravel Compounds Affecting Human Health. Front. Microbiol. 2016, 7, 1144. [Google Scholar] [CrossRef] [PubMed]
  93. Harrell, F.E. Regression modeling strategies. Bios 2017, 330, 14. [Google Scholar]
  94. Gefen, D.; Straub, D.; Boudreau, M.-C. Structural equation modeling and regression: Guidelines for research practice. Commun. Assoc. Inf. Syst. 2000, 4, 7. [Google Scholar] [CrossRef] [Green Version]
  95. Nunez, E.; Steyerberg, E.W.; Nunez, J. Regression modeling strategies. Rev. Española De Cardiol. 2011, 64, 501–507. [Google Scholar]
  96. Nelson, P.T.; Abner, E.L.; Schmitt, F.A.; Kryscio, R.J.; Jicha, G.A.; Smith, C.D.; Davis, D.G.; Poduska, J.W.; Patel, E.; Mendiondo, M.S.; et al. Modeling the Association between 43 Different Clinical and Pathological Variables and the Severity of Cognitive Impairment in a Large Autopsy Cohort of Elderly Persons. Brain Pathol. 2010, 20, 66–79. [Google Scholar] [CrossRef] [Green Version]
  97. Hanauer, D.A.; Ramakrishnan, N. Modeling temporal relationships in large scale clinical associations. J. Am. Med. Inform. Assoc. 2012, 20, 332–341. [Google Scholar] [CrossRef] [Green Version]
  98. Shouval, R.; Bondi, O.; Mishan, H.; Shimoni, A.; Unger, R.; Nagler, A. Application of machine learning algorithms for clinical predictive modeling: A data-mining approach in SCT. Bone Marrow Transplant. 2014, 49, 332–337. [Google Scholar] [CrossRef]
  99. Gamal, A.; Barakat, S.; Rezk, A. Standardized electronic health record data modeling and persistence: A comparative review. J. Biomed. Inform. 2021, 114, 103670. [Google Scholar] [CrossRef]
  100. Wu, W.-T.; Li, Y.-J.; Feng, A.-Z.; Li, L.; Huang, T.; Xu, A.-D.; Lyu, J. Data mining in clinical big data: The frequently used databases, steps, and methodological models. Mil. Med. Res. 2021, 8, 44. [Google Scholar] [CrossRef]
  101. Johnson, S.B. Generic data modeling for clinical repositories. J. Am. Med. Inf. Assoc. 1996, 3, 328–339. [Google Scholar] [CrossRef] [Green Version]
  102. Aiello, M.; Cavaliere, C.; D’Albore, A.; Salvatore, M. The Challenges of Diagnostic Imaging in the Era of Big Data. J. Clin. Med. 2019, 8, 316. [Google Scholar] [CrossRef] [Green Version]
  103. Cao, L.; Juan, P.; Zhang, Y. Real-Time Deconvolution with GPU and Spark for Big Imaging Data Analysis. In Proceedings of the Algorithms and Architectures for Parallel Processing, Cham, Zhangjiajie, China, 18–20 November 2015; pp. 240–250. [Google Scholar]
  104. Mitra, P.P.; Pesaran, B. Analysis of Dynamic Brain Imaging Data. Biophys. J. 1999, 76, 691–708. [Google Scholar] [CrossRef] [Green Version]
  105. Giovannucci, A.; Friedrich, J.; Gunn, P.; Kalfon, J.; Brown, B.L.; Koay, S.A.; Taxidis, J.; Najafi, F.; Gauthier, J.L.; Zhou, P.; et al. CaImAn an open source tool for scalable calcium imaging data analysis. eLife 2019, 8, e38173. [Google Scholar] [CrossRef] [PubMed]
  106. Peng, H. Bioimage informatics: A new area of engineering biology. Bioinformatics 2008, 24, 1827–1836. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  107. Pietro, C.; Luciano, C.; Giuseppe Lucio, C.; Francesco, D.; Pasquale, I.; Miriam, I.; Pierangelo, V.; Ester, Z. Bioinformatics Solutions for Image Data Processing. In Medical and Biological Image Analysis; Robert, K., Ed.; IntechOpen: Rijeka, Croatia, 2018. [Google Scholar] [CrossRef] [Green Version]
  108. Thompson, B.A.; Spurdle, A.B.; Plazzer, J.P.; Greenblatt, M.S.; Akagi, K.; Al-Mulla, F.; Bapat, B.; Bernstein, I.; Capellá, G.; den Dunnen, J.T.; et al. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database. Nat. Genet. 2014, 46, 107–115. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  109. Dinu, D.; Dobre, M.; Panaitescu, E.; Bîrlă, R.; Iosif, C.; Hoara, P.; Caragui, A.; Boeriu, M.; Constantinoiu, S.; Ardeleanu, C. Prognostic significance of KRAS gene mutations in colorectal cancer--preliminary study. J. Med. Life 2014, 7, 581–587. [Google Scholar] [PubMed]
  110. Ahmed, M.H.; El Henawy, O.; ElShennawy, E.M.; Mahros, A.M. Clinical and genetic characterization of familial Mediterranean fever among a cohort of Egyptian patients. Prz. Gastroenterol. 2022, 17, 240–244. [Google Scholar] [CrossRef]
  111. Husby, S.; Murray, J.A.; Katzka, D.A. AGA Clinical Practice Update on Diagnosis and Monitoring of Celiac Disease-Changing Utility of Serology and Histologic Measures: Expert Review. Gastroenterology 2019, 156, 885–889. [Google Scholar] [CrossRef]
  112. Doolan, A.; Donaghue, K.; Fairchild, J.; Wong, M.; Williams, A.J. Use of HLA Typing in Diagnosing Celiac Disease in Patients With Type 1 Diabetes. Diabetes Care 2005, 28, 806–809. [Google Scholar] [CrossRef] [Green Version]
  113. Jostins, L.; Ripke, S.; Weersma, R.K.; Duerr, R.H.; McGovern, D.P.; Hui, K.Y.; Lee, J.C.; Schumm, L.P.; Sharma, Y.; Anderson, C.A.; et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 2012, 491, 119–124. [Google Scholar] [CrossRef] [Green Version]
  114. McGovern, D.P.; van Heel, D.A.; Ahmad, T.; Jewell, D.P. NOD2 (CARD15), the first susceptibility gene for Crohn’s disease. Gut 2001, 49, 752–754. [Google Scholar] [CrossRef]
  115. Yao, Q.; Zhou, L.; Cusumano, P.; Bose, N.; Piliang, M.; Jayakar, B.; Su, L.C.; Shen, B. A new category of autoinflammatory disease associated with NOD2 gene mutations. Arthritis Res. Ther. 2011, 13, R148. [Google Scholar] [CrossRef]
  116. Bakirli, H.; Bakirova, G.; Alhwaymel, N.; Jaber, M.; Jezovit, M.; Issaoui, D.; Nemcek, M.; Bakirli, I.; Sami, M.; Bakirov, I. Concomitant Presentation of Acute Acalculous Cholecystitis and Acute Colitis in a Patient with Behcet’s Disease. Cureus 2022, 14, e31295. [Google Scholar] [CrossRef]
  117. Ferrante, A.; Ciccia, F.; Principato, A.; Giardina, A.R.; Impastato, R.; Peralta, S.; Triolo, G. A Th1 but not a Th17 response is present in the gastrointestinal involvement of Behçet’s disease. Clin. Exp. Rheumatol. 2010, 28, S27–S30. [Google Scholar]
  118. Samadder, N.J.; Riegert-Johnson, D.; Boardman, L.; Rhodes, D.; Wick, M.; Okuno, S.; Kunze, K.L.; Golafshar, M.; Uson, P.L.S., Jr.; Mountjoy, L.; et al. Comparison of Universal Genetic Testing vs. Guideline-Directed Targeted Testing for Patients With Hereditary Cancer Syndrome. JAMA Oncol. 2021, 7, 230–237. [Google Scholar] [CrossRef]
  119. Uson, P.L.S., Jr.; Riegert-Johnson, D.; Boardman, L.; Kisiel, J.; Mountjoy, L.; Patel, N.; Lizaola-Mayo, B.; Borad, M.J.; Ahn, D.; Sonbol, M.B.; et al. Germline Cancer Susceptibility Gene Testing in Unselected Patients With Colorectal Adenocarcinoma: A Multicenter Prospective Study. Clin. Gastroenterol. Hepatol. 2022, 20, e508–e528. [Google Scholar] [CrossRef] [PubMed]
  120. Yoda, Y.; Takeshima, H.; Niwa, T.; Kim, J.G.; Ando, T.; Kushima, R.; Sugiyama, T.; Katai, H.; Noshiro, H.; Ushijima, T. Integrated analysis of cancer-related pathways affected by genetic and epigenetic alterations in gastric cancer. Gastric Cancer 2015, 18, 65–76. [Google Scholar] [CrossRef]
  121. Arnold, C.N.; Goel, A.; Boland, C.R. Role of hMLH1 promoter hypermethylation in drug resistance to 5-fluorouracil in colorectal cancer cell lines. Int. J. Cancer 2003, 106, 66–73. [Google Scholar] [CrossRef] [PubMed]
  122. Ward, R.L.; Dobbins, T.; Lindor, N.M.; Rapkins, R.W.; Hitchins, M.P. Identification of constitutional MLH1 epimutations and promoter variants in colorectal cancer patients from the Colon Cancer Family Registry. Genet. Med. 2013, 15, 25–35. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  123. Hanley, M.P.; Hahn, M.A.; Li, A.X.; Wu, X.; Lin, J.; Wang, J.; Choi, A.H.; Ouyang, Z.; Fong, Y.; Pfeifer, G.P.; et al. Genome-wide DNA methylation profiling reveals cancer-associated changes within early colonic neoplasia. Oncogene 2017, 36, 5035–5044. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  124. Lind, G.E.; Danielsen, S.A.; Ahlquist, T.; Merok, M.A.; Andresen, K.; Skotheim, R.I.; Hektoen, M.; Rognum, T.O.; Meling, G.I.; Hoff, G.; et al. Identification of an epigenetic biomarker panel with high sensitivity and specificity for colorectal cancer and adenomas. Mol. Cancer 2011, 10, 85. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  125. Vedeld, H.M.; Andresen, K.; Eilertsen, I.A.; Nesbakken, A.; Seruca, R.; Gladhaug, I.P.; Thiis-Evensen, E.; Rognum, T.O.; Boberg, K.M.; Lind, G.E. The novel colorectal cancer biomarkers CDO1, ZSCAN18 and ZNF331 are frequently methylated across gastrointestinal cancers. Int. J. Cancer 2015, 136, 844–853. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  126. McDermott, E.; Ryan, E.J.; Tosetto, M.; Gibson, D.; Burrage, J.; Keegan, D.; Byrne, K.; Crowe, E.; Sexton, G.; Malone, K.; et al. DNA Methylation Profiling in Inflammatory Bowel Disease Provides New Insights into Disease Pathogenesis. J. Crohns. Colitis 2016, 10, 77–86. [Google Scholar] [CrossRef]
  127. Ventham, N.T.; Kennedy, N.A.; Adams, A.T.; Kalla, R.; Heath, S.; O’Leary, K.R.; Drummond, H.; Wilson, D.C.; Gut, I.G.; Nimmo, E.R.; et al. Integrative epigenome-wide analysis demonstrates that DNA methylation may mediate genetic risk in inflammatory bowel disease. Nat. Commun. 2016, 7, 13507. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  128. Agliata, I.; Fernandez-Jimenez, N.; Goldsmith, C.; Marie, J.C.; Bilbao, J.R.; Dante, R.; Hernandez-Vargas, H. The DNA methylome of inflammatory bowel disease (IBD) reflects intrinsic and extrinsic factors in intestinal mucosal cells. Epigenetics 2020, 15, 1068–1082. [Google Scholar] [CrossRef] [Green Version]
  129. Azuara, D.; Aussó, S.; Rodriguez-Moranta, F.; Guardiola, J.; Sanjuan, X.; Lobaton, T.; Boadas, J.; Piqueras, M.; Monfort, D.; Guinó, E.; et al. New Methylation Biomarker Panel for Early Diagnosis of Dysplasia or Cancer in High-Risk Inflammatory Bowel Disease Patients. Inflamm. Bowel Dis. 2018, 24, 2555–2564. [Google Scholar] [CrossRef] [PubMed]
  130. Guinney, J.; Dienstmann, R.; Wang, X.; de Reyniès, A.; Schlicker, A.; Soneson, C.; Marisa, L.; Roepman, P.; Nyamundanda, G.; Angelino, P.; et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 2015, 21, 1350–1356. [Google Scholar] [CrossRef]
  131. Purcell, R.V.; Visnovska, M.; Biggs, P.J.; Schmeier, S.; Frizelle, F.A. Distinct gut microbiome patterns associate with consensus molecular subtypes of colorectal cancer. Sci. Rep. 2017, 7, 11590. [Google Scholar] [CrossRef] [Green Version]
  132. Cristescu, R.; Lee, J.; Nebozhyn, M.; Kim, K.M.; Ting, J.C.; Wong, S.S.; Liu, J.; Yue, Y.G.; Wang, J.; Yu, K.; et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat. Med. 2015, 21, 449–456. [Google Scholar] [CrossRef]
  133. Kumar, V.; Ramnarayanan, K.; Sundar, R.; Padmanabhan, N.; Srivastava, S.; Koiwa, M.; Yasuda, T.; Koh, V.; Huang, K.K.; Tay, S.T.; et al. Single-Cell Atlas of Lineage States, Tumor Microenvironment, and Subtype-Specific Expression Programs in Gastric Cancer. Cancer Discov. 2022, 12, 670–691. [Google Scholar] [CrossRef] [PubMed]
  134. Parikh, K.; Antanaviciute, A.; Fawkner-Corbett, D.; Jagielowicz, M.; Aulicino, A.; Lagerholm, C.; Davis, S.; Kinchen, J.; Chen, H.H.; Alham, N.K.; et al. Colonic epithelial cell diversity in health and inflammatory bowel disease. Nature 2019, 567, 49–55. [Google Scholar] [CrossRef] [PubMed]
  135. Cleynen, I.; Laukens, D. Cellular diversity in the colon: Another brick in the wall. Nat. Rev. Gastroenterol. Hepatol. 2019, 16, 391–392. [Google Scholar] [CrossRef] [PubMed]
  136. Smillie, C.S.; Biton, M.; Ordovas-Montanes, J.; Sullivan, K.M.; Burgin, G.; Graham, D.B.; Herbst, R.H.; Rogel, N.; Slyper, M.; Waldman, J.; et al. Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. Cell 2019, 178, 714–730.e22. [Google Scholar] [CrossRef] [PubMed]
  137. Bigaeva, E.; Uniken Venema, W.T.C.; Weersma, R.K.; Festen, E.A.M. Understanding human gut diseases at single-cell resolution. Hum. Mol. Genet. 2020, 29, R51–R58. [Google Scholar] [CrossRef]
  138. Wruck, W.; Adjaye, J. Meta-analysis reveals up-regulation of cholesterol processes in non-alcoholic and down-regulation in alcoholic fatty liver disease. World J. Hepatol. 2017, 9, 443–454. [Google Scholar] [CrossRef]
  139. Gisbert, J.P.; Chaparro, M. Clinical Usefulness of Proteomics in Inflammatory Bowel Disease: A Comprehensive Review. J. Crohns. Colitis 2019, 13, 374–384. [Google Scholar] [CrossRef]
  140. Gisbert, J.P.; Chaparro, M. Predictors of Primary Response to Biologic Treatment [Anti-TNF, Vedolizumab, and Ustekinumab] in Patients With Inflammatory Bowel Disease: From Basic Science to Clinical Practice. J. Crohns. Colitis 2020, 14, 694–709. [Google Scholar] [CrossRef]
  141. Cushing, K.; Higgins, P.D.R. Management of Crohn Disease: A Review. JAMA 2021, 325, 69–80. [Google Scholar] [CrossRef]
  142. Colombel, J.F.; Panaccione, R.; Bossuyt, P.; Lukas, M.; Baert, F.; Vaňásek, T.; Danalioglu, A.; Novacek, G.; Armuzzi, A.; Hébuterne, X.; et al. Effect of tight control management on Crohn’s disease (CALM): A multicentre, randomised, controlled phase 3 trial. Lancet 2017, 390, 2779–2789. [Google Scholar] [CrossRef]
  143. Zhao, M.; Bendtsen, F.; Petersen, A.M.; Larsen, L.; Dige, A.; Hvas, C.; Seidelin, J.B.; Burisch, J. Predictors of response and disease course in patients with inflammatory bowel disease treated with biological therapy-the Danish IBD Biobank Project: Protocol for a multicentre prospective cohort study. BMJ Open 2020, 10, e035756. [Google Scholar] [CrossRef] [Green Version]
  144. Pierre, N.; Baiwir, D.; Huynh-Thu, V.A.; Mazzucchelli, G.; Smargiasso, N.; De Pauw, E.; Bouhnik, Y.; Laharie, D.; Colombel, J.F.; Meuwis, M.A.; et al. Discovery of biomarker candidates associated with the risk of short-term and mid/long-term relapse after infliximab withdrawal in Crohn’s patients: A proteomics-based study. Gut 2020, 70, 1450–1457. [Google Scholar] [CrossRef] [PubMed]
  145. Liu, L.; Pu, D.; Wang, D.; Zhang, M.; Zhou, C.; Zhang, Z.; Feng, B. Proteomic Analysis of Potential Targets for Non-Response to Infliximab in Patients With Ulcerative Colitis. Front. Pharmacol. 2022, 13, 905133. [Google Scholar] [CrossRef] [PubMed]
  146. Sacramento, C.; Motta, M.P.; Alves, C.O.; Mota, J.A.; Codes, L.M.G.; Ferreira, R.F.; Silva, P.A.; Palmiro, L.D.P.; Barbosa, R.M.; Andrade, M.N.; et al. Variables associated with progression of moderate-to-severe Crohn’s disease. BMJ Open Gastroenterol. 2022, 9, e001016. [Google Scholar] [CrossRef]
  147. Townsend, P.; Zhang, Q.; Shapiro, J.; Webb-Robertson, B.J.; Bramer, L.; Schepmoes, A.A.; Weitz, K.K.; Mallette, M.; Moniz, H.; Bright, R.; et al. Serum Proteome Profiles in Stricturing Crohn’s Disease: A Pilot Study. Inflamm. Bowel Dis. 2015, 21, 1935–1941. [Google Scholar] [CrossRef] [Green Version]
  148. Wu, J.; Lubman, D.M.; Kugathasan, S.; Denson, L.A.; Hyams, J.S.; Dubinsky, M.C.; Griffiths, A.M.; Baldassano, R.N.; Noe, J.D.; Rabizadeh, S.; et al. Serum Protein Biomarkers of Fibrosis Aid in Risk Stratification of Future Stricturing Complications in Pediatric Crohn’s Disease. Am. J. Gastroenterol. 2019, 114, 777–785. [Google Scholar] [CrossRef] [PubMed]
  149. Ungaro, R.C.; Hu, L.; Ji, J.; Nayar, S.; Kugathasan, S.; Denson, L.A.; Hyams, J.; Dubinsky, M.C.; Sands, B.E.; Cho, J.H. Machine learning identifies novel blood protein predictors of penetrating and stricturing complications in newly diagnosed paediatric Crohn’s disease. Aliment. Pharmacol. Ther. 2021, 53, 281–290. [Google Scholar] [CrossRef]
  150. Wasinger, V.C.; Yau, Y.; Duo, X.; Zeng, M.; Campbell, B.; Shin, S.; Luber, R.; Redmond, D.; Leong, R.W. Low Mass Blood Peptides Discriminative of Inflammatory Bowel Disease (IBD) Severity: A Quantitative Proteomic Perspective. Mol. Cell Proteom. 2016, 15, 256–265. [Google Scholar] [CrossRef] [Green Version]
  151. Basso, D.; Padoan, A.; D’Incà, R.; Arrigoni, G.; Scapellato, M.L.; Contran, N.; Franchin, C.; Lorenzon, G.; Mescoli, C.; Moz, S.; et al. Peptidomic and proteomic analysis of stool for diagnosing IBD and deciphering disease pathogenesis. Clin. Chem. Lab. Med. 2020, 58, 968–979. [Google Scholar] [CrossRef]
  152. Starr, A.E.; Deeke, S.A.; Ning, Z.; Chiang, C.K.; Zhang, X.; Mottawea, W.; Singleton, R.; Benchimol, E.I.; Wen, M.; Mack, D.R.; et al. Proteomic analysis of ascending colon biopsies from a paediatric inflammatory bowel disease inception cohort identifies protein biomarkers that differentiate Crohn’s disease from UC. Gut 2017, 66, 1573–1583. [Google Scholar] [CrossRef]
  153. Li, X.; Zheng, N.R.; Wang, L.H.; Li, Z.W.; Liu, Z.C.; Fan, H.; Wang, Y.; Dai, J.; Ni, X.T.; Wei, X.; et al. Proteomic profiling identifies signatures associated with progression of precancerous gastric lesions and risk of early gastric cancer. EBioMedicine 2021, 74, 103714. [Google Scholar] [CrossRef]
  154. Tsigaridas, A.; Papanikolaou, I.S.; Vaiopoulou, A.; Anagnostopoulos, A.K.; Viazis, N.; Karamanolis, G.; Karamanolis, D.G.; Tsangaris, G.T.; Mantzaris, G.J.; Gazouli, M. Proteomics and irritable bowel syndrome. Expert Rev. Proteom. 2017, 14, 461–468. [Google Scholar] [CrossRef] [PubMed]
  155. Tsigaridas, A.; Anagnostopoulos, A.K.; Papadopoulou, A.; Ioakeim, S.; Vaiopoulou, A.; Papanikolaou, I.S.; Viazis, N.; Karamanolis, G.; Mantzaris, G.J.; Tsangaris, G.T.; et al. Identification of serum proteome signature of irritable bowel syndrome: Potential utility of the tool for early diagnosis and patient’s stratification. J. Proteom. 2018, 188, 167–172. [Google Scholar] [CrossRef] [PubMed]
  156. Chai, Y.N.; Qin, J.; Li, Y.L.; Tong, Y.L.; Liu, G.H.; Wang, X.R.; Liu, C.Y.; Peng, M.H.; Qin, C.Z.; Xing, Y.R. TMT proteomics analysis of intestinal tissue from patients of irritable bowel syndrome with diarrhea: Implications for multiple nutrient ingestion abnormality. J. Proteom. 2021, 231, 103995. [Google Scholar] [CrossRef]
  157. de Vos, W.M.; Tilg, H.; Van Hul, M.; Cani, P.D. Gut microbiome and health: Mechanistic insights. Gut 2022, 71, 1020–1032. [Google Scholar] [CrossRef]
  158. Peery, R.C.; Pammi, M.; Claud, E.; Shen, L. Epigenome—A mediator for host-microbiome crosstalk. Semin. Perinatol. 2021, 45, 151455. [Google Scholar] [CrossRef] [PubMed]
  159. Rinninella, E.; Raoul, P.; Cintoni, M.; Franceschi, F.; Miggiano, G.A.D.; Gasbarrini, A.; Mele, M.C. What is the Healthy Gut Microbiota Composition? A Changing Ecosystem across Age, Environment, Diet, and Diseases. Microorganisms 2019, 7, 14. [Google Scholar] [CrossRef] [Green Version]
  160. Turnbaugh, P.J.; Hamady, M.; Yatsunenko, T.; Cantarel, B.L.; Duncan, A.; Ley, R.E.; Sogin, M.L.; Jones, W.J.; Roe, B.A.; Affourtit, J.P.; et al. A core gut microbiome in obese and lean twins. Nature 2009, 457, 480–484. [Google Scholar] [CrossRef] [Green Version]
  161. Nishida, A.; Inoue, R.; Inatomi, O.; Bamba, S.; Naito, Y.; Andoh, A. Gut microbiota in the pathogenesis of inflammatory bowel disease. Clin. J. Gastroenterol. 2018, 11, 1–10. [Google Scholar] [CrossRef] [Green Version]
  162. Ananthakrishnan, A.N. Microbiome-Based Biomarkers for IBD. Inflamm. Bowel Dis. 2020, 26, 1463–1469. [Google Scholar] [CrossRef]
  163. Tremaroli, V.; Bäckhed, F. Functional interactions between the gut microbiota and host metabolism. Nature 2012, 489, 242–249. [Google Scholar] [CrossRef]
  164. Png, C.W.; Lindén, S.K.; Gilshenan, K.S.; Zoetendal, E.G.; McSweeney, C.S.; Sly, L.I.; McGuckin, M.A.; Florin, T.H. Mucolytic bacteria with increased prevalence in IBD mucosa augment in vitro utilization of mucin by other bacteria. Am. J. Gastroenterol. 2010, 105, 2420–2428. [Google Scholar] [CrossRef]
  165. Schirmer, M.; Garner, A.; Vlamakis, H.; Xavier, R.J. Microbial genes and pathways in inflammatory bowel disease. Nat. Rev. Microbiol. 2019, 17, 497–511. [Google Scholar] [CrossRef]
  166. Sokol, H.; Leducq, V.; Aschard, H.; Pham, H.P.; Jegou, S.; Landman, C.; Cohen, D.; Liguori, G.; Bourrier, A.; Nion-Larmurier, I.; et al. Fungal microbiota dysbiosis in IBD. Gut 2017, 66, 1039–1048. [Google Scholar] [CrossRef] [PubMed]
  167. Zhang, F.; Aschenbrenner, D.; Yoo, J.Y.; Zuo, T. The gut mycobiome in health, disease, and clinical applications in association with the gut bacterial microbiome assembly. Lancet Microbe 2022, 3, e969–e983. [Google Scholar] [CrossRef] [PubMed]
  168. Gogokhia, L.; Buhrke, K.; Bell, R.; Hoffman, B.; Brown, D.G.; Hanke-Gogokhia, C.; Ajami, N.J.; Wong, M.C.; Ghazaryan, A.; Valentine, J.F.; et al. Expansion of Bacteriophages Is Linked to Aggravated Intestinal Inflammation and Colitis. Cell Host Microbe 2019, 25, 285–299.e8. [Google Scholar] [CrossRef] [Green Version]
  169. Ticinesi, A.; Nouvenne, A.; Corrente, V.; Tana, C.; Di Mario, F.; Meschi, T. Diverticular Disease: A Gut Microbiota Perspective. J. Gastrointestin. Liver Dis. 2019, 28, 327–337. [Google Scholar] [CrossRef]
  170. Kvasnovsky, C.L.; Leong, L.E.X.; Choo, J.M.; Abell, G.C.J.; Papagrigoriadis, S.; Bruce, K.D.; Rogers, G.B. Clinical and symptom scores are significantly correlated with fecal microbiota features in patients with symptomatic uncomplicated diverticular disease: A pilot study. Eur. J. Gastroenterol. Hepatol. 2018, 30, 107–112. [Google Scholar] [CrossRef]
  171. Ponziani, F.R.; Scaldaferri, F.; Petito, V.; Paroni Sterbini, F.; Pecere, S.; Lopetuso, L.R.; Palladini, A.; Gerardi, V.; Masucci, L.; Pompili, M.; et al. The Role of Antibiotics in Gut Microbiota Modulation: The Eubiotic Effects of Rifaximin. Dig. Dis. 2016, 34, 269–278. [Google Scholar] [CrossRef] [PubMed]
  172. Barbara, G.; Scaioli, E.; Barbaro, M.R.; Biagi, E.; Laghi, L.; Cremon, C.; Marasco, G.; Colecchia, A.; Picone, G.; Salfi, N.; et al. Gut microbiota, metabolome and immune signatures in patients with uncomplicated diverticular disease. Gut 2017, 66, 1252–1261. [Google Scholar] [CrossRef] [PubMed]
  173. Lopetuso, L.R.; Petito, V.; Graziani, C.; Schiavoni, E.; Paroni Sterbini, F.; Poscia, A.; Gaetani, E.; Franceschi, F.; Cammarota, G.; Sanguinetti, M.; et al. Gut Microbiota in Health, Diverticular Disease, Irritable Bowel Syndrome, and Inflammatory Bowel Diseases: Time for Microbial Marker of Gastrointestinal Disorders. Dig. Dis. 2018, 36, 56–65. [Google Scholar] [CrossRef]
  174. Lahner, E.; Bellisario, C.; Hassan, C.; Zullo, A.; Esposito, G.; Annibale, B. Probiotics in the Treatment of Diverticular Disease. A Systematic Review. J. Gastrointestin. Liver Dis. 2016, 25, 79–86. [Google Scholar] [CrossRef] [PubMed]
  175. Ubaldi, E.; Grattagliano, I.; Lapi, F.; Pecchioli, S.; Cricelli, C. Overview on the management of diverticular disease by Italian General Practitioners. Dig. Liver Dis. 2019, 51, 63–67. [Google Scholar] [CrossRef] [PubMed]
  176. Tursi, A.; Picchio, M.; Elisei, W.; Di Mario, F.; Scarpignato, C.; Brandimarte, G. Current Management of Patients With Diverticulosis and Diverticular Disease: A Survey From the 2nd International Symposium on Diverticular Disease. J. Clin. Gastroenterol. 2016, 50 (Suppl. S1), S97–S100. [Google Scholar] [CrossRef] [PubMed]
  177. Wang, R.; Tang, R.; Li, B.; Ma, X.; Schnabl, B.; Tilg, H. Gut microbiome, liver immunology, and liver diseases. Cell. Mol. Immunol. 2021, 18, 4–17. [Google Scholar] [CrossRef]
  178. Lang, S.; Fairfied, B.; Gao, B.; Duan, Y.; Zhang, X.; Fouts, D.E.; Schnabl, B. Changes in the fecal bacterial microbiota associated with disease severity in alcoholic hepatitis patients. Gut Microbes 2020, 12, 1785251. [Google Scholar] [CrossRef]
  179. Kakiyama, G.; Hylemon, P.B.; Zhou, H.; Pandak, W.M.; Heuman, D.M.; Kang, D.J.; Takei, H.; Nittono, H.; Ridlon, J.M.; Fuchs, M.; et al. Colonic inflammation and secondary bile acids in alcoholic cirrhosis. Am. J. Physiol. Gastrointest. Liver Physiol. 2014, 306, G929–G937. [Google Scholar] [CrossRef] [Green Version]
  180. Bajaj, J.S.; Hylemon, P.B.; Ridlon, J.M.; Heuman, D.M.; Daita, K.; White, M.B.; Monteith, P.; Noble, N.A.; Sikaroodi, M.; Gillevet, P.M. Colonic mucosal microbiome differs from stool microbiome in cirrhosis and hepatic encephalopathy and is linked to cognition and inflammation. Am. J. Physiol. Gastrointest. Liver Physiol. 2012, 303, G675–G685. [Google Scholar] [CrossRef]
  181. Zhao, S.; Liu, W.; Wang, J.; Shi, J.; Sun, Y.; Wang, W.; Ning, G.; Liu, R.; Hong, J. Akkermansia muciniphila improves metabolic profiles by reducing inflammation in chow diet-fed mice. J. Mol. Endocrinol. 2017, 58, 1–14. [Google Scholar] [CrossRef]
  182. Addolorato, G.; Ponziani, F.R.; Dionisi, T.; Mosoni, C.; Vassallo, G.A.; Sestito, L.; Petito, V.; Picca, A.; Marzetti, E.; Tarli, C.; et al. Gut microbiota compositional and functional fingerprint in patients with alcohol use disorder and alcohol-associated liver disease. Liver Int. 2020, 40, 878–888. [Google Scholar] [CrossRef]
  183. Grander, C.; Adolph, T.E.; Wieser, V.; Lowe, P.; Wrzosek, L.; Gyongyosi, B.; Ward, D.V.; Grabherr, F.; Gerner, R.R.; Pfister, A.; et al. Recovery of ethanol-induced Akkermansia muciniphila depletion ameliorates alcoholic liver disease. Gut 2018, 67, 891–901. [Google Scholar] [CrossRef] [PubMed]
  184. Soveral, L.F.; Korczaguin, G.G.; Schmidt, P.S.; Nunes, I.S.; Fernandes, C.; Zárate-Bladés, C.R. Immunological mechanisms of fecal microbiota transplantation in recurrent Clostridioides difficile infection. World J. Gastroenterol. 2022, 28, 4762–4772. [Google Scholar] [CrossRef] [PubMed]
  185. Weingarden, A.R.; Chen, C.; Bobr, A.; Yao, D.; Lu, Y.; Nelson, V.M.; Sadowsky, M.J.; Khoruts, A. Microbiota transplantation restores normal fecal bile acid composition in recurrent Clostridium difficile infection. Am. J. Physiol. Gastrointest. Liver Physiol. 2014, 306, G310–G319. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  186. Seekatz, A.M.; Theriot, C.M.; Rao, K.; Chang, Y.M.; Freeman, A.E.; Kao, J.Y.; Young, V.B. Restoration of short chain fatty acid and bile acid metabolism following fecal microbiota transplantation in patients with recurrent Clostridium difficile infection. Anaerobe 2018, 53, 64–73. [Google Scholar] [CrossRef]
  187. Joseph, J.; Saha, S.; Greenberg-Worisek, A.J. Fecal Microbiota Transplantation: An Ambiguous Translational Pathway for a Promising Treatment. Clin. Transl. Sci. 2019, 12, 206–208. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  188. FDA. FDA Approves First Fecal Microbiota Product; FDA: Silver Spring, MD, USA, 2022.
  189. Bloom, P.P.; Young, V.B. Microbiome therapeutics for the treatment of recurrent Clostridioides difficile infection. Expert Opin. Biol. Ther. 2022, 23, 89–101. [Google Scholar] [CrossRef]
  190. Kumar, V.; Fischer, M. Expert opinion on fecal microbiota transplantation for the treatment of Clostridioides difficile infection and beyond. Expert Opin. Biol. Ther. 2020, 20, 73–81. [Google Scholar] [CrossRef]
  191. De Preter, V.; Verbeke, K. Metabolomics as a diagnostic tool in gastroenterology. World J. Gastrointest. Pharmacol. Ther. 2013, 4, 97–107. [Google Scholar] [CrossRef]
  192. Esquerre, N.; Basso, L.; Defaye, M.; Vicentini, F.A.; Cluny, N.; Bihan, D.; Hirota, S.A.; Schick, A.; Jijon, H.B.; Lewis, I.A.; et al. Colitis-Induced Microbial Perturbation Promotes Postinflammatory Visceral Hypersensitivity. Cell. Mol. Gastroenterol. Hepatol. 2020, 10, 225–244. [Google Scholar] [CrossRef] [PubMed]
  193. Jansson, J.; Willing, B.; Lucio, M.; Fekete, A.; Dicksved, J.; Halfvarson, J.; Tysk, C.; Schmitt-Kopplin, P. Metabolomics reveals metabolic biomarkers of Crohn’s disease. PLoS ONE 2009, 4, e6386. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  194. Balasubramanian, K.; Kumar, S.; Singh, R.R.; Sharma, U.; Ahuja, V.; Makharia, G.K.; Jagannathan, N.R. Metabolism of the colonic mucosa in patients with inflammatory bowel diseases: An in vitro proton magnetic resonance spectroscopy study. Magn. Reson. Imaging 2009, 27, 79–86. [Google Scholar] [CrossRef]
  195. Ahmed, I.; Greenwood, R.; de Lacy Costello, B.; Ratcliffe, N.M.; Probert, C.S. An investigation of fecal volatile organic metabolites in irritable bowel syndrome. PLoS ONE 2013, 8, e58204. [Google Scholar] [CrossRef] [PubMed]
  196. Ponnusamy, K.; Choi, J.N.; Kim, J.; Lee, S.Y.; Lee, C.H. Microbial community and metabolomic comparison of irritable bowel syndrome faeces. J. Med. Microbiol. 2011, 60, 817–827. [Google Scholar] [CrossRef] [Green Version]
  197. Kirpich, I.A.; Parajuli, D.; McClain, C.J. Microbiome in NAFLD and ALD. Clin. Liver Dis. 2015, 6, 55–58. [Google Scholar] [CrossRef] [PubMed]
  198. Shute, A.; Bihan, D.G.; Lewis, I.A.; Nasser, Y. Metabolomics: The Key to Unraveling the Role of the Microbiome in Visceral Pain Neurotransmission. Front. Neurosci. 2022, 16, 917197. [Google Scholar] [CrossRef]
  199. Meseguer, V.; Alpizar, Y.A.; Luis, E.; Tajada, S.; Denlinger, B.; Fajardo, O.; Manenschijn, J.A.; Fernández-Peña, C.; Talavera, A.; Kichko, T.; et al. TRPA1 channels mediate acute neurogenic inflammation and pain produced by bacterial endotoxins. Nat. Commun. 2014, 5, 3125. [Google Scholar] [CrossRef] [Green Version]
  200. Akbar, A.; Yiangou, Y.; Facer, P.; Brydon, W.G.; Walters, J.R.; Anand, P.; Ghosh, S. Expression of the TRPV1 receptor differs in quiescent inflammatory bowel disease with or without abdominal pain. Gut 2010, 59, 767–774. [Google Scholar] [CrossRef] [PubMed]
  201. Babu, A.F.; Csader, S.; Männistö, V.; Tauriainen, M.-M.; Pentikäinen, H.; Savonen, K.; Klåvus, A.; Koistinen, V.; Hanhineva, K.; Schwab, U. Effects of exercise on NAFLD using non-targeted metabolomics in adipose tissue, plasma, urine, and stool. Sci. Rep. 2022, 12, 6485. [Google Scholar] [CrossRef]
  202. Singh, S. Big Dreams With Big Data! Use of Clinical Informatics to Inform Biomarker Discovery. Clin. Transl. Gastroenterol. 2019, 10, e00018. [Google Scholar] [CrossRef]
  203. Sun, W.; Cai, Z.; Li, Y.; Liu, F.; Fang, S.; Wang, G. Data Processing and Text Mining Technologies on Electronic Medical Records: A Review. J. Healthc. Eng. 2018, 2018, 4302425. [Google Scholar] [CrossRef] [Green Version]
  204. Ananthakrishnan, A.N.; Cai, T.; Savova, G.; Cheng, S.C.; Chen, P.; Perez, R.G.; Gainer, V.S.; Murphy, S.N.; Szolovits, P.; Xia, Z.; et al. Improving case definition of Crohn’s disease and ulcerative colitis in electronic medical records using natural language processing: A novel informatics approach. Inflamm. Bowel Dis. 2013, 19, 1411–1420. [Google Scholar] [CrossRef] [PubMed]
  205. Liao, K.P.; Cai, T.; Gainer, V.; Goryachev, S.; Zeng-Treitler, Q.; Raychaudhuri, S.; Szolovits, P.; Churchill, S.; Murphy, S.; Kohane, I.; et al. Electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res. 2010, 62, 1120–1127. [Google Scholar] [CrossRef]
  206. Wei, Z.; Wang, W.; Bradfield, J.; Li, J.; Cardinale, C.; Frackelton, E.; Kim, C.; Mentch, F.; Van Steen, K.; Visscher, P.M.; et al. Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease. Am. J. Hum. Genet. 2013, 92, 1008–1012. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  207. Romagnoni, A.; Jégou, S.; Van Steen, K.; Wainrib, G.; Hugot, J.P. Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Sci. Rep. 2019, 9, 10351. [Google Scholar] [CrossRef]
  208. Häsler, R.; Sheibani-Tezerji, R.; Sinha, A.; Barann, M.; Rehman, A.; Esser, D.; Aden, K.; Knecht, C.; Brandt, B.; Nikolaus, S.; et al. Uncoupling of mucosal gene regulation, mRNA splicing and adherent microbiota signatures in inflammatory bowel disease. Gut 2017, 66, 2087–2097. [Google Scholar] [CrossRef]
  209. Mehrotra, A.; Dellon, E.S.; Schoen, R.E.; Saul, M.; Bishehsari, F.; Farmer, C.; Harkema, H. Applying a natural language processing tool to electronic health records to assess performance on colonoscopy quality measures. Gastrointest. Endosc. 2012, 75, 1233–1239.e14. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  210. Jin, X.; Ding, Z.; Li, T.; Xiong, J.; Tian, G.; Liu, J. Comparison of MPL-ANN and PLS-DA models for predicting the severity of patients with acute pancreatitis: An exploratory study. Am. J. Emerg. Med. 2021, 44, 85–91. [Google Scholar] [CrossRef]
  211. Shung, D.L.; Au, B.; Taylor, R.A.; Tay, J.K.; Laursen, S.B.; Stanley, A.J.; Dalton, H.R.; Ngu, J.; Schultz, M.; Laine, L. Validation of a Machine Learning Model That Outperforms Clinical Risk Scoring Systems for Upper Gastrointestinal Bleeding. Gastroenterology 2020, 158, 160–167. [Google Scholar] [CrossRef]
  212. Ishibashi, F.; Fukushima, K.; Kobayashi, K.; Kawakami, T.; Tanaka, R.; Kato, J.; Sato, A.; Konda, K.; Sugihara, K.; Baba, S. Individual feedback and monitoring of endoscopist performance improves the adenoma detection rate in screening colonoscopy: A prospective case-control study. Surg. Endosc. 2021, 35, 2566–2575. [Google Scholar] [CrossRef]
  213. Klair, J.S.; Ashat, M.; Johnson, D.; Arora, S.; Onteddu, N.; Machain Palacio, J.G.; Samuel, R.; Bilal, M.; Buddam, A.; Gupta, A.; et al. Serrated polyp detection rate and advanced adenoma detection rate from a US multicenter cohort. Endoscopy 2020, 52, 61–67. [Google Scholar] [CrossRef]
  214. Tajbakhsh, N.; Gurudu, S.R.; Liang, J. Automated Polyp Detection in Colonoscopy Videos Using Shape and Context Information. IEEE Trans. Med. Imaging 2016, 35, 630–644. [Google Scholar] [CrossRef] [PubMed]
  215. Klare, P.; Sander, C.; Prinzen, M.; Haller, B.; Nowack, S.; Abdelhafez, M.; Poszler, A.; Brown, H.; Wilhelm, D.; Schmid, R.M.; et al. Automated polyp detection in the colorectum: A prospective study (with videos). Gastrointest. Endosc. 2019, 89, 576–582.e1. [Google Scholar] [CrossRef]
  216. Wittenberg, T.; Raithel, M. Artificial Intelligence-Based Polyp Detection in Colonoscopy: Where Have We Been, Where Do We Stand, and Where Are We Headed? Visc. Med. 2020, 36, 428–438. [Google Scholar] [CrossRef] [PubMed]
  217. Billah, M.; Waheed, S.; Rahman, M.M. An Automatic Gastrointestinal Polyp Detection System in Video Endoscopy Using Fusion of Color Wavelet and Convolutional Neural Network Features. Int. J. Biomed. Imaging 2017, 2017, 9545920. [Google Scholar] [CrossRef] [PubMed]
  218. Urban, G.; Tripathi, P.; Alkayali, T.; Mittal, M.; Jalali, F.; Karnes, W.; Baldi, P. Deep Learning Localizes and Identifies Polyps in Real Time with 96% Accuracy in Screening Colonoscopy. Gastroenterology 2018, 155, 1069–1078.e8. [Google Scholar] [CrossRef]
  219. Shin, Y.; Qadir, H.A.; Aabakken, L.; Bergsland, J.; Balasingham, I. Automatic Colon Polyp Detection Using Region Based Deep CNN and Post Learning Approaches. IEEE Access 2018, 6, 40950–40962. [Google Scholar] [CrossRef]
  220. Hassan, C.; Spadaccini, M.; Iannone, A.; Maselli, R.; Jovani, M.; Chandrasekar, V.T.; Antonelli, G.; Yu, H.; Areia, M.; Dinis-Ribeiro, M.; et al. Performance of artificial intelligence in colonoscopy for adenoma and polyp detection: A systematic review and meta-analysis. Gastrointest. Endosc. 2021, 93, 77–85.e6. [Google Scholar] [CrossRef]
  221. Shah, S.; Park, N.; Chehade, N.E.H.; Chahine, A.; Monachese, M.; Tiritilli, A.; Moosvi, Z.; Ortizo, R.; Samarasena, J. Effect of computer-aided colonoscopy on adenoma miss rates and polyp detection: A systematic review and meta-analysis. J. Gastroenterol. Hepatol. 2022. [Google Scholar] [CrossRef]
  222. Deliwala, S.S.; Hamid, K.; Barbarawi, M.; Lakshman, H.; Zayed, Y.; Kandel, P.; Malladi, S.; Singh, A.; Bachuwa, G.; Gurvits, G.E.; et al. Artificial intelligence (AI) real-time detection vs. routine colonoscopy for colorectal neoplasia: A meta-analysis and trial sequential analysis. Int. J. Colorectal Dis. 2021, 36, 2291–2303. [Google Scholar] [CrossRef]
  223. Grasberger, H.; Magis, A.T.; Sheng, E.; Conomos, M.P.; Zhang, M.; Garzotto, L.S.; Hou, G.; Bishu, S.; Nagao-Kitamoto, H.; El-Zaatari, M.; et al. DUOX2 variants associate with preclinical disturbances in microbiota-immune homeostasis and increased inflammatory bowel disease risk. J. Clin. Investig. 2021, 131, e141676. [Google Scholar] [CrossRef]
  224. Moustafa, A.; Li, W.; Anderson, E.L.; Wong, E.H.M.; Dulai, P.S.; Sandborn, W.J.; Biggs, W.; Yooseph, S.; Jones, M.B.; Venter, J.C.; et al. Genetic risk, dysbiosis, and treatment stratification using host genome and gut microbiome in inflammatory bowel disease. Clin. Transl. Gastroenterol. 2018, 9, e132. [Google Scholar] [CrossRef] [PubMed]
  225. Di’Narzo, A.F.; Houten, S.M.; Kosoy, R.; Huang, R.; Vaz, F.M.; Hou, R.; Wei, G.; Wang, W.; Comella, P.H.; Dodatko, T.; et al. Integrative Analysis of the Inflammatory Bowel Disease Serum Metabolome Improves Our Understanding of Genetic Etiology and Points to Novel Putative Therapeutic Targets. Gastroenterology 2022, 162, 828–843.e11. [Google Scholar] [CrossRef]
  226. Chu, X.; Jaeger, M.; Beumer, J.; Bakker, O.B.; Aguirre-Gamboa, R.; Oosting, M.; Smeekens, S.P.; Moorlag, S.; Mourits, V.P.; Koeken, V.A.C.M.; et al. Integration of metabolomics, genomics, and immune phenotypes reveals the causal roles of metabolites in disease. Genome Biol. 2021, 22, 198. [Google Scholar] [CrossRef] [PubMed]
  227. Sudhakar, P.; Verstockt, B.; Cremer, J.; Verstockt, S.; Sabino, J.; Ferrante, M.; Vermeire, S. Understanding the Molecular Drivers of Disease Heterogeneity in Crohn’s Disease Using Multi-omic Data Integration and Network Analysis. Inflamm. Bowel Dis. 2020, 27, 870–886. [Google Scholar] [CrossRef]
  228. Taylor, H.; Serrano-Contreras, J.I.; McDonald, J.A.K.; Epstein, J.; Fell, J.M.; Seoane, R.C.; Li, J.V.; Marchesi, J.R.; Hart, A.L. Multiomic features associated with mucosal healing and inflammation in paediatric Crohn’s disease. Aliment. Pharmacol. Ther. 2020, 52, 1491–1502. [Google Scholar] [CrossRef]
  229. Ramos, R.J.; Zhu, C.; Joseph, D.F.; Thaker, S.; LaComb, J.F.; Markarian, K.; Lee, H.J.; Petrov, J.C.; Monzur, F.; Buscaglia, J.M. Metagenomic and Bile Acid Metabolomic Analysis of Fecal Microbiota Transplantation for Recurrent Clostridiodes Difficile and/or Inflammatory Bowel Diseases. Med. Res. Arch. 2022, 10, mra.v10i10.3318. [Google Scholar] [CrossRef] [PubMed]
  230. Borren, N.Z.; Ananthakrishnan, A.N. Precision medicine: How multiomics will shape the future of inflammatory bowel disease? Curr. Opin. Gastroenterol. 2022, 38, 382–387. [Google Scholar] [CrossRef]
  231. Liu, J.; Wang, Y.; Zhang, G.; Liu, L.; Peng, X. Multi-Omics Analysis Reveals Changes in the Intestinal Microbiome, Transcriptome, and Methylome in a Rat Model of Chronic Non-bacterial Prostatitis: Indications for the Existence of the Gut-Prostate Axis. Front. Physiol. 2021, 12, 753034. [Google Scholar] [CrossRef] [PubMed]
  232. Zhang, Q.; Zhang, Y.; Zeng, L.; Chen, G.; Zhang, L.; Liu, M.; Sheng, H.; Hu, X.; Su, J.; Zhang, D.; et al. The Role of Gut Microbiota and Microbiota-Related Serum Metabolites in the Progression of Diabetic Kidney Disease. Front. Pharmacol. 2021, 12, 757508. [Google Scholar] [CrossRef]
  233. Subramanian, I.; Verma, S.; Kumar, S.; Jere, A.; Anamika, K. Multi-omics Data Integration, Interpretation, and Its Application. Bioinform. Biol. Insights 2020, 14, 1177932219899051. [Google Scholar] [CrossRef] [Green Version]
  234. Fair, K.; Dunlap, D.G.; Fitch, A.; Bogdanovich, T.; Methé, B.; Morris, A.; McVerry, B.J.; Kitsios, G.D. Rectal Swabs from Critically Ill Patients Provide Discordant Representations of the Gut Microbiome Compared to Stool Samples. mSphere 2019, 4, e00358-19. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  235. Shobar, R.M.; Velineni, S.; Keshavarzian, A.; Swanson, G.; DeMeo, M.T.; Melson, J.E.; Losurdo, J.; Engen, P.A.; Sun, Y.; Koenig, L.; et al. The Effects of Bowel Preparation on Microbiota-Related Metrics Differ in Health and in Inflammatory Bowel Disease and for the Mucosal and Luminal Microbiota Compartments. Clin. Transl. Gastroenterol. 2016, 7, e143. [Google Scholar] [CrossRef] [PubMed]
  236. Tanes, C.; Bittinger, K.; Gao, Y.; Friedman, E.S.; Nessel, L.; Paladhi, U.R.; Chau, L.; Panfen, E.; Fischbach, M.A.; Braun, J.; et al. Role of dietary fiber in the recovery of the human gut microbiome and its metabolome. Cell Host Microbe 2021, 29, 394–407.e5. [Google Scholar] [CrossRef] [PubMed]
  237. Jones, R.B.; Zhu, X.; Moan, E.; Murff, H.J.; Ness, R.M.; Seidner, D.L.; Sun, S.; Yu, C.; Dai, Q.; Fodor, A.A.; et al. Inter-niche and inter-individual variation in gut microbial community assessment using stool, rectal swab, and mucosal samples. Sci. Rep. 2018, 8, 4139. [Google Scholar] [CrossRef] [PubMed]
  238. Martens, M.J.; Logan, B.R. A unified approach to sample size and power determination for testing parameters in generalized linear and time-to-event regression models. Stat. Med. 2021, 40, 1121–1132. [Google Scholar] [CrossRef]
  239. Xiao, X.; White, E.P.; Hooten, M.B.; Durham, S.L. On the use of log-transformation vs. nonlinear regression for analyzing biological power laws. Ecology 2011, 92, 1887–1894. [Google Scholar] [CrossRef] [Green Version]
  240. Hasin, Y.; Seldin, M.; Lusis, A. Multi-omics approaches to disease. Genome Biol. 2017, 18, 83. [Google Scholar] [CrossRef] [PubMed]
  241. Savoi, S.; Wong, D.C.J.; Degu, A.; Herrera, J.C.; Bucchetti, B.; Peterlunger, E.; Fait, A.; Mattivi, F.; Castellarin, S.D. Multi-Omics and Integrated Network Analyses Reveal New Insights into the Systems Relationships between Metabolites, Structural Genes, and Transcriptional Regulators in Developing Grape Berries (Vitis vinifera L.) Exposed to Water Deficit. Front. Plant Sci. 2017, 8, 1124. [Google Scholar] [CrossRef] [Green Version]
  242. Gligorijević, V.; Pržulj, N. Methods for biological data integration: Perspectives and challenges. J. R. Soc. Interface 2015, 12, 20150571. [Google Scholar] [CrossRef]
  243. Huang, S.; Chaudhary, K.; Garmire, L.X. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front. Genet. 2017, 8, 84. [Google Scholar] [CrossRef] [Green Version]
  244. Haas, R.; Zelezniak, A.; Iacovacci, J.; Kamrad, S.; Townsend, S.; Ralser, M. Designing and interpreting ‘multi-omic’ experiments that may change our understanding of biology. Curr. Opin. Syst. Biol. 2017, 6, 37–45. [Google Scholar] [CrossRef]
  245. Li, P.; Luo, H.; Ji, B.; Nielsen, J. Machine learning for data integration in human gut microbiome. Microb. Cell Factories 2022, 21, 241. [Google Scholar] [CrossRef]
  246. Chung, N.C.; Mirza, B.; Choi, H.; Wang, J.; Wang, D.; Ping, P.; Wang, W. Unsupervised classification of multi-omics data during cardiac remodeling using deep learning. Methods 2019, 166, 66–73. [Google Scholar] [CrossRef] [PubMed]
  247. Narayanasamy, S.; Jarosz, Y.; Muller, E.E.L.; Heintz-Buschart, A.; Herold, M.; Kaysen, A.; Laczny, C.C.; Pinel, N.; May, P.; Wilmes, P. IMP: A pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses. Genome Biol. 2016, 17, 260. [Google Scholar] [CrossRef] [PubMed]
  248. Kumar, M.S.; Slud, E.V.; Okrah, K.; Hicks, S.C.; Hannenhalli, S.; Corrada Bravo, H. Analysis and correction of compositional bias in sparse sequencing count data. BMC Genom. 2018, 19, 799. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  249. Rodosthenous, T.; Shahrezaei, V.; Evangelou, M. Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: A comparison study. Bioinformatics 2020, 36, 4616–4625. [Google Scholar] [CrossRef]
  250. Witten, D.M.; Tibshirani, R.J. Extensions of sparse canonical correlation analysis with applications to genomic data. Stat. Appl. Genet. Mol. Biol. 2009, 8, 28. [Google Scholar] [CrossRef] [Green Version]
  251. Gloor, G.B.; Macklaim, J.M.; Pawlowsky-Glahn, V.; Egozcue, J.J. Microbiome Datasets Are Compositional: And This Is Not Optional. Front. Microbiol. 2017, 8, 2224. [Google Scholar] [CrossRef] [Green Version]
  252. Greenland, S.; Mansournia, M.A.; Altman, D.G. Sparse data bias: A problem hiding in plain sight. BMJ 2016, 352, i1981. [Google Scholar] [CrossRef] [Green Version]
  253. Greenland, S.; Schwartzbaum, J.A.; Finkle, W.D. Problems due to Small Samples and Sparse Data in Conditional Logistic Regression Analysis. Am. J. Epidemiol. 2000, 151, 531–539. [Google Scholar] [CrossRef]
  254. Algamal, Z.Y.; Lee, M.H. A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. Adv. Data Anal. Classif. 2019, 13, 753–771. [Google Scholar] [CrossRef]
  255. Huang, P.-H. A penalized likelihood method for multi-group structural equation modelling. Br. J. Math. Stat. Psychol. 2018, 71, 499–522. [Google Scholar] [CrossRef] [PubMed]
  256. Mondol, M.H.; Rahman, M.S. Bias-reduced and separation-proof GEE with small or sparse longitudinal binary data. Stat. Med. 2019, 38, 2544–2560. [Google Scholar] [CrossRef]
  257. Platt, R.W.; Leroux, B.G.; Breslow, N. Generalized linear mixed models for meta-analysis. Stat. Med. 1999, 18, 643–654. [Google Scholar] [CrossRef]
  258. Ju, K.; Lin, L.; Chu, H.; Cheng, L.-L.; Xu, C. Laplace approximation, penalized quasi-likelihood, and adaptive Gauss–Hermite quadrature for generalized linear mixed models: Towards meta-analysis of binary outcome with sparse data. BMC Med. Res. Methodol. 2020, 20, 152. [Google Scholar] [CrossRef]
  259. Richardson, D.B.; Cole, S.R.; Ross, R.K.; Poole, C.; Chu, H.; Keil, A.P. Meta-Analysis and Sparse-Data Bias. Am. J. Epidemiol. 2020, 190, 336–340. [Google Scholar] [CrossRef]
  260. Sankey, S.S.; Weissfeld, L.A.; Fine, M.J.; Kapoor, W. An assessment of the use of the continuity correction for sparse data in meta-analysis. Commun. Stat. Simul. Comput. 1996, 25, 1031–1056. [Google Scholar] [CrossRef]
  261. Sweeting, M.; Sutton, A.J.; Lambert, P. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat. Med. 2004, 23, 1351–1375. [Google Scholar] [CrossRef]
  262. Liew, A.W.-C.; Law, N.-F.; Yan, H. Missing value imputation for gene expression data: Computational techniques to recover missing data from available information. Brief. Bioinform. 2011, 12, 498–513. [Google Scholar] [CrossRef] [Green Version]
  263. Davis, T.J.; Firzli, T.R.; Higgins Keppler, E.A.; Richardson, M.; Bean, H.D. Addressing Missing Data in GC × GC Metabolomics: Identifying Missingness Type and Evaluating the Impact of Imputation Methods on Experimental Replication. Anal. Chem. 2022, 94, 10912–10920. [Google Scholar] [CrossRef]
  264. Zhang, S.; Zhang, J.; Zhu, X.; Qin, Y.; Zhang, C. Missing Value Imputation Based on Data Clustering. In Transactions on Computational Science I; Gavrilova, M.L., Tan, C.J.K., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 128–138. [Google Scholar] [CrossRef]
  265. Dong, X.; Lin, L.; Zhang, R.; Zhao, Y.; Christiani, D.C.; Wei, Y.; Chen, F. TOBMI: Trans-omics block missing data imputation using a k-nearest neighbor weighted approach. Bioinformatics 2018, 35, 1278–1283. [Google Scholar] [CrossRef] [PubMed]
  266. Voillet, V.; Besse, P.; Liaubet, L.; San Cristobal, M.; González, I. Handling missing rows in multi-omics data integration: Multiple imputation in multiple factor analysis framework. BMC Bioinform. 2016, 17, 402. [Google Scholar] [CrossRef] [Green Version]
  267. Brock, G.N.; Shaffer, J.R.; Blakesley, R.E.; Lotz, M.J.; Tseng, G.C. Which missing value imputation method to use in expression profiles: A comparative study and two selection schemes. BMC Bioinform. 2008, 9, 12. [Google Scholar] [CrossRef]
  268. Buermans, H.P.J.; den Dunnen, J.T. Next generation sequencing technology: Advances and applications. Biochim. Biophys. Acta (BBA) Mol. Basis Dis. 2014, 1842, 1932–1941. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  269. Bill & Melinda Gates Foundation. Pathogen Genomic Surveillance and Immunology in Asia Grand Challenges Grant Opportunity. Available online: https://submit.gatesfoundation.org/prog/asia_pathogen_genomics_and_immunology_for_preparedness/ (accessed on 15 December 2022).
  270. Bill & Melinda Gates Foundation. Strengthening Modeling and Analytics Capacity and Ecosystem for Women’s Health Grand Challenges Grant Opportunity. Available online: https://submit.gatesfoundation.org/prog/strengthening_modeling_and_analytics_capacity_and_ecosystem_for_womens_health/ (accessed on 15 December 2022).
  271. Fernandez-Banet, J.; Esposito, A.; Coffin, S.; Horvath, I.B.; Estrella, H.; Schefzick, S.; Deng, S.; Wang, K.; Aching, K.; Ding, Y.; et al. OASIS: Web-based platform for exploring cancer multi-omics data. Nat. Methods 2016, 13, 9–10. [Google Scholar] [CrossRef] [PubMed]
  272. Janssen. Janssen Session—Multi-Omics Analysis in Precision Medicine: Greater than the Sum of Its Parts with AI; Janssen: Beerse, Belgium, 2021. [Google Scholar]
  273. Regev, A.; Berho, M.; Jeffers, L.J.; Milikowski, C.; Molina, E.G.; Pyrsopoulos, N.T.; Feng, Z.-Z.; Reddy, K.R.; Schiff, E.R. Sampling error and intraobserver variation in liver biopsy in patients with chronic HCV infection. Am. J. Gastroenterol. 2002, 97, 2614–2618. [Google Scholar] [CrossRef] [PubMed]
  274. Soloway, R.D.; Baggenstoss, A.H.; Schoenfield, L.J.; Summerskill, W.H.J. Observer error and sampling variability tested in evaluation of hepatitis and cirrhosis by liver biopsy. Am. J. Dig. Dis. 1971, 16, 1082–1086. [Google Scholar] [CrossRef]
  275. Wood, J.C.; Zhang, P.; Rienhoff, H.; Abi-Saab, W.; Neufeld, E.J. Liver MRI is more precise than liver biopsy for assessing total body iron balance: A comparison of MRI relaxometry with simulated liver biopsy results. Magn. Reson. Imaging 2015, 33, 761–767. [Google Scholar] [CrossRef]
  276. Ekwall-Larson, A.; Yu, D.; Dinnétz, P.; Nordqvist, H.; Özenci, V. Single-Site Sampling versus Multisite Sampling for Blood Cultures: A Retrospective Clinical Study. J. Clin. Microbiol. 2022, 60, e0193521. [Google Scholar] [CrossRef]
  277. Rao, N.; Clark, S.; Habern, O. Bridging genomics and tissue pathology: 10x genomics explores new frontiers with the visium spatial gene expression solution. Genet. Eng. Biotechnol. News 2020, 40, 50–51. [Google Scholar] [CrossRef]
  278. He, S.; Bhatt, R.; Brown, C.; Brown, E.A.; Buhr, D.L.; Chantranuvatana, K.; Danaher, P.; Dunaway, D.; Garrison, R.G.; Geiss, G.; et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 2022, 40, 1794–1806. [Google Scholar] [CrossRef]
  279. Jain, M.; Olsen, H.E.; Paten, B.; Akeson, M. The Oxford Nanopore MinION: Delivery of nanopore sequencing to the genomics community. Genome Biol. 2016, 17, 239. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  280. Cummins, G. Smart pills for gastrointestinal diagnostics and therapy. Adv. Drug Deliv. Rev. 2021, 177, 113931. [Google Scholar] [CrossRef]
  281. Delahaye, C.; Nicolas, J. Sequencing DNA with nanopores: Troubles and biases. PLoS ONE 2021, 16, e0257521. [Google Scholar] [CrossRef]
  282. Sahlin, K.; Medvedev, P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat. Commun. 2021, 12, 2. [Google Scholar] [CrossRef] [PubMed]
  283. Seyed Tabib, N.S.; Madgwick, M.; Sudhakar, P.; Verstockt, B.; Korcsmaros, T.; Vermeire, S. Big data in IBD: Big progress for clinical practice. Gut 2020, 69, 1520–1532. [Google Scholar] [CrossRef] [PubMed]
  284. Bossuyt, P.; Vermeire, S.; Bisschops, R. Scoring endoscopic disease activity in IBD: Artificial intelligence sees more and better than we do. Gut 2020, 69, 788–789. [Google Scholar] [CrossRef]
  285. Stidham, R.W.; Enchakalody, B.; Waljee, A.K.; Higgins, P.D.R.; Wang, S.C.; Su, G.L.; Wasnik, A.P.; Al-Hawary, M. Assessing Small Bowel Stricturing and Morphology in Crohn’s Disease Using Semi-automated Image Analysis. Inflamm. Bowel Dis. 2020, 26, 734–742. [Google Scholar] [CrossRef]
  286. Bielecki, C.; Bocklitz, T.W.; Schmitt, M.; Krafft, C.; Marquardt, C.; Gharbi, A.; Knösel, T.; Stallmach, A.; Popp, J. Classification of inflammatory bowel diseases by means of Raman spectroscopic imaging of epithelium cells. J. Biomed. Opt. 2012, 17, 076030. [Google Scholar] [CrossRef] [Green Version]
  287. Popejoy, A.B.; Fullerton, S.M. Genomics is failing on diversity. Nature 2016, 538, 161–164. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  288. Sirugo, G.; Williams, S.M.; Tishkoff, S.A. The Missing Diversity in Human Genetic Studies. Cell 2019, 177, 26–31. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  289. Roberts, R.L.; Wallace, M.C.; Drake, J.M.; Stamp, L.K. Identification of a novel thiopurine S-methyltransferase allele (TPMT*37). Pharmacogenet. Genom. 2014, 24, 320–323. [Google Scholar] [CrossRef] [PubMed]
  290. Kim, J.H.; Cheon, J.H.; Hong, S.S.; Eun, C.S.; Byeon, J.S.; Hong, S.Y.; Kim, B.Y.; Kwon, S.H.; Kim, S.W.; Han, D.S.; et al. Influences of thiopurine methyltransferase genotype and activity on thiopurine-induced leukopenia in Korean patients with inflammatory bowel disease: A retrospective cohort study. J. Clin. Gastroenterol. 2010, 44, e242–e248. [Google Scholar] [CrossRef]
  291. Takatsu, N.; Matsui, T.; Murakami, Y.; Ishihara, H.; Hisabe, T.; Nagahama, T.; Maki, S.; Beppu, T.; Takaki, Y.; Hirai, F.; et al. Adverse reactions to azathioprine cannot be predicted by thiopurine S-methyltransferase genotype in Japanese patients with inflammatory bowel disease. J. Gastroenterol. Hepatol. 2009, 24, 1258–1264. [Google Scholar] [CrossRef]
  292. Yang, S.K.; Hong, M.; Baek, J.; Choi, H.; Zhao, W.; Jung, Y.; Haritunians, T.; Ye, B.D.; Kim, K.J.; Park, S.H.; et al. A common missense variant in NUDT15 confers susceptibility to thiopurine-induced leukopenia. Nat. Genet. 2014, 46, 1017–1020. [Google Scholar] [CrossRef]
  293. Mars, R.A.T.; Yang, Y.; Ward, T.; Houtti, M.; Priya, S.; Lekatz, H.R.; Tang, X.; Sun, Z.; Kalari, K.R.; Korem, T.; et al. Longitudinal Multi-omics Reveals Subset-Specific Mechanisms Underlying Irritable Bowel Syndrome. Cell 2020, 182, 1460–1473.e17. [Google Scholar] [CrossRef]
  294. Stanberry, L.; Mias, G.I.; Haynes, W.; Higdon, R.; Snyder, M.; Kolker, E. Integrative Analysis of Longitudinal Metabolomics Data from a Personal Multi-Omics Profile. Metabolites 2013, 3, 741–760. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  295. Zhou, W.; Sailani, M.R.; Contrepois, K.; Zhou, Y.; Ahadi, S.; Leopold, S.R.; Zhang, M.J.; Rao, V.; Avina, M.; Mishra, T.; et al. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature 2019, 569, 663–671.
  296. Collins, L.M. Analysis of Longitudinal Data: The Integration of Theoretical Model, Temporal Design, and Statistical Model. Annu. Rev. Psychol. 2005, 57, 505–528.
  297. Bodein, A.; Scott-Boyer, M.-P.; Perin, O.; Lê Cao, K.-A.; Droit, A. Interpretation of network-based integration from multi-omics longitudinal data. Nucleic Acids Res. 2021, 50, e27.
  298. Bodein, A.; Scott-Boyer, M.-P.; Perin, O.; Lê Cao, K.-A.; Droit, A. timeOmics: An R package for longitudinal multi-omics data integration. Bioinformatics 2021, 38, 577–579.
  299. Yao, M.D.; von Rosenvinge, E.C.; Groden, C.; Mannon, P.J. Multiple endoscopic biopsies in research subjects: Safety results from a National Institutes of Health series. Gastrointest. Endosc. 2009, 69, 906–910.
  300. Conesa, A.; Beck, S. Making multi-omics data accessible to researchers. Sci. Data 2019, 6, 251.
  301. Wallen, Z.D. Comparison study of differential abundance testing methods using two large Parkinson disease gut microbiome datasets derived from 16S amplicon sequencing. BMC Bioinform. 2021, 22, 265.
  302. Topol, E.J. Money back guarantees for non-reproducible results? BMJ 2016, 353, i2770.
  303. Gibert, A.; Marin, S.; Mouginot, P.; Archambeau, J.; Illes, M.; Ollivier, G.; Gandara, A.; Pujol, B. Non-reproducible signals of adaptation to elevation between open and understorey microhabitats in snapdragon plants. J. Evol. Biol. 2022, 35, 322–332.
  304. Kafkafi, N.; Golani, I.; Jaljuli, I.; Morgan, H.; Sarig, T.; Würbel, H.; Yaacoby, S.; Benjamini, Y. Addressing reproducibility in single-laboratory phenotyping experiments. Nat. Methods 2017, 14, 462–464.
  305. Martino, D.; Ben-Othman, R.; Harbeson, D.; Bosco, A. Multiomics and Systems Biology Are Needed to Unravel the Complex Origins of Chronic Disease. Challenges 2019, 10, 23.
  306. Favresse, J.; Bayart, J.-L.; Gruson, D.; Bernardini, S.; Clerico, A.; Perrone, M. The underestimated issue of non-reproducible cardiac troponin I and T results: Case series and systematic review of the literature. Clin. Chem. Lab. Med. 2021, 59, 1201–1211.
  307. Fricker, A.M.; Podlesny, D.; Fricke, W.F. What is new and relevant for sequencing-based microbiome research? A mini-review. J. Adv. Res. 2019, 19, 105–112.
  308. Teytelman, L. No more excuses for non-reproducible methods. Nature 2018, 560, 411.
  309. Xuan, Y.; Bateman, N.W.; Gallien, S.; Goetze, S.; Zhou, Y.; Navarro, P.; Hu, M.; Parikh, N.; Hood, B.L.; Conrads, K.A.; et al. Standardization and harmonization of distributed multi-center proteotype analysis supporting precision medicine studies. Nat. Commun. 2020, 11, 5248.
  310. Casadevall, A.; Fang, F.C. Reproducible Science. Infect. Immun. 2010, 78, 4972–4975.
  311. Dai, C.; Füllgrabe, A.; Pfeuffer, J.; Solovyeva, E.M.; Deng, J.; Moreno, P.; Kamatchinathan, S.; Kundu, D.J.; George, N.; Fexova, S.; et al. A proteomics sample metadata representation for multiomics integration and big data analysis. Nat. Commun. 2021, 12, 5854.
  312. Gorzelak, M.A.; Gill, S.K.; Tasnim, N.; Ahmadi-Vand, Z.; Jay, M.; Gibson, D.L. Methods for Improving Human Gut Microbiome Data by Reducing Variability through Sample Processing and Storage of Stool. PLoS ONE 2015, 10, e0134802.
  313. Liang, Y.; Dong, T.; Chen, M.; He, L.; Wang, T.; Liu, X.; Chang, H.; Mao, J.-H.; Hang, B.; Snijders, A.M.; et al. Systematic Analysis of Impact of Sampling Regions and Storage Methods on Fecal Gut Microbiome and Metabolome Profiles. mSphere 2020, 5, e00763-00719.
  314. Ladoukakis, E.; Kolisis, F.N.; Chatziioannou, A.A. Integrative workflows for metagenomic analysis. Front. Cell Dev. Biol. 2014, 2, 70.
  315. Ma, T.; Zhang, A. Omics Informatics: From Scattered Individual Software Tools to Integrated Workflow Management Systems. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 926–946.
  316. Jackson, M.; Kavoussanakis, K.; Wallace, E.W.J. Using prototyping to choose a bioinformatics workflow management system. PLoS Comput. Biol. 2021, 17, e1008622.
  317. Wratten, L.; Wilm, A.; Göke, J. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nat. Methods 2021, 18, 1161–1168.
  318. Kolenc, Ž.; Pirih, N.; Gretic, P.; Kunej, T. Top Trends in Multiomics Research: Evaluation of 52 Published Studies and New Ways of Thinking Terminology and Visual Displays. OMICS A J. Integr. Biol. 2021, 25, 681–692.
  319. Zheng, C.L.; Ratnakar, V.; Gil, Y.; McWeeney, S.K. Use of semantic workflows to enhance transparency and reproducibility in clinical omics. Genome Med. 2015, 7, 73.
  320. Nardini, C.; Dent, J.; Tieri, P. Editorial: Multi-omic data integration. Front. Cell Dev. Biol. 2015, 3, 46.
  321. Hudis, C.A. Big data: Are large prospective randomized trials obsolete in the future? Breast 2015, 24, S15–S18.
  322. Kim, H.-S.; Lee, S.; Kim, J.H. Real-world Evidence versus Randomized Controlled Trial: Clinical Research Based on Electronic Medical Records. J. Korean Med. Sci. 2018, 33, e213.
  323. Ladha, K.S.; Arora, V.S.; Dutton, R.P.; Hyder, J.A. Potential and pitfalls for big data in health research. Adv. Anesth. 2015, 33, 97–111.
  324. Wang, S.D. Opportunities and challenges of clinical research in the big-data era: From RCT to BCT. J. Thorac. Dis. 2013, 5, 721–723.
  325. Cheung, K.S.; Leung, W.K.; Seto, W.K. Application of Big Data analysis in gastrointestinal research. World J. Gastroenterol. 2019, 25, 2990–3008.
  326. Luecken, M.D.; Büttner, M.; Chaichoompu, K.; Danese, A.; Interlandi, M.; Mueller, M.F.; Strobl, D.C.; Zappia, L.; Dugas, M.; Colomé-Tatché, M.; et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 2022, 19, 41–50.
  327. Dimitrov, D.; Türei, D.; Garrido-Rodriguez, M.; Burmedi, P.L.; Nagai, J.S.; Boys, C.; Ramirez Flores, R.O.; Kim, H.; Szalai, B.; Costa, I.G.; et al. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nat. Commun. 2022, 13, 3224.
Figure 1. A pipeline for multi-omics data generation and analysis. Luminal GI, hepatobiliary, and pancreatic tissue can be sampled, homogenized, and used to generate multiple types of data from the same sample, such as DNA and RNA sequencing, as well as metabolomic and proteomic mass spectrometry-based and NMR-based data. These data can then be quality checked, cleaned, and processed into final datasets, which can be incorporated into a set of integrative analyses. Created with BioRender.com.
Figure 2. Example of a research design illustrating the benefits of using longitudinal multi-omics data in the context of evaluating changes in IBD treatment. Using stool and biopsy samples from multiple sites in the colon permits assessment of localized biosignatures that can be correlated for the development of diagnostics and therapeutics. Both pre- and post-treatment analyses facilitate the detection of biosignatures predicting therapeutic response. Using multi-omics data permits the inclusion of changes in the microbiome with genomic and metabolomic data; a holistic approach is also likely to generate novel hypotheses. Created with BioRender.com. D, day; Wk, week.
Table 1. Benefits and limitations to using particular datasets in GI and hepatobiliary research.

| Datasets | Common Data Types | Benefits in GI and Hepatobiliary Research | Possible Limitations | Refs |
|---|---|---|---|---|
| Genetics/Genomics | Whole genome, whole exome sequencing data | Many GI and hepatobiliary disorders have an under-characterized hereditary component | Most diseases are multifaceted; a large amount of data is needed to reveal true signals | [1] |
| Epigenetics/Epigenomics | Most commonly methyl-seq, ChIP-seq, and ATAC-seq | These data may offer enhanced diagnostic utility, particularly in the context of GI and hepatobiliary disorders with complex genetic etiologies | (1) Multiple factors such as the gut microbiome and diet greatly affect epigenetic regulation. (2) Detecting changes that can be universally tracked may be difficult without access to large amounts of data and clearly characterized subgroups. (3) Large number of cell types may render data interpretation difficult | [2] |
| Transcriptomics | mRNA-seq, total RNA-seq, targeted RNA-seq, scRNA-seq, and snRNA-seq | (1) Metatranscriptomics offers insights into the transcriptional activity of intestinal microbes, whose presence does not always correlate with bacterial activity. (2) Digestive organs comprise complex cell types with specific biomarkers that make them excellent candidates for scRNA-seq analyses | (1) Transcript expression does not always correlate with bacterial or human protein output; so, downstream validation must be performed to confirm findings. (2) The complex distribution of cell types makes it difficult to use or interpret bulk RNA-seq and other less costly methods for digestive organ research | [3,4,5] |
| Proteomics | NMR, integrated chromatography, and mass spectrometry data | Captures larger compounds than metabolomic analysis; this may be important in biomarker identification and validating transcriptional activity | Remains cost-prohibitive | [6] |
| Microbiome | 16S rRNA amplicon sequencing, shotgun metagenomic sequencing | (1) The intestinal microbiome contains the highest concentration of commensal bacteria residing on human tissue. (2) Large amounts of stool are relatively easy to collect with little human DNA present | (1) Findings derived from stool may differ from those derived from mucosal biopsies (i.e., taken at the source). (2) Association does not imply causation: bacterial profiles may change due to disease; findings must be carefully validated to attribute diseases to dysbiosis | [7,8,9,10] |
| Metabolomics | Liquid chromatography, gas chromatography, capillary electrophoresis, and ionic mobility spectrometry mass spectrometry, NMR data | (1) Useful for gut volatile compounds, the metabolites thought to be most associated with disease. (2) Metabolomics combined with gut microbiome data can reveal mechanistic targets, which is particularly useful in the study of GI and hepatobiliary disorders | (1) The preservation of volatile compounds requires use of a buffer or immediate sample processing and may still not adequately capture their presence or abundance. (2) Host variation in endogenous compounds, particularly those interacting with bacterial pathways, may complicate development as a diagnostic tool | [11,12] |
| Medical informatics | EHR, questionnaire, patient interview data | Procedures such as colonoscopy and FibroScan for GI and hepatobiliary disorders, respectively, have quantifiable datapoints that can be combined with clinical and demographic data for retrospective and prospective research | Inconsistent techniques and data input and variation require rigorous coordination, quality assessments, and large cohorts to accurately capture differences | [13] |
| Imaging data | X-ray, CT, MRI, endoscopy, capsule imaging data | AI-based interpretation of upper, lower, and video capsule endoscopy can capture findings that may otherwise be overlooked | Subjective interpretation by the operator is still required (relatively minor concern since these can be standardized) | [14,15] |

Refs, reference; ChIP, chromatin immunoprecipitation; NMR, nuclear magnetic resonance; EHR, electronic health record; CT, computerized axial tomography; MRI, magnetic resonance imaging; AI, artificial intelligence.

Alizadeh, M.; Sampaio Moura, N.; Schledwitz, A.; Patil, S.A.; Ravel, J.; Raufman, J.-P. Big Data in Gastroenterology Research. Int. J. Mol. Sci. 2023, 24, 2458. https://0-doi-org.brum.beds.ac.uk/10.3390/ijms24032458

AMA Style

Alizadeh M, Sampaio Moura N, Schledwitz A, Patil SA, Ravel J, Raufman J-P. Big Data in Gastroenterology Research. International Journal of Molecular Sciences. 2023; 24(3):2458. https://0-doi-org.brum.beds.ac.uk/10.3390/ijms24032458

Chicago/Turabian Style

Alizadeh, Madeline, Natalia Sampaio Moura, Alyssa Schledwitz, Seema A. Patil, Jacques Ravel, and Jean-Pierre Raufman. 2023. "Big Data in Gastroenterology Research" International Journal of Molecular Sciences 24, no. 3: 2458. https://0-doi-org.brum.beds.ac.uk/10.3390/ijms24032458

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop