Research

12 pages, 6026 KiB

Open Access Article

Identifying the Common Cell-Free DNA Biomarkers across Seven Major Cancer Types

by Mingyu Luo, Yining Liu and Min Zhao

Biology 2023, 12(7), 934; https://0-doi-org.brum.beds.ac.uk/10.3390/biology12070934 - 29 Jun 2023

Viewed by 1083

Blood-based detection of circulating cell-free DNA (cfDNA) is a non-invasive and easily accessible method for early cancer detection. Despite the extensive utility of cfDNA, there are still many challenges to developing clinical biomarkers. For example, cfDNA with genetic alterations often composes a small portion of the DNA circulating in plasma, which can be confounded by cfDNA contributed by normal cells. Therefore, filtering out the potential false-positive cfDNA mutations from healthy populations will be important for cancer-based biomarkers. Additionally, many low-frequency genetic alterations are easily overlooked in a small number of cfDNA-based cancer tests. We hypothesize that the combination of diverse types of cancer studies on cfDNA will provide us with a new perspective on the identification of low-frequency genetic variants across cancer types for promoting early diagnosis. By building a standardized computational pipeline for 1358 cfDNA samples across seven cancer types, we prioritized 129 shared genetic variants in the major cancer types. Further functional analysis of the 129 variants found that they are mainly enriched in ribosome pathways such as cotranslational protein targeting to the membrane, some of which are tumour suppressors, oncogenes, and genes related to cancer initiation. In summary, our integrative analysis revealed the important roles of ribosome proteins as common biomarkers in early cancer diagnosis. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)

12 pages, 2532 KiB

Open Access Article

A Database of Lung Cancer-Related Genes for the Identification of Subtype-Specific Prognostic Biomarkers

by Yining Liu, Min Zhao and Hong Qu

Biology 2023, 12(3), 357; https://0-doi-org.brum.beds.ac.uk/10.3390/biology12030357 - 24 Feb 2023

Cited by 1 | Viewed by 2014

Abstract

The molecular subtype is critical for accurate treatment and follow-up in patients with lung cancer; however, information regarding subtype-associated genes is dispersed among thousands of published studies. Systematic curation and cross-validation of the scientific literature would provide a solid foundation for comparative genetic studies of the major molecular subtypes of lung cancer. Here, we constructed a literature-based lung cancer gene database (LCGene). In the current release, we collected and curated 2507 unique human genes, including 2267 protein-coding and 240 non-coding genes from comprehensive manual examination of 10,960 PubMed article abstracts. Extensive annotations were added to aid identification of differentially expressed genes, potential gene editing sites, and non-coding gene regulation. For instance, we prepared 607 curated genes with CRISPR knockout information in 43 lung cancer cell lines. Further comparison of these implicated genes among different subtypes identified several subtype-specific genes with high mutational frequencies. Common tumor suppressors and oncogenes shared by lung adenocarcinoma and lung squamous cell carcinoma, for example, exhibited different mutational frequencies and prognostic features, suggesting the presence of subtype-specific biomarkers. Our retrospective analysis revealed 43 small cell lung cancer-specific genes. Moreover, 52 tumor suppressors and oncogenes shared by lung adenocarcinoma and squamous cell carcinoma confirmed the different molecular mechanisms of these two cancer subtypes. The subtype-based genetic differences, when combined, may provide insight into subtype-specific biomarkers for genetic testing. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)

28 pages, 2298 KiB

Open Access Article

Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity

by Bahrad A. Sokhansanj, Zhengqiao Zhao and Gail L. Rosen

Biology 2022, 11(12), 1786; https://0-doi-org.brum.beds.ac.uk/10.3390/biology11121786 - 08 Dec 2022

Cited by 5 | Viewed by 1835

Abstract

Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture’s interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent input to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron’s reduced risk of severe disease, in accord with epidemiological and experimental data. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)

17 pages, 4266 KiB

Open Access Article

A Genomics Resource for 12 Edible Seaweeds to Predict Seaweed-Secreted Peptides with Potential Anti-Cancer Function

by Yining Liu, Scott F. Cummins and Min Zhao

Biology 2022, 11(10), 1458; https://0-doi-org.brum.beds.ac.uk/10.3390/biology11101458 - 04 Oct 2022

Viewed by 1815

Abstract

Seaweeds are multicellular marine macroalgae with natural compounds that have potential anticancer activity. To date, the identification of those compounds has relied on purification and assay, yet few have been documented. Additionally, the genomes and associated proteomes of edible seaweeds that have been identified thus far are scattered among different resources, with no systematic summary available, which hinders the development of a large-scale omics analysis. To enable this, we constructed a comprehensive genomics resource for the edible seaweeds. These data could be used for systematic metabolomics and a proteome search for anti-cancer compounds and peptides. In brief, we integrated and annotated 12 publicly available edible seaweed genomes (8 species and 268,071 proteins). In addition, we integrated the new seaweed genomic resources with established cancer bioinformatics pipelines to help identify potential seaweed proteins that could help mitigate the development of cancer. We present 7892 protein domains that were predicted to be associated with cancer proteins based on a protein domain–domain interaction. The most enriched protein families were associated with protein phosphorylation and insulin signalling, both of which are recognised to be crucial molecular components for patient survival in various cancers. In addition, we found 6692 seaweed proteins that could interact with over 100 tumour suppressor proteins, of which 147 are predicted to be secreted proteins. In conclusion, our genomics resource not only may be helpful in exploring the genomics features of these edible seaweeds but also may provide a new avenue to explore the molecular mechanisms for seaweed-associated inhibition of human cancer development. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)

18 pages, 3951 KiB

Open Access Article

Integrated In Silico Analyses Identify PUF60 and SF3A3 as New Spliceosome-Related Breast Cancer RNA-Binding Proteins

by Jennyfer M. García-Cárdenas, Isaac Armendáriz-Castillo, Andy Pérez-Villa, Alberto Indacochea, Andrea Jácome-Alvarado, Andrés López-Cortés and Santiago Guerrero

Biology 2022, 11(4), 481; https://0-doi-org.brum.beds.ac.uk/10.3390/biology11040481 - 22 Mar 2022

Cited by 3 | Viewed by 3195

Abstract

More women are diagnosed with breast cancer (BC) than any other type of cancer. Although large-scale efforts have completely redefined cancer, a cure remains unattainable. In that respect, new molecular functions of the cell should be investigated, such as post-transcriptional regulation. RNA-binding proteins (RBPs) are emerging as critical post-transcriptional modulators of tumorigenesis, but only a few have clear roles in BC. To recognize new putative breast cancer RNA-binding proteins, we performed integrated in silico analyses of all human RBPs (n = 1392) in three major cancer databases and identified five putative BC RBPs (PUF60, TFRC, KPNB1, NSF, and SF3A3), which showed robust oncogenic features related to their genomic alterations, immunohistochemical changes, high interconnectivity with cancer driver genes (CDGs), and tumor vulnerabilities. Interestingly, some of these RBPs have never been studied in BC, but their oncogenic functions have been described in other cancer types. Subsequent analyses revealed PUF60 and SF3A3 as central elements of a spliceosome-related cluster involving RBPs and CDGs. Further research should focus on the mechanisms by which these proteins could promote breast tumorigenesis, with the potential to reveal new therapeutic pathways along with novel drug-development strategies. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)

15 pages, 907 KiB

Open Access Article

Integration of Multimodal Data from Disparate Sources for Identifying Disease Subtypes

by Kaiyue Zhou, Bhagya Shree Kottoori, Seeya Awadhut Munj, Zhewei Zhang, Sorin Draghici and Suzan Arslanturk

Biology 2022, 11(3), 360; https://0-doi-org.brum.beds.ac.uk/10.3390/biology11030360 - 24 Feb 2022

Cited by 5 | Viewed by 3160

Abstract

Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, understanding the progression risk and differentiating long- and short-term survivors cannot be achieved by analyzing data from a single modality due to the heterogeneity of disease. Using a scientifically developed and tested deep-learning approach that leverages aggregate information collected from multiple repositories with multiple modalities (e.g., mRNA, DNA Methylation, miRNA) could lead to a more accurate and robust prediction of disease progression. Here, we propose an autoencoder based multimodal data fusion system, in which a fusion encoder flexibly integrates collective information available through multiple studies with partially coupled data. Our results on a fully controlled simulation-based study have shown that inferring the missing data through the proposed data fusion pipeline allows a predictor that is superior to other baseline predictors with missing modalities. Results have further shown that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be successfully differentiated with an AUC of 0.94, 0.75, and 0.96, respectively. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)

17 pages, 4035 KiB

Open Access Article

Identification of Key Proteins from the Alternative Lengthening of Telomeres-Associated Promyelocytic Leukemia Nuclear Bodies Pathway

by Isaac Armendáriz-Castillo, Katherine Hidalgo-Fernández, Andy Pérez-Villa, Jennyfer M. García-Cárdenas, Andrés López-Cortés and Santiago Guerrero

Biology 2022, 11(2), 185; https://0-doi-org.brum.beds.ac.uk/10.3390/biology11020185 - 25 Jan 2022

Cited by 3 | Viewed by 3520

Abstract

Alternative lengthening of telomeres-associated promyelocytic leukemia nuclear bodies (APBs) are a hallmark of telomere maintenance. In the last few years, APBs have been described as the main place where telomeric extension occurs in ALT-positive cancer cell lines. A different set of proteins has been associated with APBs function; however, the molecular mechanisms behind their assembly, colocalization, and clustering of telomeres, among others, remain unclear. To improve the understanding of APBs in the ALT pathway, we integrated multiomics analyses to evaluate genomic, transcriptomic and proteomic alterations, and functional interactions of 71 APBs-related genes/proteins in 32 Pan-Cancer Atlas studies from The Cancer Genome Atlas Consortium (TCGA). As a result, we identified 13 key proteins that showed distinctive mutations, interactions, and functional enrichment patterns across all the cancer types. We propose this set of proteins as candidates for future ex vivo and in vivo analyses to validate them, improve the understanding of the ALT pathway, fill the current research gap about APBs function and their role in ALT, and evaluate them as potential therapeutic targets for the diagnosis and treatment of ALT-positive cancers in the future. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)

Review

23 pages, 1835 KiB

Open Access Review

Integration of Omics Data and Network Models to Unveil Negative Aspects of SARS-CoV-2, from Pathogenic Mechanisms to Drug Repurposing

by Letizia Bernardo, Andrea Lomagno, Pietro Luigi Mauri and Dario Di Silvestre

Biology 2023, 12(9), 1196; https://0-doi-org.brum.beds.ac.uk/10.3390/biology12091196 - 31 Aug 2023

Viewed by 1305

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the COVID-19 health emergency, affecting and killing millions of people worldwide. Following SARS-CoV-2 infection, COVID-19 patients show a spectrum of symptoms ranging from asymptomatic to very severe manifestations. In particular, bronchial and pulmonary cells, involved at the initial stage, trigger a hyper-inflammation phase, damaging a wide range of organs, including the heart, brain, liver, intestine and kidney. Due to the urgent need for solutions to limit the virus’ spread, most efforts were initially devoted to mapping outbreak trajectories and variant emergence, as well as to the rapid search for effective therapeutic strategies. Samples collected from hospitalized or deceased COVID-19 patients from the early stages of the pandemic have been analyzed over time, and to date they still represent an invaluable source of information to shed light on the molecular mechanisms underlying the organ/tissue damage, the knowledge of which could offer new opportunities for diagnostics and therapeutic designs. For these purposes, in combination with clinical data, omics profiles and network models play a key role in providing a holistic view of the pathways, processes and functions most affected by viral infection. In fact, in addition to epidemiological purposes, networks are being increasingly adopted for the integration of multiomics data, and recently their use has expanded to the identification of drug targets or the repositioning of existing drugs. These topics will be covered here by exploring the landscape of SARS-CoV-2 survey-based studies using systems biology approaches derived from omics data, paying particular attention to those that have considered samples of human origin. Full article

(This article belongs to the Special Issue Multi-omics Data Integration in Complex Diseases)
