Explainable Artificial Intelligence in Bioinformatic

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (15 April 2022) | Viewed by 7839

Special Issue Editor


Guest Editor
College of Medicine, Taipei Medical University, Taipei 11031, China
Interests: artificial intelligence; bioinformatics; biomedical and healthcare informatics; radiomics; medical imaging

Special Issue Information

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. It attracts researchers from many fields, including biology, computer science, mathematics, and statistics. Recently, rapidly improving explainable artificial intelligence (XAI) algorithms have made it possible to apply highly efficient data-mining tools to very large bioinformatics databases. XAI in bioinformatics spans both basic and clinical research, drawing on information about biological sequence function, protein structure, protein-protein interactions, single-cell sequencing, and more. Such analyses support drug design and discovery as well as the study of complex systems.

Keywords

Explainable AI

Machine learning and deep learning

Computational biology

Protein function prediction

Protein-protein interaction prediction

Single-cell sequencing analysis

Cancer genomics

RNA sequencing analysis

 

Published Papers (2 papers)


Research

14 pages, 1529 KiB  
Article
Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets
by Consolata Gakii, Paul O. Mireji and Richard Rimiru
Algorithms 2022, 15(1), 21; https://0-doi-org.brum.beds.ac.uk/10.3390/a15010021 - 10 Jan 2022
Cited by 6 | Viewed by 3613
Abstract
Analysis of high-dimensional data, with more features (p) than observations (N) (p > N), places significant demands on computational cost and memory. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination to select features for classification from two lung cancer RNAseq datasets. The selected features were discretized for association rule mining, in which support and lift were used to generate informative rules. Our results show that graph-based feature selection improved the performance of the sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers on both datasets. In association rule mining, features selected using the graph-based approach outperformed those selected by the other two feature-selection techniques at a support of 0.5 and a lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced from feature selection and classification alone. Therefore, graph-based feature selection combined with rule mining is a suitable way to select features and find associations between them in high-dimensional RNAseq data.
(This article belongs to the Special Issue Explainable Artificial Intelligence in Bioinformatic)
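The abstract names support and lift as the rule-quality measures but does not specify the discretization scheme or the mining algorithm. The Python sketch below shows how the two measures are conventionally computed, assuming (our assumption, not the authors' method) a median split of each gene's expression into `high`/`low` items and brute-force enumeration of single-item rules A → B in place of a full Apriori search. The helpers `discretize`, `support`, `lift`, and `pairwise_rules` are hypothetical names; the default thresholds merely mirror the support of 0.5 and lift of 2 quoted above.

```python
from itertools import permutations
from statistics import median


def discretize(samples, genes):
    """Hypothetical median split: each sample (dict gene -> expression)
    becomes a set of items such as 'TP53=high' or 'TP53=low'."""
    cut = {g: median(s[g] for s in samples) for g in genes}
    return [{f"{g}={'high' if s[g] > cut[g] else 'low'}" for g in genes}
            for s in samples]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """lift(A -> B) = support(A and B) / (support(A) * support(B))."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / (support(transactions, antecedent)
                   * support(transactions, consequent))


def pairwise_rules(transactions, min_support=0.5, min_lift=2.0):
    """Brute-force single-item rules A -> B that pass both thresholds
    (a simplification of Apriori-style mining, not the authors' procedure)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support(transactions, {a, b})
        if s >= min_support:
            lf = lift(transactions, {a}, {b})
            if lf >= min_lift:
                rules.append((a, b, s, lf))
    return rules


if __name__ == "__main__":
    # Toy expression values, not taken from the paper's datasets.
    samples = [{"A": 1, "B": 10}, {"A": 2, "B": 20},
               {"A": 3, "B": 30}, {"A": 4, "B": 40}]
    for rule in pairwise_rules(discretize(samples, ["A", "B"])):
        print(rule)
```

With these four toy samples, in which genes A and B rise together, the sketch returns four rules, among them `('A=high', 'B=high', 0.5, 2.0)`: the item pair occurs in half of the samples and twice as often as it would if the two genes were independent.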

21 pages, 3667 KiB  
Article
Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants
by Zahra Tayebi, Sarwan Ali and Murray Patterson
Algorithms 2021, 14(12), 348; https://0-doi-org.brum.beds.ac.uk/10.3390/a14120348 - 29 Nov 2021
Cited by 15 | Viewed by 2485
Abstract
The widespread availability of large amounts of genomic data on the SARS-CoV-2 virus, as a result of the COVID-19 pandemic, has created an opportunity for researchers to analyze the disease at a level of detail unlike any virus before it. On the one hand, this will help biologists, policymakers, and other authorities make timely and appropriate decisions to control the spread of the coronavirus. On the other hand, such studies will help us deal more effectively with any future pandemic. Because SARS-CoV-2 has many variants, each carrying different mutations, analyzing such data is difficult given its size. It is well known that much of the variation in the SARS-CoV-2 genome occurs disproportionately in the spike region of the genome sequence, the relatively short region that codes for the spike protein(s). In this paper, we propose a robust feature-vector representation of biological sequences that, when combined with an appropriate feature selection method, allows different downstream clustering approaches to perform well on a variety of measures. We apply this approach with an array of clustering techniques to cluster spike protein sequences in order to study the behavior of known variants whose prevalence is increasing at a very high rate throughout the world. We first use a k-mer-based approach to generate a fixed-length feature-vector representation of the spike sequences. We then show that, with appropriate feature selection, we can efficiently and effectively cluster the spike sequences by variant. Using a publicly available set of SARS-CoV-2 spike sequences, we cluster these sequences with both hard and soft clustering methods and show that our feature selection methods achieve higher F1 scores for the clusters, as well as better clustering quality metrics, than the baselines.
(This article belongs to the Special Issue Explainable Artificial Intelligence in Bioinformatic)
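The abstract states only that a k-mer-based approach turns each spike sequence into a fixed-length feature vector. The Python sketch below shows one common way to do this: count every overlapping length-k substring and place the counts in a vector indexed by all possible k-mers over the alphabet. The 20-letter amino-acid alphabet, the default k = 3, the decision to skip k-mers containing other symbols (such as `X` or gap characters), and the helper names `kmer_index` and `kmer_vector` are assumptions for illustration; the paper's feature-selection and clustering steps are not shown.

```python
from functools import lru_cache
from itertools import product

# Assumption: the 20 standard amino acids; other letters (e.g. 'X', '-') are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@lru_cache(maxsize=None)
def kmer_index(k, alphabet=AMINO_ACIDS):
    """Map every possible k-mer over `alphabet` to a fixed column position."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def kmer_vector(seq, k=3, alphabet=AMINO_ACIDS):
    """Return a fixed-length count vector of the overlapping k-mers in `seq`."""
    index = kmer_index(k, alphabet)
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing symbols outside the alphabet
            vec[j] += 1
    return vec


if __name__ == "__main__":
    # Toy fragments, not real spike sequences from the paper's dataset.
    seqs = ["MFVFLVLLPLVSS", "MFVFLVLLPLVXS"]
    matrix = [kmer_vector(s) for s in seqs]
    print(len(matrix[0]), sum(matrix[0]), sum(matrix[1]))  # prints: 8000 11 9
```

Every sequence maps to a vector of length 20**k (8,000 for k = 3) regardless of its own length, so the vectors can be stacked into one matrix and passed to a feature-selection step and then to hard or soft clustering.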
