Machine Learning Techniques in Molecular Function and Structure Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2022) | Viewed by 8689

Special Issue Editor


Prof. Leyi Wei
Guest Editor
School of Software, Shandong University, Jinan 250300, China
Interests: bioinformatics; machine learning; data mining

Special Issue Information

Dear Colleagues,

The rapid growth in the dimensionality of biological data is a challenge for traditional analysis methods in bioinformatics and computational biology. Accordingly, there is an urgent need for computational methods that exploit these masses of molecular data more effectively, characterize molecular structures from such large data, and reveal their functional roles in biological processes. For this purpose, machine learning naturally appears as one of the main drivers of progress. Machine learning and pattern recognition techniques can extract useful patterns hidden in large-scale data and use these patterns to make accurate predictions on future data. In recent years, bioinformatics has itself driven significant new developments of general interest in machine learning, for example in learning with structured data, graph inference, semi-supervised learning, and novel combinations of optimization and learning algorithms. Please note that all submitted papers should be within the scope of the journal.

In this Special Issue, we will explore the potential of applying machine learning and related computational techniques to mine and model a significant amount of molecular data for structural and functional analysis. Possible research topics include, but are not limited to:

Modeling and analysis of gene expression data;

Prediction and analysis of gene regulatory elements;

Reconstruction and inference of biological networks;

Prediction of protein function, protein–protein interactions and interaction sites;

Identification of essential genes and biomarkers for disease diagnosis and prognosis.

Prof. Leyi Wei
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, you can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)

Research

14 pages, 3188 KiB  
Article
Predicting Algorithm of Tissue Cell Ratio Based on Deep Learning Using Single-Cell RNA Sequencing
by Zhendong Liu, Xinrong Lv, Xi Chen, Dongyan Li, Mengying Qin, Ke Bai, Yurong Yang, Xiaofeng Li and Peng Zhang
Appl. Sci. 2022, 12(12), 5790; https://0-doi-org.brum.beds.ac.uk/10.3390/app12125790 - 07 Jun 2022
Cited by 1 | Viewed by 1458
Abstract
Background: Understanding the proportion of cell types in heterogeneous tissue samples is important in bioinformatics. Inferring these proportions from bulk RNA sequencing data is challenging because most traditional algorithms for predicting tissue cell ratios rely heavily on standardized cell-type-specific gene expression profiles and do not consider tissue heterogeneity. As a result, their prediction accuracy is limited and they lack robustness, so new approaches are urgently needed. Methods: In this study, we introduced Autoptcr, an algorithm that automatically predicts tissue cell ratios. The algorithm is trained on data simulated from single-cell RNA sequencing (scRNA-seq) and uses convolutional neural networks (CNNs) to extract intrinsic relationships between genes and predict the cell proportions of tissues. Results: We trained the algorithm using simulated bulk samples and made predictions using real bulk PBMC data. Compared with existing advanced algorithms, Autoptcr achieved the highest Pearson correlation coefficient between predicted and actual values, reaching 0.903. Tested on a bulk sample, its Lin's concordance correlation coefficient was 41% higher than that of CSx. The algorithm can infer tissue cell proportions directly from tissue gene expression data. Conclusions: The Autoptcr algorithm uses simulated scRNA-seq data for training to avoid the dependence on cell-type-specific gene expression profiles. It also has high prediction accuracy and strong noise resistance for the tissue cell ratio. This work is expected to provide new research ideas for the prediction of tissue cell proportions.
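
The abstract above reports a Pearson correlation coefficient and a correlation coefficient attributed to Lin. As an illustrative sketch only, assuming the latter refers to Lin's concordance correlation coefficient, the two metrics can be computed as follows. The function names and the example proportions are hypothetical and are not taken from the paper.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation between actual and predicted cell proportions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def lins_ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population-variance form)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Unlike Pearson's r, the denominator penalizes a shift between the means.
    return float(2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2))


# Hypothetical proportions: predictions are perfectly correlated with the
# actual values but systematically offset by 0.1.
actual = [0.1, 0.2, 0.3, 0.4]
predicted = [0.2, 0.3, 0.4, 0.5]

r = pearson_r(actual, predicted)
ccc = lins_ccc(actual, predicted)
```

In this made-up example the Pearson coefficient is 1 while the concordance coefficient is lower, because the concordance coefficient also measures agreement in scale and location, not only linear association.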

21 pages, 7853 KiB  
Article
Dual Head and Dual Attention in Deep Learning for End-to-End EEG Motor Imagery Classification
by Meiyan Xu, Junfeng Yao and Hualiang Ni
Appl. Sci. 2021, 11(22), 10906; https://0-doi-org.brum.beds.ac.uk/10.3390/app112210906 - 18 Nov 2021
Cited by 1 | Viewed by 2489
Abstract
Event-Related Desynchronization (ERD) or Electroencephalogram (EEG) wavelet is essential for motor imagery (MI) classification and BMI (Brain–Machine Interface) application. However, it is difficult to recognize multiple tasks for untrained subjects, given the complexities of the task and the uncertainties in the environment. The subject-independent scenario, where an inter-subject trained model can be directly applied to new users without precalibration, is particularly desired. Therefore, this paper focuses on an effective attention mechanism which can be applied to a subject-independent set to learn EEG motor imagery features. Firstly, a custom form of sequence inputs with spatial and temporal dimensions is adopted for dual-headed attention via a deep convolution net (DHDANet). Secondly, DHDANet simultaneously learns temporal and spatial features. The spatial attention features on each input head are then divided into two parts for subsequent spatial attentional learning. The proposed model is validated on EEG-MI signals collected from 54 subjects in two sessions with 200 trials in each session. The classification of left and right hand motor imagery in this paper achieves an average accuracy of 75.52%, a significant improvement compared to state-of-the-art methods. In addition, the visualization of the frequency analysis method demonstrates that the temporal convolution and spectral attention are capable of identifying the ERD for EEG-MI. The proposed machine learning structure enables cross-session and cross-subject classification and makes significant progress in the BMI transfer learning problem.

19 pages, 3060 KiB  
Article
6mAPred-MSFF: A Deep Learning Model for Predicting DNA N6-Methyladenine Sites across Species Based on a Multi-Scale Feature Fusion Mechanism
by Rao Zeng and Minghong Liao
Appl. Sci. 2021, 11(16), 7731; https://0-doi-org.brum.beds.ac.uk/10.3390/app11167731 - 22 Aug 2021
Cited by 5 | Viewed by 2319
Abstract
DNA methylation is one of the most extensive epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulation processes. An accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness limit their usability in epigenetic studies, which indicates a strong need to develop new computational methods for this problem. In this paper, we developed a novel computational predictor, namely 6mAPred-MSFF, which is a deep learning framework based on a multi-scale feature fusion mechanism to identify 6mA sites across different species. In the predictor, we integrate the inverted residual block and multi-scale attention mechanism to build lightweight and deep neural networks. Compared to existing predictors using traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features and captures better characteristics of 6mA sites. In benchmark comparisons, our deep learning method outperforms the state-of-the-art methods on the 5-fold cross-validation test on the seven datasets of six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, our proposed 6mAPred-MSFF gives a sensitivity and specificity of 97.88% and 94.64%, respectively, in 5-fold cross-validation on the 6mA-rice-Lv dataset. Our model trained with the rice data also predicts the 6mA sites of five other species well: Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster with a prediction accuracy 98.51%, 93.02%, and 91.53%, respectively. Moreover, via experimental comparison, we explored the performance impact of training and testing our proposed model under different encoding schemes and feature descriptors.
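
The abstract above reports sensitivity and specificity for a binary site predictor. As a minimal sketch of how these two metrics are defined, assuming labels are encoded as 1 for a 6mA site and 0 for a non-site, they can be computed from confusion-matrix counts as follows. The function name and the example labels are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels in {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # fraction of real sites that are detected
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    return sensitivity, specificity


# Hypothetical labels for eight candidate sites.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(labels, predictions)
```

In a k-fold cross-validation setting, one common choice is to compute these metrics per held-out fold and average them; whether the paper averages per fold or pools predictions is not stated in the abstract.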

16 pages, 2613 KiB  
Article
Accurate Prediction and Key Feature Recognition of Immunoglobulin
by Yuxin Gong, Bo Liao, Dejun Peng and Quan Zou
Appl. Sci. 2021, 11(15), 6894; https://0-doi-org.brum.beds.ac.uk/10.3390/app11156894 - 27 Jul 2021
Cited by 5 | Viewed by 1515
Abstract
Immunoglobulin, which is also called an antibody, is a type of serum protein produced by B cells that can specifically bind to the corresponding antigen. Immunoglobulin is closely related to many diseases and plays a key role in medical and biological research. Therefore, the use of effective methods to improve the accuracy of immunoglobulin classification is of great significance for disease research. In this paper, the CC-PSSM and monoTriKGap methods were selected to extract the immunoglobulin features, MRMD1.0 and MRMD2.0 were used to reduce the feature dimension, and the effect of discriminating the two-dimensional key features identified by the single dimension reduction method from the mixed two-dimensional key features was used to distinguish the immunoglobulins. The results indicated that monoTriKGap (k = 1) can accurately predict 99.5614% of immunoglobulins under 5-fold cross-validation. In addition, CC-PSSM is the best method for identifying mixed two-dimensional key features and can distinguish 92.1053% of immunoglobulins. These results indicate that the method used in this paper is reliable for predicting immunoglobulin and identifying key features.