When Genes Meet Artificial Intelligence and Machine Learning

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Bioinformatics".

Deadline for manuscript submissions: closed (20 April 2024) | Viewed by 2910

Special Issue Editor

Guest Editor
Institute for Food Safety and Health, Illinois Institute of Technology, Chicago, IL 60501, USA
Interests: microbial genomics; environmental microbiology; microbial ecology; bioinformatics; machine learning

Special Issue Information

Dear Colleagues, 

In recent years, artificial intelligence (AI) and machine learning (ML) have gained significant attention, with publications doubling every 2.5 years, and this interest now extends to the field of genetics. As genomic data accumulate rapidly, the application of machine learning in gene research has become increasingly important. This has led to the development of novel techniques and approaches that enable researchers to analyze and interpret large-scale genomic data.

The aim of this Special Issue of Genes is to showcase the latest advances in the field of machine learning and its application in gene research. This issue will provide a platform for researchers to share their experiences, successes, and challenges in the use of machine learning algorithms to analyze and interpret genetic data.

One example of the successful application of machine learning in gene research is PlasmidHunter, a tool developed by the Guest Editor. PlasmidHunter is a machine learning-based tool that uses sequence data to predict the type of DNA molecule (plasmid or chromosome) that sequences in bacterial genomes derive from. It achieves a reported accuracy of 96% and has been shown to outperform traditional methods for plasmid detection.

We invite researchers to submit original research articles, reviews, and perspectives that demonstrate the application of machine learning techniques in gene research. This issue aims to foster interdisciplinary collaborations between computer scientists and geneticists, and to promote the development of new tools and approaches that will advance our understanding of genetics and genomics.

Dr. Renmao Tian
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning
  • gene
  • genome
  • genomic data

Published Papers (2 papers)


Research

16 pages, 6399 KiB  
Article
DARDN: A Deep-Learning Approach for CTCF Binding Sequence Classification and Oncogenic Regulatory Feature Discovery
by Hyun Jae Cho, Zhenjia Wang, Yidan Cong, Stefan Bekiranov, Aidong Zhang and Chongzhi Zang
Genes 2024, 15(2), 144; https://doi.org/10.3390/genes15020144 - 23 Jan 2024
Viewed by 1083
Abstract
Characterization of gene regulatory mechanisms in cancer is a key task in cancer genomics. CCCTC-binding factor (CTCF), a DNA binding protein, exhibits specific binding patterns in the genome of cancer cells and has a non-canonical function to facilitate oncogenic transcription programs by cooperating with transcription factors bound at flanking distal regions. Identification of DNA sequence features from a broad genomic region that distinguish cancer-specific CTCF binding sites from regular CTCF binding sites can help find oncogenic transcription factors in a cancer type. However, the presence of long DNA sequences without localization information makes it difficult to perform conventional motif analysis. Here, we present DNAResDualNet (DARDN), a computational method that utilizes convolutional neural networks (CNNs) for predicting cancer-specific CTCF binding sites from long DNA sequences and employs DeepLIFT, a method for interpretability of deep learning models that explains the model’s output in terms of the contributions of its input features. The method is used to identify DNA sequence features associated with cancer-specific CTCF binding. Evaluation on DNA sequences associated with CTCF binding sites in T-cell acute lymphoblastic leukemia (T-ALL) and other cancer types demonstrates DARDN’s ability to distinguish DNA sequences surrounding cancer-specific CTCF binding sites from those surrounding control constitutive CTCF binding sites, and to identify sequence motifs for transcription factors potentially active in each specific cancer type. We identify potential oncogenic transcription factors in T-ALL, acute myeloid leukemia (AML), breast cancer (BRCA), colorectal cancer (CRC), lung adenocarcinoma (LUAD), and prostate cancer (PRAD). Our work demonstrates the power of advanced machine learning and feature discovery approaches in finding biologically meaningful information from complex high-throughput sequencing data.
(This article belongs to the Special Issue When Genes Meet Artificial Intelligence and Machine Learning)
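The abstract describes a CNN that reads long DNA sequences. As a generic illustration only, and not DARDN's published preprocessing or architecture, the sketch below shows how a DNA sequence is commonly one-hot encoded into a 4 × L matrix and how a single convolutional filter scores every window of that matrix. The function names `one_hot_encode` and `conv1d_scan` and the example filter are hypothetical.

```python
# Minimal sketch, assuming a generic one-hot + 1D convolution setup;
# this is NOT DARDN's actual pipeline, only the standard idea behind CNNs on DNA.
from typing import List

BASES = "ACGT"  # row order of the one-hot matrix


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return a 4 x len(seq) matrix; ambiguous bases (e.g. N) become all-zero columns."""
    seq = seq.upper()
    matrix = [[0] * len(seq) for _ in BASES]
    for pos, base in enumerate(seq):
        row = BASES.find(base)
        if row >= 0:
            matrix[row][pos] = 1
    return matrix


def conv1d_scan(matrix: List[List[int]], kernel: List[List[float]]) -> List[float]:
    """Slide one 4 x width filter along the sequence (valid cross-correlation),
    returning one score per window, as a single CNN filter does before activation."""
    width = len(kernel[0])
    length = len(matrix[0])
    scores = []
    for start in range(length - width + 1):
        score = 0.0
        for row in range(len(BASES)):
            for k in range(width):
                score += kernel[row][k] * matrix[row][start + k]
        scores.append(score)
    return scores


# Hypothetical filter that responds to the dinucleotide "CG".
CG_FILTER = [[0, 0], [1, 0], [0, 1], [0, 0]]
```

For example, `conv1d_scan(one_hot_encode("ACGCG"), CG_FILTER)` gives `[0.0, 2.0, 0.0, 2.0]`: the filter peaks wherever "CG" occurs. A trained CNN learns many such filters, and attribution methods such as DeepLIFT then indicate which input positions drove a prediction.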

14 pages, 303 KiB  
Article
When Protein Structure Embedding Meets Large Language Models
by Sarwan Ali, Prakash Chourasia and Murray Patterson
Genes 2024, 15(1), 25; https://doi.org/10.3390/genes15010025 - 23 Dec 2023
Viewed by 1349
Abstract
Protein structure analysis is essential in various bioinformatics domains such as drug discovery, disease diagnosis, and evolutionary studies. Within structural biology, the classification of protein structures is pivotal, employing machine learning algorithms to categorize structures based on data from databases like the Protein Data Bank (PDB). To predict protein functions, embeddings based on protein sequences have been employed. Creating numerical embeddings that preserve vital information while considering protein structure and sequence presents several challenges. The existing literature lacks a comprehensive and effective approach that combines structural and sequence-based features to achieve efficient protein classification. While large language models (LLMs) have exhibited promising outcomes for protein function prediction, their focus primarily lies on protein sequences, disregarding the 3D structures of proteins. The quality of embeddings heavily relies on how well the geometry of the embedding space aligns with the underlying data structure, posing a critical research question. Traditionally, Euclidean space has served as a widely utilized framework for embeddings. In this study, we propose a novel method for designing numerical embeddings in Euclidean space for proteins by leveraging 3D structure information, specifically employing the concept of contact maps. These embeddings are synergistically combined with features extracted from LLMs and traditional feature engineering techniques to enhance the performance of embeddings in supervised protein analysis. Experimental results on benchmark datasets, including PDB Bind and STCRDAB, demonstrate the superior performance of the proposed method for protein function prediction.
(This article belongs to the Special Issue When Genes Meet Artificial Intelligence and Machine Learning)
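A contact map records which residue pairs in a 3D structure lie close together. The sketch below is a hypothetical illustration, assuming C-alpha coordinates as input and a commonly used 8 Å distance cutoff; it is not the authors' embedding method. It builds a binary contact map and flattens its upper triangle into a fixed-length vector by padding or truncating to `max_len` residues, giving a point in Euclidean space that could, for instance, be concatenated with sequence-derived features. The names `contact_map`, `contact_embedding`, and `max_len` are illustrative only.

```python
# Hypothetical sketch: contact map -> fixed-length Euclidean vector.
# Assumes one (x, y, z) coordinate per residue (e.g. C-alpha atoms) in angstroms.
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], threshold: float = 8.0) -> List[List[int]]:
    """Binary n x n map: entry (i, j) is 1 when residues i and j are within
    `threshold` angstroms of each other (the diagonal is 1, distance 0)."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def contact_embedding(coords: Sequence[Coord], max_len: int,
                      threshold: float = 8.0) -> List[int]:
    """Flatten the upper triangle (i < j) of the contact map into a vector of
    length max_len * (max_len - 1) / 2. Proteins longer than max_len are
    truncated; shorter ones are zero-padded so every protein gets the same size."""
    cmap = contact_map(coords[:max_len], threshold)
    n = len(cmap)
    vec = []
    for i in range(max_len):
        for j in range(i + 1, max_len):
            vec.append(cmap[i][j] if j < n else 0)  # i < j, so j < n implies i < n
    return vec
```

With three residues at x = 0, 3, and 20 Å, only the first pair is in contact, so `contact_embedding(coords, max_len=4)` returns `[1, 0, 0, 0, 0, 0]`. Padding to a fixed size is the simplest way to make proteins of different lengths comparable; it is one choice among several, and not necessarily the one used in the study.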
