Data and Low-Data Tools for Artificial Intelligence in Medicinal Chemistry

A special issue of Molecules (ISSN 1420-3049). This special issue belongs to the section "Medicinal Chemistry".

Deadline for manuscript submissions: closed (15 August 2022) | Viewed by 15193

Special Issue Editors


Dr. Angelica Mazzolari
Guest Editor
Department of Pharmaceutical Sciences, University of Milan, Via Mangiagalli 25, 20133 Milan, Italy
Interests: prediction of metabolic reactions and toxicity mechanisms; prediction of the bioactivity profile of new drug candidates; structure-based research applied to drug design; development of new strategies to improve the performance of predictive algorithms

Dr. Francesca Grisoni
Guest Editor
Department of Biomedical Engineering, Institute for Complex Molecular Systems, Eindhoven University of Technology, Groene Loper 3, 5612 AE, Eindhoven, The Netherlands
Interests: drug discovery; virtual screening; QSAR; machine learning; molecular descriptors; de novo design; generative deep learning

Special Issue Information

Dear Colleagues,

In the last few years, the scientific community has witnessed the renaissance of so-called “artificial intelligence” (AI) methods in many scientific domains. Machine and deep learning methods have the potential to transform people’s lives in multiple aspects and sectors (healthcare, education, marketing, etc.), possibly translating into a general benefit for society.

This unparalleled emergence of AI can also be observed in medicinal chemistry and toxicology, where machine learning is starting to be routinely applied to several tasks, such as property prediction, retrosynthesis planning, and molecule generation. In medicinal chemistry, however, unlike in other fields (e.g., image recognition, language translation), the potential of “data-hungry” machine and deep learning algorithms is often limited by a lack of data, in terms of both quantity and quality. For this reason, the scientific community needs high-quality datasets, open access curation pipelines, and AI tools tailored for low-data regimes.

In this Special Issue, we welcome original research articles and reviews that aim to improve the current status of data and their usage for AI in medicinal chemistry and related fields. The Special Issue will include, but is not limited to, ligand- and structure-based approaches, molecular design, virtual screening, target identification, and drug repurposing, as well as bioactivity, safety, and ADMET property prediction. We particularly welcome papers focused on the creation and curation of novel datasets or describing the curation and/or usefulness of well-established databases. We also encourage the submission of papers addressing the development or application of AI approaches in low-data regimes. Papers providing accessible code and data are also particularly welcome.

To further improve the impact of the proposed Special Issue, upon acceptance and agreement with the authors, the datasets will be collected in a dedicated repository on Zenodo (Zenodo.org) and assigned a DOI identifier.

We look forward to receiving your submissions!

Dr. Angelica Mazzolari
Dr. Francesca Grisoni
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Molecules is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning
  • medicinal chemistry
  • small dataset
  • high-quality dataset
  • data curation
  • drug repurposing
  • virtual screening
  • ADMET

Published Papers (6 papers)

Research

22 pages, 2647 KiB  
Article
Prediction Models for Brain Distribution of Drugs Based on Biomimetic Chromatographic Data
by Theodosia Vallianatou, Fotios Tsopelas and Anna Tsantili-Kakoulidou
Molecules 2022, 27(12), 3668; https://0-doi-org.brum.beds.ac.uk/10.3390/molecules27123668 - 07 Jun 2022
Cited by 5 | Viewed by 1657
Abstract
The development of high-throughput approaches for the valid estimation of brain disposition is of great importance in the early screening of drug candidates. However, the complexity of brain tissue, which is protected by a unique vasculature formation called the blood–brain barrier (BBB), complicates the development of robust in silico models. In addition, most computational approaches focus only on brain permeability data without considering the crucial factors of plasma and tissue binding. In the present study, we combined experimental data obtained by HPLC using three biomimetic columns, i.e., immobilized artificial membranes, human serum albumin, and α1-acid glycoprotein, with molecular descriptors to model brain disposition of drugs. Data on Kp,uu,brain (the ratio of the unbound drug concentration in the brain interstitial fluid to the corresponding plasma concentration), brain permeability, the unbound fraction in the brain, and the brain unbound volume of distribution were collected from the literature. Given the complexity of the investigated biological processes, the extracted models displayed high statistical quality (R² > 0.6), while in the case of the brain fraction unbound, the models showed excellent performance (R² > 0.9). All models were thoroughly validated, and their applicability domain was estimated. Our approach highlighted the importance of phospholipid, as well as tissue and protein, binding in balance with BBB permeability in brain disposition and suggests biomimetic chromatography as a rapid and simple technique to construct models with experimental evidence for the early evaluation of CNS drug candidates.
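
For readers unfamiliar with the quantity, the verbal definition of Kp,uu,brain given in the abstract can be written as a simple ratio. The concentration symbols below are notation assumed here for illustration and are not taken from the paper.

```latex
% C_{u,brainISF}: unbound drug concentration in brain interstitial fluid (assumed symbol)
% C_{u,plasma}:   unbound drug concentration in plasma (assumed symbol)
\[
  K_{p,uu,\mathrm{brain}} = \frac{C_{u,\mathrm{brainISF}}}{C_{u,\mathrm{plasma}}}
\]
```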

13 pages, 3795 KiB  
Article
A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics
by Laura Isigkeit, Apirat Chaikuad and Daniel Merk
Molecules 2022, 27(8), 2513; https://0-doi-org.brum.beds.ac.uk/10.3390/molecules27082513 - 13 Apr 2022
Cited by 10 | Viewed by 3306
Abstract
Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage, advocating the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This allowed an improved coverage of compound space and targets, and an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprised more than 1.1 million compounds with over 10.9 million bioactivity data points with annotations on assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.
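
The general idea of comparing bioactivity values across sources to flag potentially erroneous entries can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the record layout, the source names, the use of the median as the consensus value, and the `max_spread` threshold are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (compound_id, target_id, source, activity in log units).
RECORDS = [
    ("CPD1", "TGT1", "source_a", 7.1),
    ("CPD1", "TGT1", "source_b", 7.3),
    ("CPD2", "TGT1", "source_a", 5.0),
    ("CPD2", "TGT1", "source_c", 8.2),
    ("CPD3", "TGT2", "source_b", 6.4),
]


def build_consensus(records, max_spread=1.0):
    """Group values per compound/target pair and flag pairs whose sources disagree.

    max_spread is an illustrative threshold (in log units), not a published value.
    """
    grouped = defaultdict(list)
    for compound, target, source, value in records:
        grouped[(compound, target)].append((source, value))

    consensus = {}
    for key, entries in grouped.items():
        values = [value for _, value in entries]
        spread = max(values) - min(values)
        consensus[key] = {
            "median": median(values),
            "n_sources": len({source for source, _ in entries}),
            "flagged": spread > max_spread,  # large disagreement: inspect manually
        }
    return consensus


CONSENSUS = build_consensus(RECORDS)
```

In this toy input, the pair with two closely agreeing sources is kept unflagged, while the pair whose two sources differ by more than the threshold is marked for inspection.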

15 pages, 2568 KiB  
Article
CRNNTL: Convolutional Recurrent Neural Network and Transfer Learning for QSAR Modeling in Organic Drug and Material Discovery
by Yaqin Li, Yongjin Xu and Yi Yu
Molecules 2021, 26(23), 7257; https://0-doi-org.brum.beds.ac.uk/10.3390/molecules26237257 - 30 Nov 2021
Cited by 7 | Viewed by 2460
Abstract
Molecular latent representations, derived from autoencoders (AEs), have been widely used for drug or material discovery over the past couple of years. In particular, a variety of machine learning methods based on latent representations have shown excellent performance on quantitative structure–activity relationship (QSAR) modeling. However, the sequential nature of these representations has not been considered in most cases. In addition, data scarcity is still the main obstacle for deep learning strategies, especially for bioactivity datasets. In this study, we propose the convolutional recurrent neural network and transfer learning (CRNNTL) method, inspired by applications in polyphonic sound detection and electrocardiogram classification. Our model takes advantage of both convolutional and recurrent neural networks for feature extraction, as well as a data augmentation method. According to QSAR modeling on 27 datasets, CRNNTL can outperform or compete with state-of-the-art methods for both drug and material properties. In addition, the results on one isomer-based dataset indicate that its excellent performance results from an improved ability to extract global features while the ability to extract local features is maintained. The transfer learning results then show that CRNNTL can overcome data scarcity when relevant source datasets are chosen. Finally, the high versatility of our model is shown by using different latent representations from other types of AEs as inputs.

14 pages, 2856 KiB  
Article
Parsimonious Optimization of Multitask Neural Network Hyperparameters
by Cecile Valsecchi, Viviana Consonni, Roberto Todeschini, Marco Emilio Orlandi, Fabio Gosetti and Davide Ballabio
Molecules 2021, 26(23), 7254; https://0-doi-org.brum.beds.ac.uk/10.3390/molecules26237254 - 30 Nov 2021
Cited by 10 | Viewed by 2372
Abstract
Neural networks are rapidly gaining popularity in chemical modeling and Quantitative Structure–Activity Relationship (QSAR) thanks to their ability to handle multitask problems. However, outcomes of neural networks depend on the tuning of several hyperparameters, whose small variations can often strongly affect their performance. Hence, optimization is a fundamental step in training neural networks although, in many cases, it can be very expensive from a computational point of view. In this study, we compared four of the most widely used approaches for tuning hyperparameters, namely, grid search, random search, tree-structured Parzen estimator, and genetic algorithms, on three multitask QSAR datasets. We focused mainly on parsimonious optimization, taking into account not only the performance of the neural networks but also the computational time. Furthermore, since the optimization approaches do not directly provide information about the influence of hyperparameters, we applied experimental design strategies to determine their effects on the neural network performance. We found that genetic algorithms, tree-structured Parzen estimator, and random search require on average 0.08% of the hours required by grid search; in addition, tree-structured Parzen estimator and genetic algorithms provide better results than random search.
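
The difference in cost between grid search and random search can be illustrated with a toy example. This sketch only shows why random search needs fewer evaluations; it does not reproduce the study. The search space, the `toy_loss` function (a stand-in for training and validating a network), and the trial budget are assumptions chosen for illustration.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
SPACE = {
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "hidden_units": [16, 32, 64, 128],
    "dropout": [0.0, 0.25, 0.5],
}


def toy_loss(cfg):
    """Stand-in for a validation loss; a real study would train a network here."""
    return (
        abs(cfg["learning_rate"] - 0.01)
        + abs(cfg["hidden_units"] - 64) / 100
        + abs(cfg["dropout"] - 0.25)
    )


def grid_search(space, loss):
    """Evaluate every combination of hyperparameter values."""
    keys = list(space)
    configs = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
    return min(configs, key=loss), len(configs)


def random_search(space, loss, n_trials, seed=0):
    """Evaluate n_trials configurations sampled uniformly from the space."""
    rng = random.Random(seed)
    configs = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]
    return min(configs, key=loss), len(configs)


grid_best, grid_evals = grid_search(SPACE, toy_loss)
rand_best, rand_evals = random_search(SPACE, toy_loss, n_trials=12)
```

Grid search evaluates all 4 × 4 × 3 = 48 combinations, whereas random search stops after its fixed budget of 12 trials and may or may not find the same optimum. The tree-structured Parzen estimator and genetic algorithms mentioned in the abstract use earlier evaluations to guide later ones and are not shown here.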

17 pages, 1356 KiB  
Article
MetaClass, a Comprehensive Classification System for Predicting the Occurrence of Metabolic Reactions Based on the MetaQSAR Database
by Angelica Mazzolari, Alice Scaccabarozzi, Giulio Vistoli and Alessandro Pedretti
Molecules 2021, 26(19), 5857; https://0-doi-org.brum.beds.ac.uk/10.3390/molecules26195857 - 27 Sep 2021
Cited by 4 | Viewed by 1612
Abstract
(1) Background: Machine learning algorithms are finding fruitful applications in predicting the ADME profile of new molecules, with a particular focus on metabolism predictions. However, the development of comprehensive metabolism predictors is hampered by the lack of highly accurate metabolic resources. Hence, we recently proposed a manually curated metabolic database (MetaQSAR), the level of accuracy of which is well suited to the development of predictive models. (2) Methods: MetaQSAR was used to extract datasets to predict the metabolic reactions subdivided into major classes, classes, and subclasses. The collected datasets comprised a total of 3788 first-generation metabolic reactions. Predictive models were developed by using standard random forest algorithms and sets of physicochemical, stereo-electronic, and constitutional descriptors. (3) Results: The developed models showed satisfactory performance, especially for hydrolyses and conjugations, while redox reactions were predicted with greater difficulty, which was reasonable as they depend on many complex features that are not properly encoded by the included descriptors. (4) Conclusions: The generated models allowed a precise comparison of the propensity of each metabolic reaction to be predicted, and the factors affecting their predictability were discussed in detail. Overall, the study led to the development of a freely downloadable global predictor, MetaClass, which correctly predicts 80% of the reported reactions, as assessed by an explorative validation analysis on an external dataset, with an overall MCC = 0.44.

13 pages, 1498 KiB  
Article
MetaTREE, a Novel Database Focused on Metabolic Trees, Predicts an Important Detoxification Mechanism: The Glutathione Conjugation
by Angelica Mazzolari, Luca Sommaruga, Alessandro Pedretti and Giulio Vistoli
Molecules 2021, 26(7), 2098; https://0-doi-org.brum.beds.ac.uk/10.3390/molecules26072098 - 06 Apr 2021
Cited by 2 | Viewed by 2307
Abstract
(1) Background: Data accuracy plays a key role in determining model performance, and the field of metabolism prediction suffers from a lack of truly reliable data. To enhance the accuracy of metabolic data, we recently proposed a manually curated database collected by a meta-analysis of the specialized literature (MetaQSAR). Here we aim to further increase data accuracy by focusing on publications reporting exhaustive metabolic trees. This selection should indeed reduce the number of false negatives. (2) Methods: A new metabolic database (MetaTREE) was thus collected and utilized to extract a dataset of metabolic data concerning glutathione conjugation (MT-dataset). After proper pre-processing, this dataset, along with the corresponding dataset extracted from MetaQSAR (MQ-dataset), was utilized to develop binary classification models using a random forest algorithm. (3) Results: The comparison of the models generated from the two collected datasets reveals the better performance reached by the MT-dataset (MCC raised from 0.63 to 0.67, sensitivity from 0.56 to 0.58). The analysis of the applicability domain also confirms that the model based on the MT-dataset shows a more robust predictive power with a larger applicability domain. (4) Conclusions: These results confirm that focusing on metabolic trees represents a convenient approach to increase data accuracy by reducing the false negative cases. The encouraging performance shown by the models developed with the MT-dataset invites the use of MetaTREE for predictive studies in the field of xenobiotic metabolism.
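
The two metrics quoted in this abstract, the Matthews correlation coefficient (MCC) and sensitivity, are both computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the example counts are invented for illustration and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented counts for an imbalanced dataset (few positives, many negatives).
example_mcc = mcc(tp=6, tn=88, fp=2, fn=4)
example_sensitivity = sensitivity(tp=6, fn=4)
```

Because MCC uses all four cells of the confusion matrix, it is often preferred over accuracy for imbalanced datasets, where false negatives can otherwise be masked by the large number of true negatives.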
