A custom-made library was then compiled from the systematic review-derived records. Such a library gathered 836 compounds, containing 90.3% synthetic compounds and 9.7% natural products. Activity and structural details (as EC50
in µM and Simplified Molecular-Input Line-Entry System (SMILES) codes, respectively) of these compounds can be found in Supplementary Table S2
. The chemical space of the whole compound set was firstly examined in order to perform a structural filtering and qualitative characterization according to structural fragments/scaffolds. Therefore, a preliminary structural similarity analysis between test compounds was performed using the FragFp descriptor available in DataWarrior [79
]. Compound 431
(having a linked thiophen-epoxybenzo[b
]azepine moiety) was arbitrarily selected as a reference compound to calculate such a descriptor. The resulting similarity chart is presented in Figure 5
. After such an analysis, some clusters were revealed with FragFp values, according to the heatmap based on the structure similarity index, between 0 (red = very different) and 1 (green = very similar). Such a scale indicated that our custom-made library gathered compounds related to 53 different FragFp-derived main clusters, which comprise small subsets (covering 6–23 compounds) of very structurally similar compounds depending on compound origin (e.g., the synthetic approach or natural source used in the respective study). Hence, several attempts to study/model a statistically validated structure–activity relationship were unsuccessful. However, this structure similarity analysis clustered the test compounds into several classes with FragFp ≥ 0.4, indicating that the library involves particular scaffolds that deserve a more robust analysis.
Regarding the antileishmanial activity of retrieved compounds (Table 2
), Figure 6
A shows the general distribution of activity values for pure compounds expressed as negative decadic logarithm of the half-maximal effective concentration in mol/L(pEC50
). As can be seen, despite the relatively large number of publications included, most of the tested compounds from these Colombian campaigns are rather poorly active. Only a limited fraction (18%) of the compounds displays EC50
values in the range 1–10 μM (5 < pEC50
< 6), whereas the number of substances with activity in the sub-micromolar range (pEC50
≥ 6) is below 5%. This observation let us infer that the efforts made so far to find antileishmanial agents in Colombia should be considerably strengthened, if actual positive results are wanted. Otherwise, the studies will remain belonging to exploratory-oriented basic science rather than needed/applied medicinal and natural product chemistry projects.
As mentioned before, several Leishmania
species have been included within the considered reports. Analysis of the activity of the studied compounds by Leishmania
species (Figure 6
B) reinforces the previously discussed necessity of further and more efficiently driven projects (as observed, most of the boxes appeared in a range of pEC50
between 4 and 6). Interestingly, there are some compounds with remarkable activity against intracellular amastigotes of L. major
. Similarly, some compounds have displayed exceptionally high activity against intracellular amastigotes of L. donovani
and L. panamensis
(mainly found within the whiskers zone of the corresponding distribution). Following the well-recognized criteria for hit compounds for infectious diseases (including VL) [80
], solely those compounds with EC50
below 10 μM could be considered as hits. Nonetheless, lack of information regarding structure–activity relationships, tractability of the chemotype, conformity with the rule of five and selectivity (>10-fold) may be considered as the major issue to define them as truly hit compounds.
2.3.1. Chemical Space of Selected Compounds
The elevated number of papers determining activity against L. panamensis might be seen as directly related to the high incidence of reported cases of infection by this species in Colombia, as mentioned earlier. Thus, we decided to focus our attention on the compounds described therein, looking for possibly relevant structural information that could be used as a first-line tool for further investigations. The dataset was therefore filtered, keeping only entries for pure compounds that were tested against L. panamensis. More specifically, and in order to reach some degree of comparability, exclusively those entries with activity determinations on intracellular amastigotes were used in the subsequent analyses. This form of the parasite/model was preferred over axenic amastigotes and promastigotes within the publications (see above), supporting our decision for deeper examination. A total of 484 compounds were considered for further studies, comprising 9.1% natural products and 90.9% synthetic compounds.
Structural information of the resulting group of compounds was extracted by using the Molecular ACCess System (MACCS) keys (167 bits) [81
] and the Morgan fingerprints (radius = 2, 1024 bits) [82
] implemented in RDKit [83
]. These two sets of general molecular fingerprints were used aiming at a general glance of the corresponding chemical space. Owing to the typically high performance of t-distributed Stochastic Neighbor Embedding (t-SNE) as a dimensionality reduction method [84
], we decided to look for possible compound clustering using such an algorithm and the fingerprints as independent inputs (Figure 7
A). Although not directly comparable, both fingerprints offered some degree of clustering after t-SNE. Principal component analysis (PCA) was not able to represent clusters of similar compounds in a simple representation (data not shown). The partial generation of clusters of compounds by t-SNE would indicate actual structural relations among them, at least to some extent, as described by each fingerprint, i.e., some structural features seem to appear reiteratively within some groups of compounds. However, the limited clustering (high dispersion) also indicates a quite significant structural diversity within the whole group of compounds (wide chemical space). Hierarchical clustering analysis (HCA) proved to be consistent with the spatial distribution provided by t-SNE (see Supplementary Figure S2
After exhaustive analysis of the clusters obtained by HCA and extraction of the corresponding maximum common substructure (MCS), only three of the clusters were completely coincident between fingerprints, whereas a fourth was partly in agreement (the MCS from the cluster using Morgan fingerprint was part of a larger substructure found when using MACCS keys). This result highlights the well-known differences among fingerprints, which would translate to changes in outcomes coming from direct comparisons of fingerprints. The four common clusters are depicted in Figure 7
A by different colors.
Representative compounds from each cluster are also included in Figure 7
B, whose MCSs are highlighted. As expected, the relative location of each common cluster in the scatter plots is different. However, it is noteworthy to mention that three of them are quite well separated from the others, suggesting very particular features compared to the rest of the compounds. Interestingly, the compounds in the cluster in red were not particularly separated from other compounds compared to those previously mentioned. The seemingly marked lack of resolution of this particular cluster (especially when using Morgan fingerprints) might be due to high structural diversity of its compounds, whereupon the fingerprint features would be rather strongly shared (overlapped), ending up with many common bits with other clusters.
Looking for insights into the possible effect of structural diversity on the antileishmanial activity, the t-SNE-derived scatter plots were colored by activity threshold (actives in green, pEC50
≥ 5.0; Figure 7
C). It is evident that the most active compounds are not concentrated in any specific cluster, i.e., none of the scaffolds so far analyzed in Colombian studies are clearly favored over the others. Particularly, the cluster in red (Figure 7
A) is mainly constituted by poorly active compounds (red in Figure 7
C), while it is difficult to establish the potential of compounds in the cluster in green (Figure 7
A) owing to the lack of EC50
determinations for some of them (empty circles in Figure 7
2.3.2. Machine Learning
After filtering off entries whose EC50 determinations were not available (e.g., only biological determination of percentage of inhibition at specific compound concentrations reported), a final set of 428 compounds was obtained. Owing to the inherent structural similarities among some compounds but also the huge differences in other cases (as shown above), and to the restricted capacity of the linear algorithms to provide reliable models (as mentioned before), machine learning was selected as a tool to analyze this dataset. Two different extensively used supervised learning algorithms were chosen to accomplish the task: random forest (RF) and support vector machines (SVM). Both MACCS and Morgan fingerprints were independently used for building the models. Preliminary evaluation of the classification variants of the selected algorithms showed decent performance (data not shown) and encouraged us to use actual activity values rather than an arbitrarily defined categorical dependent variable. Having decided for regression models, a coarse-to-fine scheme was followed for the optimization of the corresponding hyperparameters. In case of RF models, the number of trees in the forest, the minimum number of samples required to be at a leaf node, the minimum number of samples required to split an internal node, the maximum number of features to consider for the best split and the number of samples to draw from the training set during bootstrap were considered for optimization. The dataset was randomly split into training and test set (80:20%), ensuring maximum coverage of activity range for the latter. For both fingerprints, models offered maximum performance using 1 sample as a minimum to be at a leaf node and the total number of features as maximum. Those models were named M1 and M2, for MACCS and Morgan, respectively. While M1 used 306 trees, 7 samples to split a node and 75% of samples drawn during bootstrap, M2 required 127 trees, 2 samples and 94% of the samples, respectively.
In case of SVM models (M3 and M4 for MACCS and Morgan, respectively), the optimization was performed considering variations in the kernel functions, the kernel coefficient, the epsilon-tube and the regularization parameter C. The optimized models made use of the Radial Basis Function (RBF) kernel. The best performance for M3 was achieved with epsilon = 0.1, C = 2.5 and gamma = 0.04. M4 performed better when it used the combination of hyperparameters epsilon = 0.08, C = 2.8 and gamma = 0.05.
All the models were trained and tested for predictivity using ten repetitions. Table 3
shows the corresponding validation parameters as a mean of the ten runs. As can be seen, the generated models offered barely acceptable cross-validation (CV) scores, with limited prediction power. Nonetheless, M1–M4 outperformed classical linear models. The limited robustness for M1–M4 was not less than expected coming from such a diverse dataset. It is impossible to properly ensure comparability of activity data due to presumable changes in the specific procedures, despite using the same parasite forms/models (i.e., not all the compounds were experimentally tested at the same time and under the same exact conditions or not even in the same laboratory). Moreover, we did not take into account the implicit data error (which is sometimes not adequately informed, either), whose impact on computational modeling was already highlighted long ago [85
]. It must be noted however that the data error is still not included in most of the Quantitative Structure-Activity Relationships/Quantitative Structure-Property Relationships (QSAR/QSPR) studies published in scientific journals. Regardless, both algorithms provided comparable results in terms of internal and external validation (Table 3
Although RF using Morgan (M2) fingerprints displayed significantly higher R2
than that from MACCS (M1) during training, both internal and external validation coefficients were indistinguishable between models. In the case of SVM, the use of Morgan fingerprints (M4) demonstrated better predictability of the external set of compounds, albeit comparably low performance during CV. Beyond the isolated statistical values, and in spite of their closely related performance, M2 afforded the lowest deviations in predicted antileishmanial activity, represented by the lowest dispersion of data points around the regression line between experimental and predicted values (Supplementary Figure S3
; all the corresponding activity predictions are included in Supplementary Table S3
With the limited but still acceptable capacity of the obtained models, we were interested in deciphering the governing structure–activity relationships behind them. Although the machine learning algorithms are typically known for their black box nature, recent advances have been made in order to extract information regarding feature importance, like the use of the SHAP (SHapley Additive exPlanations) theory and the derived Shapley values [86
]. Taken from game modeling, the SHAP theory helps to explain the contribution each single feature has on the outcome obtained. Implementation of this theory to gain detailed information from machine learning models has already been shown for drug discovery projects [88
], making it possible to define the most important fingerprint bits contributing to the variance in activity. We applied the SHAP theory to the optimized models. In the case of RF models (M2 and M4), the Gini importance was also analyzed. The results are shown in Figure 8
. There was an overall agreement between Gini and SHAP values for both M1 and M2, e.g., features 99 and 125 were consistently the top two in M1 (Figure 8
A,C), whereas for M2 features, 1 and 259 appeared the most relevant by both methods (Figure 8
B,D). High correlation between Gini and SHAP values have already been observed and reported [88
]. Interestingly, the observed profound effect of feature 99 on the prediction of activity also prevailed in M3 (Figure 8
E), suggesting some similarity between algorithms. The SHAP values also allowed inferring a significant effect of features 125 and 95 on M3 predictions, which were within the top ten features affecting M1 as well. This marked coincidence of features would indicate that both algorithms were able to identify basically the same structural features (held by the MACCS fingerprints) as responsible for the variance in activity. The SHAP values were also in agreement with the Gini importance values for M2, e.g., features 1, 259, 352 and 547 were ranked as the top four in both cases (Figure 8
B,D). However, analysis of the corresponding SHAP values for M4 revealed a completely different distribution of features affecting the outcome of the model (Figure 8
F). Only features 352 and 547 remained as part of the top ten, although with less importance. Being affected by several features at comparable costs implied that there was not any specific feature with a clear strong impact on M4 predictions, as it was observed above for the other models.
Detailed analysis of the definition of the MACCS keys with higher SHAP values revealed that both M1 and M3 relied on similar structural patterns overall. Features 99 and 125, found in both cases, and 162 and 101, being exclusive for each model respectively, are related to the presence of C = C and aromatic rings. For M1, feature 114, which represents the presence of ethyl units bound to any atom, was also important, while the presence of methyl groups bound to heteroatoms was relevant for M3 (feature 93). Particularly interesting was the fact that M3 predictions were affected by the number of oxygen atoms in the molecule (feature 140 for O > 3). In contrast, the presence of chlorine atoms (feature 103) was important for M1.
Analyzing the individual contributions of each feature to the general outcome for M2 showed that the presence of feature 1 in the compounds was deleterious for the activity (Figure 9
A). A similar general result was observed for features 352 and 751, although at considerably lesser extent. In contrast, the presence of feature 259 significantly favored the predicted activity values. Features 547, 561 and 1017 are other examples of features positively contributing to the activity. A comparable analysis in case of M4 was not straightforward due to the already mentioned high number of features responsible for the activity. However, absence of features 352 and 984 seemed beneficial for the activity, whereas features like 887 and 835 appeared to play a positive role (Figure 9
A more comprehensive analysis of the underlying structure–activity relationships for the compounds in the present dataset is not practically achievable because of the strong structural differences among compounds. Nevertheless, several additional insights could be retrieved from in-depth exploration of the individual Shapley values. Thus, taking advantage of the likelihood of drawing Morgan fingerprints offered by RDKit, representative compounds with low (compound 190
), intermediate (compound 586
) and high (compound 164
) antileishmanial activity were further studied. Model M2 was chosen for this analysis based on its apparently low deviation in predictions and clearly outlined important features. Figure 8
C–E shows the corresponding force plots for those compounds. It can be observed how the activity of the inactive compound (190
, Figure 9
C) is strengthened by the presence of feature 394, while features 61, 456 and 314 could be responsible for the low value as they are pushing it down. Surprisingly, only the latter feature is part of the top ten features affecting the general outcome of the model. On the other hand, the activity of compound 586
D) was apparently caused by the presence of features 73 and 55. Moreover, absence of feature 1 significantly contributed to the activity of this specific compound, too, being the most important feature for M2, as previously noted. In the case of the active compound (164
, Figure 9
E), the activity was dominated by the presence of several features, including 109, 104, 678, 619, 547 and 259. To make predictions, the model predominantly used most of those features. In addition, the presence of feature 1 in this compound decreased the predicted value, as expected from the general trend observed. Features 109, 547 and 678 correspond to the 5,6-dihydro-2H
-pyran-2-one moiety (Figure 9
E). Meanwhile, features 259 and 619 are related to the aliphatic chains in vicinity of the hydroxyl groups. Particularly, feature 1 in this compound structure refers to the hydroxylated chiral carbons. Presumably, M2 might have learned some effects on activity due to chirality of those stereocenters.
2.3.3. Drug-Likeness Filtering
In order to get an idea of some interesting scaffolds to be considered in future investigations, further analyses were carried out on the group of active compounds. As a first step, their drug-likeness was partially assessed checking for the presence of undesirable moieties according to the filters for Pan-Assay INterference compounds (PAINs) implemented in the FAFDrug4 web server [90
]. Despite that 85% of the active compounds passed the three filters available in the server, more than 60% of them might still be considered as potentially reactive substances containing groups susceptible to covalent binding (e.g., 23% of the active compounds contain Michael acceptor groups). Only a reduced set of twenty-eight compounds with confirmed activity on intracellular amastigotes passed the mentioned filters. However, more than half of these compounds (57%) showed compliance with the rule of five (Ro5) as well (one violation of the Ro5 was mainly found for the rest).
On the other hand, selectivity index (SI), defined as the ratio between cytotoxicity and antileishmanial activity, was considered for the last filtering step. This process revealed that whereas sixteen compounds (57%) showed SI > 2, only three of them (11%) displayed actual interesting selectivity values as to be considered for further development (Table 4
). Figure 10
shows some of the best candidates after the aforementioned filtering.
Interestingly, a similarity search in SciFinder revealed that the scaffolds comprised by the above-mentioned compounds (Figure 10
) have been rather uniquely considered as antileishmanial agents in Colombian research projects. Thus, in spite of the somewhat common nature of most of those scaffolds, their specific combinations have not yet become part of other studies focused on antileishmanial substances. No additional reports were found for the combination of chloroquine and pyrazole scaffolds nor for the combination of indolinone and tetrahydroquinoline scaffolds (as in compounds 511
, respectively). Similarly, no further studies on leishmanicidal sulfonylhydrazides of beyerene- or stevioside-like diterpenes (e.g., 84
) have been published. On the other hand, compound 191
represents a group of substances (styrylquinolines) quite commonly included in medicinal chemistry projects. Nonetheless, studies on their potential as antitrypanosomatid agents has been limited as well. To the best of our knowledge, there is only one recent study focused on the leishmanicidal properties of a group of related compounds (4-aminostyrylquinolines) [91
]. Decent activity against amastigotes of L. pifanoi
and moderate selectivity indexes were therein reported. Additionally, assessment of the antileishmanial potential of alkenylquinolines was previously reported [92
]. In this case, rather poorly active compounds were evinced, limiting the possible identification of interesting candidates. Seemingly, most of those compounds showed better antitrypanosomal activity.