3.2. Analysis of the Testing Set Results
Table 3 shows the top t (t = 10, 15, 20, and 30) predictions and the two evaluation indexes for the testing set. Table 3 shows that, when all 3 protein–protein interaction interfaces of a protein trimer had to be correctly predicted, a total of 9 trimer protein complexes were correctly predicted in the top 10 predictions. The 2IY0 protein trimer gave the best result, with up to 10 positive interface residue pairs. Under the same criterion, a total of 17 trimer protein complexes were correctly predicted in the top 30 predictions. Among them, there were 10 trimer protein complexes for which at least 10 positive interface residue pairs were predicted.
To further analyze the prediction results, we derived an accuracy index from the columns of Table 3 (see Table 4). When 3 protein–protein interaction interfaces of each protein trimer were correctly predicted in the top 15 predictions, the accuracy rate was 42.31%, i.e., more than 2/5 of the trimer protein complexes in the testing set were correctly predicted. When 3 protein–protein interaction interfaces of each protein trimer were correctly predicted in the top 30 predictions, the accuracy rate was as high as 65.38%. When at least 2 protein–protein interaction interfaces of each protein trimer were correctly predicted in the top 10 predictions, the accuracy rate was 53.85%, i.e., more than half of the trimer protein complexes in the testing set were correctly predicted. When at least 1 protein–protein interaction interface of each protein trimer was correctly predicted, the accuracy rate was 76.92% in the top 10 predictions and up to 92.31% in the top 30 predictions.
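The accuracy rates quoted above are consistent with a simple ratio of correctly predicted complexes to the size of the testing set. The following minimal sketch assumes a testing set of 26 trimers; this size is not stated in this section and is only inferred from the fact that 17 correctly predicted complexes correspond to 65.38%. The function name `accuracy_rate` is illustrative.

```python
def accuracy_rate(n_correct: int, n_total: int) -> float:
    """Percentage of complexes counted as correctly predicted."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


# Assumption: 26 trimers in the testing set (inferred, not stated in the text).
N_TEST = 26

# 17 complexes had all 3 interfaces correctly predicted in the top 30 predictions.
print(f"{accuracy_rate(17, N_TEST):.2f}%")
```

Under the same assumed set size, the other rates quoted above (42.31%, 53.85%, 76.92%, and 92.31%) would correspond to 11, 14, 20, and 24 complexes, respectively.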
The test set contains 6479 interface residue pairs, of which 968 are formed by residues in the N- and C-terminal regions, accounting for 15% (here, residues in the N- and C-terminal regions are the residues that we have specially treated in this manuscript). Across all protein trimers in the testing set, we accurately predicted 190 interface residue pairs, of which 48 are formed by residues in the N- and C-terminal regions, accounting for 25%.
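The two percentages can be checked directly from the counts given above. The short sketch below also computes the ratio between them, which is not reported in the text and is shown only to illustrate how strongly terminal-region pairs are represented among the correct predictions; the variable names are illustrative.

```python
# Counts quoted in the text.
total_pairs, terminal_pairs = 6479, 968        # interface residue pairs in the test set
predicted_pairs, predicted_terminal = 190, 48  # correctly predicted interface residue pairs

share_in_test_set = terminal_pairs / total_pairs
share_in_predictions = predicted_terminal / predicted_pairs

# Ratio of the two shares (derived here, not reported in the text).
enrichment = share_in_predictions / share_in_test_set

print(f"terminal share in test set:    {100 * share_in_test_set:.1f}%")
print(f"terminal share in predictions: {100 * share_in_predictions:.1f}%")
print(f"ratio of shares:               {enrichment:.2f}")
```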
We compared the performance of our method with that of the previous method [16]. When at least 1 protein–protein interaction interface of each protein trimer was correctly predicted in the top 10 predictions, the accuracy of our method was 76.92%, whereas that of the previous method [16] was 31.1%. Our method is therefore more accurate under this criterion.
The analysis of the above results showed that our proposed method was able to accurately predict the interface residue pairs of trimer protein complexes. Additionally, our predicted results are consistent with the experimental results. The experimental article on the 3FFD protein trimer [36] reports that residues 20 and 24 are strictly conserved, which allows for extensive interactions with the antibody. Residues 16, 20, 27, 52, 59, 97, 102, and 104 are also binding sites. In our top 20 prediction results, we successfully predicted 8 positive interface residue pairs. For clarity, 6 positive interface residue pairs (Tyr 104-Phe 102, Tyr 104-Tyr 104, Tyr 104-Phe 23, Tyr 59-Phe 23, Phe 102-Phe 23, and Thr 32-Gln 16) for the 3FFD protein trimer are shown in Figure 3a. The experimental article on the 1S7O protein trimer [37] points out that the 1S7O protein trimer has two structural domains and that the primary interaction mainly involves the second central domain. The hydrophobic residues Ile 85, Phe 86, Met 89, Ile 90, Leu 99, Ile 103, and Leu 106 create both an intermolecular and intramolecular hydrophobic core in the second domain. Arg 82 and Asp 110 form salt bridges, and two Arg 82 guanidyl groups in adjacent molecules contribute to the intramolecular and intermolecular interactions. In our top 20 prediction results, we successfully predicted 11 positive interface residue pairs formed by these residues and their surrounding residues. For illustration purposes, we show 6 positive interface residue pairs (Ile 105-Ile 109, Glu 101-Ile 109, Ile 105-Ile 85, Glu 101-Val 81, Leu 106-Ile 85, and Ile 85-Leu 106) in Figure 3b.
The training set contains many antibody fragments, which make up two of the three chains in 1BGX, 3O2D, 3R1G, 3GI9, 1JPS, 1JRH, and 1FNS. Similarly, 1F6F, 1EER, 1HWG, and 3VA2 are all cytokine receptor complexes with probable similarity between the receptor CRH domains. We removed 1BGX, 3O2D, 3R1G, 3GI9, 1JPS, 1JRH, 1FNS, 1F6F, 1EER, 1HWG, and 3VA2 from the training set. The test set also contains 3 complexes with antibody chains: 3FFD, 1OSP, and 1SY6. We generated testing set 2 by removing 3FFD, 1OSP, and 1SY6 from the testing set.
Appendix A Table A3 shows the top t (t = 15, 20, and 30) predictions and the two evaluation indexes for testing set 2. We compared the prediction results of the testing set with those of testing set 2 (Table 5). When at least 2 protein–protein interaction interfaces of each protein trimer were correctly predicted, the accuracy of testing set 2 was about 7% lower than that of the testing set. When all 3 protein–protein interaction interfaces of each protein trimer were correctly predicted in the top 30 predictions, the accuracy of testing set 2 was about 8.5% lower than that of the testing set. When all 3 protein–protein interaction interfaces of each protein trimer were correctly predicted in the top 20 predictions, the accuracy of testing set 2 was 6% higher than that of the testing set. The remaining prediction results of the two test sets were almost the same.
3.3. Comparison with Random Results
We assume that the stochastic prediction of interface residue pairs of each protein–protein interaction interface in trimer protein complexes obeys a hypergeometric distribution [38], where X is the number of positive interface residue pairs in the top T predictions, N is the number of all the residue pairs of one protein–protein interaction interface in one protein trimer, and M is the number of positive interface residue pairs in this protein–protein interaction interface. Next, we can calculate the probability P that there are k positive interface residue pairs in the T predictions of one protein–protein interaction interface by the stochastic model (see Formula (14)):

$$P(X = k) = \frac{\binom{M}{k}\binom{N-M}{T-k}}{\binom{N}{T}} \qquad (14)$$
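The hypergeometric model described above can be evaluated directly with exact integer binomial coefficients. The following is a minimal sketch, not the authors' implementation: the symbols `N`, `M`, `T`, and `k` (all residue pairs of one interface, positive pairs, number of top predictions, and number of positives among them) and the function names are chosen here for illustration, and the numerical inputs are the approximate mean values of about 40,920 residue pairs and about 83 positive pairs per interface quoted in this section.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, M: int, T: int) -> float:
    """P(X = k) when T pairs are drawn without replacement from N pairs, M of them positive."""
    if k < max(0, T - (N - M)) or k > min(T, M):
        return 0.0  # outside the support of the distribution
    return comb(M, k) * comb(N - M, T - k) / comb(N, T)


def prob_at_least_one(N: int, M: int, T: int) -> float:
    """P(X >= 1): at least one positive pair among T random predictions."""
    return 1.0 - hypergeom_pmf(0, N, M, T)


# Approximate per-interface means quoted in the text (used here as an assumption).
N_MEAN, M_MEAN = 40920, 83

for T in (10, 15, 20, 30):
    print(T, prob_at_least_one(N_MEAN, M_MEAN, T))
```

These values describe a single interface under random selection only; they are not the trimer-level probabilities defined in the subsequent formulas.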
To simplify the calculation, we assumed that the protein–protein interaction interfaces are independent and identically distributed, and we used the mean number of all residue pairs per protein–protein interaction interface and the mean number of positive interface residue pairs per protein–protein interaction interface. From Appendix A Table A4, these means are about 40,920 and about 83, respectively.
When at least 1 protein–protein interaction interface of each protein trimer has at least one positive interface residue pair in the top T predictions, the probability is given by Formula (15).
Because this probability is expensive to compute exactly, we enlarged it to an upper bound (see Inequalities (16) and (17)). The upper bound is cheaper to compute than the exact probability and, for fixed T, the exact probability is less than the bound. For T = 10, 15, 20, and 30, we calculated the bound by the Monte Carlo simulation method (see Table 6).
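To illustrate what a Monte Carlo estimate under the random model looks like, the sketch below simulates random predictions for a single interface and compares the estimate with the closed-form hypergeometric value. It is an assumption-laden illustration, not the authors' simulation: it does not implement the enlarged bounds of the inequalities or the trimer-level probabilities in Table 6, and the function names, trial count, and seed are hypothetical choices.

```python
import random
from math import comb


def simulate_at_least_one(N: int, M: int, T: int, trials: int, seed: int = 0) -> float:
    """Estimate P(at least one positive among T random picks) for one interface."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Pairs 0..M-1 are labelled positive; draw T distinct pairs uniformly at random.
        drawn = rng.sample(range(N), T)
        if any(i < M for i in drawn):
            hits += 1
    return hits / trials


def exact_at_least_one(N: int, M: int, T: int) -> float:
    """Closed-form hypergeometric value of the same probability."""
    return 1.0 - comb(N - M, T) / comb(N, T)


# Approximate per-interface means quoted in the text (used here as an assumption).
N_MEAN, M_MEAN = 40920, 83

for T in (10, 30):
    estimate = simulate_at_least_one(N_MEAN, M_MEAN, T, trials=20000)
    print(T, estimate, exact_at_least_one(N_MEAN, M_MEAN, T))
```

With enough trials the simulated frequency should approach the closed-form value, which is a convenient sanity check before simulating more complicated trimer-level events.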
When at least 2 protein–protein interaction interfaces of each protein trimer have at least one positive interface residue pair in the top T predictions, the probability is given by Formula (18).
Combining Formulas (15) and (18), we also enlarged this probability and obtained Inequality (20). The resulting upper bound is much cheaper to compute than the exact probability and, for fixed T, the exact probability is less than the bound. For T = 10, 15, 20, and 30, we calculated the bound by the Monte Carlo simulation method (see Table 6).
When all 3 protein–protein interaction interfaces of each protein trimer have at least one positive interface residue pair in the top T predictions, the probability is given by Formula (21).
In the same way as above, we also enlarged this probability and obtained an upper bound (see Inequalities (22) and (23)). For T = 10, 15, 20, and 30, we calculated the bound by the Monte Carlo simulation method (see Table 6).
As can be seen from Table 5, the accuracy of our method in predicting the interface residue pairs of trimer protein complexes is much higher than that of random results. When at least 1 protein–protein interaction interface of each protein trimer was correctly predicted, the accuracy of our method ranged from 76.92% to 92.31%, while the random accuracy was lower than 0.20298%. When at least 2 protein–protein interaction interfaces of each protein trimer were correctly predicted, our accuracy ranged from 53.85% to 84.62%, whereas the random accuracy was below 0.0015%. When 3 protein–protein interaction interfaces of each protein trimer were correctly predicted, our accuracy reached 65.38% in the top 30 predictions, more than 10^8 times higher than that of random results.
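The size of the gap over random prediction can be made concrete by dividing the reported accuracies by the reported random bounds. The sketch below pairs the lowest reported accuracy of our method with the upper bound on the random accuracy for each criterion, so each ratio is a conservative lower bound on the improvement; the function name `fold_over_random` is illustrative.

```python
def fold_over_random(method_pct: float, random_pct: float) -> float:
    """Ratio of method accuracy to random accuracy, both given in percent."""
    if random_pct <= 0:
        raise ValueError("random_pct must be positive")
    return method_pct / random_pct


# At least 1 interface: lowest method accuracy vs. upper bound on random accuracy.
print(f"{fold_over_random(76.92, 0.20298):.0f}x")

# At least 2 interfaces: lowest method accuracy vs. upper bound on random accuracy.
print(f"{fold_over_random(53.85, 0.0015):.0f}x")
```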