Next Article in Journal
In Vitro Digestion and Fermentation by Human Fecal Microbiota of Polysaccharides from Flaxseed
Previous Article in Journal
Inhibitory Activity of Quercetin 3-O-Arabinofuranoside and 2-Oxopomolic Acid Derived from Malus domestica on Soluble Epoxide Hydrolase
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Two-Layer SVM Ensemble-Classifier to Predict Interface Residue Pairs of Protein Trimers

1
Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, School of Math, Renmin University of China, Beijing 100872, China
2
Beijing Advanced Innovation Centre for Structural Biology, Tsinghua University, Beijing 100084, China
*
Author to whom correspondence should be addressed.
Submission received: 11 August 2020 / Revised: 16 September 2020 / Accepted: 18 September 2020 / Published: 23 September 2020

Abstract

:
Study of interface residue pairs is important for understanding the interactions between monomers inside a trimer protein–protein complex. We developed a two-layer support vector machine (SVM) ensemble-classifier that considers physicochemical and geometric properties of amino acids and the influence of surrounding amino acids. Different descriptors and different combinations may give different prediction results. We propose feature combination engineering based on correlation coefficients and F-values. The accuracy of our method is 65.38% in independent test set, indicating biological significance. Our predictions are consistent with the experimental results. It shows the effectiveness and reliability of our method to predict interface residue pairs of protein trimers.

1. Introduction

Many protein complexes are formed by the interactions of multiple protein monomers. These complexes can carry out many biological functions, such as gene expression and regulation, signal transduction, or enzyme catalytic mechanisms [1]. Understanding the mechanisms of protein–polymer interactions can provide useful information for the design of protein polymer structures, protein functional annotation, and drug design [2]. The accurate prediction of interface residue pairs in polymeric proteins is an important part in the study of protein polymer interactions. Various experimental methods have been used in the research of protein polymers, such as X-ray crystallography and nucleic magnetic resonance. It is impossible and unrealistic to find the interface residue pairs for all protein polymers by an experimental method. Therefore, the prediction of protein polymer interface residue pairs has become an important question in bioinformatics.
An increasing number of computational methods have been developed to predict protein monomer binding sites and protein–protein interface residue pairs. In relation to this, Segura et al. [3] used a two-step Random Forest classifier to predict protein binding sites. Hwang et al. [4] proposed an index of Residue Contact Frequency to predict the protein monomer binding sites. Lyu et al. [5] defined an index of contact frequency from the protein–protein docking result to accurately predict protein–protein interface residue pairs. Ovchinnikov predicted residue–residue interactions across protein interfaces using evolutionary information [6]. There are many other methods that are not described here [7,8,9,10,11,12,13,14,15].
A higher number of protein monomers implies more complex interaction mechanisms of the protein polymer. At present, there are a few methods to predict interface residue pairs of protein trimer. Zhao et al. [16] took the sequence feature as input in multilayered Long Short-Term Memory networks to predict interface residue pairs of protein trimer. In this paper, we want to develop a new and effective method for prediction of interface residue pairs in protein trimers.
Properties of protein sequences depend on the composition and distribution of amino acids. The position and type of each amino acid in a protein sequence are unique, containing important structural and functional information. Therefore, for a specific position amino acid in the protein sequence, ideal descriptors should not only reflect the amino acid type but also structural information or the role played in the performance of the protein function. Based on previous studies on protein monomer binding sites and protein–protein interface residue pairs, we found some common properties that can allow to distinguish interface residues from the rest of the protein, using information of hydrophobicity, polarizability, solvent accessibility, and so on. We summarized these properties and divided them into two categories. The first category corresponded to amino acid physicochemical properties, including hydrophobicity [17,18,19], polarizability [20], and polarity [21]. The second category corresponded to residue geometric properties, such as accessible surface area (ASA) [22] and relative accessible surface area (RASA) [23].
In this paper, we defined an amino acid k-interval product factor to describe the influence of surrounding amino acids based on their physicochemical properties. Hence, we described a residue pair with three types of characteristics: amino acid physicochemical features, residue geometric features, and amino acid k-interval product factor. Different descriptors and different combinations may give different prediction results. We performed feature combination engineering based on correlation coefficients and F-values for all characteristics. In general, when the number of positive and negative samples in the dataset is seriously unbalanced, the accuracy and robustness of ensemble classifiers is higher than those of a single classifier (negative samples: noninterface residue pairs and positive samples: interface residue pairs). We trained a two-layer support vector machine (SVM) ensemble-classifier method to predict the interface residue pairs of protein trimers and tested it using an independent testing set. We also use different indicators to evaluate testing set results, which proves that our method is feasible.
In summary, our method was divided into four parts: feature extraction, feature vector engineering, generation of a two-layer SVM ensemble-classifier, and performance evaluation, as shown in Figure 1.

2. Materials and Methods

2.1. Dataset

In this paper, the dataset was collected from the Protein Data Bank based on following four requirements: the number of chains is 3, the length of each chain is between 20 and 500, it is obtained by X-ray experiment, and there are physical bindings between each two chains in one protein trimer. Two chains are defined as interactors if there are interface residue pairs between the two chains. (If the contact area between any two atoms from two residues is bigger than zero, we called these two residues in contact and these two residues are called an interface residue pair. Here, we used the Qcontacts software to calculate contact area between two atoms.) By this way, we collect 78 protein trimers (The data can be downloaded from Supplementary Materials). We randomly divided 78 protein trimers into training set and testing set, of which the number of training set is 52, accounting for 2/3 of the total, and the remaining 26 protein trimers are used as testing set (see Table 1).

2.2. Features Extraction

2.2.1. The Amino Acid Physicochemical Features

A given protein sequence of length L defined on the base set Ω = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y} is expressed by
P = P1P2P3…PL.
Different amino acids in Formula (1) have different physicochemical properties. These physicochemical properties, including hydrophobicity [17,18,19] and polarizability [20], play important roles in multiple protein monomer interactions. In this paper, we considered five physicochemical properties of amino acids: hydrophobicity [17,18,19], polarizability [20], polarity [21], secondary structure, and codon diversity [24]. Three versions of hydrophobicity were proposed by Tanford Charles, Jack Kyte, and David Eisenberg in 1962, 1982, and 1984, respectively. Other previous works, [24,25] have used the hydrophobicity value of Tanford [17] as a feature to identify protein–protein interactions. Some other authors [15,26] used the last two versions of the hydrophobicity value [18,19] as features to predict residue pairs in a protein–protein interface. Their prediction results were good, so we took three versions of the hydrophobicity index in our method. Appendix A Table A1 shows the numerical values of the five physicochemical properties for the 20 amino acids.
According to the corresponding properties of each amino acid, the protein sequence P can be converted into seven different number sequences (see Formula (2)). We used Φ1, Φ2, Φ3, Φ4, Φ5, Φ6, and Φ7 to represent the seven numerical sequences. These seven numerical sequences are the hydrophobicity number sequence 1, polarizability number sequence, polarity number sequence, secondary structure number sequence, codon diversity number sequence, hydrophobicity number sequence 2, and hydrophobicity number sequence 3.
P   =   { Φ 1 1 Φ 2 1 Φ L 1 Φ 1 2 Φ 2 2 Φ L 2 Φ 1 3 Φ 2 3 Φ L 3 Φ 1 4 Φ 2 4 Φ L 4 Φ 1 5 Φ 2 5 Φ L 5 Φ 1 6 Φ 2 6 Φ L 6 Φ 1 7 Φ 2 7 Φ L 7
where Φ 1 1 is the hydrophobicity value of P1 in formula 1,   Φ 2 1 is the hydrophobicity value of P2 in formula 1, and so on. Φ 1 2 is the polarizability value of P1 in formula 1,   Φ 2 1 is the polarizability value of P2 in formula 1, and so on.

2.2.2. Definition of the Amino Acid K-Interval Product Factor (AAIPF(k))

In multiple protein monomers interactions, the individual behavior of the amino acid at each position is affected by the neighboring amino acids in the protein sequence. We define the amino acid k-interval product factor to describe the influence of neighboring amino acids on a given residue.
The AAIPF(k) is defined as follows: the numbers at two positions with interval k are multiplied and divided by k on the amino acid number sequence. The AAIPF(k) can be divided into: amino acid forward k-interval product factor (AAFIPF(k)) and amino acid backwards k-interval product factor (AABIPF(k)) (see Formulas (3)–(5)).
AAIPF ( k )   =   { A A F I P F ( k ) A A B I P F ( k )
AAFIPF ( k )   =   ( Φ j i Φ j k i ) / k
AABIPF ( k )   =   ( Φ j i Φ j + k i ) / k .
When exploring the individual behavior of each amino acid in a protein sequence P, as previously reported [27], we regard the protein sequence P as a cycle alphabet sequence with head-to-tail connections, and thus number sequences can also be regarded as cycle number sequences.
Considering the dimensionality of descriptors, and using the experience of previous works [28,29], we only used AAIPF(1), AAIPF(2), AAIPF(3), AAIPF(4), and AAIPF(5) to characterize each amino acid in the protein sequence P. In this way, each amino acid in the protein sequence P could be represented by the 10-dimensional characteristics of each numerical sequence. To reduce the redundancy between features, we only use the first five numerical sequences in formula 2 to calculate AAIPF(k). Thus, each amino acid in protein sequence P could be characterized by 50 features.
We also used as features the values of five physicochemical properties of amino acids, which we called basic first-order sequence features. Three versions of the hydrophobic values were used. Therefore, seven basic first-order sequence features were applied to describe an amino acid.
Considering the electrostatic interaction is also one of the important factors to stabilize the protein structure. Therefore, we used the electric property values (pK1 and pK2) as features to describe the residue. The values of pK1 and pK2 were calculated by the propka3.1 software [30].

2.2.3. Residue Geometric Features

In several previous research studies [31,32,33], it has been found that accessible surface area (ASA) and relative solvent accessible surface area (RASA) play important roles in distinguishing between interface residues and noninterface residues. In addition to the above two geometric features, we also use three residue geometric features, exterior contact area (ECA), interior contact area (ICA), and exterior void area (EVA) extracted by our laboratory to describe each residue. These five geometric features were considered the basic structural geometric features. These five geometric features and their calculation tools are shown in Table 2.
ASA is the surface area of molecules that is accessible to solvents. Here, we used the Naccess V2.1.1 software [34] to calculate ASA. The RASA was used to describe the exposed or buried state of the residue and was calculated by formula 6. The specific definition of ECA, ICA, and EVA is given in detail elsewhere [26]. ECA and ICA were calculated by the Qcontacts software [35].
R A S A = A S A b o u n d A S A u n b o u n d
where A S A b o u n d represents the solvent accessible surface area of the residue in the protein complex. A S A u n b o u n d is the solvent-accessible surface area of this residue in unbound state.
Through the above research and analysis, we extracted a total of 64 dimensional characteristics to describe each residue in the protein monomer. Therefore, we can use 128 dimensional characteristics to describe a residue pair formed by residues from two protein monomers. We used Formula (7) to standardize these 128 dimensional characteristics.
x = x m i n m a x m i n

2.3. Feature Vector Engineering

Different characteristics and their combinations may play different roles in the prediction of interactions interface residues pairs in multiple protein monomers. Therefore, we performed feature vector engineering using two sets of feature vectors to describe a residue pair. We used the 128 dimensional characteristics extracted as the first set of feature vectors. Considering that some of the 128 dimensional characteristics have a strong correlation, we deleted 1/4 of the characteristics, and used the remaining characteristics as the second set of feature vectors. We filtered the characteristics according to the Pearson correlation coefficients r and F-values. The specific process was as follows:
In the first step, 128 dimensional characteristics were clustered according to the Pearson correlation coefficient r. The formula of the Pearson correlation r is as follows:
r = c o v ( X k , X l ) σ k σ l
where X k and X l represent the k-th and l-th characteristics of the sample, respectively. σ k   and σ l represent the mean square deviation of the k-th and l-th characteristics of the sample (sample: residue pairs), respectively.
In the second step, we used Formula (9) to calculate the F-value of each characteristic between the positive sample and the negative sample (negative sample: noninterface residue pairs and positive sample: interface residue pairs). The larger the F-value of the characteristic, the greater the difference between positive and negative samples and the greater the contribution of the characteristic to distinguish positive and negative samples.
F ( x m ) = | μ m + μ m σ m + σ m + |
where μ m + and μ m are the mean values of the m-th characteristic of the positive sample and the negative sample, respectively. σ m + and σ m are the mean square deviation of the m-th characteristic of the positive sample and the negative sample, respectively.
We preserved basic first-order sequence characteristics and basic structural geometric characteristics. For the class with Pearson correlation coefficient |r| > 0.5, we preserved the characteristics of a relatively large F-value, with 96 characteristics in total. We used the 96 characteristics as the second set of feature vectors (see Appendix A Table A2).

2.4. Our Algorithms (A Two-Layer SVM Ensemble-Classifier)

Each protein trimer consists of three chains, and any two chains interact to form a protein–protein interaction interface. We split each protein trimer into three protein–protein interactions. Through Section 2.2 Feature Extraction and Section 2.3 Feature Vector Engineering, two sets of feature vectors were used to represent a residue pair. Therefore, we generated two sets of train data for the training trimer protein complexes. The train data composed of the first (second) set of feature vector was called Train data 1 (2).
The proportion of protein–protein interface residue pairs in all residue pairs was very low. Therefore, the positive and negative classes in train data 1 (2) were extremely imbalanced (negative classes: noninterface residue pairs and positive class: interface residue pairs). We used under-sampling to deal with the class imbalance problem. To improve the performance of the method, we used the classifier ensemble to alleviate the lack of information caused by the under-sampling. The process was as follows:
Generation of balanced subset samples (for training set j (j = 1,2)):
First, we randomly generated 100 subset samples of negative class from all the negative class samples in train data j. The total number of positive class was 12,687 in the entire training set. Therefore, we set the number of negative class in each subset sample to 12,687. Second, we combined each negative class subset sample with all positive class samples to generate a balanced subset sample. Finally, obtained 100 balanced subset samples for train data j (j = 1,2).
Generation of an ensemble classifier:
SVM is a supervised machine learning method, which is widely used in the field of protein–protein interactions. Here, we also used SVM to predict trimer protein complexes interface residue pairs.
There are 100 balance subset samples in train data j (j = 1,2), each of which can be used to train an SVM model. We obtained 100 individual SVM predictors for train data j. We then developed an ensemble SVM classifier P j by fusing the 100 individual SVM predictors in train data j through a probability system, as shown in formula 10. Finally, a two-layer SVM ensemble-classifier P was formed by fusing two ensemble SVM classifiers P1 and P2 through a weight ω (here, we set ω to 1/2). We provide a flowchart in Figure 2 to illustrate how to generate a two-layer SVM ensemble-classifier P (An implementation of our model is available at the website ftp://202.112.126.135/pub/Trimer/code).
P j ( x ) = i = 1 100 S V M i ( x )   ( i   =   1 , 2 , , 100 ,   j   =   1 , 2 )
where S V M i indicates the SVM predictor trained with an i-balance subset sample. x indicates a residue pair. S V M i ( x ) indicates the probability that the residue pair x is the interface residue pair in the i-th individual SVM prediction of train data j.
P ( x ) = ω × P 1 ( x ) + ( 1 ω ) × P 2 ( x )

2.5. Evaluation Criteria

The output of our model is a value between 0 and 1, showing the possibility of the residue pair to be an interface residue pair. The values were sorted from large to small. The number t predicted interface residue pairs with highest probability were used as the top t predicted interface residue pairs.
We used the following three measures to evaluate the performance of our method. First, we defined a three-dimensional vector   N P R P T ( t ) = ( n 1 , n 2 , n 3 ) t , where n z (z = 1,2,3) represents the number of positive interface residue pairs in the top t predictions for each possible protein–protein interface of a protein trimer. Here, N P R P T ( t ) is the abbreviation of the number of positive interface residue pairs in the top t predictions for each protein trimer. The first index is N P R P T ( t ) 0 , which represents the L0 norm of N P R P T ( t ) . It is consistent with the meaning of the L0 norm of vector in mathematics, which represents the number of nonzero elements in a vector. So N P R P T ( t ) 0 represents the number of interfaces that we can correctly predict in each protein trimer. In the top t predictions provided, if there is at least one positive interface residue pair, we assumed that the protein–protein interaction interface could be predicted correctly.
The second index is N P R P T ( t ) 1 , which represents the L1 norm of N P R P T ( t ) , see Formula (12). It is consistent with the meaning of the L1 norm of vector in mathematics. Therefore, N P R P T ( t ) 1 represents the number of positive interface residue pairs in the top t predictions at a protein trimer
N P R P T ( t ) 1 = n 1 + n 2 + n 3 .
The third index is index accuracy rate (see Formula (13)).
accuracy   rate   ( t ) z = SCT ( t ) T N T × 100 % .
SCT represents the sum of all correctly predicted trimer protein complexes. In the top t predictions, if there were z protein–protein interaction interfaces satisfying at least one positive interface residue pair, we assumed that the protein trimer was predicted correctly. TNT represents the total number of trimer protein complexes in the dataset.

3. Results and Discussion

3.1. Application of our Algorithms on the Testing Set

There were 26 trimer protein complexes in the testing set. Each protein trimer consists of three chains, and any two chains interact to form a protein–protein interaction interface. Therefore, we obtained 78 protein–protein interaction interfaces. Through Section 2.2 Feature Extraction and Section 2.3 Feature Vector Engineering, two sets of feature vectors can be used to represent a residue pair. Therefore, we can generate test data 1 and test data 2 for the testing trimer protein complexes. Then by inputting test data 1 and test data 2 to our proposed Algorithms, we can get the prediction results of the 78 protein–protein interaction interfaces.

3.2. Analysis of the Testing Set Results

Table 3 shows the top t (t = 10, 15, 20, and 30) predictions and the two evaluation indexes corresponding to the testing set results. It can be seen from Table 3 that when 3 protein–protein interaction interfaces of each protein trimer were correctly predicted, a total of 9 trimer protein complexes were correctly predicted in the top 10 predictions. The prediction result of 2IY0 protein trimer was the best, with up to 10 positive interface residue pairs. When 3 protein–protein interaction interfaces of each protein trimer are correctly predicted, a total of 17 trimer protein complexes were correctly predicted in the top 30 prediction results. Among them, there were 10 trimer protein complexes for which at least 10 positive interface residue pairs were predicted.
To further analyze the prediction results, we obtained the index a c c u r a c y   r a t e   ( t ) z from all N P R P T 0 columns in Table 3 (see Table 4). When 3 protein–protein interaction interfaces of each protein trimer were correctly predicted in the top 15 predictions, the accuracy rate was 42.31%, i.e., more than 2/5 of trimer protein complexes in the testing set were correctly predicted. When 3 protein–protein interaction interfaces of each protein trimer were correctly predicted in the top 30 predictions, the accuracy rate was as high as 65.38%. When at least 2 protein–protein interaction interfaces of each protein trimer are correctly predicted in the top 10 predictions, the accuracy rate was 53.85%, i.e., more than half trimer protein complexes in the testing set were correctly predicted. When at least 1 protein–protein interaction interface of each protein trimer was correctly predicted, the accuracy rate was 76.92% in the top 10 predictions and up to 92.31% in the top 30 predictions.
There are 6479 pairs of interface residue pairs in the test set, of which 968 pairs are formed by residues at N- and C-terminal regions, accounting for 15% (here, residues at the N- and C-terminal regions is the residues that we have specially treated in the manuscript). We can accurately predict 190 interface residue pairs for all testing set protein trimers, of which 48 pairs are formed by residues at N-terminal and C-terminal, accounting for 25%.
We compared the performance of our method with the previous method [16]. When at least 1 protein–protein interaction interface of each protein trimer was correctly predicted, the accuracy of our method is 76.92% and of the previous method [16] is 31.1% in the top 10 predictions. The accuracy of our method is higher than them.
The analysis of the above results showed that our proposed method was able to accurately predict the interface residue pairs of trimer protein complexes. Additionally, our predicted results are consistent with the experimental results. In the experimental article of the 3ffd protein trimer [36], it is mentioned that residues 20 and 24 are strictly conserved, which allows for extensive interactions with the antibody. Residues 16, 20, 27, 52, 59, 97, 102, and 104 are also binding sites. In our top 20 prediction results, we successfully predicted 8 positive interface residue pairs. For clarity, 6 positive interface residue pairs (Tyr 104-Phe 102, Tyr 104-Tyr 104, Tyr 104-Phe 23, Tyr 59-Phe 23, Phe 102-Phe 23, and Thr 32-Gln 16) for the 3ffd protein trimer are shown in Figure 3a. In the experimental article of the 1s7o protein trimer [37], it is pointed out that the 1s7o protein trimer has two structural domains and the primary interaction mainly involves the second central domain. The hydrophobic residues Ile 85, Phe 86, Met 89, Ile 90, Leu 99, Ile 103, and Leu 106 create both an intermolecular and intramolecular hydrophobic core in the second domain. Arg82 and Asp 110 form salt bridges, and two Arg82 guanidyl groups in adjacent molecules contribute to the intramolecular and intermolecular interactions. In our top 20 prediction results, we have successfully predicted 11 positive interface residues pairs formed by these residues and their surrounding residues. For illustration purposes, we show 6 positive interface residue pairs (Ile105-Ile109, Glu101-Ile 109, Ile 105-Ile 85, Glu 101-Val 81, Leu 106-Ile 85, and Ile 85-leu 106) in Figure 3b.
The training set contains a lot of antibody fragments, which make up two of the three chains: 1BGX, 3O2D, 3R1G. 3GI9, 1JPS, 1JRH, 1FNS. Similarly, 1F6F, 1EER, 1HWG, and 3VA2 are all cytokine receptor complexes with probable similarity between the receptor CRH domains. We deleted 1BGX, 3O2D, 3R1G. 3GI9, 1JPS, 1JRH, 1FNS 1F6F, 1EER, 1HWG, 3VA2 in the training set. The test set also contains 3 complexes with antibody chains: 3FFD, 1OSP, and 1SY6. We generated testing set 2, which deleted 3FFD, 1OSP, and 1SY6 relative to testing set. Appendix A Table A3 shows the top t (t = 15, 20, and 30) predictions and the two evaluation indexes corresponding to the testing set 2 results. We compared the prediction results of testing set with that of testing set 2 (Table 5). When at least 2 protein–protein interaction interfaces of each protein trimer are correctly predicted, the accuracy of testing set 2 is about 7% lower than that of testing set. When at least 3 protein–protein interaction interfaces of each protein trimer are correctly predicted in the top 30 predictions, the accuracy of testing set 2 is about 8.5% lower than that of testing set. When at least 3 protein–protein interaction interfaces of each protein trimer are correctly predicted in the top 20 predictions, the accuracy of testing set 2 was 6% higher than that of test set. The rest of the prediction results of the two test sets are almost the same.

3.3. Comparison with Random Results

We assume that the stochastic prediction of interface residue pairs of each protein–protein interaction interface in trimer protein complexes obeys a hypergeometric distribution X ~ H ( N , M , T ) [38]; where X is the number of positive interface residue pairs in the top T predictions. N is the number of all the residue pairs of one protein–protein interaction interface in one protein trimer. M is the number of positive interface residue pairs in this protein–protein interaction interface. Next, we can calculate the probability P that there are x positive interface residue pairs in the T predictions of one protein–protein interaction interface by the stochastic model (see Formula (14)):
P ( X = x ) = C M x C N M T x C N T .
In order to simplify the calculation, we assumed that each protein–protein interaction interface was independently identically distributed, and N is the mean value of all residue pairs in each protein–protein interaction interface, and M is the mean value of positive interface residue pairs in each protein–protein interaction interface. It can be seen that N is about 40,920 and M is about 83 in the Appendix A Table A4 When at least 1 protein–protein interaction interface of each protein trimer has at least one positive interface residue pair in T predictions, the probability P 1 ˜ is:
P 1 ˜ ( X 1 ) = 1 C 40,920 83 T C 40,920 T .
Consideration of the complexity of the P 1 ˜ calculation, we have made an enlarged calculation of P 1 ˜ (see inequality 16 and 17). Obviously, the computational complexity of the P ^ 1 is less than P 1 ˜ , and when T is fixed, P 1 ˜ is less than P ^ 1 . When the value of T is 10, 15, 20, and 30, we can calculate P ^ 1 through the Monte Carlo simulation method (see Table 6).
P 1 ˜ ( X 1 ) = 1 C 40,920 83 T C 40,920 T 1 40,837 T + 1 40,920 T + 1
P 1 ˜ P ^ 1   ( P ^ 1 = 1 40,837 T + 1 40,920 T + 1 )
When at least 2 protein–protein interaction interfaces of each protein trimer have at least one positive interface residue pair in T predictions, the probability P 2 ˜ is:
P 2 ˜ = C 3 2 P 1 ˜ 2 ( 1 P 1 ˜ ) + P 1 ˜ 3
Combining Formulas (15) and (18), we also enlarge P 2 ˜ and obtained inequality 20. Obviously, the computational complexity of the P ^ 2 is much less than P 2 ˜ , and when T is fixed, P 2 ˜ is less than P ^ 2 . When the value of T is 10, 15, 20, and 30, we can calculate P ^ 2 through the Monte Carlo simulation method (see Table 6).
P 2 ˜ = C 3 2 P 1 ˜ 2 × ( 1 P 1 ˜ ) + P 1 ˜ 3 = 1 3 ( 40,837 × 40,836 × × ( 40,837 T + 1 ) 40,920 × 40,919 × × ( 40,920 T + 1 ) ) 2 + 2 ( 40,837 × 40,836 × × ( 40,837 T + 1 ) 40,920 × 40,919 × × ( 40,920 T + 1 ) ) 3 1 3 ( 40,837 T + 1 40,920 T + 1 ) 2 + 2 ( 40,837 40,920 ) 3
P 2 ˜ P ^ 2 ( P ^ 2 = 1 3 ( 40,837 T + 1 40,920 T + 1 ) 2 + 2 ( 40,837 40,920 ) 3 )
When 3 protein–protein interaction interfaces of each protein trimer have at least one positive interface residue pair in T predictions, the probability P 3 ˜ is:
P 3 ˜ = P 1 ˜ 3
P 3 ˜ = P 1 ˜ 3 = ( 1 40,837 × 40,837 × × ( 40,837 T + 1 ) 40,920 × 40,919 × × ( 40,920 T + 1 ) ) 3 ( 1 40,837 T + 1 40,920 T + 1 ) 3
P 3 ˜ P ^ 3   ( P ^ 3 = ( 1 40,837 T + 1 40,920 T + 1 ) 3 )
In the same way as above, we also enlarged P 3 ˜ and obtained P ^ 3 (see Inequality (22) and (23)). When the value of T was 10, 15, 20, and 30, we calculated P ^ 3 through the Monte Carlo simulation method (see Table 6).
As can be seen from Table 5, the accuracy of our method to predict the interface residue pairs of trimer protein complexes is much higher than that of random results. When at least 1 protein–protein interaction interface of each protein trimer was correctly predicted, our method accuracy was over 76.92% and up to 92.31%, while the random accuracy was lower than 0.20298%. When at least 2 protein–protein interaction interfaces of each protein trimer were correctly predicted, our accuracy was over 53.85% and up to 84.62%, whereas the random accuracy was below 0.0015%. When 3 protein–protein interaction interfaces of each protein trimer were correctly predicted, our accuracy achieved 65.38% in the top 30 predictions, and the accuracy was more than 108 times higher than that of random results.

4. Conclusions

In this paper, we defined an amino acid k-interval product factor to describe the influence of neighboring amino acids on a residue. This method takes advantage of the physicochemical and geometric properties of amino acids, and also considers the influence of neighboring amino acids (amino acid k-interval product factor) as features. Finally, we developed a two-layer SVM ensemble-classifier method, based on feature vector engineering and SVM, to predict the interface residue pairs of trimer protein complexes. In our testing set, the accuracy rate of successfully predicting one interface was 84.62%, and for two interfaces, the accuracy rate was 73.08%, in the top 15 predictions, which indicates significance for biological experimentation and biomedical-related research. Moreover, our predicted results are consistent with the experimental results. This shows that our method is effective and reliable to predict interface residue pairs of trimer protein complexes. However, our accuracy rate was not high when three interfaces of one trimer are required to predict correctly. We also did not consider protein conformational changes. These are the areas where we will improve in the future.

Supplementary Materials

Author Contributions

Conceptualization, Y.L. and X.G.; methodology, Y.L.; software, Y.L.; validation, Y.L.; formal analysis, Y.L. and X.G.; investigation, Y.L.; resources, Y.L. and X.G.; data curation, Y.L.; writing—original draft preparation, Y.L. and X.G.; writing—review and editing, Y.L. and X.G.; supervision, Y.F.L. and X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (31670725).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Five physicochemical properties for the 20 amino acids.
Table A1. Five physicochemical properties for the 20 amino acids.
Amino AcidΦ1Φ2Φ3Φ4Φ5Φ6Φ7
A0.620.0468.1−1.3021.570.170.50
C0.290.1285.50.465−1.02−0.24−0.02
D−0.90.105130.302−0.2591.233.64
E−0.740.15112.3−1.4530.1132.023.63
F1.190.295.2−0.59−0.397−1.13−1.71
G0.48091.6521.0450.011.15
H−0.40.2310.4−0.417−1.4740.962.33
I1.380.1865.2−0.5470.393−0.31−1.12
K−1.50.21911.3−0.561−0.2770.992.80
L1.060.1864.9−0.9871.266−0.56−1.25
M0.640.2215.7−1.524−1.005−0.23−0.67
N−0.780.13411.60.828−0.1690.420.85
P0.120.13182.0810.4210.450.14
Q−0.850.1810.5−0.179−0.5030.580.77
R−2.530.29110.5−0.0550.440.811.81
S−0.180.0629.21.3990.670.130.46
T−0.050.10880.3260.9080.140.25
V1.080.145.9−0.2791.2420.07−0.46
W0.810.4095.40.009−2.128−1.85−2.09
Y0.260.2986.20.83−0.838−0.94−0.71
We used Φ1, Φ2, Φ3, Φ4, Φ5, Φ6, and Φ7 to represent the hydrophobicity 1, polarizability, polarity, secondary structure, codon diversity, hydrophobicity 2, and hydrophobicity 3 for the 20 amino acids.
Table A2. Forty-eight characteristics to describe a residue.
Table A2. Forty-eight characteristics to describe a residue.
A Residue of Protein Monomer Characteristics
ASA_AΦ4AAFIPF1(5)AABIPF2(4)AAFIPF4(5)AAFIPF5(3)
RASA_AΦ5AABIPF1(1)AABIPF3(1)AABIPF4(1)AAFIPF5(4)
ECA_AΦ6AABIPF1(2)AABIPF3(2)AABIPF4(2)AAFIPF5(5)
ICA_AΦ7AABIPF1(3)AABIPF3(5)AABIPF4(3)AABIPF5(1)
EVA_AAAFIPF1(1)AABIPF1(4)AAFIPF4(1)AABIPF4(4)AABIPF5(2)
Φ1AAFIPF1(2)AABIPF1(5)AAFIPF4(2)AABIPF4(5)AABIPF5(3)
Φ2AAFIPF1(3)AAFIPF2(2)AAFIPF4(3)AAFIPF5(1)AABIPF5(4)
Φ3AAFIPF1(4)AABIPF2(2)AAFIPF4(4)AAFIPF5(2)AABIPF5(5)
We used Φ1, Φ2, Φ3, Φ4, Φ5, Φ6, and Φ7 to represent basic first-order sequence features.We use 48 characteristics to describe a residue in the second set of feature vector.
Table A3. Two evaluation indexes of the testing set 2 prediction results.
Table A3. Two evaluation indexes of the testing set 2 prediction results.
Protein Name t = 15 t = 20 t = 30
N P R P T 0 N P R P T 1 N P R P T 0 N P R P T 1 N P R P T 0 N P R P T 1
1osp3536310
1oy3131415
1p323636310
1q5x343539
1qb3311313317
1s7o3739310
1sg2111111
1stz223335
1sy6242527
1w9z122324
1wdj38311312
1ynb111224
1za7363638
2ig8312314321
2ium232439
2iy0363839
2izw000011
2ms2000000
2r3u333437
2wr5000000
3dli252627
3ffd002223
3m6n233538
Table A4. All residue pairs and positive interface residue pairs in each protein–protein interface.
Table A4. All residue pairs and positive interface residue pairs in each protein–protein interface.
Protein–Protein InterfacePositive Residue PairsAll Residue Pairs
1osp_H_L16046,652
1osp_H_O3054,718
1osp_L_O2653,714
1oy3_B_C5315,120
1oy3_B_D3824,640
1oy3_C_D8629,700
1p32_A_B9331,122
1p32_A_C9532,032
1p32_B_C9430,096
1q5x_A_B4225,238
1q5x_A_C3724,800
1q5x_B_C4124,490
1qb3_A_B713,447
1qb3_A_C17212,317
1qb3_B_C1512,971
1s7o_A_B1211,130
1s7o_A_C2111,448
1s7o_B_C7811,340
1sg2_A_B4415,369
1sg2_A_C7220,022
1sg2_B_C5015,478
1stz_A_B53100,453
1stz_A_C135100,453
1stz_B_C4596,721
1sy6_A_H3736,792
1sy6_A_L1535,784
1sy6_H_L16446,647
1w9z_A_B15066,049
1w9z_A_C14965,278
1w9z_B_C15265,278
1wdj_A_B15328,272
1wdj_A_C4534,596
1wdj_B_C928,272
1ynb_A_B19727,889
1ynb_A_C2527,889
1ynb_B_C8427,889
1za7_A_B4324,915
1za7_A_C3124,915
1za7_B_C3327,225
2ig8_A_B9820,306
2ig8_A_C10420,164
2ig8_B_C10220,306
2ium_A_B8644,521
2ium_A_C8844,521
2ium_B_C8844,521
2iy0_A_B9017,176
2iy0_A_C2835,256
2iy0_B_C1311,856
2izw_A_B8831,862
2izw_A_C8037,024
2izw_B_C9037,232
2ms2_A_B3916,641
2ms2_A_C3616,641
2ms2_B_C4116,641
2r3u_A_B8839,390
2r3u_A_C8437,370
2r3u_B_C9736,075
2wr5_A_B169235,225
2wr5_A_C176235,225
2wr5_B_C162235,225
3dli_A_B8648,841
3dli_A_C8648,841
3dli_B_C8448,841
3ffd_A_B15045,570
3ffd_A_P403780
3ffd_B_P263906
3m6n_A_B42570,752
3m6n_A_C6869,696
3m6n_B_C7470,752
3owt_A_B821,904
3owt_A_C592960
3owt_B_C312960
3p5j_A_B6744,802
3p5j_A_C10830,392
3p5j_B_C21419,836
3qks_A_B26735,621
3qks_A_C294179
3qks_B_C243759
mean83.064140,919.63
Protein–protein interface column denotes the protein–protein interface of those two chains in a protein trimer, such as 1osp_ H_ L is the interaction interface between H chain and L chain of 1osp protein trimer.

References

  1. Arango-Rodriguez, J.D.; Cardona-Escobar, A.F.; Jaramillo-Garzon, J.A.; Arroyave-Ospina, J.C. Machine learning based protein-protein interaction prediction using physical-chemical representations. In Proceedings of the 2016 XXI Symposium on Signal Processing, Images and Artificial Vision (STSIVA), Bucaramanga, Colombia, 31 August–2 September 2016. [Google Scholar]
  2. Vidal, M.; Cusick, M.E.; Barabási, A.L. Interactome Networks and Human Disease. Cell 2011, 144, 986–998. [Google Scholar] [CrossRef] [Green Version]
  3. Segura, J.; Jones, P.F.; Fernandez-Fuentes, N. Improving the prediction of protein binding sites by combining heterogeneous data and Voronoi Diagrams. BMC Bioinform. 2011, 12, 352. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Hwang, H.; Vreven, T.; Weng, Z. Binding interface prediction by combining protein-protein docking results. Proteins Struct. Funct. Bioinform. 2014, 82, 57–66. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Lyu, Y.F.; Huang, H.; Gong, X.Q. A novel index of Contact frequency from noise protein-protein interaction data help for accurate interface residue pair prediction. Interdiscip. Sci. Comput. Life Sci. 2020, 12, 204–216. [Google Scholar] [CrossRef] [PubMed]
  6. Ovchinnikov, S.; Kamisetty, H.; Baker, D. Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information. Elife 2014, 3, e02030. [Google Scholar] [CrossRef] [PubMed]
  7. Dong, Z.; Wang, K.; Dang, T.K.L.; Gültas, M. CRF-based models of protein surfaces improve protein-protein interaction site predictions. BMC Bioinform. 2014, 15, 277. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Du, T.; Liao, L.; Wu, C.; Sun, B. Prediction of residue-residue contact matrix for protein-protein interaction with fisher score features and deep learning. Methods 2016, 110, 97–105. [Google Scholar] [CrossRef]
  9. Kamisetty, H.; Ovchinnikov, S.; Baker, D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. USA 2013, 110, 15674–15679. [Google Scholar] [CrossRef] [Green Version]
  10. La, D.; Kihara, D. A novel method for protein–protein interaction site prediction using phylogenetic substitution models. Proteins 2012, 80, 126–141. [Google Scholar] [CrossRef] [Green Version]
  11. Mirco, M.; Sikander, H.; Skwark, M.J.; Chris, S.; Marks, D.S.; Arne, E. PconsFold: Improved contact predictions improve protein models. Bioinformatics 2014, 17, i482–i488. [Google Scholar] [CrossRef]
  12. Peng, C.; Zhang, J.; Sun, W.; Bing, W. Current status of machine learning-based methods for identifying protein-protein interaction sites. Curr. Bioinform. 2013, 8, 177–182. [Google Scholar] [CrossRef]
  13. Perišić, O. Contact and first layer residues prediction in protein dimers using the gaussian network model with adjusTable number of fast modes. arXiv 2013. [Google Scholar]
  14. Raza, K. Protein Features Identification for Machine Learning-Based Prediction of Protein-Protein Interactions. International conference on Information, Communication and Computing Technology; Springer: Singapore, 2017; ISBN 978-981-10-6543-9. [Google Scholar]
  15. Wang, W.; Yang, Y.; Yin, J.; Gong, X. Different protein-protein interface patterns predicted by different machine learning methods. Sci. Rep. 2017, 7, 16023. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Zhao, Z.; Gone, X. Protein-protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 16, 1753–1759. [Google Scholar] [CrossRef]
  17. Tanford, C. Contribution of Hydrophobic Interactions to the Stability of the Globular Conformation of Proteins. J. Am. Chem. Soc. 1962, 84, 4240–4247. [Google Scholar] [CrossRef]
  18. Eisenberg, D. Three-dimensional structure of membrane and surface proteins. Annu. Rev. Biochem. 1984, 53, 595–623. [Google Scholar] [CrossRef]
  19. Kyte, J.; Doolittle, R.F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982, 157, 105–132. [Google Scholar] [CrossRef] [Green Version]
  20. Marvin, C.; Charton, B.I. The structural dependence of amino acid hydrophobicity parameters. J. Theor. Biol. 1982, 99, 629–644. [Google Scholar] [CrossRef]
  21. Grantham, R. Amino acid difference formula to help explain protein evolution. Science 1974, 185, 862–864. [Google Scholar] [CrossRef]
  22. Shrake, A.; Rupley, J.A. Environment and exposure to solvent of protein atoms. lysozyme insulin. J. Mol. Biol. 1973, 79, 351–371. [Google Scholar] [CrossRef]
  23. Afsar Minhas, F.U.A.; Geiss, B.J.; Ben-Hur, A. PAIRpred: Partner-specific prediction of interacting residues from sequence and structure. Proteins Struct. Funct. Bioinform. 2014, 82, 1142–1155. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Tahir, M.; Hayat, M. Machine learning based identification of protein-protein interactions using derived features of physiochemical properties and evolutionary profiles. Artif. Intell. Med. 2017, 78, 61–71. [Google Scholar] [CrossRef] [PubMed]
  25. Jianhua, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K. iPPI-Esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. J. Theor. Biol. 2015, 377, 47–56. [Google Scholar] [CrossRef]
  26. Yang, Y.; Gong, X.Q. A new probability method to understand protein-protein interface formation mechanism at amino acid level. J. Theor. Biol. 2018, 436, 18–25. [Google Scholar] [CrossRef]
  27. Afreixo, V.; Bastos Carlos, A.C.; Pinho, A.J.; Garcia, S.P.; Ferreira Paulo, J.S.G. Genome analysis with inter-nucleotide distances, Bioinformatics. 2009, 25, 3064–3070. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Wang, L.L.; Brown, S.J. BindN: A web-based tool for efficient prediction of NA and RNA binding sites in amino acid sequences. Nucleic Acids Res. 2006, 34, 243–248. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  29. Wang, L.L.; Huang, C.; Yang, M.Q.; Yang, J.Y. BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features. Bmc Syst. Biol. 2010, 4, S3. [Google Scholar] [CrossRef] [Green Version]
  30. Olsson, M.H.M.; Søndergaard, C.R.; Rostkowski, M.; Jensen, J.H. Propka3: Consistent treatment of internal and surface residues in empirical pka predictions. J. Chem. Theory Comput. 2011, 7, 525–537. [Google Scholar] [CrossRef]
  31. Dhole, K.; Singh, G.; Pai, P.P.; Mondal, S. Sequence-based prediction of protein–protein interaction sites with L1-logreg classifier. J. Theor. Biol. 2014, 348, 47–54. [Google Scholar] [CrossRef]
  32. Dhole, K.; Singh, G.; Pai, P.P.; Mondal, S. SPRINGS: Prediction of Protein-Protein Interaction Sites Using Artificial Neural Networks. Peerj 2014. [Google Scholar] [CrossRef]
  33. Porollo, A.; Meller, J. Prediction-based fingerprints of protein—protein interactions. Proteins Struct. Funct. Bioinform. 2010, 66, 630–645. [Google Scholar] [CrossRef] [PubMed]
  34. Hubbard, S.J.; Thornton, M. NACCESS, 2.1.1.; Department of biochemistry and molecular biology: London, UK, 1993. [Google Scholar]
  35. Fischer, T.B.; Holmes, J.B.; Miller, I.R.; Parsons, J.R.; Tung, L.; Hu, J.C.; Tsai, J. Assessing methods for identifying pair-wise atomic contacts across binding interfaces. J. Struct. Biol. 2006, 153, 103–112. [Google Scholar] [CrossRef] [PubMed]
  36. Mckinstry, W.J.; Polekhina, G.; Diefenbach-Jagger, H.; Ho, P.W.M.; Sato, K.; Onuma, E.; Gillespie, M.T.; Martin, T.J.; Parker, M.W. Structural basis for antibody discrimination between two hormones that recognize the parathyroid hormone receptor*. J. Biol. Chem. 2009, 284, 15557–15563. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  37. Oganesyan, V.; Pufan, R.; DeGiovanni, A.; Yokota, H.; Kim, R.; Kim, S.H. Structure of the putative dna-binding protein sp_1288 from streptococcus pyogenes. Acta Crystallogr. 2004, 60, 1266–1271. [Google Scholar] [CrossRef]
  38. Li, C.X.; Dobbs, D.; Bonvin, A.M.J.J.; Honavar, V. Computational prediction of protein interfaces: A review of data driven methods. FEBS Lett. 2015, 589, 3516–3526. [Google Scholar] [CrossRef] [Green Version]
Sample Availability: Samples of the compounds are not available from the authors.
Figure 1. Key steps of the method.
Figure 1. Key steps of the method.
Molecules 25 04353 g001
Figure 2. Flow chart of the two-layer support vector machine (SVM) ensemble-classifier method.
Figure 2. Flow chart of the two-layer support vector machine (SVM) ensemble-classifier method.
Molecules 25 04353 g002
Figure 3. Experimental three-dimensional structure of the 3ffd and 1s7o trimer protein complexes. Figure (a) and Figure (b) are the three-dimensional structure of 3ffd and 1s7o protein trimer. We label three protein monomers with pink, blue, and green. The number of markers in the black circle indicates the correct predicted interface residue pair position on the two protein monomers.
Figure 3. Experimental three-dimensional structure of the 3ffd and 1s7o trimer protein complexes. Figure (a) and Figure (b) are the three-dimensional structure of 3ffd and 1s7o protein trimer. We label three protein monomers with pink, blue, and green. The number of markers in the black circle indicates the correct predicted interface residue pair position on the two protein monomers.
Molecules 25 04353 g003
Table 1. Detailed information of the training set and testing set.
Table 1. Detailed information of the training set and testing set.
Data Set NamePDB Code
Training Set1A121AHS1AWI1B771BGX1CJD1CUN1DKG1EER1EL61F6F1FNS1FPO
1G2X1HWG1IDP1IK91J5S1JPS1JRH1KI91KKE1L5A1LW12ADV2AZE
2B2Y2B4I2BSD2CU52DJ62E2A2E4M2FB52FM82FVH2FZ12GDG2GMI
2I152P902PBQ3CC03EMF3F5C3G653GI93N4G3NAP3O2D3R1G3VA2
Testing Set1OSP1OY31P321Q5X1QB31S7O1SG21STZ1SY61W9Z1WDJ1YNB1ZA7
2IG82IUM2IY02IZW2MS22R3U2WR53DLI3FFD3M6N3OWT3P5J3QKS
Table 2. The five geometric features and their calculation tools.
Table 2. The five geometric features and their calculation tools.
FeaturesAbbreviation Software or Researchers
Accessible surface areaASANaccess V2.1.1
Relative accessible surface areaRASANaccess V2.1.1
Exterior contact areaECAQcontacts
Interior contact areaICAQcontacts
Exterior void areaEVANACCES V2.1.1, Qcontacts
Table 3. Two evaluation indexes of the testing set prediction results.
Table 3. Two evaluation indexes of the testing set prediction results.
Protein Name t = 10 t = 15 t = 20 t = 30
N P R P T 0 N P R P T 1 N P R P T 0 N P R P T 1 N P R P T 0 N P R P T 1 N P R P T 0 N P R P T 1
1osp232325310
1oy334353639
1p3213131314
1q5x353536310
1qb334343439
1s7o3739311314
1sg223353739
1stz00000000
1sy622222325
1w9z33333337
1wdj22242628
1ynb22232424
1za7142639312
2ig811122335
2ium12343639
2iy0310314316319
2izw12232439
2ms2353838310
2r3u00001122
2wr500000000
3dli33333336
3ffd343438310
3m6n00000011
3owt13242528
3p5j00222334
3qks00112236
Table 4. Accuracy   rate   ( t ) z of the testing set prediction results
Table 4. Accuracy   rate   ( t ) z of the testing set prediction results
t t = 10 t = 15 t = 20 t = 30
z
z = 3 34.62%42.31%46.15%65.38%
z = 2 53.85%73.08%80.77%84.62%
z = 1 76.92%84.62%88.46%92.31%
Table 5. Accuracy   rate   ( t ) z of the testing set and testing set 2 prediction results.
Table 5. Accuracy   rate   ( t ) z of the testing set and testing set 2 prediction results.
t t = 15 t = 20 t = 30
z Result1Result2Result1Result2Result1Result2
z = 3 42.31%43.48%46.15%52.17%65.38%56.52%
z = 2 73.08%65.22%80.77%73.91%84.62%78.26%
z = 1 84.62%82.61%88.46%86.96%92.31%91.30%
a. Result1 represents the testing set result accuracy. b. Result2 represents the accuracy of the result in the testing set 2.
Table 6. Comparison of our method with that of random results.
Table 6. Comparison of our method with that of random results.
Accuracy Rate t = 10 t = 15 t = 20 t = 30
P ^ 1 0.20288%0.20290%0.20292%0.20298%
accuracy   rate ( t )   1 76.92%84.62%88.46%92.31%
P ^ 2 0.001500%0.001648%0.001797%0.00002094%
accuracy   rate ( t )   2 53.85%73.08%80.77%84.62%
P ^ 3 0.0000008351%0.0000008354%0.0000008357%0.0000008363%
accuracy   rate ( t )   2 34.62%42.31%46.15%65.38%

Share and Cite

MDPI and ACS Style

Lyu, Y.; Gong, X. A Two-Layer SVM Ensemble-Classifier to Predict Interface Residue Pairs of Protein Trimers. Molecules 2020, 25, 4353. https://0-doi-org.brum.beds.ac.uk/10.3390/molecules25194353

AMA Style

Lyu Y, Gong X. A Two-Layer SVM Ensemble-Classifier to Predict Interface Residue Pairs of Protein Trimers. Molecules. 2020; 25(19):4353. https://0-doi-org.brum.beds.ac.uk/10.3390/molecules25194353

Chicago/Turabian Style

Lyu, Yanfen, and Xinqi Gong. 2020. "A Two-Layer SVM Ensemble-Classifier to Predict Interface Residue Pairs of Protein Trimers" Molecules 25, no. 19: 4353. https://0-doi-org.brum.beds.ac.uk/10.3390/molecules25194353

Article Metrics

Back to TopTop