How to Use Not-Always-Reliable Binding Site Information in Protein-Protein Docking Prediction

  • Lin Li,

    Affiliations Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China, Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, South Carolina, United States of America

  • Yanzhao Huang ,

    yzhuang@hust.edu.cn (YH); yxiao@hust.edu.cn (YX)

    Affiliation Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China

  • Yi Xiao

    Affiliation Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, China

Abstract

In many protein-protein docking algorithms, binding site information is used to help predict protein complex structures. Correct and accurate binding site information can increase the docking success rate significantly. Conversely, wrong binding site information can lead to a failed prediction or, at least, decrease the success rate. Recently, various successful theoretical methods have been proposed to predict the binding sites of proteins. However, predicted binding site information is not always reliable, and wrong binding sites are sometimes reported. Hence, using predicted binding site information in current docking algorithms carries a high risk. In this paper, a softly restricting method (SRM) is developed to solve this problem. By utilizing predicted binding site information in a proper way, the SRM algorithm is sensitive to correct binding site information but insensitive to wrong information, which decreases the risk of using predicted binding site information. SRM is tested on benchmark 3.0 using purely predicted binding site information. The results show that when the predicted information is correct, SRM increases the success rate significantly; when the predicted information is completely wrong, SRM decreases the success rate only slightly. This indicates that SRM is suitable for utilizing predicted binding site information.

Introduction

Most proteins interact with other proteins or molecules to perform their biological functions. On average, each protein interacts with approximately three to ten partners [1]. Understanding the details of protein-protein interactions requires the 3D structures of the complexes. However, protein complex structures are difficult to determine experimentally, so the number of available complex structures is still limited compared with that of monomeric protein structures. Therefore, computational approaches for predicting the structures of protein complexes are helpful.

Many docking algorithms have been developed. Some are based on Fast Fourier Transform (FFT) methods [2], such as MolFit [3], 3D-Dock [4], [5], [6], GRAMM [7], ZDOCK [8], [9], DOT [10], BiGGER [11], and HEX [12]. These FFT-based algorithms search the 6D space quickly and effectively, so they are usually used as the initial stage of a docking procedure. However, FFT-based algorithms treat the receptor and ligand as rigid bodies. Many of them are therefore combined with other methods that refine or re-rank the structures obtained in the initial stage [4], [13], [14]. Besides the FFT-based algorithms, other algorithms have been developed that can account for protein flexibility during the docking procedure, such as RosettaDock [15], ICM-DISC [16], AutoDock [17], and HADDOCK [18].

If the binding sites of a protein are known, they can be used to improve the success rate of docking prediction [5], [19]. Many properties have been used to predict protein binding sites or interface residues. Widely used features include the hydrophobicity of residues [20], [21], [22], [23], the evolutionary conservation of residues [24], [25], [26], [27], [28], [29], and the planarity and accessible surface area of patches [30], [31]. Other interface-distinguishing features have also been explored. For example, protein binding sites were found to be surrounded by more bound waters and to have lower temperature factors (B-factors) than other surface residues [32]. Another analysis showed that protein interfaces are likely to contain backbone hydrogen bonds which are wrapped by more than nine hydrophobic groups [33]. Another work indicated that the side chains of interface residues have higher energies than those of other surface residues [34]. No single feature mentioned above can distinguish binding sites from other surface residues. Thus, algorithms and meta servers have been developed that combine different features to improve the success rate of binding site prediction [32], [35], [36], [37], [38], [39], [40], [41]. A test on a dataset of 62 complexes showed that the success rates of these methods are about 30 percent [41].

Several groups integrate experimentally determined binding sites into their docking algorithms [4], [5], [19], [41], [42], [43], [44], [45]. These algorithms use the information in three different ways: (1) Most groups use the information in a post-filtering stage [4], [5], [41], [44], [45]. (2) Some algorithms [46], [47], [48], including ZDOCK’s block method [46], use the information to restrict the docking area during the sampling stage. (3) Ben-Zeev and Eisenstein implemented a weighted geometric method in MolFit [19]. For the first two kinds of algorithms, correct binding site information can increase the success rate significantly, but wrong information will lead to a failed prediction. The third kind of algorithm can tolerate some inaccurate information and was successful on a dataset of five complexes.

Predicted binding site information is not always reliable [41], so using it carries a high risk. In this work, a softly restricting method (SRM) is developed to utilize the predicted information. SRM is based on our ASPDock algorithm [49], which was successful in rounds 18 and 19 of CAPRI (Critical Assessment of PRediction of Interactions) [50]. SRM softly constrains the receptor and ligand to bind around predicted key residues during the sampling stage. The results show that SRM significantly increases the hit count on the dataset, which should greatly help scorers to pick out the near-native structures.

This work differs from that of Ben-Zeev and Eisenstein [19]. Their method is based on geometric complementarity. In contrast, our softly restricting method (SRM) is based on the ASPDock algorithm, which uses atomic solvation parameters (ASP) [51] rather than geometric complementarity. Ben-Zeev and Eisenstein tested their method on several systems with experimental biochemical and biophysical data, which is correct information. In this work, we perform a large test on 99 complexes in benchmark 3.0 using only purely predicted information, which is a mixture of correct and incorrect information.

Results and Discussion

Antibody-antigen and Dockground Complexes

Antibody-antigen complex structures are difficult to predict using ordinary FFT docking methods without binding site information, mainly because each antibody Fab structure has two big pockets that are not the binding sites (Figure 1). The native binding site, the complementarity determining region (CDR), usually has no advantage in geometric features. With ASPDock, antigens also have a strong tendency to bind at the big pockets of antibodies because the accessible surface area decreases dramatically when antigens bind there. However, there are several methods to specify the CDR residues from antibody sequences. Using the AbM definition, we specified the CDR residues of all 21 antibodies as correct information. We softly restrict the antigens to bind at the CDR residues and adjust the weight of the key residues in our algorithm by varying the value of the weight factor α. When α>1.5, antigens strongly tend to bind at CDR residues. Consequently, the success rate and hit count are enhanced dramatically (Figures 2a and 2b).

Figure 1. An example of antibody-antigen prediction.

a. Native structure of an antibody-antigen complex (1dqj). The light blue structure is the receptor (the Fab structure of the antibody), with the CDR colored orange; the green structure is the ligand. b. Ligand mass centers predicted by ASPDock without any predicted information. c. Ligand mass centers predicted by SRM with a CDR weight of 1.5. d. Ligand mass centers predicted by SRM with a CDR weight of 3.

https://doi.org/10.1371/journal.pone.0075936.g001

Figure 2. Results of 21 antibody-antigen and 11 dockground complexes. Predicted by ASPDock, SRM+Correct binding site information and SRM+Wrong binding site information.

a. Success rate of antibody-antigens. b. Hit count of antibody-antigens. c. Success rate of Dockground complexes. d. Hit count of Dockground complexes. e. Success rate of total complexes. f. Hit count of total complexes.

https://doi.org/10.1371/journal.pone.0075936.g002

However, even with correct information, 5 antibody-antigen complexes still cannot be successfully predicted within the top 2000 structures (Table 1), mainly because each of these complexes has a very small relative interface. In the top 2000 predictions, these 5 antigens tend to bind around the CDR residues of their partner antibodies, but the predicted interfaces on the antigens are not correct. This implies that for these 5 antibody-antigen complexes, CDR information alone is not enough for a successful prediction; the binding sites of the antigen (the antigenic determinant) must also be known.

Table 1. Results of antibody-antigen and dockground complexes predicted by ASPDock and SRM.

https://doi.org/10.1371/journal.pone.0075936.t001

The sensitivity of SRM to incorrect information is also tested. For each antibody, we randomly selected 10 surface but non-interface residues as incorrect information. All the incorrect residues lie outside the CDR binding site, so the incorrect information is expected to decrease the success rate and hit count. When the incorrect information is used for these 21 antibody-antigen complexes with the weight factor α still set to 1.5, the success rate and hit count decrease only slightly (Figures 2a and 2b). This indicates that SRM is insensitive to incorrect information.

In the test on the 11 dockground3.0 complexes, with the weight factor α also set to 1.5, the success rate and hit count increase with correct information and do not decrease significantly with incorrect information (Figures 2c and 2d). This indicates that ASPDock assigns high scores to near-native predictions, which easily reach the top ranks when the weight factor is 1.5. By contrast, most wrong predictions receive low scores; even when the ASP values of their binding site residues are enhanced 1.5 times, their scores are still not high enough to reach the top ranks.

Enzyme-inhibitor and Other Complexes

The tests on 21 antibody-antigen and 11 dockground3.0 complexes demonstrate that with SRM, correct information improves the success rate and hit count significantly, while incorrect information reduces them only slightly (Figures 2e and 2f). This means SRM is suitable for utilizing predicted information. Therefore, we test SRM on a data set of 99 complexes using predicted information from the PPI-PRED server (Figure 3).

Figure 3. Results of 35 enzyme-inhibitor complexes and 64 other type complexes.

Predicted by ASPDock and SRM. a. Success rate of enzyme-inhibitors. b. Hit count of enzyme-inhibitors. c. Success rate of other complexes. d. Hit count of other complexes.

https://doi.org/10.1371/journal.pone.0075936.g003

This data set includes 35 enzyme-inhibitor complexes and 64 complexes of other types. For enzyme-inhibitor complexes, ASPDock already achieves a high success rate without any predicted information: 24 out of 35 complexes are successfully predicted (in the top 2000 predictions). Using the information provided by PPI-PRED, the success rate does not increase significantly: 25 out of 35 complexes are successfully predicted (in the top 2000 predictions). However, the hit count in the top 2000 predictions increases from 742 to 2348 (Table 2). This improvement could make it easier for scorers to pick out the near-native structures using their scoring functions.

Table 2. Results of enzyme-inhibitor and other complexes predicted by ASPDock and SRM.

https://doi.org/10.1371/journal.pone.0075936.t002

For the 64 complexes of other types, ASPDock successfully predicts 26 complexes in the top 2000 predictions. This number increases to 31 (by 19%) when SRM is used with binding site information from PPI-PRED. However, the hit count in the top 2000 increases only modestly, from 831 to 1094.

For a first-stage sampling algorithm, the most important goal is to obtain as many hits as possible. Over all 99 complexes, ASPDock correctly predicts 50 complexes with a total hit count of 1573, so its average hit count per correctly predicted complex is 31.5. By contrast, SRM correctly predicts 56 complexes with a total hit count of 3442, so its average hit count is 61.5. Once more, this demonstrates that SRM achieves a better success rate as well as a larger average hit count. The average hit count from SRM is almost twice that from ASPDock, which is very useful for scoring functions that must pick out the correct structures from the top 2000 structures for each complex.

In the above results, hits are defined as structures with LRMSD≤10 Å, which are “acceptable predictions” under the CAPRI criterion. To test how SRM performs on “medium predictions”, we repeated the analysis with hits defined as structures with LRMSD≤5 Å. Under this definition, over all 99 complexes, ASPDock correctly predicts 23 complexes with a total hit count of 284, so its average hit count is 12.3. By contrast, SRM correctly predicts 31 complexes with a total hit count of 834, so its average hit count is 26.9. This analysis indicates that even under the stricter criterion, SRM still works better than ASPDock. We did not test the performance of SRM on “high accuracy predictions” (LRMSD≤2.5 Å), because SRM is a sampling-stage algorithm and, without a scoring function and a structure refinement program, is not expected to be good at obtaining “high accuracy predictions”.
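The averages quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that "average hit count" means the total hit count divided by the number of correctly predicted complexes, which is the reading that reproduces the reported values; the dictionary keys and the function name are illustrative.

```python
# Counts reported in the text: (correctly predicted complexes, total hit count).
reported = {
    ("ASPDock", "LRMSD<=10"): (50, 1573),
    ("SRM", "LRMSD<=10"): (56, 3442),
    ("ASPDock", "LRMSD<=5"): (23, 284),
    ("SRM", "LRMSD<=5"): (31, 834),
}


def average_hit_count(n_predicted, total_hits):
    """Hits per correctly predicted complex (assumed definition)."""
    return total_hits / n_predicted


averages = {key: average_hit_count(*counts) for key, counts in reported.items()}

for (method, criterion), avg in averages.items():
    print(f"{method:8s} {criterion:10s} average hit count = {avg:.1f}")

# Ratio of SRM to ASPDock averages under the 10 Å criterion.
ratio_10 = averages[("SRM", "LRMSD<=10")] / averages[("ASPDock", "LRMSD<=10")]
print(f"SRM / ASPDock ratio at 10 Å: {ratio_10:.2f}")
```

Under this reading, the ratio at the 10 Å criterion is a little below 2, in line with the "almost twice" statement in the text.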

As mentioned in the Methods section, the weight factor α is searched from 1.0 to 3.0 in steps of 0.1, and the optimized value is 1.5, which enhances the success rate when correct information is used and tolerates some incorrect information. The weight factor α is the key parameter: it affects both the success rate and the hit count. For example, when α is set to 2.0 and the criterion for a hit is LRMSD≤10 Å, SRM correctly predicts 53 complexes with a total hit count of 3051, so its average hit count is 57.6. The reason for the decrease is that when α is increased, the wrong information gets more weight, which may decrease the success rate. However, the optimized weight factor of 1.5 is specific to the atomic solvation parameters scoring function in ASPDock. Docking methods based on different scoring functions may need different optimized weight factors.

The results on 21 antibody-antigen complexes and 11 dockground3.0 complexes demonstrate that with a proper weight factor, our protein-protein docking sampling method is sensitive to correct information and insensitive to incorrect information. Based on this feature, we use purely predicted information to test 99 complexes in benchmark3.0. The results show that SRM can improve docking prediction significantly, even when the information used is not totally correct.

Conclusions

Results on antibody-antigen and dockground 3.0 complexes indicate that SRM is much more sensitive to correct information than to wrong information. This implies that SRM is effective if we know all or some of the native binding sites. Moreover, SRM can tolerate some wrong information. Results on enzyme-inhibitor and other complexes show that with predicted information, the overall hit count increases significantly and the success rate is also raised. The results should be better if the predicted information is more accurate.

In our test on 99 complexes from benchmark3.0, only purely theoretically predicted information is used. Currently, much work focuses on enhancing the success rate of theoretical binding site prediction, and these prediction methods are expected to become more accurate in the future. We will keep improving SRM to utilize theoretically predicted binding information more effectively. Combining binding site prediction with protein-protein docking to predict protein-protein interactions should become more widely used in the future.

Methods

ASPDock

ASPDock is a docking algorithm based on the FFT method [49]. Traditional FFT docking methods use shape complementarity as the crucial criterion to rank the predicted complex structures [2]. ASPDock instead incorporates atomic solvation parameters into the traditional FFT method to rank the predicted complex structures. ASPDock performs better than the shape complementarity docking method on benchmark3.0 [52], and it was also successful in CAPRI rounds 18 and 19.

In ASPDock [49], the receptor and ligand are projected onto 3-dimensional grids as follows:

[Equation (1)]

The ASP (atomic solvation parameter) value depends on the atom type and is always a negative number. The penalty for protein-protein overlap is a constant positive number, set to 20 in this work. The grid definition also involves the imaginary unit.

We can then search the 3-dimensional translation space by calculating the correlation function:

[Equation (2)]

This calculation can be accelerated by using the FFT method:

[Equation (3)]

For the rotation scan, we use a 10 degree step and pick the top 3 structures in each rotation. The grid step in the translation scan is 1 Å.
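To illustrate the translational search described above, here is a minimal sketch of a grid correlation computed directly and via FFT. It follows the generic correlation scheme of FFT docking [2] and is not ASPDock's actual grid definition: the random complex-valued grids, the periodic wrap-around, the function names, and the convention that a larger score is better are all assumptions made for illustration.

```python
import numpy as np


def correlation_direct(R, L):
    """C[o,p,q] = sum over (l,m,n) of R[l,m,n] * L[l+o, m+p, n+q], periodic."""
    C = np.zeros(R.shape, dtype=complex)
    for o, p, q in np.ndindex(*R.shape):
        # np.roll with a negative shift gives shifted[l] = L[l + o] (mod N).
        shifted = np.roll(L, shift=(-o, -p, -q), axis=(0, 1, 2))
        C[o, p, q] = np.sum(R * shifted)
    return C


def correlation_fft(R, L):
    """Same correlation via FFT; the conjugations make it valid for complex R."""
    return np.fft.ifftn(np.conj(np.fft.fftn(np.conj(R))) * np.fft.fftn(L))


def top_k_translations(score, k=3):
    """Grid shifts of the k largest real-valued scores (larger = better is assumed)."""
    flat = np.argsort(score, axis=None)[::-1][:k]
    return [tuple(int(v) for v in idx)
            for idx in zip(*np.unravel_index(flat, score.shape))]


rng = np.random.default_rng(0)
R = rng.normal(size=(4, 4, 4)) + 1j * rng.normal(size=(4, 4, 4))  # toy receptor grid
L = rng.normal(size=(4, 4, 4)) + 1j * rng.normal(size=(4, 4, 4))  # toy ligand grid

C_direct = correlation_direct(R, L)
C_fft = correlation_fft(R, L)
print("FFT matches direct sum:", np.allclose(C_direct, C_fft))
```

The direct sum costs on the order of N^6 operations for an N×N×N grid, while the FFT route costs on the order of N^3 log N, which is why the translational scan is done this way for every rotation.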

Softly Restricted Method

Based on ASPDock [49], we develop a softly restricting method (SRM) to utilize the predicted binding site information. The residues at the predicted binding sites are taken as key residues. We enhance the ASP values of these key residues by multiplying them by a weight factor α, and keep the ASP values of other residues unchanged:

$$\mathrm{ASP}'_i = \alpha \cdot \mathrm{ASP}_i \qquad (4)$$

where $\mathrm{ASP}_i$ is the original ASP value and $\mathrm{ASP}'_i$ is the enhanced ASP value of atom i. α>1 if atom i is expected to be on the interface, and 0<α<1 if atom i is expected NOT to be on the interface. In this work, we do not consider the latter situation.

Then, based on ASPDock, we search the 6-dimensional space using the enhanced ASP values instead of the original ones and pick the top N predictions. These N predictions should tend to bind at the key residues. The tendency can be adjusted by the weight factor α: a larger α leads to a stronger tendency to bind at the key residues.
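The weighting step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Atom` record, the residue identifiers, and the function name `apply_srm_weights` are hypothetical, and the ASP values are placeholders rather than real parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    residue_id: int   # residue the atom belongs to (hypothetical identifier)
    asp: float        # original ASP value (negative, per the text)


def apply_srm_weights(atoms, key_residues, alpha=1.5):
    """Return ASP values with atoms of key residues scaled by alpha.

    Atoms outside the key residues keep their original ASP value.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [a.asp * alpha if a.residue_id in key_residues else a.asp
            for a in atoms]


# Toy example: residue 2 is a predicted binding-site (key) residue.
atoms = [Atom(1, -0.25), Atom(2, -0.5), Atom(2, -1.0), Atom(3, -0.25)]
weighted = apply_srm_weights(atoms, key_residues={2}, alpha=1.5)
print(weighted)
```

Because only the values written to the grid change, the rest of the FFT search can stay untouched, which is what makes the restriction "soft": poses away from the key residues are still sampled, just scored without the extra weight.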

As shown by Huang and Schroeder [41], the success rate of predicting interface residues is only about 30%, so using predicted information is risky. Thus, the weight factor α should take a moderate value rather than a very large one. In this work, a simple grid search is used to optimize α. We searched α from 1 to 3 in steps of 0.1 and found the optimized value to be 1.5, which enhances the success rate when correct information is used and tolerates some incorrect information.
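A grid search of this kind can be sketched in a few lines. The text does not give the exact objective used to pick α, so `toy_objective` below is a purely hypothetical stand-in for "run SRM on the training set and score the outcome"; only the grid itself (1.0 to 3.0 in steps of 0.1) comes from the text.

```python
def alpha_grid(start=1.0, stop=3.0, step=0.1):
    """Evenly spaced alpha values, built from integer steps to avoid float drift."""
    n_steps = int(round((stop - start) / step))
    return [round(start + k * step, 10) for k in range(n_steps + 1)]


def grid_search(alphas, evaluate):
    """Return the alpha with the highest value of the supplied objective."""
    return max(alphas, key=evaluate)


def toy_objective(alpha):
    """Hypothetical objective peaking at 1.5; a real one would run the docking."""
    return -(alpha - 1.5) ** 2


alphas = alpha_grid()
best_alpha = grid_search(alphas, toy_objective)
print(len(alphas), alphas[0], alphas[-1], best_alpha)
```

In practice the objective would have to balance two terms, the gain with correct information and the loss with incorrect information, and how these are combined is a design choice the text leaves open.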

Dataset

Most docking algorithms can improve their predictions if correct information is used. However, if the information is incorrect, the post-filtering algorithms and restriction algorithms would fail to predict near-native structures. Predicted information cannot always be correct. When using predicted information, the crucial problem is to keep the docking success rate from decreasing when the information is incorrect.

In this work, 21 antibody-antigen complexes from benchmark3.0 [52] and dockground3.0 [53] are selected for our training set. In total, there are 30 non-redundant antibody-antigen complexes in benchmark3.0 and dockground3.0. Among these, we select only the complexes that contain the entire Fab (fragment antigen-binding) structure, because complexes with entire Fab structures are difficult for docking programs without any information, and their complementarity determining regions (CDR) can be detected by the AbM definition or other prediction methods. Thus, 9 of the 30 complexes are removed from our training set. Antibodies with Fab structures are well studied, and their binding sites can be easily specified from their sequences. There are several different methods (http://www.bioinf.org.uk/abs/) to specify the CDR of antibodies. Here we use the simple AbM definition. The results do not change significantly if we choose other methods. As the binding sites of antibodies can be well predicted before docking, the antibody-antigen training set is suitable for assessing the ability of SRM to use correct predicted information during the docking procedure. We also randomly selected 10 surface but non-interface residues for each antibody as wrong information.

Antibody-antigen complexes are difficult to predict without predicted binding site information. Besides the antibody-antigen complexes, we also selected some complexes that are easier to predict. These complexes are selected from dockground3.0 rank1, with all bound-unbound complexes removed. Complexes redundant with benchmark3.0 are also removed. After these filtering procedures, 17 complexes remain. Using ASPDock, we successfully predicted (at least 1 hit in the top 2000 predictions) 11 of these 17 complexes. For the receptor of each of these 11 complexes, we randomly selected 10 interface residues as correct information and 10 surface but non-interface residues as incorrect information. Our training set consists of these 11 complexes and the 21 antibody-antigen complexes mentioned above, with correct and incorrect information.
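The random selection of correct and incorrect residues described above could look like the following sketch. It assumes that the surface and interface residue sets have already been computed for the receptor; the function name, the integer residue identifiers, and the seeding are illustrative choices, not details from the text.

```python
import random


def pick_test_information(surface, interface, n=10, seed=None):
    """Draw n interface residues ("correct") and n surface, non-interface
    residues ("incorrect") at random, without replacement."""
    rng = random.Random(seed)
    interface_on_surface = sorted(set(surface) & set(interface))
    non_interface = sorted(set(surface) - set(interface))
    # random.sample raises ValueError if fewer than n residues are available.
    correct = rng.sample(interface_on_surface, n)
    incorrect = rng.sample(non_interface, n)
    return correct, incorrect


# Toy receptor: 60 surface residues, of which residues 1..20 form the interface.
surface_residues = range(1, 61)
interface_residues_toy = range(1, 21)
correct, incorrect = pick_test_information(
    surface_residues, interface_residues_toy, n=10, seed=42)
print("correct:", sorted(correct))
print("incorrect:", sorted(incorrect))
```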

Enzyme-inhibitor and other types of complexes from benchmark3.0 are selected as our test dataset. This test dataset contains 99 complexes in total, including 35 enzyme-inhibitor complexes and 64 complexes of other types. We predicted the binding sites for each monomer in this dataset using PPI-PRED [37].

PPI-PRED

Five binding site prediction methods were tested on a data set in Huang and Schroeder’s work [41]. The success rates of these methods range from 14 to 34 percent. Among the five methods, PPISP [40] and PPI-PRED [37] have success rates of 34% and 33%, respectively. PPI-PRED considers more sequence and structure features than PPISP and is selected as the prediction method in our work.

Criterion

LRMSD is the RMSD between the predicted and native ligand molecules after superposing the predicted and native receptor molecules. LRMSD is used as a criterion in CAPRI (Critical Assessment of PRediction of Interactions) [50]: predictions with LRMSD≤10 Å are considered “acceptable predictions”; predictions with LRMSD≤5 Å are considered “medium predictions”; and predictions with LRMSD≤2.5 Å are considered “high accuracy predictions”. This CAPRI-style measure is widely used in protein-protein docking and scoring work [14], [49], [54], [55]. In this work, a hit is defined as a predicted complex with LRMSD≤10 Å, which is an “acceptable prediction”. Since SRM is a structure sampling method, which is the first stage of the entire docking algorithm, the LRMSD of acceptable structures could be decreased by subsequent refinement [56], [57], [58].
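The LRMSD definition above can be made concrete with a small sketch. It assumes matched coordinate arrays (same atoms, same order) for the predicted and native molecules and uses the standard Kabsch algorithm for the receptor superposition; the function names, the toy coordinates, and the label "incorrect" for predictions above 10 Å are assumptions, and only the LRMSD thresholds quoted in the text are applied.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and translation t minimizing |mobile @ R.T + t - target|."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mc).T @ (target - tc)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, tc - R @ mc


def lrmsd(pred_rec, pred_lig, nat_rec, nat_lig):
    """Ligand RMSD after superposing the predicted receptor onto the native one."""
    R, t = kabsch(pred_rec, nat_rec)
    moved = pred_lig @ R.T + t
    return float(np.sqrt(np.mean(np.sum((moved - nat_lig) ** 2, axis=1))))


def capri_class(value):
    """Classify by the LRMSD thresholds quoted in the text."""
    if value <= 2.5:
        return "high accuracy"
    if value <= 5.0:
        return "medium"
    if value <= 10.0:
        return "acceptable"
    return "incorrect"


# Toy complex: the predicted ligand is displaced by 3 Å relative to the native
# pose, and the whole predicted complex is rigidly rotated and translated.
rng = np.random.default_rng(1)
nat_rec = rng.normal(size=(30, 3)) * 10.0
nat_lig = rng.normal(size=(20, 3)) * 5.0 + np.array([25.0, 0.0, 0.0])
angle = np.deg2rad(40.0)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
shift = np.array([5.0, -2.0, 7.0])
pred_rec = nat_rec @ Rz.T + shift
pred_lig = (nat_lig + np.array([3.0, 0.0, 0.0])) @ Rz.T + shift

value = lrmsd(pred_rec, pred_lig, nat_rec, nat_lig)
print(f"LRMSD = {value:.2f} Å, class = {capri_class(value)}")
```

Because the receptor is superposed first, a rigid motion of the whole predicted complex does not contribute to LRMSD; only the displacement of the ligand relative to the receptor does.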

A residue is a surface residue if more than 10% of its relative residue surface area is exposed to solvent, where the surface area is calculated by NACCESS (http://wolf.bms.umist.ac.uk/naccess). An interface residue is defined as a surface residue for which the minimum distance between its atoms and the atoms of the other protein in the native complex structure is less than 5 Å. We do not use 10 Å as the criterion because the radius of some small proteins is no more than 20 Å, so a predicted binding site that is 10 Å away from the interface is of little use. For each monomer, the accuracy of prediction is calculated as $N_{\mathrm{correct}} / N_{\mathrm{predicted}}$, where $N_{\mathrm{correct}}$ is the number of successfully predicted interface residues and $N_{\mathrm{predicted}}$ is the total number of predicted interface residues.
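The interface definition and the accuracy measure above can be sketched as follows. The surface filter (relative accessibility above 10%, from NACCESS) is assumed to have been applied beforehand, so the input dictionary is taken to contain surface residues only; the data layout and the function names are illustrative.

```python
import numpy as np


def interface_residues(residue_atoms, partner_atoms, cutoff=5.0):
    """Residues whose minimum atom-atom distance to the partner is below cutoff.

    residue_atoms: dict mapping residue id -> (n, 3) coordinate array
                   (surface residues only, assumed pre-filtered).
    partner_atoms: (m, 3) coordinate array of the other protein.
    """
    result = set()
    for res_id, coords in residue_atoms.items():
        diff = coords[:, None, :] - partner_atoms[None, :, :]
        if np.sqrt((diff ** 2).sum(axis=-1)).min() < cutoff:
            result.add(res_id)
    return result


def prediction_accuracy(predicted, true_interface):
    """Fraction of predicted residues that are true interface residues."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("no residues were predicted")
    return len(predicted & set(true_interface)) / len(predicted)


# Toy example: residue 1 lies 3 Å from the partner, residue 2 lies 17 Å away.
residue_atoms = {
    1: np.array([[0.0, 0.0, 0.0]]),
    2: np.array([[20.0, 0.0, 0.0]]),
}
partner_atoms = np.array([[3.0, 0.0, 0.0]])

true_interface = interface_residues(residue_atoms, partner_atoms)
accuracy = prediction_accuracy({1, 2}, true_interface)
print(true_interface, accuracy)
```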

We used the unbound-bound RMSD (UB-RMSD) and the relative interface area to assess how difficult each complex is to predict. UB-RMSD is the RMSD between the unbound and bound monomers. Relative interface area is the ratio of the interface area to the total complex area. A complex is difficult to predict if its monomers have a large UB-RMSD or if it has a small relative interface area.

SRM is a first-stage sampling method, which should be combined with post-processing methods [47], [48], [59], [60]. Currently, most post-processing methods can handle at least 2000 structures [55], [61], [62]. The post-processing methods aim to re-score the top 2000 (or even more) predictions and then pick the best 10–20 predictions. Thus, for each docking prediction, we keep the top 2000 predicted structures for further analysis.

Acknowledgments

Most of the calculations presented in this paper were carried out using the High Performance Computing Center experimental testbed in SCTS/CGCL.

Author Contributions

Conceived and designed the experiments: YX YH. Performed the experiments: LL. Analyzed the data: LL YH YX. Wrote the paper: YX YH LL.

References

  1. 1. Bork P, Jensen LJ, von Mering C, Ramani AK, Lee I, et al. (2004) Protein interaction networks from yeast to human. Current Opinion in Structural Biology 14: 292–299.
  2. 2. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem A, Aflalo C, et al. (1992) Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proceedings of the National Academy of Sciences of the United States of America 89: 2195–2199.
  3. 3. Heifetz A, Katchalski-Katzir E, Eisenstein M (2002) Electrostatics in protein–protein docking. Protein Science: A Publication of the Protein Society 11: 571–587.
  4. 4. Jackson RM, Gabb HA, Sternberg MJE (1998) Rapid refinement of protein interfaces incorporating solvation: application to the docking problem. Journal of molecular biology 276: 265–285.
  5. 5. Gabb HA, Jackson RM, Sternberg MJE (1997) Modelling Protein Docking using Shape Complementarity, Electrostatics and Biochemical Information. Journal of Computational Chemistry 272: 106–120.
  6. 6. Moont G, Gabb HA, Sternberg MJE (1999) Use of pair potentials across protein interfaces in screening predicted docked complexes. Proteins: Structure, Function, and Bioinformatics 35: 364–373.
  7. 7. Vakser I (1997) Evaluation of GRAMM low-resolution docking methodology on the hemagglutinin-antibody complex. Proteins: Structure, Function, and Bioinformatics 29: 226–230.
  8. 8. Chen R, Li L, Weng Z (2003) ZDOCK: an initial-stage protein-docking algorithm. Proteins: Structure, Function, and Bioinformatics 52: 80–87.
  9. 9. Mintseris J, Pierce B, Wiehe K, Anderson R, Chen R, et al. (2007) Integrating statistical pair potentials into protein complex prediction. Proteins: Structure, Function, and Genetics 69: 511–520.
  10. 10. Mandell J, Roberts V, Pique M, Kotlovyi V, Mitchell J, et al. (2001) Protein docking using continuum electrostatics and geometric fit. Protein Engineering Design and Selection 14: 105–113.
  11. 11. Palma PN, Krippahl L, Wampler JE, Moura JJG (2000) BiGGER: A new (soft) docking algorithm for predicting protein interactions. Proteins: Structure, Function, and Bioinformatics 39: 372–384.
  12. 12. Ritchie D, Kemp G (2000) Protein docking using spherical polar Fourier correlations. Proteins Structure Function and Genetics 39: 178–194.
  13. 13. Pierce B, Weng Z (2007) ZRANK: reranking protein docking predictions with an optimized energy function. Proteins: Structure, Function, and Bioinformatics 67: 1078–1086.
  14. 14. Huang S, Zou X (2008) An iterative knowledge-based scoring function for protein–protein recognition. proteins 72: 557.
  15. 15. Gray J, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, et al. (2003) Protein–protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of molecular biology 331: 281–299.
  16. 16. Fernández-Recio J, Totrov M, Abagyan R (2002) Soft protein–protein docking in internal coordinates. Protein Science: A Publication of the Protein Society 11: 280–291.
  17. 17. Harris R, Olson A, Goodsell D (2008) Automated prediction of ligand-binding sites in proteins. Proteins: Structure, Function, and Bioinformatics 70: 1506–1517.
  18. 18. Dominguez C, Boelens R, Bonvin AMJJ (2003) HADDOCK: a protein-protein docking approach based on biochemical or biophysical data. Journal of the American Chemical Society 125: 1731–1737.
  19. 19. Ben-Zeev E, Eisenstein M (2003) Weighted geometric docking: incorporating external information in the rotation-translation scan. Proteins: Structure, Function, and Bioinformatics 52: 24–27.
  20. 20. Young L, Jernigan R, Covell D (1994) A role for surface hydrophobicity in protein-protein recognition. Protein Science: A Publication of the Protein Society 3: 717–729.
  21. 21. Conte L, Chothia C, Janin J (1999) The atomic structure of protein-protein recognition sites. Journal of molecular biology 285: 2177–2198.
  22. 22. Jones S, Thornton J (1996) Principles of protein-protein interactions. Proceedings of the National Academy of Sciences of the United States of America 93: 13–20.
  23. 23. Glaser F, Steinberg D, Vakser I, Ben-Tal N (2001) Residue frequencies and pairing preferences at protein-protein interfaces. Proteins Structure Function and Genetics 43: 89–102.
  24. 24. Zhou H, Shan Y (2001) Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins Structure Function and Genetics 44: 336–343.
  25. 25. Fariselli P, Pazos F, Valencia A, Casadio R (2002) Prediction of protein-protein interaction sites in heterocomplexes with neural networks. European Journal of Biochemistry 269: 1356–1361.
  26. 26. Pupko T, Bell R, Mayrose I, Glaser F, Ben-Tal N (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18: s71–s77.
  27. 27. Panchenko A, Kondrashov F, Bryant S (2004) Prediction of functional sites by analysis of sequence and structure conservation. Protein Science: A Publication of the Protein Society 13: 884–892.
  28. 28. Chung J, Wang W, Bourne P (2006) Exploiting sequence and structure homologs to identify protein-protein binding sites. Proteins Structure Function and Bioinformatics 62: 630–640.
  29. 29. Luo L, Zhang S, Chen W, Pan Q (2009) Predicting protein-protein interaction based on the sequence-segmented amino acid composition. Acta Biophys Sin 25: 282–286.
  30. Jones S, Thornton J (1997) Analysis of protein-protein interaction sites using surface patches. Journal of Molecular Biology 272: 121–132.
  31. Porollo A, Meller J (2007) Prediction-based fingerprints of protein–protein interactions. Proteins: Structure, Function, and Bioinformatics 66: 630–645.
  32. Neuvirth H, Raz R, Schreiber G (2004) ProMate: a structure based prediction program to identify the location of protein-protein binding sites. Journal of Molecular Biology 338: 181–199.
  33. Fernández A, Scheraga H (2003) Insufficiently dehydrated hydrogen bonds as determinants of protein interactions. Proceedings of the National Academy of Sciences of the United States of America 100: 113–118.
  34. Liang S, Zhang J, Zhang S, Guo H (2004) Prediction of the interaction site on the surface of an isolated protein structure by analysis of side chain energy scores. Proteins: Structure, Function, and Bioinformatics 57: 548–557.
  35. Bordner AJ, Abagyan R (2005) Statistical analysis and prediction of protein–protein interfaces. Proteins: Structure, Function, and Bioinformatics 60: 353–366.
  36. Jones S, Thornton J (1997) Prediction of protein-protein interaction sites using patch analysis. Journal of Molecular Biology 272: 133–143.
  37. Bradford J, Westhead D (2005) Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 21: 1487–1494.
  38. Chen H, Zhou H (2005) Prediction of interface residues in protein-protein complexes by a consensus neural network method: test against NMR data. Proteins: Structure, Function, and Bioinformatics 61: 21–35.
  39. Liang S, Zhang C, Liu S, Zhou Y (2006) Protein binding site prediction using an empirical scoring function. Nucleic Acids Research 34: 3698–3707.
  40. Qin S, Zhou H (2007) meta-PPISP: a meta web server for protein-protein interaction site prediction. Bioinformatics 23: 3386–3387.
  41. Huang B, Schroeder M (2008) Using protein binding site prediction to improve protein docking. Gene 422: 14–21.
  42. de Vries S, van Dijk A, Bonvin A (2006) WHISCY: what information does surface conservation yield? Application to data-driven docking. Proteins: Structure, Function, and Bioinformatics 63: 479.
  43. Gottschalk K, Neuvirth H, Schreiber G (2004) A novel method for scoring of docked protein complexes using predicted protein-protein binding sites. Protein Engineering Design and Selection 17: 183–189.
  44. Krippahl L, Moura J, Palma P (2003) Modeling protein complexes with BiGGER. Proteins: Structure, Function, and Bioinformatics 52: 19–23.
  45. Law DS, Ten Eyck LF, Katzenelson O, Tsigelny I, Roberts VA, et al. (2003) Finding needles in haystacks: Reranking DOT results by using shape complementarity, cluster analysis, and biological information. Proteins: Structure, Function, and Genetics 52: 33–40.
  46. Chen R, Weng Z (2003) A novel shape complementarity scoring function for protein-protein docking. Proteins: Structure, Function, and Bioinformatics 51: 397–408.
  47. Zhang C, Liu S, Zhou Y (2005) Docking prediction using biological information, ZDOCK sampling technique, and clustering guided by the DFIRE statistical energy function. Proteins: Structure, Function, and Bioinformatics 60: 314–318.
  48. Ma X, Li C, Shen L, Gong X, Chen W, et al. (2005) Biologically enhanced sampling geometric docking and backbone flexibility treatment with multiconformational superposition. Proteins: Structure, Function, and Bioinformatics 60: 319–323.
  49. Li L, Guo D, Huang Y, Liu S, Xiao Y (2011) ASPDock: protein-protein docking algorithm using atomic solvation parameters model. BMC Bioinformatics 12: 36.
  50. Janin J, Henrick K, Moult J, Ten Eyck L, Sternberg MJ, et al. (2003) CAPRI: a critical assessment of predicted interactions. Proteins: Structure, Function, and Bioinformatics 52: 2–9.
  51. Zhou H, Zhou Y (2002) Stability scale and atomic solvation parameters extracted from 1023 mutation experiments. Proteins: Structure, Function, and Genetics 49: 483–492.
  52. Hwang H, Pierce B, Mintseris J, Janin J, Weng Z (2008) Protein-protein docking benchmark version 3.0. Proteins: Structure, Function, and Bioinformatics 73: 705–709.
  53. Liu S, Gao Y, Vakser I (2008) DOCKGROUND protein-protein docking decoy set. Bioinformatics 24: 2634.
  54. Wang C, Bradley P, Baker D (2007) Protein–protein docking with backbone flexibility. Journal of Molecular Biology 373: 503–519.
  55. Liu S, Vakser I (2011) DECK: Distance and environment-dependent, coarse-grained, knowledge-based potentials for protein-protein docking. BMC Bioinformatics 12: 280.
  56. Guharoy M, Janin J, Robert CH (2010) Side-chain rotamer transitions at protein-protein interfaces. Proteins: Structure, Function, and Bioinformatics 78: 3219–3225.
  57. Qin S, Zhou H-X (2007) A holistic approach to protein docking. Proteins: Structure, Function, and Bioinformatics 69: 743–749.
  58. Schueler-Furman O, Wang C, Baker D (2005) Progress in protein–protein docking: Atomic resolution predictions in the CAPRI experiment using RosettaDock with an improved treatment of side-chain flexibility. Proteins: Structure, Function, and Bioinformatics 60: 187–194.
  59. Wiehe K, Pierce B, Mintseris J, Tong WW, Anderson R, et al. (2005) ZDOCK and RDOCK performance in CAPRI rounds 3, 4, and 5. Proteins: Structure, Function, and Bioinformatics 60: 207–213.
  60. Hwang H, Vreven T, Pierce BG, Hung JH, Weng Z (2010) Performance of ZDOCK and ZRANK in CAPRI rounds 13–19. Proteins: Structure, Function, and Bioinformatics 78: 3104–3110.
  61. Li L, Chen R, Weng Z (2003) RDOCK: Refinement of rigid-body protein docking predictions. Proteins: Structure, Function, and Bioinformatics 53: 693–707.
  62. Huang S-Y, Zou X (2010) MDockPP: A hierarchical approach for protein-protein docking and its application to CAPRI rounds 15–19. Proteins: Structure, Function, and Bioinformatics 78: 3096–3103.