In this study, we suggest that taking multiple templates into account offers two opportunities: (i) improved contact map prediction and (ii) the capacity to capture more than a single snapshot of the protein's conformational space. The latter is particularly appealing when different experimentally derived 3D structures of identical, or highly similar, sequences are available. Such data, which are often treated as “redundant” when sequence-based redundancy removal procedures are applied, may contain information about alternative conformations (Dan, et al., 2010; Kosloff and Kolodny, 2008; Zhang, et al., 2007).
However, taking into account multiple templates raises the challenge of avoiding bias towards sequences that are overrepresented in the databases. We suggested that weighting the templates according to their evolutionary distances from the target protein should be adequate for this purpose.
To this end, the evolutionary distance between each template and the target protein was estimated from the evolutionary tree constructed by SEMPHY (Ninio, et al., 2007) based on the protein sequences. Templates (t) were then grouped according to their evolutionary distances, so that all templates with a distance of less than 0.02 to the group's common ancestor were grouped together. Next, for each group (g), a binary contact map (M) was greedily constructed, in which every interaction (between residue i and residue j) found in at least one of the group members was noted:
M^g_{i,j} = 1 if residues i and j are in contact in at least one template t in g, and 0 otherwise.
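The group map construction described above can be illustrated with a minimal Python sketch. It is not taken from the WMC source code: the function name `group_contact_map`, the representation of each template as a set of 0-based residue index pairs already mapped onto the target numbering, and the symmetric treatment of contacts are all assumptions made for illustration. The grouping step itself (the 0.02 distance threshold on the SEMPHY tree) is not shown.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a contact is a pair of 0-based residue
# indices in the target protein's numbering.
Contact = Tuple[int, int]


def group_contact_map(template_contacts: Iterable[Set[Contact]],
                      length: int) -> List[List[int]]:
    """Build a binary length x length map for one group of templates.

    A cell is set to 1 if the contact appears in at least one template
    of the group (a union over the group members), and stays 0 otherwise.
    """
    contact_map = [[0] * length for _ in range(length)]
    for contacts in template_contacts:
        for i, j in contacts:
            # Assumption: contacts are symmetric, so both cells are set.
            contact_map[i][j] = 1
            contact_map[j][i] = 1
    return contact_map


# Two templates in one group; the second adds a contact the first lacks.
example_group = [{(0, 2)}, {(0, 2), (1, 3)}]
example_map = group_contact_map(example_group, length=4)
```

Because the map is a union, a contact seen in only one member of the group is kept, which is what lets a group carry contacts from more than one conformation.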
Finally, the groups’ contact maps are summed in a weighted fashion to construct the predicted contact map. The prediction score (S_{i,j}) for each pair of residues is calculated as follows:
Here, L is the length of the target protein, G is the number of groups, and W_g is the weight of group g. The weight of each group's contact map was set to d^-3, where d is the evolutionary distance and the exponent of 3 was determined empirically. This method was named ‘Weighted Multiple Conformations’ (WMC).
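The weighting step can be sketched in Python as follows. This is an illustration under stated assumptions, not the published implementation: the names `group_weight` and `wmc_scores` are hypothetical, the score is taken to be a plain weighted sum of the group maps with no further normalization, and the rejection of non-positive distances is a choice made here, since d^-3 is undefined at d = 0.

```python
from typing import List, Sequence


def group_weight(distance: float, exponent: float = 3.0) -> float:
    """Weight of a group: distance raised to the power -exponent (d^-3)."""
    if distance <= 0:
        # Assumption: the handling of zero distances is not specified,
        # so this sketch simply refuses them.
        raise ValueError("evolutionary distance must be positive")
    return distance ** -exponent


def wmc_scores(group_maps: Sequence[Sequence[Sequence[int]]],
               distances: Sequence[float]) -> List[List[float]]:
    """Sum binary group contact maps, each scaled by its group weight."""
    if not group_maps or len(group_maps) != len(distances):
        raise ValueError("need one distance per group map")
    length = len(group_maps[0])
    scores = [[0.0] * length for _ in range(length)]
    for contact_map, distance in zip(group_maps, distances):
        weight = group_weight(distance)
        for i in range(length):
            for j in range(length):
                scores[i][j] += weight * contact_map[i][j]
    return scores


# Two groups over a 3-residue target: a close group (d = 0.5) and a
# more distant one (d = 1.0).
close_group = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
distant_group = [[0, 1, 1], [1, 0, 0], [1, 1, 0]]
example_scores = wmc_scores([close_group, distant_group], [0.5, 1.0])
```

In this example the close group receives a weight of 8 and the distant group a weight of 1, so a contact supported by both groups scores 9, while a contact seen only in the distant group scores 1. The steep cubic decay means that contacts from evolutionarily close groups dominate the predicted map.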
REFERENCES
Dan, A., Ofran, Y. and Kliger, Y. (2010) Large-scale analysis of secondary structure changes in proteins suggests a role for disorder-to-order transitions in nucleotide binding proteins, Proteins, 78, 236-248.
Kosloff, M. and Kolodny, R. (2008) Sequence-similar, structure-dissimilar protein pairs in the PDB, Proteins, 71, 891-902.
Ninio, M., et al. (2007) Phylogeny reconstruction: increasing the accuracy of pairwise distance estimation using Bayesian inference of evolutionary rates, Bioinformatics, 23, e136-141.
Zhang, Y., Stec, B. and Godzik, A. (2007) Between order and disorder in protein structures: analysis of "dual personality" fragments in proteins, Structure, 15, 1141-1147.
Download
The WMC source code, freely available for academic use, can be downloaded here, along with a short tutorial.
Citation
Haim Ashkenazy, Ron Unger and Yossef Kliger. 2011.
Hidden conformations in protein structures
Bioinformatics. 2011 Jul 15;27(14):1941-7.
Contact
For any questions or suggestions please contact
Haim Ashkenazy