Template-Based Contact Map Predictor

About
Prediction of interactions between protein residues (contact map prediction) can facilitate various aspects of 3D structure modeling. However, the accuracy of ab initio contact prediction is still limited. As structural genomics initiatives move ahead, solved structures of homologous proteins can be used as multiple templates to improve contact prediction for the major conformation of an unsolved target protein. Furthermore, multiple templates may provide a wider view of the protein’s conformational space. However, using multiple structural templates successfully is not straightforward, because the templates vary in their relevance to the target protein and because of data redundancy.

WMC is an algorithm that addresses these two limitations in the use of multiple structural templates. First, it merges the contact maps of templates that share high sequence similarity with each other, in a way that allows for the possibility of multiple conformations. Next, it weights each merged map according to its evolutionary distance from the target protein, so that more distant groups contribute less.

Testing this algorithm on CASP8 targets yielded high-precision contact maps. Remarkably, based solely on structural data of remote homologues, our algorithm identified residue-residue interactions that account for all the known conformations of Calmodulin, a multifaceted protein. Therefore, employing multiple templates not only improves contact map prediction but can also reveal novel conformations.

As multiple templates will soon be available for most proteins, our scheme offers an effective procedure for exploiting them.



Methodology
In this study, we suggest that taking into account multiple templates offers two opportunities: (i) the ability to improve contact map prediction and (ii) the capacity to capture more than a single snapshot of the protein's conformational space. The latter is very appealing when different experimentally-derived 3D structures of identical, or highly similar, sequences are available. Such data, which are often considered as “redundant” when sequence-based redundancy removal procedures are used, may contain information about alternate conformations (Dan, et al., 2010; Kosloff and Kolodny, 2008; Zhang, et al., 2007).

However, taking multiple templates into account raises the challenge of avoiding bias towards sequences that are overrepresented in the databases. We propose that weighting the templates according to their evolutionary distances from the target protein is adequate for this purpose.

To this end, the evolutionary distance between each template and the target protein was estimated from an evolutionary tree constructed from the protein sequences by SEMPHY (Ninio, et al., 2007). Templates (t) were then grouped according to their evolutionary distances, so that all templates with a distance of less than 0.02 to the group's common ancestor formed one group. Next, for each group (g), a binary contact map (M) was greedily constructed, in which every interaction between residue i and residue j found in at least one of the group members was noted:

$$M^{g}_{i,j} = \begin{cases} 1 & \text{if residues } i \text{ and } j \text{ are in contact in at least one template } t \in g \\ 0 & \text{otherwise} \end{cases}$$

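The grouping and merging steps can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the WMC source code: it assumes the SEMPHY tree has already been parsed into a simple `Node` structure that contains only the templates, with branch lengths as evolutionary distances, and it interprets "distance to the group's common ancestor" as the distance from a clade's root down to each of its leaves. The names `Node`, `group_templates` and `union_contact_map` are our own, and each contact map is represented as a set of residue pairs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Contact = Tuple[int, int]  # (residue i, residue j)


@dataclass
class Node:
    """Hypothetical tree node (assumption); leaves carry a template ID."""
    branch: float = 0.0                      # branch length to the parent node
    children: List["Node"] = field(default_factory=list)
    template: Optional[str] = None           # set only on leaves


def max_leaf_distance(node: Node) -> float:
    """Largest distance from `node` down to any leaf below it."""
    if not node.children:
        return 0.0
    return max(c.branch + max_leaf_distance(c) for c in node.children)


def leaves(node: Node) -> List[str]:
    """Template identifiers of all leaves under `node`, left to right."""
    if not node.children:
        return [node.template]
    return [t for c in node.children for t in leaves(c)]


def group_templates(node: Node, threshold: float = 0.02) -> List[List[str]]:
    """Top-down split (our interpretation, an assumption): a clade becomes one
    group when every template in it is closer than `threshold` to the clade's
    root, which plays the role of the group's common ancestor."""
    if max_leaf_distance(node) < threshold:
        return [leaves(node)]
    return [g for c in node.children for g in group_templates(c, threshold)]


def union_contact_map(member_maps: List[Set[Contact]]) -> Set[Contact]:
    """Binary group map M^g: (i, j) is a contact if any member has it."""
    merged: Set[Contact] = set()
    for contacts in member_maps:
        merged |= contacts
    return merged


# Toy example: two near-identical templates (A, B) and one remote homolog (C).
tree = Node(children=[
    Node(branch=0.30, children=[Node(branch=0.005, template="A"),
                                Node(branch=0.010, template="B")]),
    Node(branch=0.40, template="C"),
])
groups = group_templates(tree)
merged = union_contact_map([{(1, 5), (2, 8)}, {(1, 5), (3, 9)}])
print(groups)          # [['A', 'B'], ['C']]
print(sorted(merged))  # [(1, 5), (2, 8), (3, 9)]
```

Taking the union rather than, say, a majority vote is what lets a single group keep contacts that occur in only one of several alternative conformations.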
Finally, the groups’ contact maps are summed in a weighted fashion to construct the predicted contact map. The prediction score ($S_{i,j}$) for each pair of residues is calculated as follows:

$$S_{i,j} = \sum_{g=1}^{G} W_g \, M^{g}_{i,j}, \qquad 1 \le i, j \le L$$

where $L$ is the length of the target protein, $G$ is the number of groups, $M^{g}_{i,j}$ is the binary contact map of group $g$, and $W_g$ is the group's weight. The weight of each group's contact map was set to $W_g = d^{-3}$, where $d$ is the evolutionary distance and the third power was determined empirically. This method was named ‘Weighted Multiple Conformations’ (WMC).
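The scoring step reduces to a weighted sum over groups. The sketch below is again hypothetical (the function names and the sparse set representation are our own) and assumes that one positive evolutionary distance d per group is already known, since how d is read off the tree is not spelled out above. The exponent is a parameter that defaults to 3, the value reported for WMC.

```python
from typing import Dict, List, Set, Tuple

Contact = Tuple[int, int]  # (residue i, residue j)


def group_weight(d: float, power: float = 3.0) -> float:
    """W_g = d ** -power; power=3 mirrors the empirically chosen exponent."""
    if d <= 0:
        raise ValueError("evolutionary distance must be positive")
    return d ** -power


def wmc_scores(group_maps: List[Set[Contact]],
               group_distances: List[float],
               power: float = 3.0) -> Dict[Contact, float]:
    """S[i, j] = sum over groups g of W_g * M^g[i, j].

    Each binary map M^g is stored sparsely as the set of its contacts, so a
    pair that no group contains implicitly scores 0.
    """
    if len(group_maps) != len(group_distances):
        raise ValueError("need one distance per group")
    scores: Dict[Contact, float] = {}
    for contacts, d in zip(group_maps, group_distances):
        w = group_weight(d, power)
        for pair in contacts:
            scores[pair] = scores.get(pair, 0.0) + w
    return scores


# Toy example: a close group (d = 0.5) and a more remote one (d = 1.0).
maps = [{(1, 5), (2, 8), (3, 9)}, {(1, 5)}]
scores = wmc_scores(maps, [0.5, 1.0])
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)  # [((1, 5), 9.0), ((2, 8), 8.0), ((3, 9), 8.0)]
```

Because $W_g$ falls off with the cube of $d$, a group at half the distance contributes eight times as much, so a contact seen only in a close group can outrank one seen only in several remote groups.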


REFERENCES

Dan, A., Ofran, Y. and Kliger, Y. (2010) Large-scale analysis of secondary structure changes in proteins suggests a role for disorder-to-order transitions in nucleotide binding proteins, Proteins, 78, 236-248.

Kosloff, M. and Kolodny, R. (2008) Sequence-similar, structure-dissimilar protein pairs in the PDB, Proteins, 71, 891-902.

Ninio, M., et al. (2007) Phylogeny reconstruction: increasing the accuracy of pairwise distance estimation using Bayesian inference of evolutionary rates, Bioinformatics, 23, e136-141.

Zhang, Y., Stec, B. and Godzik, A. (2007) Between order and disorder in protein structures: analysis of "dual personality" fragments in proteins, Structure, 15, 1141-1147.



Download

The WMC source code, free for academic use, is available here together with a short tutorial.



Citation

Haim Ashkenazy, Ron Unger and Yossef Kliger. 2011.
Hidden conformations in protein structures
Bioinformatics. 2011 Jul 15;27(14):1941-7.



Contact

For any questions or suggestions please contact Haim Ashkenazy