Template-based Contact Map Predictor
Please note that the WMC source code is freely distributed for academic use only (see the copyright statement below).
1. Download and configure WMC
2. Running WMC
Download and Configure WMC
The following instructions should work out of the box on UNIX-like systems. macOS should also work in principle, but is not yet supported. Windows requires additional work, such as setting up a Cygwin environment.
1. Download the WMC source code.
2. Unzip and untar the files:
tar -xzvf WMC.v1.01.tgz
This will create a directory named WMC.v1.01
3. Check that the required programs are installed:
- MAFFT: Type "mafft" and check that you have version 6.712 or newer.
- Else download and install MAFFT from:
http://mafft.cbrc.jp/alignment/software/
- Replace 'PUT MAFFT FULL PATH HERE' in the file WMC_CONSTANTS.pm (located in your WMC.v1.01 directory) with the full path of your MAFFT installation
- NCBI-PSI-BLAST: Type "which blastpgp" and check that it is installed
- Else follow the instructions for downloading and installing NCBI-PSI-BLAST from:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastpgp.html
- Replace 'PUT BLASTPGP FULL PATH HERE' in the file WMC_CONSTANTS.pm (located in your WMC.v1.01 directory) with the full path of your PSI-BLAST installation
- SEMPHY: Type "semphy -h" and check that you have semphy installed.
- CSU: Check that you have CSU installed.
- Else download and install CSU from:
http://ligin.weizmann.ac.il/space/programs/
- Replace 'PUT CSU FULL PATH HERE' in the file WMC_CONSTANTS.pm (located in your WMC.v1.01 directory) with the full path of your CSU installation
4. When searching for templates, WMC requires the following protein sequence databases, indexed with the NCBI-BLAST formatdb program (see more about formatdb here):
- A non-redundant protein sequence database, e.g. UniRef100 or NCBI NR
- Replace 'PUT NR SEQ DB FULL PATH HERE' in the file WMC_CONSTANTS.pm (located in your WMC.v1.01 directory) with the full path of your non-redundant protein sequence database
- PDB sequences: currently, the PDB sequences must be in the format used by the Dunbrack lab; they can be downloaded from here
- Replace 'PUT PDB SEQ DB FULL PATH HERE' in the file WMC_CONSTANTS.pm (located in your WMC.v1.01 directory) with the full path of your PDB sequence database
- Protein structures (OPTIONAL): it is recommended to keep a local copy of the structures from the Protein Data Bank (PDB), which can be downloaded from here
If a required template structure is not found locally, WMC tries to retrieve it from the PDB FTP service.
- Replace 'PUT PDB STRUCTURES DB FULL PATH HERE' in the file WMC_CONSTANTS.pm (located in your WMC.v1.01 directory) with the full path of your PDB structures directory (e.g. /biodb/PDB/data/structures/divided/pdb/)
5. WMC also uses Perl and BioPerl:
- Type "perl -v" and check that you have Perl installed.
- Type "perl -e 'use Bio::SeqIO'" to check that you have BioPerl.
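The checks in steps 3 and 5 and the placeholder edits in WMC_CONSTANTS.pm can be scripted. Below is a minimal POSIX shell sketch that is not part of the WMC distribution: check_tools and fill_placeholder are hypothetical helper names, and the sketch assumes the programs are on your PATH under the command names used above (mafft, blastpgp, semphy, perl). The CSU executable name is not given in these instructions, so pass it yourself if you want it checked.

```sh
#!/bin/sh
# Hypothetical helpers (not shipped with WMC) for the setup steps above.

# check_tools NAME...: report where each command is found on PATH, or that it
# is missing. Succeeds only if every command was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if location=$(command -v "$tool" 2>/dev/null); then
      printf 'found   %s -> %s\n' "$tool" "$location"
    else
      printf 'MISSING %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every command was found.
  [ "$missing" -eq 0 ]
}

# fill_placeholder FILE PLACEHOLDER VALUE: replace every occurrence of the
# placeholder text (e.g. 'PUT MAFFT FULL PATH HERE') in FILE with VALUE.
# The placeholders contain only letters and spaces, so they are safe to use
# as a sed pattern. The original file is kept as FILE.bak.
fill_placeholder() {
  file=$1 placeholder=$2 value=$3
  # Escape characters that are special in a sed replacement: \ & and the | delimiter.
  escaped=$(printf '%s\n' "$value" | sed 's/[\\&|]/\\&/g')
  sed -i.bak "s|$placeholder|$escaped|g" "$file"
}
```

For example, run "check_tools mafft blastpgp semphy perl" and "perl -MBio::SeqIO -e 1" to cover steps 3 and 5, then, from inside WMC.v1.01, run "fill_placeholder WMC_CONSTANTS.pm 'PUT MAFFT FULL PATH HERE' /usr/local/bin/mafft" (the MAFFT path here is only an illustration). Keeping a .bak copy makes it easy to compare against the original file if a substitution goes wrong.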
For any problems, please contact me.
Running WMC
Run the Perl script: WMC.pl
WMC takes its arguments as command-line flags (for help, type "perl WMC.pl"):
USAGE: perl WMC.pl --Target_Seq <FASTA_FILE> --Out_Path <OUTPUT_DIR> --Out <PREDICTION_FILE>
Required parameters:
--Target_Seq: Input sequence file in FASTA format. IMPORTANT: in this release, only sequence names of the form >NAME1_NAME2 are supported (e.g. >My_PROT)
--Out_Path: Output directory; it is created automatically and holds all output files
--Out: Name of the prediction file
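Because this release accepts only sequence names of the form >NAME1_NAME2, it can help to check the input file before a run. The following is a minimal POSIX shell sketch; check_fasta_names is a hypothetical helper, and its pattern (two non-empty runs of letters and digits joined by a single underscore) is an interpretation of >NAME1_NAME2, not a rule taken from the WMC code.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of WMC) for --Target_Seq input files.
# Assumed rule: every header must look like >NAME1_NAME2, where NAME1 and
# NAME2 are non-empty runs of letters/digits (e.g. >My_PROT).

check_fasta_names() {
  fasta=$1
  if ! grep -q '^>' "$fasta"; then
    printf 'No FASTA header found in %s\n' "$fasta" >&2
    return 1
  fi
  # Collect header lines that do not match the assumed pattern
  # (|| true: finding no bad headers is the success case, not an error).
  bad=$(grep '^>' "$fasta" | grep -Ev '^>[A-Za-z0-9]+_[A-Za-z0-9]+$' || true)
  if [ -n "$bad" ]; then
    printf 'Unsupported sequence name(s) in %s:\n%s\n' "$fasta" "$bad" >&2
    return 1
  fi
}
```

A typical use would be "check_fasta_names protein.fs && perl WMC.pl --Target_Seq protein.fs ...", so that WMC only starts when the header passes. Headers such as >sp|P12345|MY PROT are reported, and Windows line endings also cause a header to be reported, since the trailing carriage return does not match the pattern.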
Optional parameters:
--Templates_List : List of template PDB IDs to use (PDB ID format: 1CLLA for entry 1CLL chain A); when given, WMC does not search for templates
--PSI_Blast : BLAST output file (target vs. PDB) to start from
--Target_PDB_ID : If given, a file with the contact map extracted from this structure (the true label) is also created (the PDB ID should be in 1CLL_A format for entry 1CLL chain A)
--Power : Use 1/Distance^Power as the weight of each contact map group (default Power=3)
--Q_Align : Minimal alignment overlap, relative to the query length (default=0.5)
--S_Align : Minimal alignment overlap, relative to the subject length (default=0.5)
--Min_ID : Minimal percent sequence identity (default=0)
--Min_Length : Minimal template length to consider, in amino acids (rather than as a percentage of the target length) (default: 50% of the target length)
--Best_Templates <NUMBER> : Consider only the NUMBER best templates
--ID_Cutoff : Select PDB templates using the identity cutoff specified by --Min_ID
--E_Value_Cutoff : Select PDB templates using an E-value cutoff (default; to disable, use --noE_Value_Cutoff)
--NR_Struct : Take only one unique PDB structure for each template (i.e. discard redundant information)
--Only_Overlap : When averaging, count only the templates aligned (without gaps) at both positions, rather than all templates
--Phylo_Mode : Consider the phylogenetic information in a weighted fashion
--Phylo_Sum_Mode : Consider the phylogenetic information in a weighted fashion (summed rather than averaged)
--GODZIK_Mode : Find templates using the PDB-BLAST procedure described by Godzik and colleagues (Protein Science 2000) (default)
--Entropy_Like : Consider the amount of information in the paired columns
--Clusters : Group contact maps according to the constructed phylogenetic tree (designed especially to use redundant data without bias) (default; to disable, use --noClusters)
--Include_Very_Close : Also consider structures of very close homologs (including the target's own structure, if it exists)
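The --Power option can be written out as a formula. The README states only the per-group weight; it does not define Distance or say how the weighted groups are combined. The block below therefore assumes a normalized weighted average of the groups' contact maps, with d_g the Distance for group g, p the Power value, and C_g(i,j) the contact value of group g for residues i and j. The worked numbers show how strongly the default p = 3 favours closer groups.

```latex
w_g = \frac{1}{d_g^{\,p}}, \qquad p = \texttt{Power} \ (\text{default } 3)

\hat{C}(i,j) = \frac{\sum_g w_g \, C_g(i,j)}{\sum_g w_g}

% Worked example with p = 3, d_1 = 0.1, d_2 = 0.2:
w_1 = \frac{1}{0.1^3} = 1000, \qquad w_2 = \frac{1}{0.2^3} = 125, \qquad \frac{w_1}{w_2} = \left(\frac{0.2}{0.1}\right)^3 = 8

% If C_1(i,j) = 1 and C_2(i,j) = 0:
\hat{C}(i,j) = \frac{1000 \cdot 1 + 125 \cdot 0}{1000 + 125} = \frac{8}{9} \approx 0.89
```

Doubling the distance divides a group's weight by 2^p, so a larger Power lets the closest group dominate more; with p = 0 every group would get the same weight.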
EXAMPLES:
- perl WMC.pl --Target_Seq protein.fs --Out_Path /home/somewhere/protein_WMC_Pred/ --Out /home/somewhere/protein_WMC_Pred/protein.WMC
- Runs WMC on the sequence in protein.fs
- perl WMC.pl --Target_Seq protein.fs --Out_Path /home/somewhere/protein_WMC_Pred/ --Out /home/somewhere/protein_WMC_Pred/protein.WMC --Target_PDB_ID 1PDB_A
- Runs WMC and also creates a file with the true label, based on the given PDB ID
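Note that --Templates_List and --Target_PDB_ID expect different PDB ID formats (1CLLA versus 1CLL_A for entry 1CLL chain A). Below is a minimal POSIX shell sketch for converting between the two; the helper names are hypothetical, and it assumes a 4-character PDB code immediately followed by the chain ID.

```sh
#!/bin/sh
# Hypothetical helpers (not part of WMC) for the two PDB ID formats.
# Assumption: a 4-character PDB code followed by the chain ID.

# to_templates_format: 1CLL_A -> 1CLLA (format used by --Templates_List)
to_templates_format() {
  id=$1
  case $id in
    *_*) printf '%s%s\n' "${id%%_*}" "${id#*_}" ;;  # drop the underscore
    *)   printf '%s\n' "$id" ;;                     # already in 1CLLA form
  esac
}

# to_target_format: 1CLLA -> 1CLL_A (format used by --Target_PDB_ID)
to_target_format() {
  id=$1
  case $id in
    *_*) printf '%s\n' "$id" ;;                                 # already 1CLL_A
    *)   printf '%s\n' "$id" | sed 's/^\(....\)\(.\)/\1_\2/' ;;  # insert _ after the code
  esac
}
```

This document does not describe the layout of the --Templates_List file. If it takes one ID per line (an assumption to verify with "perl WMC.pl"), a list could be built with: for id in 1CLL_A 1PDB_A; do to_templates_format "$id"; done > templates.txt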
For any problems or questions, please contact me at haim.ashkenazy@gmail.com
Enjoy!
Copyright:
- To modify the code or use parts of it for other purposes, permission should be requested; please contact Haim Ashkenazy.
- Please note that the WMC program is for academic use only!
- When using WMC, please cite: Ashkenazy H., Unger R. and Kliger Y. Hidden conformations in protein structures. Bioinformatics. 2011 Jul 15;27(14):1941-7.