Introduction

A schematic of the algorithm used to generate a database of microenvironments from available PDB structures. A) 43,327 receptor-ligand complexes were identified with 202,584 total ligands. B) Each ligand was fragmented into its constituent molecular parts. C) Geometric rays, separated by 10° in all directions, were extended from each fragment atom out into space. D) These rays were used to identify microenvironment receptor residues. E) A ligand-receptor distance cutoff was implemented. The cutoff was gradually scaled back from 4 Å to 0 Å, and receptor residues beyond the cutoff were discarded at every step. In this way, multiple microenvironments were identified for each molecular fragment. Only those microenvironments with 3, 4, or 5 receptor residues, 823,460 in total, were subsequently considered.
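Step C of the schematic can be pictured with a short sketch. The code below is not taken from CrystalDock. It assumes the rays lie on a grid of polar and azimuthal angles sampled every 10°, which is only one of several ways to obtain directions roughly 10° apart, and the function name ray_directions is hypothetical.

```python
import numpy as np

def ray_directions(step_degrees=10.0):
    """Return unit vectors on a polar/azimuthal grid with the given angular step."""
    polar = np.radians(np.arange(0.0, 180.0 + step_degrees, step_degrees))
    azimuth = np.radians(np.arange(0.0, 360.0, step_degrees))
    directions = []
    for theta in polar:
        # At the two poles every azimuth yields the same vector, so keep one.
        phis = [0.0] if np.isclose(np.sin(theta), 0.0) else azimuth
        for phi in phis:
            directions.append((np.sin(theta) * np.cos(phi),
                               np.sin(theta) * np.sin(phi),
                               np.cos(theta)))
    return np.array(directions)

# One set of directions can be reused for every fragment atom:
# a point on a ray is atom_position + distance * direction.
rays = ray_directions()
```

A grid like this crowds rays near the poles. A more uniform sampling of the sphere would work just as well for illustrating the idea.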








CrystalDock is a computer algorithm that aids the computational identification of molecular fragments predicted to bind a receptor pocket of interest. CrystalDock makes direct use of crystallographic and NMR structures from the Protein Data Bank (PDB). As of June 2011, the PDB contained 73,503 molecular models, 52,253 of which included ligands. Although long-range electrostatic interactions can influence ligand binding, for many ligands the predominant interactions required for molecular recognition are with receptor atoms that immediately line the active site. When these key residues are grouped by proximity, they form "microenvironments" that are only compatible with the binding of certain molecular fragments. The PDB provides ample experimental data from which these microenvironments can be identified and characterized.
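The shrinking distance cutoff described in the schematic caption above (step E) explains how a single fragment can yield several microenvironments. The following is a minimal sketch of that idea, not CrystalDock's own code. The 0.5 Å step size is an assumption (the text gives only the 4 Å to 0 Å range), and the function name, residue labels, and distances are made up for illustration.

```python
def microenvironments(residue_distances, start=4.0, step=0.5, sizes=(3, 4, 5)):
    """Collect the distinct residue sets seen while the cutoff shrinks to zero.

    residue_distances maps a residue label to its distance (in angstroms)
    from the fragment. The step size is an assumed value.
    """
    found = []
    n_steps = int(round(start / step))
    for i in range(n_steps + 1):
        cutoff = start - i * step
        # Residues beyond the current cutoff are discarded.
        members = frozenset(r for r, d in residue_distances.items() if d <= cutoff)
        # Keep only sets of the allowed sizes, and keep each set once.
        if len(members) in sizes and members not in found:
            found.append(members)
    return found

# Made-up distances for six hypothetical residues around one fragment.
example = {"371_ARG_A": 1.8, "118_ARG_A": 2.4, "292_ARG_A": 2.9,
           "406_TYR_A": 3.4, "347_TYR_A": 3.9, "100_GLY_A": 4.6}
envs = microenvironments(example)
```

With these made-up numbers the fragment yields three nested microenvironments of five, four, and three residues.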




A schematic of the algorithm used to position (i.e., “dock”) binding fragments into a pocket of interest. A) CrystalDock sends out rays to identify the receptor residues that line the binding pocket. B) CrystalDock searches through the database of predefined microenvironments in an attempt to find geometric matches. C) Though the RMSD alignment considers only receptor residues (i.e., the residues of the microenvironments), the structures include models of the original ligand fragments as well; RMSD alignment essentially docks these molecular fragments into the binding pocket of interest.














CrystalDock identifies the microenvironments of an active site of interest and then performs a geometric comparison to identify similar microenvironments present in ligand-bound PDB structures. Germane fragments from the crystallographic or NMR ligands are subsequently placed within the novel active site. These positioned fragments can then be linked together to produce ligands that are likely to bind the pocket of interest; alternatively, they can be joined to an inhibitor with a known or suspected binding pose to potentially improve binding affinity.
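The placement step can be illustrated with a generic least-squares superposition. The sketch below uses the Kabsch algorithm with NumPy. This is a standard technique chosen for illustration, and the source does not state which alignment method CrystalDock actually uses. The transform is fitted on the microenvironment atoms only and then applied to the fragment, so the fragment is carried into the pocket. All function names and coordinates are hypothetical.

```python
import numpy as np

def kabsch_transform(mobile, target):
    """Return rotation R and translation t that best map mobile onto target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    H = np.dot((mobile - mobile_center).T, target - target_center)
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (a reflection).
    d = np.sign(np.linalg.det(np.dot(Vt.T, U.T)))
    R = np.dot(Vt.T, np.dot(np.diag([1.0, 1.0, d]), U.T))
    t = target_center - np.dot(R, mobile_center)
    return R, t

def apply_transform(coords, R, t):
    """Rotate and translate an (N, 3) array of coordinates."""
    return np.dot(coords, R.T) + t

def rmsd(a, b):
    """Root-mean-square distance between two matched (N, 3) arrays."""
    return np.sqrt(((a - b) ** 2).sum(axis=1).mean())

# Hypothetical pocket atoms, plus a database microenvironment that is the
# same arrangement rotated by 30 degrees about z and shifted in space.
pocket = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                   [0.0, 2.0, 0.0], [0.0, 0.0, 2.5]])
a = np.radians(30.0)
rot_z = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a), np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
shift = np.array([5.0, -3.0, 2.0])
database_env = np.dot(pocket, rot_z.T) + shift
database_fragment = np.dot(np.array([[0.5, 0.5, 0.5]]), rot_z.T) + shift

# Fit on receptor atoms only, then carry the fragment along.
R, t = kabsch_transform(database_env, pocket)
aligned_env = apply_transform(database_env, R, t)
docked_fragment = apply_transform(database_fragment, R, t)
```

In practice the RMSD after alignment would be compared against a cutoff to decide whether the two microenvironments match.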

Download

CrystalDock has been tested on Ubuntu Linux 10.04.1 LTS, Mac OS X 10.6.6, and Windows XP using Python versions 2.6.5, 2.6.1, and 2.7.1, respectively. The program also requires SciPy and NumPy.

You can download a copy of CrystalDock in the ZIP format. On Unix-based systems, the file can be uncompressed by typing "unzip crystal_dock.zip" into the terminal.

Command-Line Parameters

The following command-line parameters are available:

-receptor_pdb_file
  Default: REQUIRED
  Description: A PDB file of the receptor into which fragments will be docked.

-user_specified_residue
  Default: OPTIONAL (if absent, auto-detect)
  Description: By default, CrystalDock identifies the active-site-lining residues automatically. However, the user can specify these residues explicitly if desired. A residue is specified by entering the residue index, name, and chain, separated by underscores (e.g., 371_ARG_A).

-output_directory
  Default: ./
  Description: The directory where the docking results will be written. This directory must not exist prior to running CrystalDock.

-microenvironment_database_directory
  Default: ./microenvironment_database/
  Description: The directory containing the microenvironment database.

-pocket_center_x
  Default: 0.0
  Description: The x-coordinate of a point specifying the location of the active site or active-site region of interest.

-pocket_center_y
  Default: 0.0
  Description: The y-coordinate of a point specifying the location of the active site or active-site region of interest.

-pocket_center_z
  Default: 0.0
  Description: The z-coordinate of a point specifying the location of the active site or active-site region of interest.

-pocket_radius
  Default: 5
  Description: The radius, in angstroms, of a sphere centered on the pocket-center coordinate above. The sphere defines the region of the receptor surface that will be analyzed. Must be an integer.

-filter_steps_one_and_two_tolerance
  Default: 2.0 (angstroms)
  Description: Performing full RMSD alignments of all active-site and database microenvironments is too computationally intensive. Consequently, microenvironments pass through a series of "filters" that eliminate geometrically dissimilar ones without performing the full RMSD alignment. This parameter controls the first two filter steps. See the source code for more details.

-filter_step_three_CA_rmsd_cutoff
  Default: 2.5 (angstroms)
  Description: A preliminary, fast RMSD alignment is performed using only the alpha carbons of the active-site and database microenvironments to ensure that they are generally geometrically similar. This parameter specifies the maximum allowable RMS distance following alignment.

-filter_step_four_side_chain_angle_cutoff
  Default: 100.0 (degrees)
  Description: Following the preliminary RMSD alignment, the program analyzes the angle between the side chains of aligned residues. If any of these side chains are not generally pointing in the same direction, the microenvironments are judged to be dissimilar. This parameter specifies the maximum allowable angle between aligned side chains.

-filter_step_five_rmsd_cutoff
  Default: 1.5 (angstroms)
  Description: A final, more rigorous RMSD alignment is performed using all the common heavy atoms of the active-site and database microenvironments to ensure that they are geometrically similar. This parameter specifies the maximum allowable RMS distance following alignment.

-filter_step_six_steric_clash_cutoff
  Default: 2.0 (angstroms)
  Description: A docked fragment is eliminated if it comes too close to a receptor atom. This parameter specifies the minimum allowable distance.

-use_microenvironments_of_3_residues
  Default: TRUE
  Description: By default, CrystalDock identifies receptor microenvironments containing three, four, and five residues. If this flag is set to "FALSE", microenvironments of three residues will not be considered.

-use_microenvironments_of_4_residues
  Default: TRUE
  Description: If this flag is set to "FALSE", microenvironments of four residues will not be considered.

-use_microenvironments_of_5_residues
  Default: TRUE
  Description: If this flag is set to "FALSE", microenvironments of five residues will not be considered.

-num_processors
  Default: Max on machine
  Description: The number of processors to use.
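To make the last filter concrete, here is a minimal sketch of a steric-clash test in the spirit of -filter_step_six_steric_clash_cutoff. It is written from the parameter description only and is not CrystalDock's implementation. The function name and the coordinates are hypothetical.

```python
import numpy as np

def has_steric_clash(fragment_coords, receptor_coords, cutoff=2.0):
    """Return True if any fragment atom is closer than cutoff to a receptor atom.

    Both inputs are (N, 3) arrays of atomic coordinates in angstroms.
    """
    # Broadcasting builds every fragment-atom/receptor-atom difference vector.
    deltas = fragment_coords[:, np.newaxis, :] - receptor_coords[np.newaxis, :, :]
    distances = np.sqrt((deltas ** 2).sum(axis=2))
    return bool((distances < cutoff).any())

# Hypothetical coordinates: one receptor atom at the origin.
receptor = np.array([[0.0, 0.0, 0.0]])
far_fragment = np.array([[3.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
near_fragment = np.array([[1.5, 0.0, 0.0], [4.0, 0.0, 0.0]])

clash_far = has_steric_clash(far_fragment, receptor)
clash_near = has_steric_clash(near_fragment, receptor)
```

Raising the cutoff makes the filter stricter, so fewer docked fragments survive. Lowering it lets fragments sit closer to the receptor.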

Description of Output Files

Three items are placed in the output directory:

1. orig_files (subdirectory): A directory containing all the identified fragments, in PDB format.
2. non_redundant_files (subdirectory): A directory containing all the identified fragments with redundant fragments (RMSD < 0.5) removed.
3. ranked_compounds.txt: A text file listing all the non-redundant fragments in ranked order. Fragments with microenvironments whose amino-acid matches are exact always rank better than those whose matches are merely similar. If fragments cannot be distinguished based on this criterion, fragments with microenvironments containing greater numbers of amino acids are given precedence over those with fewer amino acids. Finally, all other things being equal, the fragments are ranked by the RMSD between their associated microenvironments and the identified microenvironments of the novel binding pocket.
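The three ranking criteria amount to a lexicographic sort, which can be sketched as follows. The dictionary keys (name, exact_match, num_residues, rmsd) and the fragment records are hypothetical and do not reflect the actual layout of ranked_compounds.txt.

```python
def rank_fragments(fragments):
    """Sort fragments from best to worst according to the three criteria."""
    return sorted(
        fragments,
        key=lambda f: (
            not f["exact_match"],   # exact amino-acid matches come first
            -f["num_residues"],     # then larger microenvironments
            f["rmsd"],              # then lower RMSD
        ),
    )

# Made-up records for illustration only.
fragments = [
    {"name": "frag_a", "exact_match": False, "num_residues": 5, "rmsd": 0.4},
    {"name": "frag_b", "exact_match": True, "num_residues": 3, "rmsd": 0.9},
    {"name": "frag_c", "exact_match": True, "num_residues": 4, "rmsd": 1.2},
    {"name": "frag_d", "exact_match": True, "num_residues": 4, "rmsd": 0.7},
]
ranked_names = [f["name"] for f in rank_fragments(fragments)]
```

Note that frag_a has the lowest RMSD and the largest microenvironment but still ranks last, because an exact amino-acid match outweighs both of the other criteria.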

Examples

Find fragments predicted to bind the receptor stored in 1XDN.pdb. The region of the active site to be analyzed is located at (37.6, 23.2, 13.4). Only the region within 6 angstroms of the specified coordinate should be considered. The output will be written to a directory called ./my_output/:

python crystal_dock.py -receptor_pdb_file 1XDN.pdb -pocket_center_x 37.6 -pocket_center_y 23.2 -pocket_center_z 13.4 -pocket_radius 6 -output_directory ./my_output/

Same as above, but only microenvironments containing three residues should be considered:

python crystal_dock.py -receptor_pdb_file 1XDN.pdb -pocket_center_x 37.6 -pocket_center_y 23.2 -pocket_center_z 13.4 -pocket_radius 6 -output_directory ./my_output/ -use_microenvironments_of_3_residues true -use_microenvironments_of_4_residues false -use_microenvironments_of_5_residues false

Same as above, but rather than allowing CrystalDock to auto-detect active-site-lining residues, the user specifies those residues explicitly:

python crystal_dock.py -receptor_pdb_file 1XDN.pdb -pocket_center_x 37.6 -pocket_center_y 23.2 -pocket_center_z 13.4 -pocket_radius 6 -output_directory ./my_output/ -use_microenvironments_of_3_residues true -use_microenvironments_of_4_residues false -use_microenvironments_of_5_residues false -user_specified_residue 371_ARG_A -user_specified_residue 118_ARG_A -user_specified_residue 292_ARG_A -user_specified_residue 406_TYR_A -user_specified_residue 347_TYR_A

Same as above, but now with altered filter parameters:

python crystal_dock.py -receptor_pdb_file 1XDN.pdb -pocket_center_x 37.6 -pocket_center_y 23.2 -pocket_center_z 13.4 -pocket_radius 6 -output_directory ./my_output/ -use_microenvironments_of_3_residues true -use_microenvironments_of_4_residues false -use_microenvironments_of_5_residues false -user_specified_residue 371_ARG_A -user_specified_residue 118_ARG_A -user_specified_residue 292_ARG_A -user_specified_residue 406_TYR_A -user_specified_residue 347_TYR_A -filter_steps_one_and_two_tolerance 4.0 -filter_step_three_CA_rmsd_cutoff 4.0 -filter_step_four_side_chain_angle_cutoff 180.0 -filter_step_five_rmsd_cutoff 4.0 -filter_step_six_steric_clash_cutoff 4.0

Same as above, but explicitly instruct the program to run on 2 processors:

python crystal_dock.py -receptor_pdb_file 1XDN.pdb -pocket_center_x 37.6 -pocket_center_y 23.2 -pocket_center_z 13.4 -pocket_radius 6 -output_directory ./my_output/ -use_microenvironments_of_3_residues true -use_microenvironments_of_4_residues false -use_microenvironments_of_5_residues false -user_specified_residue 371_ARG_A -user_specified_residue 118_ARG_A -user_specified_residue 292_ARG_A -user_specified_residue 406_TYR_A -user_specified_residue 347_TYR_A -filter_steps_one_and_two_tolerance 4.0 -filter_step_three_CA_rmsd_cutoff 4.0 -filter_step_four_side_chain_angle_cutoff 180.0 -filter_step_five_rmsd_cutoff 4.0 -filter_step_six_steric_clash_cutoff 4.0 -num_processors 2
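Because the command lines above grow long, it can be convenient to assemble them in a small script. The helper below is a hypothetical convenience wrapper, not part of CrystalDock. It uses only the flags documented above, and it builds the argument list without launching the program.

```python
def build_command(options, residues=()):
    """Build an argument list for crystal_dock.py from a dict of flag values."""
    command = ["python", "crystal_dock.py"]
    for flag in sorted(options):
        command += ["-" + flag, str(options[flag])]
    # -user_specified_residue is repeated once per residue.
    for residue in residues:
        command += ["-user_specified_residue", residue]
    return command

options = {
    "receptor_pdb_file": "1XDN.pdb",
    "pocket_center_x": 37.6,
    "pocket_center_y": 23.2,
    "pocket_center_z": 13.4,
    "pocket_radius": 6,
    "output_directory": "./my_output/",
    "num_processors": 2,
}
command = build_command(options, residues=["371_ARG_A", "118_ARG_A"])

# To launch CrystalDock, pass the list to the standard library, for example:
#     import subprocess
#     subprocess.call(command)
# Remember that the output directory must not exist beforehand.
```

Passing the arguments as a list avoids shell-quoting problems and makes it easy to vary one parameter at a time across several runs.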