SI2011 track 3 Multiple receptor virtual screening

From NBCRwiki




Multi-Receptor Virtual Screening Tutorial

This tutorial describes how to use multiple receptor structures in a virtual screening workflow implemented in the CADD pipeline. Using multiple receptor structures in a virtual screening experiment is sometimes called "ensemble docking," and there are several ways to choose the receptor structures. For example, multiple wild-type crystal structures can be used, as they were here. Alternatively, wild-type and mutant crystal structures may be combined to search for ligands that interact well with both the wild type and the mutant, circumventing resistance issues, as was done here. Another option is to use multiple conformations generated during MD simulations; examples can be found here, here and here, and see here for a critical analysis. This tutorial focuses on conformations generated during MD simulations. Although this approach has a number of success stories (see the references above for a handful of examples), you need not limit yourself to it, and you are encouraged to think critically about all of these approaches.

  • Please download the file, which contains the required inputs and example outputs.
  • If you have any questions regarding any section of the tutorial, please ask a proctor.

Preparing the System

In this tutorial, two receptor conformations and two ligands will be used to perform four docking calculations. The small ligand library contains a Trypanosoma brucei RNA editing ligase 1 (TbREL1) inhibitor and the ATP substrate. These two compounds will be docked into the TbREL1 crystal structure and into the centroid of the first cluster, which you determined yesterday. To dock against a protein receptor using AutoDock4 (AD4), the CADD workflow requires four files:

  • The first is a PDB structure file describing the coordinates of the protein atoms.
  • The second is a template DPF file that specifies docking parameters controlling the run-time behavior of AutoDock4.
  • The third is a template GPF file that tells AutoGrid where to center the grid, how large to make it, and how densely to space the grid points.
  • The fourth is a PDBQT file listing the Gasteiger charges and AD4 atom types for each ligand atom.
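The fourth item can be made concrete with a short script. This minimal Python sketch pulls the Gasteiger charge and AD4 atom type out of each atom record in a ligand PDBQT file. It assumes the usual PDBQT layout, in which the last two whitespace-separated fields of an ATOM/HETATM line are the partial charge and the atom type. The function name read_pdbqt_atoms and the example ligand are hypothetical, and this is not part of the CADD pipeline.

```python
"""Sketch: read per-atom charges and AD4 atom types from PDBQT text."""


def read_pdbqt_atoms(text):
    """Return a list of (atom_name, charge, ad4_type) tuples.

    Assumes the last two whitespace-separated fields of each ATOM/HETATM
    record are the partial charge and the AutoDock4 atom type.
    """
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip ROOT, BRANCH, TORSDOF and other PDBQT records
        fields = line.split()
        charge = float(fields[-2])
        ad4_type = fields[-1]
        atom_name = line[12:16].strip()  # PDB atom-name columns 13-16
        atoms.append((atom_name, charge, ad4_type))
    return atoms


if __name__ == "__main__":
    # Hypothetical two-atom ligand, for illustration only.
    example = (
        "ROOT\n"
        "ATOM      1  C1  LIG     1       1.000   2.000   3.000  0.00  0.00    +0.120 C\n"
        "ATOM      2  O1  LIG     1       1.500   2.500   3.500  0.00  0.00    -0.350 OA\n"
        "ENDROOT\n"
        "TORSDOF 0\n"
    )
    for name, charge, ad4_type in read_pdbqt_atoms(example):
        print(f"{name:4s} {charge:+.3f} {ad4_type}")
```

Summing the charges returned for a ligand is a quick way to check its net charge before screening.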

See the AutoDock tutorials, found here, for more on DPF and GPF files and the PDBQT format.

We'll say more about ligand libraries below, but for this exercise the ligand PDBQT files have already been prepared, so we'll focus on preparing the PDB, GPF and DPF files. For the cluster centroid, nothing more needs to be done: all the necessary input files were generated yesterday and are included in the SI_VS/Input directory. The crystal structure, on the other hand, requires a few preparation steps.

Process the PDB in VMD

First, we need to obtain a PDB file of the TbREL1 crystal structure from the Protein Data Bank, and VMD is a good tool for this. Launch VMD. (By now we assume you are familiar with VMD; if you need help, ask a proctor!)

  • In the VMD Main window, select File -> New Molecule.
  • Type 1XDN into the Filename box and press Enter. Under Determine file type, it should say Web PDB Download. Click Load. This downloads TbREL1 (PDB ID 1XDN) from the RCSB Protein Data Bank.

Have a look at the structure. You should see several waters floating around the protein, and if you look closely, you'll be able to pick out the ATP substrate and a magnesium cofactor.

  • If it is not open already, open the Graphical Representations window (in the VMD Main window: Graphics -> Representations). In the Selected Atoms box, type protein. Using the Drawing Method pull-down menu, select NewCartoon. Using the Coloring Method pull-down menu, change the coloring method to Secondary Structure so that the helices and sheets stand out clearly.
  • Click the Create Rep button at the top of the Graphical Representations window, type resname ATP in the Selected Atoms box, set the Drawing Method to Licorice, and set the Coloring Method to Name.
  • Create another representation with the Create Rep button, but this time type resname MG, set the Drawing Method to VDW, and set the Coloring Method to Name.

Things should be easier to see now. Can you pick out ATP and magnesium, both bound in the enzyme active site? For the purpose of docking, we want to make a separate file for just the protein. Although we can do this in many ways, we'll use VMD.

  • Open a Tk Console window by selecting Extensions -> Tk Console. The Tk Console command line works much like a Unix shell, the Windows command prompt, or the Mac Terminal.
  • Navigate to the SI_VS/Input directory using the cd command in the Tk Console window.
  • Next, select the protein and write it out to a PDB file:
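# Select every atom VMD classifies as protein (this drops the waters, ATP and Mg)
set prot [atomselect top "protein"]
# Write the selection to 1xdn.pdb in the current directory (SI_VS/Input)
$prot writepdb 1xdn.pdb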
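You can also strip the non-protein atoms outside VMD. The Python sketch below does this by keeping only atom records whose residue name is one of the 20 standard amino acids. This is an approximation: VMD applies its own definition of "protein," so nonstandard residues may be treated differently. The function name protein_only and the file names in the comment are hypothetical.

```python
"""Sketch: keep only standard amino-acid atoms from a PDB file (approximation)."""

# The 20 standard amino-acid residue names. Waters (HOH), ATP and MG are
# not in this set, so their records are dropped.
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def protein_only(pdb_text):
    """Return PDB text containing only ATOM/HETATM records of standard residues."""
    kept = []
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM", "HETATM"))
        resname = line[17:20].strip()  # PDB residue-name columns 18-20
        if is_atom and resname in AMINO_ACIDS:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical usage on a locally downloaded file:
    #   with open("1XDN_full.pdb") as src, open("1xdn.pdb", "w") as dst:
    #       dst.write(protein_only(src.read()))
    demo = (
        "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
        "HETATM  100  O   HOH A 201       1.000   1.000   1.000  1.00  0.00           O\n"
        "HETATM  101 MG    MG A 301       2.000   2.000   2.000  1.00  0.00          MG\n"
    )
    print(protein_only(demo), end="")
```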

GPF/DPF templates

GPF and DPF template preparation was discussed in the RMSD clustering tutorial. Feel free to refer back to it if you need a refresher.


We can use the same GPF template for both the crystal structure and the cluster centroid. You should already have a GPF template named 1frame_8.gpf in the SI_VS/Input directory from yesterday's RMSD clustering exercise.

  • Using a text editor open SI_VS/Input/1frame_8.gpf, and verify that it contains the following:
 npts 86 72 78
 gridcenter 40.34 18.53 13.02
  • Now, save the file as 1xdn.gpf in the SI_VS/Input directory. You should now have two GPF files in your SI_VS/Input directory, 1xdn.gpf and 1frame_8.gpf, and their contents should be identical.
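As a sanity check on the grid, you can work out the size and extent of the box from npts and gridcenter. The Python sketch below assumes the AutoGrid convention that each box edge spans npts times the grid spacing. If the GPF text has no spacing line, it falls back to 0.375 Å, a commonly used AutoGrid spacing; check the spacing value in your own 1frame_8.gpf. The helper names read_gpf and grid_box are hypothetical.

```python
"""Sketch: compute the AutoGrid box from GPF npts/gridcenter/spacing."""


def read_gpf(text):
    """Map each GPF keyword to its list of value tokens (comments dropped).

    Repeated keywords (such as map) keep only their last occurrence.
    """
    params = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields:
            params[fields[0]] = fields[1:]
    return params


def grid_box(npts, center, spacing=0.375):
    """Return (edge_lengths, min_corner, max_corner) in Angstrom.

    Assumes each edge spans npts * spacing, centred on gridcenter.
    """
    edges = tuple(n * spacing for n in npts)
    lower = tuple(c - e / 2 for c, e in zip(center, edges))
    upper = tuple(c + e / 2 for c, e in zip(center, edges))
    return edges, lower, upper


if __name__ == "__main__":
    gpf_text = "npts 86 72 78\ngridcenter 40.34 18.53 13.02\n"
    p = read_gpf(gpf_text)
    npts = [int(v) for v in p["npts"]]
    center = [float(v) for v in p["gridcenter"]]
    spacing = float(p.get("spacing", ["0.375"])[0])  # assumed fallback
    edges, lower, upper = grid_box(npts, center, spacing)
    print("edges (A):", edges)
    print("from", lower, "to", upper)
```

With a 0.375 Å spacing, the npts values above give a box of 32.25 × 27.0 × 29.25 Å centred on the gridcenter.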


As with the GPF template, we can use the same DPF template for both the crystal structure and the cluster centroid. You should already have a DPF template named 1frame_8.dpf in the SI_VS/Input directory from yesterday's RMSD clustering exercise.

  • Open SI_VS/Input/1frame_8.dpf in a text editor and verify that it contains the following:
 ga_pop_size 200 
 ga_run 10                          
 ga_num_evals 50000 
 rmstol 2.0
  • Save this file in the SI_VS/Input directory as 1xdn.dpf. You should now have two DPF files in the SI_VS/Input directory, 1xdn.dpf and 1frame_8.dpf, and their contents should be identical.
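Checking the DPF by eye works for four keywords, but the same check can be scripted, along with the ga_num_evals change suggested in the analysis section. The sketch below treats a DPF as keyword/value lines. It reads a value and returns a copy of the text with one keyword's value replaced. The function names and the 2500000 value in the example are illustrative assumptions, not settings prescribed by this tutorial.

```python
"""Sketch: read and update keyword values in an AutoDock4 DPF."""


def dpf_value(text, keyword):
    """Return the value tokens of the first line starting with `keyword`, or None."""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == keyword:
            return fields[1:]
    return None


def set_dpf_value(text, keyword, *values):
    """Return a copy of `text` with every `keyword` line's values replaced.

    A trailing '# comment' on the line is kept.
    """
    out = []
    for line in text.splitlines():
        body, sep, comment = line.partition("#")
        fields = body.split()
        if fields and fields[0] == keyword:
            line = " ".join([keyword, *(str(v) for v in values)])
            if sep:
                line += "    #" + comment
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    dpf = "ga_pop_size 200\nga_run 10\nga_num_evals 50000\nrmstol 2.0\n"
    expected = {"ga_pop_size": ["200"], "ga_run": ["10"],
                "ga_num_evals": ["50000"], "rmstol": ["2.0"]}
    for key, value in expected.items():
        assert dpf_value(dpf, key) == value, key
    # Illustrative larger value for a rerun; choose your own.
    print(set_dpf_value(dpf, "ga_num_evals", 2500000), end="")
```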

Running the Virtual Screen

Small molecule databases

In virtual screening, you dock many ligands from a library of small molecules to one or more receptors. Ligand libraries can be prepared with a simple script, but many come already prepared. There are a few well-curated, publicly available small molecule databases that can be used for a variety of virtual screening or docking applications.

The NCI diversity set is distributed by the National Cancer Institute and contains approximately 1900 chemically diverse compounds. It is well suited to efficient "search in the dark" type applications. Because the NCI makes these compounds (and those in the full NCI database) available for experimentalists to test, it is a good place to start. It has been processed for use in AutoDock4 and is available from NBCR for your docking experiments. The ZINC database is maintained by the Shoichet lab at UCSF and is an excellently curated, very large small molecule database, available in a variety of file formats. NBCR also hosts a version of one of the recent ZINC library distributions curated for AutoDock4. The Available Chemicals Directory (ACD) and Hit2Lead are other popular small molecule databases.

While these are good databases to consider for your own research, we won't have the time to carry out a screen on even the smallest of them. During today's exercise, you'll dock a TbREL1 inhibitor and ATP, a TbREL1 substrate.

Running the Workflow

  • Open the CADD pipeline.
  • Choose Virtual Screening -> Load Workflow -> AutoDockVSlocal.

This loads a virtual screening workflow that uses a "local" compound library stored on your computer. This workflow is ideal if you have generated a target-specific or proprietary compound library that is housed locally. Alternatively, many of the databases mentioned above are available through the AutoDockVSpublic workflow, which works much like the VSlocal workflow used today; once you can use the VSlocal workflow, applying the VSpublic workflow should be straightforward. The two-compound library that will be docked today is located in the SI_VS/Library directory, where you should find two PDBQT files, one for each compound.
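The local workflow pulls its ligands from the PDBQT files in the library directory, so it can help to list those files and confirm the count before launching a screen. The sketch below does this for a directory and also checks the 2500-ligand server limit mentioned under FilterLigandNode. The function name list_ligands is hypothetical, and the demo uses a temporary directory rather than SI_VS/Library.

```python
"""Sketch: list the ligand PDBQT files in a local library directory."""
import tempfile
from pathlib import Path

MAX_LIGANDS = 2500  # server limit quoted in this tutorial


def list_ligands(library_dir):
    """Return the sorted *.pdbqt paths in `library_dir`, enforcing MAX_LIGANDS."""
    ligands = sorted(Path(library_dir).glob("*.pdbqt"))
    if len(ligands) > MAX_LIGANDS:
        raise ValueError(
            f"{len(ligands)} ligands exceeds the {MAX_LIGANDS}-ligand limit")
    return ligands


if __name__ == "__main__":
    # Demo on a throwaway directory; point list_ligands at SI_VS/Library instead.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("inhibitor.pdbqt", "atp.pdbqt", "notes.txt"):
            Path(tmp, name).write_text("")
        for path in list_ligands(tmp):
            print(path.name)
```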

The workflow in your CADD GUI should look similar to the image below.

  • Double-clicking certain nodes opens the sub-workflow contained within that node.
  • Mouse over any node or port to see specific parameters about that node or port.


The following describes the nodes in the network whose parameters you will need to set.

  • Inputs
    • GetStructuresFromDir gets a list of PDB/PQR/PDBQT files from the user as the input receptors; each of these receptor files is later used to run a virtual screen. This directory should also contain a template GPF and a template DPF for each receptor. The workflow assumes that each receptor PDB file and its corresponding template GPF/DPF share the same base filename. Browse for SI_VS/Input, the directory where you stored the PDB files and the template GPFs and DPFs.
    • Local Ligand Directory Browser specifies where the local compound library is stored on your computer. Set this to SI_VS/Library.
    • Output Directory Browser indicates where the CADD pipeline will write the results. One directory is written for each receptor. Each receptor directory contains AutoDock4 files, as well as a directory for each docked compound holding the corresponding docking results. Set this to SI_VS/Output.
    • PreserveCharges? is a checkbutton node; when it is checked, PrepareReceptor preserves the existing charges. Make sure this is checked.
    • FilterLigandNode lets the user filter the ligands. Advanced users may want to use a customized filter: to open the custom filter parameter panel, right-click the FilterLigandNode and select Parameter Panel, then use the thumbwheels to adjust the values of the different parameters. Note that our server cannot support ligand libraries with more than 2500 ligands. This option will be left at the default for this tutorial. Here is an image of the custom filter parameter panel:


  • Other
    • Iterate loops over the receptor list and performs virtual screening on each receptor.
    • InputValidation verifies that the input meets the criteria of the specified calculation(s).
    • DownloadSaveDir downloads the virtual screening results to the parent directory of the input directory given to the GetStructuresFromDir node. The output is organized by receptor, with one directory created for each individual receptor.
    • The VirtualScreening node is where all the information is sent and the screen is run. To see the nodes it contains, double-click the VirtualScreening node; each of them is described below.
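GetStructuresFromDir relies on a naming convention: each receptor structure and its template GPF and DPF share a base filename. A quick pre-flight check of that convention can be sketched in Python as follows. It works on a list of file names, for example the result of os.listdir on SI_VS/Input. The function name pair_templates is hypothetical, and the workflow's own validation may differ.

```python
"""Sketch: pair each receptor structure with its template GPF and DPF."""

STRUCTURE_EXTS = {"pdb", "pqr", "pdbqt"}


def pair_templates(filenames):
    """Return {base_name: (gpf, dpf)} for every receptor structure file.

    Raises FileNotFoundError if a structure lacks a same-named GPF or DPF.
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        base, dot, ext = name.rpartition(".")
        if not dot or ext.lower() not in STRUCTURE_EXTS:
            continue  # not a receptor structure (e.g. a template itself)
        gpf, dpf = f"{base}.gpf", f"{base}.dpf"
        missing = [t for t in (gpf, dpf) if t not in names]
        if missing:
            raise FileNotFoundError(f"{name}: missing {', '.join(missing)}")
        pairs[base] = (gpf, dpf)
    return pairs


if __name__ == "__main__":
    # In practice: import os; pair_templates(os.listdir("SI_VS/Input"))
    demo = ["1xdn.pdb", "1xdn.gpf", "1xdn.dpf",
            "1frame_8.pdb", "1frame_8.gpf", "1frame_8.dpf"]
    print(pair_templates(demo))
```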




The PrepareReceptor node first runs the PDB2PQR web service and then runs the PrepareReceptor web service. PDB2PQR assigns protons to the PDB structure, optimizes the hydrogen bond network, and adds a limited number of missing heavy atoms. A more detailed description can be found here. The PrepareReceptor web service then converts the PQR file to a PDBQT file, which AD4 requires. If the input directory already contains a .pqr or a .pdbqt file with the same base name, PDB2PQR or PrepareReceptor, respectively, is skipped.

  • Inputs
    • The receptor directory specified in the main workflow contains the input files. All of the PDBs in that directory are used as inputs here.
  • Outputs
    • Prepared receptor objects
    • URL strings pointing to the prepared receptor, or the path to a PDBQT file on the local machine if one already exists.
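The skip rule for receptor preparation can be written down as a small decision function. The Python sketch below follows the rule as stated for this node: an existing .pqr file with the receptor's base name skips PDB2PQR, and an existing .pdbqt file skips PrepareReceptor. The function name preparation_steps is hypothetical; the pipeline's internal logic is not shown in this tutorial.

```python
"""Sketch: decide which receptor-preparation services would run."""


def preparation_steps(base, filenames):
    """Return the services that still need to run for receptor `base`.

    Follows the rule as stated: an existing <base>.pqr skips PDB2PQR, and
    an existing <base>.pdbqt skips PrepareReceptor.
    """
    names = set(filenames)
    steps = []
    if f"{base}.pqr" not in names:
        steps.append("PDB2PQR")
    if f"{base}.pdbqt" not in names:
        steps.append("PrepareReceptor")
    return steps


if __name__ == "__main__":
    print(preparation_steps("1xdn", ["1xdn.pdb", "1xdn.gpf", "1xdn.dpf"]))
    print(preparation_steps("1frame_8", ["1frame_8.pdb", "1frame_8.pqr"]))
```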

Double-click the PrepareReceptor macro to see how it is composed. The Pdb2pqrWS and PrepareReceptorWS nodes inside PrepareReceptor are themselves macro nodes composed of other nodes:




The Pdb2pqrWS node runs the PDB2PQR web service, which assigns the proper protonation states to the protein.




The PrepareReceptorWS node runs the PrepareReceptor web service, which prepares the receptor PDBQT files required by AutoDock4.




The ComputeGrids node first runs PrepareGPF, which prepares a GPF based on the ligand library atom types and then runs AutoGrid.

  • Inputs
    • A LigandDB object, which may be the output of LocalLigandDirectory, PublicServerLigandDB, UrlLigandDB, or FilterLigandNode. In our case, it comes from the LocalLigandDirectory node, which reads PDBQT files from the directory set in the Local Ligand Directory Browser node of the main workflow.
    • A prepared receptor object, which is the output from the previous step, PrepareReceptor.
    • A gpf_template object, which you created while setting up the system. This is specified by the GetStructuresFromDir input node from the main workflow.
  • Outputs
    • AutoGrid results of the type autogrid_result
    • AutoGrid result as a string URL
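PrepareGPF builds the GPF from the atom types present in the ligand library, because AutoGrid needs one affinity map per ligand atom type. A rough Python sketch of collecting those types and formatting an AutoDock 4.2-style ligand_types line follows. It assumes the atom type is the last field of each PDBQT atom record, and the function names are hypothetical. The PrepareGPF service's actual implementation is not described here.

```python
"""Sketch: collect AD4 atom types across a ligand library for a GPF."""


def library_atom_types(pdbqt_texts):
    """Return the sorted set of AD4 atom types over all ligand PDBQT texts."""
    types = set()
    for text in pdbqt_texts:
        for line in text.splitlines():
            if line.startswith(("ATOM", "HETATM")):
                types.add(line.split()[-1])  # last field: AD4 atom type
    return sorted(types)


def ligand_types_line(types):
    """Format the atom types as a GPF ligand_types line."""
    return "ligand_types " + " ".join(types)


if __name__ == "__main__":
    # Hypothetical, heavily truncated ligand records for illustration.
    lig1 = "ATOM      1  C1  LIG     1       0.000   0.000   0.000  0.00  0.00    +0.100 C\n"
    lig2 = ("ATOM      1  O1  LIG     1       0.000   0.000   0.000  0.00  0.00    -0.300 OA\n"
            "ATOM      2  H1  LIG     1       1.000   0.000   0.000  0.00  0.00    +0.200 HD\n")
    print(ligand_types_line(library_atom_types([lig1, lig2])))
```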

Here is the inside of the ComputeGrids macro:




The AutodockVS node runs AutoDock on the ligand library as a web service. The web service uses a cluster of about 200 processors and executes AutoDock jobs in parallel.

  • Inputs
    • A LigandDB object, which again comes from the ligand library specified in the main workflow (the LocalLigandDirectory node in our case).
    • AutoGrid object, which is the output from the ComputeGrids step.
    • A dpf_template object, which was created when you set up the system.
  • Outputs
    • A URL to the AutoDock virtual screening results. Note that only the top 500 hits are displayed by default.
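The top-500 cutoff amounts to ranking ligands by docking score and keeping the best N. A minimal Python sketch follows, assuming the ranking is by lowest binding energy (more negative is better). The function name, the tuple format and the energies in the demo are made up for illustration and are not pipeline output.

```python
"""Sketch: keep the N best-scoring ligands from a virtual screen."""
import heapq


def top_hits(results, n=500):
    """Return the n (ligand, energy) pairs with the lowest energies, best first.

    Assumes energies are AutoDock binding energies in kcal/mol, where more
    negative values are better.
    """
    return heapq.nsmallest(n, results, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up energies for illustration only.
    demo = [("lig_a", -5.2), ("lig_b", -7.8), ("lig_c", -6.4)]
    print(top_hits(demo, n=2))
```

heapq.nsmallest avoids sorting the whole list, which matters little for two ligands but helps for libraries of thousands.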

Here is the inside of the AutodockVS node:


Click the yellow lightning bolt button on the toolbar to run the workflow. Nodes that are currently running are outlined in bright red; when no node has a bright red border, the virtual screen is complete. For every receptor you screen, a directory is created containing all the results for that receptor. In our case, these directories will appear in the SI_VS directory.
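Once the run finishes, you can gather the docking logs for each receptor programmatically. The sketch below assumes the layout described for the Output Directory Browser, with one directory per receptor and a subdirectory per docked compound. It also assumes each docking result includes a .dlg file. The function name dlgs_by_receptor is hypothetical, and the demo builds a throwaway directory tree rather than reading real results.

```python
"""Sketch: group AutoDock DLG files by receptor directory."""
import tempfile
from collections import defaultdict
from pathlib import Path


def dlgs_by_receptor(results_dir):
    """Return {receptor_dir_name: [dlg paths]} for DLGs below results_dir."""
    root = Path(results_dir)
    found = defaultdict(list)
    for dlg in sorted(root.rglob("*.dlg")):
        parts = dlg.relative_to(root).parts
        if len(parts) < 2:
            continue  # a DLG directly in results_dir has no receptor folder
        found[parts[0]].append(dlg)
    return dict(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: <receptor>/<ligand>/<ligand>.dlg
        for rel in ("1xdn/atp/atp.dlg", "1xdn/inhibitor/inhibitor.dlg",
                    "1frame_8/atp/atp.dlg", "1frame_8/inhibitor/inhibitor.dlg"):
            path = Path(tmp, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("")
        for receptor, dlgs in dlgs_by_receptor(tmp).items():
            print(receptor, [p.name for p in dlgs])
```

With two receptors and two ligands, you should find four DLG files in total, one per docking calculation.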


When the run is finished, there are several things we need to do in order to evaluate the results for each individual receptor:

  • Look at the docking pose(s)
    • Load the receptor PDB into VMD
    • Read the DLG file into VMD as a separate PDB file
    • Do the poses look reasonable? Was AD4 able to reproduce the crystal structure binding mode of ATP? This is a good example of why you should tune the parameters for your system so that the binding modes of positive controls are reproduced. Here, the culprit is the ga_num_evals parameter, which was set too low. If you want, go back and rerun the docking with a larger value for this parameter.
  • Look at the clusters
    • They're listed at the top of the DLG file, depicted as a text-style histogram
    • How many clusters are there? Is the clustering good? The ideal case is that the lowest energy cluster is also the most populated. How did we do?
  • Look at the energy
    • Is AutoDock able to repeatedly find a low-energy docking pose?
    • What is the energy of the first cluster?
    • What is the energy of the biggest cluster?

