SI2011 track3 CADD QR factorization tutorial

From NBCRwiki





Like RMSD clustering, which we explored in the last tutorial, QR factorization is a method for reducing the conformational redundancy of an MD trajectory. As in the RMSD tutorial, we provide a short, 15-frame trajectory of the ATP-dependent RNA editing ligase from Trypanosoma brucei (TbREL1) and use it to illustrate how QR factorization is performed within the CADD pipeline.

Please download the tutorial files, which contain the required inputs and example outputs.

QR factorization

QR factorization is a numerical method that removes conformational redundancy and reduces the number of structures that must be considered during analysis. It does this by ordering the conformations by increasing linear dependence, yielding a minimal basis set that spans the conformational space sampled during the MD simulation. For the mathematically inclined who would like to explore the underpinnings of the method, a good place to start is here. For those less excited about upper triangular matrices, Householder transformations, and column pivoting, the take-home message is that QR factorization eliminates redundancy and reduces the MD ensemble to a manageable size. The difficult numerical work is handled by MultiSeq, a VMD plugin called by the CADD pipeline, which makes QR factorization a straightforward post-processing step.
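To make the idea of "ordering by linear dependence" concrete, here is a minimal sketch of QR factorization with column pivoting in plain Python. It is not MultiSeq's implementation: it assumes each conformation has already been aligned and flattened into one coordinate vector, and the function name `pivoted_qr_order` and the tolerance `tol` are hypothetical. At each step it picks the conformation with the largest component not yet explained by the ones already chosen, so conformations come out most independent first and redundant ones are dropped. Production codes (e.g. LAPACK) use Householder reflections; this sketch swaps in modified Gram-Schmidt for brevity.

```python
import math
from typing import List, Sequence


def pivoted_qr_order(columns: Sequence[Sequence[float]], tol: float = 1e-10) -> List[int]:
    """Return column indices in pivot order (most linearly independent first).

    Modified Gram-Schmidt with column pivoting. Columns whose residual norm
    falls to `tol` or below are treated as linearly dependent and dropped.
    """
    # Work on copies so the caller's data is not modified.
    residuals = [list(col) for col in columns]
    remaining = list(range(len(residuals)))
    order: List[int] = []

    while remaining:
        # Pivot: choose the column with the largest remaining (unexplained) norm.
        norms = {j: math.sqrt(sum(x * x for x in residuals[j])) for j in remaining}
        best = max(remaining, key=lambda j: norms[j])
        if norms[best] <= tol:
            break  # everything left is (numerically) spanned by the chosen columns
        order.append(best)
        remaining.remove(best)

        # Normalize the pivot column to get the next orthonormal basis vector q.
        q = [x / norms[best] for x in residuals[best]]

        # Subtract the q-component from every column still in play.
        for j in remaining:
            proj = sum(a * b for a, b in zip(q, residuals[j]))
            residuals[j] = [a - proj * b for a, b in zip(residuals[j], q)]

    return order


# Toy "conformations" (one vector each). Column 2 equals col0/3 + col1/2,
# so it is redundant; column 3 is small but points in a new direction.
frames = [
    [3.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]
print(pivoted_qr_order(frames))  # [0, 1, 3]
```

Note that the redundant conformation is discarded even though its norm is larger than that of column 3: what matters is how much new direction a structure adds, not how large it is.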

Preparing the Files

Trajectory File

The first steps are similar to the preparation required for RMSD clustering. However, rather than preparing a single multi-structure trajectory PDB, QR factorization requires splitting the trajectory into individual frames, one PDB file per frame.

  • In VMD, open TbREL1.psf, which is located in the SI_QR/Trajectory directory.
  • Load TbREL1.dcd, also found in the SI_QR/Trajectory directory, into the same molecule as the PSF file.
  • Go to Extensions -> Tk Console. Make sure you are in the SI_QR/Input directory (use cd to change directories if needed). Type:
 source get-pdbs.tcl

This command runs get-pdbs.tcl, a Tcl/Tk script that loops over each frame in the DCD file and writes out the protein structures one at a time in PDB format. Each file is named TbREL1_N.pdb, where "N" is the trajectory frame number. If you're curious how it works, open get-pdbs.tcl in a text editor; it's just a few simple lines of code.
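Because the frame number is embedded in each file name, a plain alphabetical listing puts TbREL1_10.pdb before TbREL1_2.pdb. If you want to confirm that every frame was written, a small Python sketch like the one below lists the files in numeric frame order. It relies only on the TbREL1_N.pdb naming described above; the helper name `frame_files` is hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches names such as TbREL1_0.pdb or TbREL1_14.pdb (the naming used by get-pdbs.tcl).
FRAME_RE = re.compile(r"^TbREL1_(\d+)\.pdb$")


def frame_files(directory: str) -> List[Tuple[int, Path]]:
    """Return (frame_number, path) pairs sorted by frame number, not by name."""
    found = []
    for path in Path(directory).iterdir():
        match = FRAME_RE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found)


# Example, run from the directory that contains SI_QR/:
#     frames = frame_files("SI_QR/Input")
#     print(len(frames), "frame PDBs found")  # expect one per trajectory frame
```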

Note that, unlike in the RMSD workflow, we didn't have to align the trajectory beforehand; the CADD workflow performs the alignment for us.

Template GPF and DPF files

Recall from the RMSD clustering tutorial that these template files are required to run AutoDock 4. They are prepared the same way as in that tutorial, which describes their content in more detail.


  • Open a text editor and type in:
 npts 86 72 78
 gridcenter 40.34 18.53 13.02
  • Now, save the file as TbREL1.gpf in the SI_QR/Input directory.
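The npts values are grid-point counts, not distances. Assuming the workflow keeps AutoDock 4's default grid spacing of 0.375 Å (the template does not set spacing, so check what the workflow actually uses), each edge of the grid box spans npts multiplied by the spacing. The Python sketch below reads the two template lines and prints the resulting box size and extents; `parse_gpf` and `grid_box_size` are hypothetical helpers, not part of AutoDock.

```python
from typing import Dict, List, Tuple

# The GPF template lines from this tutorial.
GPF_TEMPLATE = """\
npts 86 72 78
gridcenter 40.34 18.53 13.02
"""

# ASSUMPTION: AutoDock 4's default spacing (Angstrom per grid step);
# the template above does not set `spacing` explicitly.
DEFAULT_SPACING = 0.375


def parse_gpf(text: str) -> Dict[str, List[float]]:
    """Parse 'keyword value value ...' lines into a dict, ignoring '#' comments."""
    params: Dict[str, List[float]] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields:
            params[fields[0]] = [float(v) for v in fields[1:]]
    return params


def grid_box_size(npts: List[float], spacing: float = DEFAULT_SPACING) -> Tuple[float, ...]:
    """Edge lengths of the grid box in Angstrom: npts * spacing on each axis."""
    return tuple(n * spacing for n in npts)


params = parse_gpf(GPF_TEMPLATE)
size = grid_box_size(params["npts"])
center = params["gridcenter"]
for axis, length, c in zip("xyz", size, center):
    print(f"{axis}: {length:.3f} A wide, from {c - length / 2:.3f} to {c + length / 2:.3f}")
```

With the assumed spacing, the box is 32.25 × 27.0 × 29.25 Å, centered on the gridcenter coordinates.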


  • Open a text editor and type in the following:
 ga_pop_size 200
 ga_run 10
 ga_num_evals 50000
 rmstol 2.0
  • Save this file in the SI_QR/Input directory as TbREL1.dpf. You should now have both TbREL1.gpf and TbREL1.dpf in the SI_QR/Input directory.
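If you would rather script this step than use a text editor, a minimal Python sketch that writes both templates with exactly the keywords and values shown above could look like this. The helper name `write_templates` is hypothetical, and the script only creates the files; it does not validate them against AutoDock.

```python
from pathlib import Path
from typing import Tuple

# Keyword lines copied from the GPF template in this tutorial.
GPF_LINES = [
    "npts 86 72 78",                 # grid points along x, y, z
    "gridcenter 40.34 18.53 13.02",  # grid box center (Angstrom)
]

# Keyword lines copied from the DPF template in this tutorial.
DPF_LINES = [
    "ga_pop_size 200",     # individuals in the genetic-algorithm population
    "ga_run 10",           # number of independent GA docking runs
    "ga_num_evals 50000",  # maximum energy evaluations per run
    "rmstol 2.0",          # RMSD tolerance (Angstrom) for clustering docked poses
]


def write_templates(input_dir: str, stem: str = "TbREL1") -> Tuple[Path, Path]:
    """Write <stem>.gpf and <stem>.dpf into input_dir and return their paths."""
    directory = Path(input_dir)
    directory.mkdir(parents=True, exist_ok=True)
    gpf = directory / f"{stem}.gpf"
    dpf = directory / f"{stem}.dpf"
    gpf.write_text("\n".join(GPF_LINES) + "\n")
    dpf.write_text("\n".join(DPF_LINES) + "\n")
    return gpf, dpf


# Example, run from the directory that contains SI_QR/:
#     write_templates("SI_QR/Input")
```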

Running the Workflow

Now that the required inputs are prepared, we can run the QR-factorization workflow using the TrajQR Web service.


  • Inputs
    • Directory_Containing_Input_PDB_Files specifies where the trajectory PDB files are stored. You prepared these above and they're stored in the SI_QR/Input directory.
    • RMSD sets the RMSD cutoff for the QR factorization algorithm. We will use a cutoff of 6 Angstroms. As in RMSD-based clustering, the number of structures returned by the QR factorization algorithm is sensitive to this value: as a rule of thumb, the larger the RMSD cutoff, the fewer structures are returned.
    • Output_Directory_Path is where the individual files that describe the reduced structural ensemble will be written. Choose SI_QR/Output.
    • GPF/DPF Template Browser specifies the locations of the GPF and DPF templates we prepared. For the GPF file, choose SI_QR/Input/TbREL1.gpf; for the DPF file, choose SI_QR/Input/TbREL1.dpf.
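The RMSD cutoff is given in Angstroms. For reference, the RMSD between two structures that are already superimposed and list the same atoms in the same order is the square root of the mean squared distance between corresponding atoms. The plain-Python sketch below computes that quantity; the function name `rmsd` is ours, and it performs no alignment because the CADD workflow handles alignment. How TrajQR turns the 6 Å cutoff into a stopping criterion for the QR factorization is not described in this tutorial, so the sketch only makes the units concrete.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation (Angstrom) between two pre-aligned coordinate sets."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must have the same, non-zero number of atoms")
    # Sum of squared distances between corresponding atoms.
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(total / len(a))


# Two toy 2-atom "structures": every atom is shifted by 3 A along x.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]))  # 3.0
```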

