Protein Refinement in Discovery Studio 1.7. Presented by: Francisco Hernandez-Guzman, PhD

Protein Refinement in Discovery Studio 1.7 Presented by: Francisco Hernandez-Guzman, PhD February 2007

Agenda Part I: Protein Refinement in Discovery Studio 1.7 (20mins) The problem Novel Algorithms at Accelrys Part II: DEMO (25mins) Conclusion 2

Importance and Difficulties Importance of side-chain and loop refinement X-ray Missing data Data is constrained crystal contacts Proteins are dynamics entities Homology modeling Inherent errors and bias from template(s) Missing data In-Silico Structure Based Drug Design Evidence indicates the need to consider flexibility to open active sites, and allow proper docking Difficulties in side-chain and loop refinement Intrinsic flexibility difficult combinatorial problem 1 peptide 2-3 rot. bonds. 3

Protein Refinement in Discovery Studio 1.7 Protein refinement tools can be used to: Correct errors in exp. structs. due to missing exp. data. Provide alternate starting conformations for docking Discovery Studio 1.7 provides two novel algorithms for: Side-chain refinement (rebuilding) ChiRotor 1 Loop refinement (rebuilding) Looper 2 ACCELRYS Results of loop refinement protocol CHARMm based 4 1. V. Spassov, et al. The dominant role of side-chain backbone interactions in structural realization of amino-acid code. ChiRotor: a side-chain prediction algorithm based on side-chain backbone interactions Protein Science 2007. 2. V. Spassov,, et al. LOOPER : A CHARMm based Algorithm for Loop prediction (in preparation)

Side-Chain Refinement using ChiRotor ChiRotor Optimizes: -side-chains conf. systematic CHARMm E search and minimization. Selects: -Best conf. based on CHARMm E. Fast ab intio approach independent on the starting structure Rebuilds the side-chains 5 Animated figure

ChiRotor Methodology Start loop for i from 1 to n Choose Residue i Construct complete structure using lowest energy conformer of each residue. 6 Protein 3D Structure Select a Set of n Residues For Refinement Remove all side chain atoms of selected residues Sample side chain conformations of residue i varying χ 1 Energy Minimize side chain atoms of residue i in CHARMm Save 2 Best Conformations for residue i End loop for i Output: 2n partial structures Energy minimize all selected side chains Start loop for i from 1 to n Replace side chain conformation i with the 2 nd conformer and energy minimize Accept the structure if energy is lower. End loop for i Output: 1 Lowest Energy structure

ChiRotor Results Compared ChiRotor to SCWRL and SCAP 1 Benchmark includes 24 high resolution X-ray crystal structures selected using PISCES 2 Resolution < 1.0Å Sequence identity < 20% ChiRotor More accurate in the core region Relatively fast Computation time is similar to SCAP (fast mode) and much faster (10 to 20 times) than SCAP (slow mode) Not based on a rotamer library Predicted conformations are already minimized by CHARMm thus no further refinement is necessary Method ChiRotor SCWRL SCAP (Fast mode) SCAP (Slow mode) Core RMSD 0.8 1.1 0.9 0.7 All RMSD 1.7 1.7 1.6 1.5 7 1. Xiang, Z. and Honig, B. 2000. Extending the accuracy limits of prediction for side-chain conformations. J. Mol. Biol. 311: 421-430. 2. R. L. Dunbrack Jr.* and M. Muquit http://dunbrack.fccc.edu/pisces.php

Slide hidden during ppt. ChiRotor Results The CHARMm polar force field is more accurate and faster CPU time scales close to O(N2) where N is the number of residues ChiRotor RMSD 2.5 80 ChiRotor CPU Time 2 CPU (min) 70 60 50 40 30 20 CHARMm-Polar-H CHARMm RMSD (Angstroms) 1.5 1 10 0 #res 53 58 64 81 107 123 129 224 233 269 321 24 proteins sorted by sequence length 363 0.5 0 #res 53 58 64 81 CHARMm-Polar-H, core residues CHARMm-Polar-H, all residues CHARMm, core residues CHARMm, all residues 107 123 129 224 233 269 321 363 24 proteins sorted by sequence length 8

Loop Refinement using Looper The Looper loop refinement algorithm Systematically searches for backbone conformation Uses CHARMm minimization Ranks the conformation using CHARMm energy, including solvation energy term Fast, ab intio approach independent of the starting structure Rebuilds the loop de novo Loop Refinement protocol and results in Discovery Studio 1.7 9

Looper Methodology Looper first constructs and optimizes the loop backbone Systematic search of loop conformation by sampling a minimum set of backbone dihedral angles φ and ψ The loop is divided into two halves and each half is constructed independently from the end of the loop and then combined without the side-chain atoms The loops are minimized by CHARMm and ranked by CHARMm energy Continued 10

Looper Methodology Next, Looper constructs the loop side chains and optimizes the loops Construct the side-chains using the ChiRotor and the loop conformations are ranked by CHARMm energies Finally, Looper re-ranks the conformations CHARMm energy minimizations in the first two steps are done without including the solvation energy term In this step, each top ranking loop conformation is re-scored by adding the solvation energy term calculated using Generalized Born approximation (in CHARMm) 11

Looper Results Compared Looper to Fiser 1 and Jacobson 2 publications Benchmark Test Data Sets were taken from Fiser s 1 work For each loop length, 40 loops from 40 different proteins are modeled RMSD were calculated between the model and experimental structure for backbone heavy atoms using the global RMSD definition by Fiser Jacobson s calculation are done in crystal environment which may result in better accuracy, but not practical for model building 12 1. Fiser et al. Prot.Sci., 9:1753-1773, (2000) 2. Jacobson et al. Protens., 55:351-367 (2004)

Looper Results (accuracy) Looper is more accurate than Fiser s Jacobson results are better for longer loops MODELER 9.0 loop refinement method (includes the DOPE function) is better than Fiser s RMSD* [Angstrom] 8 7 6 5 4 3 2 1 0 Looper 1 2 3 4 5 6 7 8 9 10 11 12 Loop Length [residues] * bb atom rmsd MODELER 6.0 (Fiser) Jacobson 13

Looper Results (speed) Looper results are similar to Jacobson s stage 1, but much faster (>100 times faster) than stages 2 and 3 Jacobson s stage 3 results are better for longer loops, but with a high computational cost (>100 times slower than looper) Fiser s timing is similar to Jacobson s methods 25 Looper 20 Time [min] 15 10 5 0 2 3 4 5 6 7 8 9 10 11 12 Number of Residues in Loop 14 Jacobson

15 DEMO

Conclusions One of the main obstacles to any method for protein structure prediction: Combinatorial problem In the case of loop and side chain optimizations: intelligent assumptions the difficulty is reduced, but still significant We demonstrated that it is possible to create efficient algorithms for loop (Looper) and side-chain (ChiRotor) optimizations by reducing the search to the minimal number of initial conformations of each aminoacid residue in combination with energy minimization Continued 16

Conclusions Continued The CHARMm scripts for side-chain and loop predictions are developed entirely on physical principles and do not use any bases of structural data or rotamer libraries The script realization of the algorithms makes it easy to include them as part of any CHARMm modeling protocol The accuracy of the methods is comparable to the accuracy of other known algorithms In the case of ChiRotor, we have achieved an extremely fast code without losing accuracy 17

Acknowledgements R&D Velin Spassov Lisa Yan Paul Flook Marc Fasnacht Jürgen Koska Marketing Dana Haley-Vicente Accelrys 18