The goal of current research in our laboratory is to develop an improved model of intra- and intermolecular interactions and to apply this improved model to the prediction and design of macromolecular structures and interactions. Prediction and design applications can be of great biological interest in their own right, and they also provide stringent and objective tests that drive improvement of the model and increase fundamental understanding.
The prediction and design calculations are carried out using a computer program called Rosetta. At the core of Rosetta are the physical model of macromolecular interactions and algorithms for finding the lowest energy structure for an amino acid sequence (protein structure prediction) or a protein-protein complex and for finding the lowest energy amino acid sequence for a protein or protein-protein complex (protein design). Both the physical model and the search algorithms are continually being improved based on feedback from the prediction and design tests. There are considerable advantages in developing one computer program to treat these quite diverse problems: first, the different applications provide very complementary tests of the underlying physical model (the fundamental physical chemistry is of course the same in all cases), and second, many problems of current interest, such as flexible backbone protein design and protein-protein docking with backbone flexibility, involve a combination of the different optimization methods.
In the following sections are brief summaries of recent progress and highlights in each of the different areas and illustrations of the development of the physical model.
Over the past several years, we have used our computational protein design method to dramatically stabilize several small proteins by completely redesigning every residue of their sequences (Dantas et al, 2003), to redesign protein backbone conformation (Nauli et al, 2002), and to convert a monomeric protein to a strand swapped dimer (Kuhlman et al, 2001). A highlight was the redesign of the folding pathway of protein G, a small protein containing two beta hairpins separated by an alpha helix. In the naturally occurring protein the first hairpin is disrupted and the second hairpin is formed at the rate limiting step in folding, but in a redesigned variant in which the first hairpin was significantly stabilized and the second hairpin destabilized, the order of events is reversed: the first hairpin is formed and the second hairpin disrupted in the folding transition state (Nauli et al, 2002).
Particularly exciting recently is the achievement of a grand challenge of computational protein design--the creation of novel proteins with arbitrarily chosen three dimensional structures. We developed a general computational strategy for creating such novel protein structures that incorporates full backbone flexibility into rotamer based sequence optimization. This was accomplished by integrating ab initio protein structure prediction, atomic level energy refinement, and sequence design in Rosetta. The procedure was used to design a 93 residue protein called Top7 with a novel sequence and topology. Top7 was found experimentally to be monomeric and folded, and the x-ray crystal structure of Top7 is strikingly similar (RMSD = 1.2 Å) to the design model (Kuhlman et al., Science 2003). The successful design of a new globular protein fold and the very close correspondence of the crystal structure to the design model have broad implications for protein design and protein structure prediction, and open the door to the exploration of the large regions of the protein universe not yet observed in nature.
We have chosen as a model system for redesigning protein-protein interaction specificity the high affinity complex between colicin E7 DNase and its cognate inhibitory immunity protein. We have computationally redesigned specificity at this protein-protein interface, experimentally demonstrated alterations in specificity in both in vitro and in vivo assays, and carried out crystallographic analysis of one of the redesigned interfaces. Novel DNase-inhibitor protein pairs were generated using the physical model described above and a modification of our rotamer-search-based computational design strategy incorporating elements of both positive and negative design. The designed protein complexes exhibit sub-nanomolar affinities, are functional and specific in vivo, and have more than an order of magnitude affinity difference between cognate and non-cognate pairs in vitro. The crystal structure of a designed complex confirmed the computational model and highlights both the strengths and the limitations of the current methodology. The approach should be applicable to the design of interacting protein pairs with novel specificities for delineating and reengineering protein interaction networks in living cells.
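To put the "order of magnitude" affinity difference in energetic terms, the short sketch below converts a ratio of dissociation constants into a binding free energy difference using the standard relation ΔΔG = RT ln(ratio). The function name and the default temperature of 298.15 K are illustrative assumptions; they are not taken from the work described above.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ddg_from_kd_ratio(kd_ratio, temp_k=298.15):
    """Free energy difference (kcal/mol) for a given ratio of dissociation constants.

    kd_ratio is Kd(non-cognate) / Kd(cognate); a value above 1 means the
    cognate pair binds more tightly. The temperature default is an assumption.
    """
    if kd_ratio <= 0:
        raise ValueError("kd_ratio must be positive")
    return R_KCAL * temp_k * math.log(kd_ratio)


if __name__ == "__main__":
    # A tenfold affinity difference is roughly 1.4 kcal/mol near room temperature.
    print(round(ddg_from_kd_ratio(10.0), 2))
```

On this scale, a tenfold difference between cognate and non-cognate pairs corresponds to a gap of only about 1.4 kcal/mol, which suggests how accurate the energy function must be to capture specificity.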
In collaboration with the research groups of Dr. Barry Stoddard and Dr. Ray Monnat, we generated an artificial, highly specific endonuclease by fusing domains of the homing endonucleases I-DmoI and I-CreI through computational optimization of a new domain-domain interface between these normally noninteracting proteins. The resulting enzyme, E-DreI (Engineered I-DmoI/I-CreI), binds a long chimeric DNA target site with nanomolar affinity, cleaving it precisely at a rate equivalent to that of its natural parents. The structure of an E-DreI/DNA complex demonstrated the accuracy of the protein interface redesign algorithm and revealed how catalytic function is maintained during the creation of the new endonuclease.
The picture of protein folding that motivates our approach to ab initio protein tertiary structure prediction is that sequence-dependent local interactions bias segments of the chain to sample distinct sets of local structures, and that nonlocal interactions select the lowest free-energy tertiary structures from the many conformations compatible with these local biases. In implementing the strategy suggested by this picture, we use different models to treat the local and nonlocal interactions. Rather than attempting a physical model for local sequence-structure relationships, we turn to the protein database and take the distribution of local structures adopted by short sequence segments (fewer than 10 residues in length) in known three-dimensional structures as an approximation to the distribution of structures sampled by isolated peptides with the corresponding sequences. The primary nonlocal interactions considered are hydrophobic burial, electrostatics, main-chain hydrogen bonding and excluded volume. Structures that are simultaneously consistent with both the local sequence structure biases and the nonlocal interactions are generated by minimizing the nonlocal interaction energy in the space defined by the local structure distributions using simulated annealing.
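The search strategy described above can be illustrated with a toy sketch: a chain is represented by one torsion angle per residue, trial moves replace a short window with a fragment drawn from a library, and moves are accepted by the Metropolis criterion while the temperature is lowered. This is not the Rosetta implementation. The energy function, the single shared fragment library, the cooling schedule, and all names are assumptions made for illustration.

```python
import math
import random


def toy_energy(torsions):
    """Hypothetical stand-in for the nonlocal energy (not a physical model).

    It simply rewards torsion angles near -60 degrees.
    """
    return sum((t + 60.0) ** 2 for t in torsions) / len(torsions)


def fragment_anneal(n_res, fragments, energy, steps=2000,
                    t_start=100.0, t_end=1.0, seed=0):
    """Minimize `energy` by fragment insertion with simulated annealing.

    `fragments` is a list of equal-length torsion lists. A real fragment
    library would be specific to each sequence window; one shared library
    is used here only to keep the sketch short.
    """
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    torsions = [180.0] * n_res          # start from an extended chain
    current = energy(torsions)
    best, best_e = list(torsions), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Trial move: overwrite a random window with a random fragment.
        pos = rng.randrange(n_res - frag_len + 1)
        trial = list(torsions)
        trial[pos:pos + frag_len] = rng.choice(fragments)
        e = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e <= current or rng.random() < math.exp(-(e - current) / temp):
            torsions, current = trial, e
            if current < best_e:
                best, best_e = list(torsions), current
    return best, best_e


if __name__ == "__main__":
    library = [[-60.0] * 3, [-120.0] * 3, [180.0] * 3]
    model, score = fragment_anneal(12, library, toy_energy)
    print(score)
```

Restricting moves to fragment insertions keeps every sampled conformation inside the space defined by the local structure distributions, so the energy function only has to discriminate among locally plausible conformations.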
In the CASP4 experiment, Rosetta was tested on 21 proteins whose structures had been determined but not yet published. The predictions for these proteins, which lack detectable sequence similarity to any protein with a previously determined structure, were of unprecedented accuracy and consistency (Bonneau et al, 2002). Excellent predictions were also made more recently in the CASP5 experiment (Bradley et al., 2003). Encouraged by these promising results, we have generated models for all large protein families fewer than 150 amino acids in length (Bonneau et al, 2002). We are developing methods to (1) improve the accuracy of the models (which is still poor compared to a structure obtained using experimental data) and (2) extract functional insights from the models. These methods are being applied to parasite proteins as part of the SGPP structural genomics project at the UW and to essential yeast proteins of unknown function as part of the UW yeast resource center.
We have extended the Rosetta ab initio structure prediction strategy to the problem of generating models of proteins using limited experimental data. By incorporating chemical shift and NOE information (Bowers et al, 2000) and more recently dipolar coupling information (Rohl et al, 2002) into the Rosetta structure generation procedure, it has been possible to generate much more accurate models than with ab initio structure prediction alone or using the same limited data sets with conventional NMR structure generation methodology. An exciting recent development is that the Rosetta procedure can also take advantage of unassigned NMR data and hence circumvent the difficult and tedious step of assigning NMR spectra (Meiler et al., 2003).
We have also developed a method for comparative modeling that was one of the top performing methods in the CASP4 experiment. The method uses a new protein sequence-structure alignment method; structurally variable regions, such as long loops not present in the structure of a homologue, are built using a modification of the Rosetta ab initio structure prediction methodology (Rohl et al, 2004). Both the ab initio and the comparative modeling methods have been implemented in a server called ROBETTA, which was one of the best all-around fully automated structure prediction servers in the CASP5 test (Chivian et al, 2003).
We have developed a new method to predict protein-protein complexes from the coordinates of the unbound monomer components (Gray et al, 2003). The method employs a low-resolution, rigid-body, Monte Carlo search followed by simultaneous optimization of backbone displacement and side-chain conformations with the Monte Carlo minimization procedure and physical model used in our high resolution structure prediction work. The results are promising and suggest that the method may soon be useful for generating models of biologically important complexes from the structures of the isolated components.
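The first stage of such a docking protocol, the low-resolution rigid-body Monte Carlo search, can be sketched in two dimensions as follows. This is a toy illustration only: the point-based representation, the contact and clash scoring, the step sizes, and all names are assumptions, and the subsequent high-resolution refinement of backbone displacement and side chains is not modeled.

```python
import math
import random


def transform(points, angle, dx, dy):
    """Rotate 2D points by `angle` (radians) about the origin, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def toy_score(a, b, contact=1.5, clash=0.8):
    """Hypothetical low-resolution score: reward contacts, penalize clashes."""
    score = 0.0
    for ax, ay in a:
        for bx, by in b:
            d = math.hypot(ax - bx, ay - by)
            if d < clash:
                score += 10.0   # overlapping points are strongly penalized
            elif d < contact:
                score -= 1.0    # points within contact distance are rewarded
    return score


def rigid_body_mc(a, b, steps=5000, temp=1.0,
                  max_shift=0.5, max_turn=0.2, seed=0):
    """Metropolis search over the rigid-body pose (angle, dx, dy) of partner b."""
    rng = random.Random(seed)
    pose = (0.0, 0.0, 0.0)
    current = toy_score(a, transform(b, *pose))
    best_pose, best = pose, current
    for _ in range(steps):
        # Trial move: a small random rotation plus a small random translation.
        trial = (pose[0] + rng.uniform(-max_turn, max_turn),
                 pose[1] + rng.uniform(-max_shift, max_shift),
                 pose[2] + rng.uniform(-max_shift, max_shift))
        e = toy_score(a, transform(b, *trial))
        if e <= current or rng.random() < math.exp(-(e - current) / temp):
            pose, current = trial, e
            if current < best:
                best_pose, best = pose, current
    return best_pose, best


if __name__ == "__main__":
    partner_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    partner_b = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]
    print(rigid_body_mc(partner_a, partner_b))
```

Because both partners stay rigid in this stage, each trial pose needs only a cheap rescoring, which is what makes a broad search over orientations affordable before any more expensive refinement.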
Our current approach to improving energy functions involves a combination of quantum chemistry calculations on simple model compounds, traditional molecular mechanics approaches, and protein structural analysis. We have used such an approach to develop an improved hydrogen bonding potential (Kortemme et al., 2002; Morozov et al., 2004). A particularly notable result is that the orientation dependence of the hydrogen bond in quantum chemistry calculations on formamide dimers is remarkably similar to that seen in sidechain-sidechain hydrogen bonds in protein structures, but quite different from that in current molecular mechanics force fields, which neglect the covalent character of the hydrogen bond. Feedback from the prediction and design calculations has provided a continual impetus and guidance for improving the energy function; for example, inadequacies in our treatment of protein-protein interactions have led to the recent development of a rotamer-based model for water-mediated hydrogen bonds.
It is exciting that our prediction and design methods have now reached the point where they can be applied to important biological problems. In the next several years we aim to improve and extend the methods still further and to apply them to problems of particular biological interest. Current areas of focus in our group are improving the accuracy of high resolution structure prediction (which will be required if the models are to be generally useful) by improving the underlying physical model, predicting and redesigning protein-DNA interaction specificity, and designing new protein-small molecule interactions and catalysts.