
Deep Mutational Scanning to Analyze Protein Function
Understanding the functional and biophysical characteristics of proteins is of paramount importance. We have developed a method, deep mutational scanning (Figure 1), that makes use of protein display technology in conjunction with high-throughput sequencing. Deep mutational scanning enables the investigation of protein function on an unprecedented scale, facilitating the simultaneous measurement of the fitness of hundreds of thousands of mutants of a protein.
Protein display technologies physically link proteins to the DNA sequences that encode them, allowing selection from a large library of protein variants for those with a desired function. Protein display has been restricted in scope by the requirement for back-end DNA sequencing, which has limited the number of selected protein variants that can be identified to a few hundred. Deep mutational scanning alleviates this bottleneck by using high-throughput sequencing to sequence tens of millions of individual library members in parallel (Figure 1). The primary benefit of this approach is that millions of protein variants can be simultaneously identified and counted. Comparing the frequency of a given variant in a selected library with its frequency in the input library yields an enrichment ratio that serves as an estimate of function. The key ingredients (protein display, low-intensity selection and highly accurate, high-throughput sequencing) are simple and becoming widely available. Deep mutational scanning data can be used to construct protein sequence-function maps, and systematic analysis of these data can reveal fundamental protein properties. We have applied deep mutational scanning to a number of proteins in a variety of functional assays.
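The enrichment ratio described above can be illustrated with a minimal Python sketch. It assumes that each sequencing read has already been reduced to a variant label. The function name, the pseudocount (used here only to avoid division by zero for variants absent after selection) and the log2 scale are illustrative choices, not part of the published method.

```python
from collections import Counter
import math


def enrichment_ratios(input_reads, selected_reads, pseudocount=0.5):
    """Return a log2 enrichment ratio for every variant seen in the input library.

    The pseudocount is an assumption of this sketch; it keeps the ratio finite
    for variants that disappear after selection.
    """
    input_counts = Counter(input_reads)
    selected_counts = Counter(selected_reads)
    input_total = sum(input_counts.values())
    selected_total = sum(selected_counts.values())

    ratios = {}
    for variant, n_in in input_counts.items():
        n_sel = selected_counts.get(variant, 0)
        # Frequency of the variant in each library.
        f_in = (n_in + pseudocount) / (input_total + pseudocount)
        f_sel = (n_sel + pseudocount) / (selected_total + pseudocount)
        ratios[variant] = math.log2(f_sel / f_in)
    return ratios


# Toy read labels (hypothetical, not experimental data).
input_reads = ["WT"] * 50 + ["A10G"] * 50
selected_reads = ["WT"] * 90 + ["A10G"] * 10
ratios = enrichment_ratios(input_reads, selected_reads)
```

In this toy example the variant that becomes more frequent after selection receives a positive log2 ratio, and the depleted variant receives a negative one.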

Systematic Analysis of Large Scale Fitness Data to Identify Mutations that Stabilize Proteins
Enhancing protein stability is often critical for industrial and pharmaceutical applications. Stabilizing mutations permit the acquisition of other, destabilizing mutations that improve function. This phenomenon can be observed as epistasis, where multiple mutations combine with unpredictable fitness effects. We identified stabilizing mutations in a WW domain based solely on parallel measurement of the fitness of 47,000 variants, assessed by their binding to a peptide ligand, and subsequent calculation of >5,000 epistasis scores (Figure 2A). We introduced an epistasis-based metric, “partner potentiation,” which identified 15 candidate stabilizing mutations, including three known stabilizing mutations (Figure 2B). We tested six novel candidates by thermal denaturation and found two highly stabilizing mutations, one of which is more stabilizing than any previously known mutation. Thus, systematic analysis of large-scale protein fitness data can reveal fundamental physicochemical properties such as stability.
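As a rough illustration of an epistasis score, the sketch below measures how far the fitness of a double mutant deviates from the product of the fitnesses of its two single mutants. The multiplicative null model and the function name are assumptions made for this example. The text above does not specify the formula actually used, and the sketch does not attempt to reproduce the partner potentiation metric.

```python
def epistasis_score(w_double, w_single_a, w_single_b):
    """Deviation of double-mutant fitness from the multiplicative expectation.

    All fitness values are assumed to be expressed relative to wild type
    (wild type = 1.0). Positive scores mean the double mutant is fitter than
    expected; negative scores mean it is less fit than expected.
    """
    expected = w_single_a * w_single_b
    return w_double - expected


# Hypothetical values: two deleterious single mutants whose combination
# is fitter than the multiplicative model predicts (positive epistasis).
score = epistasis_score(w_double=0.9, w_single_a=0.5, w_single_b=0.6)
```

A stabilizing mutation would be expected to show positive scores of this kind with many different destabilizing partners.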

Enrich: Software for Analysis of Protein Function by Enrichment and Depletion of Variants
We developed Enrich, a tool for analyzing deep mutational scanning data. Enrich identifies all unique variants (mutants) of a protein in high-throughput sequencing data sets and can correct for sequencing errors using overlapping paired-end reads. Enrich uses the frequency of each variant before and after selection to calculate an enrichment ratio, which is used to estimate fitness. Enrich provides an interactive interface to guide users. It generates user-accessible output for downstream analyses as well as several visualizations of the effects of mutation on function, thereby allowing the user to rapidly quantify and comprehend sequence-function relationships. Enrich is implemented in Python, is available under a FreeBSD license and can be downloaded here <link to enrich page>. Enrich includes detailed documentation as well as a small example data set.
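The idea of correcting sequencing errors with overlapping paired-end reads can be sketched as follows. This is a generic illustration and not Enrich's actual code or interface: it assumes the forward and reverse reads cover exactly the same region, that quality scores are supplied as lists of integers, and that a disagreement is resolved in favor of the higher-quality base call. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_overlapping_pair(fwd_seq, fwd_qual, rev_seq, rev_qual):
    """Merge a fully overlapping read pair into one consensus sequence.

    The reverse read is flipped into the forward orientation, then each
    position keeps the base call with the higher quality score.
    """
    rev_seq_rc = reverse_complement(rev_seq)
    rev_qual_rc = rev_qual[::-1]
    if len(fwd_seq) != len(rev_seq_rc):
        raise ValueError("this sketch assumes reads of equal length")

    merged = []
    for f_base, f_q, r_base, r_q in zip(fwd_seq, fwd_qual, rev_seq_rc, rev_qual_rc):
        merged.append(f_base if f_q >= r_q else r_base)
    return "".join(merged)


# Toy example: the forward read has a low-quality miscall at its third base.
consensus = merge_overlapping_pair(
    "ACGT", [30, 30, 10, 30],
    "AAGT", [30, 35, 30, 30],
)
```

Here the low-quality third base of the forward read is replaced by the higher-quality call from the reverse read.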
Understanding the Molecular Basis of Selectivity in the Protein Kinase A/AKAP-79 Interaction (with the laboratory of John Scott, HHMI and Dept. of Pharmacology, University of Washington)
Protein Kinase A (PKA) is a central intracellular protein kinase that regulates the activity of many proteins involved in cellular metabolism. PKA activity is controlled via interactions with A Kinase Anchoring Proteins (AKAPs). AKAPs function by binding to the PKA regulatory subunit, localizing PKA within the cell. AKAPs can interact with either the alpha or the beta isoform of the regulatory subunit of PKA, or they can interact with both. The alpha and beta isoforms are highly similar, making it difficult to study the molecular determinants of selectivity between isoforms (Figure 3).

We are using phage display in combination with high-throughput sequencing to identify the sequence determinants of AKAP selectivity. We displayed a library of millions of mutagenized AKAP proteins on the surface of T7 phage and then subjected this library to selection against either the alpha or beta isoform of the regulatory subunit of PKA. By comparing the abundance of each variant before and after selection, we derived enrichment ratios for several hundred thousand variants. Most variants performed similarly in selections against both the alpha and beta isoforms. However, some variants displayed strong selectivity for either the alpha or beta isoform. We are using the results of this assay to develop highly alpha- and beta-specific AKAPs. These highly specific AKAPs will bind only to PKAs with the cognate regulatory isoform. If introduced into cells at high concentrations, they will disrupt the normal regulatory interaction for their cognate isoform, enabling us to study the biological significance of the isoforms.
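One way to express isoform selectivity is to compare the enrichment of each variant in the two selections. The sketch below assumes log2 enrichment ratios have already been computed for the alpha and beta selections. The difference-based score, the threshold of 2.0 and the variant names are illustrative assumptions, not values taken from this work.

```python
def classify_selectivity(log2_alpha, log2_beta, threshold=2.0):
    """Label each variant scored in both selections by its isoform preference.

    The score is the difference between the log2 enrichment ratios from the
    alpha and beta selections. The threshold is a hypothetical cutoff.
    """
    labels = {}
    for variant in sorted(log2_alpha.keys() & log2_beta.keys()):
        difference = log2_alpha[variant] - log2_beta[variant]
        if difference >= threshold:
            label = "alpha-selective"
        elif difference <= -threshold:
            label = "beta-selective"
        else:
            label = "non-selective"
        labels[variant] = (difference, label)
    return labels


# Made-up log2 enrichment ratios for three hypothetical variants.
alpha_scores = {"v1": 3.0, "v2": 0.5, "v3": -1.0}
beta_scores = {"v1": 0.5, "v2": 0.4, "v3": 2.0}
selectivity = classify_selectivity(alpha_scores, beta_scores)
```

Variants that behave similarly in both selections fall into the non-selective class, which matches the observation that most variants performed similarly against both isoforms.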
•Doug Fowler, Carlos Araya & Jason Stephany
Published Results:
Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, Fields S. High-resolution mapping of protein sequence-function relationships. Nat Methods. 2010 Sep;7(9):741-6.
Fowler DM, Araya CL, Gerard W, Fields S. Enrich: Software for Analysis of Protein Function by Enrichment and Depletion of Variants. Bioinformatics. 2011 Oct 17. [Epub ahead of print]
In Vivo Deep Mutational Scanning of an RNA-Recognition Motif (RRM)
Throughout its life, an RNA molecule associates with diverse RNA-binding proteins that regulate its processing and function. A single RNA-binding protein typically recognizes a particular subset of RNA molecules and affects their collective fate by regulating one or more steps in RNA metabolism, from pre-mRNA splicing to mRNA localization, translation and decay. Since these functions underlie multiple fundamental cellular processes, genetic changes that disrupt RNA-binding protein function can lead to multifaceted human pathologies.
RNA-binding proteins use a relatively small repertoire of RNA-binding domains to fulfill their function. A high degree of specificity is achieved by the spatial organization of the RNA-binding domains within a single RNA-binding protein and by the small sequence variations between structurally related domains.
We are using deep mutational scanning, an experimental strategy developed in our lab, to study the effects of sequence variations on the function of a very common RNA-binding domain called the RNA Recognition Motif (RRM) (Figure 1).

Our goal is to combine the power of high-throughput sequencing with functional assays to determine the effects of practically all of the possible single amino acid substitutions, and many of the double and triple mutations, on the function of an RNA-binding domain. These methods follow a similar scheme. In the first step, a library of many mutant variants of an RNA-binding protein is prepared in an expression vector. In the second step, the library is transformed into the appropriate expression system, and the recipient is exposed to a functional selection. In the third step, the input and the selected library inserts are isolated, fragmented and sequenced by Illumina technology. In the fourth step, the number of sequencing reads is counted for each variant in the pool, allowing the effect of each mutation on the function of the RNA-binding domain to be determined.
For one functional assay, we are exploiting the necessity of a functional poly(A) binding protein (Pab1) for yeast viability to test the effect of numerous mutations in the Pab1 RNA recognition motif-2 (RRM2) domain on survival. In this system, tetracycline is added to a yeast culture whose chromosomal PAB1 is deleted, that expresses the wild-type PAB1 gene from a tetracycline-regulated plasmid, and that carries variant PAB1 genes with mutations in the Pab1 RRM2. Since tetracycline shuts off the expression of the wild-type PAB1, the effect of each mutation on protein function will be mirrored by the proliferation rate of the cells in the presence of tetracycline. Mutations that interfere with RRM2 function will result in no growth or a slow growth rate, while mutations with little effect or those that improve RRM2 function will result in normal to accelerated growth.
Deep mutational scanning combines the power of high throughput DNA sequencing with assays of protein function to determine the effects of most of the possible single amino acid substitutions and many of the double and triple mutations on the function of a single protein. We made use of the necessity of a functional poly(A) binding protein (Pab1) for yeast growth and survival to test the in vivo effects of numerous mutations in the Pab1 RRM2 domain. In our system, the endogenous PAB1 gene has been deleted and replaced with a plasmid expressing the wild-type PAB1 from a tetracycline-regulated promoter. A second plasmid within these cells expresses one of many variants carrying a random mutation in the PAB1 RRM2. Adding a tetracycline analog to the culture shuts off the expression of the wild-type gene, making the cells completely reliant on the mutant PAB1 performance for growth. High throughput sequencing of the library variants before and after addition of the tetracycline analog allows us to measure the change in frequency of each variant, which in turn can be used as a proxy for the function of the mutant PAB1 RRM domain.
To date, we have obtained information on the fitness effects of nearly 106 RRM2 mutation variants. These data have allowed us to identify functionally important residues that were previously unknown and to define functionality-based consensus sequences for the two RNA-binding motifs within RRM2 (Figure 2). We have also been able to create a structure-function map of RRM2 showing that the most important feature within this domain is the beta-sheet structure that is involved in poly(A) binding (Figure 3). Finally, we are using this dataset to identify mutations that cause an unexpected fitness effect when combined with other mutations in a single variant (epistasis). Further classification of these mutations (e.g. rescuing, destructive, synergistic) will increase our understanding of RRM2 function (Figure 4).


•Daniel Melamed & David Young
High throughput proteome-wide search for ubiquitinated proteins in yeast
Our goal is to improve upon existing biochemical approaches for finding sites of ubiquitin attachment on all yeast proteins.
Presently, the approach is to purify proteins with an 8xHis-tagged ubiquitin, followed by tryptic digest to generate peptide fragments for LC/LC-MS/MS analysis. The tryptic fragments that contain a lysine residue that had been ubiquitinated can be identified by a mass shift corresponding to two glycine residues left attached to an internal lysine. Historically, only a relatively small number of GG-peptides have been found, owing to the low abundance of GG-peptides compared with unmodified peptides.
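The mass shift used to recognize GG-peptides can be worked through in a few lines. The sketch below uses the standard monoisotopic mass of a glycine residue (57.02146 Da), which gives a shift of about 114.043 Da for the two-glycine remnant. The function name and the 10 ppm tolerance are hypothetical choices for illustration and are not the search settings used in this project.

```python
# Monoisotopic mass of one glycine residue (C2H3NO), in daltons.
GLYCINE_RESIDUE_MASS = 57.02146

# The two glycines left on the modified lysine after tryptic digestion.
GG_SHIFT = 2 * GLYCINE_RESIDUE_MASS  # about 114.043 Da


def is_gg_modified(observed_mass, unmodified_mass, tolerance_ppm=10.0):
    """Check whether an observed peptide mass matches the unmodified mass plus GG.

    tolerance_ppm is an assumed mass tolerance for this sketch.
    """
    expected = unmodified_mass + GG_SHIFT
    error_ppm = abs(observed_mass - expected) / expected * 1e6
    return error_ppm <= tolerance_ppm


# Hypothetical peptide with an unmodified mass of 1000.0 Da.
modified = is_gg_modified(1000.0 + GG_SHIFT, 1000.0)
unmodified = is_gg_modified(1000.0, 1000.0)
```

A tight tolerance of this kind is only meaningful with a high-mass-accuracy instrument.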
We seek to reduce the complexity of the resulting peptides by using the chemical NTCB (2-nitro-5-thiocyanatobenzoic acid), which cleaves at cysteine residues (Figure 1B). Since ubiquitin does not have any cysteines, the proteome can be cleaved prior to affinity purification of ubiquitinated proteins (Figure 1C) and trypsinization (Figure 1D). This step will remove many non-ubiquitinated fragments, thereby reducing the complexity of the sample. We hope to further decrease the complexity of the peptides by separating forked peptides from linear peptides by strong-cation exchange chromatography (Figure 1E) prior to injection into the mass spectrometer.
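The effect of cysteine-directed cleavage can be simulated in silico. The sketch below assumes that NTCB cleaves on the N-terminal side of each cysteine and that cleavage is complete. The function name and the toy sequences are hypothetical.

```python
def cleave_at_cysteine(protein):
    """Split a protein sequence on the N-terminal side of every cysteine.

    Assumes complete cleavage; a cysteine at the first position produces
    no cut because there is nothing upstream of it.
    """
    fragments = []
    start = 0
    for index, residue in enumerate(protein):
        if residue == "C" and index > start:
            fragments.append(protein[start:index])
            start = index
    fragments.append(protein[start:])
    return fragments


# Toy sequence with two cysteines: three fragments are produced.
fragments = cleave_at_cysteine("MKACDEFCGH")

# A cysteine-free sequence, like ubiquitin, stays in one piece.
intact = cleave_at_cysteine("MQIFVKTLTGK")
```

Because a cysteine-free protein is left uncut, fragments that remain attached to tagged ubiquitin can still be recovered by affinity purification after the cleavage step.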

Using the technique outlined above we found a total of 965 unique sites of ubiquitin attachment on 410 yeast proteins. We are able to gain a 1.5X enrichment of GG-peptides using the NTCB cleavage strategy. Most of the improvement in sequencing GG-peptides is achieved by using the high-mass accuracy Orbitrap-LTQ (Table 1).

•Lea Starita & Russell Lo
Published Results:
Starita, L.M., Lo, R.S., Eng, J.K., Von Haller, P.D. and Fields, S. Sites of ubiquitin attachment in Saccharomyces cerevisiae. 2011 Proteomics, in press.
Interrogation of E3 ubiquitin ligase catalysis by deep mutational scanning
Ubiquitin signaling is an important mechanism in nearly every cellular process. E3 ubiquitin ligases specify the substrate and catalyze the transfer of ubiquitin from an E2 ubiquitin-conjugating enzyme to that substrate. Though E3 enzymes are a large and well-studied class of proteins, little is known about how the E3 catalyzes this transfer. We are interrogating the functional consequences of mutation at every amino acid of an E3 ligase domain. To this end, we have chosen the U-box domain of the E3 ligase UBE4B and constructed a library of 1 million UBE4B mutants displayed on the surface of T7 phage. We used the UBE4B phage in in vitro ubiquitination reactions to test their E3 activity. By adding E1 and E2 enzymes, Flag-tagged ubiquitin, and ATP, we could get UBE4B to catalyze auto-ubiquitination. The Flag-tagged ubiquitin allowed us to select for enzymatically active UBE4B-phage by incubation with anti-Flag beads. Nonspecifically bound phage were washed away, and bound phage were eluted by competition with Flag peptide. The eluted phage were amplified and subjected to further rounds of selection.

Using high throughput sequencing, we determined the genotypes of phage in the input pool versus those selected for their ability to catalyze auto-ubiquitination, allowing us to track how each mutant performs during the selection experiments. This experiment gives us insight into the function of the U-box domain of UBE4B, revealing elements of E3 catalysis.

•Lea Starita
High-throughput Analysis of a Protein Degradation Signal
The ubiquitin proteasome system (UPS) governs most of the regulated proteolysis in eukaryotes. Substrates destined for proteasomal degradation are often modified with ubiquitin, which is attached to these substrates by a series of enzymes called E1, E2, and E3. A primary degradation signal of UPS substrates that is recognized by E3 enzymes is known as a degron.
We designed a high-resolution strategy to map the sequence-function relationships of a known degron in a systematic manner, by combining a simple genetic tool with high-throughput sequencing. Our system is based on the fact that yeast cells that express the URA3 gene grow in the absence of uracil, but die in the presence of 5-FOA because the Ura3 enzyme converts 5-FOA to a toxin. We can alter the stability of the Ura3 enzyme by fusing it to a degron that leads to rapid degradation, and thus alter the growth sensitivity of the yeast cells. We optimized this system using a well-characterized degradation signal, Deg1 from the Matα2 protein, fused to Ura3. To query mutations of every amino acid in the degron for their ability to stabilize or destabilize Ura3, we replaced the wild-type Deg1 sequence with a library, constructed from doped oligonucleotides, that was designed to have a million different mutations in the N-terminal 33 amino acid region of Deg1. Plasmids containing the degron clones were prepped from cultures after uracil selection and subjected to Illumina sequencing. By comparing the number of times each degron mutant appears in the input pool versus in the selected pool, we can gain insight into how each mutation affects the stability of Ura3. This simple but powerful technique is also being applied to other biological questions that revolve around protein stability.

•Griffin Kim & Sarah Bernards