Towards testing all 35,397 possible missense variants of BRCA1 for function
BRCA1 is a breast and ovarian cancer-specific tumor suppressor gene and has been subject to much diagnostic sequencing. Multiple cancer-predisposing mutations have been identified, along with >500 missense variants classified as Variants of Uncertain Significance (VUS). BRCA1 is an 1863 amino acid protein with two recognizable domains. The N-terminus contains a RING domain and is part of an active ubiquitin ligase, and the C-terminus has tandem BRCT (BRCA1 C-Terminus) repeats that bind to phosphorylated peptides and activate transcription. BRCA1 is required for double-strand DNA break repair via homologous recombination, and mutations throughout the protein have deleterious effects on this function. We have devised several assays to score all of the 35,397 possible missense variants of BRCA1 for effects on the protein’s biochemical and cellular functions using deep mutational scanning.
We scored 2413 of the possible 5757 missense variants (40%) of the N-terminal 304 amino acids of BRCA1 for ubiquitin ligase function using a phage display system that selects for active variants in an in vitro autoubiquitination reaction. Among these variants, 57 have been identified in patients as VUS. The ubiquitin ligase function of these VUS ranged from nearly completely inactive to fully functional, suggesting that some of the BRCA1 variants classified as VUS are nonfunctional ubiquitin ligases.
To assess the effect of mutation on ubiquitin ligase function, a library of coding variants of the RING domains of BRCA1-BARD1 is fused to the T7 bacteriophage coat protein. The E3-phage are subjected to in vitro ubiquitination reactions followed by selection for phage coding for active E3 ligase (as outlined in the flow diagram below). Phage harboring active E3 ligases increase in abundance throughout selection, while phage harboring E3 ligases with deleterious mutations decrease in abundance. These changes are measured by sequencing the input and selected populations. Enrichment ratios (E) are calculated by dividing the frequency at which each variant occurs in the selected population by its frequency in the input population.
We then compared the Enrichment ratio (E) scores for each variant from the deep mutational scan of the RING domain of BRCA1 to the BRCA1 informational database classification. 2356 variants were never found in the human population and, as expected, the E scores for these variants ranged from completely inactive to highly active. 57 of the variants were classified as VUS and many of these are nonfunctional ubiquitin ligases in our assay.
Finally, we are using a cell-based assay to score the effect of missense mutations in full-length BRCA1 on the ability of these variant BRCA1 proteins to rescue homologous recombination when the endogenous protein is repressed. To this end we are optimizing the molecular manipulations to build the libraries of variants with single amino acid changes into lentiviral vectors to transduce into a homologous recombination-reporter cell line.
•Lea Starita & Justin Gullingsrud
Randomer and ORFome Approach to Define the Specificity of E3 ligases to Substrates of the Ubiquitin Proteasome System
The ubiquitin proteasome system (UPS) is a complex pathway in which hundreds of regulatory proteins are involved in recognizing protein substrates, tagging them with ubiquitin, and degrading them by the proteasome in order to remove unwanted proteins from the cell. A deeper understanding of the regulatory mechanism of this system is key to developing treatments for UPS-related diseases, such as cancer and neurodegenerative disorders. A fundamental question in this field is which substrates are processed by which regulators. For example, SCF complexes define the largest family of E3 ubiquitin ligases, and show substrate specificity by exchanging the F-box subunit. Among more than 20 F-box proteins in yeast, only a few substrates have been assigned to specific F-box proteins. To delineate substrate specificity of these F-box proteins, we are applying the Stable-seq method that we have developed. In this study, we fuse either a synthetic degradation signal or an open reading frame (ORF) to a nutritional marker. The degradation signal or ORF determines the stability of the fusion, such that selection for the nutritional marker leads to differential growth rates. The cells grow slower in a wild-type strain in which the degradation signal or ORF is unstable, but they grow faster in a strain lacking an E3 enzyme that is crucial for the degradation.
To this end, we fused random sequences (a stretch of 20 NNK codons) with the LEU2 gene and generated more than a million unique fusions to be assayed by Stable-seq. At the same time, nearly every yeast ORF was transferred from a movable-ORF (MORF) library to a destination vector to be fused with the LEU2 gene by Gateway cloning. Pilot screening of the synthetic degron library shows differential growth rates of the fusions in the wild-type strain, and much better growth in one of the F-box deletion strains.
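For readers unfamiliar with NNK randomization, the short Python sketch below enumerates the NNK codon set (N is any base, K is G or T) against the standard genetic code and estimates how often a stretch of 20 NNK codons is free of stop codons. It is illustrative only: the variable names are ours, and the estimate assumes every base is incorporated with equal probability, which real oligonucleotide synthesis only approximates.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

# Probability that 20 consecutive NNK codons contain no stop codon,
# assuming all NNK codons are equally likely (an idealization).
p_no_stop = (1 - len(stop_codons) / len(nnk_codons)) ** 20

print("NNK codons:", len(nnk_codons))
print("amino acids encoded:", len(encoded - {"*"}))
print("stop codons:", stop_codons)
print("P(no stop in 20 codons):", round(p_no_stop, 2))
```

The NNK scheme covers all 20 amino acids with 32 codons while allowing only one stop codon (TAG), so under the uniform assumption roughly half of the 20-codon random sequences are expected to read through to LEU2.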
These fusion libraries will be transformed into the 18 non-essential F-box deletion strains, and the pool of plasmids from each strain after selection will be identified and analyzed by high-throughput sequencing. The results will give us insights into how the F-box proteins contribute to the UPS regulatory network.
•Griffin Kim & Christina Miller
Uncovering the Structural Basis of GPCR Functional Selectivity through Deep Mutational Scanning
G Protein Coupled Receptors (GPCRs) are a diverse family of plasma membrane bound proteins that all share 7 transmembrane helices, 3 intracellular loops, and 3 extracellular loops. There are close to 800 human GPCR genes which are responsible for a large proportion of the cellular communication in our species. Approximately 369 of these are non-sensory, making them current or potential drug targets. An estimated 40-60% of current therapeutic drugs target at least one GPCR, so advances in our understanding of signal transduction through GPCRs have potentially widespread clinical ramifications.
It has recently become clear that for any given G Protein Coupled Receptor (GPCR), multiple signaling pathways might be activated and multiple mechanisms might lead to receptor internalization at different rates depending on the specific ligand being used. This phenomenon is called “functional selectivity” or “biased agonism,” and its molecular and structural basis is only just starting to be elucidated. Though recent advances in crystallographic techniques have led to an increasing number of structures for both active and inactive GPCRs, much remains unclear about the mechanisms of functional selectivity.
We are currently developing a set of high throughput assays for interrogating the effects of all single mutations in a GPCR on receptor expression, internalization, and signaling in a system that has already been shown to display functional selectivity: the Mu Opioid Receptor (MOR). This receptor is clinically important, as it’s the major target of opioid analgesics. Functionally distinct opioid agonists result in different amounts of tolerance development. Also, it is thought that several of the negative side effects of opioids, such as constipation and respiratory depression, might be mediated by a different pathway than their analgesic effects, which would make functional selectivity in this receptor particularly interesting clinically.
We have created a normalizable mammalian expression system for the MOR and cloned several mutants with known defects in cell surface expression into a lentiviral vector. We have demonstrated the feasibility of separating mutants based on their surface expression using a fluorescent antibody and flow cytometry. By binning cells by the amount of fluorescence, we can separate poorly expressed mutants from highly expressed mutants, and we can determine the contents of each bin by sequencing. We are currently generating a library of all single mutants of the MOR. Additional assays for receptor internalization and inhibition of calcium release will be developed after the library is created, and functional selectivity will be examined by comparing results between assays using different agonists. Data generated in this project will complement structural data based on NMR and X-ray crystallography by providing a functional map to overlay on the spatial one.
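One simple way to turn sorted-bin sequencing counts into a per-variant expression score is a frequency-weighted mean of the bin index. The Python sketch below shows that idea under stated assumptions: the scoring scheme, the function name, and the counts are hypothetical and are not necessarily what will be used for the MOR library.

```python
def weighted_bin_score(bin_counts, bin_totals):
    """Frequency-weighted mean bin index (1 = dimmest bin) for one variant.

    bin_counts: reads for this variant in each fluorescence bin.
    bin_totals: total reads sequenced from each bin, used to normalize
                for unequal sequencing depth between bins.
    Returns None if the variant was not observed in any bin.
    """
    freqs = [count / total for count, total in zip(bin_counts, bin_totals)]
    total_freq = sum(freqs)
    if total_freq == 0:
        return None
    weighted = sum(index * f for index, f in enumerate(freqs, start=1))
    return weighted / total_freq


# Hypothetical counts across four bins, dimmest to brightest.
bin_totals = [1000, 1000, 1000, 1000]
poorly_expressed = weighted_bin_score([80, 15, 5, 0], bin_totals)
highly_expressed = weighted_bin_score([0, 5, 15, 80], bin_totals)
print(poorly_expressed, highly_expressed)
```

A variant found mostly in the dim bins scores near 1, and a variant found mostly in the bright bins scores near the number of bins.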
In Vivo Deep Mutational Scanning of an RNA-Recognition Motif (RRM)
Throughout its life, an RNA molecule associates with diverse RNA-binding proteins that regulate its processing and function. A single RNA-binding protein typically recognizes a particular subset of RNA molecules and affects their collective fate by regulating one or more steps in RNA metabolism, from pre-mRNA splicing to mRNA localization, translation and decay. Since these functions underlie multiple fundamental cellular processes, genetic changes that disrupt RNA-binding protein function can lead to multifaceted human pathologies. We are using deep mutational scanning, an experimental strategy that couples high throughput DNA sequencing with assays of protein function, to study the effects of sequence variations on the function of a common RNA-binding domain called the RNA Recognition Motif (RRM). Specifically, we made use of the necessity of a functional poly(A)-binding protein (Pab1) for yeast growth and survival to test the in vivo effects of numerous mutations in the Pab1 RRM2 domain (Figure 1). In this system, the endogenous yeast PAB1 gene has been deleted and replaced with a plasmid expressing the wild-type Pab1 from a tetracycline-regulated promoter. A second plasmid in these cells expresses one of many variants carrying random mutations in the Pab1 RRM2. Adding a tetracycline analog to the culture shuts off the expression of the wild-type gene, making the cells completely reliant on the mutant Pab1 performance for growth. High throughput sequencing of the library variants before and after addition of the tetracycline analog allows us to measure the change in frequency of each variant, which in turn can be used as a proxy for the function of the mutant Pab1 RRM domain.
One of the major outputs of this experiment is a single amino acid substitution matrix representing all 19 possible single amino acid substitutions at each residue in the RRM2 domain of Pab1 (Figure 2). This matrix points to the β-strands as the most important for the in vivo function of RRM2, which agrees with their essential role in poly(A) binding.
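A substitution matrix of this kind is a table with one row per residue and one column per amino acid. The Python sketch below builds such a table from per-variant scores and ranks positions by their mean score. The three-residue sequence, the scores, and the function names are hypothetical placeholders, not Pab1 data.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_matrix(wild_type, scores):
    """Build a position-by-amino-acid table of substitution scores.

    wild_type: the wild-type amino acid sequence of the scanned region.
    scores: maps (0-based position, substituted amino acid) to a score.
    Cells for the wild-type residue or unmeasured variants hold None.
    """
    matrix = []
    for position, wt_residue in enumerate(wild_type):
        row = []
        for aa in AMINO_ACIDS:
            if aa == wt_residue:
                row.append(None)
            else:
                row.append(scores.get((position, aa)))
        matrix.append(row)
    return matrix


def mean_score(row):
    """Average the measured scores in one row, ignoring empty cells."""
    measured = [value for value in row if value is not None]
    return sum(measured) / len(measured) if measured else None


# Hypothetical three-residue region and scores, for illustration only.
wild_type = "MKV"
scores = {(0, "A"): 0.1, (0, "L"): 0.3, (1, "R"): 1.0, (2, "I"): 0.9}
matrix = substitution_matrix(wild_type, scores)
position_means = [mean_score(row) for row in matrix]
print(position_means)
```

Positions whose substitutions score low on average are the ones least tolerant of mutation, which is how a matrix like this can highlight functionally important regions.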
To gain a better understanding of Pab1 RRM2 function, we examined the ratio scores of about 200 single amino acid substitutions that occur in other Pab1 homologous sequences (Figure 3). These scores, which can also be viewed as the output of a large-scale inter-species complementation assay, revealed that while most of the natural changes were neutral in their effects, a few substitutions were deleterious. Mapping these mutations on the RRM2 structure revealed that most of them affect residues at the protein surface. We suspected that this approach had identified protein interaction sites that diverged during evolution. Indeed, we found that about half of these mutations interfere with the interaction between Pab1 and the translation initiation factor eIF4G.
Overall, we suggest that extracting functional scores of naturally occurring substitutions from deep mutational scanning experiments can facilitate the identification of surface residues that were likely to co-evolve with their binding partner.
High-throughput Analysis of a Protein Degradation Signal
Determining the half-life of proteins is critical for an understanding of virtually all cellular processes. Current methods for measuring in vivo protein stability, including large-scale approaches, are limited in their throughput or in their ability to discriminate among small differences in stability. We developed a new method, Stable-seq, which uses a simple genetic selection combined with high-throughput DNA sequencing to assess the in vivo stability of a large number of variants of a protein. The variants are fused to a metabolic enzyme, which here is the yeast Leu2 protein. Plasmids encoding these Leu2 fusion proteins are transformed into yeast, with the resultant fusion proteins accumulating to different levels based on their stability and leading to different doubling times when the yeast are grown in the absence of leucine. Sequencing of an input population of variants of a protein and the population of variants after leucine selection allows the stability of tens of thousands of variants to be scored in parallel. By applying the Stable-seq method to variants of the protein degradation signal Deg1 from the yeast Matα2 protein, we generated a high-resolution map that reveals the effect of ~30,000 mutations on protein stability. The scores determined by Stable-seq of variants carrying single mutations are visualized in this heat map, with cell growth rates that would correspond to scores shown in the inset boxes.
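To illustrate why differences in doubling time translate into measurable changes in variant frequency, the Python sketch below models two variants growing exponentially in the same culture. It is a toy model under simplifying assumptions (constant doubling times, no lag phase, no saturation), and the doubling times and variant labels are hypothetical.

```python
def frequencies_after_growth(initial_freqs, doubling_times_h, hours):
    """Return variant frequencies after exponential growth for `hours`.

    initial_freqs: maps variant label to its starting frequency.
    doubling_times_h: maps variant label to its doubling time in hours.
    Each variant grows by a factor of 2 ** (hours / doubling time), and
    the results are renormalized to sum to 1.
    """
    grown = {
        variant: freq * 2 ** (hours / doubling_times_h[variant])
        for variant, freq in initial_freqs.items()
    }
    total = sum(grown.values())
    return {variant: amount / total for variant, amount in grown.items()}


# Hypothetical example: a stable fusion supports faster growth without
# leucine than an unstable fusion does.
initial = {"stable_fusion": 0.5, "unstable_fusion": 0.5}
doubling = {"stable_fusion": 2.0, "unstable_fusion": 4.0}
final = frequencies_after_growth(initial, doubling, hours=8)
print(final)
```

In this example the two variants start at equal frequency, and after 8 hours the faster-growing one makes up 80% of the population, a shift that sequencing of the input and selected populations can readily detect.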
We identified mutations that likely affect stability by changing the activity of the degron, by leading to translation from new start codons, or by affecting N-terminal processing. Stable-seq should be applicable to other organisms via the use of suitable reporter proteins, as well as to the analysis of complex mixtures of fusion proteins.
•Griffin Kim, Christina Miller & David Young
Kim I, Miller CR, Young DL, Fields S. High-throughput Analysis of in vivo Protein Stability. Mol Cell Proteomics. 2013 Jul 29. [Epub ahead of print]
•Doug Fowler & Carlos Araya
Activity-enhancing mutations in an E3 Ubiquitin ligase discovered by deep mutational scanning
Although ubiquitination plays a critical role in virtually all cellular processes, understanding of the mechanistic details of ubiquitin transfer is still rudimentary. To identify the molecular determinants within E3 ligases that modulate activity, we developed a high-throughput assay (Figure 1) to measure the activity of nearly 100,000 protein variants of the U-box domain of murine Ube4b and found rare mutations that enhanced activity both in vitro and in cellular p53 degradation assays. Our results highlight the utility of high-throughput mutagenesis in delineating the molecular basis of enzyme activity.
•Lea Starita & Russell Lo
Starita LM, Pruneda JN, Lo RS, Fowler DM, Kim HJ, Hiatt JB, Shendure J, Brzovic PS, Fields S, Klevit RE. Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proc Natl Acad Sci U S A. 2013 Mar 18.
Enrich: Software for Analysis of Protein Function by Enrichment and Depletion of Variants
We developed Enrich, a tool for analyzing deep mutational scanning data. Enrich identifies all unique variants (mutants) of a protein in high-throughput sequencing data sets and can correct for sequencing errors using overlapping paired-end reads. Enrich uses the frequency of each variant before and after selection to calculate an enrichment ratio, which is used to estimate fitness. Enrich provides an interactive interface to guide users. It generates user-accessible output for downstream analyses as well as several visualizations of the effects of mutation on function, thereby allowing the user to rapidly quantify and comprehend sequence-function relationships. Enrich is implemented in Python and is available for download under a FreeBSD license. Enrich includes detailed documentation as well as a small example data set.
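The general idea behind error correction with overlapping paired-end reads is that each base is read twice, once from each strand, so disagreements flag likely sequencing errors. The Python sketch below shows one simple way to resolve them; it is not Enrich's implementation. It assumes the two reads overlap completely and have equal length, and the function names, the tie-breaking rule, and the example reads are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_overlapping_pair(fwd, fwd_qual, rev, rev_qual):
    """Merge two fully overlapping paired-end reads into one sequence.

    fwd, rev: read sequences as reported by the sequencer.
    fwd_qual, rev_qual: per-base Phred quality scores as lists of ints.
    Where the reads disagree, the higher-quality base is kept; if the
    qualities are tied, the position is reported as N.
    """
    rev_rc = reverse_complement(rev)
    rev_rc_qual = rev_qual[::-1]
    if len(fwd) != len(rev_rc):
        raise ValueError("this sketch assumes reads of equal length")
    merged = []
    for b1, q1, b2, q2 in zip(fwd, fwd_qual, rev_rc, rev_rc_qual):
        if b1 == b2:
            merged.append(b1)
        elif q1 > q2:
            merged.append(b1)
        elif q2 > q1:
            merged.append(b2)
        else:
            merged.append("N")
    return "".join(merged)


# Hypothetical pair: the forward read has a low-quality error at index 3.
forward = "ACGATA"
forward_qual = [30, 30, 30, 5, 30, 30]
reverse = "TAACGT"  # reverse complement of the true sequence ACGTTA
reverse_qual = [30, 30, 30, 30, 30, 30]
consensus = merge_overlapping_pair(forward, forward_qual, reverse, reverse_qual)
print(consensus)
```

A stricter alternative is to discard any read pair that disagrees anywhere, which trades read depth for confidence in each variant call.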
Fowler DM, Araya CL, Gerard W, Fields S. Enrich: Software for Analysis of Protein Function by Enrichment and Depletion of Variants. Bioinformatics. 2011 Oct 17. [Epub ahead of print]
Deep Mutational Scanning to Analyze Protein Function
Understanding the functional and biophysical characteristics of proteins is of paramount importance. We have developed a method, deep mutational scanning (Figure 1), that makes use of protein display technology in conjunction with high-throughput sequencing. Deep mutational scanning enables the investigation of protein function on an unprecedented scale, facilitating the simultaneous measurement of the fitness of hundreds of thousands of mutants of a protein.
Protein display technologies physically link proteins and the DNA sequences that encode them. Protein display allows for selection among a large library of protein variants for those with a particular function. Protein display technology has been restricted in scope by the requirement for back-end DNA sequencing, which has limited the number of selected protein variants that can be identified to a few hundred. Deep mutational scanning alleviates this bottleneck by using high-throughput sequencing to sequence tens of millions of individual library members in parallel (Figure 1). The primary benefit of this approach is that millions of protein variants can be simultaneously identified and counted. Comparison of the frequency of a given variant in a selected library and in the input library yields an enrichment ratio that is an estimate of function. The key ingredients (protein display, low-intensity selection, and highly accurate, high-throughput sequencing) are simple and becoming widely available. Deep mutational scanning data can be used to construct protein sequence-function maps, and systematic analysis of deep mutational scanning data can reveal fundamental protein properties. We have applied deep mutational scanning to a number of proteins in a variety of functional assays.
Systematic Analysis of Large Scale Fitness Data to Identify Mutations that Stabilize Proteins
Enhancing protein stability is often critical for industrial and pharmaceutical applications. Stabilizing mutations permit acquisition of other, destabilizing mutations that improve function. This phenomenon can be observed as epistasis, where multiple mutations combine with unpredictable fitness effects. We identify stabilizing mutations in a WW domain based solely on parallel measurement of the fitness of 47,000 variants, scored by their ability to bind a peptide ligand, and subsequent calculation of >5,000 epistasis scores (Figure 2A). We introduce an epistasis-based metric, “partner potentiation,” which identified 15 candidate stabilizing mutations, including three known stabilizing mutations (Figure 2B). We tested six novel candidates by thermal denaturation and found two highly stabilizing mutations, one more stabilizing than any previously known mutation. Thus, systematic analysis of large-scale protein fitness data can reveal fundamental physicochemical properties such as stability.
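A common way to quantify epistasis is to compare the measured fitness of a double mutant with the fitness expected if the two single mutations acted independently. The Python sketch below uses a multiplicative expectation on fitness values expressed relative to wild type. That model is an assumption made here for illustration, as are the function name and the example fitness values; the sketch does not implement the partner potentiation metric.

```python
def epistasis_score(w_double, w_single_a, w_single_b):
    """Return observed minus expected double-mutant fitness.

    All fitness values are relative to wild type (wild type = 1.0).
    The expected value assumes the two mutations act independently,
    so that their fitness effects multiply.
    """
    expected = w_single_a * w_single_b
    return w_double - expected


# Hypothetical fitness values, for illustration only.
positive = epistasis_score(w_double=0.6, w_single_a=0.5, w_single_b=0.8)
negative = epistasis_score(w_double=0.2, w_single_a=0.5, w_single_b=0.8)
print(round(positive, 2), round(negative, 2))
```

A positive score means the double mutant is fitter than expected, the pattern one would anticipate when a stabilizing mutation offsets the cost of a destabilizing partner.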
•Doug Fowler & Carlos Araya
Araya CL, Fowler DM, Chen W, Muniez I, Kelly JW, Fields S. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc Natl Acad Sci U S A. 2012 Oct 16;109(42):16858-63.
Understanding the Molecular Basis of Selectivity in the Protein Kinase A/AKAP-79 interaction
(with the laboratory of John Scott, HHMI and Dept. of Pharmacology, University of Washington)
Protein Kinase A (PKA) is a central intracellular protein kinase that regulates the activity of many proteins involved in cellular metabolism. PKA activity is controlled via interactions with A Kinase Anchoring Proteins (AKAPs). AKAPs function by binding to the PKA regulatory subunit, localizing PKA within the cell. AKAPs can interact with either the alpha or the beta isoform of the regulatory subunit of PKA, or they can interact with both. The alpha and beta isoforms are highly similar, making it difficult to study the molecular determinants of selectivity between isoforms (Figure 3).
We are using phage display in combination with high-throughput sequencing to identify the sequence determinants of AKAP selectivity. We displayed a library of millions of mutagenized AKAP proteins on the surface of T7 phage and then subjected this library to selection against either the alpha or beta isoform of the regulatory subunit of PKA. By comparing the abundance of each variant before and after selection, we derived enrichment ratios for several hundred thousand variants. Most variants performed similarly in selections against both the alpha and beta isoforms. However, some variants displayed strong selectivity for either the alpha or beta isoform. We are using the results of this assay to develop highly alpha- and beta-specific AKAPs. These highly specific AKAPs will bind only to PKAs with the cognate regulatory isoform. If introduced into cells at high concentrations, they will disrupt the normal regulatory interaction for their cognate isoform, enabling us to study the biological significance of the isoforms.
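One way to summarize isoform preference from two parallel selections is the log ratio of a variant's enrichment ratios against each isoform. The Python sketch below illustrates this under stated assumptions: the selectivity measure, the twofold cutoff, the function names, and the example values are hypothetical rather than taken from our analysis.

```python
import math


def selectivity(e_alpha, e_beta):
    """Return log2(E_alpha / E_beta) for one variant.

    Positive values indicate preference for the alpha isoform and
    negative values indicate preference for the beta isoform.
    Both enrichment ratios must be greater than zero.
    """
    return math.log2(e_alpha / e_beta)


def classify(e_alpha, e_beta, cutoff=1.0):
    """Label a variant using a log2 cutoff (1.0 is a twofold difference)."""
    score = selectivity(e_alpha, e_beta)
    if score >= cutoff:
        return "alpha-selective"
    if score <= -cutoff:
        return "beta-selective"
    return "non-selective"


# Hypothetical enrichment ratios, for illustration only.
print(selectivity(4.0, 0.5), classify(4.0, 0.5))
print(selectivity(1.0, 1.0), classify(1.0, 1.0))
print(selectivity(0.5, 4.0), classify(0.5, 4.0))
```

Working in log space makes the measure symmetric, so an eightfold preference for alpha and an eightfold preference for beta give scores of equal size and opposite sign.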
•Doug Fowler & Jason Stephany