Many of the projects described below rely on a method developed in our lab called Deep Mutational Scanning (DMS).
Published paper on Deep Mutational Scanning:
Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014 Aug;11(8):801-7.
Towards Testing all 35,397 Possible Missense Variants of BRCA1 for Function
BRCA1 is a tumor suppressor gene linked specifically to breast and ovarian cancer, and it has been the subject of extensive diagnostic sequencing. Multiple cancer-predisposing mutations have been identified, along with more than 500 missense variants classified as Variants of Uncertain Significance (VUS). BRCA1 encodes an 1863-amino-acid protein with two recognizable domains. The N-terminal RING domain is part of an active ubiquitin ligase, and the C-terminal tandem BRCT (BRCA1 C-Terminus) repeats bind phosphorylated peptides and activate transcription. BRCA1 is required for repair of DNA double-strand breaks by homologous recombination, and mutations throughout the protein can impair this function. We have devised several assays, based on deep mutational scanning, to score all 35,397 possible missense variants of BRCA1 for effects on the protein’s biochemical and cellular functions.
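The figure of 35,397 follows from the protein length: each of the 1863 residues can be replaced by any of the 19 other standard amino acids, and 1863 × 19 = 35,397. The Python sketch below enumerates single-amino-acid substitutions for a sequence. The function names and the toy peptide are illustrative assumptions, and the count deliberately excludes nonsense and synonymous changes.

```python
# Minimal sketch: enumerate all possible missense (single amino acid) variants.
# Function names and the toy peptide used below are hypothetical.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def missense_variants(protein):
    """Yield (position, wild_type, mutant) for every single substitution.

    Positions are 1-based; stop codons and synonymous changes are excluded,
    so each residue contributes exactly 19 variants.
    """
    for position, wild_type in enumerate(protein, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield position, wild_type, mutant


def count_missense(length):
    """Number of possible missense variants for a protein of this length."""
    return 19 * length


print(count_missense(1863))  # 35397 for full-length BRCA1
toy = list(missense_variants("MDLS"))
print(len(toy), toy[0])  # 76 (1, 'M', 'A')
```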
We scored 2413 of the 5757 possible missense variants (~42%) of the N-terminal 304 amino acids of BRCA1 for ubiquitin ligase function, using a phage display system that selects for active variants in an in vitro autoubiquitination reaction. Of these variants, 57 have been identified in patients and classified as VUS. Their ubiquitin ligase activity ranged from nearly completely inactive to fully functional, suggesting that some BRCA1 variants classified as VUS are nonfunctional ubiquitin ligases.
To assess the effect of mutations on ubiquitin ligase function, a library of coding variants of the BRCA1-BARD1 RING domains is fused to the T7 bacteriophage coat protein. The E3-phage are subjected to in vitro ubiquitination reactions followed by selection for phage encoding active E3 ligases (as outlined in the flow diagram below). Phage harboring active E3 ligases increase in abundance throughout selection, while phage harboring E3 ligases with deleterious mutations decrease in abundance. These changes are measured by sequencing the input and selected populations. Enrichment ratios (E) are calculated by dividing the frequency at which each variant occurs in the selected population by its frequency in the input population.
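The enrichment ratio defined above can be computed directly from read counts. This is a minimal Python sketch under simplifying assumptions: variants have already been identified from the reads, counts are supplied as dictionaries, and the optional pseudocount (zero by default, matching the definition above) is our own addition rather than a documented part of the analysis. Variant names and counts are hypothetical.

```python
def enrichment_ratios(input_counts, selected_counts, pseudocount=0.0):
    """Return {variant: E}, where E = freq(selected) / freq(input).

    input_counts and selected_counts map variant names to read counts.
    pseudocount (hypothetical, default 0) is added to every count before
    frequencies are computed. Variants with zero input frequency cannot be
    scored and are omitted.
    """
    variants = set(input_counts) | set(selected_counts)
    input_total = sum(input_counts.get(v, 0) + pseudocount for v in variants)
    selected_total = sum(selected_counts.get(v, 0) + pseudocount for v in variants)

    ratios = {}
    for v in variants:
        f_input = (input_counts.get(v, 0) + pseudocount) / input_total
        if f_input == 0:
            continue  # never observed in the input population
        f_selected = (selected_counts.get(v, 0) + pseudocount) / selected_total
        ratios[v] = f_selected / f_input
    return ratios


# Toy counts (hypothetical): the wild type is enriched, var2 is depleted.
e = enrichment_ratios(
    {"WT": 500, "var1": 250, "var2": 250},
    {"WT": 800, "var1": 150, "var2": 50},
)
print({v: round(x, 2) for v, x in sorted(e.items())})
# {'WT': 1.6, 'var1': 0.6, 'var2': 0.2}
```

Values above 1 indicate variants enriched by selection for ligase activity; values well below 1 indicate depletion.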
We then compared the enrichment ratio (E) of each variant in the deep mutational scan of the BRCA1 RING domain with its classification in the BRCA1 informational database. Of the 2413 scored variants, 2356 have never been found in the human population and, as expected, their E scores ranged from completely inactive to highly active. The remaining 57 were classified as VUS, and many of these are nonfunctional ubiquitin ligases in our assay.
Finally, we are using a cell-based assay to score the effect of missense mutations in full-length BRCA1 on the ability of the variant proteins to rescue homologous recombination when the endogenous protein is repressed. To this end, we are optimizing the molecular steps for building libraries of single-amino-acid variants in lentiviral vectors, which will be transduced into a homologous recombination reporter cell line.
•Lea Starita & Justin Gullingsrud
High-throughput Analysis of a Protein Degradation Signal
Determining the half-life of proteins is critical for understanding virtually all cellular processes. Current methods for measuring in vivo protein stability, including large-scale approaches, are limited in their throughput or in their ability to discriminate among small differences in stability. We developed a new method, Stable-seq, which combines a simple genetic selection with high-throughput DNA sequencing to assess the in vivo stability of a large number of variants of a protein. The variants are fused to a metabolic enzyme, here the yeast Leu2 protein. Plasmids encoding these Leu2 fusions are transformed into yeast, where the fusion proteins accumulate to different levels depending on their stability; this leads to different doubling times when the yeast are grown in the absence of leucine. Sequencing the input population of variants and the population remaining after leucine selection allows the stability of tens of thousands of variants to be scored in parallel. By applying Stable-seq to variants of the degradation signal Deg1 from the yeast Matα2 protein, we generated a high-resolution map of the effects of ~30,000 mutations on protein stability. The Stable-seq scores of variants carrying single mutations are visualized in this heat map, and the inset boxes show the cell growth rates that would correspond to these scores.
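The link between frequency changes and growth can be made explicit under a simple model that we assume here for illustration; it is not necessarily how Stable-seq scores are calculated. If every variant grows exponentially at a constant rate during leucine selection, the log ratio of a variant's frequency to that of a reference variant changes linearly with time, so the growth-rate difference is that log change divided by the selection time. The function name and the choice of reference are hypothetical.

```python
import math


def relative_growth_rate(f_var_in, f_var_sel, f_ref_in, f_ref_sel, hours):
    """Growth-rate difference (per hour) between a variant and a reference.

    Frequencies are taken from the input (in) and post-selection (sel)
    populations. Assumes constant exponential growth of both during the
    selection, so ln(variant / reference) increases by
    (r_variant - r_reference) * hours.
    """
    change = math.log((f_var_sel / f_ref_sel) / (f_var_in / f_ref_in))
    return change / hours


# Toy example: a stable fusion doubling every 2 h versus a reference doubling
# every 3 h. Starting at equal frequency, after 12 h the variant has doubled
# 6 times (64x) and the reference 4 times (16x), i.e. 80% vs 20% of the pool.
delta = relative_growth_rate(0.5, 0.8, 0.5, 0.2, hours=12)
print(round(delta, 4))  # 0.1155, equal to ln(2)/2 - ln(2)/3
```

Under this model, more stable fusions (more Leu2) give positive values and less stable fusions give negative values relative to the reference.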
We identified mutations that likely affect stability by changing the activity of the degron, by leading to translation from new start codons, or by affecting N-terminal processing. Stable-seq should be applicable to other organisms via the use of suitable reporter proteins, as well as to the analysis of complex mixtures of fusion proteins.
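One of these mechanisms, translation from new start codons, can be screened for computationally by comparing ATG positions in wild-type and mutant DNA. The sketch below assumes substitution-only variants of equal length and a known index for the annotated start codon. The helper names are hypothetical, and the check ignores near-cognate start codons and the sequence context that influences start-site choice.

```python
def atg_positions(seq):
    """0-based indices at which an ATG codon begins (any reading frame)."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"}


def new_start_codons(wild_type, mutant, cds_start=0):
    """List (index, in_frame) for ATGs present in mutant but not wild type.

    in_frame is True when the new ATG shares the reading frame of the
    annotated start codon beginning at cds_start (also for upstream ATGs,
    since Python's % returns a non-negative remainder).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("expected substitution-only variants of equal length")
    gained = sorted(atg_positions(mutant) - atg_positions(wild_type))
    return [(i, (i - cds_start) % 3 == 0) for i in gained]


print(new_start_codons("ATGAAACTGCCC", "ATGAAAATGCCC"))  # [(6, True)]
print(new_start_codons("ATGAAACCCGGG", "ATGAAACCATGG"))  # [(8, False)]
```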
•Griffin Kim, Christina Miller & David Young
Kim I, Miller CR, Young DL, Fields S. High-throughput Analysis of in vivo Protein Stability. Mol Cell Proteomics. 2013 Jul 29. [Epub ahead of print]
Stable-seq: A New Approach to Define the Specificity of E3 Ligases to Substrates of the Ubiquitin Proteasome System
The ubiquitin proteasome system (UPS) is a complex pathway in which hundreds of regulatory proteins recognize protein substrates, tag them with ubiquitin, and target them for degradation by the proteasome. A deeper understanding of how this system is regulated is key to developing treatments for UPS-related diseases, such as cancer and neurodegenerative disorders. A fundamental question in this field is which substrates are processed by which regulators. Yeast has more than 100 E3 ligases, which are primarily responsible for substrate recognition, yet only a few substrates have been assigned to specific E3 ligases. To delineate the substrate specificity of these E3 ligases, we are applying Stable-seq, a method we developed. In this method, we fuse either a random sequence or an open reading frame (ORF) to a nutritional marker. The random sequence or ORF determines the stability of the fusion, so selection for the nutritional marker leads to differential growth rates. If the degradation signal or ORF is unstable, cells grow slowly in a wild-type strain but faster in a strain lacking an E3 enzyme required for its degradation.
To this end, we fused random sequences (a stretch of 20 NNK codons) to the LEU2 gene and assayed the resulting library by Stable-seq. The selection plate (-Leu -Ura) shows that the random-sequence fusion library produces differential growth rates in the wild-type strain (Figure 1), and synthetic degradation signals (synD) identified by high-throughput sequencing and analysis in a pilot experiment have been confirmed by spotting assays (Figure 2).
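An NNK codon (N = A, C, G, or T; K = G or T) allows 32 codons that together encode all 20 amino acids and only one stop codon, TAG. The Python sketch below checks this against the standard genetic code and estimates two properties of a 20-codon NNK stretch: its theoretical DNA diversity and the fraction of stretches free of stop codons. The stop-codon estimate assumes each position is drawn uniformly from the 32 codons; real libraries carry synthesis biases. Assuming the random stretch precedes LEU2 in the fusion, a stop codon within it would prevent Leu2 expression.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
encoded = {CODON_TABLE[c] for c in nnk}
stops = [c for c in nnk if CODON_TABLE[c] == "*"]
print(len(nnk), len(encoded - {"*"}), stops)  # 32 20 ['TAG']

# A stretch of 20 NNK codons (assuming uniform codon usage at each position).
diversity = 32 ** 20                 # 2**100, about 1.3e30 DNA sequences
no_stop_fraction = (31 / 32) ** 20   # about 0.53 of stretches lack a stop
print(f"{diversity:.2e} {no_stop_fraction:.2f}")  # 1.27e+30 0.53
```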
In parallel, nearly every yeast ORF was transferred from a movable-ORF (MORF) library by Gateway cloning into a destination vector that fuses each ORF to the LEU2 gene. To determine how well this proxy for stability works in yeast knockout (YKO) strains, we tested the Deg1-Leu2 fusion. Deg1-Leu2 fusion plasmids were transformed into a pool of 130 YKO strains, and deletion of DOA10, which encodes the relevant E3 enzyme, resulted in the greatest increase in stability (Figure 3). We also tested a small library of 30 ORF-Leu2 fusions. By assaying this library in a doa10 deletion strain, we identified potential substrates whose stability increased relative to that in the wild-type strain (Figure 4). Stable-seq may enable a proteome-wide effort to measure in vivo protein stability and to pair E3 enzymes with their substrates.
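Identifying candidate substrates, as in the doa10 deletion comparison, amounts to ranking fusions by how much more stable they appear in the knockout than in the wild type. This Python sketch assumes per-fusion stability scores on a log scale (higher meaning more stable) measured in each strain; the function name, the ORF labels, and the cutoff are hypothetical.

```python
def candidate_substrates(wt_scores, ko_scores, min_increase):
    """Fusions stabilized in a knockout strain, most stabilized first.

    wt_scores and ko_scores map fusion names to stability scores measured
    in the wild-type and E3-deletion strains; only fusions scored in both
    strains are compared. min_increase is a user-chosen cutoff.
    """
    shared = wt_scores.keys() & ko_scores.keys()
    delta = {name: ko_scores[name] - wt_scores[name] for name in shared}
    hits = [name for name, d in delta.items() if d >= min_increase]
    return sorted(hits, key=lambda name: delta[name], reverse=True)


wild_type = {"ORF_A": -2.0, "ORF_B": 0.1, "ORF_C": -1.0}
doa10_ko = {"ORF_A": 0.5, "ORF_B": 0.2, "ORF_C": -0.2, "ORF_D": 1.0}
print(candidate_substrates(wild_type, doa10_ko, min_increase=0.5))
# ['ORF_A', 'ORF_C']
```

The same comparison could be run the other way around, with one fusion such as Deg1-Leu2 scored across many knockout strains, to rank deletions by their effect on stability.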
•Griffin Kim & Christina Miller
Uncovering the Structural Basis of GPCR Functional Selectivity through Deep Mutational Scanning
G protein-coupled receptors (GPCRs) are a diverse family of plasma membrane proteins that share seven transmembrane helices, three intracellular loops, and three extracellular loops. There are close to 800 human GPCR genes, which together mediate a large proportion of cellular communication in our species. Approximately 369 of these encode non-sensory receptors, making them current or potential drug targets. An estimated 40-60% of current therapeutic drugs target at least one GPCR, so advances in our understanding of signal transduction through GPCRs have potentially widespread clinical ramifications.
It has recently become clear that a given GPCR can activate multiple signaling pathways, and that multiple mechanisms can lead to receptor internalization at different rates, depending on the specific ligand. This phenomenon is called “functional selectivity” or “biased agonism,” and its molecular and structural basis is only beginning to be elucidated. Although recent advances in crystallographic techniques have led to an increasing number of structures of both active and inactive GPCRs, much remains unclear about the mechanisms of functional selectivity.
We are currently developing a set of high-throughput assays to interrogate the effects of all single mutations in a GPCR on receptor expression, internalization, and signaling, in a receptor already shown to display functional selectivity: the mu opioid receptor (MOR). This receptor is clinically important as the major target of opioid analgesics. Functionally distinct opioid agonists produce different degrees of tolerance. Moreover, several of the adverse side effects of opioids, such as constipation and respiratory depression, are thought to be mediated by a different pathway than their analgesic effects, which makes functional selectivity in this receptor of particular clinical interest.
We have created a normalizable mammalian expression system for the MOR and cloned into a lentiviral vector several mutants with known defects in cell surface expression. We have demonstrated that mutants can be separated by surface expression using a fluorescent antibody and flow cytometry: by sorting cells into bins according to fluorescence, we can separate poorly expressed mutants from highly expressed mutants and then determine the contents of each bin by sequencing. We are currently generating a library of all single mutants of the MOR. Additional assays for receptor internalization and inhibition of calcium release will be developed once the library is created, and functional selectivity will be examined by comparing results between assays using different agonists. Data generated in this project will complement structural data from NMR and X-ray crystallography by providing a functional map to overlay on the spatial one.
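Sequencing each fluorescence bin yields read counts per variant per bin, which can be summarized as a single expression score. One common approach, assumed here rather than taken from our protocol, converts reads in each bin to estimated cell numbers using the number of cells sorted into that bin, then takes the mean bin index weighted by those estimates. Names and toy numbers are hypothetical.

```python
def expression_scores(bin_reads, cells_per_bin):
    """Weighted mean bin index per variant (0 = lowest fluorescence bin).

    bin_reads maps each variant to its read counts per bin, ordered from low
    to high fluorescence; cells_per_bin gives how many cells were sorted
    into each bin. Reads are normalized within each bin and scaled by the
    cells sorted there to estimate how many cells of each variant fell in
    that bin.
    """
    n_bins = len(cells_per_bin)
    bin_totals = [sum(reads[b] for reads in bin_reads.values()) for b in range(n_bins)]
    scores = {}
    for variant, reads in bin_reads.items():
        cells = [
            reads[b] / bin_totals[b] * cells_per_bin[b] if bin_totals[b] else 0.0
            for b in range(n_bins)
        ]
        if sum(cells) > 0:
            scores[variant] = sum(b * c for b, c in enumerate(cells)) / sum(cells)
    return scores


reads = {"poorly_expressed": [90, 10, 0], "well_expressed": [0, 10, 90]}
scores = expression_scores(reads, cells_per_bin=[100, 100, 100])
print({v: round(s, 2) for v, s in scores.items()})
# {'poorly_expressed': 0.33, 'well_expressed': 1.67}
```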
DNA Shuffling Methods for Identifying Functional Protein Residues
DNA shuffling is a method in which similar sequences are fragmented and reassembled to create chimeras of the parental sequences. These chimeras can be screened or selected for function and then sequenced, and analyzing allele frequencies in the resulting population identifies the residues responsible for the phenotype of interest. We are applying these methods in yeast to study protein co-evolution in the competition between viruses and host defenses, and to map quantitative trait loci at the level of functional nucleotides.
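The allele-frequency analysis can be sketched for two parental proteins, such as human and gibbon PKR, and a pool of sequenced chimeras. Assuming all sequences are aligned and of equal length, each site where the parents differ is scored by the fraction of chimeras carrying parent A's residue, and the shift in that fraction between selected and unselected pools flags candidate residues. Function names and toy sequences are hypothetical; a real analysis would add a statistical test and account for linkage between nearby sites, which shuffling tends to co-inherit.

```python
def parent_a_fraction(parent_a, parent_b, chimeras):
    """{site: fraction of chimeras carrying parent A's residue} (0-based sites).

    Only sites where the parents differ are informative; residues matching
    neither parent (e.g. PCR errors) are ignored.
    """
    fractions = {}
    for i, (a, b) in enumerate(zip(parent_a, parent_b)):
        if a == b:
            continue
        n_a = sum(1 for c in chimeras if c[i] == a)
        n_b = sum(1 for c in chimeras if c[i] == b)
        if n_a + n_b:
            fractions[i] = n_a / (n_a + n_b)
    return fractions


def allele_shift(parent_a, parent_b, selected, unselected):
    """Change in parent-A allele fraction after selection, per divergent site."""
    sel = parent_a_fraction(parent_a, parent_b, selected)
    unsel = parent_a_fraction(parent_a, parent_b, unselected)
    return {i: sel[i] - unsel[i] for i in sorted(sel.keys() & unsel.keys())}


shift = allele_shift(
    "AKLD", "AQLE",
    selected=["AKLE", "AKLD", "AKLE"],
    unselected=["AKLD", "AQLE", "AQLD", "AKLE"],
)
print({i: round(s, 2) for i, s in shift.items()})  # {1: 0.5, 3: -0.17}
```

Here parent A's K at site 1 is strongly enriched by selection, marking it as a candidate functional residue.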
Human cells employ various mechanisms to combat viral infection. For instance, in response to double-stranded RNA, protein kinase R (PKR) stops translation by phosphorylating the translation initiation factor eIF2α. To evade this response, poxviruses express eIF2α mimics, such as the vaccinia protein K3L, which bind PKR as pseudosubstrates and block phosphorylation of eIF2α. Elde et al. (Nature 2009) showed that PKR is evolving under strong positive selection and that PKR from different primates responds differently to eIF2α mimics. Using yeast expression to measure the effect of K3L on PKR, Elde et al. defined the PKR residues responsible for the divergence among human, gibbon, and orangutan PKR. We are using DNA shuffling to map the competition between K3L and PKR further, starting by shuffling human and gibbon PKR and screening the chimeras for K3L evasion.
We are also applying DNA shuffling to fine-map quantitative trait loci (QTLs). After QTLs are coarsely mapped, further genetic analyses, such as reciprocal hemizygosity mapping, allele swapping, and site-directed mutagenesis, are usually needed to fine-map the locus and identify the specific variants causing the phenotype. Expressing a library of chimeric sequences in a knockout strain should accomplish most of these analyses. We are currently applying this methodology to a mapped QTL for ammonium toxicity.
In Vivo Deep Mutational Scanning of an RNA-Recognition Motif (RRM)
Throughout its life, an RNA molecule associates with diverse RNA-binding proteins that regulate its processing and function. A single RNA-binding protein typically recognizes a particular subset of RNA molecules and affects their collective fate by regulating one or more steps in RNA metabolism, from pre-mRNA splicing to mRNA localization, translation, and decay. Because these functions underlie multiple fundamental cellular processes, genetic changes that disrupt RNA-binding protein function can lead to multifaceted human pathologies. We are using deep mutational scanning, an experimental strategy that couples high-throughput DNA sequencing with assays of protein function, to study the effects of sequence variation on the function of a common RNA-binding domain, the RNA Recognition Motif (RRM). Specifically, we exploited the requirement for a functional poly(A)-binding protein (Pab1) for yeast growth and survival to test the in vivo effects of numerous mutations in the Pab1 RRM2 domain (Figure 1). In this system, the endogenous yeast PAB1 gene has been deleted and replaced with a plasmid expressing wild-type Pab1 from a tetracycline-regulated promoter. A second plasmid in these cells expresses one of many variants carrying random mutations in the Pab1 RRM2. Adding a tetracycline analog to the culture shuts off expression of the wild-type gene, making the cells completely reliant on the mutant Pab1 for growth. High-throughput sequencing of the library before and after addition of the tetracycline analog measures the change in frequency of each variant, which serves as a proxy for the function of the mutant Pab1 RRM2 domain.
One of the major outputs of this experiment is a substitution matrix representing all 19 possible single amino acid substitutions at each residue in the Pab1 RRM2 domain (Figure 2). This matrix identifies the β-strands as the most important elements for the in vivo function of RRM2, which agrees with their essential role in poly(A) binding.
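A substitution matrix of this kind is a reshaping of per-variant scores into a residue-by-amino-acid grid, from which per-position summaries (for example, the mean score at each residue) highlight sensitive regions such as the β-strands. The sketch assumes scores keyed by (1-based position, mutant amino acid); unmeasured substitutions are NaN and the wild-type cell is None. Names and toy values are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_matrix(wild_type, scores):
    """One row per residue, one column per amino acid (AMINO_ACIDS order)."""
    matrix = []
    for pos, wt_aa in enumerate(wild_type, start=1):
        row = [
            None if aa == wt_aa else scores.get((pos, aa), float("nan"))
            for aa in AMINO_ACIDS
        ]
        matrix.append(row)
    return matrix


def positional_means(matrix):
    """Mean measured score per residue, skipping wild-type and missing cells."""
    means = []
    for row in matrix:
        values = [v for v in row if v is not None and not math.isnan(v)]
        means.append(sum(values) / len(values) if values else float("nan"))
    return means


toy_scores = {(1, "A"): 0.25, (1, "G"): 0.75, (2, "R"): 1.5}
m = substitution_matrix("MK", toy_scores)
print(positional_means(m))  # [0.5, 1.5]
```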
To better understand Pab1 RRM2 function, we examined the ratio scores of about 200 single amino acid substitutions that occur in Pab1 homologs from other species (Figure 3). These scores, which can also be viewed as the output of a large-scale interspecies complementation assay, revealed that while most of the natural changes were neutral, a few substitutions were deleterious. Mapping these deleterious mutations onto the RRM2 structure revealed that most of them affect residues at the protein surface. We suspected that this approach could identify protein interaction sites that diverged during evolution. Indeed, we found that about half of these mutations interfere with the interaction between Pab1 and the translation initiation factor eIF4G.
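Selecting the naturally occurring substitutions amounts to listing, for each aligned homolog, the residues that differ from the yeast sequence and looking up their scores. The sketch below assumes homologs already aligned to the reference column by column (same length, gaps written as '-'); function names, toy sequences, and the deleterious cutoff are hypothetical.

```python
def natural_substitutions(reference, aligned_homologs):
    """Set of (1-based position, reference aa, homolog aa) seen in homologs."""
    subs = set()
    for seq in aligned_homologs:
        for pos, (ref_aa, hom_aa) in enumerate(zip(reference, seq), start=1):
            if hom_aa not in (ref_aa, "-"):  # skip matches and gaps
                subs.add((pos, ref_aa, hom_aa))
    return subs


def deleterious_natural(subs, scores, cutoff):
    """Natural substitutions with a measured score below cutoff, sorted."""
    return sorted(
        s for s in subs
        if (s[0], s[2]) in scores and scores[(s[0], s[2])] < cutoff
    )


subs = natural_substitutions("MKVE", ["MRVE", "MK-D", "MRVD"])
print(sorted(subs))  # [(2, 'K', 'R'), (4, 'E', 'D')]
print(deleterious_natural(subs, {(2, "R"): 0.9, (4, "D"): 0.1}, cutoff=0.5))
# [(4, 'E', 'D')]
```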
Overall, we suggest that extracting functional scores of naturally occurring substitutions from deep mutational scanning experiments can facilitate the identification of surface residues that likely co-evolved with their binding partners.
•Daniel Melamed, David Young & Christina Miller
Melamed D, Young DL, Gamble CE, Miller CR, Fields S. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA. 2013 Nov;19(11):1537-51. Epub 2013 Sep 24.
Using the Yeast Mating Pathway as a Model for Complex Trait Genetics
Uncovering the genetic underpinnings of complex traits has proven difficult. From crop yield to autism, variants identified in genome-wide association studies (GWAS) explain only a small fraction of heritable phenotypic variation, leaving a significant gap in our understanding. Using the mating pathway of Saccharomyces cerevisiae (Fig. A), we seek to develop a model for testing hypotheses about complex trait genetics. For example: Does most variation underlying complex traits act additively or epistatically? What proportion of mutational effects depend on the environment? Do known genetic modifiers, such as the chaperone Hsp90, act on this variation? We make controlled modifications to the genetic architecture of mating and examine the phenotypic output to develop expectations for how genotype translates into phenotype. To do so, we use deep mutational scanning, a method that links a phenotypic output to a library of genetic variants via high-throughput sequencing. This method allows us to identify small-effect mutations in individual genes, as well as the combined effects of many small-effect mutations across multiple genes.
The effects of mutations in individual mating pathway components (Fig. B) are systematically determined by introducing tens of thousands of protein variants into large populations of yeast, which are then selected for mating efficiency (Fig. C). Variants are also tested in the absence of strong genetic modifiers, such as the protein chaperone Hsp90, and under varying stress conditions, to uncover variants with genetic and environmental dependencies, respectively. After determining the individual effects of very large pools of variants, we test mutant libraries for each mating gene in combination (Fig. D) to determine empirically the role of epistasis between mating genes. This design allows us to show comprehensively how additive genetic variation, epistatic interactions, and environmental factors contribute to a complex trait.
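Empirically determining epistasis requires a null model for how independent mutations combine. A common choice, assumed here for illustration, is multiplicative on a fitness-like scale (for example, mating efficiency relative to wild type), which becomes additive after taking logarithms: epistasis is then ln(W_AB) - ln(W_A) - ln(W_B), with all values relative to wild type. The function name and toy values are hypothetical, and other null models (such as additivity on the untransformed scale) would give different answers.

```python
import math


def epistasis(w_a, w_b, w_ab, w_wt=1.0):
    """Log-scale deviation of a double mutant from the multiplicative model.

    w_a, w_b: fitness-like scores of each single mutant; w_ab: the double
    mutant; w_wt: the wild-type score used for normalization. 0 means the
    effects are independent, < 0 that the combination is worse than
    expected, > 0 that it is better.
    """
    s_a = math.log(w_a / w_wt)
    s_b = math.log(w_b / w_wt)
    s_ab = math.log(w_ab / w_wt)
    return s_ab - (s_a + s_b)


print(abs(epistasis(0.5, 0.5, 0.25)) < 1e-12)  # True (independent effects)
print(epistasis(0.5, 0.5, 0.05) < 0)           # True (negative epistasis)
```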