High-throughput Analysis of a Protein Degradation Signal
Determining the half-life of proteins is critical for an understanding of virtually all cellular processes. Current methods for measuring in vivo protein stability, including large-scale approaches, are limited in their throughput or in their ability to discriminate among small differences in stability. We developed a new method, Stable-seq, which uses a simple genetic selection combined with high-throughput DNA sequencing to assess the in vivo stability of a large number of variants of a protein. The variants are fused to a metabolic enzyme, which here is the yeast Leu2 protein. Plasmids encoding these Leu2 fusion proteins are transformed into yeast, with the resultant fusion proteins accumulating to different levels based on their stability and leading to different doubling times when the yeast are grown in the absence of leucine. Sequencing of an input population of variants of a protein and the population of variants after leucine selection allows the stability of tens of thousands of variants to be scored in parallel. By applying the Stable-seq method to variants of the protein degradation signal Deg1 from the yeast Matα2 protein, we generated a high-resolution map that reveals the effect of ~30,000 mutations on protein stability. The scores determined by Stable-seq of variants carrying single mutations are visualized in this heat map, with cell growth rates that would correspond to scores shown in the inset boxes.
We identified mutations that likely affect stability by changing the activity of the degron, by leading to translation from new start codons, or by affecting N-terminal processing. Stable-seq should be applicable to other organisms via the use of suitable reporter proteins, as well as to the analysis of complex mixtures of fusion proteins.
•Griffin Kim, Christina Miller & David Young
Kim I, Miller CR, Young DL, Fields S. High-throughput Analysis of in vivo Protein Stability. Mol Cell Proteomics. 2013 Jul 29.
Stable-Seq: A New Approach to Define the Specificity of E3 Ligases to Substrates of the Ubiquitin Proteasome System
The ubiquitin proteasome system (UPS) is a complex pathway in which hundreds of regulatory proteins are involved in recognizing protein substrates, tagging them with ubiquitin, and degrading them by the proteasome. A deeper understanding of the regulatory mechanism of this system is key to developing treatments for UPS-related diseases, such as cancer and neurodegenerative disorders. A fundamental question in this field is the determination of which substrates are processed by which regulators. Among more than 100 E3 ligases in yeast, which are primarily responsible for substrate recognition, only a few substrates have been assigned to specific E3 ligases. To delineate substrate specificity of these E3 ligases, we are applying a method we have developed, called Stable-seq. In this method, we fuse either a random sequence or an open reading frame (ORF) to a nutritional marker. The random sequence or ORF determines the stability of the fusion, such that selection for the nutritional marker leads to differential growth rates. The cells grow slower in a wild type strain in which the degradation signal or ORF is unstable, but they grow faster in a strain lacking an E3 enzyme that is crucial for the degradation.
To this end, we fused random sequences (a stretch of 20 NNK codons) to the LEU2 gene and assayed by Stable-seq. The selection plate (-Leu Ura) shows that the random sequence fusion library results in differential growth rates in the wild type strain (Figure 1), and synthetic degradation signals (synD) identified by the high-throughput sequencing and analysis from a pilot experiment have been confirmed by spotting assay (Figure 2).
At the same time, nearly every yeast ORF was transferred from a movable-ORF (MORF) library to a destination vector, which fuses them to the LEU2 gene by Gateway cloning. To determine how well this proxy for stability works in yeast knockout (YKO) strains, we tested the Deg1-Leu2 fusion. Deg1-Leu2 fusion plasmids were transformed into a pool of 130 YKO strains. The deletion of the DOA10 gene, encoding the relevant E3 enzyme, resulted in the greatest increase in stability (Figure 3). We also tested a small library of 30 ORF-Leu2 fusions. By assaying the library in a doa10 deletion strain, we identified potential substrates whose stability increased compared to that in the wild type strain (Figure 4). Stable-seq may enable a proteome-wide effort to measure in vivo protein stability and to pair E3 enzymes with their substrates.
•Griffin Kim & Christina Miller
DNA shuffling methods for identifying functional protein residues
DNA shuffling is a method by which similar sequences are fragmented and reassembled to create chimeric versions of the two sequences. These chimeras can be screened or selected for function and sequenced, allowing the identification of the residues responsible for the phenotype of interest by analyzing allele frequencies. We are applying these methods in yeast to study protein co-evolution with regard to the competition between virus and host defense, and mapping quantitative trait loci at the level of functional nucleotides.
Human cells employ various mechanisms to combat viral infection. For instance, in response to double-stranded RNA, protein kinase R (PKR) stops translation by phosphorylating the translation initiation factor eIF2α. To avoid this cellular response, poxviruses express proteins that mimic eIF2α, like the vaccinia protein K3L, and PKR phosphorylates these instead of eIF2α. Elde et al. (Nature 2009) showed that these proteins are under rapid positive selection, and that the response to eIF2α mimics differs across the primate lineage. Elde et al. defined the residues in PKR that are responsible for the divergence between human, gibbon, and orangutan PKR using yeast expression to measure the effect of K3L on PKR. We are using DNA shuffling to further map the competition between K3L and PKR, first by shuffling the human and gibbon PKRs and comparing our results after screening for K3L evasion.
We are also applying DNA shuffling to finely map quantitative trait loci. After coarsely mapping QTLs, a number of further genetic analyses (like reciprocal hemizygosity mapping, allele swapping, and site-directed mutagenesis) are usually used to finely map the locus and identify the specific variants causing the phenotype. Expressing a library of chimeric sequences in a knockout strain should accomplish most of these analyses. We are currently applying this methodology to a mapped QTL for ammonium toxicity.
In Vivo Deep Mutational Scanning of an RNA-Recognition Motif (RRM)
Throughout its life, an RNA molecule associates with diverse RNA-binding proteins that regulate its processing and function. A single RNA-binding protein typically recognizes a particular subset of RNA molecules and affects their collective fate by regulating one or more steps in RNA metabolism, from pre-mRNA splicing to mRNA localization, translation and decay. Since these functions underlie multiple fundamental cellular processes, genetic changes that disrupt RNA-binding protein function can lead to multifaceted human pathologies. We are using deep mutational scanning, an experimental strategy that couples high throughput DNA sequencing with assays of protein function, to study the effects of sequence variations on the function of a common RNA-binding domain called the RNA Recognition Motif (RRM). Specifically, we made use of the necessity of a functional poly(A)-binding protein (Pab1) for yeast growth and survival to test the in vivo effects of numerous mutations in the Pab1 RRM2 domain (Figure 1). In this system, the endogenous yeast PAB1 gene has been deleted and replaced with a plasmid expressing the wild-type Pab1 from a tetracycline-regulated promoter. A second plasmid in these cells expresses one of many variants carrying random mutations in the Pab1 RRM2. Adding a tetracycline analog to the culture shuts off the expression of the wild-type gene, making the cells completely reliant on the mutant Pab1 performance for growth. High throughput sequencing of the library variants before and after addition of the tetracycline analog allows us to measure the change in frequency of each variant, which in turn can be used as a proxy for the function of the mutant Pab1 RRM domain.
One of the major outputs of this experiment is a single amino acid substitution matrix representing all 19 possible single amino acid substitutions at each residue in the RRM2 domain of Pab1 (Figure 2). This matrix points to the β-strands as the most important for the in vivo function of RRM2, which agrees with their essential role in poly(A) binding.
To gain a better understanding of Pab1 RRM2 function, we observed the ratio scores of about 200 single amino acid substitutions that occur in other Pab1 homologous sequences (Figure 3). These scores, which can be viewed also as the output of a large-scale inter-species complementation assay, revealed that while most of the natural changes were neutral in their effects, a few substitutions were deleterious. Mapping these mutations on the RRM2 structure revealed that most of them affect residues at the protein surface. We suspected that this approach allowed the identification of protein interaction sites that diverged throughout evolution. Indeed, we found that about half of these mutations interfere with the interaction between Pab1 and the translation initiation factor eIF4G.
Overall, we suggest that extracting functional scores of naturally occurring substitutions from deep mutational scanning experiments can facilitate the identification of surface residues that were likely to co-evolve with their binding partner.
•Daniel Melamed, David Young & Christina Miller
Melamed D, Young DL, Gamble CE, Miller CR, Fields S. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA. 2013 Nov;19(11):1537-51. Epub 2013 Sep 24.
Transcriptional Engineering of Ethanol-Tolerant Yeast Strains
Alcohols cause pleiotropic cellular stress by disrupting the cell membrane and non-specifically destabilizing proteins. In yeast over 1000 genes have been implicated in increasing alcohol tolerance. Given such complexity, methods like transcriptional engineering that modulate cellular processes genome-wide are ideal tools to analyze this trait. In 2006, Alper and colleagues showed that variants of the yeast TATA-binding protein (Spt15) could improve viability at 6% ethanol. Spt15 regulates the expression of nearly all genes, so while its variants modulate many genes that are necessary for alcohol tolerance, they likely have off-target and possibly deleterious effects. To examine the possibility that variants of less ubiquitous transcription factors can also be used to increase ethanol tolerance, we created libraries consisting of over one million variants for three alcohol-responsive yeast transcription factors, Asr1, Msn2 and Msn4 and selected yeast containing these factors at 7.5% ethanol.
Many non-synonymous and frameshift mutations in the ASR1 and MSN genes enriched over the course of selection. We are continuing selections and confirming the tolerance of highly-enriched mutations. After this confirmation, we plan to use RNA-sequencing to analyze the transcriptional changes underlying the tolerance phenotype, in an effort to elucidate the molecular basis of yeast ethanol tolerance. We also believe that, if successful, this approach could be used to investigate the molecular basis of other complex traits.
Deep Mutational Scanning to Analyze Protein Function
Understanding the functional and biophysical characteristics of proteins is of paramount importance. We have developed a method, deep mutational scanning (Figure 1), that makes use of protein display technology in conjunction with high-throughput sequencing. Deep mutational scanning enables the investigation of protein function on an unprecedented scale, facilitating the simultaneous measurement of the fitness of hundreds of thousands of mutants of a protein.
Protein display technologies physically link proteins and the DNA sequences that encode them. Protein display allows for selection among a large library of protein variants for those with a particular function. Protein display technology has been restricted in scope by the requirement for back-end DNA sequencing, which has limited the number of selected protein variants that can be identified to a few hundred. Deep mutational scanning alleviates this bottleneck by using high-throughput sequencing to sequence tens of millions of individual library members in parallel (Figure 1). The primary benefit of this approach is that millions of protein variants can be simultaneously identified and counted. Comparison of the frequency of a given variant in a selected library and in the input library yields an enrichment ratio that is an estimate of function. The key ingredients (protein display, low-intensity selection and highly accurate, high-throughput sequencing) are simple and becoming widely available. Deep mutational scanning data can be used to construct protein sequence-function maps, and systematic analysis of deep mutational scanning data can reveal fundamental protein properties. We have applied deep mutational scanning to a number of proteins in a variety of functional assays.
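The enrichment ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variant names, the read counts, the log2 scale and the pseudocount (added so that variants absent after selection do not produce a division by zero) are our choices for the example, not details of the published analysis.

```python
import math

def enrichment_ratios(input_counts, selected_counts, pseudocount=0.5):
    """Return a log2 enrichment ratio for every variant in the input library.

    The ratio is freq(selected) / freq(input). The pseudocount is an
    assumption of this sketch, used to keep the ratio finite when a
    variant is not observed after selection.
    """
    input_total = sum(input_counts.values())
    selected_total = sum(selected_counts.values())
    ratios = {}
    for variant, n_in in input_counts.items():
        n_sel = selected_counts.get(variant, 0)
        f_in = (n_in + pseudocount) / input_total
        f_sel = (n_sel + pseudocount) / selected_total
        ratios[variant] = math.log2(f_sel / f_in)
    return ratios

# Hypothetical read counts for three variants.
input_counts = {"WT": 1000, "A5G": 100, "L7P": 100}
selected_counts = {"WT": 1000, "A5G": 400, "L7P": 10}
scores = enrichment_ratios(input_counts, selected_counts)
```

In this toy data set the variant that gains frequency during selection (A5G) scores above wild type, and the variant that is depleted (L7P) scores below it.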
Systematic Analysis of Large Scale Fitness Data to Identify Mutations that Stabilize Proteins
Enhancing protein stability is often critical for industrial and pharmaceutical applications. Stabilizing mutations permit acquisition of other, destabilizing mutations that improve function. This phenomenon can be observed as epistasis, where multiple mutations combine with unpredictable fitness effects. We identify stabilizing mutations in a WW domain based solely on parallel measurement of the fitness of 47,000 variants to bind to a peptide ligand and subsequent calculation of >5,000 epistasis scores (Figure 2A). We introduce an epistasis-based metric, “partner potentiation,” which identified 15 candidate stabilizing mutations, including three known stabilizing mutations (Figure 2B). We tested six novel candidates by thermal denaturation and found two highly stabilizing mutations, one more stabilizing than any previously known mutation. Thus, systematic analysis of large-scale protein fitness data can reveal fundamental physicochemical properties such as stability.
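As a rough illustration of what an epistasis score measures, the sketch below compares the observed fitness of a double mutant with the fitness expected if the two single mutations acted independently. The multiplicative (product) null model and the fitness values are assumptions made for this example; the text above does not specify which model was used, and the "partner potentiation" metric is not reproduced here.

```python
def epistasis(w_double, w_a, w_b):
    """Observed double-mutant fitness minus the product-model expectation.

    All fitness values are assumed to be expressed relative to wild type
    (wild type = 1.0). Positive scores mean the double mutant performs
    better than expected from its two single mutations.
    """
    return w_double - w_a * w_b

# Hypothetical values: mutation A alone is damaging (0.5), mutation B is
# mildly damaging (0.8), yet the double mutant is nearly wild type (0.9).
score = epistasis(w_double=0.9, w_a=0.5, w_b=0.8)
```

A mutation that repeatedly shows positive epistasis with otherwise damaging partners behaves the way a stabilizing mutation would be expected to behave.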
Understanding the Molecular Basis of Selectivity in the Protein Kinase A/AKAP-79 interaction
(with the laboratory of John Scott, HHMI and Dept. Pharmacology, University of Washington)
Protein Kinase A (PKA) is a central intracellular protein kinase that regulates the activity of many proteins involved in cellular metabolism. PKA activity is controlled via interactions with A Kinase Anchoring Proteins (AKAPs). AKAPs function by binding to the PKA regulatory subunit, localizing PKA within the cell. AKAPs can interact with either the alpha or the beta isoform of the regulatory subunit of PKA, or they can interact with both. The alpha and beta isoforms are highly similar, making it difficult to study the molecular determinants of selectivity between isoforms.
We are using phage display in combination with high-throughput sequencing to identify the sequence determinants of AKAP selectivity. We displayed a library of millions of mutagenized AKAP proteins on the surface of T7 phage and then subjected this library to selection against either the alpha or beta isoform of the regulatory subunit of PKA. By comparing the abundance of each variant before and after selection, we derived enrichment ratios for several hundred thousand variants. Most variants performed similarly in selections against both the alpha and beta isoforms. However, some variants displayed strong selectivity for either the alpha or beta isoform. We are using the results of this assay to develop highly alpha- and beta-specific AKAPs. These highly specific AKAPs will bind only to PKAs with the cognate regulatory isoform. If introduced into cells at high concentrations, they will disrupt the normal regulatory interaction for their cognate isoform, enabling us to study the biological significance of the isoforms.
•Doug Fowler & Jason Stephany
Enrich: Software for Analysis of Protein Function by Enrichment and Depletion of Variants
We developed Enrich, a tool for analyzing deep mutational scanning data. Enrich identifies all unique variants (mutants) of a protein in high-throughput sequencing data sets and can correct for sequencing errors using overlapping paired-end reads. Enrich uses the frequency of each variant before and after selection to calculate an enrichment ratio, which is used to estimate fitness. Enrich provides an interactive interface to guide users. It generates user-accessible output for downstream analyses as well as several visualizations of the effects of mutation on function, thereby allowing the user to rapidly quantify and comprehend sequence-function relationships. Enrich is implemented in Python, is available under a FreeBSD license and can be downloaded here. Enrich includes detailed documentation as well as a small example data set.
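The idea of correcting sequencing errors with overlapping paired-end reads can be illustrated with a short sketch. This is not Enrich's own code or algorithm: the function names are hypothetical, and the rule used here (keep a base when both reads agree, otherwise take the base with the higher Phred quality) is one simple possibility, assuming the two reads cover exactly the same region.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_overlapping_pair(fwd_seq, fwd_qual, rev_seq, rev_qual):
    """Merge two fully overlapping paired-end reads into one sequence.

    rev_seq and rev_qual are given in the orientation reported by the
    sequencer; qualities are lists of Phred scores (integers).
    """
    rev_rc = reverse_complement(rev_seq)
    rev_q = rev_qual[::-1]  # qualities must be reversed along with the read
    if len(fwd_seq) != len(rev_rc):
        raise ValueError("this sketch assumes reads of equal length")
    merged = []
    for f, fq, r, rq in zip(fwd_seq, fwd_qual, rev_rc, rev_q):
        if f == r:
            merged.append(f)
        else:
            # Disagreement: trust the base call with the higher quality.
            merged.append(f if fq >= rq else r)
    return "".join(merged)

# The forward read has a low-quality error at its last base.
merged = merge_overlapping_pair(
    "ACGTA", [30, 30, 30, 30, 10],
    "AACGT", [35, 30, 30, 30, 30],
)
```

Here the two reads disagree only at the final position, and the merged sequence takes the higher-quality call from the reverse read.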
•Doug Fowler & Carlos Araya
Fowler DM, Araya CL, Gerard W, Fields S. Enrich: Software for Analysis of Protein Function by Enrichment and Depletion of Variants. Bioinformatics. 2011 Oct 17.
Activity-Enhancing Mutations in an E3 Ubiquitin Ligase Discovered by Deep Mutational Scanning
Although ubiquitination plays a critical role in virtually all cellular processes, understanding of the mechanistic details of ubiquitin transfer is still rudimentary. To identify the molecular determinants within E3 ligases that modulate activity, we developed a high-throughput assay (Figure 1) to measure the activity of nearly 100,000 protein variants of the U-box domain of murine Ube4b and found rare mutations that enhanced activity both in vitro and in cellular p53 degradation assays. Our results highlight the utility of high-throughput mutagenesis in delineating the molecular basis of enzyme activity.
Investigating the HIV-1 Tat-TAR Interaction
The HIV-1 Tat protein is integral to the viral life-cycle as it can induce efficient transcription of the virus after binding a folded element of the HIV LTR called TAR. Previous studies have elucidated the effects of some mutations of Tat, but the overall depth and density of the studied mutations is low. We are investigating the Tat-TAR interaction using deep mutational scanning, a high-throughput technology recently developed in the lab.
By creating a library of hundreds of thousands of variants of Tat and selecting for binding to TAR using a yeast three-hybrid assay, we are examining the relationship between the sequence of Tat and its TAR-binding function at an unprecedented resolution. The Tat-TAR interaction is thought to be driven by an enrichment of basic residues in the core of the protein rather than a specific amino acid sequence, but it is not known if point mutations outside of this core region can affect the TAR interaction. This study of mutations that affect the affinity of Tat for TAR can contribute to our understanding of both protein-RNA interactions and the mechanism of HIV transcription and activation.
•Daniel Melamed, Matt Rich & Christina Miller
Length-Agnostic, Barcode-Directed Assembly of Gene Haplotypes
Barcoding and in silico assembly of mutagenized sequences have expanded the scale at which DNA sequences can be functionally assayed, as sequence spanning multiple short, high-throughput sequencing reads can be assembled into a single allele using unique barcodes (Patwardhan, Hiatt, et al. Nat Biotechnology (2012)). Although this method enabled the resolution of haplotypes of sequences of up to approximately one kilobase (the cluster-generating limit of an Illumina sequencer), current techniques require further molecular biological manipulations, such as the combinatorial removal of internal regions, to resolve the haplotypes of longer sequences. We are developing a method for barcoded assembly of sequences that circumvents this 1 kb limit. We linearize the plasmid containing our barcoded gene of interest upstream of the gene's promoter, then use an endonuclease to digest the fragment from each end. After removing the remaining plasmid backbone, we are left with many barcoded fragments of different lengths. Recircularization brings the endonuclease-digested end of the gene proximal to the barcode. Single-ended, short PCR followed by single-stranded ligation creates sequences that can be clustered and sequenced on an Illumina high-throughput sequencer. These reads can then be merged by their barcode sequences. We will apply this technique first to selections for the function of large, mutagenized yeast transcription factors, like those encoded by the ADR1 and MSN2 genes, but the technique should be generalizable to many applications that require the resolved haplotypes of large genes.
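The final step, merging reads by their barcode sequences, can be sketched as a simple grouping operation. The read layout assumed here (a fixed-length barcode at the start of each read, followed by the gene-derived sequence) and the function name are hypothetical choices for the example; the actual read structure depends on the library design.

```python
from collections import defaultdict

def group_reads_by_barcode(reads, barcode_length):
    """Collect the gene-derived portion of each read under its barcode.

    Assumes, for this sketch only, that every read begins with a barcode
    of barcode_length bases.
    """
    groups = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)

# Three hypothetical reads carrying two distinct 4-base barcodes.
reads = ["AAAAGATTACA", "AAAACATTAGG", "CCCCGATTACA"]
groups = group_reads_by_barcode(reads, barcode_length=4)
```

Each barcode then indexes the set of fragments that came from a single plasmid, which is the input an assembler would need to reconstruct that allele.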
Functional Screening of Soil Metagenomic Libraries
Most of the genomes of environmental microorganisms are inaccessible because they cannot be cultured in the lab by standard techniques. However, we can access the genomes of these unculturable organisms by extracting DNA directly from environmental samples and creating metagenomic DNA libraries. Using this approach, DNA from thousands of environmental microbes can be functionally screened for a variety of abilities. We are pursuing two different strategies to create and functionally screen environmental DNA libraries.
Using standard methods to functionally screen DNA libraries in an E. coli host, we are investigating the antibiotic resistance mechanisms coded in uncultured soil microorganisms. In particular, we have identified sequences from an environmental DNA library that allow growth of an E. coli host in the presence of 6 different antibiotics that function by targeting varied cellular pathways. We have found new sequences coding for many families of antibiotic resistance proteins including the antibiotic modifying enzymes rifampin ADP-ribosylases and aminoglycoside acetyltransferases, transporter proteins that are able to pump antibiotics out of the cell, as well as proteins like dihydrofolate reductases that are able to evade antibiotics when exogenously expressed in E. coli. We hope to use these new sequences to learn more about the evolution and functions of these protein families.
We are additionally interested in developing new ways to screen environmental DNA libraries in order to overcome the limitations associated with functional screening in a laboratory host. Standard screening requires that a heterologously expressed protein is functional in the host bacteria, and also requires the availability of an assay to test the function of interest on a large scale. Cloning an environmental DNA library into a phage backbone and screening the phage library via affinity selection would allow more permissive and efficient screening since protein domains that bind to a substrate of interest could be recovered without relying on the function of the encoded protein in a foreign host. This screening strategy will be widely applicable to a variety of binding and catalytic functions. We hope to use affinity selection of a metagenomic phage display library to search for antibiotic resistance proteins and inhibitors of these resistance proteins.
Figure 1. We are using two strategies to functionally screen soil metagenomic DNA libraries.
Figure 2. Antibiotic resistance profiling of a soil metagenomic library. A. Number of resistant clones recovered against each antibiotic. A library of 1.4 × 10^6 clones with an average insert size of 1.5 kb was screened against 6 antibiotics. A total of 41 resistant clones have been identified. B. Distribution of amino acid identities for 41 resistance genes recovered from soil samples compared to the most similar gene from any organism in GenBank.
High Throughput Proteome-Wide Search for Ubiquitinated Proteins in Yeast
Our goal is to improve upon existing biochemical approaches for finding sites of ubiquitin attachment on all yeast proteins.
Presently, the approach is to purify proteins with an 8xHis-tagged ubiquitin followed by tryptic digest to generate peptide fragments for LC/LC-MS/MS analysis. The tryptic fragments that contain a lysine residue that had been ubiquitinated can be identified by a mass shift corresponding to 2 glycine residues left attached to an internal lysine. Historically, only a relatively small number of GG-peptides have been found due to the low abundance of GG-peptides as compared to unmodified peptides.
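The size of this mass shift follows directly from the elemental composition of two glycine residues. The short calculation below is an illustration using standard monoisotopic atomic masses; the variable and function names are ours.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# A glycine residue within a peptide chain is C2H3NO
# (free glycine, C2H5NO2, minus one water).
GLYCINE_RESIDUE = {"C": 2, "H": 3, "N": 1, "O": 1}

def composition_mass(composition):
    """Sum the monoisotopic masses for an elemental composition."""
    return sum(MONOISOTOPIC[element] * n for element, n in composition.items())

# Two glycine residues remain on the modified lysine after tryptic digest.
GG_SHIFT = 2 * composition_mass(GLYCINE_RESIDUE)  # about 114.04 Da
```

A lysine-containing peptide whose measured mass exceeds its unmodified mass by this amount is therefore a candidate GG-peptide.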
We seek to reduce the complexity of the resulting peptides by using the chemical NTCB (2-nitro-5-thiocyanatobenzoic acid), which cleaves at cysteine residues (Figure 1B). Since ubiquitin does not have any cysteines, the proteome can be cleaved prior to affinity purification of ubiquitinated proteins (Figure 1C) and trypsinization (Figure 1D). This step will remove many non-ubiquitinated fragments, thereby reducing the complexity of the sample. We hope to further decrease the complexity of the peptides by separating forked peptides from linear peptides by strong-cation exchange chromatography (Figure 1E) prior to injection into the mass spectrometer.
Using the technique outlined above we found a total of 965 unique sites of ubiquitin attachment on 410 yeast proteins. We are able to gain a 1.5X enrichment of GG-peptides using the NTCB cleavage strategy. Most of the improvement in sequencing GG-peptides is achieved by using the high-mass accuracy Orbitrap-LTQ (Table 1).
•Lea Starita & Russell Lo
Starita, L.M., Lo, R.S., Eng, J.K., Von Haller, P.D. and Fields, S. Sites of ubiquitin attachment in Saccharomyces cerevisiae. 2012 Proteomics, Jan;12(2):236-240. Epub 2011 Nov 22.
Genome-Wide Analysis of Nascent Transcription in Saccharomyces cerevisiae
Most studies of eukaryotic gene regulation have examined mature, steady-state mRNA levels. However, steady-state mRNA levels result from the action of two opposing processes: RNA synthesis and RNA degradation. An accurate assessment of RNA synthesis is important for understanding the mechanisms that regulate gene expression.
The nuclear run-on (NRO) assay is the traditional method to directly measure RNA synthesis. We have combined the in vivo RNA labeling of this assay with high throughput DNA sequencing to examine RNA polymerase activity genome-wide in exponentially growing yeast. In parallel, we sequenced total RNA to monitor transcript abundance and compare nascent transcript and steady-state transcript levels (Figure 1A).
To analyze RNA polymerase activity within genes, we examined read density along transcribed regions. We find that in contrast to total RNA libraries, NRO libraries show a high density of reads near the 5’ ends of the transcript models, with a peak ~50 bp downstream of the transcription start site (TSS) (Figure 1B), as has been observed in human and Drosophila cells. This peak in read depth near TSSs likely indicates a promoter-proximal accumulation of paused RNA polymerase, suggesting that pausing plays a significant role in the regulation of yeast transcription. Analysis of expression levels allows us to classify genes into four classes by their activity and pausing (Figure 1C). Ranking genes by the significance of pausing reveals that histone genes are among the 5% most paused genes, suggesting that transition to productive elongation is necessary for rapid induction of histone synthesis in S phase. By calculating the ratio of NRO transcription to total RNA for each gene, we can estimate nascent transcript stabilities. This analysis has revealed that the most stable and unstable transcripts encode proteins whose functional roles are consistent with these stabilities.
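The ratio of NRO signal to total RNA can be sketched as follows. This is an illustration only: the gene names and read counts are hypothetical, and the reads-per-million normalization, pseudocount and log2 scale are assumptions of the example rather than the published procedure. The interpretation rests on a steady-state argument (abundance is roughly synthesis divided by decay rate), so a high ratio of nascent to total signal suggests a less stable transcript.

```python
import math

def nro_to_total_log_ratio(nro_counts, total_counts, pseudocount=1.0):
    """Return log2(NRO / total RNA) per gene after library-size scaling.

    Counts are scaled to reads per million so that libraries of different
    depth are comparable. Genes missing from either library are skipped.
    """
    nro_sum = sum(nro_counts.values())
    total_sum = sum(total_counts.values())
    ratios = {}
    for gene, nro in nro_counts.items():
        if gene not in total_counts:
            continue
        nro_rpm = (nro + pseudocount) / nro_sum * 1e6
        total_rpm = (total_counts[gene] + pseudocount) / total_sum * 1e6
        ratios[gene] = math.log2(nro_rpm / total_rpm)
    return ratios

# Two hypothetical genes with equal nascent signal but different abundance.
nro_counts = {"geneA": 500, "geneB": 500}
total_counts = {"geneA": 900, "geneB": 100}
ratios = nro_to_total_log_ratio(nro_counts, total_counts)
```

In this toy example geneB is synthesized as actively as geneA but accumulates far less, so it receives the higher ratio and would be inferred to be the less stable transcript.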
Parallel analysis of nascent transcripts and steady-state transcripts with high throughput sequencing allows a genome-wide assessment of RNA polymerase activity in yeast, identifying regulatory steps of RNA synthesis and inference of RNA stabilities. We anticipate that this approach will be useful to measure changes that occur in transcription in response to environmental or genetic perturbations.
•Anastasia McKinlay & Carlos Araya (former lab members)
McKinlay, A., Araya, C.L. and Fields, S. Genome-wide analysis of nascent transcription in Saccharomyces cerevisiae. 2011 G3: Genes, Genomes, Genetics Dec;1(7):549-58. Epub 2011 Dec 1.
An Integrated Metabolomic Approach to Understanding Drug Function
Metabolites are a unique and highly diverse class of elements and compounds that constitute the “business end” of biochemistry. For example, the budding yeast S. cerevisiae is estimated to contain thousands of unique metabolites at a wide range of concentrations. In addition to being both substrates for and products of protein action, metabolites have profound regulatory effects ranging from simple enzymatic product inhibition to allostery to initiation of complex signaling cascades that regulate gene expression programs. Furthermore, exogenous metabolites, acquired for nutritive purposes or used as chemical defenses, greatly expand the diversity of metabolites a cell might encounter. Thus, we hypothesized that examination of the effect of excess metabolites on a drug phenotype could provide rich, systems-level information about both cellular and drug function.
To test this principle, we screened a small pilot library of about 50 metabolites in a yeast-based model against lovastatin. Statin drugs inhibit HMG-CoA reductase, which is the rate-limiting enzyme in the synthesis of cholesterol. Consequently, they are among the most widely prescribed drugs in the world, used to treat high cholesterol and atherosclerosis. We chose to investigate statins because, despite being one of the first drugs designed with a specific molecular target in mind, statins have poorly understood pleiotropic effects. For example, in addition to lowering cholesterol by inhibiting HMG-CoA reductase, statins can reduce the risk of death from stroke. Statins can also have significant side effects including musculoskeletal deterioration and rhabdomyolysis, but how these deleterious effects occur is not known.
Statins are effective in inhibiting the yeast orthologs of HMG-CoA reductase and lowering levels of the yeast cholesterol analog, ergosterol. Statin treatment produces dose-dependent growth inhibition in yeast, presumably owing to the requirement for ergosterol for generation of new membrane. We screened our pilot metabolite library against a S. cerevisiae model of statin action. Our metabolite-statin screen revealed that the divalent metal ions zinc, copper and manganese were all effective in alleviating statin mediated growth inhibition. We characterized metal mediated statin rescue using an integrated approach that included biochemical, metabolomic and genomic approaches.
Metabolite Profiling in Yeast
Metabolism encompasses all the processes by which a cell generates energy and other essential molecules from nutrients. These pathways rely on hundreds of genes and involve thousands of small molecule intermediates, vitamins and cofactors. Interest in these molecules has led to development of technologies that allow high-throughput profiling of metabolic intermediates. We have optimized capillary electrophoresis methods for profiling amines, thiols and organic acids in the yeast Saccharomyces cerevisiae. Using these protocols, we have screened the yeast deletion collection and shown that clustering based on metabolite profiles allows us to identify related genes and pathways.
Figure 1: Amino acid profiling of a wild-type yeast extract using fluorescent derivatization of amine groups in combination with capillary electrophoresis separation.
Figure 2: Panel A shows amino acid profiling of the yeast deletion collection clustered by common metabolite profile. Panel B shows that arginine mutants have low levels of arginine and accumulate arginine precursors such as ornithine and lysine. This cluster is also enriched for mitochondrial genes. Since arginine biosynthesis occurs in the mitochondria, we propose that genes affecting mitochondrial function also affect arginine biosynthesis.
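The idea of grouping deletion strains by the similarity of their metabolite profiles can be illustrated with a minimal Python sketch. It is not the analysis used in the published study: the strain names and metabolite levels are invented for illustration, and Pearson correlation is assumed here as the similarity measure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length metabolite profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical relative levels of four metabolites per strain (made-up numbers).
profiles = {
    "arg_mutant_1": [0.2, 3.0, 2.5, 1.0],
    "arg_mutant_2": [0.3, 2.8, 2.2, 1.1],
    "mito_mutant": [0.4, 2.5, 2.0, 0.9],
    "unrelated": [1.0, 1.1, 0.9, 3.0],
}

def nearest_neighbor(name, profiles):
    """Return the other strain whose profile correlates best with `name`."""
    others = (s for s in profiles if s != name)
    return max(others, key=lambda s: pearson(profiles[name], profiles[s]))

if __name__ == "__main__":
    for strain in profiles:
        print(strain, "->", nearest_neighbor(strain, profiles))
```

Strains whose profiles correlate strongly would fall into the same cluster; a full analysis would apply hierarchical clustering to the whole correlation matrix rather than looking only at nearest neighbors.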
We have also begun metabolite profiling using gas chromatography and mass spectrometry (GCxGC-TOF). Preliminary experiments demonstrate that we can identify hundreds of unique compounds, including amino acids, sugars, organic acids and sterols. We are currently applying this method to better understand sterol biosynthesis in yeast. These complementary approaches provide a systematic view of metabolism in yeast. Other applications of this technology that we are working on include metabolite profiling in human urine samples from individuals with kidney disease and characterizing natural variation in yeast by assaying metabolic profiles along with transcription and protein levels in wild yeast strains.
Figure 3. Two-dimensional gas chromatography with mass spectrometry is used for identification and quantification of a couple of hundred intracellular small molecule metabolites.
•Sara Cooper and Sven Nelson (former lab members)
Cooper SJ, Finney GL, Brown SL, Nelson SK, Hesselberth J, Maccoss MJ, Fields S. High-throughput profiling of amino acids in strains of the Saccharomyces cerevisiae deletion collection. Genome Res. 2010 Sep;20(9):1288-96.
Functional Chromosomal Interactions
The topologies and spatial relationships of eukaryotic chromosomes are poorly understood. Together with the labs of Tony Blau, Bill Noble and Jay Shendure at the University of Washington, we developed a high-throughput method to globally capture intra- and inter-chromosomal interactions, and applied it to generate a map at kilobase resolution of the haploid genome of the budding yeast Saccharomyces cerevisiae. The map recapitulates known features of genome organization, thereby validating the method, and identifies new features. Extensive regional and higher order folding of individual chromosomes is observed. Chromosome XII exhibits a striking conformation that implicates the nucleolus as a formidable barrier to interaction between DNA sequences at either end. Inter-chromosomal contacts are anchored by centromeres and include interactions among tRNA genes, among origins of early DNA replication and among sites where chromosomal breakpoints occur. Finally, we constructed a three-dimensional model of the yeast genome. Our findings provide a glimpse of the interface between the form and function of a eukaryotic genome.
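As a rough illustration of how paired sequence reads can be turned into an interaction map, the Python sketch below bins each read pair at 1 kb resolution and tallies intra- and inter-chromosomal contacts. The read coordinates are invented, and the function names are hypothetical; the published pipeline involves additional filtering and statistical steps not shown here.

```python
from collections import Counter

BIN_SIZE = 1000  # 1 kb bins, in keeping with the kilobase resolution of the map

def bin_contacts(read_pairs, bin_size=BIN_SIZE):
    """Count contacts between genomic bins.

    read_pairs: iterable of ((chrom_a, pos_a), (chrom_b, pos_b)) tuples.
    Returns a Counter keyed by an order-independent pair of (chrom, bin) tuples.
    """
    contacts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        a = (chrom_a, pos_a // bin_size)
        b = (chrom_b, pos_b // bin_size)
        contacts[tuple(sorted((a, b)))] += 1
    return contacts

def split_intra_inter(contacts):
    """Return (intra-chromosomal, inter-chromosomal) contact totals."""
    intra = sum(n for (a, b), n in contacts.items() if a[0] == b[0])
    inter = sum(n for (a, b), n in contacts.items() if a[0] != b[0])
    return intra, inter

# Hypothetical read pairs (coordinates are made up).
pairs = [
    (("chrI", 1500), ("chrI", 52000)),
    (("chrI", 52400), ("chrI", 1900)),      # same two bins, opposite order
    (("chrI", 151000), ("chrXIV", 628000)),
]
contacts = bin_contacts(pairs)

if __name__ == "__main__":
    print(contacts)
    print(split_intra_inter(contacts))
```

Sorting the two bins before counting makes the matrix symmetric, so a contact is counted once regardless of which end of the pair was read first.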
Figure 1. Inter-chromosomal interactions. A, Circos diagram showing interactions between chromosome I and the remaining chromosomes. All 16 yeast chromosomes are aligned circumferentially, and arcs depict distinct inter-chromosomal interactions. Bold red hatch marks correspond to centromeres. B, Circos diagram, generated using the inter-chromosomal interactions, depicting the distinct interactions between a small and a large chromosome (I and XIV, respectively). Most of the interactions between these two chromosomes involve the entirety of chromosome I and a distinct region of corresponding size on chromosome XIV.
Figure 2. Three-dimensional model of the yeast genome. Chromosomes are colored individually. Centromeres and telomeres are marked by lighter and darker red dots, respectively. All chromosomes cluster via centromeres at one pole of the nucleus (the area within the dashed oval), while chromosome XII extends outward toward the nucleolus, which is occupied by rDNA repeats (indicated by the white arrow). After exiting the nucleolus, the remainder of chromosome XII interacts with the long arm of chromosome IV.
•Kevin Schutz (former lab member)
Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. A three-dimensional model of the yeast genome. Nature. 2010 May 20;465(7296):363-7.
Capture and Sequence Analysis of RNAs Containing 3' Cyclic Phosphate Termini
Standard techniques used to isolate and identify RNA from cellular extracts have traditionally relied upon hybridization to oligo-dT or T4 RNA ligase-based methodologies. These methods have been successful in isolating populations of RNAs that are modified with poly-adenosine tracts or have hydroxyl moieties (-OH) at their 3’ terminus. It is possible that these two classes represent the majority of the cellular ‘RNA universe.’ However, with the development of advanced sequencing technologies, it is also clear that the RNA universe is more complex than previously appreciated. Therefore, there is a need to develop new technologies to further profile this complexity.
With this in mind we developed a technology that is capable of specifically isolating 2’,3’ cyclic phosphate-terminated RNAs from complex RNA mixtures. RNAs with these termini are generated as the product of particular RNA endonucleases or during ribonucleolytic cleavage. This technology uses the Arabidopsis thaliana tRNA ligase to add an adaptor oligonucleotide to RNAs that terminate in 2’,3’ cyclic phosphates. The adaptor allows specific priming by reverse transcriptase, which is followed by additional steps for PCR amplification and high throughput DNA sequencing. This method may identify processing events previously undetected by other RNA cloning techniques.
•Kevin Schutz (former lab member)
Schutz K, Hesselberth JR, Fields S. Capture and sequence analysis of RNAs with terminal 2',3'-cyclic phosphates. RNA. 2010 Mar;16(3):621-31.
Genomewide Identification of Transcription Factor Binding Sites by DNase I Footprinting
The complement of DNA-binding proteins and their occupancy of sites throughout the genome determine an organism’s programs of gene expression, DNA replication and other chromosome-based processes. A detailed picture of factor binding on a genome-wide basis exists for Saccharomyces cerevisiae, obtained by a combination of transcriptional profiles, chromatin immunoprecipitation of more than 200 transcription factors, computational analyses and other assays. In an alternative approach, we have used digestion of chromatin by DNase I followed by high throughput DNA sequencing to identify sites of increased nuclease accessibility throughout the yeast genome. The resulting set of more than 10 million sequence reads provides both a global view of chromatin architecture as well as a gene-by-gene view of regulatory sequences protected from digestion by the presence of bound proteins. Unlike the case with results from chromatin immunoprecipitation, these gene-by-gene DNase I footprints can be used to directly identify transcription factor binding sites, and thereby infer their motifs. We found previously unknown binding sites in the genome for well-characterized factors, and observed other annotated binding sites that appear not to be protected from nuclease digestion under our conditions. This approach has the potential to characterize the transcriptional regulatory network of a poorly characterized organism given only its genome sequence.
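The notion of a footprint, a short stretch protected from cleavage and flanked by heavily cut DNA, can be sketched in Python as follows. This is a simplified illustration and not the scoring method of the published study: the per-base cleavage counts are invented, and the ratio-based score, window width, flank size and pseudocount are all assumptions.

```python
def footprint_score(cuts, start, width, flank):
    """Ratio of mean flanking cleavage to mean cleavage inside a candidate site.

    cuts: per-base DNase I cleavage counts along a stretch of genome.
    A high score means the window is protected relative to its flanks.
    """
    center = cuts[start:start + width]
    left = cuts[max(0, start - flank):start]
    right = cuts[start + width:start + width + flank]
    flanks = left + right
    if not center or not flanks:
        return 0.0
    center_mean = sum(center) / len(center)
    flank_mean = sum(flanks) / len(flanks)
    return flank_mean / (center_mean + 1.0)  # pseudocount avoids division by zero

def best_footprint(cuts, width, flank):
    """Return the start index of the highest-scoring window with full flanks."""
    candidates = range(flank, len(cuts) - width - flank + 1)
    return max(candidates, key=lambda s: footprint_score(cuts, s, width, flank))

# Hypothetical cleavage counts: positions 6-10 are protected.
cuts = [2, 3, 9, 8, 10, 9, 1, 0, 0, 1, 0, 9, 10, 8, 9, 3, 2]

if __name__ == "__main__":
    start = best_footprint(cuts, width=5, flank=4)
    print("protected window starts at index", start)
```

The sequence under each high-scoring window could then be collected and compared across the genome to infer binding motifs.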
Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, Fields S, Stamatoyannopoulos JA. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 2009 Apr;6(4):283-9.
•Jay Hesselberth & Zhihong Zhang (former lab members)
Drn1 is a Novel Debranching Enzyme-Associated Nuclease with a Role in Intron Turnover
The turnover of introns spliced from pre-mRNA occurs first by debranching the lariat intron followed by destruction of the linear intron by other nucleases. We have identified a novel component of intron turnover, DRN1 (YGR093W). Using yeast two-hybrid screens, we found that Drn1 interacts with the debranching enzyme, Dbr1, and another spliceosomal component, Syf1. Sequence alignments revealed that Drn1 is homologous to the metallophosphoesterase domain of Dbr1, and Drn1 has RNA endonuclease activity in vitro. Deletion of DRN1 results in the accumulation of lariat introns spliced from some ribosomal protein genes. We have identified genetic interactions between DRN1 and mutant alleles of PRP43, suggesting that Drn1 plays a role in the turnover of the spliceosomal complexes containing lariat introns. Intriguingly, the subset of Drn1-affected introns use RNA structural elements to stabilize conformations productive for splicing. We propose a model in which the nuclease activity of Drn1 is required for the efficient turnover of these large, structured introns, whose hyperstability may hinder the dissociation of lariat intron complexes by the spliceosomal DEAD/H-box ATPases.
•Jay Hesselberth & (former lab member)
Yeast Protein Interaction:
We constructed an array of ~6000 yeast transformants, each designed to express one of the S. cerevisiae open reading frames as a fusion to the Gal4 activation domain (AD). Using robotics, we can carry out a genomewide two-hybrid screen. A yeast strain expressing the Gal4 DNA-binding domain (DBD) fused to any protein of interest is mated to the transformants in the array. Diploid cells are selected, pinned onto media selective for the two-hybrid interaction (-histidine) and scored for growth. Only cells that express interacting proteins should grow on the selective plates. The identity of interacting proteins is revealed by the positions of His+ colonies in the array. The initial description of the array, along with a collaborative effort by CuraGen, Inc. to carry out high throughput two-hybrid screens of yeast proteins, is described in Uetz et al. (2000).
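Because each array position holds a known transformant, reading out a screen amounts to a lookup from colony position to ORF. The Python sketch below illustrates this under assumed conditions: the 8 x 12 plate layout, the ORF names and the function names are hypothetical and do not describe the actual format of the array.

```python
import string

ROWS, COLS = 8, 12  # hypothetical 96-position plate layout

def build_array_map(orf_names, rows=ROWS, cols=COLS):
    """Assign each ORF, in order, to a (plate, row letter, column) position."""
    per_plate = rows * cols
    layout = {}
    for i, orf in enumerate(orf_names):
        plate, offset = divmod(i, per_plate)
        row, col = divmod(offset, cols)
        layout[(plate + 1, string.ascii_uppercase[row], col + 1)] = orf
    return layout

def decode_hits(layout, his_plus_positions):
    """Translate positions of His+ colonies into the ORFs arrayed there."""
    return [layout[pos] for pos in his_plus_positions if pos in layout]

# Hypothetical ORF names standing in for the ~6000 activation-domain fusions.
orfs = ["ORF%04d" % n for n in range(1, 201)]
layout = build_array_map(orfs)

if __name__ == "__main__":
    print(decode_hits(layout, [(1, "A", 1), (2, "A", 5)]))
```

Positions with no arrayed transformant are simply skipped, so a stray colony outside the layout does not produce a spurious hit.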
*The data set can be downloaded here*
To date we have screened over 1000 yeast and several non-yeast proteins against the array. We are continuing this work on a collaborative basis through the Yeast Resource Center (YRC). Researchers with a specific gene of interest can contact us via the YRC website to request a collaboration. The gene of interest needs to be cloned into an appropriate two-hybrid DBD vector that we can provide. The identification of putative protein-protein interactions should further guide investigation of function.
In addition, we are working towards the development of a ‘dual-reporter’ Y2H system that should allow us to eliminate technical false positives that do not arise from a true interaction between the bait and prey proteins. Unlike the current 2-hybrid reporter system, this second reporter is not based on transcriptional activation, but rather on the reconstitution of a protein whose function can be selected or screened for. N-terminal and C-terminal fragments of proteins such as GFP and luciferase have been previously shown to function in split-reporter systems. We are in the process of fusing these domains to a well-established yeast two-hybrid protein pair and are currently testing different combinations in order to identify the optimal conditions that will allow us to distinguish between true and false positives.
Other publications describing yeast protein interactions that include work from the laboratory are listed here.
How the two-hybrid system works
How to clone 6000 ORFs
Primers • Vectors: pOBD2 and pOAD • Yeast strains • Recombination Cloning Protocols
•Sara Cooper and Sven Nelson (former lab members)
Dual-Reporter 2-Hybrid Assay:
Although the two-hybrid assay has been useful in discovering protein-protein interactions, a major disadvantage of the approach is the large number of false positives that arise. Technical false positives activate the transcription of the HIS3 reporter gene but do not represent physical interaction of the two proteins (“bait” and “prey”). To address this issue, we have been working on incorporating a second reporter into the existing system. The second reporter relies on the re-association of the firefly luciferase protein, which has been split into two halves that are fused to the bait and prey constructs. With a true bait and prey protein-protein interaction, the two halves of luciferase are brought into proximity and should reconstitute enzyme activity. Thus, true physical interactions should be both transcription positive and luminescence positive, whereas false positives would yield only the transcriptional signal.
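The decision logic of the dual-reporter scheme can be written out as a short Python sketch. The function name, the luminescence units and the three-fold-over-background cutoff are assumptions made for illustration; the text does not specify how a luminescence signal would be called positive.

```python
def classify_pair(grows_without_histidine, luminescence, background, fold_cutoff=3.0):
    """Combine the transcriptional (HIS3) and split-luciferase readouts.

    grows_without_histidine: True if the diploid grows on -histidine medium.
    luminescence, background: signal for the pair and for a no-interaction control.
    fold_cutoff: hypothetical fold-over-background threshold for a positive call.
    """
    transcription_positive = bool(grows_without_histidine)
    luminescence_positive = luminescence >= fold_cutoff * background
    if transcription_positive and luminescence_positive:
        return "likely true interaction"
    if transcription_positive:
        return "likely technical false positive"
    if luminescence_positive:
        return "luminescence only (unresolved)"
    return "no interaction detected"

if __name__ == "__main__":
    print(classify_pair(True, luminescence=900.0, background=100.0))
    print(classify_pair(True, luminescence=120.0, background=100.0))
```

Only pairs that score in both assays would be carried forward as candidate physical interactions.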
Initial tests revealed that the two halves of luciferase were capable of self-associating even in the absence of interacting bait and prey proteins. Splitting of the luciferase protein also led to a large decrease in luminescence upon re-association of the two halves, compared to the intact protein. To address these issues, I performed random mutagenesis on the luciferase gene in order to identify mutations that could either boost luminescence or decrease self-association. Mutations from these two classes were then coupled and the resulting constructs are being tested on several known bait/prey protein pairs as well as on yeast prey proteins that are known false positives. Validation of this new system will require a two-hybrid screen of a library to be performed, yielding results that are enriched in true positives.
Random Addressed Protein Arrays
In the last decade, sequencing technologies have seen prodigious improvements in throughput and cost, yet protein activity and enzymatic assays, and hence our ability to query gene function, have not enjoyed concomitant advances. We are interested in applying high-throughput sequencing technologies to generate an array of clusters containing DNA templates for in vitro transcription/translation (IVTT) reactions. Our goal is to identify each template in the array by sequencing and then generate a high-density protein array by capturing in vitro translated proteins proximal to the template cluster.
Towards this goal we have established methods for clonal amplification of DNA on solid surfaces (solid-phase amplification and emulsion amplification), in vitro transcription and translation methods using home-made cellular extracts, and are working on strategies for capturing proteins on surfaces.
Notably, an array platform containing millions of features will allow an increase in throughput three orders of magnitude above that of current protein arrays. Such increased throughput will enable experiments aimed at mapping protein interactions, generating enzyme activity profiles with activity-based probes and identifying novel enzyme activities to be performed at new scales. The ability to perform massively parallel protein assays should open novel research opportunities in proteomics, metagenomics and directed protein evolution. In addition, such development would also show how high-throughput sequencing platforms can be integrated into new technologies for functional assays, with end-points distinct from sequencing reads.
•Carlos Araya and Doug Fowler (former lab members)
Malaria Protein Arrays
Malaria is one of the world’s most devastating human diseases, affecting an estimated 500 million people and resulting in 2.5 million deaths globally each year. It is caused by four species of the protozoan parasite, Plasmodium, with the most deadly being Plasmodium falciparum. Progress in understanding malaria has been limited by the parasite’s complex life cycle and by difficulties expressing and purifying functional P. falciparum proteins in heterologous systems. With the completion of the P. falciparum genome sequence, a useful tool would be a protein array in which individual parasite proteins are displayed and available for functional assays.
To prepare a P. falciparum protein array, we are using 1000 P. falciparum open reading frames generated by the Structural Genomics of Pathogenic Protozoa (SGPP) consortium. These ORFs will be translated in vitro and printed onto glass slides. Our goal is to use these protein arrays in assays that benefit from rapid, simultaneous and sensitive screening of large numbers of proteins. For example, antigen profiling assays can identify P. falciparum proteins that govern host immune responses against the pathogen. Another application for the array is the biochemical characterization of the “hypothetical” proteins that represent more than 60% of the P. falciparum genome, many of which are unique to P. falciparum. To characterize these proteins we will profile arrayed proteins with fluorescent substrate profiling probes.
•Anastasia Gridasova (former lab member)
Identifying Protein Targets of Ubiquitin Ligases
Ubiquitin (Ub) is a 76 amino acid protein that when attached to a target protein can alter its fate in multiple ways. Ub is an essential signaling molecule in nearly every pathway in eukaryotic cells. The E1 ubiquitin activating enzyme, E2 ubiquitin conjugating enzymes and E3 ubiquitin ligases act in concert in order to covalently attach a Ub moiety to the epsilon amine on a lysine side chain or less commonly, the free amino group at the N-terminus of a substrate protein via an isopeptide bond. The E3 gives the enzyme cascade substrate specificity. There are 35 E3s in S. cerevisiae and nearly 1000 putative E3s in the human genome. Many proteins important for human disease are E3s, such as the breast cancer-specific tumor suppressor BRCA1 (breast cancer-1) and the early onset-Parkinson’s disease protein Parkin. Therefore, knowing the specific protein substrates of individual E3 enzymes would be of great importance in medicine.
One approach I am taking to determine specific substrates of E3s is as follows:
Individual E3 ubiquitin ligases will be fused to the bacterial biotin ligase, BirA, which attaches biotin to proteins that contain a biotin-acceptor peptide. A second fusion protein will be generated where ubiquitin is fused to a biotin-acceptor peptide fragment. In this arrangement, when the biotin-acceptor/ubiquitin fusion protein is brought to the E3 ubiquitin ligase/BirA fusion protein, the attached BirA will biotinylate the ubiquitin, thus marking it as having been acted on by that specific ubiquitin ligase. Substrate proteins to which the biotinylated ubiquitin is subsequently attached can then be purified using streptavidin chromatography and identified by mass spectrometry. Applying this approach to each individual E3 ubiquitin ligase will provide a critically needed proteome-wide view of ubiquitin ligases and their substrates. Ultimately, I would adapt the system for the study of ubiquitin ligases in mammalian tissue culture cells.
•Lea Starita & Russell Lo
Genomewide Identification of Spliced Introns Using a Tiling Microarray
One hallmark of eukaryotic gene structure is the presence of introns, which are spliced out of pre-mRNAs prior to translation. Introns excised from pre-mRNA molecules by the spliceosomal machinery are released in the form of lariats, in which the 5’ end of the intron RNA is linked via a phosphodiester bond to the 2’ hydroxyl of an internal adenosine residue. Lariats must be debranched by a 2’-5’ phosphodiesterase prior to their turnover. In the absence or knockdown of the debranching enzyme, these lariat RNAs accumulate. We have carried out a genomewide identification of spliced introns in Saccharomyces cerevisiae using a genomic tiling microarray to compare total RNA between DBR+ and dbr1 strains. This approach identified 141 of 272 known introns, confirmed three previously predicted introns, predicted four novel introns (of which two were experimentally confirmed), and led to the reannotation of four others.
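The comparison between strains can be pictured as a search for runs of adjacent tiling probes whose signal is higher in the dbr1 strain than in the DBR+ strain. The Python sketch below is an assumption-laden illustration rather than the published analysis: the probe intensities are invented, and the log2-ratio threshold, minimum run length and pseudocount are hypothetical parameters.

```python
from math import log2

def enriched_segments(dbr1_signal, wt_signal, min_log_ratio=1.0, min_probes=3):
    """Find runs of consecutive probes enriched in the dbr1 strain.

    dbr1_signal, wt_signal: probe intensities in genomic order for the dbr1
    and DBR+ strains. Returns (start, end) probe index ranges, end exclusive.
    """
    n = min(len(dbr1_signal), len(wt_signal))
    segments, start = [], None
    for i in range(n):
        # A pseudocount of 1 keeps the ratio defined for zero intensities.
        ratio = log2((dbr1_signal[i] + 1.0) / (wt_signal[i] + 1.0))
        enriched = ratio >= min_log_ratio
        if enriched and start is None:
            start = i
        elif not enriched and start is not None:
            if i - start >= min_probes:
                segments.append((start, i))
            start = None
    if start is not None and n - start >= min_probes:
        segments.append((start, n))
    return segments

# Hypothetical intensities: probes 3-6 lie in an intron that accumulates in dbr1.
wt = [100, 110, 95, 20, 25, 22, 18, 105, 98]
dbr1 = [105, 100, 99, 90, 120, 110, 95, 100, 102]

if __name__ == "__main__":
    print(enriched_segments(dbr1, wt))
```

Requiring several consecutive enriched probes reduces calls caused by a single noisy probe.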
DBR homologs and DBR-mediated lariat degradation were also found in other organisms. Currently, we are working on adapting the tiling array approach for genome-wide identification of introns in Drosophila and human cells. It has been reported that knockdown of the debranching enzyme in Drosophila via RNAi can cause lariat stabilization. We also applied this approach to human cell cultures and observed a similar but modest effect. We are now testing different RNAi knockdown approaches in both organisms to improve the efficiency of lariat accumulation. Analysis of lariat accumulation in these complex organisms will not only contribute to their genome annotation, but also extend our understanding of regulated and alternative splicing in these species.
•Zhihong Zhang & Jay Hesselberth (former lab members)
Zhang Z, Hesselberth JR, Fields S. Genome-wide identification of spliced introns using a tiling microarray. Genome Res. 2007 Mar 9.
We are interested in turning the process of recombination to our ends, using it to facilitate gene therapy and genome engineering. To this end, we have developed a system that allows for the selection of yeast in which two overlapping parts of the selectable marker for kanamycin resistance (KanR) are brought together. Our system features a genomic recipient locus and a plasmid donor construct (Fig. 1). Recombination between these elements results in the reconstitution of a functional KanR marker when the elements are united through the mating of two yeast strains, each of which carries one of the two elements. We have made and tested a variety of donor structures (Fig. 2), together with different effector proteins that localize to the two DNA elements of the system.
The greatest total efficiency of conversion to the KanR phenotype was obtained with the donor construct having the most homology to the recipient locus. We are therefore using this construct to screen a random yeast genomic GAL4 library to isolate peptides that promote homologous repair. To do this, UAS GAL sequences have been cloned into the middle of the recipient locus and on the flanks of the longest donor construct we currently possess; yeast expressing peptides that promote HR when bound to the donor or recipient DNA will more frequently convert to the KanR phenotype. As genome engineering will most likely feature exogenously created linear donor DNA and may also utilize lesion-targeting endonucleases, we are employing the homing endonuclease I-SceI to introduce dsDNA breaks in both the donor and recipient DNA. While this will produce a background, we will screen serially derived libraries for clones enriched by repeated selection for the ability to promote HR.
•Clem Stanyon (former lab member)
The budding yeast Saccharomyces cerevisiae serves as a useful organism for studying factors that determine cellular longevity. The aging of mitotically active cells in higher eukaryotes can be modeled by the replicative life span of yeast mother cells, whereas aging of post-mitotic cells more closely resembles the chronological survival of quiescent yeast during stationary phase (Figure 1). We are interested in using high-throughput technologies to identify and characterize genes that modify both aspects of cellular life span.
Measurement of yeast replicative life span requires micromanipulation of daughter cells away from mother cells following each mitotic cycle. The time-consuming nature of this assay has precluded large-scale analyses of replicative aging. In collaboration with Dr. Brian Kennedy (Department of Biochemistry, University of Washington), we have developed a method to allow semi-quantitative measurement of replicative life span based on the aging properties of a small number of cells. To date, we have determined the replicative life span phenotypes for approximately 20% (~1000 strains) of the ORF deletion collection. Completion of this analysis, in collaboration with the Kennedy lab, should take approximately 2 years.
Based on our analysis to date, we have already made several important discoveries, including the surprising finding that life span extension by calorie restriction (CR) does not require the NAD-dependent histone deacetylase, Sir2. The Sir2-independent nature of CR is demonstrated in two ways: first, calorie restriction and overexpression of Sir2 increase life span additively, and second, CR increases life span to a greater extent in cells lacking Sir2 (and Fob1) than in wild type cells (Figure 2).
Kaeberlein, M., Kirkland, K.T., Fields, S. and Kennedy, B.K. (2004) Sir2-independent life span extension by calorie restriction in yeast. PLoS Biology Sep;2(9):E296.
We have also determined that, contrary to a prior model proposed by Lin, Guarente, and colleagues, CR does not increase yeast life span by enhancing respiration. Yeast cells completely lacking mitochondrial DNA either have a normal life span or a dramatically shortened life span. In both cases, however, CR dramatically enhances longevity, demonstrating that respiration is not required for life span extension by CR.
Kaeberlein M, Hu D, Kerr EO, Tsuchiya M, Westman EA, Dang N, Fields S, Kennedy BK. Increased Life Span due to Calorie Restriction in Respiratory-Deficient Yeast. PLoS Genet. 2005 Nov 25;1(5):e69.
Of the first 564 single-gene deletion strains examined, 14 show a significant increase in replicative life span relative to the parental strain. This set includes two overlapping ORFs along with several genes that code for proteins with functions related to the nutrient responsive kinases Tor and Sch9. Of particular interest is the finding that three genes involved in ribosome biogenesis (a Tor and Sch9-regulated process) were among our set of long-lived deletion strains: REI1, RPL31A, and RPL6B. Rpl31a and Rpl6b are protein components of the large ribosomal subunit and Rei1 is a protein of unknown function that we have determined plays a role in large subunit biogenesis. This has led us to propose a model whereby CR increases life span by decreasing Tor and Sch9 activity, which results in decreased ribosome biogenesis and translation (Figure 3).
Kaeberlein M, Powers RW 3rd, Steffen KK, Westman EA, Hu D, Dang N, Kerr EO, Kirkland KT, Fields S, Kennedy BK. Regulation of yeast replicative life span by TOR and Sch9 in response to nutrients. Science. 2005 Nov 18;310(5751):1193-6.
To examine post-mitotic survival at high throughput, we have developed a method that allows the simultaneous, highly quantitative determination of chronological life span for several thousand yeast strains (Figure 4). We have used this technology to screen the ORF deletion collection for genes whose deletion affects chronological aging. From this analysis, we have identified several genes (Table 1) implicated in the TOR pathway that extend chronological life span when deleted (Figure 5). The TOR proteins are highly conserved from yeast to humans and promote cellular growth in response to nutrients, especially amino acids. We have found that limitation of amino acids in the media, or pharmacological inhibition of TOR using rapamycin or methionine sulfoximine (MSX) (Figure 6), can extend chronological life span, similar to deletion of TOR pathway components. Additionally, many of these interventions correlate with an increased nuclear accumulation of the stress-responsive transcription factor Msn2, and a resistance to heat and oxidative stresses. We propose a model by which decreased TOR activity up-regulates the activity of stress-response transcription factors (including Msn2) and thus promotes longevity (Figure 7).
•Trey Powers and Matt Kaeberlein (former lab members)
Powers RW 3rd, Kaeberlein M, Caldwell SD, Kennedy BK, Fields S. Extension of chronological life span in yeast by decreased TOR pathway signaling. Genes Dev. 2006 Jan 15;20(2):174-84.
Plasmodium Protein-Protein Interaction Project
Plasmodium falciparum is a mosquito-borne protozoan parasite responsible for the most severe form of malaria. Over 500 million people worldwide are afflicted with malaria, and each year more than one million people, most of them children, die from these infections. Despite the importance of malaria in global health, much remains to be discovered about the molecular biology of these pathogens. Of the ~5,300 proteins predicted from the P. falciparum genome sequence, 60% are classified as hypothetical; this designation means that they have never been studied in Plasmodium and do not have sufficient similarity to characterized proteins in other organisms to allow functional assignments to be made. To begin to understand the functions of these novel proteins, we have identified a large number of protein-protein interactions using high-throughput yeast two-hybrid searches with protein fragments derived from genes expressed in the intraerythrocytic stages of P. falciparum. In collaboration with Prolexys Pharmaceuticals, we performed over 32,000 searches and we identified 2,846 interactions involving 1,308 proteins, which corresponds to approximately a quarter of the proteins predicted from the P. falciparum genome. We identified clusters of interacting proteins likely involved in important processes for the survival and infectivity of the parasite, such as gene regulation and host cell invasion. A large fraction of our interactions involve uncharacterized proteins and thus could lead to a new understanding of the functions of those proteins.
In addition to this, we have performed 10,000 searches with P. falciparum baits against human activation domain libraries, and 11,000 searches with P. vivax baits against P. vivax activation domain libraries. We are currently analyzing these datasets.
This work was funded in part by the Structural Genomics of Pathogenic Protozoans effort led by Wim Hol in the Department of Biochemistry.
•Doug LaCount and Marissa Vignali (former lab members)
Lacount DJ, Vignali M, Chettier R, Phansalkar A, Bell R, Hesselberth JR, Schoenfeld LW, Ota I, Sahasrabudhe S, Kurschner C, Fields S, Hughes RE. A protein interaction network of the malaria parasite Plasmodium falciparum. Nature. 2005 Nov 3;438(7064):103-7.
A Network of WW Domain Interactions Constructed Using Protein Microarrays
We used protein microarray technology to generate a protein interaction map for twelve of the thirteen WW domains present in proteins of the yeast Saccharomyces cerevisiae (Example figure). We observed a total of 1,158 interactions with these 12 domains, most of which have not previously been described. We analyzed the representation of functional annotations within the network, identifying enrichments for proteins with vacuolar and peroxisomal localization, as well as proteins involved in cofactor biosynthesis and protein turnover. The conservation of primary sequence motifs known to be recognized by WW domains was analyzed in the context of the network, and a comparative genomics approach used to dissect the occurrence of such motifs within the dataset. We analyzed the PY (Pro-Pro-Xaa-Tyr) motif in detail, and propose a novel consensus for the motif based on its conservation among orthologs of the interacting protein. The comparative approach revealed that one of the WW domain-containing proteins has an evolutionarily conserved PY motif, possibly indicating a role for WW domain multimerization in the propagation of signals derived from WW domain binding events.
•Jay Hesselberth and John Miller (former lab members)
Hesselberth JR, Miller JP, Golob A, Stajich JE, Michaud GA, Fields S. Comparative analysis of Saccharomyces cerevisiae WW domains and their interacting proteins. Genome Biol. 2006 Apr 10;7(4):R30.
Yeast Membrane Protein Array
We have applied the split-ubiquitin system originally described by Johnsson and Varshavsky (1994) to investigate interactions between integral membrane proteins. In brief, one protein is fused to the N-terminal half of ubiquitin (N-Ub) and a second protein is fused to the C-terminal half of ubiquitin (C-Ub). If the membrane proteins exist in close proximity, they bring the two halves of ubiquitin back together, and endogenous ubiquitin C-terminal hydrolases recognize this “reconstituted” ubiquitin and cleave the peptide bond following the last amino acid residue of ubiquitin. In the modified system of Stagljar et al (1998) there is a transcription factor fused to this residue and, upon cleavage, the transcription factor is released from the membrane to enter the nucleus to activate reporter genes.
We have generated a collection of 705 yeast proteins that are annotated as being in an “integral membrane” environment (643 proteins) along with proteins having amino acid homology to these (62). These proteins were made both as fusions with N-Ub, and as fusions to C-Ub with the transcription factor (C-UbPLV).
We tested our transformants for the successful insertion of an in-frame ORF into our C-UbPLV by screening them for an interaction with a generic “positive control”, an N-UbI fusion protein. Of the 705 proteins generated as C-UbPLV fusions, 365 showed an interaction with this wild-type version, N-UbI, which does not require a protein-protein interaction to bind to C-Ub. This result suggests that the 365 fusions bear an insert; the insert is in frame with the C-UbPLV moiety; there are no nonsense mutations in the insert; and the fusion protein is oriented such that the COOH-terminus bearing the fusion with C-UbPLV is exposed to the cytoplasm.
These 365 integral membrane proteins were screened for interactions against the full set of 705 proteins fused to N-UbG, a mutant form of the N-Ub with an isoleucine to glycine mutation at position 13 (Johnsson and Varshavsky, 1994). The pair-wise interactions between these 365 proteins and the 705 N-Ub fusion proteins were assayed in a manner similar to the two-hybrid study of Uetz et al. (2000). A set of 1985 putative interactions between 463 N-UbG fusions and 270 of the C-UbPLV fusions was found. This number of interactions per protein (~8 on average) is likely high due to the false-positive rate associated with this assay (the magnitude of which is not known). These false-positives are likely to result from the high effective concentration of integral membrane proteins due to both sequestration to a two-dimensional lipid bilayer, and co-transport of membrane proteins along the stages of the secretory pathway; additionally, over-expression of the fusion proteins from an episomal plasmid with an ectopic promoter will likely promote some non-physiological interactions.
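As a rough illustration of the bookkeeping behind an array screen of this kind, the sketch below tallies a list of positive (C-UbPLV, N-UbG) pairs. The function name and the toy pairs are hypothetical; only the array dimensions (365 × 705) and the total of 1985 putative interactions come from the text.

```python
from collections import Counter


def summarize_screen(n_baits, n_preys, interactions):
    """Summarize an array screen from a list of positive (bait, prey) pairs.

    "Bait" here stands for a C-UbPLV fusion and "prey" for an N-UbG fusion;
    these labels are a naming assumption for the sketch.
    """
    pairs = set(interactions)  # a pair scored more than once is counted once
    tested = n_baits * n_preys
    bait_degree = Counter(bait for bait, _ in pairs)
    prey_degree = Counter(prey for _, prey in pairs)
    return {
        "pairs_tested": tested,
        "positives": len(pairs),
        "hit_fraction": len(pairs) / tested,
        "baits_with_hits": len(bait_degree),
        "preys_with_hits": len(prey_degree),
        "mean_hits_per_positive_bait": (
            len(pairs) / len(bait_degree) if bait_degree else 0.0
        ),
    }


# Toy data with made-up protein labels (3 baits arrayed against 4 preys).
toy = [("baitA", "preyX"), ("baitA", "preyY"), ("baitB", "preyX"), ("baitA", "preyX")]
print(summarize_screen(3, 4, toy))

# Size of the search space described in the text.
print(365 * 705)                    # 257325 pair-wise tests
print(round(1985 / (365 * 705), 4)) # 0.0077, i.e. under 1% of tested pairs
```

Seen this way, the 1985 putative interactions represent well under 1% of the pairs tested, even before false positives are removed.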
An advantage of the membrane-based yeast two-hybrid system is that interactions between proteins can be detected at the physiological site of the interaction. This allows the observation of interactions occurring between proteins of most if not all subcellular membranous compartments (Figure 1, new window). However, we also observe interactions between proteins whose native localization is to distinct compartments, and the system by itself is not informative with regard to the location within the cell of the interaction. In some instances (e.g., potentially the Mst27 and Tna1 interaction in the figure) the interaction may occur in early compartments of the secretory pathway, despite the mature proteins involved having disparate ultimate destinations. In other cases the observed interaction is likely to be non-physiological, and results from mislocalization of one or both of the proteins to an inappropriate compartment due to the fusion moiety.
In order to characterize our dataset with the goal of isolating those interactions that are more probably true-positives, we collaborated with Asa Ben-Hur in William Stafford Noble's group. The approach was to use a learning algorithm, the support vector machine (SVM), to classify the interactions based on the statistics of the assay as well as other datasets from the literature (e.g., synthetic-lethality, localization studies, Gene Ontology annotations, etc.). The SVM was trained using interactions found in this study that are also identified by independent experiments as examples of "true-positives". In addition to these 34 interactions, we included 22 that are supported by one computational approach (Deane et al., 2002), and 7 by a different computational analysis (Jansen et al., 2003). These 63 interactions constitute our highest confidence interactions, and are used by the SVM to identify features of the remaining interactions that support their classification as true interactions.
An interesting outcome from multiple SVM analyses is that 138 interactions are always classified as true positives by the algorithm, and 939 are never classified as true (Figure 2, new window). We will therefore examine the interactions consistently predicted to be true as well as the intermediate interactions to identify potential physiological interactions. A comparison of the features of the 138 interactions that are always classified as true with those of the 138 "worst" interactions that are never classified as true shows that the SVM selects, for most features, the values that would be expected to indicate more physiological interactions as shown in the heat map (Figure 3, new window).
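The grouping into "always", "never" and intermediate interactions can be sketched as a simple tally over repeated classification runs. This is a minimal illustration that assumes each run yields a true/false call per interaction; the function name and the toy calls are hypothetical, and the SVM training itself is not reproduced here.

```python
def consensus_classes(run_predictions):
    """Group interactions by how consistently they are called true.

    run_predictions: list of dicts, one per classification run, mapping an
    interaction (a pair of protein names) to True (called a true positive)
    or False.
    """
    interactions = set().union(*run_predictions)
    always, never, intermediate = [], [], []
    for pair in sorted(interactions):
        votes = [run[pair] for run in run_predictions if pair in run]
        if all(votes):
            always.append(pair)
        elif not any(votes):
            never.append(pair)
        else:
            intermediate.append(pair)
    return always, never, intermediate


# Three toy runs over three made-up interactions.
runs = [
    {("A", "B"): True, ("A", "C"): False, ("B", "D"): True},
    {("A", "B"): True, ("A", "C"): False, ("B", "D"): False},
    {("A", "B"): True, ("A", "C"): False, ("B", "D"): True},
]
always, never, intermediate = consensus_classes(runs)
print(always)        # [('A', 'B')]
print(never)         # [('A', 'C')]
print(intermediate)  # [('B', 'D')]
```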
•John Miller (former lab member) and Russell Lo
Miller J.P., Lo R.S., Ben-Hur A, Desmarais C, Stagljar I, Stafford Noble W, Fields S. Large-scale identification of yeast integral membrane protein interactions. Proc Natl Acad Sci U S A. 2005 Aug 23;102(34):12123-12128.
Specific Interests of the Yeast 2-Hybrid Array Screening
We were interested in using functional genomics tools to decipher two key biological processes that are dysregulated in cancer: chromosome segregation and chromatin modification. Focusing genome-wide tools on specific biological processes has several benefits, including: 1) bringing an unbiased approach to investigate the process; 2) discovering new players involved in a process; and 3) providing information to model the process. A further benefit of focused functional genomics is that we can come much closer to extracting information that is saturating for a particular method.
The core tool we used is the genome-wide two-hybrid array technology that has been successfully used in various projects in the lab and in collaboration with other labs. The projects that we have outlined below were of high interest.
Comprehensive two-hybrid interaction map of spindle-associated proteins:
The protein interaction information for kinetochore and associated spindle proteins is extensive but far from complete. Using the two-hybrid system, we started the comprehensive analysis of protein-protein interactions involving kinetochore and spindle proteins in collaboration with the Drubin and Barnes labs (UC Berkeley). These interactions are uncovering novel connections within the kinetochore and other cellular pathways.
Chromatin modification dependent protein interaction map:
Chromatin is subject to a variety of modifications and the specificity these modifications impart upon protein binding is still poorly understood. In collaboration with Min-Hao Kuo (Michigan State), we were working towards charting protein-protein interactions that are dependent on chromatin modifications. The combination of high-throughput techniques such as the tethered catalysis two-hybrid system and selective experiments will help elucidate the function of the various types of chromatin modifications.
Guo, D., Hazbun, T.R., Xu, X.-J., Ng, S.-L., Fields, S. and Kuo, M.-H. (2004) A tethered catalysis two-hybrid system to identify protein-protein interactions requiring post-translational modifications. Nature Biotechnology Jul;22(7):888-892.
Yeast Unknown ORF Project:
In a collaborative effort, the two-hybrid array technology was used in conjunction with groups affiliated with the YRC to decipher the roles of 100 essential and uncharacterized yeast genes (Hazbun et al., 2003). The integration of two-hybrid data with data from additional protein-based technologies such as co-purification and mass spectrometry, localization, and protein structure prediction enabled the functional annotation of a large fraction of these yeast genes. The parallel analysis of genes by four complementary technologies has also enhanced our understanding of the properties of the two-hybrid genome-wide array. The overlap between the protein-protein interactions identified by mass spectrometry and those identified by two-hybrid was very low, although both predicted similar cellular roles that agreed with localization data and protein structure prediction. The two-hybrid interactions tended to occur between proteins that were annotated in more broadly related biological processes, resulting in GO term assignments (Table 1, new window) that were lower in resolution than the mass spectrometry-based terms. This possibly reflects the tendency of two-hybrid to identify interactions with proteins in related biological processes that are not necessarily part of a core complex. For example, protein interactions identified by mass spectrometry for two unknown proteins suggested a role for inter-related complexes (Figure 1, new window) in DNA repair, whereas two-hybrid interactions suggested a role in DNA repair as well as links with other biological processes such as chromosome segregation, sumoylation and ubiquitination.
•Tony Hazbun (former lab member)
Hazbun, T.R., Malmström, L., Anderson, S., Graczyk, B.J., Fox, B., Riffle, M., Sundin, B.A., Aranda, J.D., McDonald, W.H., Chun, C., Snydsman, B.E., Bradley, P., Muller, E.G.D., Fields, S., Baker, D., Yates, J.R. III and Davis, T.N. (2003) Assigning function to yeast proteins by integration of technologies. Molecular Cell 12:1353-1365.
A Yeast Screen for P. falciparum Mefloquine Resistance Genes
Mefloquine is an effective antimalarial drug. Unfortunately, resistant strains of Plasmodium falciparum are beginning to arise. The P. falciparum multi-drug resistance gene (Pfmdr1) encodes an ABC transporter that is often altered in mefloquine-resistant strains and is presumed to act as a drug efflux pump. Little else is known about the parasite's mefloquine resistance mechanisms. However, mefloquine-resistant strains have been reported that contain no Pfmdr1 alterations, suggesting that additional genes are involved in mefloquine resistance. As the yeast Saccharomyces cerevisiae is sensitive to mefloquine, I have used it to screen for P. falciparum genes that can confer increased mefloquine resistance. Yeast was transformed with a P. falciparum cDNA library under the control of an S. cerevisiae galactose-inducible promoter, followed by selection on mefloquine. Several mefloquine resistance candidate genes were isolated in this screen. The four with the strongest phenotype were chosen for further analysis. These encode an uncharacterized multi-transmembrane-spanning protein, two small uncharacterized proteins, and a putative Rab GTPase activator. Each was analyzed for degree of mefloquine resistance and multidrug resistance. In addition, the mefloquine-resistant P. falciparum strain W2-Mef and its sensitive parent W2 were analyzed by semi-quantitative RT-PCR to determine if any of these candidate genes is upregulated in the resistant strain. One candidate was upregulated in this way, and it has been cloned for expression and drug testing in P. falciparum.
•Mara Jeffress (former lab member)
Jeffress M, Fields S. (2005) Identification of putative Plasmodium falciparum mefloquine resistance genes. Mol Biochem Parasitol. Feb;139(2):133-9.
Interactions of Human Toll-like Receptors
Toll-like receptor ‘sensor’ proteins are expressed in epithelia and antigen presenting cells. They are localized to the endoplasmic reticulum, the plasma membrane, and phagosome-lysosome membranes. The family of 10 human receptors, named TLR1 through TLR10, detects various microbial antigens or endogenous 'danger' signals, and subsequently triggers an ancient, highly conserved, innate immune response. TLRs are activated by ligand-induced oligomerization that recruits cytoplasmic signaling molecules to the receptors’ intracellular domains. Among the recruits are the MyD88 protein, and other adapters (TIRAP/Mal, TRIF/TICAM1, TRAM and SARM) that preferentially associate with certain activated receptors to impart some level of signaling specificity. All TLR cytoplasmic domains, as well as all identified signal adapters, harbor a canonical TIR (Toll/interleukin-1 receptor) domain that mediates protein-protein interactions. All ten TLRs, as well as the tumor necrosis factor receptor and some interleukin receptors, activate the NFkB transcription factor. The proteins that impart specificity to the Toll signal transduction pathways are not fully delineated (Figure 1, new window).
Experiments in mammalian cells usually measure TLR activation by expression of an NFkB-driven reporter gene. However, any given cell may express many different receptors that use MyD88 as an adapter to activate NFkB, and other pathways unrelated to the TLRs can also activate NFkB. In contrast, the simple eukaryote S. cerevisiae does not possess endogenous TLRs or any recognizable Toll signaling pathway; therefore in yeast we can directly test protein-protein interactions without interference from endogenous proteins or other signaling pathways. We are expressing human TLR TIR domains in yeast, and using the yeast two-hybrid system to study protein interactions in the signal transduction pathway.
We screened for new proteins that bind to the receptors’ cytoplasmic domains, and have found many new interacting proteins that appear to specifically associate with certain TLR cytoplasmic domains. In particular, we have found novel and specific interactions for the closely related group of TLRs 1, 2, 6, & 10 (Table 1, new window). These are candidate proteins that may affect signaling by TLR2 heterocomplexes.
We also performed a structure-function study of the TLR-MyD88 interaction. We mapped the amino acids required for TLR association with this ‘universal’ adapter protein by swapping pieces of TLR2 into the closely related TLRs 1, 6, and 10. This creates chimaeric proteins with new (MyD88-binding) function, and is allowing us to define the exact amino acid differences between MyD88-binding and non-binding TLRs (Figure 2, new window).
The TLR-TIR domains are homologous to each other but not identical; therefore, examining protein interactions with these domains will lend insight into interaction specificity and how structure relates to function for the human TLRs.
•Victoria Brown-Kennerly and Rachel Brown (former lab members)
Brown V, Brown RA, Ozinsky A, Hesselberth JR, Fields S. Binding specificity of Toll-like receptor cytoplasmic domains. Eur J Immunol. 2006 Mar;36(3):742-53.
Genetic Screening Using Leucine Zippers
We developed a novel type of genetic screening method to identify proteins that function in a common pathway or process. The screen takes advantage of the observation that cellular processes are often initiated when a signal or upstream event causes two or more proteins to physically interact (Figure 1, new window). These are usually part of a cascade of interactions that ultimately lead to the activation of the cellular process. We tested the idea that it might be possible to artificially force these interactions to occur and activate a process in the absence of its normal signal (i.e., cause a gain-of-function phenotype).
We artificially forced proteins to interact by fusing them to the leucine zippers from the mammalian Fos and Jun proteins (Figure 2, new window). Fos and Jun leucine zippers form a stable heterodimer that can act as a tether to bring the attached proteins into close physical proximity. Using GFP and proteins that occur in specific subcellular localizations, we have shown that Fos and Jun can cause proteins to co-localize in yeast (Figure 3, new window).
If artificial tethering mimics normal protein-protein interactions and recreates an activity that normally requires an upstream signal or event, it can then be used as a method to genetically screen for unknown members of a process (Figure 4, new window). We can tether every yeast protein to a known component of the pathway (i.e., by coexpressing the known protein as a fusion with the Jun leucine zipper and a library of all yeast proteins fused to Fos) and look for a phenotype associated with activation of the process under study. The process should only be activated when the normal components are tethered to the known one. Since these will be fused to the Fos leucine zipper on a plasmid, they can be easily identified.
A collection of all yeast proteins fused to the Fos leucine zipper might also be a useful reagent for tagging proteins. The tag can be added simply by introducing the tag (GFP, protein A, or any other polypeptide tag) fused to the Jun leucine zipper into yeast also expressing Fos fusions.
•Mike DeVit and Meg Branson (former lab members)
Devit M, Cullen PJ, Branson M, Sprague GF Jr, Fields S. (2005) Forcing interactions as a genetic screen to identify proteins that exert a defined activity. Genome Res. Apr;15(4):560-5.
Chemical Profiling of the Yeast Deletion Collection
Understanding the actions of drugs and toxins in a cell is of critical importance to medicine, yet many of the molecular events involved in chemical resistance are relatively uncharacterized. In order to identify the cellular processes and pathways targeted by chemicals, we took advantage of the haploid Saccharomyces cerevisiae deletion strains. Although ~4800 of the strains are viable, the loss of a gene in a pathway affected by a drug can lead to a synthetic lethal effect in which the combination of a deletion and a normally sublethal dose of a chemical results in loss of viability. We carried out genome-wide screens to determine quantitative sensitivities of the deletion set to four chemicals: hydrogen peroxide, menadione, ibuprofen, and mefloquine. Hydrogen peroxide and menadione induce oxidative stress in the cell, whereas ibuprofen and mefloquine are toxic to yeast by unknown mechanisms. Here we report the sensitivities of 659 deletion strains that are sensitive to one or more of these four compounds, including 163 multi-chemical sensitive strains, 394 strains specific to hydrogen peroxide and/or menadione, 47 specific to ibuprofen, and 55 specific to mefloquine. We correlate these results with data from other large-scale studies to yield novel insights into cellular function.
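The four categories reported above can be expressed as a simple rule over the set of chemicals to which a strain is sensitive. The sketch below is only illustrative: the category labels, the function name and the example profiles are hypothetical, and it assumes that "multi-chemical" means any profile that is not confined to the two oxidants or to a single non-oxidant drug.

```python
OXIDANTS = frozenset({"hydrogen peroxide", "menadione"})


def classify_strain(sensitive_to):
    """Assign a deletion strain to a category from its sensitivity profile."""
    hits = frozenset(sensitive_to)
    if not hits:
        return "not sensitive"
    if hits <= OXIDANTS:  # hydrogen peroxide and/or menadione only
        return "oxidant-specific"
    if hits == {"ibuprofen"}:
        return "ibuprofen-specific"
    if hits == {"mefloquine"}:
        return "mefloquine-specific"
    return "multi-chemical"


# Made-up sensitivity profiles for illustration.
print(classify_strain({"menadione"}))                       # oxidant-specific
print(classify_strain({"hydrogen peroxide", "menadione"}))  # oxidant-specific
print(classify_strain({"ibuprofen", "menadione"}))          # multi-chemical

# The category counts given in the text add up to the reported total.
reported = {
    "multi-chemical": 163,
    "oxidant-specific": 394,
    "ibuprofen-specific": 47,
    "mefloquine-specific": 55,
}
print(sum(reported.values()))  # 659
```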
•Chandra Tucker (former lab member)
Tucker, C.L. and Fields, S. (2004) Quantitative genome-wide analysis of yeast deletion strain sensitivities to oxidative and chemical stress. Comparative and Functional Genomics 5:216-224.