Elucidating a code for RNA sequence recognition
The ability to design a protein that can bind specifically to any RNA and regulate its fate would enable numerous research and therapeutic applications. However, decoding RNA-binding specificity for most types of RNA-binding domains is challenging, as these domains associate with RNA via complex networks of interactions. In addition, many RNA-binding domains associate with RNA by weak contacts with only a small number of bases (usually 4-6), which does not allow for the specificity and affinity required of an engineered domain.
However, a few RNA-binding domains bind by recognition mechanisms that make them ideal candidates for protein design. The modular architecture of the PUF RNA-binding domain is one such example (Figure 1). The PUF domain almost always contains 8 functional copies of a 36-amino-acid alpha-helical repeat. Each PUF repeat recognizes a single RNA base via three amino acids at conserved positions.
With its natural ability to bind selectively and with high affinity to its 8-mer recognition sequence, the PUF domain is an attractive candidate for a deep mutational scan, in which we elucidate the RNA-binding preferences of a large number of variants of one such domain. We can combine the yeast three-hybrid method with next-generation sequencing to score the binding of PUF domain variants to a cognate RNA sequence. In particular, we can score each mutated PUF repeat for its ability to bind to each of the 4 possible RNA bases. This approach will allow us to characterize the specificity and the affinity of each PUF repeat for any RNA base, and to progress towards uncovering a complete code for RNA recognition.
•Daniel Melamed & Christina Miller
Synonymous Variation and Fitness in Yeast
Variation in a gene’s synonymous codon usage can lead to subtle alterations in protein production and exert phenotypic consequences. However, contexts and mechanisms by which codon usage impacts translation are not well defined. We are pursuing two projects that employ high-throughput sequencing technology to investigate factors important to synonymous codon usage and protein production in eukaryotes.
In Project 1, we are using the yeast HIS3 gene, which encodes an enzyme required for the synthesis of histidine, as a model gene for examining the impact of codon usage on protein production and function. Having constructed a plasmid library of synonymous HIS3 variants, we subject yeast cells carrying these variants to a growth assay in media lacking histidine. During the course of this competitive growth assay, cells that efficiently and accurately translate the HIS3 transcripts increase in proportion within the population, whereas cells carrying transcripts with synonymous variation detrimental to translation decrease in proportion. After recovering plasmids from population samples, we use high-throughput DNA sequencing to measure the relative abundance of each variant and to calculate variant enrichment scores. With these data, we can begin to explore the relative impact of factors such as adaptation to tRNA pool abundance, mRNA secondary structure, and sensitivity to environmental stress.
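As an illustration of how a variant enrichment score can be derived from sequencing counts, here is a minimal Python sketch. It assumes a log2 ratio of each variant's read counts after versus before selection, normalized to a reference variant; the function name, the variant labels, the pseudocount, and the choice of reference are all hypothetical and are not taken from the project's actual analysis.

```python
import math


def enrichment_scores(pre_counts, post_counts, reference, pseudocount=0.5):
    """Return log2 enrichment of each variant relative to a reference variant.

    pre_counts / post_counts: dicts mapping variant name -> read count
    before and after the growth selection (hypothetical input format).
    reference: name of the variant used for normalization, e.g. wild type.
    pseudocount: small value added to every count so that variants with
    zero reads do not cause division by zero (an assumption of this sketch).
    """
    def ratio(variant):
        post = post_counts.get(variant, 0) + pseudocount
        pre = pre_counts.get(variant, 0) + pseudocount
        return post / pre

    ref_ratio = ratio(reference)
    # Dividing by the reference ratio cancels differences in total
    # sequencing depth between the two samples.
    return {v: math.log2(ratio(v) / ref_ratio) for v in pre_counts}


# Hypothetical counts for three synonymous variants.
pre = {"wt": 100, "var_a": 100, "var_b": 100}
post = {"wt": 200, "var_a": 400, "var_b": 50}
scores = enrichment_scores(pre, post, reference="wt", pseudocount=0)
```

Under this definition, a positive score means the variant increased in the population faster than the reference, and a negative score means it was depleted relative to the reference.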
In Project 2, we are collaborating with the Grayhack laboratory at the University of Rochester to identify insertions in a GFP reporter that impair translation efficiency. To this end, the Grayhack lab has carried out Fluorescence-Activated Cell Sorting (FACS) on randomized libraries of codon insertions in the N-terminal region of GFP. We are sequencing library variants from FACS expression bins to identify codon insertions whose cell distributions are skewed toward lower GFP expression values. These insertions may represent specific codon pairs or codon combinations that are translated less efficiently by the ribosome.
Deep Mutational Scanning of a tRNA
tRNAs are of fundamental importance in translating the information contained in our genes into cellular and organismal function. A given tRNA must adopt a specific and conserved 3-dimensional structure in order to interact with the ribosome, with elongation factors, and with its cognate aminoacyl-tRNA synthetase. A good deal of cellular energy is also devoted to extensively modifying the bases of a tRNA during its maturation. Despite these constraints on tRNA shape and sequence, there are about 500 different human tRNAs and about 275 different yeast tRNAs, with significant sequence diversity both within and between species. In order to generate a set of all functional variants of a single tRNA and thereby determine the extent to which it can tolerate mutation, we have collaborated with the Phizicky and Mathews labs at the University of Rochester to adapt deep mutational scanning to the study of tRNA function.
The assay relies on the ability of a suppressor tRNA, which recognizes a stop codon, to allow the ribosome to “read through” the stop codon on an mRNA instead of stopping translation at the stop codon and releasing the mRNA. We modified yeast by the addition of two plasmids, one containing a Green Fluorescent Protein (GFP) reporter and one containing a mutant version of the tyrosine tRNA that recognizes the ochre stop codon (UAA). The GFP reporter contains an ochre stop codon at the beginning of its sequence, such that it fails to be translated into a functioning protein unless a working tRNA suppressor gene is also present to read through the stop codon. In this way, the function of the ochre suppressor tRNA can be determined by observing the fluorescence of the cell. The ochre suppressor tRNA was mutated extensively, and the library of mutants was transformed into yeast along with the GFP reporter. The yeast transformants were sorted by fluorescence into bins in a Fluorescence Activated Cell Sorter, and the plasmids from the yeast in each bin were sequenced on an Illumina MiSeq. For a given mutant, the percentage of MiSeq reads in each bin, along with the average fluorescence of the bins, can be used to determine the average fluorescence due to that mutant tRNA. The weighted average fluorescence was used to stratify the mutants by function.
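The weighted average described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that a mutant's weight for each bin is simply the fraction of that mutant's reads found in the bin; corrections for per-bin sequencing depth or for the number of cells sorted into each bin are omitted, and the function name and example values are hypothetical.

```python
def weighted_mean_fluorescence(bin_reads, bin_fluorescence):
    """Estimate a mutant's fluorescence from its reads across sorting bins.

    bin_reads: read counts for one mutant, one entry per FACS bin.
    bin_fluorescence: average fluorescence of each bin, in the same order.
    Returns None when the mutant has no reads in any bin.
    """
    if len(bin_reads) != len(bin_fluorescence):
        raise ValueError("one read count is required per bin")
    total_reads = sum(bin_reads)
    if total_reads == 0:
        return None
    # Weight each bin's average fluorescence by the fraction of the
    # mutant's reads that fall in that bin.
    return sum(
        (reads / total_reads) * fluorescence
        for reads, fluorescence in zip(bin_reads, bin_fluorescence)
    )


# Hypothetical mutant with most of its reads in the brightest bin.
example = weighted_mean_fluorescence([10, 30, 60], [100.0, 1000.0, 10000.0])
```

A mutant concentrated in the dimmest bin receives a value near that bin's average fluorescence, so ranking mutants by this number stratifies them by suppressor function.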
We have obtained functional data for every possible single mutation, for about 13,000 double mutations, and for about 30,000 more highly mutated variants. Surprisingly, 43% of the single mutants showed near-wild-type levels of fluorescence, and the majority of the single mutants retained at least some function. In addition, large numbers of double mutants showed near-wild-type levels of fluorescence, indicating that despite all of the modifications and structural constraints, tRNA function is relatively robust to mutation. We have also examined mutant performance in a yeast strain with a mutated rapid tRNA decay pathway, a quality control pathway that degrades misfolded or unmodified tRNA. By comparing the performance of tRNA variants in the wild-type strain and the decay pathway mutant, we have identified many new targets of this pathway. We are currently examining the double mutants in order to gain insight into the relationships between positions within the tRNA. We have seen some expected relationships; for example, deleterious mutations that abolish base pairing in one of the stems are rescued by changes that restore base pairing in another stem. Other interactions occur within or between loops. By examining positional interactions in various backgrounds, we hope to gain a greater understanding of the determinants of tRNA structure and function.
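One common way to quantify a relationship between two positions is an epistasis score: the difference between the observed function of a double mutant and the function expected if the two single mutations acted independently. The Python sketch below assumes functional scores on a log scale and an additive null model. That model is a choice made for illustration here, not necessarily the analysis used in this project, and the function name and example scores are hypothetical.

```python
def epistasis(score_wt, score_a, score_b, score_ab):
    """Return observed minus expected score for a double mutant.

    All scores are assumed to be on a log scale, so independent effects
    add. Under that assumption the expected double-mutant score is the
    wild-type score plus the two single-mutant effects.
    """
    expected = score_wt + (score_a - score_wt) + (score_b - score_wt)
    return score_ab - expected


# Hypothetical base pair in a stem: each single mutation is strongly
# deleterious, but the double mutant is close to wild type.
rescue = epistasis(score_wt=0.0, score_a=-2.0, score_b=-2.0, score_ab=-0.25)
```

A large positive value, as in this example, is the signature of a compensatory pair, while a value near zero suggests the two positions contribute independently.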
Investigating the HIV-1 Tat-TAR interaction
The HIV-1 Tat protein is integral to the viral life cycle, as it induces efficient transcription of the virus after binding a folded element of the HIV LTR called TAR. Previous studies have elucidated the effects of some mutations of Tat, but the overall depth and density of the studied mutations are low. We are investigating the Tat-TAR interaction using deep mutational scanning, a high-throughput technology recently developed in the lab.
By creating a library of hundreds of thousands of variants of Tat and selecting for binding to TAR using a yeast three-hybrid assay, we are examining the relationship between the sequence of Tat and its TAR-binding function at an unprecedented resolution. The Tat-TAR interaction is thought to be driven by an enrichment of basic residues in the core of the protein rather than by a specific amino acid sequence, but it is not known whether point mutations outside of this core region can affect the TAR interaction. This study of mutations that affect the affinity of Tat for TAR can contribute to our understanding of both protein-RNA interactions and the mechanism of HIV transcription and activation.
•Daniel Melamed, Matt Rich & Christina Miller