Elucidating a Code for RNA Sequence Recognition
The ability to design a protein that can bind specifically to any RNA and regulate its fate would enable numerous research and therapeutic applications. However, decoding RNA-binding specificity for most types of RNA-binding domains is challenging, because these domains associate with RNA through complex networks of interactions. Engineering a domain with both high specificity and high affinity is further complicated because these domains typically make weak contacts with only a small number of RNA bases.
A few RNA-binding domains, however, use recognition mechanisms that make them better candidates for protein design. The modular architecture of the PUF RNA-binding domain is one such example (Figure 1). The PUF domain almost always contains 8 functional copies of a 36-amino-acid alpha-helical repeat. Each PUF repeat recognizes a single RNA base via three amino acids at conserved positions, referred to here as the tripartite recognition motif (TRM) (Campbell ZT et al. 2014).
With its natural ability to bind selectively and with high affinity to its 8-mer recognition sequence, the PUF domain is an attractive candidate for deep mutational scanning, in which we determine the RNA-binding preferences of a large number of variants. We can combine the yeast three-hybrid method (Figure 2A) with next-generation sequencing to score the binding activities of variants of a PUF domain against a cognate RNA sequence. We adapted the traditional yeast three-hybrid expression system to deep mutational scanning by placing both the PUF protein being tested and the RNA expression module on a single centromeric plasmid (Figure 2B). We showed that the binding specificity of each PUF repeat in this system recapitulates the in vivo specificity of this domain, so we can score each mutated PUF repeat for its ability to bind each of the 4 possible RNA bases (Figure 2C). Indeed, selecting libraries containing variants of the PUF domain and the RNA on plates that require activation of the HIS3 reporter gene identified TRM:RNA base combinations that are likely to interact (Figure 2D). This approach will allow us to characterize the specificity and affinity of each PUF repeat for any RNA base and to move toward a complete code for RNA recognition.
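The scoring step can be illustrated with a minimal sketch. It assumes that each library member is identified by a (TRM, RNA base) combination and that read counts are available from before and after selection on HIS3-dependent plates. The log2 enrichment ratio below is a common deep mutational scanning score and is not necessarily the exact score used in this project; the TRM labels and counts are hypothetical placeholders.

```python
import math


def enrichment_scores(pre: dict, post: dict, pseudocount: float = 0.5) -> dict:
    """Return log2(post-selection frequency / pre-selection frequency) per variant."""
    keys = set(pre) | set(post)
    # Pseudocounts keep the ratio finite for variants that drop out after selection.
    pre_total = sum(pre.get(k, 0) + pseudocount for k in keys)
    post_total = sum(post.get(k, 0) + pseudocount for k in keys)
    scores = {}
    for k in keys:
        f_pre = (pre.get(k, 0) + pseudocount) / pre_total
        f_post = (post.get(k, 0) + pseudocount) / post_total
        scores[k] = math.log2(f_post / f_pre)
    return scores


# Hypothetical read counts keyed by (TRM, RNA base); TRM names are placeholders.
pre = {("TRM_a", "U"): 1000, ("TRM_a", "C"): 1000, ("TRM_b", "U"): 1000}
post = {("TRM_a", "U"): 3000, ("TRM_a", "C"): 10, ("TRM_b", "U"): 990}

for variant, score in sorted(enrichment_scores(pre, post).items(), key=lambda kv: -kv[1]):
    print(variant, round(score, 2))
```

Positive scores mark combinations enriched by growth under selection, and therefore candidate interacting TRM:base pairs; strongly negative scores mark combinations that were depleted.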
•Daniel Melamed & Christina Miller
Codon Context and Translation Efficiency in a Yeast GFP Assay
Because of degeneracy in the genetic code, several codons can encode the same amino acid. Yet differences in a gene's synonymous codon usage can change how much protein it produces, with phenotypic consequences for the cell. The contexts and mechanisms by which codon usage affects translation are not well defined.
In collaboration with the Grayhack lab at the University of Rochester, we have sought to experimentally identify non-optimal codons and codon combinations in yeast. We generated two integrated libraries, each containing a three-codon insertion near the N-terminus of superfolder GFP. We performed fluorescence-activated cell sorting followed by high-throughput sequencing of the insertions to estimate the mean expression level of 35,811 GFP variants in total. We identified a subset of codons that were frequently found in low-expression variants. We also found a small number of adjacent codon pairs for which most variants carrying the pair had low expression, whereas most other variants were highly expressed. Individually reconstructed variants containing these pairs had lower GFP fluorescence than a synonymous control construct. Overall, we identified 20 pairs with evidence of a general inhibitory effect. Additionally, the order of the two codons in a pair was often central to its inhibitory effect. Thus, we have identified codon pairs that likely reduce translation efficiency through their effect on translation dynamics within the ribosome. We are currently following up on these pairs with tRNA suppression experiments to better understand the mechanisms of codon pair-mediated inhibition.
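The pair-level analysis can be sketched as follows, assuming a mapping from each variant's three-codon insertion to its estimated expression level. Pairs are kept ordered, so (A, B) and (B, A) are tallied separately and a directional effect remains visible. The cutoffs (`low_cutoff`, `min_carriers`, `min_low_fraction`) and the example data are hypothetical, and for simplicity only pairs inside the insertion are counted, not pairs formed with the flanking codons.

```python
from collections import defaultdict


def codons(insert: str) -> list:
    """Split a nucleotide insertion into codons (length assumed divisible by 3)."""
    return [insert[i:i + 3] for i in range(0, len(insert), 3)]


def pair_stats(variants: dict, low_cutoff: float) -> dict:
    """Map each ordered adjacent codon pair to (n_carriers, fraction_low).

    `variants` maps an insertion sequence to its estimated expression level.
    A variant counts as "low" if its expression is below `low_cutoff`.
    """
    carriers = defaultdict(int)
    low = defaultdict(int)
    for insert, expression in variants.items():
        cs = codons(insert)
        # Ordered pairs: (A, B) is distinct from (B, A); the set counts each pair once per variant.
        for pair in {(cs[i], cs[i + 1]) for i in range(len(cs) - 1)}:
            carriers[pair] += 1
            if expression < low_cutoff:
                low[pair] += 1
    return {pair: (n, low[pair] / n) for pair, n in carriers.items()}


def flag_inhibitory(stats: dict, min_carriers: int, min_low_fraction: float) -> list:
    """Pairs seen in enough variants whose carriers are mostly low-expression."""
    return sorted(p for p, (n, frac) in stats.items()
                  if n >= min_carriers and frac >= min_low_fraction)


# Hypothetical expression values (arbitrary units) for four insertions.
variants = {
    "CGAGCGAAA": 0.1,
    "CGAGCGCTG": 0.2,
    "GCGCGAAAA": 0.9,
    "AAACTGGCG": 1.0,
}
stats = pair_stats(variants, low_cutoff=0.5)
print(flag_inhibitory(stats, min_carriers=2, min_low_fraction=0.8))
```

In this toy data, CGA followed by GCG is always low, while the reverse order (GCG then CGA) is not, which is the kind of directional pattern the ordered tally is meant to expose.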
Deep Mutational Scanning of a tRNA
tRNAs are of fundamental importance in translating the information contained in our genes into cellular and organismal function. A given tRNA must adopt a specific, conserved three-dimensional structure in order to interact with the ribosome, with elongation factors, and with its cognate aminoacyl-tRNA synthetase. A good deal of cellular energy is also devoted to extensively modifying the bases of a tRNA during its maturation. Despite these constraints on tRNA shape and sequence, there are about 500 human tRNA genes and about 275 yeast tRNA genes, with significant sequence diversity both within and between species. To generate the set of all functional variants of a single tRNA, and thereby determine the extent to which it can tolerate mutation, we have collaborated with the Phizicky and Matthews labs at the University of Rochester to adapt deep mutational scanning to the study of tRNA function.
The assay relies on the ability of a suppressor tRNA, which recognizes a stop codon, to allow the ribosome to “read through” that stop codon on an mRNA instead of terminating translation there. We introduced two plasmids into yeast: one carrying a green fluorescent protein (GFP) reporter and one carrying a mutant version of the tyrosine tRNA that recognizes the ochre stop codon (UAA). The GFP reporter contains an ochre stop codon at the beginning of its coding sequence, so functional GFP is produced only when a working suppressor tRNA is present to read through the stop codon. In this way, the function of the ochre suppressor tRNA can be measured from the fluorescence of the cell. The ochre suppressor tRNA was mutated extensively, and the library of mutants was transformed into a yeast strain containing the GFP reporter. The transformants were sorted into bins by fluorescence using fluorescence-activated cell sorting (FACS), and the plasmids from the yeast in each bin were sequenced on an Illumina MiSeq. For a given mutant, the fraction of its MiSeq reads in each bin, combined with the mean fluorescence of each bin, gives the average fluorescence conferred by that mutant tRNA. This weighted average fluorescence was used to stratify the mutants by function.
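The weighted-average calculation can be written as a short sketch. It assumes per-bin read counts for each variant and a measured mean fluorescence for each bin; the example values and the stratification cutoffs are hypothetical. Real analyses often also correct for differing sequencing depth and numbers of sorted cells per bin, which this sketch omits.

```python
def weighted_fluorescence(reads_by_bin: dict, bin_means: list) -> dict:
    """Estimate each variant's mean fluorescence from its read distribution.

    reads_by_bin maps a variant to its read counts, one count per FACS bin.
    bin_means gives the mean fluorescence of the cells sorted into each bin.
    """
    scores = {}
    for variant, counts in reads_by_bin.items():
        total = sum(counts)
        if total == 0:
            continue  # variant not observed in any bin
        # The fraction of this variant's reads in each bin weights that bin's mean.
        scores[variant] = sum(c / total * m for c, m in zip(counts, bin_means))
    return scores


def stratify(score: float, wt_score: float) -> str:
    """Assign a functional class relative to wild type (cutoffs are hypothetical)."""
    rel = score / wt_score
    if rel >= 0.8:
        return "near wild type"
    if rel >= 0.1:
        return "partial function"
    return "non-functional"


# Hypothetical data: three bins (low, medium, high fluorescence).
bin_means = [100.0, 1000.0, 10000.0]
reads = {"wt": [0, 10, 90], "mut1": [95, 5, 0], "mut2": [30, 60, 10]}
scores = weighted_fluorescence(reads, bin_means)
for name, s in scores.items():
    print(name, round(s), stratify(s, scores["wt"]))
```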
We have obtained functional data for every possible single mutation, for about 14,000 double mutants, and for about 30,000 more highly mutated variants. Surprisingly, 37% of the single mutants retained at least some function. In addition, around 10% of double mutants showed near-wild-type fluorescence, indicating that despite extensive modification and structural constraints, tRNA function is relatively robust to mutation. We have also examined mutant performance in a yeast strain with a mutated rapid tRNA decay (RTD) pathway, a quality-control pathway that degrades misfolded or unmodified tRNAs. Variants that function poorly in the wild-type strain but recover function in the RTD mutant are likely substrates of this pathway, and by comparing the two strains we have identified many new RTD targets. Most of these new targets carry mutations in parts of the tRNA not previously known to be monitored by the RTD pathway, such as the anticodon and D stems. By comparing the double mutants with their constituent single mutants, we gained insight into the relationships between positions within the tRNA. We have seen some expected relationships; for example, deleterious mutations that abolish base pairing in one of the stems are rescued by second mutations that restore base pairing. Other interactions occur within or between loops. In at least one case, the variable loop can shorten to restore base pairing at the beginning of the anticodon stem. We have also tested the library at various temperatures to identify heat-sensitive tRNA variants and to determine the relationship between temperature sensitivity and RTD targeting. By modeling RTD targeting and temperature sensitivity as functions of tRNA structure and free energy, we are able to predict these properties for untested tRNA mutants. We are currently validating these predictions for different mutants in different tRNAs.
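One common way to compare a double mutant with its constituent singles is a multiplicative (log-additive) epistasis score, and a compensatory stem mutation can be recognized by checking whether each single mutation breaks a base pair that the double mutant restores. The sketch below combines both under stated assumptions: function values are expressed relative to wild type (wild type = 1.0), G-U wobble pairs count as paired, and the sequence, positions, and values in the example are hypothetical. It illustrates the idea and is not the project's analysis pipeline.

```python
import math

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def epistasis(w_double: float, w_a: float, w_b: float, floor: float = 1e-3) -> float:
    """Log-scale deviation of a double mutant from the product of its singles.

    Inputs are function relative to wild type (wild type = 1.0). Values are
    floored so that non-functional variants (0.0) do not produce log(0).
    Positive scores: the double mutant works better than the singles predict.
    """
    w_double, w_a, w_b = (max(w, floor) for w in (w_double, w_a, w_b))
    return math.log(w_double) - math.log(w_a) - math.log(w_b)


def is_compensatory(wt: str, i: int, j: int, base_i: str, base_j: str) -> bool:
    """True if mutating stem partners i and j singly breaks the pair but jointly restores it."""
    if (wt[i], wt[j]) not in PAIRS:
        raise ValueError("positions i and j should pair in the wild-type sequence")
    breaks_i = (base_i, wt[j]) not in PAIRS
    breaks_j = (wt[i], base_j) not in PAIRS
    restored = (base_i, base_j) in PAIRS
    return breaks_i and breaks_j and restored


# Hypothetical toy stem: position 0 pairs with position 8 (G-C).
wt = "GGGAAACCC"
# Singles give C-C and G-G mismatches; the double gives a C-G pair.
print(is_compensatory(wt, 0, 8, "C", "G"))
print(round(epistasis(w_double=0.9, w_a=0.1, w_b=0.2), 2))
```

A double mutant that is both compensatory by this check and strongly positive in its epistasis score matches the rescue pattern described above; loop interactions would need a structure-aware model rather than this single-pair check.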
By examining positional interactions in various backgrounds, we hope to gain a greater understanding of the determinants of tRNA structure and function. In addition, applying this assay to other tRNAs will provide a high-throughput means of identifying potential disease-causing variants.
Assessing the Impact of Synonymous Mutations in TP53 and JAK3
The increased availability of DNA sequences from cancer genomes has led to the identification of mutations in oncogenes and tumor suppressor genes associated with cancer development. Traditionally, the focus has been on mutations in these genes that alter the encoded protein sequence, known as non-synonymous mutations. However, recent evidence implicates synonymous mutations, those in coding regions that do not alter the protein sequence, in the development of various human diseases. Further, several oncogenes and tumor suppressor genes sequenced from cancer genomes have been noted to contain more synonymous mutations than expected, suggesting that synonymous mutations may act as driver mutations in cancer development. Our goal is to directly assess the potential effects of synonymous mutations in oncogenes and tumor suppressor genes through a deep mutational scanning approach: creating a library of cells containing all possible synonymous mutations for one exon of a gene and assaying the library for changes in protein expression level.
We are specifically investigating TP53, a tumor suppressor gene associated with a multitude of somatic cancers and a hereditary cancer syndrome, and JAK3, an oncogene recently linked to the development of several leukemias. We are creating a mini-gene reporter system in which an exon of interest, flanked on each side by one intron and one exon, is fused in frame to GFP and introduced into a human cell line on a plasmid. By systematically mutating the wobble (third) position of each codon within the exon of interest, we create a library of cells containing all possible synonymous substitutions at these positions.
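The set of wobble-position variants can be enumerated from the standard genetic code, as in the sketch below. It assumes the exon sequence is supplied in frame as DNA (length divisible by 3); the example sequence is hypothetical. Note that third-position changes do not cover every synonymous codon: leucine, arginine, and serine also have synonymous codons that differ at the first position (serine's AGY codons differ from its TCN codons at the first and second positions), and those are not generated here.

```python
BASES = "TCAG"
# Standard genetic code, indexed by first, second, third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def wobble_synonymous_variants(exon: str):
    """Yield (codon_index, original_codon, new_codon) for every single-base
    change at a codon's third position that keeps the encoded amino acid."""
    exon = exon.upper()
    if len(exon) % 3:
        raise ValueError("exon sequence must be in frame (length divisible by 3)")
    for n in range(len(exon) // 3):
        codon = exon[3 * n:3 * n + 3]
        for base in BASES:
            if base == codon[2]:
                continue  # skip the unmutated codon itself
            new = codon[:2] + base
            if CODON_TABLE[new] == CODON_TABLE[codon]:
                yield n, codon, new


# Hypothetical in-frame exon fragment: ATG (Met), GCT (Ala), TGG (Trp).
for variant in wobble_synonymous_variants("ATGGCTTGG"):
    print(variant)
```

Met and Trp each have a single codon, so only the alanine codon yields variants in this example.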
With this reporter structure, any mutation that changes exon expression level, mRNA stability, exon skipping, or splicing efficiency is reflected as a change in the cell's GFP level. We can therefore use fluorescence-activated cell sorting (FACS) to sort the library of cells into bins according to GFP fluorescence. By high-throughput sequencing of the sorted cells, we can determine the effect of each synonymous mutation on protein levels in the cell. Mutations identified in this way can then be investigated individually to characterize the mechanisms by which they act. This technique allows us to estimate more directly how large an effect synonymous mutations can have within these genes, and to expand the limited understanding of their potential to act as drivers of cancer development.