Lecture notes for February 7

We've been handling mutations and alleles as abstract concepts; this lecture will look at the molecular biology behind the abstraction.

Genetic code.

The structure of the genetic code means that a few first-position mutations and many third-position mutations are silent: they do not change the protein. Silent mutations are not guaranteed to be neutral, as they may change the regulation of the gene, but they often are.
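The "few first-position, many third-position" pattern can be checked by brute force. The sketch below is an illustration rather than part of the original lecture: it builds the standard genetic code table and, for each of the 61 amino-acid-coding codons, tries every single-base change at each position. The function name silent_counts_by_position is made up for this example, and a change to a stop codon is counted as non-silent.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def silent_counts_by_position():
    """Count silent and total single-base changes at each codon position."""
    silent = [0, 0, 0]
    total = [0, 0, 0]
    for codon, amino_acid in CODE.items():
        if amino_acid == "*":
            continue  # start only from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a mutation
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODE[mutant] == amino_acid:
                    silent[pos] += 1
    return silent, total


silent, total = silent_counts_by_position()
for pos in range(3):
    print(f"position {pos + 1}: {silent[pos]} of {total[pos]} changes are silent")
```

The first-position silent changes all come from leucine and arginine, the two amino acids whose codons span more than one first base.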

The ratio of synonymous (silent) to non-synonymous (coding) mutations is an indicator of the type of selection. First we have to allow for the fact that there are more possible coding changes than silent changes. The solution is to calculate silent changes per silent position (D_S) and coding changes per coding position (D_N).

We expect D_S to be much greater than D_N (equivalently, for the ratio D_S/D_N to be well above 1) in most coding genes, since most coding changes are harmful and are rapidly removed by selection, while silent changes are free to accumulate. In a gene which is not under selection at all (for example, a pseudogene) the ratio will be close to one.
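To make the "per silent position" and "per coding position" bookkeeping concrete, here is a rough sketch for two aligned coding sequences. It is a simplified counting scheme in the spirit of the Nei-Gojobori method, not a full implementation. Assumptions: the sequences have equal length, are in frame, and contain no stop codons; codons that differ at two or three positions are ignored; no correction is made for multiple changes at one site; and at least one silent position exists (otherwise the division fails). The names syn_sites and ds_dn_uncorrected are made up for this example.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of silent positions in one codon, as a value between 0 and 3.

    Each position contributes the fraction of its three possible
    changes that leave the amino acid unchanged.
    """
    amino_acid = CODE[codon]
    sites = 0.0
    for pos in range(3):
        same = sum(
            CODE[codon[:pos] + b + codon[pos + 1:]] == amino_acid
            for b in BASES
            if b != codon[pos]
        )
        sites += same / 3
    return sites


def ds_dn_uncorrected(seq1, seq2):
    """Return (silent changes per silent position, coding changes per coding position)."""
    s_sites = n_sites = 0.0
    s_diffs = n_diffs = 0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Average the silent-position count over the two sequences.
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        # Simplification: only codons with exactly one difference are scored.
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            if CODE[c1] == CODE[c2]:
                s_diffs += 1
            else:
                n_diffs += 1
    return s_diffs / s_sites, n_diffs / n_sites


# Toy sequences invented for illustration: one silent and one coding difference.
ds, dn = ds_dn_uncorrected("ATGGCTGAAAAACTG", "ATGGCCGAAAGACTG")
print(f"D_S = {ds:.3f}, D_N = {dn:.3f}, D_S/D_N = {ds / dn:.2f}")
```

Even though the toy pair has one difference of each kind, D_S comes out larger than D_N because there are far fewer silent positions than coding positions to divide by.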

Gene regions where D_S/D_N is less than one are interesting. This means that coding substitutions are actually more likely to be fixed than silent ones. This is seen in genes which are selected for diversity, such as the outer coat of influenza virus, or antibody genes in mammals. Often only one part of the gene (such as the active site) has excess coding substitutions, while other parts have mainly silent substitutions.

The genetic code itself must have evolved. It is not quite universal: mitochondria and chloroplasts, and some bacteria, have slight variants. Probably only a creature with a tiny genome can afford to change its code. The organism may go through a phase where it does not use a specific codon at all, and then re-introduce that codon with a new meaning.

Amino acids with similar chemical properties are somewhat clustered in the code. It may have evolved to reduce the rate of serious errors, but this is controversial.

RNA-coding genes

A subset of genes code not for protein but for a useful RNA, such as ribosomal RNA. These do not have the first/second/third codon distinction. Instead, they have paired stems and unpaired loops. Single mutations in the stem are selectively disfavored, but rare double mutations which change both partners are possible.
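The stem argument can be illustrated with a toy check of base pairing. This is a minimal sketch, assuming only Watson-Crick pairs count (real RNA stems also tolerate G-U wobble pairs, which are ignored here). The five-base stem and the function names broken_pairs and mutate are invented for the example.

```python
# Only Watson-Crick pairs are treated as valid in this sketch.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def broken_pairs(left, right):
    """Return the stem positions whose bases can no longer pair.

    `left` is one side of the stem read 5'->3'; `right` is the partner
    side written 3'->5', so left[i] pairs with right[i].
    """
    return [i for i, pair in enumerate(zip(left, right)) if pair not in WATSON_CRICK]


def mutate(strand, pos, base):
    """Return a copy of `strand` with `base` substituted at `pos`."""
    return strand[:pos] + base + strand[pos + 1:]


left, right = "GGCAU", "CCGUA"  # hypothetical stem, fully paired

# A single mutation on one side breaks the pair at that position.
single = broken_pairs(mutate(left, 2, "A"), right)

# A matching change in the partner restores pairing.
double = broken_pairs(mutate(left, 2, "A"), mutate(right, 2, "U"))

print("single mutant, broken positions:", single)
print("double mutant, broken positions:", double)
```

The double mutant has a different sequence from the original but the same paired structure, which is why such compensatory changes can persist.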

Fast-evolving versus slow-evolving genes

Mutation rates may vary across the genome, but probably not hugely. Substitution rates (the rates at which mutations become fixed) vary a lot. Genes like cytochrome c hardly vary across all of life (except for silent substitutions) whereas genes like influenza coat protein vary drastically. The difference is that some genes can tolerate many changes without losing function, and/or have a function that is not critical, whereas other genes must be very precise to work, and are essential for life. RNA-coding genes tend to be quite conservative. Conservative genes are useful for understanding the basic relationships of all life; quick-changing genes are useful for understanding the detailed relationships among close relatives. [We will come back to this in a few weeks.]

Dominance and recessiveness.

Mutations come in various kinds. These become a lot easier to understand when the biochemical basis can be seen. You can classify mutations by what kind of molecular change caused them, or by their biochemical effect.

Loss-of-function mutations

Loss-of-function mutations eliminate the protein product, or produce a useless one. This can happen by deletion of the gene or a critical part; damage to its control regions that keeps it from being read; frameshift or premature stop codons; or a single substitution that destroys function. Such mutations are usually recessive, since the normal copy of the gene still makes functional protein. A few are dominant or semi-dominant because one copy is not enough (called haplo-insufficiency).
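A small translation sketch shows why premature stops and frameshifts are so much more destructive than a typical single substitution. The gene below is a made-up 18-base toy, and translate is a hypothetical helper that reads in frame from the first base and stops at the first stop codon.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTGAAAAACTGTAA"  # toy gene: Met-Ala-Glu-Lys-Leu-stop

# Silent substitution: GCT -> GCC still encodes Ala.
silent = gene[:5] + "C" + gene[6:]

# Premature stop: AAA -> TAA truncates the protein after three residues.
nonsense = gene[:9] + "T" + gene[10:]

# Frameshift: deleting one base changes every codon downstream of it.
frameshift = gene[:4] + gene[5:]

for name, seq in [("wild type", gene), ("silent", silent),
                  ("premature stop", nonsense), ("frameshift", frameshift)]:
    print(f"{name:15s} {translate(seq)}")
```

In this toy case the frameshift also loses the original stop codon, so a real transcript would keep being translated into whatever sequence follows.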

Partial loss-of-function mutations

Partial loss-of-function mutations are similar, but gene function is reduced instead of eliminated. It may be lost only under certain circumstances (for example, temperature-sensitive mutations) or only partially lost (the protein works, but has a lower effectiveness than wild type). These are also usually recessive.

Overexpression mutations

Overexpression mutations produce more of a gene product, or a gene product that is more active. These can be recessive, semi-dominant or dominant, depending on the details of what the extra protein is doing.

Often the problems caused by under- or over-expression are because of imbalance. It would be okay to have ten times more gene A product if only you had ten times more gene B product as well, but an imbalance is harmful. This is thought to be why duplication of a whole chromosome is usually disadvantageous.

Regulatory mutations

Regulatory mutations change the time or place of gene expression. Lysozyme is expressed in human tears and keeps the eyes clean, but in cows it is also expressed in the stomach, where it helps digest the bacteria that ferment grass.

Regulatory mutations come in two kinds. Some affect the regions of a gene that allow it to be controlled, particularly the promoter and enhancers. These are usually recessive if they remove expression in a particular tissue, time, or situation, and dominant if they add expression. They affect only the gene copy to which they are attached.

One way to get this type of regulatory mutation is a rearrangement of the genome that puts one gene next to another gene's control sequences. Lysozyme might have moved next to a stomach-enzyme gene and thus become controlled by stomach-specific sequences.

The second kind of regulatory mutation affects a regulating gene. Such mutations are often dominant if they increase the gene's ability to regulate, and recessive if they eliminate it. For example, lacI is a regulatory gene that controls the expression of other lac genes. If lacI is damaged and stops working, the lac genes are recessively uncontrolled. If lacI is mutated so that it binds too tightly and cannot be removed, this is a dominant mutation. (Bacteria are normally haploid, but we can assess dominance/recessiveness by artificially adding another copy of the gene.) This type of regulatory mutation affects both copies of the target gene, not just the one it is linked to.
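The lacI reasoning can be written out as a toy truth table. This is a deliberately crude sketch, not a model of real lac operon kinetics. It assumes three hypothetical allele labels ("wild", "null" for a repressor that does not work, "super" for one that binds and cannot be removed) and assumes that any bound repressor in the cell shuts off the lac genes.

```python
def lac_expressed(laci_alleles, inducer_present):
    """Return True if the lac genes are expressed in this toy model.

    `laci_alleles` lists the lacI copies in the cell (two for the
    artificial partial diploid described above).
    """
    for allele in laci_alleles:
        if allele == "wild" and not inducer_present:
            return False  # normal repressor stays bound without inducer
        if allele == "super":
            return False  # mutant repressor stays bound regardless
        # "null" makes no working repressor, so it never blocks expression
    return True


genotypes = [("wild", "wild"), ("null", "null"), ("wild", "null"), ("wild", "super")]
for alleles in genotypes:
    off = lac_expressed(alleles, inducer_present=False)
    on = lac_expressed(alleles, inducer_present=True)
    print(f"{'/'.join(alleles):12s} no inducer: {off!s:5s}  inducer: {on}")
```

The wild/null cell behaves like wild type (the loss is recessive), while the wild/super cell behaves like the mutant (the tight-binding change is dominant), because the repressor protein diffuses and acts on every copy of the target genes.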

Gain-of-function mutations

Gain-of-function mutations create a protein with a new function. These are rare but evolutionarily important. They are often dominant. (If the new gene lacks the old function, they can be overdominant: you need both an old copy and a new copy.)

One relatively common way to evolve a new function is to combine parts of two working genes.

Introns and exons.

Mutations in introns are usually silent, unless they damage the sequences needed to accurately cut the intron out. A few introns contain regulatory sequences and can be the site of regulatory mutations.

Why are introns present in the first place? Many eukaryotic genes do not function correctly without their introns. Processed pseudogenes are genes which were created by reverse-transcribing processed mRNA back into DNA and integrating it back into the genome. They lack introns and have poly-A tails like mRNA. Usually they are inactive. Apparently if the intron-removal process is skipped, the mRNA does not get processed correctly.

However, most bacteria and some eukaryotes (such as yeast) manage fine without introns.

It has been suggested that introns are useful because they break genes into functional pieces which evolution can mix and match to make new genes. This is controversial because the creation of new genes is quite a slow process and it's not clear that selection in favor of new-gene-creation would be strong enough to maintain introns. Even if a creature with no introns cannot easily evolve new gene functions, this wouldn't seem to eliminate it from the gene pool quickly. Lack of introns could become fixed by drift before selection had a chance to act.

Also, bacteria have few introns and yet manage to evolve very well!

Codon bias and nucleotide compositional bias.

Most organisms have noticeable preferences for some codons over others. This is most pronounced in genes that are expressed at high rates. It is thought that selection favors codons which the organism can translate efficiently. Some researchers (including me) are surprised that this selection is strong enough to produce codon bias. An alternative suggestion is that the organism's mutation process is biased, but this doesn't explain why high-expression genes show the most bias.
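One common way to put a number on codon preference is relative synonymous codon usage (RSCU): how often a codon is used, divided by how often it would be used if all codons for that amino acid were used equally. The sketch below is illustrative only. The function name rscu and the toy sequence are made up, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(dna):
    """Relative synonymous codon usage for each codon of each amino acid seen.

    A value of 1.0 means no preference; values above 1.0 mark favored codons.
    """
    counts = Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))

    # Group the amino-acid-coding codons by the amino acid they encode.
    family = defaultdict(list)
    for codon, amino_acid in CODE.items():
        if amino_acid != "*":
            family[amino_acid].append(codon)

    result = {}
    for amino_acid, codons in family.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result


# Toy sequence: four leucines (three CTG, one CTA) and two lysines (AAA, AAG).
usage = rscu("CTGCTGCTGCTAAAAAAG")
for codon in ("CTG", "CTA", "TTA", "AAA", "AAG"):
    print(codon, usage[codon])
```

Comparing these values between high-expression and low-expression genes of one organism is one way to look at the pattern described above.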

Bacteria that are (we think) fairly closely related can have quite different codon bias. It is believed that if a gene is transferred among bacterial species it will slowly develop the codon bias of its new host. This has been used to try to estimate the rate of gene transfer among bacteria; high-expression genes with deviant codon bias are assumed to be recent imports. This method suggests that 1% or more of the E. coli genome may be foreign DNA.

The proportion of A/T versus G/C also varies. Mitochondria are very A/T rich. Bacteria and archaea that evolved to live in near-boiling water are very G/C rich. G/C bases bind more tightly to each other than A/T; perhaps A/T richness speeds up replication, and G/C richness protects against heat-induced DNA unwinding. It is not clear whether the bias is due to differences in mutation rate, differences in fixation rate, or both.
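Base composition is easy to measure directly. The sketch below computes the G/C fraction of a sequence and, for a coding sequence, the G/C fraction at third codon positions, where composition is least constrained by the protein. The function names and the toy sequence are invented for this example. Characters other than A, C, G and T are ignored, and a sequence with none of those four bases would cause a division by zero.

```python
def gc_fraction(seq):
    """Fraction of A/C/G/T bases in `seq` that are G or C."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return gc / (gc + at)


def gc3_fraction(coding_seq):
    """G/C fraction at third codon positions of an in-frame coding sequence."""
    return gc_fraction(coding_seq[2::3])


toy = "ATGAAATTTGCA"  # made-up A/T-rich coding sequence
print("overall G/C:", gc_fraction(toy))
print("third-position G/C:", gc3_fraction(toy))
```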

Things to think about.

What would an organism with a much more error-prone genetic system be like? (RNA viruses are a real-world example.) What would an organism with a much less error-prone genetic system be like? (None are currently known.)
A few organisms, mostly viruses, have a lot of overlapping genes that are read in more than one frame. How do these creatures evolve?
Why are there twenty standard amino acids? Why not ten, or thirty?