Lecture Summary February 14 2001: Genome Evolution

Genome size

Genome size is often expressed in weight units (picograms; 1 pg = 10^-12 g) or in numbers of base pairs: gigabases (Gb), megabases (Mb), or kilobases (kb). One picogram of DNA corresponds to roughly 980 Mb. Genome size is sometimes given as the C-value, the amount of DNA in a haploid genome; the C stands for "constant", because the value is fairly constant within a species.
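The unit conversion above can be sketched in a few lines of Python. This is only an illustration: the factor of 980 Mb per picogram is the rough figure quoted in the text, and the function names are made up for this example.

```python
# Rough conversion factor quoted in the text (an approximation, not an exact constant).
MB_PER_PG = 980.0


def picograms_to_megabases(pg: float) -> float:
    """Convert a DNA mass in picograms to an approximate size in megabases."""
    return pg * MB_PER_PG


def megabases_to_picograms(mb: float) -> float:
    """Convert a genome size in megabases to an approximate DNA mass in picograms."""
    return mb / MB_PER_PG


if __name__ == "__main__":
    # A 3.4 Gb (3400 Mb) genome expressed as a mass in picograms.
    print(round(megabases_to_picograms(3400.0), 2))
```

With this factor, a genome of 3.4 Gb weighs about 3.5 pg per haploid set.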

Prokaryotes

The known genome sizes vary over a range of about 20-30 fold.
Eubacteria 580-13200 kb
Mycoplasma 580-1800 kb
Gram negative 650-7800 kb
Gram positive 1600-11600 kb
Cyanobacteria 3100-13200 kb
Archaebacteria 1600-4800 kb
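
The "20-30 fold" figure can be checked directly from the ranges listed above. The short Python sketch below does this; the dictionary simply restates the list, and the helper name fold_range is chosen for this example only.

```python
# Genome size ranges in kb, copied from the list above.
PROKARYOTE_RANGES_KB = {
    "Eubacteria": (580, 13200),
    "Mycoplasma": (580, 1800),
    "Gram negative": (650, 7800),
    "Gram positive": (1600, 11600),
    "Cyanobacteria": (3100, 13200),
    "Archaebacteria": (1600, 4800),
}


def fold_range(smallest: float, largest: float) -> float:
    """Return how many times larger the largest genome is than the smallest."""
    return largest / smallest


smallest = min(low for low, _ in PROKARYOTE_RANGES_KB.values())
largest = max(high for _, high in PROKARYOTE_RANGES_KB.values())
overall_fold = fold_range(smallest, largest)

if __name__ == "__main__":
    print(f"{smallest} kb to {largest} kb: about {overall_fold:.1f} fold")
```

The smallest and largest listed genomes differ by a factor of about 23.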

Eukaryotes

The range of genome sizes in eukaryotes is much larger (up to a few thousand fold). A few examples:
Drosophila melanogaster (fruitfly) 0.18 Gb
Chicken 1.2 Gb
Carp 1.7 Gb
Boa constrictor (snake) 2.1 Gb
Rat 2.9 Gb
Human 3.4 Gb
Tobacco 3.8 Gb
Onion 18.0 Gb
Amphiuma (salamander) 84.0 Gb
Lungfish 140.0 Gb
Ophioglossum (Fern) 160 Gb
Amoeba dubia 670 Gb

What increases genome size

Polyploidization

Two different processes can form complete genome duplicates.

Autopolyploid: During meiosis the genome is duplicated and then segregates into haploid chromosome sets, one per gamete. When segregation fails (e.g. because it is blocked by certain chemicals or by a genetic defect), diploid gametes are formed instead of four haploid gametes. These often endoreduplicate and give rise to a tetraploid organism.

Allopolyploid: Hybridizing two species A and B, followed by endoreduplication, can produce a tetraploid that contains two chromosome sets from species A and two from species B.

Polyploidization is a common tool in plant breeding, and many of our crop plants are polyploid. The wheat we use for bread flour arose from a hybrid cross between two diploid species, each with 7 pairs of chromosomes (2n=14), followed by endoreduplication (2n=28). This allotetraploid hybrid (the wheat used for semolina flour, i.e. pasta) was then crossed with another diploid species with 2n=14. After another round of chromosome doubling, the resulting hybrid is hexaploid (2n=42).
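
The chromosome arithmetic of the wheat example can be written out as a small Python sketch. It assumes that each parent contributes a gamete with half its somatic chromosome number and that the hybrid then doubles its chromosomes. The function name allopolyploid is made up for this illustration.

```python
def allopolyploid(somatic_a: int, somatic_b: int) -> int:
    """Somatic chromosome number after crossing A x B and doubling the hybrid.

    Assumes normal gametes (half the somatic number from each parent)
    followed by one round of endoreduplication.
    """
    if somatic_a % 2 or somatic_b % 2:
        raise ValueError("somatic chromosome numbers are expected to be even")
    hybrid = somatic_a // 2 + somatic_b // 2  # one gamete from each parent
    return 2 * hybrid                         # endoreduplication doubles the set


tetraploid = allopolyploid(14, 14)         # two diploid species, 2n=14 each
hexaploid = allopolyploid(tetraploid, 14)  # tetraploid crossed with a third diploid

if __name__ == "__main__":
    print(tetraploid, hexaploid)
```

The two steps give 2n=28 for the tetraploid and 2n=42 for the hexaploid, matching the numbers in the text.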

Tetraploids (and other even-numbered polyploids) are normally fertile and produce viable offspring. Triploids (formed when a diploid gamete fuses with a haploid gamete) are normally healthy individuals, but because chromosome pairing in meiosis is disrupted, they are often sterile. This is exploited in the production of seedless crops such as bananas, watermelons, and grapes.
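
A simple textbook-style model, not part of the lecture itself, gives a feeling for why triploids are so often sterile. It assumes that each group of three homologous chromosomes segregates 2:1 at random and independently of the other groups. A gamete is balanced only if it receives one copy of every chromosome or two copies of every chromosome. The function name and the example value of 11 chromosome types are illustrative assumptions.

```python
def balanced_gamete_probability(n_chromosome_types: int) -> float:
    """Probability that a triploid produces a balanced gamete.

    Simplifying assumption: each chromosome triplet segregates 2:1,
    independently, and either pole is equally likely to get the pair.
    """
    if n_chromosome_types < 1:
        raise ValueError("need at least one chromosome type")
    all_single = 0.5 ** n_chromosome_types  # one copy of every chromosome (haploid)
    all_double = 0.5 ** n_chromosome_types  # two copies of every chromosome (diploid)
    return all_single + all_double


if __name__ == "__main__":
    # Hypothetical species with 11 different chromosomes per haploid set.
    print(balanced_gamete_probability(11))
```

Under these assumptions only about one gamete in a thousand is balanced when there are 11 chromosome types, so viable seeds are rare.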

Gene duplications, Transposons

see lecture on Multi-gene families

How much of the genome is used?

Analyses showed that we can distinguish three major classes of DNA in a genome: single-copy DNA (scDNA), moderately repetitive DNA, and highly repetitive DNA.

The repetitive DNA is mostly non-coding.

Genome sizes and percentage of single-copy DNA (scDNA)
Species Genome size (Gb) %scDNA
Drosophila melanogaster 0.2 60
Bufo bufo (Toad) 6.9 20
Natrix natrix (snake) 2.5 47
Gallus domesticus (Chicken) 1.2 80
Human 3.4 64
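
To see what these percentages mean in absolute terms, the sketch below multiplies each genome size by its scDNA fraction. It reads the sizes as gigabases, consistent with the eukaryote list earlier in this summary (an assumption of this example), and the function name single_copy_gb is made up for illustration.

```python
# (species, genome size in Gb, percent single-copy DNA), values from the table above.
GENOMES = [
    ("Drosophila melanogaster", 0.2, 60),
    ("Bufo bufo", 6.9, 20),
    ("Natrix natrix", 2.5, 47),
    ("Gallus domesticus", 1.2, 80),
    ("Human", 3.4, 64),
]


def single_copy_gb(size_gb: float, percent_sc: float) -> float:
    """Absolute amount of single-copy DNA in Gb."""
    return size_gb * percent_sc / 100.0


if __name__ == "__main__":
    for species, size_gb, percent_sc in GENOMES:
        sc = single_copy_gb(size_gb, percent_sc)
        # Everything that is not single-copy is counted here as repeated DNA.
        print(f"{species:25s} {sc:5.2f} Gb single-copy, {size_gb - sc:5.2f} Gb other")
```

The toad genome is more than 30 times larger than that of Drosophila, but its single-copy fraction is only about 11 times larger in absolute terms; most of the difference is repeated DNA.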

Birds have rather small genomes that are remarkably similar in size across all birds. One explanation is that, because of their high metabolic rate (much higher than in many mammals and amphibians), copying large unused stretches of DNA would reduce fitness. An alternative view is that the small genome size of birds is simply a historical accident: their closest living relatives, the crocodiles, have genomes that are not much larger than those of birds.

Maintenance of genome size

Genome sizes vary widely among species, even among closely related ones, so one can ask what maintains genome size. Several hypotheses have been put forward and the question is still not settled; some of them most likely apply, at least in part.

GC-content

The content of guanine and cytosine (GC content) in DNA has a typical value in many species, and it is often higher than the 50% we would expect if mutations between all nucleotides were equally likely. For example, among bacteria it is 50% in E. coli and 67% in Mycobacterium tuberculosis, but in some it is very low (e.g. Clostridium, 26%). Similarly striking differences are seen in eukaryotes: Giardia has about 80% GC content, whereas Plasmodium falciparum (the organism that causes malaria) has less than 20%. Several hypotheses have been put forward. GC content may be related to selection for thermal stability: codons with more G and C code for more thermostable amino acids. UV radiation also produces dimers between adjacent thymines (T-T), which are DNA defects, and this could favor a higher GC content. An alternative hypothesis is that the bias is caused by differences in the mutation rate from A and T to G and C.
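
GC content as used here is simply the fraction of G and C among all bases of a sequence. A minimal Python sketch follows; the function name gc_content and the choice to ignore ambiguous symbols such as N are assumptions of this example.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the A, C, G, T bases of a DNA sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]  # skip N, gaps, whitespace
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


if __name__ == "__main__":
    # Short made-up sequence, used only to show the call.
    print(gc_content("ATGGCCAAATTC"))
```

Applied to a whole genome sequence, this gives the percentages quoted above when multiplied by 100.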

We can explore these hypotheses using the genetic code. The expectation is that most changes at the third codon position are neutral because they do not change the encoded amino acid, whereas changes at the first or second position generally do change the protein and may not be neutral. If GC content is neutral, we would expect the same GC ratio at all three codon positions, and this ratio should depend on the overall GC content. The observation is that the third position follows the neutral expectation, whereas at the first and, even more strongly, the second position the ratio in species with high GC content is lower than we would expect.
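
The comparison described above needs the GC content separately for each codon position. The sketch below shows one way to compute it for a single coding sequence, assuming the sequence starts in frame. The function name and the short example sequence are made up for illustration.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """GC fraction at the first, second and third codon positions.

    Assumes the coding sequence starts in frame; trailing bases that do
    not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("need at least one complete codon")
    fractions = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base, starting at this position
        fractions.append(sum(1 for b in bases if b in "GC") / len(bases))
    return (fractions[0], fractions[1], fractions[2])


if __name__ == "__main__":
    # Codons: ATG GCC AAA TTC
    print(gc_by_codon_position("ATGGCCAAATTC"))
```

Computing these three values for the genes of many species and plotting them against the overall GC content of each genome gives the comparison described in the text.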