Lecture notes for February 28: Phylogenetic Trees continued


Consensus tree. If we have several trees, we can draw a tree diagram that shows where they agree. There are several ways to do this. Strict consensus shows only the groups that all the trees share. Unfortunately, most strict-consensus trees look like stars with no interesting structure. Majority-rule consensus shows all the groups that appear on more than 50% of the trees. (If the majority groups don't fully resolve the tree, it is common to add further high-scoring groups that are compatible with them until the tree is fully resolved. It would be more strictly correct to show the unresolved part as a star.)
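A minimal sketch of strict and majority-rule consensus, assuming each rooted tree is stored as a set of its groups (clades), with each group a frozenset of taxon names. The function names and the three example trees are hypothetical; this only finds which groups qualify and does not draw the tree.

```python
from collections import Counter

# Assumed representation: a rooted tree is a set of clades, each clade a
# frozenset of taxon names. Single-taxon and all-taxa clades are left out
# because every tree contains them.


def clade_counts(trees):
    """Count how many of the input trees contain each clade."""
    counts = Counter()
    for tree in trees:
        counts.update(set(tree))
    return counts


def strict_consensus(trees):
    """Clades present in every tree."""
    counts = clade_counts(trees)
    return {clade for clade, n in counts.items() if n == len(trees)}


def majority_rule_consensus(trees):
    """Clades present in more than half the trees, mapped to their frequency."""
    counts = clade_counts(trees)
    return {clade: n / len(trees) for clade, n in counts.items() if n > len(trees) / 2}


# Three hypothetical trees on taxa A-E (frozenset("AB") == {"A", "B"}).
trees = [
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},  # (((A,B),C),(D,E))
    {frozenset("AC"), frozenset("ABC"), frozenset("DE")},  # (((A,C),B),(D,E))
    {frozenset("AB"), frozenset("ABD"), frozenset("CE")},  # (((A,B),D),(C,E))
]

print(strict_consensus(trees))  # set(): no group is in all three, so a star
for clade, freq in majority_rule_consensus(trees).items():
    print(sorted(clade), f"{freq:.0%}")
```

Here the strict consensus is a star, while the majority-rule groups {A,B}, {A,B,C} and {D,E} (each on 2 of 3 trees) give the tree (((A,B),C),(D,E)). If the input trees were bootstrap replicates, the printed frequencies would be the bootstrap scores used to label the branches.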

Bootstrap results are generally shown as majority-rule consensus trees with each branch labeled with its bootstrap score.

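It can be shown mathematically that the groups which score above 50% are guaranteed to all be compatible with the same tree: any two groups that each appear on more than half of the trees must appear together on at least one tree, and groups on the same tree never conflict.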
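A small sketch of what "compatible" means, assuming rooted groups stored as frozensets of taxon names (the function names are hypothetical). Two groups can sit on the same rooted tree only if they are disjoint or one contains the other, and for rooted groups checking every pair is enough for the whole collection to fit on one tree.

```python
from itertools import combinations


def compatible(a, b):
    """Two rooted groups can share a tree only if they are nested or disjoint."""
    return a.isdisjoint(b) or a <= b or b <= a


def fit_on_one_tree(groups):
    """For rooted groups, pairwise compatibility is enough for a single tree."""
    return all(compatible(a, b) for a, b in combinations(groups, 2))


print(fit_on_one_tree([frozenset("AB"), frozenset("ABC"), frozenset("DE")]))  # True
print(fit_on_one_tree([frozenset("AB"), frozenset("BC")]))  # False: B is in both, neither contains the other
```

Two groups like {A,B} and {B,C} can each appear on some trees, but by the argument above they cannot both appear on more than half of them.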

Using trees. What questions can we ask about a tree? We can bootstrap it (see previous notes) to see if it is well supported, and to identify the shaky parts.

We can compare trees inferred under different assumptions. The likelihood ratio test compares two maximum-likelihood trees made with different models and can help establish which model is better. This test can only be used if the two models are nested, which means that one is a special case of the other. One example is the model in which all lineages evolve at the same rate, which is a special case of the model in which each lineage has its own rate. The tree will always fit the more general model at least as well (the more general model has more stretch, and so can be made to fit more tightly), but we can test whether the improvement in going to the more general model is significant. This is a standard test of the molecular clock.

The test uses log-likelihoods (logarithms of the likelihood), which are standard for talking about likelihood methods because they are more reasonable numbers to handle than likelihoods, which are often terribly small numbers. Take the difference between the log-likelihoods of the tree under the general and the specific models, double it, and compare the result to a chi-square distribution with degrees of freedom equal to the number of extra parameters in the more complex model. (That's why the models have to be nested; otherwise we couldn't compute the degrees of freedom. There are ways to compare non-nested models, but they are more difficult.)
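The arithmetic of the test can be sketched as follows. This is only an illustration with made-up log-likelihoods; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally supply the chi-square tail probability, but here it is computed with the standard library so the block stands alone. For a clock test on n taxa the extra parameters are usually counted as n - 2 (2n - 3 branch lengths on an unrooted tree without a clock, versus n - 1 node heights on a clock tree); the 8-taxon setup below is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed forms for the smallest odd and even df.
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(x / 2))
    else:
        k, q = 2, math.exp(-x / 2)
    # Step up two df at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def likelihood_ratio_test(lnl_specific, lnl_general, extra_params):
    """Return (statistic, p-value) for two nested models."""
    stat = 2.0 * (lnl_general - lnl_specific)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods for 8 taxa: clock vs. no clock, df = 8 - 2 = 6.
stat, p = likelihood_ratio_test(lnl_specific=-5234.7, lnl_general=-5225.1, extra_params=6)
print(f"2*dlnL = {stat:.1f}, p = {p:.4f}")
```

With these invented numbers the statistic is 19.2 on 6 degrees of freedom, giving p of about 0.004, so the clock would be rejected at the usual 5% level.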

[I'm not going to ask you to do likelihood ratio tests on an exam.]

If the general model is significantly better, we can reject the hypothesis that a molecular clock applies to these data. Similar tests can be done for other nested hypotheses: for example, that some sites evolve faster than others, or that transitions are favored over transversions.

We can also compare the trees themselves, though different tests are needed because trees are not nested hypotheses. For example, in the famous human mtDNA data set of Vigilant et al., we can ask whether there are trees without an African root for humanity that are not significantly worse than the best tree. (As it turns out, such trees do exist, though they are rarer than the African-root kind.)

We can compare trees for different loci and see whether they disagree. The discovery that trees for mitochondrial and chloroplast genes were very different from the trees for the organisms that contain them led to the realization that mitochondria and chloroplasts are probably foreign DNA acquired by eukaryotes sometime in their evolution.

Some interesting results from phylogenetics.

The amount of genetic difference associated with a taxonomist's idea of orders, classes, genera, species, etc. varies hugely across the natural world. Humans and chimpanzees differ by less than 5%. In most other groups this level of difference would indicate close sibling species. (Perhaps we are making a mistake when we consider humans and chimps to be in different families?) At the other extreme, strains of E. coli can differ by 20% or more.

Organisms with conservative body plans (living fossils like the horseshoe crab) have just as much evidence of genetic change as other organisms. Whatever keeps them looking the same, it does not keep all their genes the same.

Phylogenies can help interpret traits. Many salamanders can project their tongues to catch insects. This trait does not fit well on molecular trees of salamanders; it seems to have arisen three separate times. On closer examination, the three groups of salamanders do not use the same physical mechanism to project their tongues; they had been lumped together on the basis of a superficial resemblance.
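One common way to count how many times a trait must have changed on a tree is Fitch parsimony; the notes don't say which method produced the count of three, so this is only an illustration of the idea. The sketch assumes a rooted binary tree written as nested pairs of taxon names, and the eight "salamanders" and their trait values are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony for one trait on a rooted binary tree.

    tree   -- a taxon name (leaf) or a (left, right) pair of subtrees
    states -- maps each taxon name to its observed state
    Returns (set of possible states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two children can agree, so no extra change is needed here.
        return shared, left_cost + right_cost
    # The children disagree: one extra change on one of the two branches below.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree of eight salamanders; 1 = projectile tongue, 0 = not.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
tongue = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1, "f": 0, "g": 0, "h": 0}
root_states, changes = fitch(tree, tongue)
print(changes)  # 3
```

Fitch counts changes without saying which direction they went; here the root comes out as state 0, so the three changes are three separate gains of the tongue trait.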

We've already seen how the distribution of asexual lineages on the tree suggests that asexual species are short-lived.

The comparative method analyzes traits on a tree to distinguish traits that are correlated because of natural selection from traits that are correlated because of family relationships among species. For example, large lizards use a different style of head-bobbing dance from small lizards. Is this because one dance is better adapted to large lizards and the other to small ones, or is it because all the large lizards are related and simply inherited the dance from their common ancestor? The comparative method factors out similarity due to ancestry, allowing us to see whether any additional similarity remains that could indicate selection.
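One well-known version of the comparative method is Felsenstein's independent contrasts; the notes don't name a specific technique, so treat this as one example rather than the method. The sketch assumes a rooted binary tree with positive branch lengths, written as (name, length) for a leaf and ((left, right), length) for an internal node. The lizard tree and trait values are hypothetical.

```python
import math


def contrasts(node, trait):
    """Felsenstein's independent contrasts by a post-order pass.

    node  -- (taxon_name, branch_length) for a leaf, or
             ((left_node, right_node), branch_length) for an internal node
    trait -- maps each taxon name to a numeric trait value
    Returns (estimated node value, adjusted branch length, standardized contrasts).
    Branch lengths below the root must be positive.
    """
    label, length = node
    if isinstance(label, str):
        return trait[label], length, []
    left, right = label
    x_i, v_i, c_i = contrasts(left, trait)
    x_j, v_j, c_j = contrasts(right, trait)
    # Difference between the two sister values, scaled by its expected spread.
    contrast = (x_i - x_j) / math.sqrt(v_i + v_j)
    # Weighted average of the children, giving shorter branches more weight.
    value = (x_i / v_i + x_j / v_j) / (1 / v_i + 1 / v_j)
    # Lengthen this node's branch to reflect uncertainty in the estimated value.
    adjusted = length + v_i * v_j / (v_i + v_j)
    return value, adjusted, c_i + c_j + [contrast]


def contrast_correlation(tree, x, y):
    """Correlation of two traits' contrasts, computed through the origin."""
    _, _, cx = contrasts(tree, x)
    _, _, cy = contrasts(tree, y)
    num = sum(a * b for a, b in zip(cx, cy))
    return num / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


# Hypothetical lizards: a large-bodied clade (A, B) and a small-bodied clade (C, D).
large = ((("A", 1.0), ("B", 1.0)), 2.0)
small = ((("C", 1.0), ("D", 1.0)), 2.0)
tree = ((large, small), 0.0)
size = {"A": 10.0, "B": 11.0, "C": 3.0, "D": 4.0}
dance = {"A": 1.0, "B": 0.9, "C": 0.1, "D": 0.2}

print(len(contrasts(tree, size)[2]))  # 3 contrasts for 4 species
print(contrast_correlation(tree, size, dance))
```

Four species give only three independent contrasts, and in this made-up example the two within-clade contrasts cancel, so essentially all of the size-dance association comes from the single split between the large and small clades. That split counts as one data point, not four.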