Lecture Summary March 2: Coalescence (first part of Part 1)

Wright-Fisher population model

Assume that there are N diploid individuals (2N gene copies), that there is no selection, and that each gene copy has the same chance of being propagated into the next generation. If we could observe and record everything, we could draw a genealogy for all the gene copies. The model also allows us to calculate how long it takes until one gene copy becomes fixed by chance (see your simul8 exercises). This is a rather tedious approach, because we trace every lineage, including those that eventually go extinct. An easier way is to look not from the past to the future, but from today back into the past, and to search for the relationships among today's gene copies.
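The forward-in-time view can be sketched in a few lines of Python. This is only a minimal illustration under the assumptions stated above (neutral, constant size, random choice of parent); the function name generations_to_fixation and the parameter values are made up for this example and are not taken from the simul8 exercises.

```python
import random


def generations_to_fixation(n_diploid, rng):
    """Run a neutral Wright-Fisher population forward until fixation.

    Every one of the 2N gene copies starts with its own label. In each
    generation every new gene copy picks its parent uniformly at random
    from the previous generation and inherits the parent's label.
    Returns the number of generations until only one label is left.
    """
    n_copies = 2 * n_diploid
    population = list(range(n_copies))  # one distinct label per gene copy
    generations = 0
    while len(set(population)) > 1:
        population = [rng.choice(population) for _ in range(n_copies)]
        generations += 1
    return generations


# Hypothetical example values: N = 10 diploid individuals, 200 replicates.
rng = random.Random(1)
times = [generations_to_fixation(10, rng) for _ in range(200)]
print("mean generations to fixation:", sum(times) / len(times))
print("shortest and longest run:", min(times), max(times))
```

Note how much bookkeeping is spent on labels that disappear after a few generations. This is the tedious part that the backward view avoids.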

Probability of descent

Sewall Wright showed that two random gene copies have the same parental gene copy with a probability of 1/(2N). Going back in time, in every generation we toss a (biased) coin and hope for heads (the two copies have the same parent), which comes up with a probability of 1/(2N). Eventually the two lineages come together (coalesce). The waiting time until this happens follows a geometric distribution with an expectation of 2N generations. For large N the geometric distribution can be approximated to good accuracy by the exponential distribution.
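The coin-tossing picture translates directly into a small simulation. The sketch below assumes a success probability of p = 1/(2N) per generation, as in the text. The function names and the example values (N = 50 and N = 1000) are chosen only for illustration.

```python
import math
import random


def generations_until_common_parent(n_diploid, rng):
    """Count generations back in time until two gene copies share a parent."""
    p = 1.0 / (2 * n_diploid)
    generations = 1
    # Each generation is one toss; "heads" (a shared parent) has probability p.
    while rng.random() >= p:
        generations += 1
    return generations


def geometric_tail(t, p):
    """P(no coalescence within t generations) under the geometric model."""
    return (1.0 - p) ** t


def exponential_tail(t, p):
    """The same probability under the exponential approximation."""
    return math.exp(-p * t)


rng = random.Random(3)
waits = [generations_until_common_parent(50, rng) for _ in range(5000)]
print("simulated mean:", sum(waits) / len(waits), "expected 2N:", 2 * 50)

p = 1.0 / (2 * 1000)
for t in (500, 2000, 6000):
    print(t, geometric_tail(t, p), exponential_tail(t, p))
```

The two tail probabilities should agree closely when N is large. For very small N the difference becomes visible.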

The coalescent

In 1982 J. F. C. Kingman extended the work of Wright to samples larger than 2 (he termed the process the n-coalescent). He showed that for a sample of k gene copies the probability of a coalescence in one generation is k(k-1)/(4N), that is, the number of pairs k(k-1)/2 times the pairwise probability 1/(2N). Using the exponential distribution, the expected time until the next coalescence is E[time u] = 4N/(k(k-1)). The coalescent is a very good approximation to the processes in a Wright-Fisher population model as long as N >> k and N is not very small. The coalescent allows us to calculate the probability of a whole genealogy. Exploring this expectation, you can see that when 2N is very large the time u gets large, and when 2N is small the opposite holds. When k is large, u gets small: with large samples we add most lineages near the tips, and almost none of the many samples connects to the genealogy at the very bottom. Summing up all these times, we can calculate the expected time until the whole genealogy has coalesced (the time to the most recent common ancestor). The total time is 4N/(k(k-1)) + 4N/((k-1)(k-2)) + ... + 4N/2. Because 1/(i(i-1)) = 1/(i-1) - 1/i, the sum telescopes and simplifies to 4N(1-1/k). So we can use a genealogy of a sample to estimate the population size.
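The expectations above are easy to check numerically. The following sketch computes the expected length of every coalescent interval, sums the lengths, and compares the sum with the closed form 4N(1-1/k). The last function inverts the closed form to get a simple estimate of N from an observed tree depth. This inversion is an assumption made here for illustration, not a method prescribed in the lecture, and all function names are hypothetical.

```python
def expected_interval_times(n_diploid, k):
    """Expected time (in generations) spent with i lineages, for i = k, ..., 2."""
    return [4 * n_diploid / (i * (i - 1)) for i in range(k, 1, -1)]


def expected_tmrca(n_diploid, k):
    """Expected time to the most recent common ancestor of k gene copies."""
    return sum(expected_interval_times(n_diploid, k))


def estimate_n_from_tmrca(tmrca, k):
    """Invert E[TMRCA] = 4N(1 - 1/k) to obtain a rough estimate of N."""
    return tmrca / (4 * (1 - 1 / k))


n_diploid, k = 1000, 10  # example values
intervals = expected_interval_times(n_diploid, k)
print("interval expectations:", [round(u, 1) for u in intervals])
print("sum of intervals:", expected_tmrca(n_diploid, k))
print("closed form 4N(1-1/k):", 4 * n_diploid * (1 - 1 / k))
# Share of the tree depth taken up by the last interval (two lineages left).
print("share of last interval:", intervals[-1] / expected_tmrca(n_diploid, k))
```

The last interval, with only two lineages left, has expectation 2N and so takes up more than half of the expected depth. This is why adding more samples mostly adds short branches near the tips.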

Variability of a coalescent tree

The exponential distribution is a very noisy distribution (its standard deviation is as large as its mean), and when we produce many genealogies for the same population size we can get very different shapes and lengths. The simple approach of taking a single genealogy (and believing that this is the true genealogy) and deriving the population size from it is therefore a rather unsafe procedure.
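To get a feeling for this noise, we can draw many tree depths for one fixed population size. The sketch below assumes that the time spent with i lineages is exponentially distributed with rate i(i-1)/(4N), as in the coalescent section. It tracks only the interval times and not the tree shape, and the function name and parameter values are invented for this example.

```python
import random
import statistics


def simulate_tmrca(n_diploid, k, rng):
    """Draw one time to the most recent common ancestor for k gene copies."""
    total = 0.0
    for i in range(k, 1, -1):
        rate = i * (i - 1) / (4 * n_diploid)  # coalescence rate with i lineages
        total += rng.expovariate(rate)        # exponential waiting time
    return total


rng = random.Random(42)
n_diploid, k = 1000, 10  # example values
tmrcas = [simulate_tmrca(n_diploid, k, rng) for _ in range(2000)]

print("expected 4N(1-1/k):", 4 * n_diploid * (1 - 1 / k))
print("simulated mean:", statistics.mean(tmrcas))
print("simulated standard deviation:", statistics.stdev(tmrcas))
print("shortest and longest tree:", min(tmrcas), max(tmrcas))

# Plugging one single tree depth into N = T / (4(1-1/k)) gives very
# different answers from tree to tree.
single_tree_estimates = [t / (4 * (1 - 1 / k)) for t in tmrcas[:5]]
print("N estimated from five single trees:",
      [round(n) for n in single_tree_estimates])
```

The average over many genealogies should lie close to the expectation, while estimates from single trees scatter widely around the true N.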

Shapes of coalescent genealogies

Changes in population size leave signatures in the genealogies: