Lecture Summary March 2: Coalescence (first part of Part 1)
Wright-Fisher population model
Under the assumption that there are N diploid individuals
(2N gene copies), that there is no selection, and that each
gene copy has the same chance to be propagated into the next generation,
we can draw a genealogy for all the gene copies, provided we can observe
and record everything. The model also allows us to calculate how long it will
take until one gene copy gets fixed by chance (see your simul8 exercises).
Following the population forward in time is a rather tedious way to do this, because we must trace every lineage, including all those that eventually
go extinct. An easier approach is to reverse the direction: instead of looking from the past to the future, we search for relationships from today back into the past.
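A minimal forward-in-time sketch of the model above, assuming the usual neutral Wright-Fisher sampling in which every gene copy picks its parent uniformly at random from the previous generation. The function name `generations_until_fixation` is hypothetical. Each of the 2N starting copies gets its own label, and we count generations until a single label has taken over the population.

```python
import random
import statistics


def generations_until_fixation(n_diploid, rng):
    """Forward neutral Wright-Fisher simulation with 2N labelled gene copies.

    Every copy in the next generation picks its parent uniformly at random
    (with replacement) from the 2N copies of the current generation.
    Returns the number of generations until all copies carry the same label,
    i.e. until the descendants of one original copy are fixed by chance.
    """
    two_n = 2 * n_diploid
    population = list(range(two_n))  # each starting copy has its own label
    generations = 0
    while len(set(population)) > 1:
        population = rng.choices(population, k=two_n)
        generations += 1
    return generations


rng = random.Random(1)
runs = [generations_until_fixation(10, rng) for _ in range(200)]
print("mean generations to fixation (N = 10):", statistics.mean(runs))
```

Every generation touches all 2N copies, which is exactly the bookkeeping that the backward-in-time view avoids.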
Probability of descent
Sewall Wright showed that two random gene copies have the same parental
gene copy with probability 1/(2N). In every generation we
toss a biased coin and hope for heads (the two copies have the same parent),
which comes up with probability 1/(2N). Eventually the two lineages come
together (coalesce). The number of generations until this happens follows a geometric distribution with
expectation 2N. The geometric distribution can be approximated to good accuracy
by the exponential distribution.
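The coin-tossing argument can be simulated directly. This is a sketch with hypothetical helper names: it draws geometric coalescence times for two lineages and compares the geometric tail probability P(T > t) with the exponential approximation of the same mean 2N.

```python
import math
import random
import statistics


def pairwise_coalescence_time(n_diploid, rng):
    """Generations until two gene copies find a common parent.

    Each generation is one biased coin toss that lands heads (same parent)
    with probability 1/(2N); the number of tosses up to and including the
    first heads is geometrically distributed with mean 2N.
    """
    p = 1.0 / (2 * n_diploid)
    t = 1
    while rng.random() >= p:  # tails: the two lineages stay separate
        t += 1
    return t


def geometric_tail(t, n_diploid):
    """P(T > t): no coalescence during the first t generations."""
    return (1.0 - 1.0 / (2 * n_diploid)) ** t


def exponential_tail(t, n_diploid):
    """Exponential approximation with the same mean 2N."""
    return math.exp(-t / (2 * n_diploid))


rng = random.Random(42)
draws = [pairwise_coalescence_time(50, rng) for _ in range(5000)]
print("mean:", statistics.mean(draws), "(expected 2N = 100)")
for t in (50, 100, 200):
    print(t, geometric_tail(t, 50), exponential_tail(t, 50))
```

For N = 50 the two tail probabilities agree to within a few thousandths, which is why the exponential is a convenient stand-in for the geometric.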
The coalescent
In 1982 J. F. C. Kingman extended the work of Wright to samples larger than
two. He showed that for k sampled gene copies the probability of a coalescence per generation
(he termed the process the n-coalescent) is k(k-1)/(4N) and that,
again using an exponential distribution, the expected time until the next coalescence is E[u] = 4N/(k(k-1)). The coalescent is a very good approximation
to the processes in a Wright-Fisher population model as long as N >> k and N is
not very small. The coalescent allows us to calculate the
probability of a whole genealogy. Exploring this expectation,
you can see that when N is very large the time u gets large, and when N is small u gets small.
If k is large, u gets small, an indication that in large samples
most additional lineages coalesce near the tips, and almost none of the many samples connect
the genealogy at the very bottom (near the root). Summing up all these times we can calculate
the time it takes until the whole genealogy has coalesced (the time to the
most recent common ancestor, TMRCA).
Doing so we can see that the total expected time is 4N/(k(k-1)) + 4N/((k-1)(k-2)) + ... + 4N/2. Because each term 4N/(j(j-1)) equals 4N(1/(j-1) - 1/j), the sum telescopes to 4N(1-1/k).
So we can use a genealogy of a sample to estimate the population size: given the depth of the genealogy in generations, we can solve 4N(1-1/k) for N.
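The expectations above can be checked with exact rational arithmetic. This is a sketch; the helper names are hypothetical, and `estimate_n` is only the simple moment-style estimator that follows from setting an observed TMRCA equal to 4N(1-1/k), not a method prescribed in the lecture.

```python
from fractions import Fraction


def expected_waiting_time(k, n_diploid):
    """E[u] = 4N / (k(k-1)) while k lineages remain."""
    return Fraction(4 * n_diploid, k * (k - 1))


def expected_tmrca(sample_size, n_diploid):
    """Sum of the expected waiting times for k = sample_size, ..., 2."""
    return sum(expected_waiting_time(k, n_diploid)
               for k in range(sample_size, 1, -1))


def estimate_n(tmrca_generations, sample_size):
    """Hypothetical estimator: solve TMRCA = 4N(1 - 1/k) for N."""
    return tmrca_generations / (4 * (1 - Fraction(1, sample_size)))


N = 100
for k in (2, 5, 10, 50):
    print(k, expected_waiting_time(k, N), float(expected_tmrca(k, N)))
print(estimate_n(360, 10))  # 360 generations for a sample of 10 -> N = 100
```

Note that the last interval alone (k = 2, expected length 4N/2 = 2N) is at least half of the expected TMRCA, whatever the sample size.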
Variability of a coalescent tree
The exponential distribution is a very noisy distribution, and when we produce
many genealogies from the same population size we can get very different
shapes and lengths. The simple approach of taking a single genealogy (and believing that it is the true genealogy) and estimating
the population size from it is therefore a rather unsafe procedure.
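To see how noisy this is, here is a sketch that simulates many genealogies from the same population, assuming the exponential waiting times of the coalescent with rate k(k-1)/(4N). The function names are hypothetical. It also computes the variance of the TMRCA as the sum of the variances of the independent waiting times.

```python
import math
import random
import statistics


def simulate_tmrca(sample_size, n_diploid, rng):
    """One TMRCA drawn from Kingman's coalescent (time in generations).

    While k lineages remain, the waiting time is exponential with
    rate k(k-1)/(4N), i.e. with mean 4N/(k(k-1)).
    """
    total = 0.0
    for k in range(sample_size, 1, -1):
        total += rng.expovariate(k * (k - 1) / (4 * n_diploid))
    return total


def tmrca_variance(sample_size, n_diploid):
    """Variance of the TMRCA: the waiting times are independent exponentials,
    so their variances (4N/(k(k-1)))**2 add up."""
    return sum((4 * n_diploid / (k * (k - 1))) ** 2
               for k in range(2, sample_size + 1))


rng = random.Random(2)
draws = [simulate_tmrca(10, 100, rng) for _ in range(2000)]
print("expected:", 4 * 100 * (1 - 1 / 10), "sd:", math.sqrt(tmrca_variance(10, 100)))
print("simulated mean:", statistics.mean(draws), "sd:", statistics.stdev(draws))
print("range:", min(draws), "to", max(draws))
```

With N = 100 and k = 10 the standard deviation of the TMRCA (about 215 generations) is well over half of its mean (360 generations), and the final interval with k = 2 contributes 40000 of the roughly 46300 units of variance.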
Shapes of coalescent genealogies
Changes in population size leave signatures in the genealogies:
- Small populations will have short trees
- Large populations will have long trees
- Growing populations will have more coalescences near the bottom (root) of the
genealogy; the tipward coalescences are less condensed.
- Shrinking populations show the opposite.
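A sketch of how growth leaves its signature, assuming exponential growth (a model choice made here for illustration; the lecture does not specify one). Looking back in time the population size is N(t) = N0 exp(-r t), so with k lineages the coalescence rate at time t is k(k-1)/(4N(t)), and each waiting time is drawn by inverting the integrated rate. The helper names and parameters (`n0`, `growth_rate`) are hypothetical. Shrinking populations (r < 0) are left out, because there the integrated rate stays finite and the simple inversion below does not apply.

```python
import math
import random
import statistics


def coalescence_times(sample_size, n0, growth_rate, rng):
    """Times (generations before present) of successive coalescences.

    Backwards in time the population size is N(t) = n0 * exp(-growth_rate * t),
    so growth_rate > 0 is a population that has been growing towards the
    present and growth_rate = 0 is the constant-size coalescent.
    """
    if growth_rate < 0:
        raise ValueError("shrinking populations are not handled in this sketch")
    times = []
    t = 0.0
    for k in range(sample_size, 1, -1):
        c = k * (k - 1) / (4.0 * n0)  # coalescence rate at time 0 with k lineages
        e = rng.expovariate(1.0)      # unit-exponential amount of integrated rate
        if growth_rate == 0:
            t += e / c
        else:
            # solve (c / r) * (exp(r * t_new) - exp(r * t)) = e for t_new
            t = math.log(math.exp(growth_rate * t) + growth_rate * e / c) / growth_rate
        times.append(t)
    return times


def mean_tip_share(sample_size, n0, growth_rate, replicates, seed=0):
    """Average fraction of the tree height that passes before the first
    (most tipward) coalescence."""
    rng = random.Random(seed)
    shares = []
    for _ in range(replicates):
        times = coalescence_times(sample_size, n0, growth_rate, rng)
        shares.append(times[0] / times[-1])
    return statistics.mean(shares)


print("constant:", mean_tip_share(10, 1000, 0.0, 500))
print("growing: ", mean_tip_share(10, 1000, 0.05, 500))
```

In this sketch the growing population's first coalescence should sit a much larger share of the way back towards the root than in the constant-size case, i.e. the coalescences are packed near the bottom of the tree.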