This is all assuming a Wright-Fisher population with discrete, non-overlapping generations.
The population has a fixed size of N individuals.
Mutations enter the population at a rate of μ mutations per individual per site per generation.
In a haploid population (usually the case for pathogens), we summarize the population with the parameter θ, which equals 2Nμ. Without temporally resolved data, it's not possible to separately estimate N and μ; in most population genetic circumstances, we can only estimate θ. With temporally resolved data we can separate θ into N and μ.
Genetic diversity is most commonly summarized with the statistic π, the average number of differences per site between two random individuals in the population. π is usually measured in substitutions per site. Under neutrality, the expectation of π follows E[π] = θ = 2Nμ.
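The expectation of π can be motivated with a short coalescent argument. The sketch below assumes an infinite-sites model (every mutation hits a previously unmutated site) and introduces T_2, the time back to the common ancestor of two sampled individuals; neither appears in the text above. In a haploid Wright-Fisher population two lineages share a parent in the previous generation with probability 1/N, so they take N generations on average to coalesce, and mutations accumulate along both lineages at rate μ.

```latex
\begin{align*}
\Pr(\text{two lineages coalesce in a given generation}) &= \frac{1}{N} \\
E[T_2] &= N \ \text{generations} \\
E[\pi] &= 2\mu \, E[T_2] = 2N\mu = \theta
\end{align*}
```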
π for Drosophila and for flu is approximately 0.01, while π for humans is approximately 0.001. This means that for an average-length gene of 1000 base pairs, two random fruit flies or two random flu viruses will differ at ~10 sites on average, while two random humans will differ at ~1 site.
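The arithmetic behind these numbers is just π multiplied by sequence length. A minimal sketch, where the function name `expected_differences` is made up for illustration:

```python
def expected_differences(pi: float, length_bp: int) -> float:
    """Expected number of differing sites between two random sequences."""
    return pi * length_bp


# Approximate pi values quoted in the text; 1000 bp is the "average gene" length used there.
for name, pi in [("Drosophila", 0.01), ("flu", 0.01), ("human", 0.001)]:
    diffs = expected_differences(pi, 1000)
    print(f"{name}: ~{diffs:g} differences per 1000 bp")
```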
The number of unique haplotypes in a sample of n sequences can be estimated from Ewens's sampling formula. The formula gives the probability of observing a1 haplotypes present once, a2 haplotypes present twice, and so on, in a sample of n sequences. The sole parameter of the sampling formula is θ. Thus θ is sufficient to predict the entire distribution of haplotype frequencies. The expected number k of unique haplotypes follows: E[k] = θ/θ + θ/(θ+1) + θ/(θ+2) + ... + θ/(θ+n−1).
With θ = 0.2, there is usually only a single dominant haplotype in the population.
With θ = 1.0, there are generally a small handful of haplotypes.
With θ = 5.0, there are many segregating haplotypes.
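To make the dependence on θ concrete, the expected number of unique haplotypes under Ewens's sampling formula, E[k] = θ/θ + θ/(θ+1) + ... + θ/(θ+n−1), can be evaluated directly. This is a small sketch: the function name `expected_haplotypes` and the sample size n = 100 are assumptions chosen for illustration, not values from the text.

```python
def expected_haplotypes(theta: float, n: int) -> float:
    """E[k] = sum over i = 0..n-1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


# The three theta values discussed above, with a hypothetical sample of n = 100 sequences.
for theta in (0.2, 1.0, 5.0):
    print(f"theta = {theta}: E[k] = {expected_haplotypes(theta, 100):.2f}")
```

The sum grows only slowly with n, so the ordering of the three cases (few, a handful, many haplotypes) should not depend much on the sample size chosen.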
A new mutant appears in the population at an initial frequency p of 1/N, and if it is neutral it has a 1/N chance of fixing.
Similarly, if a mutant is at population frequency p, then it has a p chance of fixing. At any point in time, looking forwards, the chance of fixation of a neutral mutation is just its frequency.
Conditioned on a neutral mutant fixing, the expected time to fixation is 2N generations. Thus, the rate of population turnover scales inversely with population size. This time to fixation is also a measure of the strength of random genetic drift.
With θ = 1 and N = 20, haplotypes emerge and disappear rapidly.
With θ = 1 and N = 100, population turnover takes approximately 200 generations.
With θ = 1 and N = 500, population turnover occurs slowly.
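The two fixation results (a neutral mutant fixes with probability equal to its frequency, and takes about 2N generations to do so) can be checked with a rough simulation. The sketch below assumes a haploid Wright-Fisher population in which a single new mutant starts at frequency 1/N and no further mutation occurs; the function names, the replicate count, and the seed are arbitrary illustrative choices.

```python
import random


def simulate_neutral_mutant(N, rng):
    """Follow one new neutral mutant until it is lost or fixed.

    Returns (fixed, generations), where generations counts the time to absorption.
    """
    count = 1  # a single mutant copy, i.e. frequency 1/N
    generations = 0
    while 0 < count < N:
        p = count / N
        # Each of the N offspring draws its parent at random (binomial sampling).
        count = sum(rng.random() < p for _ in range(N))
        generations += 1
    return count == N, generations


def summarize(N, replicates, seed=1):
    """Estimate fixation probability and mean time to fixation (given fixation)."""
    rng = random.Random(seed)
    fix_times = []
    for _ in range(replicates):
        fixed, t = simulate_neutral_mutant(N, rng)
        if fixed:
            fix_times.append(t)
    p_fix = len(fix_times) / replicates
    mean_time = sum(fix_times) / len(fix_times) if fix_times else float("nan")
    return p_fix, mean_time


p_fix, mean_time = summarize(N=20, replicates=10000)
print(f"fixation probability ~ {p_fix:.3f} (theory: 1/N = {1 / 20:.3f})")
print(f"mean time to fixation ~ {mean_time:.1f} generations (theory: about 2N = {2 * 20})")
```

Because 2N is a large-population approximation, the simulated mean for small N should be close to, but not exactly, 2N.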
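Each generation we expect Nμ mutations per site across the entire population. Each mutation has a 1/N chance of fixing, so the expected rate of fixation is Nμ × 1/N = μ per site per generation.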
This result, that the rate of neutral divergence across a species is equal to the rate of mutation in a single individual, is a classic finding from Kimura.
Mutation rate μ determines the rate of interspecies divergence; population size N determines the rate of population turnover; and their product Nμ determines the level of standing genetic variation.