Population Genetics Simulation and Modelling — Wright-Fisher Model
Q: What is the distribution of offspring number per individual in the Wright-Fisher model? Convince yourself that, for large enough populations, this distribution is approximately Poisson with mean 1.
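As shown above, I used the idea that each of the N children picks a parent at random, so the number of children of any given individual follows a binomial distribution with N trials and success probability 1/N. Letting N tend to infinity then gives a Poisson distribution with mean 1.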
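For reference, the limit can be written out as follows. This is the standard textbook argument, sketched under the assumption that each of the N children chooses its parent independently and uniformly; K (a symbol introduced only for this sketch) denotes the number of children of one fixed individual.

```latex
\begin{align*}
P(K = k) &= \binom{N}{k}\left(\frac{1}{N}\right)^{k}\left(1 - \frac{1}{N}\right)^{N-k} \\
         &= \frac{1}{k!}\cdot\frac{N(N-1)\cdots(N-k+1)}{N^{k}}\cdot\left(1 - \frac{1}{N}\right)^{N}\left(1 - \frac{1}{N}\right)^{-k} \\
         &\xrightarrow[N \to \infty]{} \frac{1}{k!}\cdot 1\cdot e^{-1}\cdot 1 = \frac{e^{-1}}{k!}
\end{align*}
```

The last expression is the Poisson probability of k events with mean 1.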
Q: What is the distribution of the number of alternate alleles if we sample s individuals in a population with allele frequency p?
When we sample s individuals from a population, we cannot choose the same individual twice. The population size N is fixed, and so is the number of individuals carrying the alternate allele, p*N. The question is therefore equivalent to asking for the “probability of k successes in s draws without replacement from a finite population of size N that contains p*N objects with that feature”, which is the definition of the hypergeometric distribution.
The simulation above confirms this: when we repeatedly sample 50 individuals from the initial population, the number of sampled individuals carrying the alternate allele follows a hypergeometric distribution.
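A sampling experiment of this kind might be sketched as follows. This is not the code behind the figure: the population size, allele frequency, and number of repetitions are assumed values, and the function names are made up for illustration. Only the sample size of 50 comes from the text.

```python
import random
from math import comb

def hypergeom_pmf(k, N, K, s):
    """P(k carriers in s draws without replacement from N individuals, K of them carriers)."""
    return comb(K, k) * comb(N - K, s - k) / comb(N, s)

def sample_alt_count(population, s, rng):
    """Draw s distinct individuals and count those carrying the alternate allele (coded 1)."""
    return sum(rng.sample(population, s))

# Assumed illustrative values; only s = 50 is taken from the text
N, p, s = 10_000, 0.2, 50
K = int(p * N)                       # number of carriers in the population
population = [1] * K + [0] * (N - K)

rng = random.Random(0)               # fixed seed so the run is repeatable
draws = [sample_alt_count(population, s, rng) for _ in range(5000)]

empirical_mean = sum(draws) / len(draws)
theoretical_mean = s * K / N         # mean of the hypergeometric distribution
print("empirical mean:", empirical_mean, "theoretical mean:", theoretical_mean)
```

A histogram of `draws` can then be compared against `hypergeom_pmf(k, N, K, s)` for k from 0 to s.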
Here we see the distribution of the number of alternate alleles among the offspring when each offspring chooses a parent at random. Since one parent can have multiple offspring, this is sampling with replacement, which is equivalent to 10,000 independent Bernoulli trials, each succeeding with probability equal to the parental allele frequency. The count therefore follows a binomial distribution.
As you can see above, the allele went extinct after 10–12 generations in this particular simulation.
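One generation of this reproduction step could look like the sketch below. It uses the fact that picking a parent uniformly at random yields a carrier with probability p, so the parent list is never built. The values of N, p, and the number of repetitions are assumptions chosen to keep the run fast, and N is smaller than the 10,000 used above.

```python
import random
from statistics import mean, pvariance

def next_generation_count(p, N, rng):
    """Count offspring carrying the alternate allele when each of N offspring
    independently picks a parent that carries it with probability p."""
    return sum(1 for _ in range(N) if rng.random() < p)

# Assumed illustrative values
N, p, reps = 1000, 0.2, 1000

rng = random.Random(0)
counts = [next_generation_count(p, N, rng) for _ in range(reps)]

print("simulated mean:", mean(counts), "binomial mean:", N * p)
print("simulated variance:", pvariance(counts), "binomial variance:", N * p * (1 - p))
```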
Drift
Here you can see drift in 10 independent populations over 30 generations, each starting at an allele frequency of 0.3.
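Trajectories like these could be generated with a loop along the following lines. The population size N = 100 is an assumption, since it is not stated here, and `simulate_drift` is a name invented for this sketch.

```python
import random

def simulate_drift(N, p0, generations, rng):
    """Return the allele-frequency trajectory of one Wright-Fisher population."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Binomial sampling: each of the N offspring inherits the allele with probability p
        count = sum(1 for _ in range(N) if rng.random() < p)
        p = count / N
        freqs.append(p)
    return freqs

rng = random.Random(1)
# 10 populations, 30 generations, starting frequency 0.3 (as in the text); N is assumed
trajectories = [simulate_drift(N=100, p0=0.3, generations=30, rng=rng) for _ in range(10)]

for i, traj in enumerate(trajectories):
    print(f"population {i}: final frequency {traj[-1]:.2f}")
```

Each inner list holds 31 values (the starting frequency plus one per generation) and can be passed directly to a plotting library.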
In this graph (which looks like a work of Van Gogh to me) you can see the evolution of 1000 independent populations over 30 generations.
Q: Over time, there are more and more populations at frequencies 0 and 1. (Why?)
This is because a kind of saturation takes place at 0 and 1, which are absorbing states. Once the allele goes extinct in a population, its frequency stays at 0, and once it reaches fixation, its frequency stays at 1. Those populations remain in the same category in all later generations, while populations at intermediate frequencies can still reach 0 or 1 in subsequent generations.
As you can see, all 1000 populations start in the third bin and spread out over subsequent generations.
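The accumulation at 0 and 1 can be checked numerically with a sketch like the one below. It tracks the fraction of populations sitting at either boundary in each generation. The small population size (N = 20) and the 200 replicates are assumptions chosen so that absorption happens quickly.

```python
import random

def step(p, N, rng):
    """One Wright-Fisher generation: return the new allele frequency."""
    return sum(1 for _ in range(N) if rng.random() < p) / N

def absorbed_fraction_over_time(n_pops, N, p0, generations, rng):
    """Fraction of populations at frequency 0 or 1 after each generation."""
    freqs = [p0] * n_pops
    fractions = []
    for _ in range(generations):
        freqs = [step(p, N, rng) for p in freqs]
        absorbed = sum(1 for p in freqs if p == 0.0 or p == 1.0)
        fractions.append(absorbed / n_pops)
    return fractions

rng = random.Random(2)
fractions = absorbed_fraction_over_time(n_pops=200, N=20, p0=0.3, generations=30, rng=rng)
print("absorbed after 1 generation:", fractions[0])
print("absorbed after 30 generations:", fractions[-1])
```

Because a population at 0 or 1 can never leave, the values in `fractions` can only stay the same or grow from one generation to the next.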
Q: What is the expected distribution of allele frequencies after one generation, if they start at frequency p in a population of size N? (Hint: we explored this numerically above!)
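As we saw earlier in this document, the distribution is binomial: the number of copies of the allele in the next generation is Binomial(N, p), and the allele frequency is that count divided by N.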
What is the variance of this distribution?
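The variance follows from the binomial distribution of the allele count. A short derivation, where X (the number of copies in the next generation) and p' (the next-generation frequency) are symbols introduced for this sketch:

```latex
\begin{align*}
X &\sim \mathrm{Binomial}(N, p), \qquad p' = \frac{X}{N} \\
\mathrm{E}[p'] &= \frac{Np}{N} = p \\
\mathrm{Var}(p') &= \frac{\mathrm{Var}(X)}{N^{2}} = \frac{Np(1-p)}{N^{2}} = \frac{p(1-p)}{N}
\end{align*}
```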
Effect of population size on the rate of change in allele frequencies
When the population size decreases, the distribution spreads out, i.e., the variance increases. This is in accordance with the theory developed above.
Variance vs Population size
The simulated variance closely matches the theoretical prediction, which serves as further verification.
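A comparison of this kind might be reproduced with the following sketch. It simulates one generation many times for several population sizes and sets the empirical variance of the allele frequency next to p0(1-p0)/N. The starting frequency, the population sizes, and the replicate count are assumed values.

```python
import random
from statistics import pvariance

def one_generation_freq(p0, N, rng):
    """Allele frequency after one Wright-Fisher generation starting from p0."""
    return sum(1 for _ in range(N) if rng.random() < p0) / N

def simulated_variance(p0, N, reps, rng):
    """Empirical variance of the next-generation frequency over reps replicates."""
    return pvariance([one_generation_freq(p0, N, rng) for _ in range(reps)])

rng = random.Random(3)
p0 = 0.3                                  # assumed starting frequency
results = {}
for N in (10, 50, 100, 500):              # assumed population sizes
    simulated = simulated_variance(p0, N, 2000, rng)
    theoretical = p0 * (1 - p0) / N
    results[N] = (simulated, theoretical)
    print(f"N={N}: simulated {simulated:.5f}, theoretical {theoretical:.5f}")
```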
How does the rate of change in allele frequency depend on the initial allele frequency?
- The variance of the allele frequency after one generation is p0(1-p0)/N, where p0 is the initial frequency.
- By this formula, the variance is zero for p0 = 0 and p0 = 1, which is also apparent in the graph above.
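Both points can also be checked without any random sampling. The sketch below computes the variance directly from the binomial probabilities for a grid of starting frequencies. N = 100 and the grid are assumed values, and `exact_variance` is a helper written for this illustration.

```python
from math import comb

def exact_variance(p0, N):
    """Variance of X/N for X ~ Binomial(N, p0), computed directly from the pmf."""
    pmf = [comb(N, k) * p0**k * (1 - p0)**(N - k) for k in range(N + 1)]
    mean = sum(k / N * w for k, w in enumerate(pmf))
    return sum((k / N - mean) ** 2 * w for k, w in enumerate(pmf))

N = 100                                   # assumed population size
grid = [i / 10 for i in range(11)]        # p0 = 0.0, 0.1, ..., 1.0
variances = [exact_variance(p0, N) for p0 in grid]

for p0, v in zip(grid, variances):
    print(f"p0={p0:.1f}: variance {v:.6f}, formula {p0 * (1 - p0) / N:.6f}")
```

The printed values are largest at p0 = 0.5, vanish at both ends, and are equal for p0 and 1 - p0, since swapping the labels of the two alleles leaves p0(1-p0) unchanged.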
Variance vs Initial Frequency
Can you explain why this function is symmetrical around p0=0.5?
Mutation
Analyzing the population frequency over 100 generations
As you can see, in most populations the new mutation is lost, i.e., its frequency ends at zero.
Q: What is the probability that a new mutation fixes in the population? — solve this problem both mathematically and numerically.
Consider a haploid population of N individuals. If a mutation is introduced in a single individual, its allele frequency is 1/N. Over a very long period of time, there are only two possible outcomes for this allele: it either fixes in the population or goes extinct. Eventually the whole population descends from exactly one individual of the present generation, and since there is no selection, each of the N individuals is equally likely to be that ancestor. So the probability that the mutation fixes in the population equals its allele frequency, that is, 1/N.
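A numerical check of this argument might look like the sketch below. It starts each replicate with a single mutant copy and follows it until it is either lost or fixed, rather than stopping after a set number of generations. The population size N = 20 and the 4000 replicates are assumptions chosen so that the run finishes quickly.

```python
import random

def mutation_fixes(N, rng):
    """Follow one new mutant in a haploid population of size N until loss or fixation.
    Returns True if the mutation fixes."""
    count = 1                             # a single mutant copy, frequency 1/N
    while 0 < count < N:
        p = count / N
        count = sum(1 for _ in range(N) if rng.random() < p)
    return count == N

def fixation_fraction(N, reps, rng):
    """Fraction of independent replicate populations in which the mutation fixes."""
    return sum(mutation_fixes(N, rng) for _ in range(reps)) / reps

rng = random.Random(4)
N = 20                                    # assumed population size
estimate = fixation_fraction(N, 4000, rng)
print("simulated fixation fraction:", estimate, "theory 1/N:", 1 / N)
```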
Comparing Theory vs Simulation Regarding Fixation Probability of a Mutation
As you can see, the fraction of populations in which the mutation fixed converges to the predicted value of 1/N.