--- title: 'Introduction to Population Genetics' author: "International Statical Genetics Workshop" #date: '2023-03-07' output: html_document: default pdf_document: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` **This practical runs on the workshop RStudio server.** First login to RStudio ```{bash, eval=FALSE} https://workshop.colorado.edu/rstudio/ ``` Copy the Practical document into your home directory using the following `R` commands. ```{R, eval=FALSE} setwd("~/.") system("mkdir Day2") system("cp /home/loic/2023/Practical_PopGen.html ~/Day2/.") setwd("Day2") ``` ## Part 1: Visualize changes in allele frequencies The `R` commands below define two functions: (1: `generateNextGeneration()`) generates genotypes for the next generation from a set of genotypes in the current population, and (2: `sim`), which uses (1) to simulate the evolution over $g$ generations of a population with a constant sample size ($N$), a certain number $m$ of SNPs segregating in the founding (or ancestral) population with a frequency ($p$). The `sim` function then plot the frequency of each of the `m` alleles over the course of `g` generations (each curve corresponds to a SNP). ```{R, eval=TRUE} generateNextGeneration <- function(x,n){ m <- ncol(x) N <- nrow(x) iM <- sample(1:N,n,replace = TRUE) iF <- sample(1:N,n,replace = TRUE) xm <- x[iM,] xf <- x[iF,] xa <- t(sapply(1:n,function(k) rbinom(m,1,prob=0.5*xm[k,]))) xb <- t(sapply(1:n,function(k) rbinom(m,1,prob=0.5*xf[k,]))) xn <- xa + xb return(xn) } sim <- function(N, # sample size p, # allele frequency in the founding population m, # number of marker g){ # number of generations ## Simulate genotypes in the founding population x <- do.call("cbind",lapply(1:m,function(j) rbinom(N,size=2,prob=p))) alleleFreq <- matrix(NA,nrow=g+1,ncol=m) alleleFreq[1,] <- colMeans(x)/2 for(gen in 1:g){ # cat(paste0("\tGen #",gen,"\n")) x <- generateNextGeneration(x,N) alleleFreq[gen+1,] <- colMeans(x)/2 } par(mar=c(5,5,3,2)) matplot(1:(1+g),alleleFreq,type="l",lty=1,axes=FALSE, main=paste0("Size of the population: N = ",N), xlab="Number of generations",ylab="Allele frequencies", col=sample(colors(),m),log="x",cex.lab=1.2) axis(1);axis(2) abline(h=p,col="grey") return(alleleFreq) } ``` **Execute these two functions in your `R` terminal then run the following command.** ```{R, eval=FALSE} t1 <- system.time( m50 <- sim(N=50 ,p=0.05,m=20,g=1000) ) ``` *Question 1a. What is the allele frequency after 1000 generations?* *Question 1b. Do you see the same patterns if you run this command multiple times?* *Question 1c. Try different sizes for the ancestral population (e.g., $N=100$, or $N=200$). How does it affect the frequency trajectories?* *Question 1d. Try different allele frequency in the ancestral population (e.g., $p=0.1$, or $p=0.01$). How does it affect the frequency trajectories?* *Question 1e. Run the following command. What can you say about the effect of the ancestral population size on the probability of fixation?* ```{R, eval=FALSE} t1 <- system.time( m50 <- sim(N=50 ,p=0.05,m=20,g=1000) ) t2 <- system.time( m100 <- sim(N=100 ,p=0.5,m=20,g=1000) ) t3 <- system.time( m200 <- sim(N=200 ,p=0.5,m=20,g=1000) ) ``` The objects `m50`, `m100` and `m200` contains the frequency trajectories of three simulated populations with sizes 50, 100 and 200 respectively. The following `R` commands count and visualize the proportion of fixed alleles over 1000 generations. **Run the following command. What can you can** ```{R, eval=FALSE} f50 <- apply(m50*(1-m50)==0 ,1,mean) f100 <- apply(m100*(1-m100)==0,1,mean) f200 <- apply(m200*(1-m200)==0,1,mean) par(mar=c(5,5,3,2)) matplot(cbind(f50,f100,f200),type="l",lwd=2,lty=1, xlab="Number of generations",axes=FALSE,cex.lab=2, ylab="Fraction of fixed allele frequencies") axis(1);axis(2) legend(800,0.25,legend=c("N=50","N=100","N=200"), box.lty=0,fill=1:3,horiz=FALSE,border=0,cex=1) ``` ## Part 2: Genetic drift creates more differentiation The `R` commands below define a function called `FstSim`, which tracks $F_{ST}(t)$ between the ancestral population and the populuation at time $t$. This functions reuses the `generateNextGeneration` function defined above. Input parameters for that function are size of the ancestral population ($N$), the allele frequency ($p$) in the ancestral population, the number $m$ of SNPs segregating in the ancestral population, the number $g$ of generations, the grow rate $r$ (default $r=0$, i.e. the population does not grow over time), and $k$ an index for the replicate. **Run the following command to define the function.** ```{R, eval=TRUE} FstSim <- function(N=100, # sample size p=0.5, # allele frequency in the ancestral population m=100, # number of markers g=500, # number of generations r=0.0, # growth rate (r=0, no growth; r=0.01: 1% growth per generation) k=0, # replicate index plotIt=TRUE){ ## Simulate base population x <- do.call("cbind",lapply(1:m,function(j) rbinom(N,size=2,prob=p))) alleleFreq <- matrix(NA,nrow=g+1,ncol=m) alleleFreq[1,] <- colMeans(x)/2 n <- N for(gen in 1:g){ n <- round(n*(1+r)) x <- generateNextGeneration(x,n) alleleFreq[gen+1,] <- colMeans(x)/2 } cat(paste0("\tReplicate #",k,"\n")) fst <- apply(alleleFreq,1,function(x) mean((x-p)^2) / (p*(1-p)) ) return(fst) } ``` *Question 2a. Under the Wright-Fisher model, we can predict that $F_{ST}(t) \approx 1-e^{-t/(2N)}$. Run the following commands to verify how well this prediction works. If $N=100$, then how many years do we have to wait for $F_{ST}(t)$ to reach 0.1 ? What if $N=10,000$? (We assume that 1 generation $\approx$ 25 years).* ```{R, eval=FALSE} r <- 0.0 # growth rate of population size m <- 100 # number of SNPs p0 <- 0.5 # allele frequency in the ancestral population nG <- 500 # number of Generations N <- 100 # size of the ancestral population dt <- 0:nG # Time (in number of generations) # Run simulation Fst <- FstSim(N=N,p=p0,m=m,g=nG,r=r,k=0) plot(dt,Fst,pch=19,cex=0.5,ylim=c(0,1),cex.lab=1.0, xlab="Number of generations",ylab="Fst between Current and Ancestral population", axes=FALSE);axis(1);axis(2) ExpectedFst <- sapply(dt,function(t) 1-exp(-t/(2*N))) lines(dt,ExpectedFst,col=2,lwd=2) legend(0.5*nG,0.2,legend=c("Theoretical Expectation"),col=c(2,4),box.lty=0,lwd=2) ``` *Question 2b. Change the growth rate of the population to $r=0.01$ (1%). What do you see?* **[WARNING: Setting growth rate beyond 1% will lead to extremely slow runs].** *Question 2c. Change the growth rate of the population to $r=-0.05$ (-5%). What do you see?*