**This practical runs on the workshop
RStudio server.**

First login to RStudio

`https://workshop.colorado.edu/rstudio/`

Copy the Practical document into your home directory using the
following `R`

commands.

```
setwd("~/.")
system("mkdir Day2")
system("cp /home/loic/2023/Practical_PopGen.html ~/Day2/.")
setwd("Day2")
```

The `R`

commands below define two functions: (1:
`generateNextGeneration()`

) generates genotypes for the next
generation from a set of genotypes in the current population, and (2:
`sim`

), which uses (1) to simulate the evolution over \(g\) generations of a population with a
constant sample size (\(N\)), a certain
number \(m\) of SNPs segregating in the
founding (or ancestral) population with a frequency (\(p\)). The `sim`

function then
plot the frequency of each of the `m`

alleles over the course
of `g`

generations (each curve
corresponds to a SNP).

```
generateNextGeneration <- function(x,n){
m <- ncol(x)
N <- nrow(x)
iM <- sample(1:N,n,replace = TRUE)
iF <- sample(1:N,n,replace = TRUE)
xm <- x[iM,]
xf <- x[iF,]
xa <- t(sapply(1:n,function(k) rbinom(m,1,prob=0.5*xm[k,])))
xb <- t(sapply(1:n,function(k) rbinom(m,1,prob=0.5*xf[k,])))
xn <- xa + xb
return(xn)
}
sim <- function(N, # sample size
p, # allele frequency in the founding population
m, # number of marker
g){ # number of generations
## Simulate genotypes in the founding population
x <- do.call("cbind",lapply(1:m,function(j) rbinom(N,size=2,prob=p)))
alleleFreq <- matrix(NA,nrow=g+1,ncol=m)
alleleFreq[1,] <- colMeans(x)/2
for(gen in 1:g){
# cat(paste0("\tGen #",gen,"\n"))
x <- generateNextGeneration(x,N)
alleleFreq[gen+1,] <- colMeans(x)/2
}
par(mar=c(5,5,3,2))
matplot(1:(1+g),alleleFreq,type="l",lty=1,axes=FALSE,
main=paste0("Size of the population: N = ",N),
xlab="Number of generations",ylab="Allele frequencies",
col=sample(colors(),m),log="x",cex.lab=1.2)
axis(1);axis(2)
abline(h=p,col="grey")
return(alleleFreq)
}
```

**Execute these two functions in your R terminal
then run the following command.**

`t1 <- system.time( m50 <- sim(N=50 ,p=0.05,m=20,g=1000) )`

*Question 1a. What is the allele
frequency after 1000 generations?*

*Question 1b. Do you see the same
patterns if you run this command multiple times?*

*Question 1c. Try different sizes for
the ancestral population (e.g., \(N=100\), or \(N=200\)). How does it affect the frequency
trajectories?*

*Question 1d. Try different allele
frequency in the ancestral population (e.g., \(p=0.1\), or \(p=0.01\)). How does it affect the frequency
trajectories?*

*Question 1e. Run the following
command. What can you say about the effect of the ancestral population
size on the probability of fixation?*

```
t1 <- system.time( m50 <- sim(N=50 ,p=0.05,m=20,g=1000) )
t2 <- system.time( m100 <- sim(N=100 ,p=0.5,m=20,g=1000) )
t3 <- system.time( m200 <- sim(N=200 ,p=0.5,m=20,g=1000) )
```

The objects `m50`

, `m100`

and `m200`

contains the frequency trajectories of three simulated populations with
sizes 50, 100 and 200 respectively. The following `R`

commands count and visualize the proportion of fixed alleles over 1000
generations.

**Run the following command. What can you can**

```
f50 <- apply(m50*(1-m50)==0 ,1,mean)
f100 <- apply(m100*(1-m100)==0,1,mean)
f200 <- apply(m200*(1-m200)==0,1,mean)
par(mar=c(5,5,3,2))
matplot(cbind(f50,f100,f200),type="l",lwd=2,lty=1,
xlab="Number of generations",axes=FALSE,cex.lab=2,
ylab="Fraction of fixed allele frequencies")
axis(1);axis(2)
legend(800,0.25,legend=c("N=50","N=100","N=200"),
box.lty=0,fill=1:3,horiz=FALSE,border=0,cex=1)
```

The `R`

commands below define a function called
`FstSim`

, which tracks \(F_{ST}(t)\) between the ancestral
population and the populuation at time \(t\). This functions reuses the
`generateNextGeneration`

function defined above. Input
parameters for that function are size of the ancestral population (\(N\)), the allele frequency (\(p\)) in the ancestral population, the
number \(m\) of SNPs segregating in the
ancestral population, the number \(g\)
of generations, the grow rate \(r\)
(default \(r=0\), i.e.ย the population
does not grow over time), and \(k\) an
index for the replicate.

**Run the following command to define the
function.**

```
FstSim <- function(N=100, # sample size
p=0.5, # allele frequency in the ancestral population
m=100, # number of markers
g=500, # number of generations
r=0.0, # growth rate (r=0, no growth; r=0.01: 1% growth per generation)
k=0, # replicate index
plotIt=TRUE){
## Simulate base population
x <- do.call("cbind",lapply(1:m,function(j) rbinom(N,size=2,prob=p)))
alleleFreq <- matrix(NA,nrow=g+1,ncol=m)
alleleFreq[1,] <- colMeans(x)/2
n <- N
for(gen in 1:g){
n <- round(n*(1+r))
x <- generateNextGeneration(x,n)
alleleFreq[gen+1,] <- colMeans(x)/2
}
cat(paste0("\tReplicate #",k,"\n"))
fst <- apply(alleleFreq,1,function(x) mean((x-p)^2) / (p*(1-p)) )
return(fst)
}
```

*Question 2a. Under the Wright-Fisher
model, we can predict that \(F_{ST}(t) \approx
1-e^{-t/(2N)}\). Run the following commands to verify how well
this prediction works. If \(N=100\),
then how many years do we have to wait for \(F_{ST}(t)\) to reach 0.1 ? What if \(N=10,000\)? (We assume that 1 generation
\(\approx\) 25 years).*

```
r <- 0.0 # growth rate of population size
m <- 100 # number of SNPs
p0 <- 0.5 # allele frequency in the ancestral population
nG <- 500 # number of Generations
N <- 100 # size of the ancestral population
dt <- 0:nG # Time (in number of generations)
# Run simulation
Fst <- FstSim(N=N,p=p0,m=m,g=nG,r=r,k=0)
plot(dt,Fst,pch=19,cex=0.5,ylim=c(0,1),cex.lab=1.0,
xlab="Number of generations",ylab="Fst between Current and Ancestral population",
axes=FALSE);axis(1);axis(2)
ExpectedFst <- sapply(dt,function(t) 1-exp(-t/(2*N)))
lines(dt,ExpectedFst,col=2,lwd=2)
legend(0.5*nG,0.2,legend=c("Theoretical Expectation"),col=c(2,4),box.lty=0,lwd=2)
```

*Question 2b. Change the growth rate of
the population to \(r=0.01\) (1%). What
do you see?* **[WARNING: Setting
growth rate beyond 1% will lead to extremely slow runs].**

*Question 2c. Change the growth rate of
the population to \(r=-0.05\) (-5%).
What do you see?*