This shows you the differences between two versions of the page.
homework_6_ancestry [2015/11/13 17:50] scott |
homework_6_ancestry [2015/11/16 10:18] scott |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
+ | ==== Notes ==== | ||
+ | |||
+ | |||
+ | |||
+ | ==== Code for Homework ==== | ||
+ | |||
# PSYC 7102 -- Statistical Genetics | # PSYC 7102 -- Statistical Genetics | ||
Line 21: | Line 28: | ||
### ONLY EXPLAIN COMMANDS WHERE I SPECIFICALLY REQUEST IT! YOU DO NOT | ### ONLY EXPLAIN COMMANDS WHERE I SPECIFICALLY REQUEST IT! YOU DO NOT | ||
- | ### HAVE TO EXPLAIN EVERY COMMAND! | + | ### HAVE TO EXPLAIN EVERY COMMAND! |
+ | ### in the end a PCA plot containing yourself compared to all 1000 Genomes | ||
+ | ### samples. | ||
# For many questions you'll want to run analyses by chromosome. To do | # For many questions you'll want to run analyses by chromosome. To do | ||
Line 28: | Line 37: | ||
qsub -I -l walltime=23: | qsub -I -l walltime=23: | ||
+ | ### Load apigenome, plink, and tabix | ||
+ | module load apigenome_0.0.2 | ||
+ | module load plink_latest | ||
+ | module load tabix_0.2.6 | ||
Line 70: | Line 83: | ||
### Add in rsIDs from dbSNP. PLINK needs these to reconcile | ### Add in rsIDs from dbSNP. PLINK needs these to reconcile | ||
### positions/ | ### positions/ | ||
- | module load apigenome_0.0.2 | + | vcf-add-rsid -vcf chrALL.filtered.PASS.beagled.HG00096.vcf.gz -db / |
- | vcf-add-rsid -vcf chrALL.filtered.PASS.beagled.HG00096.vcf.gz -db / | + | |
### The previous command keeps only variants with rsIDs, otherwise | ### The previous command keeps only variants with rsIDs, otherwise | ||
### plink throws an error that there are >1 variants with ID = " | ### plink throws an error that there are >1 variants with ID = " | ||
Line 94: | Line 106: | ||
###------ QUESTION 2: WHAT DOES THIS COMMAND (THE ENTIRE FOR LOOP) DO? (2 points) | ###------ QUESTION 2: WHAT DOES THIS COMMAND (THE ENTIRE FOR LOOP) DO? (2 points) | ||
for i in {1..22}; do | for i in {1..22}; do | ||
- | zgrep -E ' | + | zgrep -E ' |
done | done | ||
### Retain in the 1000 Genomes VCF only your SNPs that are also fairly common | ### Retain in the 1000 Genomes VCF only your SNPs that are also fairly common | ||
- | ### | + | ### because we're going to conduct PCA on these SNPs and only want common ones. |
- | ###------ QUESTION 3: WHY WOULD WE REMOVE | + | ### |
+ | ###------ QUESTION 3: WHY WOULD WE RETAIN ONLY COMMON SNPS, OTHER THAN IT | ||
### | ### | ||
for i in {1..22}; do | for i in {1..22}; do | ||
Line 242: | Line 255: | ||
R | R | ||
### Then in this R session run: ' | ### Then in this R session run: ' | ||
- | ### If you log out and back in, you'll have to again run this command: | + | ### If you log out and back in, you'll have to again run this command |
export R_LIBS=/ | export R_LIBS=/ | ||
Line 259: | Line 272: | ||
### ' | ### ' | ||
- | kg_sf <- read.table(' | + | kg_sf <- read.table(' |
sample_ids <- unique(data.frame(IID=kg_sf$SAMPLE_NAME, | sample_ids <- unique(data.frame(IID=kg_sf$SAMPLE_NAME, |
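
The homework code above repeatedly uses a `for i in {1..22}; do ... done` loop to run one step per chromosome. The pattern can be sketched on toy data as follows. This is only an illustration under stated assumptions, not the homework's actual command: the file names (`chrN.toy.vcf`, `keep_ids.txt`), the three toy chromosomes, and the use of `awk` for the filtering step are all hypothetical, and the truncated `zgrep` pattern in the original is deliberately not reconstructed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory and file names (not from the homework).
workdir=$(mktemp -d)

# Build toy per-chromosome VCF-like files: one header line plus two records.
# Columns are CHROM, POS, ID, separated by tabs.
for i in {1..3}; do
  printf '#CHROM\tPOS\tID\n%s\t100\trs%s01\n%s\t200\trs%s02\n' \
    "$i" "$i" "$i" "$i" > "$workdir/chr$i.toy.vcf"
done

# Toy list of SNP IDs to retain, one per line.
printf 'rs101\nrs302\n' > "$workdir/keep_ids.txt"

# Per-chromosome loop: keep header lines and any record whose ID (column 3)
# appears in the keep list. The first awk rule loads the list into an array;
# the second prints matching lines from the VCF-like file.
for i in {1..3}; do
  awk -F'\t' 'NR==FNR { keep[$1]; next } /^#/ || ($3 in keep)' \
    "$workdir/keep_ids.txt" "$workdir/chr$i.toy.vcf" \
    > "$workdir/chr$i.kept.vcf"
done

# Count the records (non-header lines) retained across all chromosomes.
kept_total=$(cat "$workdir"/chr*.kept.vcf | grep -vc '^#')
echo "Records kept across chromosomes: $kept_total"
```

Writing one output file per chromosome keeps each step small and makes it easy to rerun a single chromosome if something fails. On the real data the same loop shape would run over `{1..22}` and over compressed files.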