homework_6_ancestry
Differences
This shows you the differences between two versions of the page.
| homework_6_ancestry [2015/11/14 17:16] – scott | homework_6_ancestry [2015/11/16 17:18] (current) – scott | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| ==== Notes ==== | ==== Notes ==== | ||
| - | 11/14/2015 N.B.: Chelsie noted that the < | ||
| Line 29: | Line 28: | ||
| ### ONLY EXPLAIN COMMANDS WHERE I SPECIFICALLY REQUEST IT! YOU DO NOT | ### ONLY EXPLAIN COMMANDS WHERE I SPECIFICALLY REQUEST IT! YOU DO NOT | ||
| - | ### HAVE TO EXPLAIN EVERY COMMAND! | + | ### HAVE TO EXPLAIN EVERY COMMAND! |
| + | ### in the end a PCA plot containing yourself compared to all 1000 Genomes | ||
| + | ### samples. | ||
| # For many questions you'll want to run analyses by chromosome. To do | # For many questions you'll want to run analyses by chromosome. To do | ||
| Line 39: | Line 40: | ||
| module load apigenome_0.0.2 | module load apigenome_0.0.2 | ||
| module load plink_latest | module load plink_latest | ||
| + | module load tabix_0.2.6 | ||
| Line 81: | Line 83: | ||
| ### Add in rsIDs from dbSNP. PLINK needs these to reconcile | ### Add in rsIDs from dbSNP. PLINK needs these to reconcile | ||
| ### positions/ | ### positions/ | ||
| - | vcf-add-rsid -vcf chrALL.filtered.PASS.beagled.HG00096.vcf.gz -db / | + | vcf-add-rsid -vcf chrALL.filtered.PASS.beagled.HG00096.vcf.gz -db / |
| ### The previous command keeps only variants with rsIDs, otherwise | ### The previous command keeps only variants with rsIDs, otherwise | ||
| ### plink throws an error that there are >1 variants with ID = " | ### plink throws an error that there are >1 variants with ID = " | ||
| Line 108: | Line 110: | ||
| ### Retain in the 1000 Genomes VCF only your SNPs that are also fairly common | ### Retain in the 1000 Genomes VCF only your SNPs that are also fairly common | ||
| - | ### | + | ### because we're going to conduct PCA on these SNPs and only want common ones. |
| - | ###------ QUESTION 3: WHY WOULD WE REMOVE | + | ### |
| + | ###------ QUESTION 3: WHY WOULD WE RETAIN ONLY COMMON SNPS, OTHER THAN IT | ||
| ### | ### | ||
| for i in {1..22}; do | for i in {1..22}; do | ||
| Line 269: | Line 272: | ||
| ### ' | ### ' | ||
| - | kg_sf <- read.table('/ | + | kg_sf <- read.table('/ |
| sample_ids <- unique(data.frame(IID=kg_sf$SAMPLE_NAME, | sample_ids <- unique(data.frame(IID=kg_sf$SAMPLE_NAME, | ||
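
The comment in the diff about PLINK rejecting a file with more than one variant sharing the same ID can be checked before running PLINK. The following is a minimal sketch, not part of the homework script: `count_duplicate_ids` and `toy_vcf` are hypothetical helper names, and the toy records are made up for illustration. It relies only on the VCF convention that header lines start with `#`, that the ID is the third tab-separated column, and that a missing ID is written as `.`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read an uncompressed VCF on stdin and print how many
# distinct ID values (column 3) occur on more than one record.
count_duplicate_ids() {
    awk -F'\t' '
        !/^#/ { seen[$3]++ }              # skip header lines, tally each ID
        END {
            n = 0
            for (id in seen) if (seen[id] > 1) n++
            print n
        }'
}

# Toy input (made-up records): two variants have the missing ID ".",
# so exactly one ID value is duplicated.
toy_vcf() {
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\n'
    printf '1\t100\trs1\tA\tG\n'
    printf '1\t200\t.\tC\tT\n'
    printf '1\t300\t.\tG\tA\n'
    printf '2\t150\trs2\tT\tC\n'
}

toy_vcf | count_duplicate_ids
```

For a bgzipped VCF, the same function could be fed with something like `zcat your_file.vcf.gz | count_duplicate_ids`, where `your_file.vcf.gz` is a placeholder. A result of 0 suggests every remaining variant has a unique ID.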
homework_6_ancestry.1447521397.txt.gz · Last modified: by scott
