homework_6_ancestry

Differences

This shows you the differences between two versions of the page.

homework_6_ancestry [2015/11/14 11:56]
scott /* Notes */
homework_6_ancestry [2015/11/15 18:17]
scott /* Code for Homework */
Line 28: Line 28:
  
 ### ONLY EXPLAIN COMMANDS WHERE I SPECIFICALLY REQUEST IT! YOU DO NOT
-### HAVE TO EXPLAIN EVERY COMMAND!
+### HAVE TO EXPLAIN EVERY COMMAND! But please run all commands to produce
+### in the end a PCA plot containing yourself compared to all 1000 Genomes
+### samples.
 
 # For many questions you'll want to run analyses by chromosome. To do
Line 80: Line 82:
 ### Add in rsIDs from dbSNP. PLINK needs these to reconcile
 ### positions/alleles
-vcf-add-rsid -vcf chrALL.filtered.PASS.beagled.HG00096.vcf.gz -db /Users/scvr9332/reference_data/dbsnp.144.b37.vcf.gz | bgzip -c > chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz
+vcf-add-rsid -vcf chrALL.filtered.PASS.beagled.HG00096.vcf.gz -db /Users/scvr9332/reference_data/dbsnp_144/dbsnp.144.b37.vcf.gz | bgzip -c > chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz
 ### The previous command keeps only variants with rsIDs, otherwise
 ### plink throws an error that there are >1 variants with ID = "."
Line 107: Line 109:
  
 ### Retain in the 1000 Genomes VCF only your SNPs that are also fairly common
-###
-###------ QUESTION 3: WHY WOULD WE REMOVE COMMON SNPS, OTHER THAN IT
+### because we're going to conduct PCA on these SNPs and only want common ones.
+###
+###------ QUESTION 3: WHY WOULD WE RETAIN ONLY COMMON SNPS, OTHER THAN IT
 ###------             MAKES EVERY COMMAND LATER FASTER? (2 points)
 for i in {1..22}; do
Line 268: Line 271:
  
 ### 'Self-reported' ancestry of 1000g participants
-kg_sf <- read.table('/Users/scvr9332/20130502.sequence.index', header=T, sep='\t', fill=T, stringsAsFactors=F)
+kg_sf <- read.table('/Users/scvr9332/PCA/20130502.sequence.index', header=T, sep='\t', fill=T, stringsAsFactors=F)
 
 sample_ids <- unique(data.frame(IID=kg_sf$SAMPLE_NAME, POPULATION=kg_sf$POPULATION, stringsAsFactors=F))
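The R lines in the hunk above reduce a per-run index to one row per sample and population. As a rough shell counterpart (a sketch only, not part of the homework script), the same de-duplication can be done with awk. The toy index below has three hypothetical rows and only three columns; the awk program finds SAMPLE_NAME and POPULATION by header name, so it does not assume where those columns sit in a real index file.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)

# Toy tab-separated index (hypothetical rows): one line per sequencing run,
# so the same sample can appear more than once.
{
  printf 'RUN_ID\tSAMPLE_NAME\tPOPULATION\n'
  printf 'run1\tHG00096\tGBR\n'
  printf 'run2\tHG00096\tGBR\n'
  printf 'run3\tNA12878\tCEU\n'
} > "$tmp/toy.index"

# Locate the two columns from the header row, then print each
# IID/POPULATION pair the first time it is seen.
awk -F'\t' '
  NR == 1 {
    for (i = 1; i <= NF; i++) {
      if ($i == "SAMPLE_NAME") s = i
      if ($i == "POPULATION")  p = i
    }
    print "IID\tPOPULATION"
    next
  }
  {
    key = $s "\t" $p
    if (!(key in seen)) { seen[key] = 1; print key }
  }
' "$tmp/toy.index" > "$tmp/sample_ids.tsv"

cat "$tmp/sample_ids.tsv"
n_rows=$(wc -l < "$tmp/sample_ids.tsv" | tr -d ' ')
```

The output keeps first-seen order, which differs from `sort -u` but matches what `unique()` does on a data frame.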
homework_6_ancestry.txt · Last modified: 2015/11/16 10:18 by scott