Differences

This shows you the differences between two versions of the page.

--- homework_6_ancestry [2015/11/15 23:13] – /* Code for Homework */ scott
+++ homework_6_ancestry [2015/11/16 17:18] (current) – scott
@@ Line 40: / Line 40: @@
 module load apigenome_0.0.2
 module load plink_latest
+module load tabix_0.2.6
@@ Line 109: / Line 110: @@
 ### Retain in the 1000 Genomes VCF only your SNPs that are also fairly common
-###
+### because we're going to conduct PCA on these SNPs and only want common ones.
-###------ QUESTION 3: WHY WOULD WE REMOVE COMMON SNPS, OTHER THAN IT
+###
+###------ QUESTION 3: WHY WOULD WE RETAIN ONLY COMMON SNPS, OTHER THAN IT
 ###------             MAKES EVERY COMMAND LATER FASTER? (2 points)
 for i in {1..22}; do
@@ Line 270: / Line 272: @@
 ### 'Self-reported' ancestry of 1000g participants
-kg_sf <- read.table('/Users/scvr9332/20130502.sequence.index', header=T, sep='\t', fill=T, stringsAsFactors=F)
+kg_sf <- read.table('/Users/scvr9332/PCA/20130502.sequence.index', header=T, sep='\t', fill=T, stringsAsFactors=F)
 sample_ids <- unique(data.frame(IID=kg_sf$SAMPLE_NAME, POPULATION=kg_sf$POPULATION, stringsAsFactors=F))