User Tools

Site Tools


lab_2

This is an old revision of the document!


Genotype file

Example commands

### This file is zipped (to save hard drive space. ### We'll need slightly different commands to peak at it. less -S hu916767_20170324191934.1kgALTAllele.withHeader.snpEff.vcf

### Wow, that's a lot of information compared to our sparse 23andMe file. ### Let's do some counts

### How many variants are there? wc -l hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf

### Let's try to find variants that can lead to stop gains grep –color 'stop_gained' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf

### How many did 23andMe genotype? grep 'stop_gained' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf | wc -l

### How many missense variants are there? grep 'missense_variant' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf | wc -l

### Are there any missense variants in our alcohol metabolism genes? grep 'missense_variant' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf | grep 'ADH1B\|ADH1C\|ALDH2'

### What about phenylketonuria? grep 'PAH' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf | grep 'stop\|missense'

Tuesday, Apr 18, in class exercise

Pick either your favorite gene or favorite phenotype. If the latter, pick a gene associated with that phenotype. Search the 23andMe file to discover:

  1. The number of variants annotated against that gene.
  2. The number of variants predicted to have an effect on gene function
  3. For all variants in #2, search clinvar to see if any are known to affect the phenotype.

What I've done to prepare the vcf file

Feel free to ignore this section – it's only here to document what I've done for my own future reference, and for any interested student.

1. 23andMe format was converted to vcf format.

./bcftools/bcftools view hu916767_20170324191934.bcf -O vcf | bgzip -c > hu916767_20170324191934.vcf.gz

2. I merged sites with 1000 genomes in order to get reference and alternate alleles at each site.

library(data.table) options(stringsAsFactors=F) kg ← fread(“chrALL.vcf”, header=F) kg$V1 ← as.character(kg$V1) head(kg) hu23andme ← fread(“zgrep -v '#' hu916767_20170324191934.vcf.gz”, header=F) dat ← merge(kg, hu23andme, by=c(“V1”, “V2”, “V3”, “V4”), all.y=T) dat$V5 ← ifelse(dat$V5.y == “.”, dat$V5.x, dat$V5.y) dat2 ← dat[,c(1,2,3,4,12,7:11)] dat2[is.na(dat2)] ← “.” write.table(dat2, file=“hu916767_20170324191934.1kgALTallele.vcf”, col.names=F, row.names=F, quote=F, sep=“\t”)

### Add the vcf header back in (zgrep '#' hu916767_20170324191934.vcf.gz ; cat hu916767_20170324191934.1kgALTallele.vcf ) | bgzip -c > hu916767_20170324191934.1kgALTallele.withHeader.vcf.gz

3. I annotated the resulting file with snpEff (http://snpeff.sourceforge.net/SnpEff_manual.html#run)

java -jar snpEff/snpEff.jar -v GRCh37.75 hu916767_20170324191934.1kgALTallele.withHeader.vcf.gz | bgzip -c > hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz

lab_2.1492541902.txt.gz · Last modified: 2017/04/18 12:58 by scott