User Tools

Site Tools


homework_7_functional_annotation

Notes

Code for Homework

# PSYC 7102 – Statistical Genetics # Homework #7: Functional Annotation # Due: December 10th

# Overview: In this homework I ask you to annotate genes in your genome # and interpret the consequences of a few annotations.

# Try not to frequently run commands on the head node, use an interactive # session by running the following command: qsub -I -l walltime=23:00:00 -q short -l mem=2gb -l nodes=1:ppn=2

# Some modules you'll need module load tabix_0.2.6 module load anno_1.0

# I encourage you to annotate your own genome, but for purposes of this # homework I ask that you conduct all analyses of this file, which contains # a single person from 1000 Genomes. Please copy both files to a directory # of your choice.

/Users/scvr9332/annotation/chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz /Users/scvr9332/annotation/chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz.tbi

############### ### STEP 1 ### ############### # Please run the following command to annotate your genome. anno -i chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz \

  1. o chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz \
  2. r /Users/scvr9332/reference_data/gotcloud.ref/hs37d5.fa \
  3. g /Users/scvr9332/reference_data/anno/refFlat_hg19.txt.gz \
  4. p /Users/scvr9332/reference_data/anno/priority.txt \
  5. c /Users/scvr9332/reference_data/anno/codon.txt

### Take a look at all the output files (especially the top.anno.frq ### file). ### ###———————————-### ###—-QUESTION 1, a-b (2 points)—-### ###———————————-### ### a) How many stop gain variants are there? ### b) How many UTR5 variants?

############## ### STEP 2 ### ############## ### Recall from the 1000 Genomes paper that stop gain variants tend to ### be rare. Let's compare their frequency to that of synonymous ### variants.

zgrep '^#\|Stop_Gain' chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz | bgzip -c > stop_gains.vcf.gz zgrep '^#\|Synonymous' chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz | bgzip -c > synonymous.vcf.gz

### The following command tells me at how many sites this individual ### has a heterozygous stop gain allele: zgrep '\s0|1:\|\s1|0:' stop_gains.vcf.gz | wc -l

###———————————-### ###—-QUESTION 2, a-e (5 points)—-### ###———————————-### ### Modify the zgrep command immediately above to answer: ### a) At how many sites does this individual have a heterozygous synonymous variant? ### b) At how many sites does this individual have a homozygous stop gain? ### ### Now Take the first homozygous stop gain. ### c) What gene does this variant affect? ### d) How many transcripts of this gene does this variant affect? ### e) What other functional consequences does this variant have in ### this gene? (e.g., splice site)

############## ### STEP 3 ### ############## ### Spencer's favorite gene is TLR4. Let's take a look. tabix chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz 9:12000000-121000000 | grep 'TLR4' | less -S

###———————————-### ###—-QUESTION 3, a-d (4 points)—-### ###———————————-### ### a) How many variants are annotated to lie within some component of TLR4? ### b) How many are stop gain or essential splice site? ### c) Look at rs78848399. Descibe the amino acid change that occurs ### as a result of having an alternate allele at this site.

### Now use a tabix command to extract variant rs564925636. ### d) Describe the amino acid change that occurs as a result of ### having the alternate allele at this site. How does strand impact ### your answer to this question compared to your answer to c)?

homework_7_functional_annotation.txt · Last modified: 2015/12/03 14:22 by scott