# PSYC 7102 – Statistical Genetics # Homework #7: Functional Annotation # Due: December 10th
# Overview: In this homework I ask you to annotate genes in your genome # and interpret the consequences of a few annotations.
# Try not to frequently run commands on the head node, use an interactive # session by running the following command: qsub -I -l walltime=23:00:00 -q short -l mem=2gb -l nodes=1:ppn=2
# Some modules you'll need module load tabix_0.2.6 module load anno_1.0
# I encourage you to annotate your own genome, but for purposes of this # homework I ask that you conduct all analyses of this file, which contains # a single person from 1000 Genomes. Please copy both files to a directory # of your choice.
/Users/scvr9332/annotation/chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz /Users/scvr9332/annotation/chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz.tbi
############### ### STEP 1 ### ############### # Please run the following command to annotate your genome. anno -i chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz \
### Take a look at all the output files (especially the top.anno.frq ### file). ### ###———————————-### ###—-QUESTION 1, a-b (2 points)—-### ###———————————-### ### a) How many stop gain variants are there? ### b) How many UTR5 variants?
############## ### STEP 2 ### ############## ### Recall from the 1000 Genomes paper that stop gain variants tend to ### be rare. Let's compare their frequency to that of synonymous ### variants.
zgrep '^#\|Stop_Gain' chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz | bgzip -c > stop_gains.vcf.gz zgrep '^#\|Synonymous' chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz | bgzip -c > synonymous.vcf.gz
### The following command tells me at how many sites this individual ### has a heterozygous stop gain allele: zgrep '\s0|1:\|\s1|0:' stop_gains.vcf.gz | wc -l
###———————————-### ###—-QUESTION 2, a-e (5 points)—-### ###———————————-### ### Modify the zgrep command immediately above to answer: ### a) At how many sites does this individual have a heterozygous synonymous variant? ### b) At how many sites does this individual have a homozygous stop gain? ### ### Now Take the first homozygous stop gain. ### c) What gene does this variant affect? ### d) How many transcripts of this gene does this variant affect? ### e) What other functional consequences does this variant have in ### this gene? (e.g., splice site)
############## ### STEP 3 ### ############## ### Spencer's favorite gene is TLR4. Let's take a look. tabix chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz 9:12000000-121000000 | grep 'TLR4' | less -S
###———————————-### ###—-QUESTION 3, a-d (4 points)—-### ###———————————-### ### a) How many variants are annotated to lie within some component of TLR4? ### b) How many are stop gain or essential splice site? ### c) Look at rs78848399. Descibe the amino acid change that occurs ### as a result of having an alternate allele at this site.
### Now use a tabix command to extract variant rs564925636. ### d) Describe the amino acid change that occurs as a result of ### having the alternate allele at this site. How does strand impact ### your answer to this question compared to your answer to c)?