This shows you the differences between two versions of the page.
lab_2 [2017/04/19 08:38] scott /* What I've done to prepare the vcf file */ |
lab_2 [2017/04/30 22:20] scott /* Lab assignment 2 */ |
||
---|---|---|---|
Line 44: | Line 44: | ||
For another example, take rs6681049. The REF allele is T and the ALT is C. The genotype is 1/1. That means that one chromosome of this individual carries 1 ALT allele (i.e., a C) and the other chromosome also carries 1 ALT allele (i.e., a C). So the genotype for this individual at that site is C/C. | For another example, take rs6681049. The REF allele is T and the ALT is C. The genotype is 1/1. That means that one chromosome of this individual carries 1 ALT allele (i.e., a C) and the other chromosome also carries 1 ALT allele (i.e., a C). So the genotype for this individual at that site is C/C. | ||
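The REF/ALT/GT decoding described above can be sketched as a small shell helper. This is only an illustration: `decode_gt` is a hypothetical name, it assumes a biallelic site (a single ALT allele), and it always prints the alleles joined by `/`, so phasing information (`|`) is not preserved.

```shell
# decode_gt REF ALT GT
# Translate a VCF GT field such as 0/1 into the alleles it stands for.
# Assumption: biallelic site, so index 0 means REF and index 1 means ALT.
# Any other index (for example "." for a missing call) is passed through unchanged.
decode_gt() {
    printf '%s\n' "$3" | awk -v ref="$1" -v alt="$2" '{
        # GT indices are separated by "/" (unphased) or "|" (phased).
        n = split($0, idx, "[/|]")
        out = ""
        for (i = 1; i <= n; i++) {
            if (idx[i] == "0")      allele = ref
            else if (idx[i] == "1") allele = alt
            else                    allele = idx[i]
            out = out (i > 1 ? "/" : "") allele
        }
        print out
    }'
}

decode_gt T C 1/1   # prints C/C, matching the rs6681049 example
decode_gt T C 0/1   # prints T/C
```

A genotype of 0/0 decodes to two copies of the REF allele in the same way.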
+ | |||
+ | |||
+ | ====== Lab assignment 2 ====== | ||
+ | |||
+ | |||
+ | ### Lab 2 assignment | ||
+ | ### Assigned: 4/20/2017 | ||
+ | ### Due: 4/27/2017 at the beginning of class. Late assignments (even by 5 minutes) | ||
+ | ### will not be accepted! | ||
+ | ### | ||
+ | ### Note: all questions should be answered with respect to the | ||
+ | ### | ||
+ | |||
+ | ### Question 1 (4 points) | ||
+ | ### a) Extract a variant from the vcf and show me the command you used | ||
+ | ### and the output of the command. Tell me what the individual's | ||
+ | ### genotype is at this site. | ||
+ | |||
+ | ### Question 2 (4 points) | ||
+ | ### How many variants did 23andMe genotype in exons (that is, in protein-coding sequences)? | ||
+ | ### Show me the commands you used to figure this out. | ||
+ | |||
+ | ### Question 3 (8 points) | ||
+ | ### a) Is this individual likely to be lactose intolerant? Show me the | ||
+ | ### steps you used to figure this out. | ||
+ | ### b) Pick one of the variants you used to determine lactose | ||
+ | ### intolerance. What is the geographical distribution of this | ||
+ | ### variant' | ||
+ | |||
+ | |||
+ | Example full credit answers | ||
+ | |||
+ | 1. Most of you got this one right. The most common mistake was to include too much information and too many steps (although that generally did not cost you any points). | ||
+ | |||
+ | zgrep -w ' | ||
+ | |||
+ | | ||
+ | |||
+ | This individual has 0 alternate alleles, so their genotype is G/G: two reference alleles. | ||
+ | |||
+ | 2. There are multiple ways to answer this. One of the most straightforward is as follows, although we could quibble over whether I should have included any splicing variants. | ||
+ | |||
+ | zgrep ' | ||
+ | 52772 | ||
+ | |||
+ | |||
+ | 3. Coming soon | ||
Line 92: | Line 139: | ||
1. 23andMe format was converted to vcf format. | 1. 23andMe format was converted to vcf format. | ||
- | bcftools convert --tsv2vcf hu916767_20170324191934.txt -f ~/Desktop/human_g1k_v37.fasta.gz -s hu916767_20170324191934 -Ob -o hu916767_20170324191934.bcf | + | bcftools convert --tsv2vcf hu916767_20170324191934.txt -f human_g1k_v37.fasta.gz -s hu916767_20170324191934 -Ob -o hu916767_20170324191934.bcf |
bcftools view hu916767_20170324191934.bcf -O vcf | bgzip -c > hu916767_20170324191934.vcf.gz | bcftools view hu916767_20170324191934.bcf -O vcf | bgzip -c > hu916767_20170324191934.vcf.gz | ||
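After a conversion like this, one quick sanity check is to count the variant records in the compressed VCF. The sketch below is an assumption-laden illustration, not part of the original preparation steps: `count_records` is a hypothetical helper, and it relies only on `gzip` and `grep` (bgzip output can be read by `gzip -dc`).

```shell
# count_records FILE.vcf.gz
# Print the number of variant records in a gzip- or bgzip-compressed VCF.
# Header and meta lines start with '#', so every other line is one record.
count_records() {
    gzip -dc "$1" | grep -vc '^#'
}

# Example call (assumes the file produced by the step above exists):
#   count_records hu916767_20170324191934.vcf.gz
```

The result should be comparable to the number of genotyped sites in the original 23andMe text file, minus any sites the conversion skipped.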