lab_2

Differences

This shows you the differences between two versions of the page.

lab_2 [2017/04/19 08:38]
scott /* What I've done to prepare the vcf file */
lab_2 [2017/04/30 22:20]
scott /* Lab assignment 2 */
Line 44: Line 44:
  
 For another example, take rs6681049. The REF allele is T and the ALT is C. The genotype is 1/1. That means that one chromosome of this individual carries 1 ALT allele (i.e., a C) and the other chromosome also carries 1 ALT allele (i.e., a C). So the genotype for this individual at that site is C/C.
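The REF/ALT/GT reading above can be sketched as a small shell function. This is only an illustration and not part of the lab: `decode_gt` is a hypothetical helper name, and it assumes a GT string whose allele indices are separated by `/` or `|`, where 0 means the REF allele and 1, 2, ... index into the comma-separated ALT column.

```shell
# Hypothetical helper (not from the lab): turn a VCF GT string into alleles.
# Usage: decode_gt REF ALT GT
decode_gt() {
    printf '%s\n' "$3" | awk -v ref="$1" -v alt="$2" '{
        n = split($0, idx, "[/|]")     # "/" = unphased, "|" = phased
        split(alt, alts, ",")          # ALT may list several alleles
        out = ""
        for (i = 1; i <= n; i++) {
            if (idx[i] == ".")      a = "."            # missing call
            else if (idx[i] == 0)   a = ref            # 0 = reference allele
            else                    a = alts[idx[i]]   # 1, 2, ... = ALT alleles
            out = out (i > 1 ? "/" : "") a             # always joined with "/"
        }
        print out
    }'
}

decode_gt T C 1/1   # the rs6681049 example: prints C/C
decode_gt T C 0/1   # a heterozygote: prints T/C
```

Note that the sketch always joins the alleles with `/`, so any phasing information carried by `|` is dropped in the output.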
 +
 +
 +====== Lab assignment 2 ======
 +
 +
 +### Lab 2 assignment
 +### Assigned: 4/20/2017
 +### Due: 4/27/2017 at the beginning of class. Late assignments (even by 5 minutes)
 +###      will not be accepted!
 +###
 +### Note: all questions should be answered with respect to the
 +###       genotypes from hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz
 +
 +### Question 1 (4 points)
 +### a) Extract a variant from the vcf and show me the command you used
 +### and the output of the command. Tell me what the individual's
 +### genotype is at this site.
 +
 +### Question 2 (4 points)
 +### How many variants did 23andMe genotype in exons (that is, in protein-coding sequences)?
 +### Show me the commands you used to figure this out.
 +
 +### Question 3 (8 points)
 +### a) Is this individual likely to be lactose intolerant? Show me the
 +### steps you used to figure this out.
 +### b) Pick one of the variants you used to determine lactose
 +### intolerance. What is the geographical distribution of this
 +### variant's allele frequency?
 +
 +
 +Example full credit answers
 +
 +1. Most of you got this one right. The most common mistake was to include too much information and too many steps (although that generally did not cost you any points).
 +
 +zgrep -w 'rs671' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz
 +
 + 12 112241766 rs671 G A . . ANN=A|missense_variant|MODERATE|ALDH2|ENSG00000111275|transcript|ENST00000261733|protein_coding|12/13|c.1510G>A|p.Glu504Lys|1571/2018|1510/1554|504/517||,A|missense_variant|MODERATE|ALDH2|ENSG00000111275|transcript|ENST00000416293|protein_coding|11/12|c.1369G>A|p.Glu457Lys|1465/1572|1369/1413|457/470||,A|3_prime_UTR_variant|MODIFIER|ALDH2|ENSG00000111275|transcript|ENST00000548536|nonsense_mediated_decay|13/14|c.*1386G>A|||||22035|,A|3_prime_UTR_variant|MODIFIER|ALDH2|ENSG00000111275|transcript|ENST00000549106|nonsense_mediated_decay|3/4|c.*89G>A|||||89|WARNING_TRANSCRIPT_NO_START_CODON GT 0/0
 +
 +This individual has 0 alternate alleles, so their genotype is G/G: two reference alleles.
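Because the snpEff annotation makes each record very long, it can help to print only the columns needed to read the genotype. The following is a minimal sketch, not the expected answer: `show_genotype` is a hypothetical helper name, it assumes tab-separated VCF text on standard input with a single sample in column 10, and the toy record is a shortened copy of the rs671 line above.

```shell
# Hypothetical helper: print ID, REF, ALT and GT for one rsID.
# Reads VCF text on stdin; assumes one sample column (column 10).
show_genotype() {
    awk -F '\t' -v id="$1" '
        /^#/ { next }                          # skip header lines
        $3 == id {
            n = split($9, keys, ":")           # FORMAT column, e.g. GT
            split($10, vals, ":")              # sample column, e.g. 0/0
            for (i = 1; i <= n; i++)
                if (keys[i] == "GT") print $3, $4, $5, vals[i]
        }'
}

# Toy record: the rs671 line with its ANN field shortened for readability.
printf '12\t112241766\trs671\tG\tA\t.\t.\tANN=A|missense_variant|MODERATE|ALDH2\tGT\t0/0\n' |
    show_genotype rs671     # prints: rs671 G A 0/0
```

On the real file the same function could be fed from the compressed VCF, for example by piping the output of `zcat` into `show_genotype rs671`.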
 +
 +2. There are multiple ways to answer this. One of the most straightforward is as follows, although we could quibble over whether I should have included any splicing variants.
 +
 +zgrep 'synonymous\|missense\|start_gain\|start_lost\|stop_gain\|stop_lost\|3_prime_UTR_variant\|5_prime_UTR_variant' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz | wc -l
 +   52772
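One thing to keep in mind with this kind of count is that `grep` counts every matching line, so a header line that happened to mention one of the terms would be included too. Whether that affects the lab file is not checked here. The sketch below shows a variant that drops header lines before counting; `count_annotated` and `toy_vcf` are hypothetical names, and the three toy records (rsA, rsB, rsC) are made up for illustration.

```shell
# Hypothetical helper: count VCF records (not header lines) whose text
# matches an extended regular expression. Reads VCF text on stdin.
count_annotated() {
    grep -v '^#' | grep -c -E "$1"
}

# Made-up toy VCF: one header line mentions "missense_variant" on purpose.
toy_vcf() {
    printf '##toy header that mentions missense_variant\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n'
    printf '1\t100\trsA\tG\tA\t.\t.\tANN=A|missense_variant|MODERATE\tGT\t0/1\n'
    printf '1\t200\trsB\tC\tT\t.\t.\tANN=T|intron_variant|MODIFIER\tGT\t0/0\n'
    printf '1\t300\trsC\tT\tC\t.\t.\tANN=C|synonymous_variant|LOW\tGT\t1/1\n'
}

toy_vcf | count_annotated 'synonymous|missense'   # prints 2 (header skipped)
```

The sketch uses `grep -E` with `|` for alternation, which is the portable spelling of the `\|` used in the command above. On the real file the input could come from `zcat` on the compressed VCF.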
 +
 +
 +3. Coming soon
  
  
Line 92: Line 139:
 1. 23andMe format was converted to vcf format.
  
-bcftools convert --tsv2vcf hu916767_20170324191934.txt -f ~/Desktop/human_g1k_v37.fasta.gz -s hu916767_20170324191934 -Ob -o hu916767_20170324191934.bcf
+bcftools convert --tsv2vcf hu916767_20170324191934.txt -f human_g1k_v37.fasta.gz -s hu916767_20170324191934 -Ob -o hu916767_20170324191934.bcf
 bcftools view hu916767_20170324191934.bcf -O vcf | bgzip -c > hu916767_20170324191934.vcf.gz
  
lab_2.txt · Last modified: 2017/05/02 09:09 by scott