This shows you the differences between two versions of the page.
lab_1 [2017/04/11 14:14] scott |
lab_1 [2017/04/25 10:49] (current) scott /* Lab 1 Assignment */ |
||
---|---|---|---|
Line 3: | Line 3: | ||
The genotype file is located here: https:// | The genotype file is located here: https:// | ||
+ | |||
+ | |||
+ | ====== Lab 1 Assignment ====== | ||
+ | |||
+ | |||
+ | ### Lab 1 assignment | ||
+ | ### Assigned: 4/13/2017 | ||
+ | ### Due: 4/20/2017 at the beginning of class. Late assignments (even by 5 minutes) | ||
+ | ### will not be accepted! | ||
+ | ### | ||
+ | ### Note: all questions should be answered with respect to the | ||
+ | ### | ||
+ | |||
+ | |||
+ | ### Question 1 (4 points) | ||
+ | ### a) What does " | ||
+ | |||
+ | ### Question 2 (2 points) | ||
+ | ### a) Provide a command that I can run to extract only the chromosome column of the genotype file. | ||
+ | |||
+ | ### Question 3 (2 points) | ||
+ | ### a) Provide a command that I can run that extracts only the chromosome column of the genotype | ||
+ | ### file, and pipes it to "sort -u". | ||
+ | ### b) Provide the output of that command and tell me in your own words what the command did. | ||
+ | |||
+ | ### Question 4 (6 points) | ||
+ | ### a) Give me a command that I can run that will extract the most | ||
+ | ### commonly studied SNP associated with the flushing response discussed in class. | ||
+ | ### b) Interpret this individual' | ||
+ | ### esophageal cancer, and their response to Disulfiram. | ||
+ | ### | ||
+ | ### Note: you will need to use your web searching abilities! | ||
+ | |||
+ | ### Question 5 (4 points) | ||
+ | ### Find out more about SNP rs72921001 in dbSNP | ||
+ | ### a) What is the minor allele in individuals of European ancestry? What is the MAF? | ||
+ | ### b) What is the allele frequency of this allele in individuals of African ancestry? | ||
+ | ### c) Is this SNP associated with any phenotypic effects? | ||
+ | ### d) Describe the geographical distribution of allele frequency for this variant using | ||
+ | ### the website | ||
+ | |||
+ | |||
+ | Example full credit answers: | ||
+ | - Question 1 | ||
+ | - "The positive strand refers to the leading strand of DNA being sequenced (e.g., the strand that RNA would be replicated against)." | ||
+ | - "Each DNA strand is a double helix - it has two strands. The first strand given is the positive strand; the second strand is based on the first and is called the negative strand. For example, if the positive strand is ATCGG, then the negative strand is TAGCC (T always pairs with A, and G always pairs with C). The header is stating that the genome provided is only based on the first strand (the positive strand)." | ||
+ | - Question 2 | ||
+ | - awk ' | ||
+ | - cut -f2 hu916767_20170324191934.txt | ||
+ | - Question 3 | ||
+ | - awk ' | ||
+ | - cut -f2 hu916767_20170324191934.txt | sort -u | ||
+ | - The command extracts the second column from a tab-delimited file, alphanumerically sorts it, and removes all duplicate lines. | ||
+ | - Question 4 | ||
+ | - grep ' | ||
+ | - Output: rs671 12 112241766 GG | ||
+ | - " | ||
+ | - Question 5 | ||
+ | - Minor allele is A in individuals of European ancestry and MAF is 0.36 | ||
+ | - In individuals of African ancestry MAF is 0.021 | ||
+ | - The SNP is associated with thinking cilantro tastes like soap | ||
+ | - "The minor allele is most common in central/ | ||
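The Question 2 to 4 answers above can be tried on a small stand-in file. This is a minimal sketch, not the lab's own solution: it assumes the genotype file is tab-delimited with columns rsID, chromosome, position, genotype (as the rs671 output above suggests) and that header lines start with `#`. Every row except rs671 is hypothetical, and the `grep -v '^#'` filter and the `-w` flag are additions for illustration.

```shell
# Build a tiny tab-delimited stand-in for the genotype file.
# Only the rs671 row comes from the example output; the rest is made up.
sample=$(mktemp)
printf '# rsid\tchromosome\tposition\tgenotype\n' > "$sample"
printf 'rs671\t12\t112241766\tGG\n' >> "$sample"
printf 'rs6710000\t2\t1000\tAA\n' >> "$sample"
printf 'rsEXAMPLE\t12\t2000\tCT\n' >> "$sample"

# Questions 2 and 3: take the chromosome column (field 2), then sort it
# and drop duplicates. The grep -v step skips '#' header lines (assumption).
chroms=$(grep -v '^#' "$sample" | cut -f2 | sort -u)
echo "$chroms"

# Question 4: -w matches rs671 only as a whole word, so the hypothetical
# rs6710000 row is not returned.
hit=$(grep -w 'rs671' "$sample")
echo "$hit"

rm -f "$sample"
```

Without `-w`, a plain `grep 'rs671'` would also return any rsID that merely begins with `rs671`, which is why the whole-word match is used in this sketch.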
Line 119: | Line 181: | ||
### We can also grab both variants, if we wanted to | ### We can also grab both variants, if we wanted to | ||
grep -E ' | grep -E ' | ||
+ | |||
+ | ### What if we don't know a variant's rsID, | ||
+ | ### and know only its chromosome, position, genome build, and alleles? | ||
+ | ### To get chromosome 1, position 11850759, we can do this: | ||
+ | grep -E ' | ||
+ | |||
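The grep pattern above is cut off in this diff view, so the following is a separate sketch rather than the original command. It looks up a variant by chromosome and position with `awk`, assuming the same tab-delimited layout (rsID, chromosome, position, genotype). The sample rows and the `rsEXAMPLE` identifiers are hypothetical.

```shell
# Hypothetical stand-in rows: only the first one is on chromosome 1
# at position 11850759.
sample=$(mktemp)
printf 'rsEXAMPLE1\t1\t11850759\tAG\n' > "$sample"
printf 'rsEXAMPLE2\t11\t11850759\tCC\n' >> "$sample"
printf 'rsEXAMPLE3\t1\t118507590\tTT\n' >> "$sample"

# Compare whole fields as strings, so chromosome "11" and position
# "118507590" do not match by accident.
hit=$(awk -F'\t' '$2 == "1" && $3 == "11850759"' "$sample")
echo "$hit"

rm -f "$sample"
```

Comparing fields exactly avoids the partial matches that an unanchored text pattern can produce. The position is only meaningful for the genome build the file was generated against.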
Line 153: | Line 221: | ||
grep -E ' | grep -E ' | ||
+ | |||
+ | |||
+ | ====== Useful databases ====== | ||
+ | |||
+ | |||
+ | **Geography of Genetic Variants Browser** Interactively browse geographic distribution of genetic variants. Can compare to 1000 Genomes, ExAC, and POPRES (Euro-centric). http:// | ||
+ | |||
+ | **dbSNP** A fairly exhaustive database of SNPs in humans. https:// | ||
+ | |||
+ | **ExAC** A good source for exonic variants. Very user friendly. http:// | ||