lab_1

Differences

This shows you the differences between two versions of the page.

lab_1 [2017/04/13 10:29]
scott
lab_1 [2017/04/25 10:49] (current)
scott /* Lab 1 Assignment */
Line 3: Line 3:
  
 The genotype file is located here: https://drive.google.com/file/d/0B608ps4vtHUaWFNOWXJqZ0tDMXc/view?usp=sharing The genotype file is located here: https://drive.google.com/file/d/0B608ps4vtHUaWFNOWXJqZ0tDMXc/view?usp=sharing
 +
 +
 +====== Lab 1 Assignment ======
 +
 +
 +### Lab 1 assignment
 +### Assigned: 4/13/2017
 +### Due: 4/20/2017 at the beginning of class. Late assignments (even by 5 minutes) 
 +###      will not be accepted!
 +###
 +### Note: all questions should be answered with respect to the 
 +###       genotypes from hu916767_20170324191934.txt
 +
 +
 +### Question 1 (4 points)
 +### a) What does "positive strand" mean in the header of the genotype file?
 +
 +### Question 2 (2 points)
 +### a) Provide a command that I can run to extract only the chromosome column of the genotype file.
 +
 +### Question 3 (2 points)
 +### a) Provide a command that I can run that extracts only the chromosome column of the genotype 
 +###    file, and pipes it to "sort -u".
 +### b) Provide the output of that command and tell me in your own words what the command did.
 +
 +### Question 4 (6 points)
 +### a) Give me a command that I can run that will extract the most 
 +###    commonly studied SNP associated with the flushing response discussed in class. 
 +### b) Interpret this individual's risk for alcoholism, the flushing response, 
 +###    esophageal cancer, and their response to Disulfiram.
 +### 
 +### Note: you will need to use your web searching abilities!
 +
 +### Question 5 (4 points)
 +### Find out more about SNP rs72921001 in dbSNP
 +### a) What is the minor allele in individuals of European ancestry? What is the MAF?
 +### b) What is the allele frequency of this allele in individuals of African ancestry?
 +### c) Is this SNP associated with any phenotypic effects?
 +### d) Describe the geographical distribution of allele frequency for this variant using
 +###    the website  http://popgen.uchicago.edu/ggv/
 +
 +
 +Example full credit answers:
 +  - Question 1
 +    - "The positive strand refers to the leading strand of DNA being sequenced (e.g., the strand that RNA would be replicated against)."
 +    - "Each DNA strand is a double helix - it has two strands. The first strand given is the positive strand; the second strand is based on the first and is called the negative strand. For example, if the positive strand is ATCGG, then the negative strand is TAGCC (T always pairs with A, and G always pairs with C). The header is stating that the genome provided is only based on the first strand (the positive strand)."
 +  - Question 2
 +    - awk '{print $2}' hu916767_20170324191934.txt
 +    - cut -f2 hu916767_20170324191934.txt
 +  - Question 3
 +    - awk '{print $2}' hu916767_20170324191934.txt | sort -u
 +    - cut -f2 hu916767_20170324191934.txt | sort -u
 +    - The command extracts the second column from a tab-delimited file, alphanumerically sorts it, and removes all duplicate lines.
 +  - Question 4
 +    - grep -w 'rs671' hu916767_20170324191934.txt
 +    - Output: rs671 12 112241766 GG
 +    - "Interpretation: This individual does not flush, has a normal risk for alcoholism, normal risk of esophageal cancer, and Disulfiram is effective for alcoholism for this individual."
 +  - Question 5
 +    - The minor allele is A in individuals of European ancestry, and the MAF is 0.36.
 +    - In individuals of African ancestry, the MAF is 0.021.
 +    - The SNP is associated with thinking cilantro tastes like soap
 +    - "The minor allele is most common in central/southern Asia and western Europe, and least common in Africa, with the Americas in between."
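
The Question 3 pipeline can be tried on a small stand-in file before running it on the real genotypes. The sketch below is only an illustration: the sample file, the two `rs0000001`/`rs0000002` rows, and the assumption that header lines start with `#` are all hypothetical (only the rs671 row is taken from the output above). Because `cut` passes through lines that contain no tab, the sketch drops comment lines first.

```shell
# Build a tiny tab-delimited sample: rsid, chromosome, position, genotype.
# The header line and the rs0000001/rs0000002 rows are made up for illustration.
sample="$(mktemp)"
printf '# rsid\tchromosome\tposition\tgenotype\n' > "$sample"
printf 'rs0000001\t1\t1000\tAA\n' >> "$sample"
printf 'rs671\t12\t112241766\tGG\n' >> "$sample"
printf 'rs0000002\t1\t3000\tCT\n' >> "$sample"

# Drop comment lines, keep column 2, then sort and remove duplicate lines.
chromosomes="$(grep -v '^#' "$sample" | cut -f2 | sort -u)"
printf '%s\n' "$chromosomes"

rm -f "$sample"
```

On this sample the pipeline prints `1` and `12`, each once, even though chromosome 1 appears on two rows.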
  
  
Line 119: Line 181:
 ### We can also grab both variants, if we wanted to ### We can also grab both variants, if we wanted to
 grep -E 'rs8176719|rs9430244' hu916767_20170324191934.txt grep -E 'rs8176719|rs9430244' hu916767_20170324191934.txt
 +
 +### What if we have a variant where we don't know the rsID, 
 +### but only the chromosome, position, genome build, and alleles? 
 +### Well, to get chromosome 1, position 11850759, we can do this:
 +grep -E '\s1\s11850759\s' hu916767_20170324191934.txt
 +
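
A possible alternative to the whitespace-anchored `grep` is to compare the chromosome and position columns directly with `awk`, which avoids matching the same digits in another column. This is a minimal sketch, assuming the file is tab-delimited with the chromosome in column 2 and the position in column 3; the sample rows, rsIDs, and genotypes below are hypothetical, and the position is the one named in the comment above.

```shell
# Hypothetical sample rows; only the column layout mirrors the genotype file.
sample="$(mktemp)"
printf 'rs0000001\t1\t11850759\tAG\n' > "$sample"
printf 'rs0000002\t11\t11850759\tCC\n' >> "$sample"
printf 'rs671\t12\t112241766\tGG\n' >> "$sample"

# Print only rows whose chromosome and position fields both match.
match="$(awk -F'\t' -v chr=1 -v pos=11850759 '$2 == chr && $3 == pos' "$sample")"
printf '%s\n' "$match"

rm -f "$sample"
```

Here only the chromosome 1 row is printed; the chromosome 11 row at the same position is skipped because the fields are compared as whole values.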
  
  
Line 155: Line 223:
  
  
-====== Lab 1 Assignment ====== +====== Useful databases ======
- +
-<syntaxhighlight lang="bash"> +
-  -            Lab 1 assignment +
-      -  Assigned: 4/13/2017 +
-      -  Due: 4/20/2017 at the beginning of class. Late assignments (even by 5 minutes)  +
-      -       will not be accepted! +
-      -  +
-      -  Note: all questions should be answered with respect to the  +
-      -        genotypes from hu916767_20170324191934.txt +
- +
-  -            Question 1 (4 points) +
-      -  a) What does "positive strand" mean in the header of the genotype file?+
  
-  -            Question 2 (2 points) 
-      -  a) Provide a command that I can run to extract only the chromosome column of the genotype file. 
  
-  -            Question 3 (2 points) +**Geography of Genetic Variants Browser** Interactively browse geographic distribution of genetic variants. Can compare to 1000 Genomes, ExAC, and POPRES (Euro-centric). http://popgen.uchicago.edu/ggv/?data=%221000genomes%22&chr=11&pos=6889648
-      -  a) Provide a command that I can run that extracts only the chromosome column of the genotype file, and pipes it to "sort -u". +
-      -  b) Provide the output of that command and tell me in your own words what the command did.+
  
-  -            Question 4 (6 points) +**dbSNP** A fairly exhaustive database of SNPs in humans. https://www.ncbi.nlm.nih.gov/projects/SNP/
-      -  a) Give me a command that I can run that will extract the most  +
-      -     commonly studied SNP associated with the flushing response discussed in class +
-      -  b) Interpret this individual's risk for alcoholism, the flushing response,  +
-      -     esophageal cancer, and their response to Disulfiram. +
-      -   +
-      -  Note: you will need to use your web searching abilities!+
  
-  -            Question 5 (4 points) +**ExAC** A good source for exonic variants. Very user friendly. http://exac.broadinstitute.org/
-      -  Find out more about SNP rs72921001 in dbSNP +
-      -  a) What is the minor allele in individuals of European ancestry? What is the MAF? +
-      -  b) What is the allele frequency of this allele in individuals of African ancestry? +
-      -  c) Is this SNP associated with any phenotypic effects?+
  
lab_1.1492100977.txt.gz · Last modified: 2017/04/13 10:29 by scott