====== Instructions ======

PSYC 7102 -- Statistical Genetics. Final Exam

**Due:** December 17, 2015 @ 5pm.

There are 5 questions, each worth 8 points, for a total of 40 possible points. The exam is "open book". Use whatever online sources are helpful to you. All questions contain multiple parts -- please read the questions carefully and answer all components. Complete this exam on your own, without help from others.

For this final exam I ask that you conduct all requested analyses on the following files, each of which contains a single person from 1000 Genomes. There's no need to copy any of these files. Just use them out of my directory.

Aligned reads

  /Users/scvr9332/final_exam_files/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
  /Users/scvr9332/final_exam_files/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai

Annotated VCFs

  /Users/scvr9332/final_exam_files/chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz
  /Users/scvr9332/final_exam_files/chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz.tbi

====== Question 1 ======

Interpret the phenotypic effects of variant rs16969968.

  - Tell me the location, alleles, genotype, dosage, and number of reads at this site for HG00096 (1 point)
  - What is the functional effect of the alternate allele at this site? (1 point)
  - How does this variant impact HG00096's risk for smoking? Be specific and provide a citation to support your claim. (2 points)
  - Use samtools to interactively visualize the reads at this site for HG00096 and describe the output for rs16969968. Include a screenshot. You will have to do some googling to understand the output! (4 points)

Here is an example samtools visualization command:

  samtools tview -p chr:pos -d C in.bam /Users/scvr9332/reference_data/gotcloud.ref/hs37d5.fa

====== Question 2 ======

This section will draw from the quality control plots of a bad Illumina run here: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc.html

  - Look at the per tile sequence quality plot. Tell me what this type of plot represents, generally, and what this specific plot tells us about the sequencing run. (1 point)
  - Do the same for the Kmer Content plot and table. (1 point)
  - Looking at all the plots, what do you think caused the problems in this run? Why? (2 points)
  - Give me the command to print out lines in a gzipped BAM file that correspond to reads where at least one base has the lowest possible alignment quality score. (Think regex) (4 points)

====== Question 3 ======

What do genetic ancestry PCA estimates, such as those you generated in this course, represent (3 points)? In your opinion how is genetic ancestry different from/related to race/ethnicity (3 points)? List and discuss two reasons why ancestry is crucial to consider in genetic association studies (2 points).

====== Question 4 ======

Describe the difference between coding and regulatory variation and summarize their relative roles in complex traits and diseases. (8 points)

====== Question 5 ======

On a conceptual level, how does imputation work (2 points)? List and discuss three advantages of conducting imputation in genetic association studies (6 points).