keller_and_evans_lab:gscan

Differences

This shows you the differences between two versions of the page.

keller_and_evans_lab:gscan [2016/09/12 14:59]
scott /* Phenotype definitions and analysis plan */
keller_and_evans_lab:gscan [2019/10/28 15:59] (current)
lessem ↷ Links adapted because of a move operation
Line 18: Line 18:
 ====== Phenotype definitions and analysis plan ======
 
-{{file_gscan_exome_chip_analysis_plan-v2_2.pdf|Exome chip analysis plan and phenotype definitions.}}
+{{:file_gscan_exome_chip_analysis_plan-v2_2.pdf|exome_chip_analysis_plan_and_phenotype_definitions}}
  
  
Line 29: Line 29:
  
 ======= GSCAN GWAS =======
+
+
+====== Workgroups ======
+
+  * **Phenotype workgroup:** Laura Bierut, Marilyn Cornelis, Dave Hinds, Youna Hu, Jaakko Kaprio, Eric Jorgenson, Dajiang Liu, Matt McGue, Marcus Munafo, Gunter Schumann, Scott Vrieze, Luisa Zuccolo
+  * **Analysis workgroup:** Goncalo Abecasis, David Hinds, Youna Hu, Eric Jorgenson, Charles Kooperberg, Pete Kraft, Penelope Lind, Dajiang Liu, Nancy Saccone, Dan Stram, Scott Vrieze, Xiaowei Zhan
  
  
Line 35: Line 41:
 The analysis plan and phenotypes are described in files linked below (makes it easier to keep track of versioning!). Coding of phenotypes is described in the aptly-named "phenotype definitions" file whereas the genome-wide analysis plan is in the all-too-aptly-named "analysis plan" document. Please note that the phenotype definitions document only contains information on how to code the eight smoking/drinking phenotypes. File formats for those phenotypes, which many will recognize as standard pedigree formats, are included in the analysis plan. Everything else should be fairly straightforward.
 
-{{file_gscan_gwas_analysis_plan-v1_3.docx|Click here to find the GSCAN GWAS analysis plan.}}
+{{:file_gscan_gwas_analysis_plan-v1_3.docx|click_here_to_find_the_gscan_gwas_analysis_plan}}
 
-{{file_gscan_gwas_phenotype_definitions-2-24-2016.pdf|Click here to find the GSCAN GWAS phenotype definitions.}}
+{{:file_gscan_gwas_phenotype_definitions-2-24-2016.pdf|click_here_to_find_the_gscan_gwas_phenotype_definitions}}
  
  
Line 56: Line 62:
  
  
-====== [[gscan_db_ga_p]] ======
+====== [[gscan_db_ga_p]] & UK Biobank ======
 
-Studies included from dbGaP, and the process by which phenotypes and genotypes were constructed and merged is outlined on the [[gscan_db_ga_p]] page.
+Studies included from dbGaP, and the process by which phenotypes and genotypes were constructed and merged is outlined on the [[:gscan_db_ga_p]] page.
  
  
-====== GSCAN use of UKBiobank ======
-
-More information about the files used for [[uk_biobank|UKBiobank are here]]. In brief, we used the UK10K + 1kgp3 imputed vcfs provided by UKBiobank and added in dosages w/ this python script:
-
-import gzip, argparse, re, os, datetime
-from subprocess import Popen, PIPE
-
-def add_dosage(pair):
-        # pair is (GT, GP); ALT-allele dosage = P(het) + 2 * P(hom alt)
-        a, b = pair
-        probs = b.split(b',')
-        dose = float(probs[1]) + (float(probs[2]) * 2)
-        return a + b':' + str(dose).encode('ascii') + b':' + b
-
-def gziplines(fname):
-  # stream the gzipped VCF through zcat, yielding raw byte lines
-  f = Popen(['zcat', fname], stdout=PIPE)
-  for line in f.stdout:
-      yield line
-
-parser = argparse.ArgumentParser()
-parser.add_argument('inputVCF', help = 'The path to the VCF')
-args = parser.parse_args()
-
-flag = False
-
-for line in gziplines(args.inputVCF):
-        if line.startswith(b'#'):
-                os.write(1, line.rstrip() + b'\n')
-                if not flag:
-                        # declare the new DS field once, right after the first header line
-                        os.write(1, b'##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype Dosages">\n')
-                        os.write(1, b'##Dosages added using the script add.dosages.subprocess.py at ' +
-                                str(datetime.datetime.now()).encode('ascii') + b'\n')
-                        flag = True
-        else:
-                # split on tabs and colons: 8 fixed columns, 2 FORMAT keys, then GT/GP pairs
-                elements = re.split(b'\t|:', line.rstrip())
-                first8 = elements[:8]
-                genotypes = elements[10:]
-                form = b'GT:DS:GP'
-
-                genotypes_split = zip(genotypes[::2], genotypes[1::2])
-                try:
-                        dose_genos = [add_dosage(pair) for pair in genotypes_split]
-                except (ValueError, IndexError) as e:
-                        # report the offending line and file on stderr (bytes, since os.write needs bytes)
-                        os.write(2, b"\n" + line)
-                        os.write(2, line + b"\n" + args.inputVCF.encode() + b"\n\n")
-                        raise e
-                os.write(1, b'\t'.join(first8) + b'\t' + form + b'\t' + b'\t'.join(dose_genos) + b'\n')
-
+======= GSCAN Sequencing =======
+
+====== TOPMed ======
+
+We hope to update this section with detailed descriptions of how we have conducted phenotype derivations for each TOPMed cohort for which we have access to raw data.
+
+  *  For now, the R scripts to go from source phenotype file to eventual derived phenotype are located here:
+  /net/twins/svrieze/everything-else/wp/GSCAN/TOPMed/README
+
+  *  We're tracking analyses in [[https://docs.google.com/document/d/1HtJY6DzPWqr2XGTAD8HzoiIUoC45nSj116neRb3do3s/edit|**this Google doc**]]
+
+
+===== Phenotype definitions and analysis plan for external studies =====
+
+Phenotype definitions and analysis plans for the TOPMed studies are {{:file_topmed_smoking_analysis_plan-v0_2.docx|contained_in_this_document}}.
+
+The list of dbGaP studies in TOPMed is in [[https://airtable.com/shryD6CMaM6R5sA3e/tblUKENXX5WmgNXQ8|**this Airtable**]].
  
  
-======= GSCAN Sequencing =======
-
-
-====== TOPMed ======
-
-
-===== Phenotype definitions and analysis plan =====
-
-Phenotype definitions and analysis plans for the TOPMed studies are {{file_topmed_smoking_analysis_plan-v0_2.docx|contained in this document}}.
-
-The list of dbGaP studies in TOPMed is in [[https://airtable.com/shryD6CMaM6R5sA3e/tblUKENXX5WmgNXQ8|**this Airtable**]].
+======= Authorship guidelines =======
+
+
+While authorship is decided on an individual basis for each GSCAN paper, typically, authorship is arranged in groups. We hope the GIANT investigators will forgive us for adopting their authorship guidelines.
+
+  *  A group of 6 or fewer junior investigators who strongly led the efforts, usually starred to denote equal contribution, followed by additional junior investigators who played key, central roles.
+  *  In alphabetical order, junior investigators who had substantial individual contributions but not as much as those in Group 1. Typically, these might be lead analysts or other junior investigators who made a sizable contribution such as GWA analyses performed specifically for the paper.
+  *  In alphabetical order, junior investigators who had notable individual contributions but not as much as those in Groups 1 or 2. Typically, these might be lead analysts for replication cohorts, providing results for a group of top hits.
+  *  In alphabetical order, junior and senior investigators who had contributions worthy of authorship (participating in analysis, phenotype collection, genotyping, oversight of cohorts, etc. that was specific to the paper) but not as much as those in the other groups.
+  *  In alphabetical order, senior investigators who had contributions worthy of authorship and contributed more than those in group 4. Typically, these might be a lead PI of a participating cohort who did not participate as strongly in GSCAN activities as those in group 6.
+  *  In alphabetical order, senior investigators who participated strongly in GSCAN activities but did not strongly lead/oversee the writing and/or analysis for the paper. Typically, these might be leaders of key GSCAN activities.
+  *  The senior investigators who strongly led/oversaw the writing and/or analysis of the paper, including a subset that are co-corresponding authors (usually 6 or fewer).
  
keller_and_evans_lab/gscan.1473713996.txt.gz · Last modified: 2016/09/12 14:59 by scott