keller_and_evans_lab:gscan

Differences

This shows you the differences between two versions of the page.

keller_and_evans_lab:gscan [2016/05/02 07:42]
scott /* Phenotype definitions and analysis plan */
keller_and_evans_lab:gscan [2016/09/12 14:59]
scott /* Phenotype definitions and analysis plan */
Line 7: Line 7:
  
 Regular conference calls are held and minutes are [[https://docs.google.com/document/d/1ZK9VIXxcej3lat_oD_oxPP0ajHwj8yX_FKaPh53svVo/edit#|**available here**.]]
 +
 +Other meeting materials from CO internal meetings are here:
 +
 +[[gscan_6:16:16_--_db_ga_p_gf_g]]
  
  
 ======= GSCAN Exome Chip =======
 +
 +
 +====== Phenotype definitions and analysis plan ======
 +
 +{{file_gscan_exome_chip_analysis_plan-v2_2.pdf|Exome chip analysis plan and phenotype definitions.}}
  
  
Line 24: Line 33:
 ====== Phenotype definitions and analysis plan ======
  
 -Click here for the GWAS analysis plan {{file_gscan_gwas_analysis_plan-v1_2.pdf}}
 +The analysis plan and phenotypes are described in the files linked below (this makes it easier to keep track of versioning!). Coding of phenotypes is described in the aptly-named "phenotype definitions" file, whereas the genome-wide analysis plan is in the all-too-aptly-named "analysis plan" document. Please note that the phenotype definitions document only contains information on how to code the eight smoking/drinking phenotypes. File formats for those phenotypes, which many will recognize as standard pedigree formats, are included in the analysis plan. Everything else should be fairly straightforward.
 + 
 +{{file_gscan_gwas_analysis_plan-v1_3.docx|Click here to find the GSCAN GWAS analysis plan.}}
 + 
 +{{file_gscan_gwas_phenotype_definitions-2-24-2016.pdf|Click here to find the GSCAN GWAS phenotype definitions.}}
  
  
Line 41: Line 54:
  
 On RC the organization is similar. Everything is located within the folder /work/KellerLab/GSCAN/GWAS. Study data to which we have raw data access are in the folder //individual_level_study_data//. Summary stats generated on these samples are organized within //summary_stats_generated_internally//. Summary stats generated by outside groups and submitted for meta-analysis are organized within //summary_stats_generated_externally//.
 +
 +
 +====== [[gscan_db_ga_p]] ======
 +
 +The studies included from dbGaP, and the process by which phenotypes and genotypes were constructed and merged, are outlined on the [[gscan_db_ga_p]] page.
 +
 +
 +====== GSCAN use of UKBiobank ======
 +
 +More information about the files used for [[uk_biobank|UKBiobank is here]]. In brief, we used the UK10K + 1kgp3 imputed VCFs provided by UKBiobank and added dosages with this Python script:
 +
 +import argparse, datetime, os, re
 +from subprocess import Popen, PIPE
 +
 +def add_dosage(pair):
 +    # pair is (GT, GP); GP lists P(ref/ref), P(ref/alt), P(alt/alt)
 +    a, b = pair
 +    probs = b.split(b',')
 +    # Dosage = expected alternate-allele count = P(het) + 2 * P(hom alt)
 +    dose = float(probs[1]) + (float(probs[2]) * 2)
 +    return a + b':' + str(dose).encode('ascii') + b':' + b
 +
 +def gziplines(fname):
 +    # Decompress with zcat and yield each line as bytes
 +    f = Popen(['zcat', fname], stdout=PIPE)
 +    for line in f.stdout:
 +        yield line
 +
 +parser = argparse.ArgumentParser()
 +parser.add_argument('inputVCF', help = 'The path to the VCF')
 +args = parser.parse_args()
 +
 +flag = False  # True once the DS header lines have been written
 +
 +for line in gziplines(args.inputVCF):
 +    if line.startswith(b'#'):
 +        os.write(1, line.rstrip() + b'\n')
 +        if not flag:
 +            # Add the new header lines right after the first header line
 +            os.write(1, b'##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype Dosages">\n')
 +            os.write(1, b'##Dosages added using the script add.dosages.subprocess.py at ' +
 +                str(datetime.datetime.now()).encode('ascii') + b'\n')
 +            flag = True
 +    else:
 +        # Split on tabs and colons; assumes FORMAT is GT:GP and that the
 +        # first eight columns contain no ':' characters
 +        elements = re.split(b'\t|:', line.rstrip())
 +        first8 = elements[:8]
 +        genotypes = elements[10:]
 +        form = b'GT:DS:GP'
 +
 +        # Pair up each sample's GT and GP values
 +        genotypes_split = zip(genotypes[::2], genotypes[1::2])
 +        try:
 +            dose_genos = [add_dosage(pair) for pair in genotypes_split]
 +        except (ValueError, IndexError):
 +            # Report the offending line and file on stderr (as bytes), then re-raise
 +            os.write(2, b'\n' + line + b'\n' + args.inputVCF.encode() + b'\n\n')
 +            raise
 +        os.write(1, b'\t'.join(first8) + b'\t' + form + b'\t' + b'\t'.join(dose_genos) + b'\n')
 +
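As a rough illustration of what the script above does to a single data line, here is a minimal, self-contained sketch. The example line, the variant ID rs1, and the helper name convert_line are hypothetical and not part of the original script. Unlike the script, this sketch splits on tabs first and only then on colons within each sample field, which is an assumption made here so that a colon in the ID or INFO column would not shift the fields.

```python
def add_dosage(gt, gp):
    """Build a GT:DS:GP sample field from the GT and GP byte strings."""
    probs = gp.split(b',')
    # Dosage = P(het) + 2 * P(hom alt), the expected alternate-allele count
    dose = float(probs[1]) + 2 * float(probs[2])
    return gt + b':' + str(dose).encode('ascii') + b':' + gp


def convert_line(line):
    """Rewrite one VCF data line whose FORMAT column is GT:GP (assumed)."""
    fields = line.rstrip().split(b'\t')
    first8, samples = fields[:8], fields[9:]
    dosed = [add_dosage(*sample.split(b':')) for sample in samples]
    return b'\t'.join(first8 + [b'GT:DS:GP'] + dosed) + b'\n'


# Hypothetical two-sample line, using probabilities that are exact in binary
example = (b'1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT:GP'
           b'\t0/1:0.25,0.5,0.25\t1/1:0,0,1\n')
result = convert_line(example)
print(result.decode('ascii'), end='')
```

The first sample's dosage is 0.5 + 2 * 0.25 = 1.0 and the second's is 0 + 2 * 1 = 2.0, so the sample fields become 0/1:1.0:0.25,0.5,0.25 and 1/1:2.0:0,0,1.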
  
  
 ======= GSCAN Sequencing =======
 +
 +
 +====== TOPMed ======
 +
 +
 +===== Phenotype definitions and analysis plan =====
 +
 +Phenotype definitions and analysis plans for the TOPMed studies are {{file_topmed_smoking_analysis_plan-v0_2.docx|contained in this document}}.
  
 The list of dbGaP studies in TOPMed is in [[https://airtable.com/shryD6CMaM6R5sA3e/tblUKENXX5WmgNXQ8|**this Airtable**]].
 +
keller_and_evans_lab/gscan.txt · Last modified: 2019/10/28 15:59 by lessem