keller_and_evans_lab:gscan

Differences

This shows you the differences between two versions of the page.

keller_and_evans_lab:gscan [2016/04/22 13:46]
scott /* GSCAN Sequencing */
keller_and_evans_lab:gscan [2016/08/29 10:28]
scott
Line 7: Line 7:
  
 Regular conference calls are held and minutes are [[https://docs.google.com/document/d/1ZK9VIXxcej3lat_oD_oxPP0ajHwj8yX_FKaPh53svVo/edit#|**available here**.]]
 +
 +Other meeting materials from CO internal meetings are here:
 +
 +[[gscan_6:16:16_--_db_ga_p_gf_g]]
  
  
 ======= GSCAN Exome Chip =======
 +
 +
 +====== Phenotype definitions and analysis plan ======
 +
 +{{file_gscan_exome_chip_analysis_plan-v2_2.pdf|Exome chip analysis plan and phenotype definitions.}}
  
  
Line 20: Line 29:
  
 ======= GSCAN GWAS =======
 +
 +
 +====== Phenotype definitions and analysis plan ======
 +
 +The analysis plan and phenotypes are described in files linked below (makes it easier to keep track of versioning!). Coding of phenotypes is described in the aptly-named "phenotype definitions" file whereas the genome-wide analysis plan is in the all-too-aptly-named "analysis plan" document. Please note that the phenotype definitions document only contains information on how to code the eight smoking/drinking phenotypes. File formats for those phenotypes, which many will recognize as standard pedigree formats, are included in the analysis plan. Everything else should be fairly straightforward.
 +
 +{{file_gscan_gwas_analysis_plan-v1_2.pdf|Click here to find the GSCAN GWAS analysis plan.}}
 +
 +{{file_gscan_gwas_phenotype_definitions-2-24-2016.pdf|Click here to find the GSCAN GWAS phenotype definitions.}}
  
  
 ====== Coordination and organization ======
  
-All analyses, internal and external, are tracked in [[https://docs.google.com/document/d/1kWaY40n-bSURoLW7VcU9CFv08zVx360RHmxvL7DIreU/edit|**this Google Doc**]].
+Progress, internal and external, is tracked in [[https://docs.google.com/document/d/1kWaY40n-bSURoLW7VcU9CFv08zVx360RHmxvL7DIreU/edit|**this Google Doc**]]. More specific progress on internal studies is [[https://docs.google.com/spreadsheets/d/1canvCaAJW70LjSHidtvwrJgyDMa_ZlT7dvpzOsz6PNY/edit#gid=0|**tracked here**]].
  
 Study contact info is tracked in [[https://docs.google.com/spreadsheets/d/11apZaSyesNy4hl4MIgrKRYSASrwZM2iEJsuuFQByCfI/edit#gid=0|**this Google Sheet**]].
  
-Studies available in dbGaP, along with accession numbers, etc. are tracked in [[https://airtable.com/tblzZUtQWcZSlfjrA/viwhISDznphLfST8m|**this Airtable**]].
+Studies available in dbGaP, along with accession numbers, etc. are tracked in [[https://airtable.com/tblzZUtQWcZSlfjrA/viwhISDznphLfST8m|**this Airtable**]].
  
  
Line 36: Line 54:
  
 On RC the organization is similar. Everything is located within the folder /work/KellerLab/GSCAN/GWAS. Study data to which we have raw data access are in the folder //individual_level_study_data//. Summary stats generated on these samples are organized within //summary_stats_generated_internally//. Summary stats generated by outside groups and submitted for meta-analysis are organized within //summary_stats_generated_externally//.
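
The RC layout described above can be written down as path constants. This is only a minimal sketch: it assumes the three subfolders sit directly under the GWAS root, which the text implies but does not state, and the function name `gscan_gwas_dirs` and the dictionary keys are hypothetical labels chosen for illustration.

```python
from pathlib import PurePosixPath

# Root folder on RC, as given in the text above.
GWAS_ROOT = PurePosixPath("/work/KellerLab/GSCAN/GWAS")


def gscan_gwas_dirs(root=GWAS_ROOT):
    """Return the three data folders, assuming they are direct children of root."""
    return {
        # Studies for which raw (individual-level) data access exists
        "individual_level": root / "individual_level_study_data",
        # Summary stats generated in-house on those samples
        "internal_summary": root / "summary_stats_generated_internally",
        # Summary stats submitted by outside groups for meta-analysis
        "external_summary": root / "summary_stats_generated_externally",
    }


if __name__ == "__main__":
    for label, path in gscan_gwas_dirs().items():
        print(label, path)
```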
 +
 +
 +====== [[gscan_db_ga_p]] ======
 +
 +The studies included from dbGaP, and the process by which phenotypes and genotypes were constructed and merged, are outlined on the [[gscan_db_ga_p]] page.
 +
 +
 +====== GSCAN use of UKBiobank ======
 +
 +More information about the files used for [[uk_biobank|UKBiobank is here]]. In brief, we used the UK10K + 1kgp3 imputed VCFs provided by UKBiobank and added dosages with this Python script:
 +
 +import gzip, argparse, re, os, datetime
 +from subprocess import Popen, PIPE
 +
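 +def add_dosage(pair):
 +        # pair is (GT, GP); GP holds three comma-separated genotype probabilities
 +        a, b = pair
 +        probs = b.split(b',')
 +        # dosage = P(het) + 2 * P(hom-alt), assuming the usual VCF GP order (0/0, 0/1, 1/1)
 +        dose = float(probs[1]) + (float(probs[2]) * 2)
 +        return a + b':' + str(dose).encode('ascii') + b':' + b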
 +
 +def gziplines(fname):
 +  f = Popen(['zcat', fname], stdout=PIPE)
 +  for line in f.stdout:
 +      yield line
 +
 +parser = argparse.ArgumentParser()
 +parser.add_argument('inputVCF', help = 'The path to the VCF')
 +args = parser.parse_args()
 +
 +flag = False
 +
 +for line in gziplines(args.inputVCF):
 +        if line.startswith(b'#'):
 +                os.write(1, line.rstrip() + b'\n')
 +                if not flag:
 +                        os.write(1, b'##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype Dosages">\n')
 +                        os.write(1, b'##Dosages added using the script add.dosages.subprocess.py at ' +
 +                                str(datetime.datetime.now()).encode('ascii') + b'\n')
 +                        flag = True
 +        else:
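 +                # Splitting on tab or ':' assumes FORMAT is exactly two keys (GT:GP)
 +                # and that none of the first eight columns contains a ':'
 +                elements = re.split(b'\t|:', line.rstrip())
 +                first8 = elements[:8]
 +                genotypes = elements[10:]
 +                form = b'GT:DS:GP'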
 +
 +                genotypes_split = zip(genotypes[::2], genotypes[1::2])
 +                try:
 +                        dose_genos = [add_dosage(pair) for pair in genotypes_split]
 +                except (ValueError, IndexError) as e:
 +                        # line is bytes, so everything written to stderr (fd 2) must be bytes too
 +                        os.write(2, b'\n' + line)
 +                        os.write(2, line + b'\n' + args.inputVCF.encode() + b'\n\n')
 +                        raise e
 +                os.write(1, b'\t'.join(first8) + b'\t' + form + b'\t' + b'\t'.join(dose_genos) + b'\n')
 +
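
To make the dosage arithmetic in the script above concrete, here is a small self-contained sketch of the per-genotype step. It assumes, as the script does, that each sample field is `GT:GP` and that GP carries three comma-separated probabilities in the order hom-ref, het, hom-alt. The helper names `dosage_from_gp` and `add_ds` and the example values are hypothetical and are not part of the original script.

```python
def dosage_from_gp(gp: bytes) -> float:
    """Expected alternate-allele count from a b'p0,p1,p2' GP field (assumed order)."""
    probs = [float(p) for p in gp.split(b',')]
    if len(probs) != 3:
        raise ValueError('expected three genotype probabilities, got %d' % len(probs))
    # One alt allele for a het call, two for a hom-alt call
    return probs[1] + 2.0 * probs[2]


def add_ds(gt: bytes, gp: bytes) -> bytes:
    """Rebuild a sample field as GT:DS:GP, mirroring the output format of the script."""
    return gt + b':' + str(dosage_from_gp(gp)).encode('ascii') + b':' + gp


if __name__ == '__main__':
    # A confidently heterozygous call has a dosage of 1.0
    print(add_ds(b'0/1', b'0,1,0'))
```

Working in bytes throughout, as the original script does, avoids decoding every line of a large VCF.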
  
  
 ======= GSCAN Sequencing =======
 +
 +
 +====== TOPMed ======
 +
 +Preliminary phenotype definitions for distributed analyses of TOPMed data are provided in this document.
  
 The list of dbGaP studies in TOPMed is in [[https://airtable.com/shryD6CMaM6R5sA3e/tblUKENXX5WmgNXQ8|**this Airtable**]].
 +
keller_and_evans_lab/gscan.txt · Last modified: 2019/10/28 15:59 by lessem