keller_and_evans_lab:meeting_notes

Differences

This shows you the differences between two versions of the page.

keller_and_evans_lab:meeting_notes [2017/09/06 12:14]
richard_border
keller_and_evans_lab:meeting_notes [2017/09/06 12:14]
richard_border
Line 11: Line 11:
 Overall study structure Overall study structure
 500k px (participants) has subcomponents: 500k px (participants) has subcomponents:
-    - phenotyping changed over course of study (eg personality only available for a subset) +  - phenotyping changed over course of study (eg personality only available for a subset) 
-    - QC datafile contains batch variable for every individual; see if it contains "BiLEVE" +     - QC datafile contains batch variable for every individual; see if it contains "BiLEVE" 
-    - differences between online/in person data +     - differences between online/in person data 
-    - two genotypings  +     - two genotypings  
-    - 50k on one of the chips, half of whom are heavy smokers +     - 50k on one of the chips, half of whom are heavy smokers 
-    - two affy arrays, but there are significant differences in call rates for particular SNPs +     - two affy arrays, but there are significant differences in call rates for particular SNPs 
 - phenotyping confounded with SNP arrays and ascertainment for heavy smoking - phenotyping confounded with SNP arrays and ascertainment for heavy smoking
   - smoking also confounded with batch    - smoking also confounded with batch 
-   - Phenotype data available as .csv and .Rdata file generated by provided R script; possible for SAS as well  +    - Phenotype data available as .csv and .Rdata file generated by provided R script; possible for SAS as well  
-   !! rdata file is large and will exceed memory allocated to login nodes +    !! rdata file is large and will exceed memory allocated to login nodes 
-        - object is `bd`  +         - object is `bd`  
-   - each "project"/request has its own file as IDs have been randomized; `f.eid` is the randomized ID linking phen/gene data within requests; can establish bijection between eids across projects via plink sample files (ie eid1 <-> pos <-> eid2) +    - each "project"/request has its own file as IDs have been randomized; `f.eid` is the randomized ID linking phen/gene data within requests; can establish bijection between eids across projects via plink sample files (ie eid1 <-> pos <-> eid2) 
-   - f.50.0.0 : 0 is initial visit; 1: reassessment (~20k indiv); 2: imaging visit; +    - f.50.0.0 : 0 is initial visit; 1: reassessment (~20k indiv); 2: imaging visit;
 - the 50 (^ above) is the var id - the 50 (^ above) is the var id
 - details on phenotype page on wiki - details on phenotype page on wiki
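
The eid1 <-> pos <-> eid2 linkage noted above could be sketched roughly as follows. This is only an illustration, not the lab's actual procedure: it assumes the two projects' sample files list the same individuals in the same row order, and the function names, the `id_column` and `header_lines` parameters, and the example IDs are all hypothetical.

```python
from typing import Dict, List


def read_ids(path: str, id_column: int = 0, header_lines: int = 0) -> List[str]:
    """Return the IDs, in file order, from a whitespace-delimited sample file.

    id_column and header_lines are assumptions about the file layout;
    adjust them to match the actual sample files.
    """
    ids = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # Skip header rows and blank lines.
            if line_number < header_lines or not line.strip():
                continue
            ids.append(line.split()[id_column])
    return ids


def link_eids(ids_project1: List[str], ids_project2: List[str]) -> Dict[str, str]:
    """Map each project-1 eid to the project-2 eid at the same row position."""
    if len(ids_project1) != len(ids_project2):
        raise ValueError("sample files differ in length; positions cannot be matched")
    mapping = dict(zip(ids_project1, ids_project2))
    # A bijection requires unique IDs on both sides.
    if len(mapping) != len(ids_project1) or len(set(mapping.values())) != len(ids_project2):
        raise ValueError("duplicate eids found; mapping is not one-to-one")
    return mapping


# Hypothetical eids for illustration only.
example = link_eids(["1001", "1002", "1003"], ["9007", "9001", "9004"])
```

Matching on row position only works if both files really do share the same ordering, so it would be worth spot-checking a few individuals (e.g. by sex or genotype concordance) before relying on the mapping.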
Line 29: Line 29:
  
 Phenotypes available Phenotypes available
-    - psychiatric sx data (now available) -- need to submit additional application if interested in using (particularly suicide) +  - psychiatric sx data (now available) -- need to submit additional application if interested in using (particularly suicide) 
-    - wiki with list of fields out to email +     - wiki with list of fields out to email 
-    - data on rc at `/work/ibg/`, but some still in kellerlab; still waiting on data availability +     - data on rc at `/work/ibg/`, but some still in kellerlab; still waiting on data availability 
-    - for storage, important to use generic bgen files +     - for storage, important to use generic bgen files 
-    +
 Data cleaning - need to ensure consistency across projects Data cleaning - need to ensure consistency across projects
- -  genotype data +  -  genotype data 
-  - vcf files,  +   - vcf files,  
-  - ld-pruned relatedness files +   - ld-pruned relatedness files 
-  - gargi will send out parameters (HWE, MAF cutoffs, etc) of cleaned files and location on directory (discussed previously by gargi and luke) +   - gargi will send out parameters (HWE, MAF cutoffs, etc) of cleaned files and location on directory (discussed previously by gargi and luke) 
- - QC +  - QC 
 - raw data will remain available - raw data will remain available
 - one set of files with a bare minimum of QC (e.g., for imputed data: info score >= 0.3; indels removed; individuals whose self-reported and genetic sex differ excluded; singletons and doubletons excluded; the two phases of imputation had some error, so HRC SNPs should be used and luke removed UK10K-only and 1KG-only SNPs) - one set of files with a bare minimum of QC (e.g., for imputed data: info score >= 0.3; indels removed; individuals whose self-reported and genetic sex differ excluded; singletons and doubletons excluded; the two phases of imputation had some error, so HRC SNPs should be used and luke removed UK10K-only and 1KG-only SNPs)
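
A minimal sketch of what the variant-level part of that bare-minimum QC might look like, for discussion only. The record layout (`info`, `ref`, `alt`, `mac` keys) and the function names are hypothetical, and the thresholds simply mirror the notes (info score >= 0.3, no indels, no singletons or doubletons); the sex-mismatch exclusion and the HRC-only restriction are not covered here.

```python
from typing import Dict, Iterable, List


def is_snp(ref: str, alt: str) -> bool:
    """Treat a variant as a SNP (not an indel) when both alleles are one base long."""
    return len(ref) == 1 and len(alt) == 1


def passes_minimal_qc(variant: Dict, min_info: float = 0.3, min_mac: int = 3) -> bool:
    """Apply the bare-minimum filters to one variant record.

    variant is a hypothetical dict with keys 'info' (imputation info score),
    'ref', 'alt' (alleles) and 'mac' (minor allele count).
    min_mac=3 drops singletons (mac 1) and doubletons (mac 2).
    """
    return (
        variant["info"] >= min_info
        and is_snp(variant["ref"], variant["alt"])
        and variant["mac"] >= min_mac
    )


def filter_variants(variants: Iterable[Dict]) -> List[Dict]:
    """Keep only the variants that pass every minimal-QC filter."""
    return [v for v in variants if passes_minimal_qc(v)]


# Made-up records for illustration only.
demo = [
    {"id": "v1", "info": 0.95, "ref": "A", "alt": "G", "mac": 120},  # kept
    {"id": "v2", "info": 0.25, "ref": "C", "alt": "T", "mac": 300},  # low info
    {"id": "v3", "info": 0.90, "ref": "AT", "alt": "A", "mac": 50},  # indel
    {"id": "v4", "info": 0.80, "ref": "G", "alt": "C", "mac": 2},    # doubleton
]
kept_ids = [v["id"] for v in filter_variants(demo)]
```

Keeping the thresholds as parameters makes it easier to record exactly which cutoffs produced a given cleaned file, which matters for consistency across projects.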
keller_and_evans_lab/meeting_notes.txt · Last modified: 2019/10/31 10:50 by lessem