This shows you the differences between two versions of the page.
keller_and_evans_lab:meeting_notes [2017/09/06 12:14] richard_border |
keller_and_evans_lab:meeting_notes [2017/09/06 12:14] richard_border |
||
---|---|---|---|
Line 11: | Line 11: | ||
Overall study structure | Overall study structure | ||
500k px has subcomponents: | 500k px has subcomponents: | ||
- | | + | |
- | - QC datafile contains batch variable for every individual- see if it contains " | + | |
- | - differences between online/in person data | + | |
- | - two genotypings | + | |
- | - 50k on one of the chips where half heavy smokers | + | |
- | - two affy arrays but there are sig difs in call rates for particular SNPs | + | |
- phenotyping confounding with snp arrays and ascn for heavy smoking | - phenotyping confounding with snp arrays and ascn for heavy smoking | ||
- smoking also confounded with batch | - smoking also confounded with batch | ||
- | - Phenotype data available as .csv and .Rdata file generated by provided R script; possible for SAS as well | + | |
- | | + | !! rdata file is large and will exceed memory allocated to login nodes
- | - object is `bd` | + | |
- | | + | - each " |
- | | + | - f.50.0.0 : 0 is initial visit; 1: reassessment (~20k indiv); 2: imaging visit
- 50^ is var id | - 50^ is var id | ||
- details on phenotype page on wiki | - details on phenotype page on wiki | ||
Line 29: | Line 29: | ||
Phenotypes available | Phenotypes available | ||
- | | + | |
- | - wiki with list of fields out to email | + | |
- | - data on rc `/ | + | |
- | - for storage, important to use generic bgen files | + | |
- | + | ||
Data cleaning - need to ensure consistency across projects | Data cleaning - need to ensure consistency across projects | ||
- | - genotype data | + | |
- | - vcf files, | + | |
- | - ld-pruned relatedness files | + | |
- | - gargi will send out parameters (HWE, MAF cutoffs, etc) of cleaned files and location on directory (discussed previously by gargi and luke) | + | |
- | - QC | + | - QC |
- raw data will remain available | - raw data will remain available | ||
- one set of files that have a bare minimum of QC (e.g., for imputed data: info score >= .3; indels removed; individuals whose self-reported and genetic sex differ excluded; singletons and doubletons excluded; the two phases of imputation had some error, so HRC SNPs should be used and luke removed UK10K- and 1KG-only SNPs) | - one set of files that have a bare minimum of QC (e.g., for imputed data: info score >= .3; indels removed; individuals whose self-reported and genetic sex differ excluded; singletons and doubletons excluded; the two phases of imputation had some error, so HRC SNPs should be used and luke removed UK10K- and 1KG-only SNPs) | ||
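
A minimal sketch of the variant-level part of the bare-minimum QC described above, for illustration only. The `Variant` record, its field names, and `passes_min_qc` are hypothetical and not part of the lab's pipeline; only the info threshold of 0.3 comes from the notes. It assumes an indel is any variant whose ref or alt allele is longer than one base, and that singletons and doubletons are variants with a minor allele count of 1 or 2. The sample-level sex-mismatch exclusion is not covered.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative."""
    rsid: str
    ref: str
    alt: str
    info: float              # imputation info score
    minor_allele_count: int  # copies of the minor allele in the sample
    in_hrc: bool             # True if the variant is on the HRC panel


def passes_min_qc(v: Variant, info_min: float = 0.3) -> bool:
    """Return True if the variant survives the bare-minimum filters."""
    # Assumption: anything other than a single-base ref and alt is an indel.
    is_indel = len(v.ref) != 1 or len(v.alt) != 1
    # Assumption: singleton = MAC of 1, doubleton = MAC of 2.
    is_rare = v.minor_allele_count in (1, 2)
    return v.info >= info_min and not is_indel and not is_rare and v.in_hrc


variants = [
    Variant("rs_a", "A", "G", 0.95, 1200, True),   # kept
    Variant("rs_b", "A", "G", 0.25, 1200, True),   # info score too low
    Variant("rs_c", "AT", "A", 0.95, 1200, True),  # indel
    Variant("rs_d", "C", "T", 0.95, 2, True),      # doubleton
    Variant("rs_e", "C", "T", 0.95, 1200, False),  # not on the HRC panel
]
kept = [v.rsid for v in variants if passes_min_qc(v)]
```

Keeping each filter as a named boolean makes it easy to report which rule removed a variant once gargi circulates the actual parameters.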