This shows you the differences between two versions of the page.
uk_biobank:downloading_the_data [2016/02/19 15:25] lessem Created page with "# The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email. # All of the utilities from the UK Biobank [http://biobank..." |
uk_biobank:downloading_the_data [2016/03/11 07:07] luke /* Quality Control */ |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | - The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email. | + | These procedures were all derived from the [[http:// |
- | - All of the utilities from the UK Biobank http:// | + | |
- | - The key, k1234.key was saved from the PI's email. | + | |
- | - | + | ====== Phenotypic data ====== |
+ | |||
+ | |||
+ | |||
+ | - The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email. | ||
+ | - All of the utilities from the UK Biobank | ||
+ | - The key, k1234.key, was saved from the PI's email. | ||
+ | - | ||
$ ./ | $ ./ | ||
which produced the file ukb1234.enc_ukb | which produced the file ukb1234.enc_ukb | ||
+ | - Once the file was decrypted, the following commands were run to extract the data into useful formats: | ||
+ | |||
+ | $ ./ukb_conv ukb1234.enc_ukb bulk -eencoding.ukb | ||
+ | $ ./ukb_conv ukb1234.enc_ukb docs -eencoding.ukb | ||
+ | $ ./ukb_conv ukb1234.enc_ukb r -eencoding.ukb | ||
+ | |||
+ | |||
+ | - bulk produces a list of IDs for use with the ukbfetch utility. | ||
+ | - docs produces an HTML file containing [[https:// | ||
+ | - r produces a tab-delimited file and an R script that labels the variables and sets their levels. | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ====== Genotypic data ====== | ||
+ | |||
+ | |||
+ | |||
+ | - Genetic data were downloaded following the instructions at [[http:// | ||
+ | - All chromosomes were downloaded with scripted commands such as: | ||
+ | |||
+ | $ seq 1 26 | parallel -j1 ./gfetch cal {} | ||
+ | $ seq 1 26 | parallel -j1 ./gfetch imp {} | ||
+ | |||
+ | - A single sample map (impv1.sample) for the imputed data was also downloaded: | ||
+ | |||
+ | $ ./gfetch imp 1 -m | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ====== Quality Control ====== | ||
+ | |||
+ | |||
+ | |||
+ | - All files can be found on RC at: / | ||
+ | - UKB and Affymetrix performed a number of QC analyses to identify questionable positions and individual samples for exclusion. Additional PDFs from UK Biobank are found within / | ||
+ | - Additional Affymetrix and UKB information can be found on their websites: | ||
+ | [[http:// | ||
+ | - A list of 1068 individuals to exclude is in Exclude_individuals.poorQC.UKB_Affy_sex.id on RC. | ||
+ | - A list of 8010 positions to exclude is in duplicate.positions.excludesnps.txt on RC. | ||
+ | - A README.txt file located on RC contains the steps used and additional information. |
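
The scripted genotype downloads described under "Genotypic data" could also be written as a plain shell loop for machines where GNU parallel is not installed. This is only a minimal sketch and not the script that was actually used. The function name `fetch_all` and the `GFETCH` override variable are hypothetical. The only gfetch arguments used are the data type (`cal` or `imp`) and the chromosome number, exactly as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: run gfetch once per chromosome for one data type.
# GFETCH defaults to ./gfetch; setting GFETCH=echo turns the call into a
# dry run that only prints the arguments that would be passed.
fetch_all() {
    kind=$1    # "cal" (genotype calls) or "imp" (imputed data)
    for chr in $(seq 1 26); do
        # Stop at the first failed download so that a partial set of
        # chromosomes is noticed rather than silently accepted.
        "${GFETCH:-./gfetch}" "$kind" "$chr" || {
            echo "failed: $kind chromosome $chr" >&2
            return 1
        }
    done
}

# Dry run: list the cal downloads without contacting UK Biobank.
GFETCH=echo fetch_all cal
```

Dropping the `GFETCH=echo` prefix runs the real downloads one after another, which corresponds to the `-j1` setting in the parallel commands.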