uk_biobank:downloading_the_data
Current revision: 2020/04/22 17:35 by luev6784 (previous revision: 2016/02/23 18:17 by lessem).
  - The key, k1234.key, was saved from the PI's email.
  - This command was run to decrypt the downloaded phenotype file:
<code>
$ ./
</code>
This produced the file ukb1234.enc_ukb.
  - Once the file was decrypted, the following commands were run to extract the data into usable formats (bulk, docs and r):
<code>
$ ./ukb_conv ukb1234.enc_ukb bulk -eencoding.ukb
$ ./ukb_conv ukb1234.enc_ukb docs -eencoding.ukb
$ ./ukb_conv ukb1234.enc_ukb r -eencoding.ukb
</code>
  - The bulk output is a list of IDs for use with the ukbfetch utility.
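The three conversion commands above differ only in the output format, so they can be wrapped in a small loop. This is a minimal sketch, not the procedure actually used for this page. It assumes that ukb_conv and encoding.ukb are in the current directory. The DRY_RUN switch and the run and convert_all helper names are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Minimal sketch: run ukb_conv once per output format on the decrypted file.
# Assumption: ukb_conv and encoding.ukb are in the current directory.
# DRY_RUN, run() and convert_all() are hypothetical helpers, not part of ukb_conv.
set -eu

# 1 = only print the commands (default); 0 = actually execute them
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same three commands as above, one per output format: bulk, docs and r.
convert_all() {
  local enc_ukb="$1" fmt
  for fmt in bulk docs r; do
    run ./ukb_conv "$enc_ukb" "$fmt" -eencoding.ukb
  done
}

convert_all ukb1234.enc_ukb
```

With the default DRY_RUN=1, the script prints the three ukb_conv command lines, so you can check them before running anything.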
  - Genetic data were downloaded following the instructions at [[http://
  - Scripted downloads of all chromosomes (1 to 26) were done using commands such as:
<code>
$ seq 1 26 | parallel -j1 ./gfetch cal {}
$ seq 1 26 | parallel -j1 ./gfetch imp {}
</code>
  - A single sample map (impv1.sample) for the imputed data was also downloaded:
<code>
$ ./gfetch imp 1 -m
</code>
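Because parallel -j1 runs only one job at a time, the scripted downloads above are effectively sequential. A plain shell loop gives the same result without requiring GNU parallel. The sketch below assumes that gfetch runs from the current directory exactly as in the commands above. The DRY_RUN switch and the run and fetch_all helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: sequential gfetch downloads for chromosomes 1-26,
# equivalent in effect to `seq 1 26 | parallel -j1 ./gfetch <type> {}`.
# DRY_RUN, run() and fetch_all() are hypothetical helpers, not part of gfetch.
set -eu

# 1 = only print the commands (default); 0 = actually execute them
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Download one data type ("cal" or "imp") for every chromosome in turn.
fetch_all() {
  local data_type="$1" chr
  for chr in $(seq 1 26); do
    # With set -e, a failed download (DRY_RUN=0) stops the whole script.
    run ./gfetch "$data_type" "$chr"
  done
}

fetch_all cal
fetch_all imp
# Single sample map for the imputed data, as described above.
run ./gfetch imp 1 -m
```

One behavioral difference from the parallel version is failure handling. With set -e, this loop stops at the first failed download, whereas GNU parallel runs every job and reports the number of failed jobs in its exit status.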

====== Quality Control ======

Using information from the UKB data and the Axiom Array unimputed genotypes, we identified lists of individuals and positions to exclude.
  - A very brief overview of the QC steps can be found in this .pptx file: {{ :
  - All files can be found on RC at: /
  - UKB and Affymetrix performed a number of QC analyses to exclude questionable positions and to identify problematic individual samples. Additional PDFs from UK Biobank are in /
  - Additional Affymetrix and UKB information can be found on their websites: [[http://
  - A list of 1068 individuals to exclude is in Exclude_individuals.poorQC.UKB_Affy_sex.id on RC.
  - A list of 8010 positions to exclude is in duplicate.positions.excludesnps.txt on RC.
  - A README.txt file on RC contains the steps used and additional information.
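The sketch below shows one way the two exclusion lists might be applied, by filtering one genotype fileset with PLINK 1.9 (--bfile, --remove, --exclude, --make-bed, --out). It is not the procedure recorded in README.txt on RC, which remains the reference for the steps actually used. The sketch makes three assumptions:

  - The genotypes are a PLINK .bed/.bim/.fam fileset.
  - Exclude_individuals.poorQC.UKB_Affy_sex.id holds one sample ID per line, and the .fam file uses that ID as both FID and IID.
  - duplicate.positions.excludesnps.txt lists variant IDs matching column 2 of the .bim file. If it lists positions instead, PLINK's --exclude range form would be needed.

The helper names, the DRY_RUN switch and the output naming are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: apply the RC exclusion lists to one PLINK 1.9 binary fileset.
# Assumptions: single-column ID list with FID == IID in the .fam file, and
# one variant ID per line in the SNP list. Helper names are hypothetical.
set -eu

# 1 = only print the plink command (default); 0 = actually execute it
DRY_RUN="${DRY_RUN:-1}"
EXCLUDE_IDS="${EXCLUDE_IDS:-Exclude_individuals.poorQC.UKB_Affy_sex.id}"
EXCLUDE_SNPS="${EXCLUDE_SNPS:-duplicate.positions.excludesnps.txt}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# PLINK's --remove expects two columns (FID IID), so repeat the single ID.
# Blank lines in the input are skipped.
make_remove_file() {
  awk 'NF > 0 { print $1, $1 }' "$1" > "$2"
}

# $1 = input fileset prefix, $2 = output prefix.
# Note: the two-column remove file is written even in dry-run mode.
apply_qc() {
  local prefix="$1" out="$2"
  make_remove_file "$EXCLUDE_IDS" "${out}.remove.txt"
  run plink --bfile "$prefix" \
    --remove "${out}.remove.txt" \
    --exclude "$EXCLUDE_SNPS" \
    --make-bed --out "$out"
}
```

For example, apply_qc ukb_cal_chr1 ukb_cal_chr1.qc would filter a fileset with the hypothetical prefix ukb_cal_chr1. The real prefix depends on how the downloaded files were named.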
uk_biobank/downloading_the_data.1456251436.txt.gz · Last modified: by lessem
