Differences

This shows you the differences between two versions of the page.

--- uk_biobank:downloading_the_data [2016/02/19 22:25] – Created page with "# The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email. # All of the utilities from the UK Biobank [http://biobank..." lessem
+++ uk_biobank:downloading_the_data [2020/04/22 17:35] (current) – luev6784
@@ Line 1: / Line 1: @@
-  -  The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email.
+These procedures were all derived from the [[http://biobank.ctsu.ox.ac.uk/showcase/exinfo.cgi?src=accessing_data_guide|documentation]] at the UK Biobank. This information is here as a record and reference. Researchers should not have to repeat these steps.
-  -  All of the utilities from the UK Biobank http://biobank.ctsu.ox.ac.uk/showcase/download.cgi|download page were retrieved.
-  -  The key, k1234.key was saved from the PI's email.
-  -  These commands were run to decrypt the downloaded phenotype file
+====== Phenotypic data ======
+  - The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email.
+  - All of the utilities from the UK Biobank [[http://biobank.ctsu.ox.ac.uk/showcase/download.cgi|download]] page were retrieved.
+  - The key, k1234.key, was saved from the PI's email.
+  -  This command was run to decrypt the downloaded phenotype file:
 $ ./ukb_unpack ukb1234.enc k1234.key
 which produced the file ukb1234.enc_ukb
+  - Once the file was decrypted, the following commands were run to extract the data into useful formats:
+$ ./ukb_conv ukb1234.enc_ukb bulk -eencoding.ukb
+$ ./ukb_conv ukb1234.enc_ukb docs -eencoding.ukb
+$ ./ukb_conv ukb1234.enc_ukb r -eencoding.ukb
+    - bulk produces a list of IDs for use with the ukbfetch utility
+    - docs produces an HTML file containing [[https://ibg.colorado.edu/~lessem/ukb6395.html|documentation of the variables]] in this dataset
+    - r produces a tab-delimited file and an R script for labeling the variables and assigning their levels.
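The decryption and conversion steps above could be collected into one small script. This is only a sketch: the `run` helper, the `unpack_and_convert` function, and the `DRY_RUN` variable are illustrative assumptions and not part of the UK Biobank utilities. The file names and arguments are the ones used above.

```bash
#!/usr/bin/env bash
# Sketch only: wraps the commands listed above. `run`, `unpack_and_convert`,
# and DRY_RUN are hypothetical helpers, not UK Biobank tooling.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Decrypt the phenotype file, then convert it to each of the three formats.
unpack_and_convert() {
    local enc="$1" key="$2"
    run ./ukb_unpack "$enc" "$key" || return 1
    local fmt
    for fmt in bulk docs r; do
        # ukb_unpack writes its output next to the input with an _ukb suffix.
        run ./ukb_conv "${enc}_ukb" "$fmt" -eencoding.ukb || return 1
    done
}
```

Running `DRY_RUN=1 unpack_and_convert ukb1234.enc k1234.key` prints the four commands without executing them, which is a cheap way to check the arguments before touching the data.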
+====== Genotypic data ======
+  -  Genetic data were downloaded following the instructions at [[http://biobank.ctsu.ox.ac.uk/showcase/exinfo.cgi?src=AccessingGeneticData|the UK Biobank site]].
+  -  Scripted downloads of all chromosomes were done using commands such as:
+$ seq 1 26 | parallel -j1 ./gfetch cal {}
+$ seq 1 26 | parallel -j1 ./gfetch imp {}
+  -  A single sample map (impv1.sample) for the imputed data was also downloaded:
+$ ./gfetch imp 1 -m
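For readers without GNU parallel, the chromosome downloads above can be sketched as a plain shell loop. Because `-j1` runs one job at a time, the loop issues the same `gfetch` calls in the same order. One deliberate difference is that it stops at the first failed download instead of continuing. The function name `fetch_all` and the `GFETCH` override variable are hypothetical, added only for illustration.

```bash
# Sketch only: sequential stand-in for `seq 1 26 | parallel -j1 ./gfetch KIND {}`.
# fetch_all and GFETCH are hypothetical names, not part of the UK Biobank tools.
fetch_all() {
    local kind="$1"                      # "cal" or "imp", as in the commands above
    local gfetch="${GFETCH:-./gfetch}"   # path to the gfetch binary
    local chr
    for chr in $(seq 1 26); do
        # Stop at the first failed download so it can be retried.
        "$gfetch" "$kind" "$chr" || {
            echo "gfetch failed for $kind chromosome $chr" >&2
            return 1
        }
    done
}
```

Calling `fetch_all cal` followed by `fetch_all imp` would then mirror the two commands listed above.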
+====== Quality Control ======
+Using information in the UKB data and in the unimputed Axiom Array genotypes, we identified lists of individuals and positions to exclude.
+  - A very brief overview of the QC steps can be found in this .pptx file: {{ :uk_biobank:2020_04_22_ukb_qc.pptx |}}.
+  -  All files can be found on RC at: /work/KellerLab/UKBiobank/genetics/raw/Quality_Control
+  -  UKB and Affymetrix performed a number of QC analyses to exclude questionable positions and identify individual samples. Additional PDFs from the UK Biobank are found within /work/KellerLab/UKBiobank/genetics/raw/Quality_Control/UK_Biobank_Axiom_Array
+  -  Additional Affymetrix and UKB information can be found on their websites:
+        [[http://www.ukbiobank.ac.uk/scientists-3/uk-biobank-axiom-array/|UK Biobank Axiom Array]], [[https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS|UKB-Genetics Archive]]
+  -  A list of 1068 individuals to exclude is in Exclude_individuals.poorQC.UKB_Affy_sex.id on RC.
+  -  A list of 8010 positions to exclude is in duplicate.positions.excludesnps.txt on RC.
+  -  A README.txt file located on RC contains the steps used and additional information.
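A quick sanity check on the two exclusion lists is to confirm that their entry counts match the numbers quoted above. The sketch below assumes one entry per line with no header row, which is not confirmed here (README.txt on RC is the authority on the file layout). `check_count` is a hypothetical helper.

```bash
# Sketch only: check_count is a hypothetical helper. It assumes one entry per
# line and no header row, which may not match the real file layout.
check_count() {
    local file="$1" expected="$2"
    local actual
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    # Count lines that contain at least one non-blank character.
    actual=$(grep -c '[^[:space:]]' "$file") || true
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $file has $actual entries"
    else
        echo "MISMATCH: $file has $actual entries, expected $expected" >&2
        return 1
    fi
}
```

Run from the Quality_Control directory on RC, the checks would be `check_count Exclude_individuals.poorQC.UKB_Affy_sex.id 1068` and `check_count duplicate.positions.excludesnps.txt 8010`.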