Today we will cover ways to model the genetic covariance, and correlation, between two or more traits.
This hour will cover:
\[cov_{x,y} = \frac{\sum_{i = 1}^{n}{(x_i-\color{blue}{\bar{x}})*(y_i-\color{red}{\bar{y}})}}{n-1}\]
\[var_{x} = \frac{\sum_{i = 1}^{n}{(x_i-\color{blue}{\bar{x}})*(x_i-\color{red}{\bar{x}})}}{n-1}\]
\[var_{x} = \frac{\sum_{i = 1}^{n}{(x_i-\color{blue}{\bar{x}})^2}}{n-1}\]
\[cor_{x,y} = {\frac{cov_{x,y}}{\sqrt{var_x * var_y}}}\]
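The three formulas above can be checked by hand on a tiny example. This is a minimal sketch in plain Python; the function names and the toy data are made up for illustration.

```python
import math

def sample_cov(x, y):
    """Sample covariance: summed cross-products of deviations, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def sample_var(x):
    """Sample variance, i.e. the covariance of x with itself."""
    return sample_cov(x, x)

def sample_cor(x, y):
    """Correlation: covariance scaled by the square root of both variances."""
    return sample_cov(x, y) / math.sqrt(sample_var(x) * sample_var(y))

# Made-up toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

print(sample_cov(x, y))  # 2.0
print(sample_var(x))     # 2.5
print(sample_cor(x, y))  # 0.8
```

Writing the variance as `sample_cov(x, x)` mirrors the slide: the variance is the covariance of a variable with itself.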
\[p1 = a1 + c1 + e1\]
\[p2 = a2 + c2 + e2\]
\[r_g = cor(a1,a2)\]
\[p1_i = \sum_{j = 1}^{m}{(b1_j*snp_{ij})} + e1_i\]
\[p2_i = \sum_{j = 1}^{m}{(b2_j*snp_{ij})} + e2_i\]
\[r_g = cor(b1,b2)\]
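To make the SNP-effect definition concrete, here is a small simulation sketch. All settings are assumptions chosen for illustration: independent SNPs with the same allele frequency, normally distributed effects, and a target correlation `rho` of 0.5 between the two sets of effects. Only the genetic part of each phenotype (the sum over SNPs) is simulated; the residual terms are left out.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_people=1000, n_snps=300, rho=0.5, freq=0.5, seed=1):
    rng = random.Random(seed)
    # SNP effects on trait 1 and trait 2, drawn so that cor(b1, b2) is about rho.
    b1 = [rng.gauss(0, 1) for _ in range(n_snps)]
    b2 = [rho * b + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for b in b1]
    g1, g2 = [], []
    for _ in range(n_people):
        # Genotype: number of effect alleles (0, 1 or 2) at each SNP.
        snps = [(rng.random() < freq) + (rng.random() < freq) for _ in range(n_snps)]
        # Genetic value: effects times genotypes, summed over SNPs.
        g1.append(sum(b * s for b, s in zip(b1, snps)))
        g2.append(sum(b * s for b, s in zip(b2, snps)))
    return pearson(b1, b2), pearson(g1, g2)

r_effects, r_genetic_values = simulate()
print(f"cor(b1, b2) across SNPs:             {r_effects:.2f}")
print(f"cor of genetic values across people: {r_genetic_values:.2f}")
```

Under these simplifying assumptions the two numbers should land close to each other and to `rho`. With real data, LD between SNPs and unequal allele frequencies make the link less direct.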
“Are risk for depression and BMI genetically correlated?”
“We will apply LD score regression to two sets of GWAS summary data from two different consortia that studied BMI and MDD.”
“The estimate of the genetic correlation between the PGC MDD and GIANT BMI GWASs, using LD score regression, is 0.09.”
An estimand is a quantity that is to be estimated in a statistical analysis. The term is used to distinguish the target of inference (estimand) from the method used to obtain an approximation of this target (i.e., the estimator) and the specific value obtained from a given method and dataset (i.e., the estimate).
flowchart LR
  A(common cause) --> D(Estimand correlation)
  B(BMI -> Dep) --> D
  C(Dep -> BMI) --> D
  D --> E[estimator]
  E --> H[Estimate correlation]
  F[sampling] --> H[Estimate correlation]
  G[measurement] --> H[Estimate correlation]

flowchart LR
  B(BMI -> Dep) --> D(Estimand)
  A(common cause) --> H[Estimate correlation]
  C(Dep -> BMI) --> H
  D --> E[estimator]
  E --> H
  F[sampling] --> H
  G[measurement] --> H

flowchart LR
  D(Estimand correlation) --> G[Estimate]
  E[sampling] --> G[Estimate]
  F[measurement] --> G[Estimate]
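The last diagram says that the estimate depends on the estimand plus sampling and measurement. A minimal simulation sketch of that idea follows. The estimand is a true correlation of 0.5; the noise level, sample sizes, and function names are assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """The estimator: Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_estimates(n=20000, rho=0.5, noise_sd=1.0, seed=2):
    rng = random.Random(seed)
    true_x, true_y, obs_x, obs_y = [], [], [], []
    for _ in range(n):
        # True scores with correlation rho (the estimand).
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        true_x.append(x)
        true_y.append(y)
        # Observed scores: true score plus independent measurement error.
        obs_x.append(x + rng.gauss(0, noise_sd))
        obs_y.append(y + rng.gauss(0, noise_sd))
    r_large = pearson(true_x, true_y)            # large sample, no measurement error
    r_small = pearson(true_x[:50], true_y[:50])  # small sample: more sampling noise
    r_noisy = pearson(obs_x, obs_y)              # large sample, noisy measurement
    return r_large, r_small, r_noisy

r_large, r_small, r_noisy = simulate_estimates()
print(f"large sample, clean measures: {r_large:.2f}")
print(f"first 50 people only:         {r_small:.2f}")
print(f"large sample, noisy measures: {r_noisy:.2f}")
```

With error variance equal to the true-score variance, each measure has a reliability of 0.5, so the noisy estimate is expected near 0.5 * 0.5 = 0.25. The estimand did not change; only the estimate did.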
There are some very specific causes of correlation we need to discuss:
Over-sample cases in a schizophrenia GWAS (because it's rare)
Target a study at specific populations with specific health needs
You will need to adjust your estimator of \(h^2\)!!
Participants whose socioeconomic position is fragile might not have the time to spare for a day-long lab study at a location with poor public transport access
Elderly people might only respond to email if they are (1) online and (2) able to
Level of institutional trust may influence people’s willingness to consent
Why would I care?
It will bias all(!!) statistical estimates and inference
There is a long causal chain between population and sample
If: outcome1 -> ascertainment & outcome2 -> ascertainment
then in the ascertained sample outcome1 and outcome2 will correlate, even if they are unrelated in the population!
Why do people feel their more attractive partners were also more toxic?
Maybe it's true? (maybe it affects my estimand)
Or is it collider bias? (or it affects my estimate)
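Collider bias is easy to see in a simulation. In this sketch two outcomes are independent in the population, but both raise the chance of being ascertained. The selection rule, threshold, and names are assumptions made up for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_ascertainment(n=50000, threshold=1.0, seed=3):
    rng = random.Random(seed)
    population, sample = [], []
    for _ in range(n):
        outcome1 = rng.gauss(0, 1)
        outcome2 = rng.gauss(0, 1)  # drawn independently of outcome1
        population.append((outcome1, outcome2))
        # Both outcomes, plus some chance, push a person into the study.
        if outcome1 + outcome2 + rng.gauss(0, 1) > threshold:
            sample.append((outcome1, outcome2))
    r_population = pearson(*zip(*population))
    r_sample = pearson(*zip(*sample))
    return r_population, r_sample, len(sample)

r_population, r_sample, n_sample = simulate_ascertainment()
print(f"population (n = 50000):  r = {r_population:.2f}")
print(f"ascertained (n = {n_sample}): r = {r_sample:.2f}")
```

With this selection rule the correlation in the ascertained sample comes out negative: among people who made it into the study, a low value on one outcome has to be compensated by a high value on the other.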
(latent) common cause
causal relation between two traits
flowchart TB
  D(SNP) --> G[asthma]
  D(SNP) --> H[stress]

flowchart TB
  D(SNP) --> E[smoking]
  E --> G[asthma]
  E --> H[stress]

flowchart LR
  D(SNP) --> E[lung cancer]

flowchart LR
  C(SNP) --> D[smoking] --> E[lung cancer]
\[p1 = a1 + c1 + e1\]
\[p2 = a2 + c2 + e2\]
\[r_g = cor(a1,a2)\]
\[Vp1_{mz} = Va1 + Vc1 + Ve1\]
\[cov(p1_{mz1},p1_{mz2}) = Va1 + Vc1\]
\[Vp1 = Va1 + Vc1 + Ve1\]
\[Vp2 = Va2 + Vc2 + Ve2\]
\[cov(p1_{mz1},p2_{mz2}) = cov(a1,a2) + cov(c1,c2)\]
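The MZ cross-twin cross-trait covariance on its own cannot separate the genetic from the shared-environment covariance. A sketch of how DZ pairs can help, under an assumption not stated above: the classical twin model, in which DZ twins share on average half of the additive genetic effects and all of the shared environment.

\[cov(p1_{dz1},p2_{dz2}) = \frac{1}{2}cov(a1,a2) + cov(c1,c2)\]
\[cov(a1,a2) = 2*(cov(p1_{mz1},p2_{mz2}) - cov(p1_{dz1},p2_{dz2}))\]
\[Va1 = 2*(cov(p1_{mz1},p1_{mz2}) - cov(p1_{dz1},p1_{dz2}))\]
\[r_g = \frac{cov(a1,a2)}{\sqrt{Va1 * Va2}}\]

Va2 follows from the same MZ minus DZ difference for trait 2.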
\[p1_i = \sum_{j = 1}^{m}{(b1_j*snp_{ij})} + e1_i\]
\[p2_i = \sum_{j = 1}^{m}{(b2_j*snp_{ij})} + e2_i\]
\[r_g = cor(b1,b2)\]

## genetic correlations
flowchart TB
  D(latent_variable) --> E[Depression]
  D --> G[Anxiety]
  D --> H[PTSD]

flowchart TB
  A(A) --> D(latent_variable)
  B(E) --> D(latent_variable)
  D --> E[Depression]
  D --> G[Anxiety]
  D --> H[PTSD]
Or…
flowchart TB
  D(E) --> F[Depression]
  D --> G[Anxiety]
  D --> H[PTSD]
  E(A) --> F[Depression]
  E --> G[Anxiety]
  E --> H[PTSD]