genetic-correlations-in-ldsc

Michel Nivard

Genetic correlation in LD score regression

This lecture:

The LDSC genetic correlation estimator
Making sure the LD scores you use are appropriate for the cohort/GWAS you have
The Practical!

Get the data for the practical

Let’s start by grabbing the data; it’s a bit more data than in some of the other practicals.

Code is in the forum (“ISG Workshop logistics”) and in the shared files.

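# Create a working directory (-p: no error if it already exists)
system("mkdir -p $HOME/michel")
# Copy the practical files into it and move there
system("cp -R /faculty/michel/2024/practical* $HOME/michel")
setwd("~/michel/practical")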

Genetic correlation in LD score regression

Univariate LDSC:

\[Z_{j}^2 = \frac{ N * h^2_{snp} }{M}*l_j + 1 + Na\]

Bivariate LDSC:

\[Z_{1j} *Z_{2j} = \frac{ \sqrt{N_{1} N_{2}} * cov_g }{M}*l_j + \frac{cor_p * N_s}{\sqrt{N_{1} N_{2}}} + \sqrt{N_{1} N_{2}}*cor(a_1,a_2)\]

Bulik-Sullivan, B., Finucane, H. K., Anttila, V., Gusev, A., Day, F. R., Loh, P. R., … & Neale, B. M. (2015). An atlas of genetic correlations across human diseases and traits. Nature Genetics, 47(11), 1236-1241.

LDSC as a regression

Bivariate LDSC:

\[Z_{1j} *Z_{2j} = \frac{ \sqrt{N_{1} N_{2}} * cov_g }{M}*l_j + \frac{cor_p * N_s}{\sqrt{N_{1} N_{2}}} + \sqrt{N_{1} N_{2}}*cor(a_1,a_2)\]

It’s just a regression:

\[y_j = b*l_j + a\]
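To make the "it's just a regression" point concrete, here is a minimal simulation sketch. Every number in it (M, N1, N2, the heritabilities, cov_g, and the gamma-distributed toy LD scores) is a hypothetical value chosen for illustration, and it assumes no sample overlap and no confounding. It uses plain `lm()`; real LDSC uses a weighted regression and a block jackknife for standard errors, which this sketch leaves out.

```r
set.seed(2024)

# All values below are hypothetical, for illustration only
M     <- 50000    # number of SNPs
N1    <- 50000    # sample size of GWAS 1
N2    <- 40000    # sample size of GWAS 2
h2_1  <- 0.3      # SNP heritability of trait 1
h2_2  <- 0.4      # SNP heritability of trait 2
cov_g <- 0.15     # true genetic covariance
l     <- rgamma(M, shape = 2, scale = 50)  # toy LD scores (mean 100)

# Expected variances and covariance of the Z statistics under the
# univariate and bivariate LDSC equations (no overlap, no confounding)
v1  <- N1 * h2_1 / M * l + 1
v2  <- N2 * h2_2 / M * l + 1
c12 <- sqrt(N1 * N2) * cov_g / M * l

# Draw a correlated pair (Z1, Z2) for every SNP
e1 <- rnorm(M)
e2 <- rnorm(M)
Z1 <- sqrt(v1) * e1
Z2 <- c12 / sqrt(v1) * e1 + sqrt(v2 - c12^2 / v1) * e2

# The bivariate LDSC regression: y_j = b * l_j + a
fit <- lm(I(Z1 * Z2) ~ l)
b   <- unname(coef(fit)["l"])
a   <- unname(coef(fit)["(Intercept)"])

# Convert the slope back to a genetic covariance
cov_g_hat <- b * M / sqrt(N1 * N2)
cat("slope:", b, " intercept:", a, " estimated cov_g:", cov_g_hat, "\n")
```

The estimated cov_g should land close to the simulated 0.15, and the intercept close to 0 because there is no overlap in this toy setup.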

Data

GWAS sumstats for traits 1 and 2 for SNP j:

\[Z_{1j} *Z_{2j}\]

Audience Q: “What is this Z?”

The LD score, a per-SNP measure of the size of the part of the genome tagged by SNP j:

\[l_j\]
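As one answer to the audience question: Z is the per-SNP test statistic from the GWAS, the effect estimate divided by its standard error. A small sketch with made-up summary statistics (the SNP names, column names, and values are all hypothetical) shows how the outcome of the bivariate regression is built. It assumes both GWASs are already aligned to the same effect allele; if they are not, the sign of the product is wrong.

```r
# Hypothetical summary statistics for three SNPs in two GWASs,
# assumed to be aligned to the same effect allele
sumstats <- data.frame(
  SNP   = c("rs1", "rs2", "rs3"),
  beta1 = c(0.02, -0.01, 0.03), se1 = c(0.01, 0.01, 0.01),
  beta2 = c(0.04, 0.01, -0.02), se2 = c(0.02, 0.01, 0.01)
)

# Z statistic per trait: effect estimate divided by its standard error
sumstats$Z1 <- sumstats$beta1 / sumstats$se1
sumstats$Z2 <- sumstats$beta2 / sumstats$se2

# Outcome of the bivariate LDSC regression for SNP j
sumstats$Z1Z2 <- sumstats$Z1 * sumstats$Z2
print(sumstats)
```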

Parameters

\[ b = \frac{\sqrt{N_{1} N_{2}} * cov_g}{M} = slope\]

\[ a = \frac{cor_p * N_s}{\sqrt{N_{1} N_{2}}} + \sqrt{N_{1} N_{2}}*cor(a_1,a_2) = intercept\]
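To go from the slope to a genetic correlation, rearrange the slope for the genetic covariance and scale it by the two SNP heritabilities from the univariate regressions (subscripts 1 and 2 on h² are added here to index the traits):

\[ cov_g = \frac{b * M}{\sqrt{N_{1} N_{2}}}, \qquad r_g = \frac{cov_g}{\sqrt{h^2_{snp,1} * h^2_{snp,2}}} \]

A worked example with hypothetical numbers, b = 0.02, M = 1,000,000, N1 = N2 = 100,000, and SNP heritabilities of 0.5 and 0.32:

\[ cov_g = \frac{0.02 * 10^6}{\sqrt{10^5 * 10^5}} = 0.2, \qquad r_g = \frac{0.2}{\sqrt{0.5 * 0.32}} = \frac{0.2}{0.4} = 0.5 \]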

Works for GWASs that do not overlap! (Then N_s = 0 and the sample-overlap term drops out of the intercept.)

BUT!

It’s a regression where the “outcome” (GWAS results) and the “predictor” (LD score) come from different sources…

That’s really weird… and it puts a little bit of responsibility on the user to ensure the variables are from a common population…

A look at variation in the LD map

Salehi Nowbandegani, P., Wohns, A. W., Ballard, J. L., Lander, E. S., Bloemendal, A., Neale, B. M., & O’Connor, L. J. (2023). Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nature Genetics, 55(9), 1494-1502.

So we want the ancestry of the LD scores to match the ancestry of the GWAS sample

Use ancestry specific LD scores, for example:

gnomAD: https://gnomad.broadinstitute.org/downloads#v2-linkage-disequilibrium
UK Biobank pan-ancestry: https://pan-dev.ukbb.broadinstitute.org/downloads

What if I still don’t really think they’re appropriate for my sample?

Make your own LD scores: https://github.com/bulik/ldsc/wiki/LD-Score-Estimation-Tutorial
(You need sequencing data!)
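As a rough sketch of what the linked tutorial does, the call below estimates LD scores from PLINK-format genotypes with a 1 cM window. The flags (`--bfile`, `--l2`, `--ld-wind-cm`, `--out`) are the ones used in that tutorial; the file prefix `my_cohort_chr22` is a hypothetical placeholder, and the sketch assumes `ldsc.py` sits in the working directory. It only runs the command when those files are actually present, and otherwise just prints it.

```r
# Hypothetical PLINK prefix (expects my_cohort_chr22.bed/.bim/.fam)
bfile <- "my_cohort_chr22"

# Build the ldsc.py call: --l2 requests LD scores,
# --ld-wind-cm 1 sums r^2 over a 1 centimorgan window
cmd <- paste(
  "python ldsc.py",
  "--bfile", bfile,
  "--l2",
  "--ld-wind-cm 1",
  "--out", bfile
)

# Only run when the software and the data are present; otherwise show the command
if (file.exists("ldsc.py") && file.exists(paste0(bfile, ".bed"))) {
  system(cmd)
} else {
  message("Would run: ", cmd)
}
```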

Can we use LDSC in admixed samples?

Many people have multiple ancestries

How can we use cov-LDSC in admixed samples?

Use “covariate adjusted” LD scores in admixed samples.
cov-LDSC software: https://github.com/immunogenomics/cov-ldsc

What if you want to compute the rg between GWASs in different ancestries?

There is a tool called Popcorn that models the differences in LD and allele frequency (AF) and estimates the rg at the same time
https://github.com/brielin/Popcorn

What if you want to compute the rg between GWASs in different ancestries?

BUT it obviously also pulls in all cultural, cohort, and measurement differences between your GWASs!
Solution: compute the rg between traits within admixed individuals, correlating the genetic effects between their two ancestries.
Accessible explanation: https://www.nature.com/articles/s41588-023-01325-x
Original paper: https://www.nature.com/articles/s41588-023-01338-6
Software: https://github.com/KangchengHou/admix-kit

Popcorn vs Admix-kit

Let’s look at the practical for today!

We have GWAS data and LD scores for MDD and BMI in people of European and East Asian ancestry
What is YOUR estimand? (what do YOU want to know?)
We estimate the rg’s using LDSC (estimator)
We see whether we can learn about your estimand from our estimates