Slack Export - #day04-heritability-and-gcta

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-04-30 09:45:30

@Jeff Lessem (he/him) has joined the channel

Loic Yengo (l.yengo@uq.edu.au)

2021-04-30 09:48:29

@Loic Yengo has joined the channel

Valentin Hivert (v.hivert@imb.uq.edu.au)

2021-04-30 09:48:29

@Valentin Hivert has joined the channel

Julia (j.sidorenko@imb.uq.edu.au)

2021-04-30 09:48:30

@Julia has joined the channel

Yuna Zhang (yuanxiang.zhang@uq.edu.au)

2021-04-30 09:48:30

@Yuna Zhang has joined the channel

Zhihong Zhu (z.zhu@econ.au.dk)

2021-04-30 09:48:30

@Zhihong Zhu has joined the channel

Steven Gazal (gazal@usc.edu)

2021-04-30 09:49:48

@Steven Gazal has joined the channel

Gunn-Helen Moen (g.moen@uq.edu.au)

2021-05-03 13:13:42

@Gunn-Helen Moen has joined the channel

Mark Adams (mark.adams@ed.ac.uk)

2021-05-04 02:16:46

@Mark Adams has joined the channel

P Wainschtein (p.wainschtein@imb.uq.edu.au)

2021-05-06 01:01:58

@P Wainschtein has joined the channel

Test Student (test-student@ibg.colorado.edu)

2021-05-06 11:38:58

@Test Student has joined the channel

Bridget Joyner (bnj13@my.fsu.edu)

2021-05-10 13:01:19

@Bridget Joyner has joined the channel

Sally Kuo (ickuo@vcu.edu)

2021-05-10 13:30:20

@Sally Kuo has joined the channel

Aislinn Bowler (aislinnbowler@gmail.com)

2021-05-10 13:30:27

@Aislinn Bowler has joined the channel

Morgan Driver (driverm@vcu.edu)

2021-05-10 13:31:04

@Morgan Driver has joined the channel

Sarah Brislin (she/her) (sarah.brislin@gmail.com)

2021-05-10 13:31:37

@Sarah Brislin (she/her) has joined the channel

Lisa Dinkler (lisa.dinkler@gu.se)

2021-05-10 13:31:43

@Lisa Dinkler has joined the channel

Katie Bountress (kaitlin.bountress@vcuhealth.org)

2021-05-10 13:32:21

@Katie Bountress has joined the channel

Peter Tanksley (peter.tanksley@austin.utexas.edu)

2021-05-10 13:32:32

@Peter Tanksley has joined the channel

Tong Chen (tuc548@psu.edu)

2021-05-10 13:34:05

@Tong Chen has joined the channel

Charlotte Viktorsson (viktorsson.charlotte@gmail.com)

2021-05-10 13:34:34

@Charlotte Viktorsson has joined the channel

Jacob Kunkel (kunke104@umn.edu)

2021-05-10 13:35:32

@Jacob Kunkel has joined the channel

Matthieu de Hemptinne (matthieu.dehemptinne@gmail.com)

2021-05-10 13:36:01

@Matthieu de Hemptinne has joined the channel

Jay Ross (jay.ross@mail.mcgill.ca)

2021-05-10 13:38:34

@Jay Ross has joined the channel

Sam Freis (she/her) (Samantha.Freis@colorado.edu)

2021-05-10 13:38:42

@Sam Freis (she/her) has joined the channel

Jeremy Elman (jaelman@health.ucsd.edu)

2021-05-10 13:38:56

@Jeremy Elman has joined the channel

Spencer Moore (spmo3925@colorado.edu)

2021-05-10 13:39:53

@Spencer Moore has joined the channel

Maizy Brasher (mabr7162@colorado.edu)

2021-05-10 13:39:53

@Maizy Brasher has joined the channel

Jenny Phan (jphan5@wisc.edu)

2021-05-10 13:39:58

@Jenny Phan has joined the channel

Meng Huang (meng.huang.cn@gmail.com)

2021-05-10 13:41:18

@Meng Huang has joined the channel

Jung Chen (jchen378@ucmerced.edu)

2021-05-10 13:41:58

@Jung Chen has joined the channel

Stephanie Zellers (she/her/hers) (zelle063@umn.edu)

2021-05-10 13:42:17

@Stephanie Zellers (she/her/hers) has joined the channel

Grace Wu (yakew@email.unc.edu)

2021-05-10 13:42:32

@Grace Wu has joined the channel

Gladi Thng (s2124928@ed.ac.uk)

2021-05-10 13:43:47

@Gladi Thng has joined the channel

Zoe Schmilovich (zoe.schmilovich@mail.mcgill.ca)

2021-05-10 13:43:51

@Zoe Schmilovich has joined the channel

Olivia Rennie (olivia.rennie@alum.utoronto.ca)

2021-05-10 13:43:58

@Olivia Rennie has joined the channel

Christina Sheerin (Christina.sheerin@vcuhealth.org)

2021-05-10 13:43:59

@Christina Sheerin has joined the channel

William McAuliffe (williamhbmcauliffe@gmail.com)

2021-05-10 13:44:17

@William McAuliffe has joined the channel

Chloe Myers (cmyer011@ucr.edu)

2021-05-10 13:44:20

@Chloe Myers has joined the channel

Francis Vergunst (he/him) (francis.vergunst@umontreal.ca)

2021-05-10 13:44:33

@Francis Vergunst (he/him) has joined the channel

Ravi Bhatt (ravibot93@gmail.com)

2021-05-10 13:44:48

@Ravi Bhatt has joined the channel

Nathan Bell (n.y.bell@student.vu.nl)

2021-05-10 14:46:30

@Nathan Bell has joined the channel

Emil Uffelmann (e.uffelmann@vu.nl)

2021-05-10 14:46:48

@Emil Uffelmann has joined the channel

Kristen Kelly (k.m.kelly@vu.nl)

2021-05-10 14:47:50

@Kristen Kelly has joined the channel

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-06-08 15:24:36

@Jeff Lessem (he/him) has renamed the channel from "heritability-and-gcta" to "day04-heritability-and-gcta"

Loic Yengo (l.yengo@uq.edu.au)

2021-06-08 16:24:37

GCTA/LDSC practical can be found here: /home/loic/2021/PracticalGuidelines/Day4practical_Boulder2021.**

Loic Yengo (l.yengo@uq.edu.au)

2021-06-08 16:25:03

I'll send a reminder tomorrow.

Sarah Medland (she/her) (sarahme@qimr.edu.au)

2021-06-08 16:25:39

Correction /faculty/loic/2021/PracticalGuidelines/Day4practical_Boulder2021.**

Loic Yengo (l.yengo@uq.edu.au)

2021-06-08 16:31:46

Thanks Sarah!

Loic Yengo (l.yengo@uq.edu.au)

2021-06-08 16:34:54

Can be found here too: https://docs.google.com/document/d/1OZilaO1GhV2vz5iAm4wl-TyiV2LxezBzhecBXxpHmKc/edit?usp=sharing

Day4 - Heritability (GCTA + LDSC) Practical - Boulder Workshop

Jason Freeman (jfreeman@towson.edu)

2021-06-09 12:45:43

In one video you mentioned there are various ways to measure GRM. Do these various means of measuring GRM lead to different heritability estimates? If so, how do we know which measures of heritability are most accurate?

Loic Yengo (l.yengo@uq.edu.au)

2021-06-09 17:16:53

*Thread Reply:* Hi Jason, this is a great question. The answer is "Yes": how you measure genetic relatedness (the GRM) affects your heritability estimates. Unfortunately, the truth depends on things that we don't know or haven't observed, such as 1) the causal variants and 2) the relationship between SNP effects and allele frequencies. 1) and 2) are often referred to as the "genetic architecture" of the trait or disease. So how do we know which one is accurate? Well, methods such as LDMS (MAF- and LD-stratified) provide a way to get unbiased estimates (Evans et al. 2018; PubMed ID = 29700474).

👍 Mark Adams
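As a rough illustration of the stratification idea behind LDMS only, the sketch below labels each SNP by its LD-score quartile and MAF bin. Each resulting group would then get its own GRM, and the GRMs would be fitted jointly (in GCTA, via the `--mgrm` option). Assumptions: the MAF bin edges, the toy MAF and LD-score values, and the function name `ldms_groups` are hypothetical; the published method uses segment-based LD scores, which are not computed here.

```python
import numpy as np

# Illustrative MAF bin edges (hypothetical, not taken from GCTA's documentation)
MAF_EDGES = [0.01, 0.1, 0.2, 0.3, 0.4]

def ldms_groups(maf, ld_score, maf_edges=MAF_EDGES):
    """Label each SNP with an (LD-score quartile, MAF bin) pair."""
    maf = np.asarray(maf, dtype=float)
    ld_score = np.asarray(ld_score, dtype=float)
    cuts = np.quantile(ld_score, [0.25, 0.5, 0.75])
    ld_q = np.digitize(ld_score, cuts)       # 0..3: LD-score quartile
    maf_bin = np.digitize(maf, maf_edges)    # 0..len(maf_edges): MAF bin
    return ld_q, maf_bin

rng = np.random.default_rng(0)
maf = rng.uniform(0.001, 0.5, 10_000)        # toy MAFs
ld = rng.gamma(2.0, 20.0, 10_000)            # toy LD scores
ld_q, maf_bin = ldms_groups(maf, ld)
groups = ld_q * 10 + maf_bin                 # one integer label per SNP group
print(len(np.unique(groups)), "SNP groups, one GRM each")
```

Splitting by LD quartile and MAF lets each group carry its own variance component, so the total estimate is less sensitive to how per-SNP heritability varies with MAF and LD.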

Jason Freeman (jfreeman@towson.edu)

2021-06-09 20:19:05

*Thread Reply:* Thanks!

Jason Freeman (jfreeman@towson.edu)

2021-06-09 14:42:47

Other than increased computational efficiency, are there any other practical reasons to choose GREML over Haseman-Elston regression for estimating heritability? And are the heritabilities largely identical using both methods?

Loic Yengo (l.yengo@uq.edu.au)

2021-06-09 17:21:32

*Thread Reply:* Another great question! HE and GREML estimates are largely consistent in general. Differences can occur when the sample size is not large enough or when the trait is not normally distributed. If possible, running both can teach you something interesting about your data.

Jason Freeman (jfreeman@towson.edu)

2021-06-09 20:19:16

*Thread Reply:* Thanks again!

👍 Loic Yengo

matthew keller (matthew.c.keller@gmail.com)

2021-06-09 21:18:24

*Thread Reply:* If there is assortative mating on the trait (leading to long-range gametic disequilibrium), HE regression and GREML behave quite differently. Both are upwardly biased for realistic sample sizes (e.g., n < 100K), but GREML estimates go down as a function of n, asymptoting at h2_time0 as n -> inf, whereas HE estimates stay consistently high. See this preprint: https://www.biorxiv.org/content/10.1101/2021.03.18.436091v1.full

bioRxiv

Assortative Mating Biases Marker-based Heritability Estimators

Many complex traits are subject to assortative mating (AM), with recent molecular genetic findings confirming longstanding theoretical predictions that AM alters genetic architecture by inducing long range dependence across causal variants. However, all marker-based heritability estimators assume mating is random. We provide mathematical and simulation-based evidence demonstrating that both method-of-moments estimators and likelihood-based estimators produce biased estimates in the presence of AM and that common approaches to account for population structure fail to mitigate this bias. Then, examining height and educational attainment in the UK Biobank, we demonstrate that these biases affect real world traits. Finally, we derive corrected heritability estimators for traits under equilibrium AM.

Original URL: https://www.biorxiv.org/content/10.1101/2021.03.18.436091v1.full

😍 Anna Furtjes

Alex Bloemendal (he/him) (bloem@broadinstitute.org)

2021-06-10 08:27:40

*Thread Reply:* Also, GREML is downwardly biased when applied to (ascertained) case-control data: https://www.pnas.org/content/111/49/E5272.short

PNAS

Measuring missing heritability: Inferring the contribution of common variants

Studies have identified thousands of common genetic variants associated with hundreds of diseases. Yet, these common variants typically account for a minority of the heritability, a problem known as “missing heritability.” Geneticists recently proposed indirect methods for estimating the total heritability attributable to common variants, including those whose effects are too small to allow identification in current studies. Here, we show that these methods seriously underestimate the true heritability when applied to case–control studies of disease. We describe a method that provides unbiased estimates. Applying it to six diseases, we estimate that common variants explain an average of 60% of the heritability for these diseases. The framework also may be applied to case–control studies, extreme-phenotype studies, and other settings.

Original URL: https://www.pnas.org/content/111/49/E5272.short

Alex Bloemendal (he/him) (bloem@broadinstitute.org)

2021-06-10 08:32:30

*Thread Reply:* This followup is interesting too: https://www.cell.com/ajhg/fulltext/S0002-9297(18)30195-2
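For context on case-control data: GREML estimates from case-control samples are usually converted from the observed 0/1 scale to the liability scale with the Lee et al. (2011) transformation sketched below (GCTA applies it when given `--prevalence`). As I understand the linked papers, the downward bias they describe affects ascertained samples even after this conversion. A minimal sketch, where `observed_to_liability` is a hypothetical name:

```python
from statistics import NormalDist

def observed_to_liability(h2_obs, K, P):
    """Convert observed-scale h2 to the liability scale (Lee et al. 2011).

    K: population prevalence of the disease.
    P: proportion of cases in the (ascertained) sample.
    """
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1 - K)   # liability threshold
    z = std_normal.pdf(t)           # standard normal density at the threshold
    return h2_obs * K**2 * (1 - K)**2 / (z**2 * P * (1 - P))

# e.g. a disease with 1% prevalence studied in a 50:50 case-control sample
print(round(observed_to_liability(0.30, K=0.01, P=0.5), 3))
```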

Penelope Lind (penelope.lind@qimrberghofer.edu.au)

2021-06-09 20:16:41

@Loic Yengo I was trying to go through the Part 2: LD score regression tutorial and I don't have read permissions for the files in LDSCREF/baselineLDv2.2/ (I wanted to look at a chromosome 1 log file for the 11-b2 question). Also, the 11-c command fails for me. I'm not sure if this is related to the file permissions?

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-06-09 20:17:28

*Thread Reply:* Yeah, I broke it, but I'm fixing it now

Penelope Lind (penelope.lind@qimrberghofer.edu.au)

2021-06-09 20:21:23

*Thread Reply:* I just realised 11-c had /data/ in the path which I removed and it's working now. I should have paid more attention!

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-06-09 20:22:20

*Thread Reply:* That is what we want: the students should be reading the files from /data, but writing to their own directory

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-06-09 20:23:10

*Thread Reply:* The files are very large, so we need to avoid having all of the students copy them locally. I did put in some symlinks, so it might work with the default paths

🙂 Lucía Colodro-Conde

Penelope Lind (penelope.lind@qimrberghofer.edu.au)

2021-06-09 20:25:55

*Thread Reply:* That makes sense now. It now works with the /data path. Thank you.

Loic Yengo (l.yengo@uq.edu.au)

2021-06-09 21:42:08

*Thread Reply:* Thanks Jeff for fixing this, and sorry for the inconvenience, Penelope

Jason Freeman (jfreeman@towson.edu)

2021-06-09 20:23:41

One more question. How would I determine whether the number of cases and/or the number of SNPs in my dataset is appropriate for a GREML (or HE) analysis? In other words, how do I know if I have enough cases and/or SNPs to get accurate estimates of heritability?

Julia (j.sidorenko@imb.uq.edu.au)

2021-06-09 20:53:01

*Thread Reply:* For a sample size, you can try a power calculator: https://shiny.cnsgenomics.com/gctaPower/

👍 Loic Yengo, Mark Adams

Julia (j.sidorenko@imb.uq.edu.au)

2021-06-09 20:55:31

*Thread Reply:* or maybe this one: https://cnsgenomics.com/software/gcta/#GREMLpowercalculator

👍 Loic Yengo

Jason Freeman (jfreeman@towson.edu)

2021-06-10 06:55:07

*Thread Reply:* Thanks!
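A back-of-the-envelope version of this kind of calculation, assuming the approximation SE(h2) ≈ sqrt(2 / (N^2 · var(A_ij))) for unrelated individuals, where var(A_ij) is the variance of the off-diagonal GRM elements. The default of 2e-5 is a commonly quoted value for array SNPs in unrelated samples, and the two-sided Wald test is a simplification. This is not the code behind the linked calculators, and `greml_power` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def greml_power(n, h2, var_offdiag=2e-5, alpha=0.05):
    """Approximate power to detect h2 > 0 in n unrelated individuals."""
    se = sqrt(2.0 / (n**2 * var_offdiag))   # approximate SE of the h2 estimate
    ncp = h2 / se                           # expected z-statistic
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    # two-sided Wald test under a normal approximation
    return NormalDist().cdf(ncp - crit) + NormalDist().cdf(-ncp - crit)

for n in (1_000, 5_000, 10_000):
    print(n, round(greml_power(n, h2=0.2), 3))
```

Because the SE shrinks roughly as 1/N, doubling the sample size roughly doubles the expected z-statistic for a given h2.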

Loic Yengo (l.yengo@uq.edu.au)

2021-06-09 22:34:56

@channel: here's tomorrow's practical. It has been updated since yesterday.

Day4_practical_Boulder2021_v2.pdf

Loic Yengo (l.yengo@uq.edu.au)

2021-06-09 22:35:35

https://docs.google.com/document/d/1OZilaO1GhV2vz5iAm4wl-TyiV2LxezBzhecBXxpHmKc/edit?usp=sharing


Guiomar Masip (guiomar.masip-manuel@helsinki.fi)

2021-06-10 04:24:02

Hi, I have a question @Loic Yengo, regarding the videos. I don't really understand the importance of estimating SNP h2. For example, we can know h2 from twin and family studies, and we can also know the variance explained by a polygenic risk score (PRS). So why should we calculate SNP h2 if we can directly know, for example, the genetic variance of a trait from PRSs? Does the variance explained by PRSs have something in common with SNP h2? In fact, we know from recent studies that the % of variance explained by PRSs that use genetic variants irrespective of genome-wide significance can be similar to h2 estimates from twin and family studies. Thanks ;)

Loic Yengo (l.yengo@uq.edu.au)

2021-06-10 06:22:21

*Thread Reply:* Hi Guiomar! Thanks for the great question. h2SNP gives you an upper bound for the prediction accuracy of your PRS. In other words, h2SNP ≥ R2PRS. So why would you like to know that upper bound? Well, it could be useful for evaluating how much of the information contained in the observed SNPs you are still missing, i.e., how much room there is to improve your prediction. Also, estimating h2_SNP could give a first hint of a trait's heritability when there is no twin study available. Hope these two examples speak to you. Cheers, L
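To illustrate the upper-bound point, one commonly used approximation (Daetwyler et al. 2008) gives the expected PRS R^2 as h2_SNP · N·h2_SNP / (N·h2_SNP + M_e), where N is the discovery GWAS sample size and M_e an effective number of independent chromosome segments. It is always below h2_SNP and only approaches it as N grows. The value M_e = 60,000 and the function name are hypothetical choices for illustration.

```python
def expected_prs_r2(h2_snp, n_gwas, m_eff):
    """Expected PRS R^2 under a Daetwyler-style approximation.

    h2_snp: SNP heritability; n_gwas: discovery GWAS sample size;
    m_eff: effective number of independent segments (an assumption).
    """
    return h2_snp * (n_gwas * h2_snp) / (n_gwas * h2_snp + m_eff)

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    r2 = expected_prs_r2(0.3, n, m_eff=60_000)
    print(f"N={n:>10,}  expected R2={r2:.3f}  (h2_SNP=0.3)")
```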

Guiomar Masip (guiomar.masip-manuel@helsinki.fi)

2021-06-10 06:34:11

*Thread Reply:* great thanks! your response is really helpful

matthew keller (matthew.c.keller@gmail.com)

2021-06-10 06:52:58

*Thread Reply:* I agree with both of Loic’s reasons. Let me give a couple more. First, one can look at the relative importance of different annotations using GREML or LDSC (e.g., variants in genes expressed in the CNS, variants that are conserved, etc.). That’s impossible to do with twin/family studies and under-powered in PRS studies. Second, one can investigate genetic correlations between traits that are impossible to look at in twin/family data, either because the traits haven’t been measured in them, the data isn’t available to you (twin data still tends to be proprietary whereas this isn’t so for much GWAS data), or the rg is between traits that are mutually exclusive or too rare to co-occur within families.

Guiomar Masip (guiomar.masip-manuel@helsinki.fi)

2021-06-10 08:58:42

*Thread Reply:* thanks Matthew for your points

Laura (lhaver01@mail.bbk.ac.uk)

2021-06-10 04:46:16

@Loic Yengo You mention in video 3 that certain siblings will be more genetically related than others due to recombination — could you expand on that a bit please? I have never quite got my head around how recombination works!

👍 Aislinn Bowler

Loic Yengo (l.yengo@uq.edu.au)

2021-06-10 05:45:34

*Thread Reply:* Hi Laura, sure. The degree of DNA sharing between siblings is ~0.5 on average. However, if we directly measure the proportion of DNA segments that are identical by descent (IBD) between siblings, we see that this proportion actually varies between, say, ~0.35 and ~0.65 (check Fig. 1 in this paper: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0020041). The reason is that during meiosis the DNA from each parent is recombined in different ways before being passed on to each offspring. Therefore, recombination is the phenomenon responsible for within-family variation in IBD sharing. Hope this clarifies things a little.

journals.plos.org

Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings

The study of continuously varying, quantitative traits is important in evolutionary biology, agriculture, and medicine. Variation in such traits is attributable to many, possibly interacting, genes whose expression may be sensitive to the environment, which makes their dissection into underlying causative factors difficult. An important population parameter for quantitative traits is heritability, the proportion of total variance that is due to genetic factors. Response to artificial and natural selection and the degree of resemblance between relatives are all a function of this parameter. Following the classic paper by R. A. Fisher in 1918, the estimation of additive and dominance genetic variance and heritability in populations is based upon the expected proportion of genes shared between different types of relatives, and explicit, often controversial and untestable models of genetic and non-genetic causes of family resemblance. With genome-wide coverage of genetic markers it is now possible to estimate such parameters solely within families using the actual degree of identity-by-descent sharing between relatives. Using genome scans on 4,401 quasi-independent sib pairs of which 3,375 pairs had phenotypes, we estimated the heritability of height from empirical genome-wide identity-by-descent sharing, which varied from 0.374 to 0.617 (mean 0.498, standard deviation 0.036). The variance in identity-by-descent sharing per chromosome and per genome was consistent with theory. The maximum likelihood estimate of the heritability for height was 0.80 with no evidence for non-genetic causes of sib resemblance, consistent with results from independent twin and family studies but using an entirely separate source of information. Our application shows that it is feasible to estimate genetic variance solely from within-family segregation and provides an independent validation of previously untestable assumptions. 
Given sufficient data, our new paradigm will allow the estimation of genetic variation for disease susceptibility and quantitative traits that is free from confounding with non-genetic factors and will allow partitioning of genetic variation into additive and non-additive components.

Original URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0020041
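A small simulation of Loic's explanation: crossovers are placed as a Poisson process (Haldane model, no interference), each sibling independently inherits a mosaic of each parent's two haplotypes, and we record the fraction of the genome where the two sibs received the same parental haplotype, averaged over both parents. Assumptions: 22 chromosomes of 1.6 Morgans each (a hypothetical simplification), a 1 cM grid, and no sex-specific maps. The mean should come out near 0.5, with a pair-to-pair spread of a few percent, which is the within-family variation described above.

```python
import numpy as np

def transmitted_haplotype(length, grid, rng):
    """Which of a parent's two haplotypes (0 or 1) a child inherits at each grid position.

    Crossovers follow a Poisson process with rate 1 per Morgan (Haldane model).
    """
    n_cross = rng.poisson(length)
    breakpoints = np.sort(rng.uniform(0.0, length, n_cross))
    start = rng.integers(2)
    # each crossover to the left of a position flips the inherited haplotype
    n_left = np.searchsorted(breakpoints, grid)
    return (start + n_left) % 2

def sib_ibd_proportion(chrom_lengths, rng, step=0.01):
    """Proportion of the genome two full sibs share IBD, averaged over both parents."""
    shared = []
    for length in chrom_lengths:
        grid = np.arange(0.0, length, step)
        for _parent in range(2):
            sib1 = transmitted_haplotype(length, grid, rng)
            sib2 = transmitted_haplotype(length, grid, rng)
            shared.append(sib1 == sib2)
    return np.concatenate(shared).mean()

rng = np.random.default_rng(2021)
chrom_lengths = [1.6] * 22   # hypothetical: 22 chromosomes of 1.6 Morgans each
props = np.array([sib_ibd_proportion(chrom_lengths, rng) for _ in range(1000)])
print(f"mean={props.mean():.3f} sd={props.std():.3f} "
      f"range=[{props.min():.3f}, {props.max():.3f}]")
```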

Nathan Bell (n.y.bell@student.vu.nl)

2021-06-10 05:43:50

@Loic Yengo if you get different heritability estimates when doing HE regression in GCTA for HE-CP and HE-SD how would you interpret that and what would be the next steps (if any) to follow up?

Loic Yengo (l.yengo@uq.edu.au)

2021-06-10 05:49:40

*Thread Reply:* Hi Nathan, good question. I'm not sure what degree of difference you are talking about. The estimates may not be identical, but they should be largely similar by design. Do you have a particular example to share?

Nathan Bell (n.y.bell@student.vu.nl)

2021-06-10 06:07:31

*Thread Reply:* No I was just curious what you would do in case there was a difference (or if it's possible to have a meaningful difference)

matthew keller (matthew.c.keller@gmail.com)

2021-06-10 07:14:18

*Thread Reply:* They should be broadly similar, but not necessarily. A couple of reasons off the top of my head why they can differ: (1) HE typically doesn’t use the diagonals of the GRM, whereas GREML does. For common variants, this doesn’t matter much because there are only n diagonals but (n*(n-1))/2 off-diagonals, so MUCH more information in the off-diagonals. However, as you move to rare variants and/or when there is much ancestry structure in your sample, this can be counterbalanced by the variance of the diagonals getting much higher (we’re talking 1000s to 100k fold higher!) than the variance of the off-diagonals. This is a problem with the current way that GCTA computes the diagonals, which I don’t think is well appreciated yet. (2) As I noted in a response above, if there is assortative mating on the trait (leading to long-range gametic disequilibrium), HE regression and GREML behave quite differently. Both are upwardly biased for realistic sample sizes (e.g., n < 100K), but GREML estimates go down as a function of n, asymptoting at h2_time0 as n -> inf, whereas HE estimates stay consistently high. See this preprint: https://www.biorxiv.org/content/10.1101/2021.03.18.436091v1.full


Nathan Bell (n.y.bell@student.vu.nl)

2021-06-10 08:14:35

*Thread Reply:* Thank you!
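To see point (1) concretely, here is a toy sketch of both HE variants that GCTA reports (HE-CP and HE-SD, via `--HEreg`): regress each pair's phenotype cross-product, or squared difference, on its GRM entry, using only pairs i < j, so the GRM diagonal is never used. Assumptions: simulated unrelated individuals, independent SNPs, a standardized phenotype, and hypothetical function names. In this idealized setting the two estimates should agree closely; with rare variants or structured samples they (and GREML) can diverge for the reasons given above.

```python
import numpy as np

def simulate(n=1000, m=2000, h2=0.5, seed=1):
    """Toy data: unrelated individuals, independent SNPs, additive trait."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, m)
    geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
    p = geno.mean(axis=0) / 2                      # sample allele frequencies
    keep = (p > 0) & (p < 1)                       # drop monomorphic SNPs
    geno, p = geno[:, keep], p[keep]
    z = (geno - 2 * p) / np.sqrt(2 * p * (1 - p))  # standardized genotypes
    grm = z @ z.T / z.shape[1]
    beta = rng.normal(0.0, np.sqrt(h2 / z.shape[1]), z.shape[1])
    y = z @ beta + rng.normal(0.0, np.sqrt(1 - h2), n)
    y = (y - y.mean()) / y.std()                   # standardized phenotype
    return grm, y

def he_cp(grm, y):
    """HE cross-product: E[y_i*y_j] = h2*A_ij, so the slope estimates h2."""
    i, j = np.triu_indices_from(grm, k=1)          # off-diagonal pairs only
    return np.polyfit(grm[i, j], y[i] * y[j], 1)[0]

def he_sd(grm, y):
    """HE squared difference: E[(y_i-y_j)^2] = 2 - 2*h2*A_ij, so h2 = -slope/2."""
    i, j = np.triu_indices_from(grm, k=1)
    return -np.polyfit(grm[i, j], (y[i] - y[j]) ** 2, 1)[0] / 2

grm, y = simulate()
print(f"HE-CP h2 = {he_cp(grm, y):.3f}, HE-SD h2 = {he_sd(grm, y):.3f}")
```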

Uku Vainik (ukuvainik@gmail.com)

2021-06-10 06:55:55

@Loic Yengo Great videos! It was very cool to see distantly related people being used to get maximally unbiased heritability estimates. Does this approach allow us to test the equal environments assumption? Has the EEA survived such tests?

matthew keller (matthew.c.keller@gmail.com)

2021-06-10 07:08:50

*Thread Reply:* I’ll jump in and hopefully Loic does as well. It’s a good question. By equal env. assumption (EEA), I take you to mean the equal twin env. assumption (that rgenv for MZs = rgenv for DZs). GREML/HE don’t directly test that assumption, but do so by extension. h2 estimates from GREML/HE are expected to be lower to the degree that causal variants aren’t tagged by the SNPs used to build the GRM (in unrelated samples), and should therefore be a lower bound on the twin/family h2. Thus, observations of non-zero h2 from GREML/HE suggest that at least some of the h2 in trait X cannot be explained by violations of the EEA. As we move to using sequence data (and figure out the proper ways to perform GREML/HE in sequence data, which isn’t trivial!), h2 from GREML/HE should approach the full narrow-sense h2, and at that point we’ll get a clearer picture of the degree to which twin/family estimates have been biased this whole time. I strongly suspect they will end up being a bit biased depending on the trait, but little of this bias will be due to violations of the EEA.

Uku Vainik (ukuvainik@gmail.com)

2021-06-10 07:23:06

*Thread Reply:* Thank you! You unpacked this very well. Does limiting participants to distantly related individuals reduce the bias further?

Michel Nivard (m.g.nivard@vu.nl)

2021-06-10 06:59:37

@channel this is the slack channel for today!

Michel Nivard (m.g.nivard@vu.nl)

2021-06-10 07:03:38

How can we order the mug?

😍 Giulio Centorame, Anna Furtjes

☀ Lucía Colodro-Conde

matthew keller (matthew.c.keller@gmail.com)

2021-06-10 07:15:10

*Thread Reply:* I’m a Loic fan. I’d buy one!

:star_struck: Lucía Colodro-Conde

Barbara Molz (Barbara.Molz@mpi.nl)

2021-06-10 07:34:17

I have a quite specific LDSC question, as we sometimes see (significant) negative heritability results when using partitioned heritability, and we struggle to find a proper explanation. Currently, we suspect that these results might be biased by a small proportion of SNPs in some of our custom annotations. Also, --h2-cts seems to use a one-sided test, unlike the standard --h2 flag. Is there a specific reason for this?

Loic Yengo (l.yengo@uq.edu.au)

2021-06-10 07:40:08

*Thread Reply:* @Steven Gazal any suggestion?

matthew keller (matthew.c.keller@gmail.com)

2021-06-10 08:04:21

*Thread Reply:* I’m curious to hear people’s thoughts on this one too

Steven Gazal (gazal@usc.edu)

2021-06-10 09:09:25

*Thread Reply:* Hi!
• Regarding significant negative heritability, I am not sure I have a good explanation here... I think you broke the model with some annotation deeply depleted for heritability, which will have a negative regression coefficient and thus a negative heritability.
• Regarding --h2-cts, yes, it is doing a one-sided test, as it is specifically looking for cell types enriched in h2, and thus having a regression coefficient > 0. I think the --h2 flag only outputs z-scores and not P values, so that it lets you decide whether you want to do a one- or two-sided test.
Does this help?
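Following Steven's second point, a one-sided and a two-sided p-value can both be computed from the same z-score. A minimal sketch with the standard library (the function names are hypothetical): a negative coefficient z-score, like those from the depleted annotations Barbara describes, can be significant in a two-sided test but never in a one-sided test for enrichment.

```python
from math import erfc, sqrt

def p_one_sided(z):
    """P(Z >= z): tests for a coefficient > 0 (enrichment) only."""
    return 0.5 * erfc(z / sqrt(2))

def p_two_sided(z):
    """P(|Z| >= |z|): tests for a coefficient different from 0 in either direction."""
    return erfc(abs(z) / sqrt(2))

for z in (1.8, -2.5):
    print(f"z={z:+.1f}  one-sided p={p_one_sided(z):.4f}  two-sided p={p_two_sided(z):.4f}")
```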

Kazuki Okubo (okubo-kazuki087@g.ecc.u-tokyo.ac.jp)

2021-06-10 08:04:30

For question 2 in exercise 2 (the higher variance of the diagonal elements of the GRM based on SNPs with MAF < 0.05): could the difference in allele frequency itself affect this variance, beyond the effect of the number of variants used?

Priyadarshini Thirunavukkarasu (galaxie2485@yahoo.co.in)

2021-06-10 08:36:34

@Loic Yengo In exercise 4, we are estimating heritability without relatives and with relatives. Is the inflated heritability observed in samples with relatives not due to shared genetic factors? Thanks

Priyadarshini Thirunavukkarasu (galaxie2485@yahoo.co.in)

2021-06-10 08:39:46

@Loic Yengo What are the three sets of SNPs used for LDSC analysis? Why do we use three sets of SNPs, and what is their relevance in LDSC analysis? Thanks

Loic Yengo (l.yengo@uq.edu.au)

2021-06-10 08:47:07

*Thread Reply:* Great question, Priyadarshini! Steven described the three sets in the first "Background" section of the practical. In brief, one set is the set of SNPs directly used to run the regression. Here, we have ~1M of them. The other sets refer to how LD scores were calculated. To calculate the LD score of each of these ~1M SNPs, you need to sum the r^2 with the neighbouring SNPs. Depending on how many neighbouring SNPs you use, you can end up with, say, ~10M (including ~6M with a MAF > 5%). So why? We don't need too many SNPs to get the regression right, so ~1M is enough. However, to capture the right amount of variation, we may need to calculate LD scores relative to denser sequenced SNPs (e.g., 10M). @Steven Gazal, do you want to add something here?

Steven Gazal (gazal@usc.edu)

2021-06-10 11:01:24

*Thread Reply:* Nothing to add! It is important to keep in mind that while you are doing a regression on "only" 1M SNPs, you are modeling the effects of 10M SNPs through the LD scores computed on a reference panel with 10M sequenced SNPs, and you are reporting effects on common reference SNPs (~6M). This is different from GCTA where you are estimating h2 tagged by all the SNPs that are in your data.
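To make the "regression SNPs vs reference SNPs" distinction concrete, here is a toy sketch of an LD score: for each regression SNP, sum the r^2 with every reference-panel SNP within a window (itself included). Assumptions: toy genotypes with artificial LD, hypothetical positions 1 kb apart, a 50 kb window, every 10th SNP used as a regression SNP, and plain sample r^2. The real ldsc software computes LD scores from a sequenced reference panel with its own windowing options and adjustments, none of which is reproduced here.

```python
import numpy as np

def ld_scores(ref_geno, ref_pos, reg_idx, window):
    """LD score of each regression SNP: sum of r^2 with reference SNPs within +/- window."""
    z = (ref_geno - ref_geno.mean(axis=0)) / ref_geno.std(axis=0)
    n = z.shape[0]
    scores = np.empty(len(reg_idx))
    for out, j in enumerate(reg_idx):
        near = np.flatnonzero(np.abs(ref_pos - ref_pos[j]) <= window)
        r = z[:, near].T @ z[:, j] / n      # correlations with nearby reference SNPs
        scores[out] = np.sum(r ** 2)        # includes r^2 = 1 with itself
    return scores

rng = np.random.default_rng(0)
n, m = 500, 300
ref = rng.binomial(2, 0.3, size=(n, m)).astype(float)
for k in range(1, m):                       # toy LD: copy the left neighbour 80% of the time
    copy = rng.random(n) < 0.8
    ref[copy, k] = ref[copy, k - 1]
pos = np.arange(m) * 1000                   # hypothetical bp positions, 1 kb apart
reg_idx = np.arange(0, m, 10)               # every 10th SNP enters the regression
l = ld_scores(ref, pos, reg_idx, window=50_000)
print(l[:5].round(2))
```

The regression only needs `l` for the regression SNPs, but each entry summarizes LD with the denser reference panel, which is how the effects of many more SNPs enter the model.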

Jet Termorshuizen (jet.termorshuizen@ki.se)

2021-06-10 09:02:38

Hi! About the LDSC part of the tutorial: why is the heritability estimate when using stratified LDSC higher compared to when we're not stratifying the LDSC analysis? And why should we trust the stratified LDSC estimate more?

👍 Anna Furtjes

Michel Nivard (m.g.nivard@vu.nl)

2021-06-10 09:12:32

*Thread Reply:* I hope @Steven Gazal weighs in as well, or @Loic Yengo, but I'll try my best at an answer. If we run basic (unstratified) LD score regression, we are basically assuming a model where the true effect SNPs are distributed almost uniformly across the entire genome, and only LD determines how much effect we'll expect to observe at any SNP (more LD and you tag more causal SNPs). However, in reality the true effect SNPs can be expected to be found more in some parts of the genome (in or near genes, for example) than in others. The annotations used are based on prior ideas about where in the genome we might expect causal SNPs to be found. Those ideas are reasonable, so the baseline model is likely a better description of where in the genome the true causal SNPs are and can be expected to provide a slightly better h2 estimate. Now, which model is better requires additional evaluation (especially if you compare competing sets of annotations in your stratified LD score analysis), and I don't know whether there is a best-practice way to compare models (internally we have used cross-validation to test which among various stratified LDSC models fits best).

Michel Nivard (m.g.nivard@vu.nl)

2021-06-10 09:13:09

*Thread Reply:* Don't trust my answer fully until you hear from the others!

Steven Gazal (gazal@usc.edu)

2021-06-10 10:58:23

*Thread Reply:* You're good! It is known that 1) per-SNP heritability strongly varies according to MAF and LD (SNPs with high MAF and/or low LD explain more variance); 2) these MAF- and LD-architectures bias heritability estimates. The second model includes MAF and LD annotations that help to correct for this bias.

Steven Gazal (gazal@usc.edu)

2021-06-10 10:58:27

*Thread Reply:* Does this help?
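A toy simulation of Steven's point: if per-SNP heritability differs between annotations with different LD profiles, a regression on a single total LD score (unstratified) is misspecified, while a stratified regression with one LD score per annotation recovers h2. Everything here is hypothetical: two non-overlapping annotations A and B, independently drawn toy LD scores, independent test statistics, plain least squares, and h2 computed as the sum of per-annotation coefficients times annotation SNP counts. Real S-LDSC uses many overlapping annotations, regression weights, and a block jackknife.

```python
import numpy as np

rng = np.random.default_rng(7)
S = 200_000                           # regression SNPs (toy)
N = 100_000                           # GWAS sample size (toy)
M_A, M_B = 200_000, 800_000           # hypothetical reference SNP counts per annotation
tau_A, tau_B = 0.2 / M_A, 0.1 / M_B   # per-SNP h2: annotation A is enriched
true_h2 = tau_A * M_A + tau_B * M_B   # 0.3

# Toy annotation-specific LD scores: the enriched annotation A contributes less LD.
l_A = rng.uniform(1, 20, S)
l_B = rng.uniform(0, 80, S)

# Chi-square statistics under the stratified model (no confounding, intercept 1)
ncp = N * (tau_A * l_A + tau_B * l_B)
chi2 = (np.sqrt(ncp) + rng.standard_normal(S)) ** 2   # 1-df noncentral chi-square

def ols(columns, y):
    """Least squares with an intercept; returns [intercept, coefficients...]."""
    X = np.column_stack([np.ones_like(y)] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Unstratified: one total LD score, h2 = slope * M / N
b0_u, b1_u = ols([l_A + l_B], chi2)
h2_unstrat = b1_u * (M_A + M_B) / N

# Stratified: one LD score per annotation, h2 = sum over annotations of tau_c * M_c
b0_s, bA, bB = ols([l_A, l_B], chi2)
h2_strat = (bA * M_A + bB * M_B) / N

print(f"true h2 = {true_h2:.3f}")
print(f"unstratified: h2 = {h2_unstrat:.3f}, intercept = {b0_u:.2f}")
print(f"stratified:   h2 = {h2_strat:.3f}, intercept = {b0_s:.2f}")
```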

Jet Termorshuizen (jet.termorshuizen@ki.se)

2021-06-11 02:54:21

*Thread Reply:* Yes, it helps! One more follow-up question: you write that we should not use LDSC heritability estimates, but that heritability estimates from the baseline-LD model can be used with extreme caution. Do you recommend not using those stratified LDSC heritability estimates at all, or would it be informative to report heritability estimates from both GCTA and S-LDSC?

Ciarrah-Jane Barry (ciarrah.barry@bristol.ac.uk)

2021-06-10 09:20:36

When would you use LDSR over GREML-SNP?

matthew keller (matthew.c.keller@gmail.com)

2021-06-10 09:25:07

*Thread Reply:* If you have the raw data in hand and computational constraints (in terms of RAM and computational time) are not an issue, then I’d go with GREML. But that’s not the world we live in. The advantage of LDSR over GREML (and why it’s become more widely used) is that one can use LDSR without having the raw data (just the sumstats), and that it is MUCH cheaper computationally than GREML.

Rob Kirkpatrick (robert.kirkpatrick@vcuhealth.org)

2021-06-10 09:27:55

*Thread Reply:* This paper, https://www.nature.com/articles/ng.3941 , contains some discussion that's relevant to your question.

Nature Genetics

Concepts, estimation and interpretation of SNP-based heritability

Jian Yang and colleagues explore the uses and abuses of heritability estimates derived from pedigrees and from GWAS SNPs and make recommendations for best practice in future applications of SNP-based heritability.

Original URL: https://www.nature.com/articles/ng.3941

Abigail ter Kuile (k1456980@kcl.ac.uk)

2021-06-10 10:05:18

Thanks for the great lectures on heritability and GCTA, the topics were broken down in a really clear way! Are there any power calculations to estimate genetic correlations in LDSC regression? GCTA-GREML has a great power calculator, but as LDSC regression requires more power, is there another tool that can be used specifically for LDSC regression?

Abigail ter Kuile (k1456980@kcl.ac.uk)

2021-06-10 10:12:30

*Thread Reply:* In relation to this, are there any power calculations for Genomic SEM, and would this be different when using LDSC vs HDL in Genomic SEM? I'm aware that an LDSC SNP-heritability Z score less than 5 is an indicator that the GWAS summary statistics might not be powered enough for gSEM analyses. Could you use this threshold for a power calculation? @Michel Nivard

Michel Nivard (m.g.nivard@vu.nl)

2021-06-10 10:26:40

*Thread Reply:* I'll let the LDSC people speak for their lack of a power calculator 😉 but as far as GenomicSEM is concerned, it's really hard to make a general power calculator when people can fit a great variety of models. Power for what? Power to distinguish one latent variable model from another? Power to detect that a SNP acts on the latent factor, or power to test whether we can reject the premise that a SNP acts via a latent factor (instead of influencing traits directly)? The underlying power in LDSC will probably depend heavily on the genetic architecture (how many causal SNPs, how the causal SNPs are spread across the genome). WRT LDSC vs HDL, I think HDL should win out IF you have a proper reference LD set (the authors state HDL is sensitive to SNPs that are in the reference but missing from the GWAS). So if, for example, you use a medium-sized (30-50k) Swedish, Japanese or Norwegian dataset for your thesis or over a multi-year period, HDL could offer power gains, and it could be worth taking the time to create an HDL LD reference.

👍 Loic Yengo, Abigail ter Kuile

Michel Nivard (m.g.nivard@vu.nl)

2021-06-10 10:27:53

*Thread Reply:* All this is to say that this is a really great question, and for any multi-year project or paper it's probably worth doing some power simulations; be sure to reach out if you need guidance on those.

👍 Abigail ter Kuile

Abigail ter Kuile (k1456980@kcl.ac.uk)

2021-06-14 05:48:49

*Thread Reply:* Excellent, thanks so much Michel for the detailed answer. Looking forward to today's workshop!

Loic Yengo (l.yengo@uq.edu.au)

2021-06-10 15:43:08

Today's Practical has been (slightly) updated. Please check here: /faculty/loic/2021/PracticalGuidelines/Day4practicalBoulder2021v4.docx or /faculty/loic/2021/PracticalGuidelines/Day4practicalBoulder2021v4.pdf or https://docs.google.com/document/d/1OZilaO1GhV2vz5iAm4wl-TyiV2LxezBzhecBXxpHmKc/edit?usp=sharing


Rob Kirkpatrick (robert.kirkpatrick@vcuhealth.org)

2021-06-10 16:16:17

I'm curious about building ldsc from source on my computer here at home. What is the minimal set of dependencies necessary to do so? Just the conda package manager, and the dependencies listed in environment.yml ?

Rob Kirkpatrick (robert.kirkpatrick@vcuhealth.org)

2021-06-10 18:06:54

*Thread Reply:* Or to put it another way, is a full anaconda installation really necessary to build ldsc?

David Evans (d.evans1@uq.edu.au)

2021-06-10 16:58:24

If you have not seen it before, do also check out LDHub, a user-friendly way to implement LD score regression for calculating SNP heritability and genetic correlations: http://ldsc.broadinstitute.org/ 🙂

👍 Loic Yengo, Michel Nivard

Kristen (kristenlhopkins@gmail.com)

2021-06-10 18:41:36

Hi all, just a worksheet layout suggestion for this practical for next year, from the absolute beginners in my group: Practical Part 1 was really clear and easy to follow, even for beginners. Practical Part 2 was pitched a bit too advanced for beginners, because we do not yet have the skills to write our own code to answer the questions. It would be really helpful if the worksheet could provide the code required in the main body of the text. I see now (afterwards) that the code is provided in the answer section at the end, but none of us found this during the tutorial. There is so much new content to plough through each day that we're not getting time to read through the worksheets before the tutorial starts. Thank you!

Lucía Colodro-Conde (lucia.colodroconde@qimrberghofer.edu.au)

2021-06-10 19:04:08

*Thread Reply:* Hi Kristen, I think Loic said in the presentation of the practical that the answers were at the end of the document... and to check them if required at any time... Maybe this information could have been reiterated, I know there are many documents and things to coordinate!

Kristen (kristenlhopkins@gmail.com)

2021-06-10 20:08:22

*Thread Reply:* Thanks Lucia - we didn't appreciate that code would be included with the answers. Kristen

Loic Yengo (l.yengo@uq.edu.au)

2021-06-10 22:04:09

*Thread Reply:* Thanks Kristen for the valuable feedback. We will take that into account for the following sessions and next year's workshop. Glad you found Part 1 didactic 😃

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)

2021-06-11 02:35:32

Hi @Steven Gazal, I am new to the field and getting to know LDSC. I was wondering why the threshold for the intercept is 1.

Steven Gazal (gazal@usc.edu)

2021-06-11 09:13:22

*Thread Reply:* Hi Lucia! The idea behind LDSC is that when you regress chi-square statistics on LD scores, the slope is proportional to heritability, and the intercept tells you how stratification impacts your GWAS results. Regarding your question, a good way to visualize this is to consider a null GWAS with no heritability (slope = 0) and no stratification; in that case the mean chi-square is 1, so we expect the intercept to be 1. Does this make sense?

👌 Lucía de Hoyos

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)

2021-06-14 03:50:09

*Thread Reply:* Hi Steven. Yes, it does make sense. Thank you!
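A toy version of Steven's explanation: simulate chi-square statistics from the unstratified LDSC model E[chi2] = 1 + N·h2·ℓ/M, regress them on LD scores ℓ, and read h2 off the slope. Under the null, the intercept lands near 1 and the slope near 0. Assumptions: independent statistics, uniform toy LD scores, hypothetical N, M and number of SNPs, no stratification, and unweighted least squares; the real ldsc software uses regression weights and block-jackknife standard errors, and `ldsc_fit` is a hypothetical name.

```python
import numpy as np

def ldsc_fit(chi2, ld, n, m):
    """Unweighted toy LD score regression: chi2 ~ intercept + (n*h2/m) * ld."""
    X = np.column_stack([np.ones_like(ld), ld])
    intercept, slope = np.linalg.lstsq(X, chi2, rcond=None)[0]
    return intercept, slope * m / n      # (intercept, h2 estimate)

rng = np.random.default_rng(11)
S, N, M = 200_000, 50_000, 1_000_000     # hypothetical: regression SNPs, GWAS N, reference SNPs
ld = rng.uniform(1, 200, S)              # toy LD scores

# Null GWAS: no heritability, no stratification -> chi2 ~ chi-square(1), mean 1
chi2_null = rng.standard_normal(S) ** 2
# Polygenic GWAS with h2 = 0.3: E[chi2] = 1 + N*h2*ld/M
ncp = N * 0.3 * ld / M
chi2_poly = (np.sqrt(ncp) + rng.standard_normal(S)) ** 2

print("null:      intercept=%.3f  h2=%.3f" % ldsc_fit(chi2_null, ld, N, M))
print("polygenic: intercept=%.3f  h2=%.3f" % ldsc_fit(chi2_poly, ld, N, M))
```

Adding a constant inflation to every statistic (as confounding from stratification would) shifts the intercept above 1 without changing the slope, which is why the intercept is read as a stratification diagnostic.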
