Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-04-30 09:45:30

@Jeff Lessem (he/him) has joined the channel

Loic Yengo (l.yengo@uq.edu.au)
2021-04-30 09:48:29

@Loic Yengo has joined the channel

Valentin Hivert (v.hivert@imb.uq.edu.au)
2021-04-30 09:48:29

@Valentin Hivert has joined the channel

Julia (j.sidorenko@imb.uq.edu.au)
2021-04-30 09:48:30

@Julia has joined the channel

Yuna Zhang (yuanxiang.zhang@uq.edu.au)
2021-04-30 09:48:30

@Yuna Zhang has joined the channel

Zhihong Zhu (z.zhu@econ.au.dk)
2021-04-30 09:48:30

@Zhihong Zhu has joined the channel

Steven Gazal (gazal@usc.edu)
2021-04-30 09:49:48

@Steven Gazal has joined the channel

Gunn-Helen Moen (g.moen@uq.edu.au)
2021-05-03 13:13:42

@Gunn-Helen Moen has joined the channel

Mark Adams (mark.adams@ed.ac.uk)
2021-05-04 02:16:46

@Mark Adams has joined the channel

P Wainschtein (p.wainschtein@imb.uq.edu.au)
2021-05-06 01:01:58

@P Wainschtein has joined the channel

Test Student (test-student@ibg.colorado.edu)
2021-05-06 11:38:58

@Test Student has joined the channel

Bridget Joyner (bnj13@my.fsu.edu)
2021-05-10 13:01:19

@Bridget Joyner has joined the channel

Sally Kuo (ickuo@vcu.edu)
2021-05-10 13:30:20

@Sally Kuo has joined the channel

Aislinn Bowler (aislinnbowler@gmail.com)
2021-05-10 13:30:27

@Aislinn Bowler has joined the channel

Morgan Driver (driverm@vcu.edu)
2021-05-10 13:31:04

@Morgan Driver has joined the channel

Sarah Brislin (she/her) (sarah.brislin@gmail.com)
2021-05-10 13:31:37

@Sarah Brislin (she/her) has joined the channel

Lisa Dinkler (lisa.dinkler@gu.se)
2021-05-10 13:31:43

@Lisa Dinkler has joined the channel

Katie Bountress (kaitlin.bountress@vcuhealth.org)
2021-05-10 13:32:21

@Katie Bountress has joined the channel

Peter Tanksley (peter.tanksley@austin.utexas.edu)
2021-05-10 13:32:32

@Peter Tanksley has joined the channel

Tong Chen (tuc548@psu.edu)
2021-05-10 13:34:05

@Tong Chen has joined the channel

Charlotte Viktorsson (viktorsson.charlotte@gmail.com)
2021-05-10 13:34:34

@Charlotte Viktorsson has joined the channel

Jacob Kunkel (kunke104@umn.edu)
2021-05-10 13:35:32

@Jacob Kunkel has joined the channel

Matthieu de Hemptinne (matthieu.dehemptinne@gmail.com)
2021-05-10 13:36:01

@Matthieu de Hemptinne has joined the channel

Jay Ross (jay.ross@mail.mcgill.ca)
2021-05-10 13:38:34

@Jay Ross has joined the channel

Sam Freis (she/her) (Samantha.Freis@colorado.edu)
2021-05-10 13:38:42

@Sam Freis (she/her) has joined the channel

Jeremy Elman (jaelman@health.ucsd.edu)
2021-05-10 13:38:56

@Jeremy Elman has joined the channel

Spencer Moore (spmo3925@colorado.edu)
2021-05-10 13:39:53

@Spencer Moore has joined the channel

Maizy Brasher (mabr7162@colorado.edu)
2021-05-10 13:39:53

@Maizy Brasher has joined the channel

Jenny Phan (jphan5@wisc.edu)
2021-05-10 13:39:58

@Jenny Phan has joined the channel

Meng Huang (meng.huang.cn@gmail.com)
2021-05-10 13:41:18

@Meng Huang has joined the channel

Jung Chen (jchen378@ucmerced.edu)
2021-05-10 13:41:58

@Jung Chen has joined the channel

Stephanie Zellers (she/her/hers) (zelle063@umn.edu)
2021-05-10 13:42:17

@Stephanie Zellers (she/her/hers) has joined the channel

Grace Wu (yakew@email.unc.edu)
2021-05-10 13:42:32

@Grace Wu has joined the channel

Gladi Thng (s2124928@ed.ac.uk)
2021-05-10 13:43:47

@Gladi Thng has joined the channel

Zoe Schmilovich (zoe.schmilovich@mail.mcgill.ca)
2021-05-10 13:43:51

@Zoe Schmilovich has joined the channel

Olivia Rennie (olivia.rennie@alum.utoronto.ca)
2021-05-10 13:43:58

@Olivia Rennie has joined the channel

Christina Sheerin (Christina.sheerin@vcuhealth.org)
2021-05-10 13:43:59

@Christina Sheerin has joined the channel

William McAuliffe (williamhbmcauliffe@gmail.com)
2021-05-10 13:44:17

@William McAuliffe has joined the channel

Chloe Myers (cmyer011@ucr.edu)
2021-05-10 13:44:20

@Chloe Myers has joined the channel

Francis Vergunst (he/him) (francis.vergunst@umontreal.ca)
2021-05-10 13:44:33

@Francis Vergunst (he/him) has joined the channel

Ravi Bhatt (ravibot93@gmail.com)
2021-05-10 13:44:48

@Ravi Bhatt has joined the channel

Nathan Bell (n.y.bell@student.vu.nl)
2021-05-10 14:46:30

@Nathan Bell has joined the channel

Emil Uffelmann (e.uffelmann@vu.nl)
2021-05-10 14:46:48

@Emil Uffelmann has joined the channel

Kristen Kelly (k.m.kelly@vu.nl)
2021-05-10 14:47:50

@Kristen Kelly has joined the channel

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-06-08 15:24:36

@Jeff Lessem (he/him) has renamed the channel from "heritability-and-gcta" to "day04-heritability-and-gcta"

Loic Yengo (l.yengo@uq.edu.au)
2021-06-08 16:24:37

GCTA/LDSC practical can be found here: /home/loic/2021/PracticalGuidelines/Day4practical_Boulder2021.**

Loic Yengo (l.yengo@uq.edu.au)
2021-06-08 16:25:03

I'll send a reminder tomorrow.

Sarah Medland (she/her) (sarahme@qimr.edu.au)
2021-06-08 16:25:39

Correction /faculty/loic/2021/PracticalGuidelines/Day4practical_Boulder2021.**

Loic Yengo (l.yengo@uq.edu.au)
2021-06-08 16:31:46

Thanks Sarah!

Jason Freeman (jfreeman@towson.edu)
2021-06-09 12:45:43

In one video you mentioned there are various ways to measure GRM. Do these various means of measuring GRM lead to different heritability estimates? If so, how do we know which measures of heritability are most accurate?

Loic Yengo (l.yengo@uq.edu.au)
2021-06-09 17:16:53

*Thread Reply:* Hi Jason, this is a great question. The answer is "Yes": how you measure genetic relatedness (the GRM) affects your heritability estimates. Unfortunately, the truth depends on things that we don't know or observe, such as 1) the causal variants and 2) the relationship between SNP effects and allele frequencies. 1) and 2) are often referred to as the "genetic architecture" of the trait or disease. So how do we know which one is accurate? Well, methods such as LDMS (MAF- and LD-stratified) provide a way to get unbiased estimates (Evans et al. 2018; PubMed ID = 29700474).

šŸ‘ Mark Adams
Jason Freeman (jfreeman@towson.edu)
2021-06-09 20:19:05

*Thread Reply:* Thanks!
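
To make the point about "different ways to measure the GRM" concrete, here is a minimal sketch, assuming the common parametrisation in which SNP k is weighted by [2p(1-p)]^alpha. With alpha = -1 every SNP is standardised (the usual default); alpha = 0 leaves SNPs unscaled, which implicitly assumes a different relationship between effect size and allele frequency. The function name `grm`, the toy genotypes and the sample sizes are hypothetical and only for illustration.

```python
import random

def grm(genotypes, alpha=-1.0):
    """Genetic relationship matrix from an n x m matrix of 0/1/2 allele counts.

    SNP k is weighted by [2 p_k (1 - p_k)] ** alpha (assumed parametrisation);
    alpha = -1 standardises every SNP, alpha = 0 leaves SNPs unscaled.
    """
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[k] for row in genotypes) / (2 * n) for k in range(m)]
    het = [2 * p * (1 - p) for p in freqs]
    keep = [k for k in range(m) if het[k] > 0]        # drop monomorphic SNPs
    denom = sum(het[k] ** (1 + alpha) for k in keep)  # equals len(keep) when alpha = -1
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum(
                (genotypes[i][k] - 2 * freqs[k])
                * (genotypes[j][k] - 2 * freqs[k])
                * het[k] ** alpha
                for k in keep
            )
            a[i][j] = a[j][i] = s / denom
    return a

# Toy data: 50 unrelated individuals, 200 independent SNPs (hypothetical sizes).
random.seed(1)
n, m = 50, 200
true_freqs = [random.uniform(0.01, 0.5) for _ in range(m)]
geno = [[(random.random() < p) + (random.random() < p) for p in true_freqs]
        for _ in range(n)]

a_std = grm(geno, alpha=-1.0)   # standardised GRM
a_raw = grm(geno, alpha=0.0)    # unscaled GRM
print("A[0][1] with alpha=-1:", round(a_std[0][1], 4))
print("A[0][1] with alpha= 0:", round(a_raw[0][1], 4))
```

The two matrices give different relatedness values for the same pair of people, which is why the downstream heritability estimates can differ.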

Jason Freeman (jfreeman@towson.edu)
2021-06-09 14:42:47

Other than increased computational efficiency, are there any other practical reasons to choose GREML over Haseman-Elston regression for estimating heritability? And are the heritabilities largely identical using both methods?

Loic Yengo (l.yengo@uq.edu.au)
2021-06-09 17:21:32

*Thread Reply:* Another great question! HE and GREML are largely consistent in general. Differences can occur when the sample size is not large enough or when the trait is not normally distributed. If possible, running both can teach you something interesting about your data.

Jason Freeman (jfreeman@towson.edu)
2021-06-09 20:19:16

*Thread Reply:* Thanks again!

šŸ‘ Loic Yengo
matthew keller (matthew.c.keller@gmail.com)
2021-06-09 21:18:24

*Thread Reply:* if there is assortative mating on the trait (leading to long-range gametic disequilibrium), HE regression and GREML behave quite differently. Both are upwardly biased for realistic sample sizes (e.g., n < 100K) but GREML estimates go down as a function of n, asymptoting at h2_time0 as n -> inf whereas HE stay consistently high. See this preprint: https://www.biorxiv.org/content/10.1101/2021.03.18.436091v1.full

šŸ˜ Anna Furtjes
Alex Bloemendal (he/him) (bloem@broadinstitute.org)
2021-06-10 08:27:40

*Thread Reply:* Also, GREML is downwardly biased when applied to (ascertained) case-control data: https://www.pnas.org/content/111/49/E5272.short

Alex Bloemendal (he/him) (bloem@broadinstitute.org)
2021-06-10 08:32:30

*Thread Reply:* This followup is interesting too: https://www.cell.com/ajhg/fulltext/S0002-9297(18)30195-2

Penelope Lind (penelope.lind@qimrberghofer.edu.au)
2021-06-09 20:16:41

@Loic Yengo I was trying to go through the Part 2: LD score regression tutorial and I don't have read permissions for the files in LDSCREF/baselineLDv2.2/ (I wanted to look at a chromosome 1 log file for the 11-b2 question). Also, the 11-c command fails for me. I'm not sure if this is related to the file permissions?

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-06-09 20:17:28

*Thread Reply:* Yeah, I broke it, but I'm fixing it now

Penelope Lind (penelope.lind@qimrberghofer.edu.au)
2021-06-09 20:21:23

*Thread Reply:* I just realised 11-c had /data/ in the path which I removed and it's working now. I should have paid more attention!

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-06-09 20:22:20

*Thread Reply:* That is what we want, the students should be reading the files from /data, but writing to their own directory

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-06-09 20:23:10

*Thread Reply:* The files are very large, so we need to avoid having all of the students copy them locally. I did put in some symlinks, so it might work with the default paths

🙂 Lucía Colodro-Conde

Penelope Lind (penelope.lind@qimrberghofer.edu.au)
2021-06-09 20:25:55

*Thread Reply:* That makes sense now. It now works with the /data path. Thank you.

Loic Yengo (l.yengo@uq.edu.au)
2021-06-09 21:42:08

*Thread Reply:* Thanks Jeff for fixing this, and sorry for the inconvenience, Penelope

Jason Freeman (jfreeman@towson.edu)
2021-06-09 20:23:41

One more question. How would I determine if the number of cases and/or the number of SNPs in my dataset are appropriate for doing a GREML (OR HE) analysis? In other words, how do I know if I have enough cases and/or SNPs to get accurate estimates of heritability?

Julia (j.sidorenko@imb.uq.edu.au)
2021-06-09 20:53:01

*Thread Reply:* For a sample size, you can try a power calculator: https://shiny.cnsgenomics.com/gctaPower/

šŸ‘ Loic Yengo, Mark Adams
Julia (j.sidorenko@imb.uq.edu.au)
2021-06-09 20:55:31

*Thread Reply:* or maybe this one: https://cnsgenomics.com/software/gcta/#GREMLpowercalculator

šŸ‘ Loic Yengo
Jason Freeman (jfreeman@towson.edu)
2021-06-10 06:55:07

*Thread Reply:* Thanks!
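
For a rough feel of what such a power calculator does, here is a minimal sketch. It assumes the approximation usually attributed to Visscher et al. (2014) for unrelated individuals, SE(h2) ≈ sqrt(2 / (n^2 * var(A_ij))), with var(A_ij) ≈ 2e-5 for common SNPs, plus a simple two-sided normal test. The function names are hypothetical and the numbers are indicative only; the online calculators linked above remain the reference.

```python
from math import sqrt
from statistics import NormalDist

def greml_se(n, var_offdiag=2e-5):
    """Approximate SE of SNP-based h2 in n unrelated individuals.

    var_offdiag is the variance of the off-diagonal GRM elements
    (2e-5 is an assumed, commonly quoted value for common SNPs).
    """
    return sqrt(2.0 / (n ** 2 * var_offdiag))

def greml_power(h2, n, alpha=0.05, var_offdiag=2e-5):
    """Approximate power of a two-sided normal test of h2 = 0."""
    z = h2 / greml_se(n, var_offdiag)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z - crit) + NormalDist().cdf(-z - crit)

for n in (2000, 5000, 10000):
    print(n, round(greml_se(n), 3), round(greml_power(0.2, n), 2))
```

Under these assumptions the standard error scales as roughly 316 / n, so the sample size (not the number of SNPs) is what mainly drives precision.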

Loic Yengo (l.yengo@uq.edu.au)
2021-06-09 22:34:56

@channel: here's tomorrow's practical. It has been updated since yesterday.

Guiomar Masip (guiomar.masip-manuel@helsinki.fi)
2021-06-10 04:24:02

Hi, I have a question for @Loic Yengo about the videos. I don't really understand the importance of estimating h2 SNP. For example, we can know h2 from twin and family studies, and we can also know the variance explained by a polygenic risk score, so why should we calculate h2 SNP if we can get the genetic variance of a trait directly from PRSs? Does the variance explained by PRSs have something in common with h2 SNP? Actually, we know from recent studies that the % of variance explained by PRSs that use genetic variants irrespective of genome-wide significance can be similar to h2 estimates from twin and family studies. Thanks ;)

Loic Yengo (l.yengo@uq.edu.au)
2021-06-10 06:22:21

*Thread Reply:* Hi Guiomar! Thanks for the great question. h2SNP gives you an upper bound for the prediction accuracy of your PRS. In other words, h2SNP > R2PRS. So why would you like to know that upper bound? Well, it could be useful to evaluate how much of the information contained in the observed SNPs you may still be missing to improve your prediction. Also, estimating h2_SNP could give a first hint about a trait's heritability when there is no twin study around. Hope these two examples speak to you. Cheers, L

Guiomar Masip (guiomar.masip-manuel@helsinki.fi)
2021-06-10 06:34:11

*Thread Reply:* great thanks! your response is really helpful

matthew keller (matthew.c.keller@gmail.com)
2021-06-10 06:52:58

*Thread Reply:* I agree with both of Loic's reasons. Let me give a couple more. First, one can look at the relative importance of different annotations using GREML or LDSC (e.g., variants in genes expressed in CNS, variants that are conserved, etc.). That's impossible to do with twin/family studies and under-powered in PRS studies. Second, one can investigate genetic correlations between traits that are impossible to look at in twin/family data, either because the traits haven't been measured in them, the data isn't available to you (twin data still tends to be proprietary whereas this isn't so for much GWAS data), or the rg is between traits that are mutually exclusive or too rare to co-occur within families

Guiomar Masip (guiomar.masip-manuel@helsinki.fi)
2021-06-10 08:58:42

*Thread Reply:* thanks Matthew for your points
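
Loic's point in this thread that h2SNP is an upper bound for the PRS R2 can be illustrated with a minimal sketch. It assumes the widely used approximation (often attributed to Daetwyler et al. 2008) R2 ≈ h2 / (1 + M_eff / (N * h2)), where N is the GWAS discovery sample size and M_eff is the effective number of independent SNPs. The value M_eff = 60,000 and the function name are assumptions made for illustration.

```python
def expected_prs_r2(h2_snp, n_discovery, m_eff=60_000):
    """Expected out-of-sample R2 of a PRS under the assumed approximation.

    h2_snp      : SNP-based heritability of the trait
    n_discovery : GWAS discovery sample size
    m_eff       : effective number of independent SNPs (assumed value)
    """
    return h2_snp / (1.0 + m_eff / (n_discovery * h2_snp))

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(n, round(expected_prs_r2(0.5, n), 3))
```

Under this approximation R2 only approaches h2SNP as the discovery sample grows very large, so the gap between the two tells you how much predictive signal in the observed SNPs is still unused.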

Laura (lhaver01@mail.bbk.ac.uk)
2021-06-10 04:46:16

@Loic Yengo You mention in video 3 that certain siblings will be more genetically related than others due to recombination - could you expand on that a bit please? I have never quite got my head around how recombination works!

šŸ‘ Aislinn Bowler
Loic Yengo (l.yengo@uq.edu.au)
2021-06-10 05:45:34

*Thread Reply:* Hi Laura, sure. The expected degree of DNA sharing between siblings is ~0.5. However, if we directly measure the proportion of DNA segments that are identical by descent (IBD) between siblings, we see that this proportion actually varies between, say, ~0.35 and ~0.65 (check Fig. 1 in this paper: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0020041). The reason is that during meiosis DNA from each parent is recombined in different ways before being passed on to each offspring. Therefore, recombination is the phenomenon responsible for within-family variation in IBD. Hope this clarifies things a little bit.
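
A toy simulation may help to see why the sharing varies around 0.5. This sketch treats the genome as a set of independently inherited blocks, which is a strong simplification of real recombination. The number of blocks (80) is a hypothetical value chosen only so that the spread is of the same order as the range quoted above.

```python
import random
from statistics import mean, pstdev

def sibling_ibd_sharing(n_blocks, rng):
    """Proportion of the genome two full siblings share IBD, with the genome
    modelled as `n_blocks` independently inherited blocks (toy assumption)."""
    shared = 0.0
    for _ in range(n_blocks):
        # In each block the sibs received the same paternal copy with prob. 1/2
        # and, independently, the same maternal copy with prob. 1/2.
        shared += 0.5 * (rng.random() < 0.5) + 0.5 * (rng.random() < 0.5)
    return shared / n_blocks

rng = random.Random(2021)
pairs = [sibling_ibd_sharing(80, rng) for _ in range(5000)]
print("mean sharing:", round(mean(pairs), 3))
print("SD of sharing:", round(pstdev(pairs), 3))
print("min / max:", round(min(pairs), 3), round(max(pairs), 3))
```

The mean stays at one half, but because each pair inherits a finite number of recombined blocks, individual pairs land above or below it.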

Nathan Bell (n.y.bell@student.vu.nl)
2021-06-10 05:43:50

@Loic Yengo if you get different heritability estimates when doing HE regression in GCTA for HE-CP and HE-SD how would you interpret that and what would be the next steps (if any) to follow up?

Loic Yengo (l.yengo@uq.edu.au)
2021-06-10 05:49:40

*Thread Reply:* Hi Nathan, good question. I'm not sure how large a difference you have in mind. The estimates may not be identical, but they should be largely similar by design. Do you have a particular example to share?

Nathan Bell (n.y.bell@student.vu.nl)
2021-06-10 06:07:31

*Thread Reply:* No I was just curious what you would do in case there was a difference (or if it's possible to have a meaningful difference)

matthew keller (matthew.c.keller@gmail.com)
2021-06-10 07:14:18

*Thread Reply:* They should be broadly similar but not necessarily. A couple of reasons off the top of my head they can differ: (1) HE typically doesn't use the diagonals of the GRM whereas GREML does. For common variants, this doesn't matter much because there are only n diagonals but (n*(n-1))/2 off-diagonals, so MUCH more information in the off-diagonals. However, as you move to rare variants and/or when there is much ancestry structure in your sample, this can be counter-balanced by the variance of the diagonals getting much higher (we're talking 1000s to 100k fold higher!) than the variance of the off-diagonals. This is a problem with the current way that GCTA figures diagonals which I don't think is well appreciated yet. (2) As I noted in a response above, if there is assortative mating on the trait (leading to long-range gametic disequilibrium), HE regression and GREML behave quite differently. Both are upwardly biased for realistic sample sizes (e.g., n < 100K) but GREML estimates go down as a function of n, asymptoting at h2_time0 as n -> inf whereas HE stay consistently high. See this preprint: https://www.biorxiv.org/content/10.1101/2021.03.18.436091v1.full

Nathan Bell (n.y.bell@student.vu.nl)
2021-06-10 08:14:35

*Thread Reply:* Thank you!
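
To show what "HE uses only the off-diagonals" means in practice, here is a minimal sketch of the cross-product form of HE regression (HE-CP) on simulated data: the product y_i*y_j is regressed on A_ij over all pairs i < j, and the slope estimates h2. This is a toy illustration in plain Python, not GCTA's implementation; the sample size, number of SNPs and simulated h2 = 0.5 are hypothetical.

```python
import random
from math import sqrt

def standardise_columns(geno):
    """Centre and scale each SNP (column) to mean 0 and variance 1."""
    n, m = len(geno), len(geno[0])
    cols = []
    for k in range(m):
        col = [row[k] for row in geno]
        mu = sum(col) / n
        sd = sqrt(sum((x - mu) ** 2 for x in col) / n)
        cols.append([(x - mu) / sd for x in col])
    return [[cols[k][i] for k in range(m)] for i in range(n)]  # back to n x m

def he_cp(z, y):
    """HE cross-product regression: OLS slope of y_i*y_j on A_ij over i < j."""
    n, m = len(z), len(z[0])
    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):  # off-diagonal pairs only, diagonals unused
            xs.append(sum(a * b for a, b in zip(z[i], z[j])) / m)  # A_ij
            ys.append(y[i] * y[j])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx, len(xs)

random.seed(4)
n, m, h2 = 300, 100, 0.5
freqs = [random.uniform(0.2, 0.5) for _ in range(m)]
geno = [[(random.random() < p) + (random.random() < p) for p in freqs]
        for _ in range(n)]
z = standardise_columns(geno)

# Phenotype: every SNP is causal, with effects drawn so that var(g) is about h2.
beta = [random.gauss(0, sqrt(h2 / m)) for _ in range(m)]
g = [sum(zi * b for zi, b in zip(row, beta)) for row in z]
y = [gi + random.gauss(0, sqrt(1 - h2)) for gi in g]
mu = sum(y) / n
sd = sqrt(sum((v - mu) ** 2 for v in y) / n)
y = [(v - mu) / sd for v in y]  # standardise the phenotype

h2_he, n_pairs = he_cp(z, y)
print(f"n = {n}: {n} diagonals vs {n_pairs} off-diagonal pairs")
print(f"HE-CP estimate of h2: {h2_he:.2f}")
```

With only 300 individuals the estimate is noisy, but the sketch makes the n versus n*(n-1)/2 contrast mentioned above explicit.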

Uku Vainik (ukuvainik@gmail.com)
2021-06-10 06:55:55

@Loic Yengo Great videos! It was very cool to see using distantly related people to get maximally unbiased heritability estimates. Does this approach get to test the equal environments assumption? Has the EEA survived such tests?

matthew keller (matthew.c.keller@gmail.com)
2021-06-10 07:08:50

*Thread Reply:* I'll jump in and hopefully Loic does as well. It's a good question. By equal env. assumption (EEA), I take you to mean the equal twin env. assumption (that rgenv for MZs = rgenv for DZs). GREML/HE don't directly test that assumption, but do by extension. h2 estimates from GREML/HE are expected to be lower to the degree that causal variants aren't tagged by SNPs used to build the GRM (in unrelated samples), and should therefore be a lower bound of the twin/family h2. Thus, observations of h2 in GREML/HE suggest that at least some of the h2 in trait X cannot be explained by violations in the EEA. As we move to using sequence data (and figure out the proper ways to perform GREML/HE in sequence data, which isn't trivial!), h2 from GREML/HE should approach the full narrow-sense h2, and at that point we'll get a clearer picture of the degree to which twin/family estimates have been biased this whole time. I strongly suspect they will end up being a bit biased depending on the trait, but little of this bias will be due to violations of EEA

Uku Vainik (ukuvainik@gmail.com)
2021-06-10 07:23:06

*Thread Reply:* Thank you! You unpacked this very well. Does limiting participants to distantly related improve the bias further?

Michel Nivard (m.g.nivard@vu.nl)
2021-06-10 06:59:37

@channel this is the slack channel for today!

Michel Nivard (m.g.nivard@vu.nl)
2021-06-10 07:03:38

How can we order the mug?

šŸ˜ Giulio Centorame, Anna Furtjes
ā˜€ LucĆ­a Colodro-Conde
matthew keller (matthew.c.keller@gmail.com)
2021-06-10 07:15:10

*Thread Reply:* I'm a Loic fan. I'd buy one!

:star_struck: Lucía Colodro-Conde

Barbara Molz (Barbara.Molz@mpi.nl)
2021-06-10 07:34:17

I have a quite specific LDSC question: we sometimes see (significant) negative heritability results when using partitioned heritability, and we struggle to find a proper explanation. Currently we think that these results might be biased by the small proportion of SNPs in some of our custom annotations. Also, --h2-cts seems to use a one-sided test, unlike the standard --h2 flag. Is there a particular reason for this?

Loic Yengo (l.yengo@uq.edu.au)
2021-06-10 07:40:08

*Thread Reply:* @Steven Gazal any suggestion?

matthew keller (matthew.c.keller@gmail.com)
2021-06-10 08:04:21

*Thread Reply:* I'm curious to hear people's thoughts on this one too

Steven Gazal (gazal@usc.edu)
2021-06-10 09:09:25

*Thread Reply:* Hi!
• Regarding the significant negative heritability, I am not sure I have a good explanation here... I think you broke the model with some annotation deeply depleted for heritability, which will have a negative regression coefficient and thus a negative heritability.
• Regarding --h2-cts: yes, it is doing a one-sided test, as it is specifically looking for cell types enriched in h2, and thus having a regression coefficient > 0. I think the --h2 flag only outputs z-scores and not P values, so it lets you decide whether you want to do a one- or two-sided test.
Does this help?
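
The one-sided versus two-sided distinction discussed in this thread can be made explicit with a short sketch that converts a coefficient z-score into both kinds of P value. It only uses the standard normal distribution; the function name is hypothetical and nothing here is taken from the LDSC source code.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one_sided, two_sided) P values for a z-score.

    one_sided tests H1: coefficient > 0 (enrichment only);
    two_sided tests H1: coefficient != 0 (enrichment or depletion).
    """
    norm = NormalDist()
    one_sided = 1.0 - norm.cdf(z)
    two_sided = 2.0 * (1.0 - norm.cdf(abs(z)))
    return one_sided, two_sided

for z in (1.96, 3.0, -3.0):
    one, two = p_values(z)
    print(f"z = {z:+.2f}  one-sided P = {one:.4f}  two-sided P = {two:.4f}")
```

Note that a strongly negative z-score (a depleted annotation) gives a one-sided P value close to 1, so it can only be flagged as significant with a two-sided test.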

Kazuki Okubo (okubo-kazuki087@g.ecc.u-tokyo.ac.jp)
2021-06-10 08:04:30

For [question 2 in exercise 2], about the higher variance of the diagonal elements of the GRM based on MAF < 0.05: could the difference in allele frequency itself have an effect on this variance, in addition to the effect of the number of variants used?

Priyadarshini Thirunavukkarasu (galaxie2485@yahoo.co.in)
2021-06-10 08:36:34

@Loic Yengo In exercise 4, we are estimating heritability without relatives and with relatives. Is the inflated heritability observed in samples with relatives not due to shared genetic factors? Thanks

Priyadarshini Thirunavukkarasu (galaxie2485@yahoo.co.in)
2021-06-10 08:39:46

@Loic Yengo What are the three sets of SNPs used for LDSC analysis? Why do we use three sets of SNPs and what is their relevance in LDSC analysis? Thanks

Loic Yengo (l.yengo@uq.edu.au)
2021-06-10 08:47:07

*Thread Reply:* Great question, Priyadarshini! Steven defined the three sets in the first "Background" section of the practical. In brief, one set is the set of SNPs directly used to run the regression. Here, we have ~1M of them. The other sets refer to how LD scores were calculated. To calculate the LD score of each of these ~1M SNPs, you need to sum the r^2 with the neighbouring SNPs. Depending on how many neighbouring SNPs you use, you can end up with, say, ~10M (including ~6M with a MAF>5%). So why? We don't need too many SNPs to get the regression right, so ~1M is enough. However, to capture the right amount of variation we may need to calculate LD scores relative to a denser set of sequenced SNPs (e.g., 10M). @Steven Gazal, do you want to add something here?

Steven Gazal (gazal@usc.edu)
2021-06-10 11:01:24

*Thread Reply:* Nothing to add! It is important to keep in mind that while you are doing a regression on "only" 1M SNPs, you are modeling the effects of 10M SNPs through the LD scores computed on a reference panel with 10M sequenced SNPs, and you are reporting effects on common reference SNPs (~6M). This is different from GCTA where you are estimating h2 tagged by all the SNPs that are in your data.
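
The "sum the r^2 with the neighbouring SNPs" step described in this thread can be sketched as follows. This is a toy version under stated assumptions: the window is defined as a number of neighbouring SNPs on each side, and plain r^2 is used, whereas real implementations typically define the window by genetic or physical distance and apply a bias correction to r^2. The tiny reference panel and function names are hypothetical.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ld_scores(genotypes, window):
    """LD score of SNP j = sum of r^2 between j and all reference SNPs
    within `window` positions of it (SNP j itself included)."""
    m = len(genotypes[0])
    cols = [[row[k] for row in genotypes] for k in range(m)]
    scores = []
    for j in range(m):
        lo, hi = max(0, j - window), min(m, j + window + 1)
        scores.append(sum(correlation(cols[j], cols[k]) ** 2 for k in range(lo, hi)))
    return scores

# Toy reference panel: 4 individuals x 3 SNPs.
# SNP 1 and SNP 2 are in perfect LD; SNP 3 is uncorrelated with both.
reference_panel = [
    [0, 0, 0],
    [0, 0, 2],
    [2, 2, 0],
    [2, 2, 2],
]
scores = ld_scores(reference_panel, window=2)
print(scores)
```

In the setting described above, the sum would run over the dense reference panel, while the regression itself would only use the scores of the smaller set of regression SNPs.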

Jet Termorshuizen (jet.termorshuizen@ki.se)
2021-06-10 09:02:38

Hi! About the LDSC part of the tutorial: why is the heritability estimate when using stratified LDSC higher compared to when we're not stratifying the LDSC analysis? And why should we trust the stratified LDSC estimate more?

šŸ‘ Anna Furtjes
Michel Nivard (m.g.nivard@vu.nl)
2021-06-10 09:12:32

*Thread Reply:* I hope @Steven Gazal or @Loic Yengo weighs in as well, but I'll try my best at an answer. If we run basic (unstratified) LD score regression, we are basically assuming a model where the true effect SNPs are distributed almost uniformly across the entire genome, and only LD determines how much effect we'll expect to observe at any SNP (more LD and you tag more causal SNPs). However, in reality the true effect SNPs can be expected to be found more in some parts of the genome (in or near genes, for example) than in others. The annotations used are based on prior ideas about where in the genome we might expect causal SNPs to be found. Those ideas are reasonable, so the baseline model is likely a better description of where in the genome true causal SNPs are and can be expected to provide a slightly better h2 estimate. Now, which model is better requires additional evaluation (especially if you compare competing sets of annotations in your stratified LD score analysis), and I don't know whether there is a best-practice way to compare models (internally we have used cross-validation to test which among various stratified LDSC models fits best)

Michel Nivard (m.g.nivard@vu.nl)
2021-06-10 09:13:09

*Thread Reply:* Don't trust my answer fully until you hear from the others!

Steven Gazal (gazal@usc.edu)
2021-06-10 10:58:23

*Thread Reply:* You're good! It is known that 1) per-SNP heritability strongly varies according to MAF and LD (SNPs with high MAF and/or low LD explain more variance); 2) these MAF- and LD-architectures bias heritability estimates. The second model includes MAF and LD annotations that help to correct for this bias.

Steven Gazal (gazal@usc.edu)
2021-06-10 10:58:27

*Thread Reply:* Does this help?

Jet Termorshuizen (jet.termorshuizen@ki.se)
2021-06-11 02:54:21

*Thread Reply:* Yes, it helps! One more follow-up question: you describe that we should not use LDSC heritability estimates, but that heritability estimates of the baseline-LD model can be used with extreme caution. Do you recommend to not use those stratified LDSC heritability estimates at all or would it be informative to report both heritability estimates from GCTA and S-LDSC?

Ciarrah-Jane Barry (ciarrah.barry@bristol.ac.uk)
2021-06-10 09:20:36

When would you use LDSR over GREML-SNP?

matthew keller (matthew.c.keller@gmail.com)
2021-06-10 09:25:07

*Thread Reply:* If you have the raw data in hand and computational constraints (in terms of RAM and computational time) are not an issue, then I'd go with GREML. But - that's not the world we live in. The advantage of LDSR over GREML (and why it's become more used) is that one can use LDSR without having the raw data (just having the sumstats) and that it can be done MUCH more computationally cheaply than GREML

Rob Kirkpatrick (robert.kirkpatrick@vcuhealth.org)
2021-06-10 09:27:55

*Thread Reply:* This paper, https://www.nature.com/articles/ng.3941 , contains some discussion that's relevant to your question.

Abigail ter Kuile (k1456980@kcl.ac.uk)
2021-06-10 10:05:18

Thanks for the great lectures on heritability and GCTA, the topics were broken down in a really clear way! Are there any power calculations to estimate genetic correlations in LDSC regression? GCTA-GREML has a great power calculator, but as LDSC regression requires more power, is there another tool that can be used specifically for LDSC regression?

Abigail ter Kuile (k1456980@kcl.ac.uk)
2021-06-10 10:12:30

*Thread Reply:* In relation to this, are there any power calculations for Genomic SEM, and would this be different when using LDSC vs HDL in Genomic SEM? I'm aware that an LDSC SNP-heritability Z score less than 5 is an indicator that the GWAS summary statistics might not be powered enough for gSEM analyses. Could you use this threshold for a power calculation? @Michel Nivard

Michel Nivard (m.g.nivard@vu.nl)
2021-06-10 10:26:40

*Thread Reply:* I'll let the LDSC people speak for their lack of a power calculator 😉 but as far as GenomicSEM is concerned, it's really hard to make a general power calculator when people can fit a great variety of models. Power for what? Power to distinguish one latent variable model from another? Power to detect that a SNP acts on the latent factor, or power to test whether we can reject the premise that a SNP acts via a latent factor (instead of influencing traits directly)? The underlying power in LDSC will probably depend heavily on the genetic architecture (how many causal SNPs there are, and how they are spread across the genome). With respect to LDSC vs HDL, I think HDL should win out IF you have a proper reference LD set (the authors state HDL is sensitive to SNPs that are in the reference but missing from the GWAS). So if you, for example, use a medium-sized (30-50k) Swedish, Japanese or Norwegian dataset for your thesis or over a multi-year period, HDL could offer power gains, and it could be worth taking the time to create an HDL LD reference.

šŸ‘ Loic Yengo, Abigail ter Kuile
Michel Nivard (m.g.nivard@vu.nl)
2021-06-10 10:27:53

*Thread Reply:* All this is to say this is a really great question, and in any multi-year project/paper it's probably worth it to do some power simulations. Be sure to reach out if you need guidance on those.

šŸ‘ Abigail ter Kuile
Abigail ter Kuile (k1456980@kcl.ac.uk)
2021-06-14 05:48:49

*Thread Reply:* Excellent, thanks so much Michel for the detailed answer. Looking forward to today's workshop!

Loic Yengo (l.yengo@uq.edu.au)
2021-06-10 15:43:08

Today's Practical has been (slightly) updated. Please check here: /faculty/loic/2021/PracticalGuidelines/Day4practicalBoulder2021v4.docx or /faculty/loic/2021/PracticalGuidelines/Day4practicalBoulder2021v4.pdf or https://docs.google.com/document/d/1OZilaO1GhV2vz5iAm4wl-TyiV2LxezBzhecBXxpHmKc/edit?usp=sharing

Rob Kirkpatrick (robert.kirkpatrick@vcuhealth.org)
2021-06-10 16:16:17

I'm curious about building ldsc from source on my computer here at home. What is the minimal set of dependencies necessary to do so? Just the conda package manager, and the dependencies listed in environment.yml ?

Rob Kirkpatrick (robert.kirkpatrick@vcuhealth.org)
2021-06-10 18:06:54

*Thread Reply:* Or to put it another way, is a full anaconda installation really necessary to build ldsc?

David Evans (d.evans1@uq.edu.au)
2021-06-10 16:58:24

If you have not seen it before, do also check out LD Hub, a user-friendly way to run LD score regression to calculate SNP heritability and genetic correlations: http://ldsc.broadinstitute.org/ 🙂

šŸ‘ Loic Yengo, Michel Nivard
Kristen (kristenlhopkins@gmail.com)
2021-06-10 18:41:36

Hi all, just a worksheet layout suggestion for this practical for next year, from the absolute beginners in my group: Practical Part 1 was really clear and easy to follow, even for beginners. Practical Part 2 was pitched a bit too advanced for beginners, because we do not yet have the skills to write our own code to answer the questions. It would be really helpful if the worksheet could provide the code required in the main body of the text. I see now (afterwards) that the code is provided in the answer section at the end, but none of us found this during the tutorial. There is so much new content to plough through each day that we're not getting time to read through the worksheets before the tutorial starts. Thank you!

Lucía Colodro-Conde (lucia.colodroconde@qimrberghofer.edu.au)
2021-06-10 19:04:08

*Thread Reply:* Hi Kristen, I think Loic said in the presentation of the practical that the answers were at the end of the document... and to check them if required at any time... Maybe this information could have been reiterated, I know there are many documents and things to coordinate!

Kristen (kristenlhopkins@gmail.com)
2021-06-10 20:08:22

*Thread Reply:* Thanks Lucia - we didn't appreciate that code would be included with the answers. Kristen

Loic Yengo (l.yengo@uq.edu.au)
2021-06-10 22:04:09

*Thread Reply:* Thanks Kristen for the valuable feedback. We will take that into account for the following sessions and next year's workshop. Glad you found Part 1 didactic 😃

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)
2021-06-11 02:35:32

Hi, @Steven Gazal, I am new in the field and I am getting to know LDSC. I was wondering why the threshold for the intercept is 1.

Steven Gazal (gazal@usc.edu)
2021-06-11 09:13:22

*Thread Reply:* Hi Lucia! The idea behind LDSC is that when you regress chi-square statistics on LD scores, the slope is proportional to heritability, and the intercept tells you how stratification impacts your GWAS results. Regarding your question, a good way to visualize that is to consider a null GWAS with no heritability (slope=0) and no stratification; in that case the mean chi-square is 1, so we expect the intercept to be 1. Does this make sense?

👌 Lucía de Hoyos

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)
2021-06-14 03:50:09

*Thread Reply:* Hi Steven. Yes, it does make sense. Thank you!
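
Steven's explanation of the slope and the intercept can be checked on noise-free toy data. This minimal sketch assumes the standard LDSC expectation E[chi2_j] = N*h2*l_j/M + 1 in the absence of confounding, and fits an unweighted least-squares line; real LDSC uses a weighted regression with block-jackknife standard errors. The values of N, M, h2 and the LD scores are hypothetical.

```python
def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

N = 50_000        # GWAS sample size (hypothetical)
M = 1_000_000     # number of SNPs the heritability is spread over (hypothetical)
h2 = 0.3          # true SNP heritability used to build the toy data
ld = [5.0, 20.0, 60.0, 110.0, 250.0]   # toy LD scores

# Expected chi-square of each SNP with no stratification (intercept of 1).
chi2 = [1.0 + N * h2 * l / M for l in ld]

slope, intercept = ols(ld, chi2)
h2_hat = slope * M / N
print("intercept:", round(intercept, 6))
print("h2 estimate:", round(h2_hat, 6))

# A null GWAS (h2 = 0, no stratification): every expected chi-square is 1,
# so the slope is 0 and the intercept is again 1.
null_slope, null_intercept = ols(ld, [1.0 for _ in ld])
print("null slope and intercept:", null_slope, null_intercept)
```

An intercept above 1 in real data would then point to inflation that does not scale with LD, such as population stratification.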