Slack Export - #day09-rare-saige

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-04-26 10:19:45

@Jeff Lessem (he/him) has joined the channel

Wei Zhou (wzhou@broadinstitute.org)

2021-04-30 09:58:05

@Wei Zhou has joined the channel

Rounak Dey (rdey@hsph.harvard.edu)

2021-04-30 09:58:05

@Rounak Dey has joined the channel

Zhangchen Zhao (zczhao@umich.edu)

2021-04-30 09:58:05

@Zhangchen Zhao has joined the channel

Tetyana Zayats (tzayats@broadinstitute.org)

2021-04-30 09:58:05

@Tetyana Zayats has joined the channel

Kristin Tsuo (ktsuo@broadinstitute.org)

2021-04-30 09:58:05

@Kristin Tsuo has joined the channel

Gunn-Helen Moen (g.moen@uq.edu.au)

2021-05-03 13:13:42

@Gunn-Helen Moen has joined the channel

Test Student (test-student@ibg.colorado.edu)

2021-05-06 11:38:59

@Test Student has joined the channel

Bridget Joyner (bnj13@my.fsu.edu)

2021-05-10 13:29:35

@Bridget Joyner has joined the channel

Sally Kuo (ickuo@vcu.edu)

2021-05-10 13:30:21

@Sally Kuo has joined the channel

Aislinn Bowler (aislinnbowler@gmail.com)

2021-05-10 13:30:28

@Aislinn Bowler has joined the channel

Morgan Driver (driverm@vcu.edu)

2021-05-10 13:31:04

@Morgan Driver has joined the channel

Sarah Brislin (she/her) (sarah.brislin@gmail.com)

2021-05-10 13:31:38

@Sarah Brislin (she/her) has joined the channel

Lisa Dinkler (lisa.dinkler@gu.se)

2021-05-10 13:31:44

@Lisa Dinkler has joined the channel

Katie Bountress (kaitlin.bountress@vcuhealth.org)

2021-05-10 13:32:21

@Katie Bountress has joined the channel

Peter Tanksley (peter.tanksley@austin.utexas.edu)

2021-05-10 13:32:33

@Peter Tanksley has joined the channel

Tong Chen (tuc548@psu.edu)

2021-05-10 13:34:05

@Tong Chen has joined the channel

Charlotte Viktorsson (viktorsson.charlotte@gmail.com)

2021-05-10 13:34:35

@Charlotte Viktorsson has joined the channel

Jacob Kunkel (kunke104@umn.edu)

2021-05-10 13:35:33

@Jacob Kunkel has joined the channel

Matthieu de Hemptinne (matthieu.dehemptinne@gmail.com)

2021-05-10 13:36:02

@Matthieu de Hemptinne has joined the channel

Jay Ross (jay.ross@mail.mcgill.ca)

2021-05-10 13:38:34

@Jay Ross has joined the channel

Sam Freis (she/her) (Samantha.Freis@colorado.edu)

2021-05-10 13:38:42

@Sam Freis (she/her) has joined the channel

Jeremy Elman (jaelman@health.ucsd.edu)

2021-05-10 13:38:57

@Jeremy Elman has joined the channel

Spencer Moore (spmo3925@colorado.edu)

2021-05-10 13:39:53

@Spencer Moore has joined the channel

Maizy Brasher (mabr7162@colorado.edu)

2021-05-10 13:39:54

@Maizy Brasher has joined the channel

Jenny Phan (jphan5@wisc.edu)

2021-05-10 13:39:59

@Jenny Phan has joined the channel

Meng Huang (meng.huang.cn@gmail.com)

2021-05-10 13:41:18

@Meng Huang has joined the channel

Jung Chen (jchen378@ucmerced.edu)

2021-05-10 13:41:59

@Jung Chen has joined the channel

Stephanie Zellers (she/her/hers) (zelle063@umn.edu)

2021-05-10 13:42:18

@Stephanie Zellers (she/her/hers) has joined the channel

Grace Wu (yakew@email.unc.edu)

2021-05-10 13:42:32

@Grace Wu has joined the channel

Gladi Thng (s2124928@ed.ac.uk)

2021-05-10 13:43:48

@Gladi Thng has joined the channel

Zoe Schmilovich (zoe.schmilovich@mail.mcgill.ca)

2021-05-10 13:43:51

@Zoe Schmilovich has joined the channel

Olivia Rennie (olivia.rennie@alum.utoronto.ca)

2021-05-10 13:43:59

@Olivia Rennie has joined the channel

Christina Sheerin (Christina.sheerin@vcuhealth.org)

2021-05-10 13:44:00

@Christina Sheerin has joined the channel

William McAuliffe (williamhbmcauliffe@gmail.com)

2021-05-10 13:44:18

@William McAuliffe has joined the channel

Chloe Myers (cmyer011@ucr.edu)

2021-05-10 13:44:21

@Chloe Myers has joined the channel

Francis Vergunst (he/him) (francis.vergunst@umontreal.ca)

2021-05-10 13:44:34

@Francis Vergunst (he/him) has joined the channel

Ravi Bhatt (ravibot93@gmail.com)

2021-05-10 13:44:49

@Ravi Bhatt has joined the channel

Nathan Bell (n.y.bell@student.vu.nl)

2021-05-10 14:46:30

@Nathan Bell has joined the channel

Emil Uffelmann (e.uffelmann@vu.nl)

2021-05-10 14:46:48

@Emil Uffelmann has joined the channel

Kristen Kelly (k.m.kelly@vu.nl)

2021-05-10 14:47:50

@Kristen Kelly has joined the channel

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-06-08 15:28:57

@Jeff Lessem (he/him) has renamed the channel from "rare-saige" to "day09-rare-saige"

Wei Zhou (wzhou@broadinstitute.org)

2021-06-16 14:21:12

Hi @channel, excited to see you tomorrow in the rare+SAIGE session! We will try 4 methods, one for each of the 4 videos, to perform genetic association tests for binary phenotypes. We will use RStudio to run the commands. Here is the material for tomorrow’s practical: https://github.com/weizhou0/ISGW_rare_SAIGE_hands_on/wiki/Day-9-Rare-and-SAIGE Please feel free to post any questions on this Slack channel.

GitHub

weizhou0/ISGW_rare_SAIGE_hands_on

Contribute to weizhou0/ISGW_rare_SAIGE_hands_on development by creating an account on GitHub.

Original URL: https://github.com/weizhou0/ISGW_rare_SAIGE_hands_on/wiki/Day-9-Rare-and-SAIGE

👍 Giulio Centorame, Tetyana Zayats

👍:skin_tone_4: Pamela Romero

Shannon O'Connor (oconnors@montclair.edu)

2021-06-16 16:54:42

Hello! Will the lecture slides be made available? Thank you!

Wei Zhou (wzhou@broadinstitute.org)

2021-06-16 17:27:19

*Thread Reply:* Hi! They will be put on the website on the day’s page shortly. Thanks!

👍 Shannon O'Connor

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-06-16 17:31:21

*Thread Reply:* Parts 2-4 are up, and part 1 will be added when it's available to me.

👍 Wei Zhou, Shannon O'Connor

Shannon O'Connor (oconnors@montclair.edu)

2021-06-16 17:41:33

*Thread Reply:* Great! Thank you!

Shannon D'Urso (s.durso@uq.edu.au)

2021-06-16 22:03:31

Hi! In the GWAS in large-scale biobanks and cohorts lecture, one of the limitations is that ‘asymptotic approaches were used to achieve scalability for large data sizes, whose performance may be poor when sample sizes are too small’. I was wondering how small is ‘too small’ and if you could elaborate on why this is the case?

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 13:50:46

*Thread Reply:* Hi, we have tried sample sizes as low as 1000 in the UKBB data and it still works fine. It depends on how heavy the sample relatedness is in the data set.

Shannon D'Urso (s.durso@uq.edu.au)

2021-06-17 16:14:45

*Thread Reply:* Thank you for clarifying 🙂

Penelope Lind (penelope.lind@qimrberghofer.edu.au)

2021-06-17 00:04:34

@Wei Zhou I am trying to run Step 2 (set-based association tests) of Part 4 and get the error below. However, when I look at this location in my home drive, the SKAT.so file is sitting there. Everything has worked up to there. How do I fix this?

Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object '/home/penell44/R/x86_64-pc-linux-gnu-library/4.0/SKAT/libs/SKAT.so':
  libR.so: cannot open shared object file: No such file or directory
Calls: SPAGMMATtest ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Timing stopped at: 0.007 0 0.014
Execution halted

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 06:55:36

*Thread Reply:* Hi, did you try to install SAIGE on the cluster, or did you call the Singularity container directly?

Penelope Lind (penelope.lind@qimrberghofer.edu.au)

2021-06-17 15:28:56

*Thread Reply:* I ran this using the singularity

Penelope Lind (penelope.lind@qimrberghofer.edu.au)

2021-06-17 17:11:05

*Thread Reply:* All good - it worked this morning.

Jet Termorshuizen (jet.termorshuizen@ki.se)

2021-06-17 01:20:50

Hi! When is a case-control ratio considered "unbalanced", and is it suitable to use the saddlepoint approximation (SPA) test? Is the border a ratio of 1:5?

👍 Abigail ter Kuile

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 06:54:54

*Thread Reply:* Usually we start seeing inflation when the case-control ratio is < 1:10.

Giulio Centorame (giulio.centorame@outlook.it)

2021-06-17 05:53:27

Hi, I am very intrigued by the Phecodes, would you mind expanding a little bit on what they are and what is the difference from manually-constructed phenotypes? Can you list some examples?

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 06:49:21

*Thread Reply:* Phecodes are a curated database that helps map ICD codes to diseases: https://phewascatalog.org/phecodes_icd10

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 06:53:06

*Thread Reply:* Here is a paper on phecodes https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0175508

journals.plos.org

Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record

Objective To compare three groupings of Electronic Health Record (EHR) billing codes for their ability to represent clinically meaningful phenotypes and to replicate known genetic associations. The three tested coding systems were the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, the Agency for Healthcare Research and Quality Clinical Classification Software for ICD-9-CM (CCS), and manually curated “phecodes” designed to facilitate phenome-wide association studies (PheWAS) in EHRs. Methods and materials We selected 100 disease phenotypes and compared the ability of each coding system to accurately represent them without performing additional groupings. The 100 phenotypes included 25 randomly-chosen clinical phenotypes pursued in prior genome-wide association studies (GWAS) and another 75 common disease phenotypes mentioned across free-text problem lists from 189,289 individuals. We then evaluated the performance of each coding system to replicate known associations for 440 SNP-phenotype pairs. Results Out of the 100 tested clinical phenotypes, phecodes exactly matched 83, compared to 53 for ICD-9-CM and 32 for CCS. ICD-9-CM codes were typically too detailed (requiring custom groupings) while CCS codes were often not granular enough. Among 440 tested known SNP-phenotype associations, use of phecodes replicated 153 SNP-phenotype pairs compared to 143 for ICD-9-CM and 139 for CCS. Phecodes also generally produced stronger odds ratios and lower p-values for known associations than ICD-9-CM and CCS. Finally, evaluation of several SNPs via PheWAS identified novel potential signals, some seen in only using the phecode approach. Among them, rs7318369 in PEPD was associated with gastrointestinal hemorrhage. Conclusion Our results suggest that the phecode groupings better align with clinical diseases mentioned in clinical practice or for genomic studies. 
ICD-9-CM, CCS, and phecode groupings all worked for PheWAS-type studies, though the phecode groupings produced superior results.

Original URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0175508

Abigail ter Kuile (k1456980@kcl.ac.uk)

2021-06-17 05:57:49

Thanks for the great lectures! What are the advantages of using SAIGE for GWAS of common variants in binary phenotypes over the more recently developed methods Regenie and FastGWA-GLMM?

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 14:02:21

*Thread Reply:* This is a great question. These methods have pros and cons, and it would be nice to compare them systematically in different scenarios. SAIGE uses Average Information REML to fit the null logistic mixed model, which is different from what Regenie uses. Regenie improves computational efficiency by analyzing multiple phenotypes together, which requires imputing missing phenotypes so that all phenotypes are analyzed on the same samples; this may not be the ideal approach for some data sets. FastGWA-GLMM fits the null logistic mixed model using a sparse GRM instead of a full GRM, which is certainly much faster. This works well for data sets with light sample relatedness, such as UKBB, but for data with very heavy sample relatedness, using a sparse GRM is not quite feasible. BTW, SAIGE can also fit the null model using a sparse GRM with the argument --useSparseGRMtoFitNULL

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)

2021-06-17 06:43:17

Hi, in the lectures you mention sparse GRMs. How can you obtain these? Any references on that topic?

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 06:46:36

*Thread Reply:* SAIGE-GENE has a step 0 script to generate a sparse GRM: https://github.com/weizhou0/ISGW_rare_SAIGE_hands_on/wiki/Part-4-SAIGE-GENE#step-0-creating-a-sparse-grm There are also other programs that can be used to generate a GRM, such as GCTA and KING https://kingrelatedness.com/

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)

2021-06-17 06:52:28

*Thread Reply:* Thank you for the links 😊 , are there also papers that describe sparse GRM matrices in more detail?

matthew keller (matthew.c.keller@gmail.com)

2021-06-17 10:43:03

*Thread Reply:* A sparse GRM is one in which off-diagonal values of pihat that are small enough (e.g., < .05) are set to 0. I think it was first described in the Zaitlen et al. (2013) PLoS Genetics paper.

👌 Lucía de Hoyos
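The thresholding described in this thread can be sketched in a few lines. This is only a toy illustration in plain Python, not the SAIGE-GENE step 0 script: the function name `sparsify_grm`, the 0.05 default, and the example matrix are all assumptions made for the example.

```python
def sparsify_grm(grm, threshold=0.05):
    """Return a copy of a square GRM with small off-diagonal entries set to 0.

    Hypothetical helper: off-diagonal values below `threshold` are zeroed,
    diagonal values are always kept.
    """
    n = len(grm)
    sparse = [row[:] for row in grm]  # copy so the input is left untouched
    for i in range(n):
        for j in range(n):
            if i != j and sparse[i][j] < threshold:
                sparse[i][j] = 0.0
    return sparse


# Toy GRM (made-up values): samples 0 and 1 look like first-degree relatives,
# sample 2 is essentially unrelated to both.
toy_grm = [
    [1.00, 0.50, 0.01],
    [0.50, 1.02, -0.02],
    [0.01, -0.02, 0.98],
]
sparse_grm = sparsify_grm(toy_grm)
```

Only the 0.50 entries survive off the diagonal here, which is why a sparse GRM is cheap to store and use when most pairs of samples are unrelated.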

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)

2021-06-18 02:00:01

*Thread Reply:* Okay, thank you, Matthew.

Kai Lim (he/him) (kai.lim@kcl.ac.uk)

2021-06-17 07:28:57

Hi! I have a question related to the practical. You have probably covered this, but would you mind elaborating a little more on what is meant/happening when we “call the singularity container of SAIGE/SAIGE-GENE”?

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)

2021-06-17 07:40:51

*Thread Reply:* The brief answer is that when you call Singularity, you are setting up a special environment designed to run SAIGE.

The longer answer is below.

There are some definitions to get out of the way first. A virtual machine is a computer that is emulated by another computer. For example, it is a way to run Linux within your Windows computer.

A container is sort of a mini-virtual machine. It is like a zip file which contains all of the files and programs necessary to do some task.

SAIGE depends on certain versions of Python and R, so it is easiest to install as a container, so it doesn't interfere with other things that require different versions of Python and R.

"Singularity" is just a container running method. Perhaps you've heard of Docker or Kubernetes as other methods of running containers.

So when you call the Singularity container for SAIGE, what you're doing is "booting" another computer (which is just emulated in the computer you're logged into) which is running a system set up in the special way necessary to run SAIGE.

I don't know if that is more or less or different than what you wanted to know.

Brooke Wolford (bwolford@umich.edu)

2021-06-17 07:56:35

Some more info on inverse normalization which you may have noticed was recommended for quantitative traits: • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2921808/ • https://cran.r-project.org/web/packages/RNOmni/vignettes/RNOmni.html

PubMed Central (PMC)

Rank-Based Inverse Normal Transformations are Increasingly Used, But are They Merited?

Many complex traits studied in genetics have markedly non-normal distributions. This often implies that the assumption of normally distributed residuals has been violated. Recently, inverse normal transformations (INTs) have gained popularity among genetics ...

Original URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2921808/
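For readers who want to see what a rank-based inverse normal transformation does, here is a minimal sketch in plain Python. It is not the RNOmni implementation: the function name `rank_inverse_normal` is hypothetical, and the Blom offset of 3/8 is assumed here as one common choice. Tied values receive their average rank.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Rank-based inverse normal transformation (hypothetical helper).

    Each value is replaced by the standard normal quantile of
    (rank - offset) / (n - 2 * offset + 1); ties share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [
        standard_normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
        for rank in ranks
    ]


# Made-up, strongly skewed trait values.
trait = [2.3, 150.0, 0.1, 7.5, 1.2]
transformed = rank_inverse_normal(trait)
```

The outlier 150.0 ends up as the largest z-score but no longer dominates the scale, because only the ranks of the values are used.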

Anna Furtjes (anna.furtjes@kcl.ac.uk)

2021-06-17 08:59:31

I was wondering if you could please share a reference discussing that heritability estimates from LMMs are not accurate, but that it's okay for genetic correlations? Thank you 🙂

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 09:04:04

*Thread Reply:* I haven’t found a published paper that discusses it. The notes on the LDSC wiki page https://github.com/bulik/ldsc/wiki mention it.

GitHub

bulik/ldsc

LD Score Regression (LDSC). Contribute to bulik/ldsc development by creating an account on GitHub.

Original URL: https://github.com/bulik/ldsc/wiki

Anna Furtjes (anna.furtjes@kcl.ac.uk)

2021-06-17 09:17:30

*Thread Reply:* Interesting, thank you!

Zoe Schmilovich (zoe.schmilovich@mail.mcgill.ca)

2021-06-17 16:14:45

Hello! If I am performing a logistic regression, is it recommended that I account for relatives in my sample by adding family ID as a random variable (i.e., (1|FID)) or by including the GRM as a variable? Thank you!

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 18:16:39

*Thread Reply:* Hello! It depends on what sample relatedness you’d like to account for. Using (1|FID) accounts for sample relatedness within families, while the GRM accounts for relatedness between every pair of samples in the data, whether or not the samples are in the same family.

👍 Zoe Schmilovich
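The difference between the two options can be made concrete with a toy example. A random intercept per family implies a covariance pattern that is the same constant for every pair in the same family and zero for every other pair, whereas a GRM has a graded value for each pair. The sketch below is plain Python with made-up family IDs and relatedness values; the helper names and the 0.05 cutoff are hypothetical.

```python
def family_indicator_matrix(fids):
    """Covariance pattern implied by (1|FID): 1 if same family, else 0."""
    return [[1.0 if a == b else 0.0 for b in fids] for a in fids]


def related_pairs_outside_family(fids, grm, threshold=0.05):
    """Pairs the GRM flags as related that a family random effect ignores."""
    pairs = []
    for i in range(len(fids)):
        for j in range(i + 1, len(fids)):
            if fids[i] != fids[j] and grm[i][j] >= threshold:
                pairs.append((i, j))
    return pairs


# Made-up data: samples 0 and 1 are siblings in family F1; samples 2 and 3
# were recruited as separate families but are in fact cousins.
fids = ["F1", "F1", "F2", "F3"]
toy_grm = [
    [1.00, 0.50, 0.00, 0.01],
    [0.50, 1.00, 0.01, 0.00],
    [0.00, 0.01, 1.00, 0.12],
    [0.01, 0.00, 0.12, 1.00],
]
pattern = family_indicator_matrix(fids)
missed = related_pairs_outside_family(fids, toy_grm)
```

In this toy data the cousin pair (2, 3) is visible in the GRM but receives zero covariance under (1|FID), which is the kind of relatedness the reply above refers to.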

Katerina Zorina-Lichtenwalter (kazo7929@colorado.edu)

2021-06-17 18:00:10

I am wondering why the theta output in Step1 of SAIGE is not a good estimate of heritability even though it estimates the variance in the phenotype explained by the GRM. And then, for what purpose may it be used?

Wei Zhou (wzhou@broadinstitute.org)

2021-06-17 18:13:41

*Thread Reply:* Great question! Tau (stored as theta in the model object) is a vector with 2 elements. The first element is the variance component parameter for the error term and the second is the one for the GRM (genetic relationship matrix). Tau can be extracted from the SAIGE null model output in R with load("model.rda"); tau = modglmm$theta. For quantitative traits fitted with the linear mixed model, h2 = tau[2]/(tau[1]+tau[2]). For binary traits fitted with the logistic mixed model (where tau[1] is always 1), h2_liability = tau[2]/(tau[2]+pi^2/3). Note, however, that this heritability is a point estimate of the proportion of phenotypic variance explained by the GRM, which is not the same as the heritability estimated by LDSC. Also, we have noticed that SAIGE underestimates h2 for binary traits: the penalized quasi-likelihood that SAIGE uses to fit the null logistic mixed model is known to be biased for heritability estimation, although it works well for adjusting for sample relatedness.
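The two formulas in this reply are simple enough to check by hand. Here is a small Python sketch of the arithmetic, assuming the two tau values have already been read out of the SAIGE null model. The function names and the example tau values are made up for illustration, and Python indexes from 0 where the R expressions index from 1.

```python
import math


def h2_quantitative(tau):
    """h2 = tau[2] / (tau[1] + tau[2]) in the R notation of the reply."""
    tau_error, tau_grm = tau  # R's tau[1] and tau[2]
    return tau_grm / (tau_error + tau_grm)


def h2_liability_binary(tau):
    """h2_liability = tau[2] / (tau[2] + pi^2 / 3) in the R notation."""
    _, tau_grm = tau  # tau[1] is fixed at 1 for the logistic mixed model
    return tau_grm / (tau_grm + math.pi ** 2 / 3)


# Hypothetical variance components, not taken from any real SAIGE run.
quantitative_tau = (0.6, 0.4)
binary_tau = (1.0, 0.5)

h2_linear = h2_quantitative(quantitative_tau)
h2_logistic = h2_liability_binary(binary_tau)
```

As the reply notes, the binary-trait value should be read with caution, since the estimate from the logistic mixed model is known to be biased downward.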

Katerina Zorina-Lichtenwalter (kazo7929@colorado.edu)

2021-06-17 18:22:21

*Thread Reply:* hmmmm. I'll have to think more about this 🙂 It's comparable to the h2 estimated in BOLT-LMM, right? which is also not a good estimator of h2?

Wei Zhou (wzhou@broadinstitute.org)

2021-06-18 08:45:50

*Thread Reply:* Yes. Sorry, I forgot to mention that for quantitative traits using linear mixed models, the heritability estimates in SAIGE and BOLT-LMM are the same. But for binary phenotypes using logistic mixed models, the h2 in SAIGE is underestimated.

Katerina Zorina-Lichtenwalter (kazo7929@colorado.edu)

2021-06-18 10:21:19

*Thread Reply:* ahh, ok. Got it!
