Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-04-26 10:20:31

@Jeff Lessem (he/him) has joined the channel

Danielle (danielle.posthuma@gmail.com)
2021-04-30 09:58:58

@Danielle has joined the channel

M Nagel (m.nagel@vu.nl)
2021-04-30 09:58:58

@M Nagel has joined the channel

Doug Wightman (d.p.wightman@vu.nl)
2021-04-30 09:58:58

@Doug Wightman has joined the channel

Kristen Kelly (k.m.kelly@vu.nl)
2021-04-30 09:58:58

@Kristen Kelly has joined the channel

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-04-30 09:58:58

@Christiaan de Leeuw has joined the channel

Elleke Tissink (e.p.tissink@vu.nl)
2021-04-30 09:58:58

@Elleke Tissink has joined the channel

Emil Uffelmann (e.uffelmann@vu.nl)
2021-04-30 09:58:59

@Emil Uffelmann has joined the channel

Gunn-Helen Moen (g.moen@uq.edu.au)
2021-05-03 13:13:41

@Gunn-Helen Moen has joined the channel

Mark Adams (mark.adams@ed.ac.uk)
2021-05-04 02:16:50

@Mark Adams has joined the channel

Test Student (test-student@ibg.colorado.edu)
2021-05-06 11:38:57

@Test Student has joined the channel

Bridget Joyner (bnj13@my.fsu.edu)
2021-05-10 13:00:05

@Bridget Joyner has joined the channel

Sally Kuo (ickuo@vcu.edu)
2021-05-10 13:30:20

@Sally Kuo has joined the channel

Aislinn Bowler (aislinnbowler@gmail.com)
2021-05-10 13:30:27

@Aislinn Bowler has joined the channel

Morgan Driver (driverm@vcu.edu)
2021-05-10 13:31:03

@Morgan Driver has joined the channel

Sarah Brislin (she/her) (sarah.brislin@gmail.com)
2021-05-10 13:31:37

@Sarah Brislin (she/her) has joined the channel

Lisa Dinkler (lisa.dinkler@gu.se)
2021-05-10 13:31:42

@Lisa Dinkler has joined the channel

Katie Bountress (kaitlin.bountress@vcuhealth.org)
2021-05-10 13:32:20

@Katie Bountress has joined the channel

Peter Tanksley (peter.tanksley@austin.utexas.edu)
2021-05-10 13:32:32

@Peter Tanksley has joined the channel

Tong Chen (tuc548@psu.edu)
2021-05-10 13:34:04

@Tong Chen has joined the channel

Charlotte Viktorsson (viktorsson.charlotte@gmail.com)
2021-05-10 13:34:34

@Charlotte Viktorsson has joined the channel

Jacob Kunkel (kunke104@umn.edu)
2021-05-10 13:35:31

@Jacob Kunkel has joined the channel

Matthieu de Hemptinne (matthieu.dehemptinne@gmail.com)
2021-05-10 13:35:59

@Matthieu de Hemptinne has joined the channel

Jay Ross (jay.ross@mail.mcgill.ca)
2021-05-10 13:38:33

@Jay Ross has joined the channel

Sam Freis (she/her) (Samantha.Freis@colorado.edu)
2021-05-10 13:38:41

@Sam Freis (she/her) has joined the channel

Jeremy Elman (jaelman@health.ucsd.edu)
2021-05-10 13:38:55

@Jeremy Elman has joined the channel

Maizy Brasher (mabr7162@colorado.edu)
2021-05-10 13:39:52

@Maizy Brasher has joined the channel

Spencer Moore (spmo3925@colorado.edu)
2021-05-10 13:39:52

@Spencer Moore has joined the channel

Jenny Phan (jphan5@wisc.edu)
2021-05-10 13:39:58

@Jenny Phan has joined the channel

Meng Huang (meng.huang.cn@gmail.com)
2021-05-10 13:41:17

@Meng Huang has joined the channel

Jung Chen (jchen378@ucmerced.edu)
2021-05-10 13:41:58

@Jung Chen has joined the channel

Stephanie Zellers (she/her/hers) (zelle063@umn.edu)
2021-05-10 13:42:17

@Stephanie Zellers (she/her/hers) has joined the channel

Grace Wu (yakew@email.unc.edu)
2021-05-10 13:42:31

@Grace Wu has joined the channel

Gladi Thng (s2124928@ed.ac.uk)
2021-05-10 13:43:47

@Gladi Thng has joined the channel

Zoe Schmilovich (zoe.schmilovich@mail.mcgill.ca)
2021-05-10 13:43:50

@Zoe Schmilovich has joined the channel

Olivia Rennie (olivia.rennie@alum.utoronto.ca)
2021-05-10 13:43:57

@Olivia Rennie has joined the channel

Christina Sheerin (Christina.sheerin@vcuhealth.org)
2021-05-10 13:43:59

@Christina Sheerin has joined the channel

William McAuliffe (williamhbmcauliffe@gmail.com)
2021-05-10 13:44:17

@William McAuliffe has joined the channel

Chloe Myers (cmyer011@ucr.edu)
2021-05-10 13:44:20

@Chloe Myers has joined the channel

Francis Vergunst (he/him) (francis.vergunst@umontreal.ca)
2021-05-10 13:44:33

@Francis Vergunst (he/him) has joined the channel

Ravi Bhatt (ravibot93@gmail.com)
2021-05-10 13:44:48

@Ravi Bhatt has joined the channel

Nathan Bell (n.y.bell@student.vu.nl)
2021-05-10 14:46:29

@Nathan Bell has joined the channel

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-06-08 15:29:10

@Jeff Lessem (he/him) has renamed the channel from "pathway-and-gene-based-analyses" to "day10-pathway-and-gene-based-analyses"

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-17 04:26:46

Dear all,

Here are the commands you will need tomorrow to get started with the gene-set analysis practical:

mkdir friday2
cd friday2
cp /faculty/christiaan/Boulder2021/magma_session.zip .
unzip magma_session.zip
cd magma_session

Attached for your convenience are also the practical instructions (these are also in the .zip file).

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-17 04:29:59

In addition, here are the worked-out answers for the practical. It is of course highly recommended that you only check these after you have done the practical yourself.

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-17 07:08:27

Kristen (kristenlhopkins@gmail.com)
2021-06-17 20:55:14

Good afternoon, is it possible to get the lecture slides in PPT or PDF format please? I can't find these on the syllabus page. It makes it a lot easier to take notes while watching videos. Thank you.

Danielle (danielle.posthuma@gmail.com)
2021-06-18 01:32:22

*Thread Reply:* I will check with @Jeff Lessem (he/him) if he is posting them!

Danielle (danielle.posthuma@gmail.com)
2021-06-18 01:36:13

*Thread Reply:* Since Jeff is in a different time zone, I am posting the PPTs here. They do have audio, which you can disable if you want (these are the same slides used to create the YouTube videos, which have edited subtitles, so only use these PPTX files if you prefer the original slides).

Jeff Lessem (he/him) (jeff.lessem@colorado.edu)
2021-06-18 06:53:33

*Thread Reply:* I've added the slides (without audio) to the webpage at https://www.colorado.edu/ibg/international-workshop/2021-international-statistical-genetics-workshop/syllabus/day-10-friday-june

🙏 Danielle

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)
2021-06-18 02:02:02

Hi, I have a question about today's videos. MAGMA uses principal component linear regression. Then, is it correct that you can fit any linear model to the data?

Danielle (danielle.posthuma@gmail.com)
2021-06-18 02:02:46

*Thread Reply:* you mean adding any covariates you like? -> yes!

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)
2021-06-18 02:30:57

*Thread Reply:* Thanks for your answer, Danielle 😊. I meant whether, instead of using PC linear regression (as in MAGMA), you could in theory use a different kind of linear model (e.g. factor analysis).

Danielle (danielle.posthuma@gmail.com)
2021-06-18 02:33:43

*Thread Reply:* Aha, that is not currently implemented, but it is feasible in theory. @Christiaan de Leeuw can answer that in more detail.

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-18 03:12:26

*Thread Reply:* In principle, yes. Though, as Danielle notes, it would of course need a separate implementation in MAGMA (both for the gene analysis itself and for computing the gene-gene correlations used in subsequent gene-set analysis; those correlations need to be computed differently depending on the gene analysis model used).

Lucía de Hoyos (Lucia.DeHoyos@mpi.nl)
2021-06-18 03:30:21

*Thread Reply:* Okay, thanks Danielle and Christiaan. Nice. Yes, I just wanted to know if in theory that would work. 😃

Danielle (danielle.posthuma@gmail.com)
2021-06-18 02:13:12

@channel Last day of the workshop today! Looking forward to seeing you in the pathway session later this morning/afternoon/night. The instructions for copying over the files are pinned to this channel. See ya!

👍 Angelica Ronald
Stella Tsotsi (stella.tsotsi@psykologi.uio.no)
2021-06-18 05:41:31

Hi, I have a question regarding CADD scores. Do they differ based on the specific population under investigation? I am thinking mostly in terms of ethnicity and sex.

Danielle (danielle.posthuma@gmail.com)
2021-06-18 06:07:17

*Thread Reply:* Good q! I am not a total expert on CADD scores, but theoretically they represent the (expected or predicted) deleteriousness of an allele substitution at a particular genomic location, so I’d say they should be population independent.

There is more info here: https://cadd.gs.washington.edu

Stella Tsotsi (stella.tsotsi@psykologi.uio.no)
2021-06-18 06:11:19

*Thread Reply:* I see - thank you 🙂

Abigail ter Kuile (k1456980@kcl.ac.uk)
2021-06-18 05:59:48

Is there a threshold for the proportion of overlapping genes between associated gene sets above which you would want to check for confounding by running interaction or conditional association analyses?

Danielle (danielle.posthuma@gmail.com)
2021-06-18 06:09:12

*Thread Reply:* Hm, not really. Whether overlap influences gene-set P-values also depends on the Z-scores of the genes (i.e. if there is 50% overlap but none of the shared genes are associated with the trait, then the overlap would be less relevant). We usually select sets for conditional analysis based on whether their marginal P-values are significant.

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-18 06:13:55

*Thread Reply:* Indeed. It will depend a bit on your data of course, but particularly for the conditional analysis, we would often do those as a follow-up analysis on just the significant gene sets from a regular gene-set analysis (since if a gene set does not have a significant marginal association, there is rarely a good reason to do conditional analyses with it).

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-18 06:15:51

*Thread Reply:* And in that case, the easiest strategy is often just to run all the pairwise conditional analyses; if there isn't much or any overlap between a pair of gene sets, then the p-values for those sets won't really change if you condition them on each other anyway.
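A minimal sketch of that follow-up strategy, assuming Bonferroni correction over the marginal gene-set p-values as the selection rule (other corrections would work too). The helper name and the p-values are hypothetical; the conditional analyses themselves would still be run in MAGMA.

```python
from itertools import permutations


def conditional_pairs(set_pvalues, alpha=0.05):
    """Hypothetical helper: keep gene sets whose marginal p-value survives
    Bonferroni correction, then list every ordered pair (set, conditioned-on set)."""
    threshold = alpha / len(set_pvalues)
    significant = sorted(name for name, p in set_pvalues.items() if p < threshold)
    # Conditioning is not symmetric, so A given B and B given A are separate tests
    return list(permutations(significant, 2))


# Hypothetical marginal gene-set p-values from a regular gene-set analysis
pvals = {"setA": 1e-6, "setB": 2e-5, "setC": 0.3, "setD": 0.02}
print(conditional_pairs(pvals))  # [('setA', 'setB'), ('setB', 'setA')]
```

Pairs with little or no overlap can simply stay in the list: as noted above, their p-values will barely change when conditioned on each other.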

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-18 06:24:22

*Thread Reply:* For interaction analyses it depends more on your specific research aims. By default, MAGMA analyzes the interaction between two gene sets only if at least 25 genes / 10% of genes overlap (both criteria need to be met, for both sets), and if, for each gene set, at least 25 genes / 10% of that set's genes are not shared with the other set.
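A rough re-implementation of that default rule, to make the criteria concrete. It is a hypothetical helper, not MAGMA's code, and it assumes that the "both criteria" requirement applies to the unshared genes as well as to the overlap.

```python
def eligible_for_interaction(set_a, set_b, min_genes=25, min_fraction=0.10):
    """Hypothetical re-implementation of the overlap rule described above.

    Assumption: both the absolute (min_genes) and relative (min_fraction)
    criteria must hold for the shared genes and for each set's unshared genes.
    """
    a, b = set(set_a), set(set_b)
    shared = len(a & b)
    for own, other in ((a, b), (b, a)):
        unique = len(own - other)
        # Overlap must be large enough relative to this set
        if shared < min_genes or shared < min_fraction * len(own):
            return False
        # This set must also keep enough genes that the other set lacks
        if unique < min_genes or unique < min_fraction * len(own):
            return False
    return True


set_a = range(200)        # hypothetical gene IDs 0..199
set_b = range(170, 370)   # shares genes 170..199 (30 genes, 15% of each set)
print(eligible_for_interaction(set_a, set_b))                     # True
print(eligible_for_interaction(set_a, set_b, min_fraction=0.20))  # False
```

Raising `min_fraction` drops moderately overlapping pairs like this one, which shrinks the number of interaction tests.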

👍 Abigail ter Kuile
Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-18 06:26:43

*Thread Reply:* That percentage in particular is relatively low, though; in practice I would probably set it a bit higher (say, 20-25%). That also depends on the research aims, but as you can imagine, if you have e.g. 5000 gene sets and you're doing an exploratory interaction analysis, the number of tests can add up quickly, so it can be rather helpful to reduce the number of tested interactions somewhat by restricting analyses to reasonably overlapping pairs.
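To put "add up quickly" in numbers: an exhaustive scan over 5000 gene sets would involve 5000 × 4999 / 2 pairs. A small sketch, using a Bonferroni correction purely for illustration (the helper name is hypothetical):

```python
from math import comb


def n_pairwise_tests(n_sets):
    """Number of unordered gene-set pairs in an exhaustive interaction scan."""
    return comb(n_sets, 2)


n_tests = n_pairwise_tests(5000)
print(n_tests)  # 12497500
# Bonferroni threshold if every pair were tested at an overall alpha of 0.05
print(0.05 / n_tests)
```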

Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-18 06:27:53

*Thread Reply:* Not sure if it was mentioned in the slides, but this paper (Nature Communications) gives a detailed overview and guidelines (see especially the supplementary information) on running these kinds of analyses, should you want to conduct your own: https://www.nature.com/articles/s41467-018-06022-6

Abigail ter Kuile (k1456980@kcl.ac.uk)
2021-06-18 06:40:50

*Thread Reply:* Great, thank you both so much for the detailed response. Looking forward to the workshop!

Danielle (danielle.posthuma@gmail.com)
2021-06-18 09:20:46

Thanks @channel for the active participation during today’s session A! As said, we’ll be watching the channel and the anonymous question box for a while to respond to any further questions you may have.

Don’t forget to fill out the evaluation John mentioned to keep future workshops going.

And, as mentioned, there will soon be several job openings in my lab for PhD students and postdocs, so if you are interested in post-GWAS analyses, single-cell integration with GWAS, method development, or just like science and Amsterdam, feel free to message me here on Slack.

❤ Lydia Rader
Giulio Centorame (giulio.centorame@outlook.it)
2021-06-18 09:24:32

*Thread Reply:* Thank you for the great session!

😍 Anna Furtjes
Giacomo Bignardi (giacomo.bignardi@maxplanckschools.de)
2021-06-18 09:42:38

Hello! Very clear and nice videos on gene and pathway analysis today! 😊 I have two questions related to gene-set and gene-property analysis.

1) Relevant to the videos: you showed that MAGMA can account for undesirable statistical effects in gene-based association tests (LD, gene size, and the number of genes). I was wondering whether gene sets can also influence the observed significance of the association under the competitive test. Given that any random gene set will likely be associated with the phenotype, is the composition of the control set going to influence the likelihood of picking up gene sets based on their “averageness” in a given pool of gene sets?

2) Similar to 1), but slightly unrelated to the content of the videos: in gene-property analysis, is the composition of the property sets (e.g., brain expression profiles) going to influence the likelihood of picking up signals from the most “different” tissue (e.g., artificially decreasing the association with expression profiles in, for example, different areas of the cortex, while favoring less represented ones, such as the cerebellum)?

Danielle (danielle.posthuma@gmail.com)
2021-06-18 09:58:41

*Thread Reply:* I am going to answer #2 and let @Christiaan de Leeuw take on #1 (and #2 if he likes 😉). For #2, that is a very valid question: any choices you make in constructing the scores used as indicators of a gene property will influence the results, as well as the actual hypothesis you are testing. For example, if you normalize a gene’s expression value in a specific cell type by, e.g., dividing it by the average expression across all cell types in that study, your normalized expression value depends heavily on which cell types were included in that study. If only brain cell types were included, your normalized expression value is relative to other brain cell types, but if other organs and cell types are included, the expression value is relative to those. There currently is no consensus on how best to do this, as it also depends on what kind of hypothesis you are interested in. We are currently comparing several different strategies for this. So those are general thoughts, but more directly related to your question: MAGMA picks up those gene sets where the average statistical association of genes is stronger than that of genes outside of the set. So if you already know that only genes expressed in brain are influencing your trait, it may be a bit of a cheat to test your favorite gene set against a background of all other genes/tissues. That said, we often do not yet know which tissues are involved.
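A toy illustration of that normalization point, assuming the simplest scheme (divide by the mean expression across the included cell types). The cell types and values are made up, and this is not one of the strategies being compared in the lab.

```python
def normalized_expression(expr, cell_type):
    """Expression in `cell_type` divided by the mean expression across all
    cell types present in `expr` (one simple scheme among many)."""
    mean_expr = sum(expr.values()) / len(expr)
    return expr[cell_type] / mean_expr


# Hypothetical expression values of one gene (arbitrary units)
brain_only = {"neuron": 6.0, "astrocyte": 3.0, "oligodendrocyte": 3.0}
with_other_organs = {**brain_only, "hepatocyte": 1.0, "myocyte": 2.0, "enterocyte": 3.0}

print(normalized_expression(brain_only, "neuron"))         # 1.5
print(normalized_expression(with_other_organs, "neuron"))  # 2.0
```

The raw neuron value is identical in both cases; only the reference panel changed, and with it the score the gene-property analysis would see.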

👍 Giacomo Bignardi
Danielle (danielle.posthuma@gmail.com)
2021-06-18 09:59:34

*Thread Reply:* And hi Giacomo, great to ‘see’ you here 😉

❤ Giacomo Bignardi
Christiaan de Leeuw (c.a.de.leeuw@vu.nl)
2021-06-18 10:12:56

*Thread Reply:* With regard to your first question, by 'control set' you mean the totality of genes that are in your data (but not included in the gene set being tested)? If so, then yes this indeed can influence the results.

For example, suppose that we were to include not just the protein coding genes in our analysis, but also an assortment of RNA genes. Protein-coding genes for most phenotypes will tend to contain more genetic association than non-coding genes, so if we now test a gene-set containing only or primarily protein-coding genes against a mixture of protein- and non-coding genes, that gene set will tend to have a lower p-value just because of that.

Essentially this is an issue of confounding: we have a property, "being a protein-coding gene", that is associated with the outcome ("genetic association with the phenotype") and is also correlated with our gene set. Hence, this induces an association for that gene set regardless of whether it has any substantive role to play in your phenotype. And we can resolve this in the usual way as well, using conditional analyses: we could create a "protein-coding gene" gene set, and test the association of our gene sets of interest conditioning on that "protein-coding gene" set.
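A minimal sketch of this confounding scenario, assuming a simplified competitive model in which gene z-scores are regressed on a gene-set indicator by plain least squares. Unlike MAGMA, it ignores gene-gene correlations and default covariates, and the genes and z-scores are made up: the set contains only protein-coding genes, which have higher z-scores overall, but within protein-coding genes set membership adds nothing.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical genes: (in_set, protein_coding, gene z-score)
genes = [
    (0, 0, 0.0), (0, 0, 0.5), (0, 0, 1.0),  # non-coding genes
    (0, 1, 1.5), (0, 1, 2.0), (0, 1, 2.5),  # protein-coding, outside the set
    (1, 1, 1.5), (1, 1, 2.0), (1, 1, 2.5),  # protein-coding, inside the set
]
z = [zg for _, _, zg in genes]

marginal = ols([[1.0, s] for s, _, _ in genes], z)        # Z ~ set
conditional = ols([[1.0, s, c] for s, c, _ in genes], z)  # Z ~ set + protein-coding
print(round(marginal[1], 3))            # 0.75: the set looks associated
print(abs(round(conditional[1], 3)))    # 0.0: effect vanishes after conditioning
```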

There are obviously a lot of ways in which this sort of thing can arise. Some of them are automatically corrected for; e.g., MAGMA automatically includes gene size (and log(gene size)) as covariates in any gene-set analysis, to account for the possibility that larger genes may tend to show stronger (or weaker) associations than smaller ones, and that some gene sets will contain disproportionate numbers of larger or smaller genes. In general though, as with all confounding, it's something we need to be explicitly aware of and account for in the analysis (generally, by using some form of conditional analysis). So there may indeed be hidden biases in the associations of whatever genes we have included in the analysis that we are not aware of, and these could induce spurious associations.
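As a rough sketch of the regression described above (our notation, not taken from the MAGMA documentation): for gene $g$ with z-score $Z_g$, gene-set indicator $S_g$, gene size $n_g$ and the optional "protein-coding gene" indicator $C_g$ from the conditional analysis,

$$
Z_g = \beta_0 + \beta_S S_g + \beta_1 n_g + \beta_2 \log(n_g) + \beta_C C_g + \varepsilon_g,
\qquad H_0\colon \beta_S = 0 \ \text{vs.}\ H_1\colon \beta_S > 0
$$

The $\beta_C C_g$ term is dropped in the unconditional analysis. The actual MAGMA model also accounts for correlations between genes in $\varepsilon_g$ and may include further default covariates.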

👍 Giacomo Bignardi
Giacomo Bignardi (giacomo.bignardi@maxplanckschools.de)
2021-06-18 13:56:57

*Thread Reply:* Thanks! Yes, that was precisely my question @Christiaan de Leeuw. This confirms my doubts. Knowing that the set of genes or properties can indeed induce some biases is very helpful. I’ll err on the side of caution from now on 🙂 @Danielle, I am looking forward to seeing the results of your comparison of strategies to overcome this problem (especially for scRNA-seq data!).

Danielle (danielle.posthuma@gmail.com)
2021-06-18 18:02:05

@channel Thanks everyone in session B for participating! We hope you enjoyed the session and the workshop as a whole, thanks for your questions and efforts, and hope we meet again at some point in person. Enjoy the weekend!

❤ Lucía Colodro-Conde, Jianing Yao, Ravi Bhatt, Maizy Brasher, Pamela Romero, Peter Tanksley, Svetlana Bivol, Jeremy Elman, Giacomo Bignardi, Matthieu de Hemptinne, Giulio Centorame, Zoe Schmilovich
🙂 Katerina Zorina-Lichtenwalter