Differences

This shows you the differences between two versions of the page.

--- workshop:2016:questions [2016/03/11 14:40]
65.114.233.215
+++ workshop:2016:questions [2016/03/11 16:31] (current)
65.114.233.215
@@ Line 2: / Line 2: @@
 <note warning>
-[[http://goo.gl/forms/8mhsgL2mXV|Ask questions with this Google form.]]
+Ask questions at the [[http://openmx.psyc.virginia.edu/forums|OpenMx forums]] and the [[http://openmx.psyc.virginia.edu/forums/openmx-help/teaching-sem-using-openmx/boulder-workshop-2016|dedicated 2016 Workshop forum]] there.
 </note>
@@ Line 24: / Line 24: @@
 ??? 2. In her folder from Monday, Hermine Maes provided many different scripts, which is absolutely great! Is it possible to give some explanation of these scripts and how they differ (except for ACE, ADE & SAT, which is quite clear)?
 !!! MN: Hermine's computer died today so she can't help directly.   A table describing which script is which is here: http://ibg.colorado.edu/cdrom2016/hmaes/UnivariateAnalysis/  It links to the files from the table so you can navigate easily.
+Also, the table on Hermine's webpage links to scripts for univariate/monophenotype [one] and bivariate/diphenotype [two] twin analyses.  The file names follow a naming convention that encodes the model details.  The naming convention reflects the following features:
+a- Analyses for 2 zygosity groups, either without any covariates or with an age covariate [a]
+b- Analyses for 5 zygosity groups [5], either without any covariates or with an age covariate [a]
+- Each set includes a saturated model (SAT, with sub models testing assumptions), an ACE model (with sub models) and an ADE model (with sub models).
+- Analyses for specific types of variables: continuous (c), binary (b), ordinal (o), mean/variance ordinal (m) and joint (j).  The joint analyses refer to bivariate analyses where one measure is continuous and the other is ordinal.
+Two examples of the naming convention:
+- The file called "twoACEj" is a script for a bivariate ACE model for joint data (continuous and ordinal measures) and has no covariates. This script also only uses data from two zygosity groups.
+- The file called "oneSAT5ca" is a script for a univariate saturated model using continuous data with age as a covariate. This script uses data from 5 zygosity groups.
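The naming convention above can be sketched as a small decoder. This is only an illustration: the field order (phenotype count, model, optional 5, variable type, optional a) is inferred from the two example names, and the names `NAME_PATTERN`, `VARIABLE_TYPES` and `describe_script` are hypothetical, not part of the workshop scripts.

```python
import re

# Field order is an assumption inferred from "twoACEj" and "oneSAT5ca":
# one|two, then SAT|ACE|ADE, then optional "5", then variable type, then optional "a".
NAME_PATTERN = re.compile(r"^(one|two)(SAT|ACE|ADE)(5?)([cbomj])(a?)$")

# Variable-type letters as listed in the answer above.
VARIABLE_TYPES = {
    "c": "continuous",
    "b": "binary",
    "o": "ordinal",
    "m": "mean/variance ordinal",
    "j": "joint (continuous + ordinal)",
}


def describe_script(name):
    """Decode a script name such as 'oneSAT5ca' into its features."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed convention: {name!r}")
    count, model, five, var_type, age = match.groups()
    return {
        "phenotypes": 1 if count == "one" else 2,
        "model": model,
        "zygosity_groups": 5 if five else 2,  # no "5" means the 2-group scripts
        "variable_type": VARIABLE_TYPES[var_type],
        "age_covariate": age == "a",
    }


print(describe_script("twoACEj"))
print(describe_script("oneSAT5ca"))
```

Names that do not match the assumed pattern raise a `ValueError` rather than being guessed at, so check any unusual file name against the table itself.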
 ??? 3. In classic Mx, we were able to include twin pairs with missing data. Can you clarify that we need to either drop twin pairs with any missing data or replace missing values with the mean? Are we not able to define a missing value and include that individual in the model?
@@ Line 58: / Line 71: @@
 Yang, J., Bakshi, A., Zhu, Z., Hemani, G., Vinkhuyzen, A. A., Lee, S. H., . . . Visscher, P. M. (2015). Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature Genetics, 47(10), 1114-1120. doi:[[http://www.nature.com/ng/journal/v47/n10/full/ng.3390.html|10.1038/ng.3390]]
-MCK: Right, but note that the Yang 2015 paper used *imputed* SNPs that were thereby much rarer than the SNPs usually included on arrays. They picked up additional variation due to rare causal variants that were better tagged by these imputed SNPs. So they are picking up additional rare causal variant variation that would NOT usually have been picked up using GCTA just on SNPs on arrays. So this isn't equivalent to the usual way of running GCTA. In essence, if we had all the sequence variation (instead of just imputed SNPs), we'd be able to pick up 100% of variation due to both rare and common variants.
+MCK: Right, but note that the Yang 2015 paper used *imputed* SNPs that were thereby much rarer on average than the SNPs usually included on arrays. They picked up additional variation due to rare causal variants that were better tagged by these rare imputed SNPs. So they are picking up additional rare causal variant variation that would NOT usually have been picked up using GCTA just on SNPs on arrays. So this isn't equivalent to the usual way of running GCTA. In essence, if we had all the sequence variation (instead of just imputed SNPs), we'd be able to pick up 100% of variation due to both rare and common variants.
 ??? 9. Could Matt or someone please expand on how to address ethnic heterogeneity in GCTA? On the slide on assumptions in estimating heritability, the options include PCA or analyzing cases and controls separately. Could you please provide more detail on these two strategies?
-!!! First off, you'd want to have a sample that is relatively ethnically homogeneous; e.g., analyzing a sample of mixed ethnicity can lead to problems even if you correct for stratification because the CV-SNP LD will be different between the groups. So, now that we have an ethnically homogeneous sample, the most common way people control for any additional stratification (e.g., subtle differences between Caucasians on a north-south Europe gradient) is to add 5-20 ancestry principal components into the fixed part of the model. This should correct for any effect broad-level stratification has on your estimates.
+!!! MCK: First off, you'd want to have a sample that is relatively ethnically homogeneous; e.g., analyzing a sample of mixed ethnicity can lead to problems even if you correct for stratification because the CV-SNP LD will be different between the groups. So, now that we have an ethnically homogeneous sample, the most common way people control for any additional stratification (e.g., subtle differences between Caucasians on a north-south Europe gradient) is to add 5-20 ancestry principal components into the fixed part of the model. This should correct for any effect broad-level stratification has on your estimates.
 ??? 10.  What exactly is the issue in using GREML with ascertained samples? Would it be appropriate for a continuous trait within a clinical group?
 !!! For background, see [[http://www.pnas.org/content/111/49/E5272|these]] [[http://www.pnas.org/content/112/40/E5452|papers]].
-Rob K. says: I would tentatively answer your second question with a "yes," as long as you have a sample that is representative of the population of patients who meet diagnostic criteria for (whatever disorder).  Obviously, the generalizability of your results to the general population would be highly questionable.
+Rob K. says: I would tentatively answer your second question with a "yes," as long as you have a sample that is representative of the population of patients who meet diagnostic criteria for (whatever disorder).  I'm not altogether sure, though.  Obviously, the generalizability of your results to the general population would be highly questionable.
 ??? 11. If we get a code Mx status RED, what do we need to do/consider (in general)? E.g., are our model estimates still reliable?
-!!!
+!!! Rob K. says:  Status RED means the optimizer is not certain it has found a minimum of the fitfunction.  So, no, your parameter estimates are probably not reliable, and the standard errors are even more suspect.  Status RED with code 6 (first-order conditions not met) is worse than with code 5 (second-order conditions not met).  You should always try to do //something// about status RED, for instance:
+  * If you're analyzing ordinal data, sometimes a status RED is unavoidable without changing some [[http://openmx.psyc.virginia.edu/docs/OpenMx/latest/_static/Rdoc/mxOption.html|mxOptions]].  In fact, it's possible to get status RED with ordinal data even when the optimizer //has// found a minimum.
+  * Use different start values.
+  * Try a different optimizer.
+  * Reparameterize your MxModel.
+  * Use [[http://openmx.psyc.virginia.edu/docs/OpenMx/latest/_static/Rdoc/mxTryHard.html|mxTryHard()]] or one of its wrapper functions.  Note that, by default, mxTryHard() prints to console the start values it used to find the best solution it found.  The idea is that you copy-paste those start values into your script, and assign them to your pre-mxRun() model, using [[http://openmx.psyc.virginia.edu/docs/OpenMx/latest/_static/Rdoc/omxSetParameters.html|omxSetParameters()]].
+Trying different start values is the most important thing.
 ??? 12. For Ben's presentation, where (papers, websites?) are the graphs on slides 17, 22, & 28 located? They are from published studies, yeah?
-!!! MCK: I'll let Ben way in as well. But in essence, they should be two different ways of estimating the *same* parameter, e.g., SNP-heritability. I think that the differences in the literature are about what we'd expect given the SE's on the estimates. Ben - are there any systematic differences between the two?
+!!! Ben here - Here's the original LD Score MS: http://www.nature.com/ng/journal/v47/n3/full/ng
-??? 13.  Can we have a short summary of how the estimates from GCTA and LD regression differ?
+??? 13. What is the difference in interpretation between GCTA h2 estimates and LD score regression h2 estimates?
+!!! MCK: I'll let Ben weigh in as well. But in essence, they should be two different ways of estimating the *same* parameter: SNP-heritability. I think that the differences in the literature are about what we'd expect given the SE's on the estimates. Ben - are there any systematic differences between the two?
+??? 14. @Sarah where can we find the full syntax of this morning's assumption testing (so the syntax with the right answers)? thanks!
 !!!
+??? 15. Where can researchers find publicly available twin data?
+!!! One place a researcher can begin searching for publicly available data is the repository developed and managed by the Inter-university Consortium for Political and Social Research (ICPSR).  You can search for "twins" and datasets that have twin data collected and available for dissemination will be listed.  http://www.icpsr.umich.edu/
+As an aside, another publicly available resource, aimed at developing harmonized measures of biomedical phenotypes, is the PhenX Toolkit (https://www.phenxtoolkit.org/).  No data for analysis is available for download at this website, but it is useful if you are thinking about the study design of a project.
+MCK: Great question! Nick, Dorret, John?? Want to weigh in?
+Sadly (I think), twin research has not kept up with whole-genome research, where sharing of data is not only the norm, but also mandatory (for NIH funding at least). Would be scientifically useful if the same were true of twin research!
 ====== Questions/comments from Thursday ======
