Interpretation of the results #5
Comments
[X] ABSENT: scripts/gsa_mixer_combine.py
Note that because the reported enrichment values are akin to odds ratios, we first have to take the log of the enrichment values before combining them (means, standard deviations, etc.). My initial intent was to combine the enrichment values in a meta-analysis across the 20 replicates, with each replicate weighted by its standard error across the 100s of realizations. Unfortunately, since we have to work in log-transformed space, we cannot simply take log(enrich_std); for that, we would need the enrichment values for each realization of each replicate. The R code below is intended as a temporary patch for exploring the results until a proper way to combine the replicates is provided.
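The R patch mentioned above is not reproduced in this thread. As a rough stand-in, here is a minimal Python sketch of the log-space idea only: an unweighted summary of one gene-set's enrichment across replicates. The function name `combine_log_enrichment` and the example values are hypothetical; this is not the commenter's R code, not the missing `gsa_mixer_combine.py`, and not the weighted meta-analysis described above.

```python
import math
from statistics import mean, stdev


def combine_log_enrichment(enrich_values):
    """Summarize ratio-like enrichment values across replicates in log space.

    Assumption: `enrich_values` holds one positive enrichment estimate per
    replicate for a single gene-set. The summary is unweighted.
    """
    if any(v <= 0 for v in enrich_values):
        raise ValueError("enrichment values must be positive to take logs")
    logs = [math.log(v) for v in enrich_values]
    log_mean = mean(logs)
    # Sample standard deviation of the log values (needs >= 2 replicates).
    log_sd = stdev(logs) if len(logs) > 1 else float("nan")
    return {
        "log_mean": log_mean,
        "log_sd": log_sd,
        # Back-transforming the mean of logs gives the geometric mean.
        "geometric_mean": math.exp(log_mean),
    }


# Hypothetical enrichment values for one gene-set across four replicates.
example = combine_log_enrichment([2.0, 4.0, 8.0, 4.0])
```

Averaging in log space treats a two-fold enrichment and a two-fold depletion symmetrically, which is the usual reason for log-transforming ratio-like quantities before summarizing them.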
@humanpaingeneticslab sorry again for all the mess with the code, and for my slow response.
This is highlighted in the discussion section:
"The GSA-MiXeR tool has some limitations. First, it was not feasible to provide formal p-values due to technical reasons and difficulties in defining the null-hypothesis (see Supplementary Information), thus GSA-MiXeR relies on the MAGMA tool to pre-filter the set of gene-sets for the most conservative analysis. Our exploratory analysis (without pre-filtering by MAGMA) selects gene-sets based on AIC criteria, which does not allow for multiple testing correction; we however confirmed that ranking gene-sets according to GSA-MiXeR fold enrichment is at least as stable as ranking according to conservatively defined MAGMA p-values; additionally, all estimates have SEs to evaluate their uncertainty. Second, SEs are derived from the likelihood function, and may in some cases be not well calibrated, particularly for genes with close to zero heritability estimate. Real-data analysis with GSA-MiXeR is unlikely to be affected by this due to filtering genes on a positive AIC value, which implies sufficient curvature of the log-likelihood around the MLE point and justifies hessian-based SEs estimation."
I'm keeping this ticket open; it would be good to discuss this further, as interpretation of GSA-MiXeR results might be tricky due to the limitations outlined above!
Hello,
Is this the correct interpretation of the results?
The gsa-mixer output columns include 'enrich' and 'enrich_std'.
I was expecting 'enrich_se' and/or 'enrich_pval'.
Can we do:
enrich_se = enrich_std / sqrt( 100 )
enrich_Z = enrich / enrich_se
enrich_pval = 2*pnorm( -abs( enrich_Z ) )
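For readers who do not use R, the three lines above can be mirrored in Python as a minimal sketch. It only restates the calculation being asked about and is not a validated method; the discussion quoted in this thread notes that formal p-values were not feasible for GSA-MiXeR. The function name `proposed_enrich_pval` and the example numbers are hypothetical, and `n_realizations=100` is taken from the N = 100 realizations mentioned in the question.

```python
import math


def proposed_enrich_pval(enrich, enrich_std, n_realizations=100):
    """Mirror the calculation proposed in the question (illustrative only)."""
    if enrich_std <= 0:
        raise ValueError("enrich_std must be positive")
    # Standard error as proposed: SD across realizations over sqrt(N).
    enrich_se = enrich_std / math.sqrt(n_realizations)
    # Z-score against a null of zero, exactly as written in the question.
    enrich_z = enrich / enrich_se
    # Two-sided normal p-value: 2 * pnorm(-|z|) equals erfc(|z| / sqrt(2)).
    enrich_pval = math.erfc(abs(enrich_z) / math.sqrt(2))
    return enrich_se, enrich_z, enrich_pval


# Hypothetical input values, chosen only to make the arithmetic easy to follow.
se, z, p = proposed_enrich_pval(enrich=1.5, enrich_std=7.5)
```

With these made-up inputs the standard error is 0.75 and the Z-score is 2. Whether dividing 'enrich_std' by sqrt(100) is appropriate is exactly what the question asks, so the sketch should not be read as an answer to it.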
Fig. 2A of the paper states "with error bars showing Hessian-based standard error of the model estimates"; so these error bars would have been computed like 'enrich_se' above, right? Because the standard deviations were based on "we sample N = 100 realizations ... and report the standard deviations across the realizations".
Ah, should I instead use the 'gsa_mixer_combine.py' script, which does it all?
"The main branch of gsa-mixer does not contain the path scripts/gsa_mixer_combine.py." :-(
Solved? "MIXER_FIGURES_PY combine ..."