Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Better QQ plots #11

Open
kellijohnson-NOAA opened this issue Mar 22, 2023 · 11 comments
Open

[Feature]: Better QQ plots #11

kellijohnson-NOAA opened this issue Mar 22, 2023 · 11 comments
Assignees
Labels
status--In progress Currently being worked on topic--diagnostics type--Bug Non-standard behavior is present in the code
Milestone

Comments

@kellijohnson-NOAA
Copy link
Collaborator

I'm not sure what the evolving best practice is here. These (randomized quantal residuals with the Laplace approximation) and DHARMa residuals with the Laplace approximation can definitely look off even if the model is consistent with the data. MCMC residuals with fixed effects at their MLE seem to work well but take longer. That's what they used here. ?residuals.sdmTMB https://pbs-assess.github.io/sdmTMB/articles/web_only/residual-checking.html

Note that because of esoteric problems on CRAN, we had to split that out into its own (non-CRAN) package.

samps <- sdmTMBextra::predict_mle_mcmc(fit, mcmc_iter = 201, mcmc_warmup = 200)
mcmc_res <- residuals(fit_nb2, type = "mle-mcmc", mcmc_samples = samps)
qqnorm(mcmc_res);qqline(mcmc_res)

In a perfect world you'd sample with more MCMC iterations first.

Originally posted by @seananderson in #10 (comment)

@kellijohnson-NOAA kellijohnson-NOAA changed the title Better QQ plots (randomized quantal residuals with the Laplace approximation) and DHARMa residuals with the Laplace approximation can definitely look off even if the model is consistent with the data. MCMC residuals with fixed effects at their MLE seem to work well but take longer. That's what they used [here](https://doi.org/10.1002/eap.2453). ?residuals.sdmTMB https://pbs-assess.github.io/sdmTMB/articles/web_only/residual-checking.html Better QQ plots Mar 22, 2023
@kellijohnson-NOAA kellijohnson-NOAA added this to the V0.0.2 milestone Mar 22, 2023
@kellijohnson-NOAA kellijohnson-NOAA added the status--In progress Currently being worked on label Mar 22, 2023
@kellijohnson-NOAA kellijohnson-NOAA self-assigned this Mar 22, 2023
@seananderson
Copy link
Collaborator

On the topic of QQ plots, I fixed a bug in residuals.sdmTMB() yesterday where the offset wasn't getting included. This also affected sdmTMBextra::predict_mle_mcmc() and sdmTMBextra::dharma_residuals(). It looks like this code base is using an offset term. Sorry about that. Usually it makes sense to drop the offset when predicting on new data (predict for 1 effort unit or offset = 0), but obviously it doesn't make sense for residuals. It looks like I had tweaked the predict() call within the residuals function to use the newdata option when I added delta models and that version (vs. newdata = NULL) drops the offset by default. I'll get this onto CRAN shortly, but for now the GitHub version is fixed.

@kellijohnson-NOAA
Copy link
Collaborator Author

Check that Q-Q plots are unique to a model because changing the distribution doesn't appear to change the resulting figure even if the model has a difference in likelihood.

An additional thought, from an email request ... A plot of quantile residual (y axis) vs predicted would be a useful diagnostic.

@kellijohnson-NOAA
Copy link
Collaborator Author

Thanks @seananderson for updating the codebase. I see that the recommended type is not actually the default residual type. I tried the "mle-mcmc" with a delta model and my R session crashed. I will do some checking to see if this type is supported with a delta model. Currently the figure is being generated with the "mle-laplace" residuals, which is quick!

And, @okenk the figures that were placed on the network originally were only the qq plots for the first model component, not the rate model. I have updated to plotting both of them on the same figure. These are qq plots from canary for 2023 with the lognormal on the left and the gamma on the right.
image

@kellijohnson-NOAA kellijohnson-NOAA added type--Bug Non-standard behavior is present in the code topic--diagnostics labels Jun 20, 2023
@okenk
Copy link

okenk commented Jun 20, 2023

Thanks! Also great that lognormal looks better since we had already picked that one. :)

@okenk
Copy link

okenk commented Jun 20, 2023

Also tagging @brianlangseth-NOAA so he sees this.

@chantelwetzel-noaa
Copy link
Collaborator

@kellijohnson-NOAA Thank you for continuing to work on improving our diagnostic tools. Providing a QQ line for each of the models makes it easier to identify the error structure that best fits the data. Can I request that on the plot we modify the labeling in the legend from model 1 and 2 to "presence-absence" and "rate"? I don't think there is a pressing need to modify this now but I think long-term that labeling would be the clearest for users and reviewers.

@kellijohnson-NOAA
Copy link
Collaborator Author

You bet @chantelwetzel-noaa, I altered the legend text (see below). And, for those at large I investigated stats::qqline(), which turns out to be different than abline(0,1), where I was doing the latter previously. Now, the figure has the equivalent of what would have been added to the figure from stats::qqline() but per model.
image

kellijohnson-NOAA added a commit that referenced this issue Jun 21, 2023
* stop using `abline(0,1)` and use the results of `stats::qqline()`
* color by model component
* return the ggplot2 object
* increase documentation
* fix code to run it in diagnose
* stop calculating the residuals ten thousand times
* pass just the data frame of residuals and raw data to `map_residuals()`
  rather than the fit object that was altered

Part of #11
@iantaylor-NOAA
Copy link
Collaborator

This is great, thank you @kellijohnson-NOAA.

Will it be possible to get these plots for a model that's already been run, or is it easier to just update the package once you've completed this change and re-run the models?

If you're taking more requests, can I request that the colors not include yellow which is just hard to see when projected on a screen?

@kellijohnson-NOAA
Copy link
Collaborator Author

I just pushed the code @iantaylor-NOAA to main so you just need to reinstall and run

load("petrale_sole/wcgbts/delta_lognormal/index/sdmTMB_save.RData")
plot_qq(fit, "petrale_sole/wcgbts/delta_lognormal/index/qq.png")

I am for sure open to discussion about the colors but they have strategically been chosen by {viridis} to be easily accessible to those with disabilities or when printed in black and white.

@chantelwetzel-noaa
Copy link
Collaborator

Sometimes when {viridis} when only plotting two items where the colors default to that purple and yellow, I have modified the begin and end inputs to be: begin = 0 and end = 0.5 that modifies the two colors to be that purple and a teal instead of the yellow. However, I don't recall any projection issues using that default yellow to date, so perhaps that is not entirely necessary.

iantaylor-NOAA added a commit to pfmc-assessments/petrale that referenced this issue Jun 21, 2023
@seananderson
Copy link
Collaborator

On residuals, I assume you've seen it, but just in case: https://pbs-assess.github.io/sdmTMB/articles/web_only/residual-checking.html

Yes, you can do MCMC residuals (with fixed effects at their MLE) with delta models. That's what they did in this paper. Those are currently set up for each model part independently. Those will be slow for large models. You may be able to get away with fewer iterations than the default and keep in mind that only the last iteration is kept as the residuals. I need to rethink the interface, because it currently forces you to sample from the model for each model component. You could sample yourself and then grab as many sets of residuals as you want all at once. I can point you to code if you are interested—it's basically pulling code out from the residuals.sdmTMB function.

There are DHARMa residuals (dharma_residuals(), now over in sdmTMBextra), which should be fast and easy to calculate for the combined delta-model. However, I believe they are known to indicate issues even when the model is correct and I assume this goes back to the Laplace approximation, which is also affecting the 'default' residuals. I'm not sure if @Cole-Monnahan-NOAA or @Andrea-Havron-NOAA have advice based on their work.

There are also one-step-ahead residuals, which aren't set up right now.

@kellijohnson-NOAA kellijohnson-NOAA changed the title Better QQ plots [Feature]: Better QQ plots Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status--In progress Currently being worked on topic--diagnostics type--Bug Non-standard behavior is present in the code
Projects
None yet
Development

No branches or pull requests

5 participants