Remove normal approximation #153
Given instability of the normal approximation for many values (especially given asymmetric likelihood), and because binomial implementation is quick, this is being removed to ensure accurate outputs.
Output NA if total_outcomes<=total_deaths
Focus on early stage
Note: need to review the test functions to ensure consistency with updated messages and outputs.
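To illustrate the kind of instability described above, here is a minimal base R sketch comparing a normal (Wald) interval with an exact binomial interval for a proportion close to 1. The counts are made up for illustration and the code is not taken from the package; it only assumes `binom.test()` from base R's stats package.

```r
# Illustrative counts (hypothetical, not from the PR): 19 deaths among 20 cases
deaths <- 19
cases <- 20
p_hat <- deaths / cases

# Normal (Wald) approximation: symmetric interval around the point estimate
se <- sqrt(p_hat * (1 - p_hat) / cases)
wald_ci <- p_hat + c(-1, 1) * qnorm(0.975) * se

# Exact binomial (Clopper-Pearson) interval from base R
exact_ci <- as.numeric(binom.test(deaths, cases)$conf.int)

wald_ci   # the upper limit exceeds 1, which is not a valid proportion
exact_ci  # both limits stay within [0, 1]
```

With a high proportion and a small denominator the symmetric interval spills past 1, whereas the exact interval is asymmetric and stays in range.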
Could you run |
I've run |
Knitted vignettes appear to show equations OK once removed.
Sorry for the delayed reply. I just created this reprex to reproduce the README figure. This PR successfully solves the issue with the point estimate and confidence interval. I would only add that the reprex also shows sections of the time series with no output: no estimates are generated for certain date ranges, which is visible in the plot.
# pak::pak("epiverse-trace/cfr@remove-normal")
# Load package
library(cfr)
library(ggplot2)
# Calculate the static CFR without correcting for delays
cfr_static(data = ebola1976)
#> severity_estimate severity_low severity_high
#> 1 0.955102 0.9210866 0.9773771
# Calculate the CFR without correcting for delays on each day of the outbreak
rolling_cfr_naive <- cfr_rolling(
data = ebola1976
)
#> `cfr_rolling()` is a convenience function to help understand how additional data influences the overall (static) severity. Use `cfr_time_varying()` instead to estimate severity changes over the course of the outbreak.
# Calculate the rolling daily CFR while correcting for delays
rolling_cfr_corrected <- cfr_rolling(
data = ebola1976,
delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
)
#> `cfr_rolling()` is a convenience function to help understand how additional data influences the overall (static) severity. Use `cfr_time_varying()` instead to estimate severity changes over the course of the outbreak.
#> Some daily ratios of total deaths to total cases with known outcome are below 0.01%: some CFR estimates may be unreliable.FALSE
# combine the data for plotting
rolling_cfr_naive$method <- "naive"
rolling_cfr_corrected$method <- "corrected"
data_cfr <- rbind(
rolling_cfr_naive,
rolling_cfr_corrected
)
# visualise both corrected and uncorrected rolling estimates
ggplot(data_cfr) +
geom_ribbon(
aes(
date,
ymin = severity_low, ymax = severity_high,
fill = method
),
alpha = 0.2, show.legend = FALSE
) +
geom_line(
aes(date, severity_estimate, colour = method)
) +
scale_colour_brewer(
palette = "Dark2",
labels = c("Corrected CFR", "Naive CFR"),
name = NULL
) +
scale_fill_brewer(
palette = "Dark2"
)
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).
Created on 2024-07-11 with reprex v2.1.0
In case `NA` may not be an expected outcome, we can use this reprex to check this behaviour after the corresponding fix. Here are the `cfr_rolling` output and two `cfr_static` outputs with data until specific dates.
# pak::pak("epiverse-trace/cfr@remove-normal")
# Load package
library(cfr)
library(dplyr)
library(lubridate)
cfr_rolling(
data = ebola1976,
delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
) %>%
filter(is.na(severity_estimate))
#> `cfr_rolling()` is a convenience function to help understand how additional data influences the overall (static) severity. Use `cfr_time_varying()` instead to estimate severity changes over the course of the outbreak.
#> Some daily ratios of total deaths to total cases with known outcome are below 0.01%: some CFR estimates may be unreliable.FALSE
#> date severity_estimate severity_low severity_high
#> 1 1976-08-25 NA NA NA
#> 2 1976-09-28 NA NA NA
#> 3 1976-09-29 NA NA NA
#> 4 1976-09-30 NA NA NA
#> 5 1976-10-01 NA NA NA
#> 6 1976-10-02 NA NA NA
#> 7 1976-10-03 NA NA NA
#> 8 1976-10-04 NA NA NA
#> 9 1976-10-05 NA NA NA
#> 10 1976-10-06 NA NA NA
#> 11 1976-10-07 NA NA NA
#> 12 1976-10-08 NA NA NA
#> 13 1976-10-15 NA NA NA
ebola1976 %>%
filter(date<=ymd(19761001)) %>%
cfr_static(delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33))
#> Total deaths = 140 and expected outcomes = 134 so setting expected outcomes = NA. If we were to assume
#> total deaths = expected outcomes, it would produce an estimate of 1.
#> severity_estimate severity_low severity_high
#> 1 NA NA NA
ebola1976 %>%
filter(date<=ymd(19761015)) %>%
cfr_static(delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33))
#> Total deaths = 214 and expected outcomes = 214 so setting expected outcomes = NA. If we were to assume
#> total deaths = expected outcomes, it would produce an estimate of 1.
#> severity_estimate severity_low severity_high
#> 1 NA NA NA
Created on 2024-07-11 with reprex v2.1.0
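The two messages above can be read against a plain binomial likelihood. The sketch below uses base R's `dbinom()` with the integer counts from the messages; it is an assumption-laden illustration, not the package's own likelihood (which may handle non-integer expected outcomes differently).

```r
# Counts reported in the first message: 140 deaths, 134 expected outcomes
total_deaths <- 140
expected_outcomes <- 134

# Evaluate the binomial likelihood over a grid of candidate severity values
p_grid <- seq(0.01, 0.99, by = 0.01)
lik <- dbinom(total_deaths, size = expected_outcomes, prob = p_grid)
all(lik == 0)  # more "successes" than trials: the likelihood is zero everywhere

# Counts from the second message: deaths equal expected outcomes (214)
p_grid_full <- seq(0, 1, by = 0.01)
lik_equal <- dbinom(214, size = 214, prob = p_grid_full)
best_p <- p_grid_full[which.max(lik_equal)]
best_p  # the likelihood p^214 is maximised at the boundary, p = 1
```

When deaths exceed expected outcomes there is no severity value with non-zero likelihood, and when they are equal the maximum sits at exactly 1, which is consistent with the wording of the messages.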
Thanks for looking at this. Currently we have some situations where E(known outcomes) < deaths, and hence the likelihood as currently implemented isn't totally valid. One option would be to set |
That is interesting; thank you for adding an explicit explanation. In the meantime, would it be valid to get the message as a warning and, instead of NA, provide the latest or most recent estimated output at a given date?
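For reference, carrying the last valid estimate forward can be sketched on the user side as a post-processing step, without changing the package. The helper `carry_forward()` and the small data frame below are hypothetical; only the column names mirror the output shown earlier in this thread.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value
carry_forward <- function(x) {
  # index of the last non-NA position seen so far (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA  # leading NAs have nothing to carry forward
  x[idx]
}

# Made-up rolling output with gaps, using the same column names as the package
rolling <- data.frame(
  date = as.Date("1976-09-27") + 0:5,
  severity_estimate = c(NA, 0.5, NA, NA, 0.7, NA)
)

rolling$severity_filled <- carry_forward(rolling$severity_estimate)
rolling
```

Filled values hide the fact that no estimate could be computed on those dates, so it is probably worth keeping the original column alongside the filled one.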
It would require quite a lot of additional refactoring to output the last valid estimate, as the README example is a showcase of multiple estimates at each point in time in a visualisation (i.e. a loop over The current message is displayed above (e.g. |
The current message displayed is appropriate and specific. This reflects that this is produced in an extreme scenario, as described in #154. Given that this PR already solved the key issue, I'll move on with the approval. As a complementary comment, I suggest adding to the message an explicit next step for the user. If, in an ongoing outbreak, we create a reproducible sitrep and suddenly get an We could add sth like:
🚀 ready to merge given that this PR solved the main issue.
As commented in #153 (comment) this could optionally consider adding a next step in the output message for NA outputs from an extreme situation, as evaluated with data from ebola 1976, and until #154 manages to be solved.
Thanks, have merged and will create an issue with the above rolling suggestion.
NEWS.md
This addresses issues #152 and #151
Instability with normal approximation in Ebola example
Normal approximation removed. This PR also updates tests for consistency with the removed functionality.
There are two additional changes:
No.