-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: Smooth survival curve #164
Comments
Hi @agdamsbo ! Thank you for the post with the thoughtful examples! I am certainly sympathetic to this need. But trying to think of a way to get this done with no impact for current users while not adding too many/any arguments to existing functions. Question: if you can't show steps in your results, are you allowed to show a risktable? The risktable seems to be far more explicit with the information that a few tick marks. |
Hi @ddsjoberg. So, the thing is, that the regulations are quite tight (as you might have guessed). I think, there are two different use cases: data evaluation and publication ready plots, right! In the first case, everything is fine, as I am allowed to look at all the data I have access to. Regarding creating publication ready plots (or tables for that matter as with your And regarding the smoothing, it may create a smoothed line that slope upwards in some sections, which I have not been able to correct (it is much more profound with "loess" than "gam"). Maybe using a different smoothing function in "gam" would help. My ultimate wish would be for a "no microdata mode" for both ggsurvfit and gtsummary. I'll just let that hang here and work on the modifications I need and share them, when ready. Ok, this was the long answer, I think. |
So, I had to look into that slope direction. Turns out there's a package, Updated example: library(tidyverse)
library(survival)
library(purrr)
library(ggsurvfit)
library(cobs)
## Data
plot.type <- "survival"
x <- survfit2(Surv(time, status) ~ surg, data = df_colon)
df <-
tidy_survfit(x, type = plot.type) %>% dplyr::mutate(survfit = c(list(x),
rep_len(list(), dplyr::n() - 1L)))
method <- "cobs"
df_split <- split(df,df$strata)
df_smoothed <- purrr::reduce(lapply(c("estimate","conf.low", "conf.high"), function(j) {
do.call(rbind,
lapply(seq_along(df_split), function(i) {
nms <- names(df_split)[i]
x = df_split[[i]]$time
if (method=="loess"){
y <-
predict(loess(as.formula(paste0(
j[[1]], " ~ time"
)), data = df_split[[i]]))
} else if (method=="gam"){
y <-
predict(mgcv::gam(as.formula(paste0(
j[[1]], " ~ s(time, bs = 'cs')"
)), data = df_split[[i]]))
} else if (method=="cobs") {
if (plot.type=="survival"){
## This will make the plot start in (0,1)
con <- rbind(c( 0,min(x),1))
## This ensures a monotonic decreasing slope
## for the estimate, not the CIs
if (j[[1]]=="estimate"){
direction="decrease"
} else {direction="none"}
} else if (plot.type=="risk"){
con <- rbind(c( 0,min(x),0))
if (j[[1]]=="estimate"){
direction="increase"
} else {direction="none"}
}
m <- cobs(x,df_split[[i]][[j]],
constraint=direction,
nknots = 4,
pointwise= con,
degree = 2,)
y <- predict(m, x)[, 'fit']
}
df <- data.frame(x, y, nms)
names(df) <- c("time", paste0(j[[1]], ".smooth"), "strata")
df
}))
}),dplyr::full_join) |> full_join(df)
## Plotting
ggplot(data=df_smoothed) +
geom_line(aes(x=time, y=estimate.smooth, color = strata))+
geom_ribbon(aes(x=time, ymin = conf.low.smooth, ymax = conf.high.smooth, fill = strata), alpha = 0.50) I updated the example from my initial post, to avoid setting constraints on the CIs. Created on 2023-09-22 with reprex v2.0.2 |
Hello! Thanks for the update and the examples! They are super! I don't know if you're involved at all with the pharmaceutical industry, but there is an emerging standard called Analysis Results Model, where all stats are calculated in advanced and and a standardized "Analysis Results Data" object or ARD, can be passed along to create figures. I am developing this now for tables, with the next thing to tackle being KM figures (and figures generally). I don't have a timeline for you, but I think this will be the best place to introduce this type of smoothing. Adding native support at this moment, I feel, will open some mini flood gates of suggestions for various smoothing methods and which smoothing parameters are best, etc. Using the ARD will allow users to do whatever they want to the data first, then pass it along to ggsurvfit(). This will mean that no additional package dependencies would be added to ggsurvfit, which from a maintenance point of view, is huge. I am actively speaking with CDISC about the ARD model for complex figures, and hopefully we'll have a more satisfying solution for you soon. |
Sure. Yeah, I saw you're involved with the pharmaverse. I'm doing clinical/epidemiological research at Aarhus University, Denmark. ARD and the methods around that really sounds interesting and like something I will try to keep track of. Thank you. I perfectly understand your hesitancy. It's perfectly fair. Now the examples are "out" for anyone to use and possibly lookup. I will think about adding them to a small extension-package of my own. |
Hi Daniel! |
Looks great to me @SinomeM ! |
In the last release of ggsurvfit, I exported some of the helper functions used to construct the plots. Here's an example of how you can smooth the survival curves, and bin any counts less than 5 to obscure the true numbers. Hope it helps somewhat! library(ggsurvfit)
#> Loading required package: ggplot2
sf <- survfit2(Surv(time, status) ~ sex, data = df_lung)
# build primary plot, smoothed
gg_survfit <-
tidy_survfit(sf) |>
ggplot(aes(x = time, y = estimate, color = strata)) +
geom_smooth(se = FALSE, method = 'loess', formula = 'y ~ x') +
scale_x_continuous(breaks = seq(0, 30, by = 6))
# construct risktable, and hide low counts
gg_risktable <-
tidy_survfit(sf, times = seq(0, 30, by = 6)) |>
dplyr::select(time, n.risk, strata) |>
# mask low counts
dplyr::mutate(
n.risk = ifelse(n.risk < 5, "<5", n.risk)
) |>
ggplot(aes(x = time, y = strata, label = n.risk)) +
geom_text() +
labs(y = NULL) +
theme_classic()
# combine plots
ggsurvfit_align_plots(list(gg_survfit, gg_risktable)) |>
patchwork::wrap_plots(ncol = 1, heights = c(0.85, 0.15)) Created on 2024-01-18 with reprex v2.0.2 |
Thank you Daniel! |
This really is great! Thank you very much! |
Is your feature request related to a problem? Please describe.
I am working with registry based health data and stepwise survival curves are cosidered sensitive data, and I am therefor not allowed to publish such curves. Publishing a smooth plot would be the solution
Describe the solution you'd like
A clear and concise description of what you want to happen. I am aware, that in most cases you would prefer not to smooth the survival line, but there are some valid cases, where this approach is desirable.
Describe alternatives you've considered
The only solutions I have found was a StackOverflow with some suggested solutions, that wouldn't work.
Additional context
Below is a working example of the solution I have come up with to be able to plot the line and confidence intervals.
As an example, I have included the two standard methods used for smoothing in
ggplot2::geom_smooth()
, but I wanted to perform the calculations separately to have control over the method selection and to use theggplot2::geom_ribbon()
for confidence intervals.Created on 2023-09-15 with reprex v2.0.2
The text was updated successfully, but these errors were encountered: