-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test and document forced evaluation of promises for parallel execution #262
Conversation
Could someone with a macOS machine please troubleshoot the unexpected error? ══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-unvalidated-sim_gs_n.R:538:3'): create_cut() can accept variables as arguments ──
Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Backtrace:
▆
1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
2. └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
3. └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 233 ]
Error: Test failures |
I can reproduce the error in a macOS system. Complete log by running Click to expand
|
That's a different error. I always get that when using |
I can reproduce the exact unit testing error from GitHub Actions macOS runner by running
|
This fixes the macOS issue of simtrial on GHA: Merck/simtrial#262 When `n_analysis = 2`, `seq_along(n_analysis)` is `1`, and only the first `n` in `x_new$analysis` is rounded. We should have rounded all elements in `n` that are close enough to integers, so the loop should go from `1` to `n_analysis`, i.e., the looping indices should be `seq_len(n_analysis)`.
I have finally figured out this super weird issue, and submitted the fix Merck/gsDesign2#447. In short, The fix turned out to be super simple, but the debugging process was quite a journey. Initially I was worried that I'd have to jump into data.table's C code. Thank goodness, I didn't have to. |
I need to update the workflow file to temporarily install the latest version of {gsDesign2} from GitHub in order to obtain @yihui's latest fix in Merck/gsDesign2#447 |
Thank you so much, Yihui!!! |
@jdblischak You can add After CRAN re-opens on Aug 17, we can send a new version of gsDesign2 to CRAN, and then remove the |
b2cefb8
to
d536333
Compare
CI is green. Ready for review. Thank @yihui for the impressive debugging! 💪 |
We finally get it through. I will get it merged. Thank you so much, @yihui ! |
My only nagging doubt: do we understand why this affected macOS but not Linux? Using |
It turns that that your doubt was correct and I concluded too early. Now the question is why |
The script below gives different output on macOS vs other platforms. All code except for the last block was copied from #260. library(gsDesign2)
ratio = 1
enroll_rate = define_enroll_rate(duration = c(2, 2, 8), rate = c(1, 2, 3))
fail_rate = define_fail_rate(
duration = c(4, Inf), fail_rate = log(2) / 12, hr = c(1, .6), dropout_rate = .001
)
alpha = 0.025
beta = 0.1
upper = gs_spending_bound
upar = list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper = rep(TRUE, 2)
lower = gs_spending_bound
lpar = list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower = c(TRUE, FALSE)
binding = FALSE
info_frac = NULL
analysis_time = c(24, 36)
x = gs_design_ahr(
enroll_rate = enroll_rate, fail_rate = fail_rate,
alpha = alpha, beta = beta, ratio = ratio,
info_frac = info_frac, analysis_time = analysis_time,
upper = upper, upar = upar, test_upper = test_upper,
lower = lower, lpar = lpar, test_lower = test_lower,
binding = binding
)
n2 = x$analysis$n[2]
sample_size_new = ceiling(n2 / 2) * 2 # 454L
n = with(x$enroll_rate, {
rate = rate * sample_size_new / n2
sum(rate * duration)
})
n - 454
|
I wonder if this is from gcc vs. clang. You know, the default compiler for base R under macOS is clang, but it's gcc under both Windows and Linux. So this pattern matches the outcomes we observe. One quick way to test this hypothesis is to launch an Ubuntu clang build of R via the R-hub v2 GitHub Actions workflow and run the code above. If it generates a not exact 0 result we see on macOS, then it's probably the culprit. Relevant read: GCC on x86 does not round floating-point divisions to the nearest value |
I used the R-hub v2 workflow to see which combination can reproduce the non-exact-zero results. (Actions page). I can only reproduce the non-exact-zero result under the Click to expand
The result is exact zero under the Click to expand
I also tried Since the r-lib/actions If so, it's unlikely something we can fix, and I agree the most important thing is:
|
@yihui and @nanxstats thanks for the incredibly detailed investigation! I understand this so much better now.
I followed up on this. My worry was unfounded. While the macOS test failed because of different rounding behavior on Apple Silicon chips, the behavior on Linux and Windows was not affected by Merck/gsDesign2#447. In other words, they were producing the correct results prior to the bug fix. I confirmed this on Windows using the code below: Confirmed stable behavior on Windows. Click for code:# Start with CRAN version of gsDesign2 that uses seq_along()
install.packages("gsDesign2")
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2’
grep("seq_along", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] " for (i in seq_along(n_analysis)) {" " for (i in seq_along(n_analysis)) {"
library("simtrial")
packageVersion("simtrial")
## [1] ‘0.4.1.7’
ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
fail_rate = log(2) / 12,
hr = c(1, .6),
dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
alpha = alpha, beta = beta, ratio = ratio,
info_frac = info_frac, analysis_time = analysis_time,
upper = upper, upar = upar, test_upper = test_upper,
lower = lower, lpar = lpar, test_lower = test_lower,
binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results1 <- simtrial::sim_gs_n(
n_sim = 1e2,
sample_size = x$analysis$n[2],
enroll_rate = x$enroll_rate,
fail_rate = x$fail_rate,
test = simtrial::wlr,
cut = list(ia = ia_cut, fa = fa_cut),
weight = simtrial::fh(rho = 0, gamma = 0))
# Switch to GitHub version of gsDesign2 that uses seq_len()
detach("package:gsDesign2")
remove.packages("gsDesign2")
# For some reason Windows won't let R delete the DLL
file.remove("~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll")
## [1] FALSE
## Warning message:
## In file.remove("~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll") :
## cannot remove file '~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll', reason 'Permission denied'
system("rm ~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll")
unlink("~/../AppData/Local/R/win-library/4.3/gsDesign2", recursive = TRUE)
remotes::install_github("Merck/gsDesign2@9092288", upgrade = FALSE)
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2.18’
grep("seq_len", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] " for (i in seq_len(n_analysis)) {" " for (i in seq_len(n_analysis)) {"
ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
fail_rate = log(2) / 12,
hr = c(1, .6),
dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
alpha = alpha, beta = beta, ratio = ratio,
info_frac = info_frac, analysis_time = analysis_time,
upper = upper, upar = upar, test_upper = test_upper,
lower = lower, lpar = lpar, test_lower = test_lower,
binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results2 <- simtrial::sim_gs_n(
n_sim = 1e2,
sample_size = x$analysis$n[2],
enroll_rate = x$enroll_rate,
fail_rate = x$fail_rate,
test = simtrial::wlr,
cut = list(ia = ia_cut, fa = fa_cut),
weight = simtrial::fh(rho = 0, gamma = 0))
all.equal(results1, results2)
## [1] TRUE |
Technically I only tested this on Windows. I'm fairly confident Linux should also be fine since macOS seems to be the outlier for this bug. But my inner paranoia convinced me it is better to be safe than sorry. So I copy-pasted my reprex above for Windows in my WSL Ubuntu 22.04 to be absolutely sure we no longer need to worry about this bug. All good. Click for results of reprex on Linux# Start with CRAN version of gsDesign2 that uses seq_along()
install.packages("gsDesign2")
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2’
grep("seq_along", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] " for (i in seq_along(n_analysis)) {" " for (i in seq_along(n_analysis)) {"
library("simtrial")
packageVersion("simtrial")
## [1] ‘0.4.1.8’
ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
fail_rate = log(2) / 12,
hr = c(1, .6),
dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
alpha = alpha, beta = beta, ratio = ratio,
info_frac = info_frac, analysis_time = analysis_time,
upper = upper, upar = upar, test_upper = test_upper,
lower = lower, lpar = lpar, test_lower = test_lower,
binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results1 <- simtrial::sim_gs_n(
n_sim = 1e2,
sample_size = x$analysis$n[2],
enroll_rate = x$enroll_rate,
fail_rate = x$fail_rate,
test = simtrial::wlr,
cut = list(ia = ia_cut, fa = fa_cut),
weight = simtrial::fh(rho = 0, gamma = 0))
# Switch to GitHub version of gsDesign2 that uses seq_len()
detach("package:gsDesign2")
remotes::install_github("Merck/gsDesign2@9092288", upgrade = FALSE)
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2.18’
grep("seq_len", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] " for (i in seq_len(n_analysis)) {" " for (i in seq_len(n_analysis)) {"
ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
fail_rate = log(2) / 12,
hr = c(1, .6),
dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
alpha = alpha, beta = beta, ratio = ratio,
info_frac = info_frac, analysis_time = analysis_time,
upper = upper, upar = upar, test_upper = test_upper,
lower = lower, lpar = lpar, test_lower = test_lower,
binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results2 <- simtrial::sim_gs_n(
n_sim = 1e2,
sample_size = x$analysis$n[2],
enroll_rate = x$enroll_rate,
fail_rate = x$fail_rate,
test = simtrial::wlr,
cut = list(ia = ia_cut, fa = fa_cut),
weight = simtrial::fh(rho = 0, gamma = 0))
all.equal(results1, results2)
## [1] TRUE
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] gsDesign2_1.1.2.18 doFuture_1.0.1 future_1.34.0 foreach_1.5.2
## [5] simtrial_0.4.1.8
##
## loaded via a namespace (and not attached):
## [1] gt_0.11.0 utf8_1.2.4 generics_0.1.3
## [4] tidyr_1.3.1 bspm_0.5.5 xml2_1.3.6
## [7] r2rtf_1.1.1 lattice_0.22-6 listenv_0.9.1
## [10] digest_0.6.36 magrittr_2.0.3 grid_4.4.1
## [13] iterators_1.0.14 mvtnorm_1.2-6 fastmap_1.2.0
## [16] Matrix_1.7-0 processx_3.8.4 pkgbuild_1.4.4
## [19] survival_3.7-0 ps_1.7.7 purrr_1.0.2
## [22] fansi_1.0.6 scales_1.3.0 codetools_0.2-20
## [25] cli_3.6.3 rlang_1.1.4 parallelly_1.38.0
## [28] future.apply_1.11.2 munsell_0.5.1 splines_4.4.1
## [31] remotes_2.5.0 withr_3.0.1 gsDesign_3.6.4
## [34] tools_4.4.1 parallel_4.4.1 dplyr_1.1.4
## [37] colorspace_2.1-1 ggplot2_3.5.1 globals_0.16.3
## [40] curl_5.2.1 vctrs_0.6.5 R6_2.5.1
## [43] lifecycle_1.0.4 desc_1.4.3 callr_3.7.6
## [46] pkgconfig_2.0.3 pillar_1.9.0 gtable_0.3.5
## [49] data.table_1.15.4 glue_1.7.0 Rcpp_1.0.13
## [52] tibble_3.2.1 tidyselect_1.2.1 xtable_1.8-4
## [55] htmltools_0.5.8.1 compiler_4.4.1 |
Follow-up to PR #261
I converted the example from Issue #260 into a test. It's a very frustrating test case though. I could only get it to fail pre-#261 when running the code interactively (though it did also fail via
Rscript
). When I ran it viaR CMD check
, it always passed. Unclear why, but I suspect that {testthat} may be responsible.I also moved the parallel example of
sim_gs_n()
outside of\dontrun{}
. This was added in 749eadc(#249), and at least when I run
R CMD check
locally, there is no problem with executing this code.