Test and document forced evaluation of promises for parallel execution #262

jdblischak · 2024-07-18T19:15:19Z

Follow-up to PR #261

I converted the example from Issue #260 into a test. It's a very frustrating test case though. I could only get it to fail pre-#261 when running the code interactively (though it did also fail via Rscript). When I ran it via R CMD check, it always passed. Unclear why, but I suspect that {testthat} may be responsible.

I also moved the parallel example of sim_gs_n() outside of \dontrun{}. This was added in 749eadc (#249), and at least when I run R CMD check locally, there is no problem with executing this code.
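
For readers unfamiliar with the issue named in the PR title, here is a minimal base-R sketch of why an unforced promise can give a surprising result. It is not the simtrial code: `make_adder_lazy()` and `make_adder_forced()` are hypothetical names used only for illustration.

```r
# A closure factory that never touches `x`, so `x` stays an unevaluated promise.
make_adder_lazy <- function(x) {
  function(y) x + y
}

# Same factory, but force() evaluates the promise immediately.
make_adder_forced <- function(x) {
  force(x)
  function(y) x + y
}

lazy <- list()
forced <- list()
for (i in 1:3) {
  lazy[[i]] <- make_adder_lazy(i)
  forced[[i]] <- make_adder_forced(i)
}

# The lazy promise is evaluated only now, when `i` is already 3.
lazy[[1]](10)    # 13
forced[[1]](10)  # 11
```

When a closure like this is handed to a parallel worker, the promise may be evaluated in a context where the original value is no longer what the caller intended, which is one reason to force arguments up front.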

jdblischak · 2024-07-18T21:07:00Z

Could someone with a macOS machine please troubleshoot the unexpected error?

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Error ('test-unvalidated-sim_gs_n.R:538:3'): create_cut() can accept variables as arguments ──
  Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
  Backtrace:
      ▆
   1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
   2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
   3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)
  
  [ FAIL 1 | WARN 0 | SKIP 0 | PASS 233 ]
  Error: Test failures

nanxstats · 2024-07-19T16:23:51Z

I can reproduce the error in a macOS system. Complete log by running devtools::test():

> devtools::test()
ℹ Testing simtrial
✔ | F W  S  OK | Context
✔ |          4 | double_programming_fit_pwexp                   
✔ |          3 | double_programming_mb_weight                   
✔ |          5 | double_programming_sim_fixed_n [8.2s]          
✔ |         12 | double_programming_sim_pw_surv [14.8s]         
✔ |          4 | independent_test_counting_process              
✔ |         10 | independent_test_cut_data_by_date              
✔ |          7 | independent_test_cut_data_by_event             
✔ |          4 | independent_test_early_zero_weight             
✔ |          2 | independent_test_fh_weight                     
✔ |          1 | independent_test_get_cut_date_by_event         
✔ |          3 | independent_test_pmvnorm                       
✔ |          1 | independent_test_pvalue_maxcombo               
✔ |          3 | independent_test_randomize_by_fixed_block      
✔ |          3 | independent_test_rpw_enroll                    
✔ |          3 | independent_test_rpwexp_inverse_cdf_cpp        
✔ |         28 | independent_test_simfix2simpwsurv              
✔ |          6 | independent_test_wlr                           
✔ |         29 | unvalidated-data.table                         
✔ |          4 | unvalidated-early_zero_weight                  
✔ |         26 | unvalidated-get_analysis_date                  
✔ |         37 | unvalidated-input_checking                     
✔ |          6 | unvalidated-maxcombo                           
✔ |          1 | unvalidated-multitest                          
✔ |         18 | unvalidated-rmst                               
✖ | 11        3 | unvalidated-sim_gs_n [5.4s]                   
──────────────────────────────────────────────────────────────────────────────────────────────
Error (test-unvalidated-sim_gs_n.R:47:3): regular logrank test parallel
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:47:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:87:3): weighted logrank test by FH(0, 0.5)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:87:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:125:3): weighted logrank test by MB(3)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:125:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:163:3): weighted logrank test by early zero (6)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:163:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:201:3): RMST
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:201:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:239:3): Milestone
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:239:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:282:3): WLR with fh(0, 0.5) test at IA1, WLR with mb(6, Inf) at IA2, and milestone test at FA
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:282:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:319:3): MaxCombo (WLR-FH(0,0) + WLR-FH(0, 0.5))
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:319:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:363:3): sim_gs_n() accepts different tests per cutting
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:363:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:424:3): sim_gs_n() can combine wlr(), rmst(), and milestone() tests
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:424:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:538:3): create_cut() can accept variables as arguments
Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)
──────────────────────────────────────────────────────────────────────────────────────────────
Maximum number of failures exceeded; quitting at end of file.
ℹ Increase this number with (e.g.) testthat::set_max_fails(Inf) 

══ Results ═══════════════════════════════════════════════════════════════════════════════════
Duration: 32.3 s

── Failed tests ──────────────────────────────────────────────────────────────────────────────
Error (test-unvalidated-sim_gs_n.R:47:3): regular logrank test parallel
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:47:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:87:3): weighted logrank test by FH(0, 0.5)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:87:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:125:3): weighted logrank test by MB(3)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:125:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:163:3): weighted logrank test by early zero (6)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:163:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:201:3): RMST
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:201:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:239:3): Milestone
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:239:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:282:3): WLR with fh(0, 0.5) test at IA1, WLR with mb(6, Inf) at IA2, and milestone test at FA
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:282:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:319:3): MaxCombo (WLR-FH(0,0) + WLR-FH(0, 0.5))
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:319:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:363:3): sim_gs_n() accepts different tests per cutting
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:363:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:424:3): sim_gs_n() can combine wlr(), rmst(), and milestone() tests
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:424:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:538:3): create_cut() can accept variables as arguments
Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

[ FAIL 11 | WARN 0 | SKIP 0 | PASS 223 ]
══ Terminated early ══════════════════════════════════════════════════════════════════════════

jdblischak · 2024-07-19T17:01:00Z

> I can reproduce the error in a macOS system. Complete log by running devtools::test():

That's a different error. I always get that one when using devtools::test(). The tests pass with R CMD check.

nanxstats · 2024-07-19T17:38:57Z

I can reproduce the exact unit testing error from GitHub Actions macOS runner by running R CMD check on a local macOS system:

* checking tests ...
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  > test_check("simtrial")
  [ FAIL 1 | WARN 0 | SKIP 0 | PASS 233 ]

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Error ('test-unvalidated-sim_gs_n.R:538:3'): create_cut() can accept variables as arguments ──
  Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
  Backtrace:
      ▆
   1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
   2.   └─... %dofuture% ...
   3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

  [ FAIL 1 | WARN 0 | SKIP 0 | PASS 233 ]
  Error: Test failures
  Execution halted

This fixes the macOS issue of simtrial on GHA: Merck/simtrial#262. When `n_analysis = 2`, `seq_along(n_analysis)` is `1`, so only the first `n` in `x_new$analysis` is rounded. We should have rounded all elements in `n` that are close enough to integers, so the loop should go from `1` to `n_analysis`, i.e., the looping indices should be `seq_len(n_analysis)`.
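
The difference between the two index helpers can be sketched in base R. This is an illustration under stated assumptions, not the gsDesign2 source: `round_near_integers()` and its `tol` argument are hypothetical stand-ins for the rounding loop described above.

```r
# Hypothetical stand-in for the rounding loop; not gsDesign2's actual code.
# Rounds n[i] to the nearest integer when it is within `tol` of one.
round_near_integers <- function(n, index, tol = 1e-8) {
  for (i in index) {
    if (abs(n[i] - round(n[i])) < tol) n[i] <- round(n[i])
  }
  n
}

n_analysis <- 2                       # a count, not a vector of length 2
n <- c(300 - 2^-44, 454 - 2^-44)      # both slightly below a whole number

seq_along(n_analysis)  # 1      (indices of a length-one vector)
seq_len(n_analysis)    # 1 2    (indices 1..n_analysis)

buggy <- round_near_integers(n, seq_along(n_analysis))
fixed <- round_near_integers(n, seq_len(n_analysis))

buggy == c(300, 454)   # TRUE FALSE: the second element was never visited
fixed == c(300, 454)   # TRUE TRUE
```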

yihui · 2024-08-07T04:44:31Z

I have finally figured out this super weird issue, and submitted the fix Merck/gsDesign2#447. In short, gsDesign2::to_integer() failed to round the second sample size in x$analysis$n, which ended up being 454 - eps, where eps is a tiny number. Then sample('All', 454 - eps, ...) generated 453 elements on macOS but 454 elements on other platforms.

The fix turned out to be super simple, but the debugging process was quite a journey. Initially I was worried that I'd have to jump into data.table's C code. Thank goodness, I didn't have to.

jdblischak · 2024-08-07T20:37:11Z

I need to update the workflow file to temporarily install the latest version of {gsDesign2} from GitHub in order to obtain @yihui's latest fix in Merck/gsDesign2#447

LittleBeannie · 2024-08-07T20:39:13Z

> I have finally figured out this super weird issue, and submitted the fix Merck/gsDesign2#447. In short, gsDesign2::to_integer() failed to round the second sample size in x$analysis$n, which ended up being 454 - eps, where eps is a tiny number. Then sample('All', 454 - eps, ...) generated 453 elements on macOS but 454 elements on other platforms.
>
> The fix turned out to be super simple, but the debugging process was quite a journey. Initially I was worried that I'd have to jump into data.table's C code. Thank goodness, I didn't have to.

Thank you so much, Yihui!!!

yihui · 2024-08-07T21:08:06Z

@jdblischak You can add Remotes: Merck/gsDesign2 to DESCRIPTION so that the dev version of gsDesign2 can be automatically installed before checking the package.

After CRAN re-opens on Aug 17, we can send a new version of gsDesign2 to CRAN, and then remove the Remotes field in simtrial.

jdblischak · 2024-08-08T18:39:37Z

CI is green. Ready for review. Thanks, @yihui, for the impressive debugging! 💪

LittleBeannie · 2024-08-08T18:42:25Z

We finally got it through. I will get it merged. Thank you so much, @yihui!

jdblischak · 2024-08-08T18:42:51Z

> I have finally figured out this super weird issue, and submitted the fix Merck/gsDesign2#447. In short, gsDesign2::to_integer() failed to round the second sample size in x$analysis$n, which ended up being 454 - eps, where eps is a tiny number. Then sample('All', 454 - eps, ...) generated 453 elements on macOS but 454 elements on other platforms.

My only nagging doubt: do we understand why this affected macOS but not Linux? Using seq_along() instead of seq_len() presumably also affected the behavior on Linux and Windows too. Was the difference only that macOS failed whereas Linux and Windows returned incorrect results (silent errors are the most dangerous!)? Assuming that the returned results on Linux and Windows were previously incorrect, are there tests we could add to {gsDesign2} to detect these errors in the future?

yihui · 2024-08-08T19:25:11Z

It turns out that your doubt was correct and I concluded too early. n[2] = 454 on Linux and Windows (so no rounding was necessary), but it was 454 - 5.684342e-14 on macOS, which failed to be rounded (and hence became 453 after the as.integer() coercion in sample()) due to the bug I discovered.
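
The truncation step can be reproduced on any platform with a value one unit in the last place below 454 (2^-44 is approximately 5.684342e-14). The sketch below uses only base R; `as_count()` is a hypothetical guard written for illustration, not a function from gsDesign2 or simtrial.

```r
eps <- 2^-44            # spacing of doubles just below 454
n2 <- 454 - eps

n2 == 454               # FALSE
as.integer(n2)          # 453: coercion truncates toward zero
as.integer(round(n2))   # 454: rounding first gives the intended count

# Hypothetical guard: accept only values that are whole numbers up to `tol`,
# then convert via round() instead of relying on truncation.
as_count <- function(x, tol = 1e-8) {
  r <- round(x)
  if (any(abs(x - r) > tol)) stop("value is not close to a whole number")
  as.integer(r)
}

as_count(n2)            # 454
```

A guard of this kind turns a silent off-by-one into either the intended integer or an explicit error.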

Now the question is why n[2] is different on macOS in the first place. I'll spend a bit more time on this rabbit hole.

yihui · 2024-08-08T21:03:00Z

The script below gives different output on macOS vs other platforms. All code except for the last block was copied from #260.

library(gsDesign2)

ratio = 1

enroll_rate = define_enroll_rate(duration = c(2, 2, 8), rate = c(1, 2, 3))

fail_rate = define_fail_rate(
  duration = c(4, Inf), fail_rate = log(2) / 12, hr = c(1, .6), dropout_rate = .001
)

alpha = 0.025
beta = 0.1

upper = gs_spending_bound
upar = list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper = rep(TRUE, 2)

lower = gs_spending_bound
lpar = list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower = c(TRUE, FALSE)
binding = FALSE

info_frac = NULL
analysis_time = c(24, 36)

x = gs_design_ahr(
  enroll_rate = enroll_rate, fail_rate = fail_rate, 
  alpha = alpha, beta = beta, ratio = ratio,
  info_frac = info_frac, analysis_time = analysis_time, 
  upper = upper, upar = upar, test_upper = test_upper,
  lower = lower, lpar = lpar, test_lower = test_lower,
  binding = binding
)

n2 = x$analysis$n[2]
sample_size_new = ceiling(n2 / 2) * 2  # 454L
n = with(x$enroll_rate, {
  rate = rate * sample_size_new / n2
  sum(rate * duration)
})
n - 454

n is exactly 454 on Linux/Windows, but not on macOS. I can only go this deep for now. I'm not sure if it's worth the time to go to the very bottom. Anyway, the (old) lesson to learn is x * y / y and x / y * y may not be exactly x in floating point arithmetic.
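
The lesson can be seen with much smaller numbers. The following is a minimal sketch assuming standard IEEE 754 double arithmetic; the values 0.1 and 3 are chosen for illustration and are unrelated to the design above.

```r
x <- 0.1
y <- 3

z <- x * y / y

z == x                     # FALSE: the result is one unit in the last place above 0.1
isTRUE(all.equal(z, x))    # TRUE: equal within the default tolerance

(0.1 + 0.2) == 0.3         # FALSE: the classic example of the same effect

# Printing with 17 significant digits makes the difference visible.
sprintf("%.17g", c(x, z))
```

Comparing with a tolerance (`all.equal()`), or rounding before converting to an integer, avoids depending on the exact last bit.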

nanxstats · 2024-08-08T22:32:59Z

I wonder if this is from gcc vs. clang. You know, the default compiler for base R under macOS is clang, but it's gcc under both Windows and Linux. So this pattern matches the outcomes we observe.

One quick way to test this hypothesis is to launch an Ubuntu clang build of R via the R-hub v2 GitHub Actions workflow and run the code above. If it generates the same non-exact-zero result we see on macOS, then the compiler is probably the culprit.

Relevant read: GCC on x86 does not round floating-point divisions to the nearest value

nanxstats · 2024-08-12T02:28:45Z

I used the R-hub v2 workflow to see which combination can reproduce the non-exact-zero results. (Actions page).

I can only reproduce the non-exact-zero result under the macos-arm64 option (clang14 R + Apple Silicon):

* using R Under development (unstable) (2024-08-05 r86980)
* using platform: aarch64-apple-darwin20
* R was compiled by
    Apple clang version 14.0.0 (clang-1400.0.29.202)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Sonoma 14.6

...

── Failure (test-clang-integer.R:49:3): Capture floating point arithmetics output under clang ──
n - 454 (`actual`) not identical to 0 (`expected`).

  `actual`: -0.00000000000006
`expected`:  0.00000000000000
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

The result is exact zero under the macos option (clang14 R + Intel x86_64):

* using platform: x86_64-apple-darwin20
* R was compiled by
    Apple clang version 14.0.0 (clang-1400.0.29.202)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Ventura 13.6.8

I also tried the ubuntu-clang and clang16 options, and they all pass. However, their names may be "misleading" because the R builds used there are still compiled by gcc, while clang is only used for compiling the package being checked. So these might still represent gcc R results.

Since the r-lib/actions check-standard workflow for macOS also uses the Apple silicon runners (platform: aarch64-apple-darwin20), and this is the only combination we have seen generate the error so far (clang + x86_64 works, gcc + x86_64 works), my bold guess is that it's an Apple silicon issue (clang just happens to be the compiler for R on it). I also found a related blog post about floating-point summation on M1.

If so, it's unlikely to be something we can fix, and I agree the most important thing is:

> Anyway, the (old) lesson to learn is x * y / y and x / y * y may not be exactly x in floating point arithmetic.
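
As a rough illustration of why the way a sum is accumulated can change the last bit, the base-R sketch below adds the same three numbers in two orders and then prints a few facts about the current R build. Whether any of these build properties explains the Apple silicon result is an assumption that this thread does not establish; the snippet only shows how one might collect the information when comparing platforms.

```r
# Floating-point addition is not associative: grouping changes the result.
left  <- (0.1 + 0.2) + 0.3
right <- 0.1 + (0.2 + 0.3)

left == right                    # FALSE
isTRUE(all.equal(left, right))   # TRUE: the two differ only in the last bits

# Facts about the running R build that may be worth recording per platform.
R.version$platform               # e.g. an aarch64 or x86_64 triplet
R.version$arch
capabilities("long.double")      # does this build use a long double wider than double?
.Machine$sizeof.longdouble       # size in bytes of the C long double type
```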

jdblischak · 2024-08-13T18:25:50Z

> do we understand why this affected macOS but not Linux?

@yihui and @nanxstats thanks for the incredibly detailed investigation! I understand this so much better now.

> Using seq_along() instead of seq_len() presumably also affected the behavior on Linux and Windows too. Was the difference only that macOS failed whereas Linux and Windows returned incorrect results (silent errors are the most dangerous!)? Assuming that the returned results on Linux and Windows were previously incorrect, are there tests we could add to {gsDesign2} to detect these errors in the future?

I followed up on this. My worry was unfounded. While the macOS test failed because of different rounding behavior on Apple Silicon chips, the behavior on Linux and Windows was not affected by Merck/gsDesign2#447. In other words, they were producing the correct results prior to the bug fix.

I confirmed this on Windows using the code below:

# Start with CRAN version of gsDesign2 that uses seq_along()
install.packages("gsDesign2")
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2’

grep("seq_along", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_along(n_analysis)) {" "        for (i in seq_along(n_analysis)) {"

library("simtrial")
packageVersion("simtrial")
## [1] ‘0.4.1.7’

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results1 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

# Switch to GitHub version of gsDesign2 that uses seq_len()
detach("package:gsDesign2")
remove.packages("gsDesign2")
# For some reason Windows won't let R delete the DLL
file.remove("~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll")
## [1] FALSE
## Warning message:
##   In file.remove("~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll") :
##   cannot remove file '~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll', reason 'Permission denied'
system("rm ~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll")
unlink("~/../AppData/Local/R/win-library/4.3/gsDesign2", recursive = TRUE)

remotes::install_github("Merck/gsDesign2@9092288", upgrade = FALSE)
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2.18’
grep("seq_len", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_len(n_analysis)) {" "        for (i in seq_len(n_analysis)) {"

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results2 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

all.equal(results1, results2)
## [1] TRUE

jdblischak · 2024-08-19T17:52:10Z

> the behavior on Linux and Windows was not affected
>
> I confirmed this on Windows using the code below:

Technically I only tested this on Windows. I'm fairly confident Linux should also be fine since macOS seems to be the outlier for this bug. But my inner paranoia convinced me it is better to be safe than sorry.

So I copy-pasted my reprex above for Windows in my WSL Ubuntu 22.04 to be absolutely sure we no longer need to worry about this bug. All good.

Results of the reprex on Linux:

# Start with CRAN version of gsDesign2 that uses seq_along()
install.packages("gsDesign2")
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2’

grep("seq_along", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_along(n_analysis)) {" "        for (i in seq_along(n_analysis)) {"

library("simtrial")
packageVersion("simtrial")
## [1] ‘0.4.1.8’

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results1 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

# Switch to GitHub version of gsDesign2 that uses seq_len()
detach("package:gsDesign2")
remotes::install_github("Merck/gsDesign2@9092288", upgrade = FALSE)
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2.18’
grep("seq_len", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_len(n_analysis)) {" "        for (i in seq_len(n_analysis)) {"

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results2 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

all.equal(results1, results2)
## [1] TRUE

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base
##
## other attached packages:
## [1] gsDesign2_1.1.2.18 doFuture_1.0.1     future_1.34.0      foreach_1.5.2
## [5] simtrial_0.4.1.8
##
## loaded via a namespace (and not attached):
##  [1] gt_0.11.0           utf8_1.2.4          generics_0.1.3
##  [4] tidyr_1.3.1         bspm_0.5.5          xml2_1.3.6
##  [7] r2rtf_1.1.1         lattice_0.22-6      listenv_0.9.1
## [10] digest_0.6.36       magrittr_2.0.3      grid_4.4.1
## [13] iterators_1.0.14    mvtnorm_1.2-6       fastmap_1.2.0
## [16] Matrix_1.7-0        processx_3.8.4      pkgbuild_1.4.4
## [19] survival_3.7-0      ps_1.7.7            purrr_1.0.2
## [22] fansi_1.0.6         scales_1.3.0        codetools_0.2-20
## [25] cli_3.6.3           rlang_1.1.4         parallelly_1.38.0
## [28] future.apply_1.11.2 munsell_0.5.1       splines_4.4.1
## [31] remotes_2.5.0       withr_3.0.1         gsDesign_3.6.4
## [34] tools_4.4.1         parallel_4.4.1      dplyr_1.1.4
## [37] colorspace_2.1-1    ggplot2_3.5.1       globals_0.16.3
## [40] curl_5.2.1          vctrs_0.6.5         R6_2.5.1
## [43] lifecycle_1.0.4     desc_1.4.3          callr_3.7.6
## [46] pkgconfig_2.0.3     pillar_1.9.0        gtable_0.3.5
## [49] data.table_1.15.4   glue_1.7.0          Rcpp_1.0.13
## [52] tibble_3.2.1        tidyselect_1.2.1    xtable_1.8-4
## [55] htmltools_0.5.8.1   compiler_4.4.1
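
For readers unfamiliar with the `seq_along()` versus `seq_len()` distinction behind the gsDesign2 change referenced above, here is a minimal base R sketch. It is not taken from gsDesign2: the variable name `n_analysis` mirrors the `grep()` output in the reprex, and the value 3 is purely illustrative. Whether the difference changes the result for a given design depends on the code inside the loop, which is not shown here.

```r
# Illustrative only: n_analysis is a scalar count, as in the loop header above.
n_analysis <- 3

# seq_along() indexes the elements of its argument; a scalar has one element.
along <- seq_along(n_analysis)

# seq_len() counts from 1 up to the value itself.
len <- seq_len(n_analysis)

print(along)
print(len)
```

With a scalar count, a `for` loop over `seq_along(n_analysis)` runs once regardless of the count, while a loop over `seq_len(n_analysis)` runs `n_analysis` times.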

jdblischak requested review from LittleBeannie and cmansch July 18, 2024 19:15

jdblischak self-assigned this Jul 18, 2024

nanxstats mentioned this pull request Jul 19, 2024

A potential bug of parallel computation in sim_gs_n #260

Closed

jdblischak mentioned this pull request Jul 19, 2024

Fixing Bug for Cut Functions in Parallel #261

Merged

yihui referenced this pull request in Merck/gsDesign2 Aug 7, 2024

fix cmd check

d0159b9

yihui mentioned this pull request Aug 7, 2024

Fix a typo: seq_along() should have been seq_len() Merck/gsDesign2#447

Merged

jdblischak added 2 commits August 8, 2024 14:14

Test and document forced evaluation of promises for parallel execution

650d5eb

Install gsDesign2 from GitHub until next CRAN release

d536333

jdblischak force-pushed the force-parallel-follow-up branch from b2cefb8 to d536333 Compare August 8, 2024 18:20

LittleBeannie approved these changes Aug 8, 2024

LittleBeannie merged commit 71f8b78 into Merck:main Aug 8, 2024
7 checks passed

jdblischak deleted the force-parallel-follow-up branch August 8, 2024 18:44

jdblischak mentioned this pull request Nov 5, 2024

296 fix cran pre tests #297

Closed
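
The commits listed above concern forced evaluation of promises. As a minimal base R sketch of the underlying mechanism: arguments to a function factory are lazy promises, so the returned closure can pick up a value that changed after the factory was called unless the argument is forced with `force()`. The names `make_cut_lazy` and `make_cut_forced` are hypothetical and are not simtrial's `create_cut()` implementation. The sketch does not reproduce the parallel failure from #260; it only shows the lazy-evaluation behavior.

```r
# Hypothetical factory: the argument stays an unevaluated promise until first use.
make_cut_lazy <- function(planned_calendar_time) {
  function() planned_calendar_time
}

# Hypothetical factory: force() evaluates the promise when the factory is called.
make_cut_forced <- function(planned_calendar_time) {
  force(planned_calendar_time)
  function() planned_calendar_time
}

cut_time <- 24
lazy_cut <- make_cut_lazy(cut_time)
forced_cut <- make_cut_forced(cut_time)

# Change the variable after the closures were created but before they are used.
cut_time <- 36

print(lazy_cut())    # promise evaluated now, so it sees the updated value
print(forced_cut())  # value was fixed when the factory was called
```

Forcing the argument inside the factory makes the closure independent of the calling environment at the time it is eventually run.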
