Test and document forced evaluation of promises for parallel execution #262

Merged
merged 2 commits into from
Aug 8, 2024


jdblischak
Collaborator

Follow-up to PR #261

I converted the example from Issue #260 into a test. It's a very frustrating test case though. I could only get it to fail pre-#261 when running the code interactively (though it did also fail via Rscript). When I ran it via R CMD check, it always passed. Unclear why, but I suspect that {testthat} may be responsible.
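For readers unfamiliar with the underlying problem, here is a minimal base R sketch of lazy versus forced evaluation of a promise. It is not the simtrial code: `make_cut_lazy()` and `make_cut_forced()` are hypothetical stand-ins for a function factory such as create_cut(). The assumption is that a closure sent to a parallel worker can carry an unevaluated promise, which then fails or sees a different value when it is finally evaluated.

```r
# Hypothetical function factory: `time` stays an unevaluated promise
make_cut_lazy <- function(time) {
  function() time
}

# Same factory, but the promise is evaluated before the closure is returned
make_cut_forced <- function(time) {
  force(time)
  function() time
}

analysis_time <- 24
cut_lazy <- make_cut_lazy(analysis_time)
cut_forced <- make_cut_forced(analysis_time)

# Change the variable after the closures have been created
analysis_time <- 36

lazy_value <- cut_lazy()      # promise is evaluated only now
forced_value <- cut_forced()  # value was captured at creation time

print(c(lazy = lazy_value, forced = forced_value))
```

The lazy closure returns 36 and the forced one returns 24. On a parallel worker the variable may not exist at all, which is why forcing the argument up front is the safer pattern.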

I also moved the parallel example of sim_gs_n() outside of \dontrun{}. This was added in 749eadc (#249), and at least when I run R CMD check locally, there is no problem with executing this code.

@jdblischak jdblischak self-assigned this Jul 18, 2024
@jdblischak
Collaborator Author

Could someone with a macOS machine please troubleshoot the unexpected error?

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Error ('test-unvalidated-sim_gs_n.R:538:3'): create_cut() can accept variables as arguments ──
  Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
  Backtrace:
   1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
   2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
   3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)
  
  [ FAIL 1 | WARN 0 | SKIP 0 | PASS 233 ]
  Error: Test failures

@nanxstats
Collaborator

I can reproduce the error on a macOS system. Here is the complete log from running devtools::test():

> devtools::test()
ℹ Testing simtrial
✔ | F W  S  OK | Context
✔ |          4 | double_programming_fit_pwexp                   
✔ |          3 | double_programming_mb_weight                   
✔ |          5 | double_programming_sim_fixed_n [8.2s]          
✔ |         12 | double_programming_sim_pw_surv [14.8s]         
✔ |          4 | independent_test_counting_process              
✔ |         10 | independent_test_cut_data_by_date              
✔ |          7 | independent_test_cut_data_by_event             
✔ |          4 | independent_test_early_zero_weight             
✔ |          2 | independent_test_fh_weight                     
✔ |          1 | independent_test_get_cut_date_by_event         
✔ |          3 | independent_test_pmvnorm                       
✔ |          1 | independent_test_pvalue_maxcombo               
✔ |          3 | independent_test_randomize_by_fixed_block      
✔ |          3 | independent_test_rpw_enroll                    
✔ |          3 | independent_test_rpwexp_inverse_cdf_cpp        
✔ |         28 | independent_test_simfix2simpwsurv              
✔ |          6 | independent_test_wlr                           
✔ |         29 | unvalidated-data.table                         
✔ |          4 | unvalidated-early_zero_weight                  
✔ |         26 | unvalidated-get_analysis_date                  
✔ |         37 | unvalidated-input_checking                     
✔ |          6 | unvalidated-maxcombo                           
✔ |          1 | unvalidated-multitest                          
✔ |         18 | unvalidated-rmst                               
✖ | 11        3 | unvalidated-sim_gs_n [5.4s]                   
──────────────────────────────────────────────────────────────────────────────────────────────
Error (test-unvalidated-sim_gs_n.R:47:3): regular logrank test parallel
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:47:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:87:3): weighted logrank test by FH(0, 0.5)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:87:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:125:3): weighted logrank test by MB(3)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:125:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:163:3): weighted logrank test by early zero (6)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:163:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:201:3): RMST
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:201:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:239:3): Milestone
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:239:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:282:3): WLR with fh(0, 0.5) test at IA1, WLR with mb(6, Inf) at IA2, and milestone test at FA
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:282:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:319:3): MaxCombo (WLR-FH(0,0) + WLR-FH(0, 0.5))
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:319:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:363:3): sim_gs_n() accepts different tests per cutting
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:363:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:424:3): sim_gs_n() can combine wlr(), rmst(), and milestone() tests
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:424:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:538:3): create_cut() can accept variables as arguments
Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)
──────────────────────────────────────────────────────────────────────────────────────────────
Maximum number of failures exceeded; quitting at end of file.
ℹ Increase this number with (e.g.) testthat::set_max_fails(Inf) 

══ Results ═══════════════════════════════════════════════════════════════════════════════════
Duration: 32.3 s

── Failed tests ──────────────────────────────────────────────────────────────────────────────
Error (test-unvalidated-sim_gs_n.R:47:3): regular logrank test parallel
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:47:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:87:3): weighted logrank test by FH(0, 0.5)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:87:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:125:3): weighted logrank test by MB(3)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:125:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:163:3): weighted logrank test by early zero (6)
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:163:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:201:3): RMST
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:201:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:239:3): Milestone
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:239:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:282:3): WLR with fh(0, 0.5) test at IA1, WLR with mb(6, Inf) at IA2, and milestone test at FA
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:282:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:319:3): MaxCombo (WLR-FH(0,0) + WLR-FH(0, 0.5))
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:319:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:363:3): sim_gs_n() accepts different tests per cutting
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:363:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:424:3): sim_gs_n() can combine wlr(), rmst(), and milestone() tests
Error in `convert_list_to_df_w_list_cols(ans_1sim_new)`: could not find function "convert_list_to_df_w_list_cols"
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:424:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

Error (test-unvalidated-sim_gs_n.R:538:3): create_cut() can accept variables as arguments
Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Backtrace:
    ▆
 1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
 2.   └─... %dofuture% ... at simtrial/R/sim_gs_n.R:271:3
 3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

[ FAIL 11 | WARN 0 | SKIP 0 | PASS 223 ]
══ Terminated early ══════════════════════════════════════════════════════════════════════════

@jdblischak
Collaborator Author

> I can reproduce the error on a macOS system. Here is the complete log from running devtools::test():

That's a different error. I always get that when using devtools::test(). It will pass with R CMD check.

@nanxstats
Collaborator

I can reproduce the exact unit testing error from GitHub Actions macOS runner by running R CMD check on a local macOS system:

* checking tests ...
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  > test_check("simtrial")
  [ FAIL 1 | WARN 0 | SKIP 0 | PASS 233 ]

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Error ('test-unvalidated-sim_gs_n.R:538:3'): create_cut() can accept variables as arguments ──
  Error in ``[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate)))`: Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
  Backtrace:
      ▆
   1. └─simtrial::sim_gs_n(...) at test-unvalidated-sim_gs_n.R:538:3
   2.   └─... %dofuture% ...
   3.     └─doFuture:::doFuture2(foreach, expr, envir = parent.frame(), data = NULL)

  [ FAIL 1 | WARN 0 | SKIP 0 | PASS 233 ]
  Error: Test failures
  Execution halted

yihui referenced this pull request in Merck/gsDesign2 Aug 7, 2024
yihui added a commit to yihui/gsDesign2 that referenced this pull request Aug 7, 2024
This fixes the macOS issue of simtrial on GHA: Merck/simtrial#262

When `n_analysis = 2`, `seq_along(n_analysis)` is `1`, and only the first `n` in `x_new$analysis` is rounded. We should have rounded all elements in `n` that are close enough to integers, so the loop should go from `1` to `n_analysis`, i.e., the looping indices should be `seq_len(n_analysis)`.
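A small base R sketch of the indexing mistake described in this commit message. The vector `n`, the tolerance `tol`, and the helper `round_near_integers()` are made up for illustration and are not the actual gsDesign2 code.

```r
n_analysis <- 2  # number of analyses, stored as a single number

# seq_along() counts the elements of its argument; seq_len() counts up to its value
idx_buggy <- seq_along(n_analysis)  # 1
idx_fixed <- seq_len(n_analysis)    # 1 2

# Hypothetical sample sizes that are almost, but not exactly, integers
n <- c(227.0000000001, 453.9999999999)

# Hypothetical helper: round only the elements listed in `idx`
round_near_integers <- function(n, idx, tol = 1e-6) {
  for (i in idx) {
    if (abs(n[i] - round(n[i])) < tol) n[i] <- round(n[i])
  }
  n
}

n_buggy <- round_near_integers(n, idx_buggy)  # second element is left untouched
n_fixed <- round_near_integers(n, idx_fixed)  # both elements are rounded

print(n_buggy == round(n_buggy))
print(n_fixed == round(n_fixed))
```

With the `seq_along()` indices only the first element ends up as a whole number, which mirrors the behaviour reported here.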
@yihui
Contributor

yihui commented Aug 7, 2024

I have finally figured out this super weird issue, and submitted the fix Merck/gsDesign2#447. In short, gsDesign2::to_integer() failed to round the second sample size in x$analysis$n, which ended up being 454 - eps, where eps is a tiny number. Then sample('All', 454 - eps, ...) generated 453 elements on macOS but 454 elements on other platforms.

The fix turned out to be super simple, but the debugging process was quite a journey. Initially I was worried that I'd have to jump into data.table's C code. Thank goodness, I didn't have to.

@jdblischak
Collaborator Author

I need to update the workflow file to temporarily install the latest version of {gsDesign2} from GitHub in order to obtain @yihui's latest fix in Merck/gsDesign2#447

@LittleBeannie
Collaborator

> I have finally figured out this super weird issue, and submitted the fix Merck/gsDesign2#447. In short, gsDesign2::to_integer() failed to round the second sample size in x$analysis$n, which ended up being 454 - eps, where eps is a tiny number. Then sample('All', 454 - eps, ...) generated 453 elements on macOS but 454 elements on other platforms.
>
> The fix turned out to be super simple, but the debugging process was quite a journey. Initially I was worried that I'd have to jump into data.table's C code. Thank goodness, I didn't have to.

Thank you so much, Yihui!!!

@yihui
Contributor

yihui commented Aug 7, 2024

@jdblischak You can add Remotes: Merck/gsDesign2 to DESCRIPTION so that the dev version of gsDesign2 can be automatically installed before checking the package.

After CRAN re-opens on Aug 17, we can send a new version of gsDesign2 to CRAN, and then remove the Remotes field in simtrial.

@jdblischak jdblischak force-pushed the force-parallel-follow-up branch from b2cefb8 to d536333 Compare August 8, 2024 18:20
@jdblischak
Collaborator Author

CI is green. Ready for review. Thanks to @yihui for the impressive debugging! 💪

@LittleBeannie
Collaborator

We finally got it through. I will get it merged. Thank you so much, @yihui!

@LittleBeannie LittleBeannie merged commit 71f8b78 into Merck:main Aug 8, 2024
7 checks passed
@jdblischak
Collaborator Author

jdblischak commented Aug 8, 2024

> I have finally figured out this super weird issue, and submitted the fix Merck/gsDesign2#447. In short, gsDesign2::to_integer() failed to round the second sample size in x$analysis$n, which ended up being 454 - eps, where eps is a tiny number. Then sample('All', 454 - eps, ...) generated 453 elements on macOS but 454 elements on other platforms.

My only nagging doubt: do we understand why this affected macOS but not Linux? Using seq_along() instead of seq_len() presumably also affected the behavior on Linux and Windows too. Was the difference only that macOS failed whereas Linux and Windows returned incorrect results (silent errors are the most dangerous!)? Assuming that the returned results on Linux and Windows were previously incorrect, are there tests we could add to {gsDesign2} to detect these errors in the future?
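One possible shape for such a check, sketched in base R. The helper `is_whole_number()` and the example vectors are hypothetical and not taken from the {gsDesign2} test suite; a real test would apply the check to the sample sizes of a design after rounding.

```r
# Hypothetical helper: TRUE where a value is integer-valued within `tol`
is_whole_number <- function(x, tol = 0) {
  abs(x - round(x)) <= tol
}

n_ok <- c(227, 454)           # properly rounded sample sizes
n_off <- c(227, 454 - 1e-10)  # second value was not rounded

ok_result <- all(is_whole_number(n_ok))
off_result <- all(is_whole_number(n_off))

print(c(ok = ok_result, off = off_result))
```

With the default `tol = 0` the check is strict, so a value that merely prints as 454 is still flagged.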

@jdblischak jdblischak deleted the force-parallel-follow-up branch August 8, 2024 18:44
@yihui
Contributor

yihui commented Aug 8, 2024

It turns out that your doubt was correct and I concluded too early. n[2] = 454 on Linux and Windows (so no rounding was necessary), but it was 454 - 5.684342e-14 on macOS, which failed to be rounded (hence became 453 after the as.integer() coercion in sample()) due to the bug I discovered.
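The truncation itself can be reproduced in base R on any platform. This sketch uses `2^-44` as the offset, which is the spacing between adjacent doubles just below 454 and matches the reported difference to the printed precision. The variable names are illustrative only.

```r
eps <- 2^-44        # about 5.684342e-14
n_mac <- 454 - eps  # the largest double strictly below 454

not_equal <- n_mac != 454
truncated <- as.integer(n_mac)         # as.integer() drops the fractional part
rounded <- as.integer(round(n_mac))    # rounding first gives the intended value

print(format(454 - n_mac, digits = 7))
print(c(truncated = truncated, rounded = rounded))
```

So a value that prints as 454 at default precision can still become 453 once it is coerced to an integer.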

Now the question is why n[2] is different on macOS in the first place. I'll spend a bit more time on this rabbit hole.

@yihui
Contributor

yihui commented Aug 8, 2024

The script below gives different output on macOS vs other platforms. All code except for the last block was copied from #260.

library(gsDesign2)

ratio = 1

enroll_rate = define_enroll_rate(duration = c(2, 2, 8), rate = c(1, 2, 3))

fail_rate = define_fail_rate(
  duration = c(4, Inf), fail_rate = log(2) / 12, hr = c(1, .6), dropout_rate = .001
)

alpha = 0.025
beta = 0.1

upper = gs_spending_bound
upar = list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper = rep(TRUE, 2)

lower = gs_spending_bound
lpar = list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower = c(TRUE, FALSE)
binding = FALSE

info_frac = NULL
analysis_time = c(24, 36)

x = gs_design_ahr(
  enroll_rate = enroll_rate, fail_rate = fail_rate, 
  alpha = alpha, beta = beta, ratio = ratio,
  info_frac = info_frac, analysis_time = analysis_time, 
  upper = upper, upar = upar, test_upper = test_upper,
  lower = lower, lpar = lpar, test_lower = test_lower,
  binding = binding
)

n2 = x$analysis$n[2]
sample_size_new = ceiling(n2 / 2) * 2  # 454L
n = with(x$enroll_rate, {
  rate = rate * sample_size_new / n2
  sum(rate * duration)
})
n - 454

n is exactly 454 on Linux/Windows, but not on macOS. I can only go this deep for now. I'm not sure if it's worth the time to go to the very bottom. Anyway, the (old) lesson to learn is x * y / y and x / y * y may not be exactly x in floating point arithmetic.
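A minimal base R sketch of that lesson, using made-up values rather than the design above. Whether the round trip is exact can depend on the values and the platform, so the sketch does not assume a particular outcome for the exact comparison and relies on a tolerance instead.

```r
x <- 1
y <- 49

round_trip <- x / y * y

# Exact comparison: may or may not be TRUE, depending on values and platform
exact_match <- identical(round_trip, x)

# Tolerance-based comparison: robust to tiny floating point differences
close_match <- isTRUE(all.equal(round_trip, x))
within_tol <- abs(round_trip - x) < sqrt(.Machine$double.eps)

print(c(exact = exact_match, close = close_match, within_tol = within_tol))
```

When a whole number is required downstream, rounding explicitly before the value is used as a count avoids depending on the exact comparison.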

@nanxstats
Collaborator

I wonder if this is from gcc vs. clang. You know, the default compiler for base R under macOS is clang, but it's gcc under both Windows and Linux. So this pattern matches the outcomes we observe.

One quick way to test this hypothesis is to launch an Ubuntu clang build of R via the R-hub v2 GitHub Actions workflow and run the code above. If it produces the same not-exactly-zero result that we see on macOS, then the compiler is probably the culprit.

Relevant read: GCC on x86 does not round floating-point divisions to the nearest value

@nanxstats
Collaborator

I used the R-hub v2 workflow to see which combination can reproduce the non-exact-zero results. (Actions page).

I can only reproduce the non-exact-zero result under the macos-arm64 option (clang14 R + Apple Silicon):

* using R Under development (unstable) (2024-08-05 r86980)
* using platform: aarch64-apple-darwin20
* R was compiled by
    Apple clang version 14.0.0 (clang-1400.0.29.202)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Sonoma 14.6

...

── Failure (test-clang-integer.R:49:3): Capture floating point arithmetics output under clang ──
n - 454 (`actual`) not identical to 0 (`expected`).

  `actual`: -0.00000000000006
`expected`:  0.00000000000000
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]

The result is exact zero under the macos option (clang14 R + Intel x86_64):

* using platform: x86_64-apple-darwin20
* R was compiled by
    Apple clang version 14.0.0 (clang-1400.0.29.202)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Ventura 13.6.8

I also tried the ubuntu-clang and clang16 options and they all pass. However, their names may be "misleading" because the R used there is still compiled by gcc, while clang is only used for compiling the package being checked. So these might still represent gcc R results.

Since the r-lib/actions check-standard workflows for macOS also use the Apple silicon runners (platform: aarch64-apple-darwin20), and that is the only combination we have seen generating this error so far, while clang + x86_64 and gcc + x86_64 both work fine, my bold guess is that it's an Apple silicon issue (clang just happens to be the compiler for R on that platform). I also found a related blog post about floating-point summation on M1.
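When comparing results across runners like this, it can help to record the platform next to the numeric output. A small base R sketch; the variable name `platform_info` is illustrative only.

```r
# Collect basic platform details alongside a numeric result
platform_info <- c(
  platform = R.version$platform,
  arch = R.version$arch,
  os = R.version$os,
  machine = unname(Sys.info()[["machine"]]),
  double_eps = format(.Machine$double.eps)
)

print(platform_info)
```

The `platform` entry is the same string that R CMD check reports in the "using platform" line of the logs above.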

If so, it's unlikely to be something we can fix, and I agree the most important thing is:

> Anyway, the (old) lesson to learn is x * y / y and x / y * y may not be exactly x in floating point arithmetic.

@jdblischak
Collaborator Author

> do we understand why this affected macOS but not Linux?

@yihui and @nanxstats thanks for the incredibly detailed investigation! I understand this so much better now.

> Using seq_along() instead of seq_len() presumably also affected the behavior on Linux and Windows too. Was the difference only that macOS failed whereas Linux and Windows returned incorrect results (silent errors are the most dangerous!)? Assuming that the returned results on Linux and Windows were previously incorrect, are there tests we could add to {gsDesign2} to detect these errors in the future?

I followed up on this. My worry was unfounded. While the macOS test failed because of different rounding behavior on Apple Silicon chips, the behavior on Linux and Windows was not affected by Merck/gsDesign2#447. In other words, they were producing the correct results prior to the bug fix.

I confirmed this on Windows using the code below:

Confirmed stable behavior on Windows. Click for code:
# Start with CRAN version of gsDesign2 that uses seq_along()
install.packages("gsDesign2")
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2’

grep("seq_along", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_along(n_analysis)) {" "        for (i in seq_along(n_analysis)) {"

library("simtrial")
packageVersion("simtrial")
## [1] ‘0.4.1.7’

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results1 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

# Switch to GitHub version of gsDesign2 that uses seq_len()
detach("package:gsDesign2")
remove.packages("gsDesign2")
# For some reason Windows won't let R delete the DLL
file.remove("~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll")
## [1] FALSE
## Warning message:
##   In file.remove("~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll") :
##   cannot remove file '~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll', reason 'Permission denied'
system("rm ~/../AppData/Local/R/win-library/4.3/gsDesign2/libs/x64/gsDesign2.dll")
unlink("~/../AppData/Local/R/win-library/4.3/gsDesign2", recursive = TRUE)

remotes::install_github("Merck/gsDesign2@9092288", upgrade = FALSE)
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2.18’
grep("seq_len", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_len(n_analysis)) {" "        for (i in seq_len(n_analysis)) {"

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results2 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

all.equal(results1, results2)
## [1] TRUE

@jdblischak
Collaborator Author

> the behavior on Linux and Windows was not affected
>
> I confirmed this on Windows using the code below:

Technically I only tested this on Windows. I'm fairly confident Linux should also be fine since macOS seems to be the outlier for this bug. But my inner paranoia convinced me it is better to be safe than sorry.

So I copy-pasted my reprex above for Windows in my WSL Ubuntu 22.04 to be absolutely sure we no longer need to worry about this bug. All good.

Click for results of reprex on Linux
# Start with CRAN version of gsDesign2 that uses seq_along()
install.packages("gsDesign2")
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2’

grep("seq_along", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_along(n_analysis)) {" "        for (i in seq_along(n_analysis)) {"

library("simtrial")
packageVersion("simtrial")
## [1] ‘0.4.1.8’

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results1 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

# Switch to GitHub version of gsDesign2 that uses seq_len()
detach("package:gsDesign2")
remotes::install_github("Merck/gsDesign2@9092288", upgrade = FALSE)
library("gsDesign2")
packageVersion("gsDesign2")
## [1] ‘1.1.2.18’
grep("seq_len", deparse(gsDesign2:::to_integer.gs_design), value = TRUE)
## [1] "        for (i in seq_len(n_analysis)) {" "        for (i in seq_len(n_analysis)) {"

ratio <- 1
enroll_rate <- define_enroll_rate(duration = c(2, 2, 8),
                                  rate = c(1, 2, 3))
fail_rate <- define_fail_rate(duration = c(4, Inf),
                              fail_rate = log(2) / 12,
                              hr = c(1, .6),
                              dropout_rate = .001)
alpha <- 0.025
beta <- 0.1
upper <- gsDesign2::gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha)
test_upper <- rep(TRUE, 2)
lower <- gsDesign2::gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta)
test_lower <- c(TRUE, FALSE)
binding <- FALSE
info_frac = NULL
analysis_time = c(24, 36)
x <- gsDesign2::gs_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate,
                              alpha = alpha, beta = beta, ratio = ratio,
                              info_frac = info_frac, analysis_time = analysis_time,
                              upper = upper, upar = upar, test_upper = test_upper,
                              lower = lower, lpar = lpar, test_lower = test_lower,
                              binding = binding) |> gsDesign2::to_integer()
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
future::plan("sequential")
set.seed(1)
results2 <- simtrial::sim_gs_n(
  n_sim = 1e2,
  sample_size = x$analysis$n[2],
  enroll_rate = x$enroll_rate,
  fail_rate = x$fail_rate,
  test = simtrial::wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = simtrial::fh(rho = 0, gamma = 0))

all.equal(results1, results2)
## [1] TRUE

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base
##
## other attached packages:
## [1] gsDesign2_1.1.2.18 doFuture_1.0.1     future_1.34.0      foreach_1.5.2
## [5] simtrial_0.4.1.8
##
## loaded via a namespace (and not attached):
##  [1] gt_0.11.0           utf8_1.2.4          generics_0.1.3
##  [4] tidyr_1.3.1         bspm_0.5.5          xml2_1.3.6
##  [7] r2rtf_1.1.1         lattice_0.22-6      listenv_0.9.1
## [10] digest_0.6.36       magrittr_2.0.3      grid_4.4.1
## [13] iterators_1.0.14    mvtnorm_1.2-6       fastmap_1.2.0
## [16] Matrix_1.7-0        processx_3.8.4      pkgbuild_1.4.4
## [19] survival_3.7-0      ps_1.7.7            purrr_1.0.2
## [22] fansi_1.0.6         scales_1.3.0        codetools_0.2-20
## [25] cli_3.6.3           rlang_1.1.4         parallelly_1.38.0
## [28] future.apply_1.11.2 munsell_0.5.1       splines_4.4.1
## [31] remotes_2.5.0       withr_3.0.1         gsDesign_3.6.4
## [34] tools_4.4.1         parallel_4.4.1      dplyr_1.1.4
## [37] colorspace_2.1-1    ggplot2_3.5.1       globals_0.16.3
## [40] curl_5.2.1          vctrs_0.6.5         R6_2.5.1
## [43] lifecycle_1.0.4     desc_1.4.3          callr_3.7.6
## [46] pkgconfig_2.0.3     pillar_1.9.0        gtable_0.3.5
## [49] data.table_1.15.4   glue_1.7.0          Rcpp_1.0.13
## [52] tibble_3.2.1        tidyselect_1.2.1    xtable_1.8-4
## [55] htmltools_0.5.8.1   compiler_4.4.1

@jdblischak jdblischak mentioned this pull request Nov 5, 2024