Fix a typo: `seq_along()` should have been `seq_len()` #447

yihui · 2024-08-07T04:22:44Z

This fixes the macOS issue of simtrial on GHA: Merck/simtrial#262

When n_analysis = 2, seq_along(n_analysis) is 1, and only the first n in x_new$analysis is rounded. We should have rounded all elements in n that are close enough to integers, so the loop should go from 1 to n_analysis, i.e., the looping indices should be seq_len(n_analysis).

This problem was originally introduced by d0159b9.

This fixes the macOS issue of simtrial on GHA: Merck/simtrial#262 When `n_analysis = 2`, `seq_along(n_analysis)` is `1`, and only the first `n` in `x_new$analysis` is rounded. We should have rounded all elements in `n` that are close enough to integers, so the loop should go from `1` to `n_analysis`, i.e., the looping indices should be `seq_len(n_analysis)`.

jdblischak · 2024-08-07T16:10:01Z

xref: #350, #351

jdblischak · 2024-08-07T16:26:52Z

I confirmed that this continues to work on Windows. I installed {gsDesign2} from this PR along with the latest version of {simtrial}. I then ran R CMD check for {simtrial} as well as interactively ran the original test code from Merck/simtrial#260

LittleBeannie

Thank you, @yihui !

yihui · 2024-08-07T17:49:30Z

You are welcome!

yihui · 2024-08-07T21:14:11Z

@LittleBeannie I forgot to ask a question: since n and event are not guaranteed to be integers, are they meaningful when they are not integers? It feels odd to me if a sample size is not an integer. I don't have the subject matter expertise on design of experiments, but I'd expect sample sizes to be always integers.

LittleBeannie · 2024-08-08T13:43:46Z

@LittleBeannie I forgot to ask a question: since n and event are not guaranteed to be integers, are they meaningful when they are not integers? It feels odd to me if a sample size is not an integer. I don't have the subject matter expertise on design of experiments, but I'd expect sample sizes to be always integers.

Hi Yihui, thanks for sharing your insights. Following please find my explanation of the sample size/events.

The calculation of sample size is based on the targeted power. For example, we target for 90% power, which require for 100 statistical information (1/variance). To get exactly 100 statistical information, the sufficient sample size is 300.4 (non-integer).

If users prefer to get integer sample size, then we will around it to 302 (if it is equal randomization), which will gives you statistical information slightly larger than 100, leading to a power slightly over 90%.

# gives non-integer sample size
gs_design_ahr(...)  

# gives integer sample size
gs_design_ahr(...) |> to_integer()

yihui · 2024-08-08T14:38:20Z

Thanks for the explanation! I understand it better now. My question was specifically for the function to_integer(). I think it can be surprising that to_integer() is not guaranteed to produce integer output but depends on how close a number is to its rounded value, i.e., is_almost_k(n, round(n)). Does it make more sense if to_integer() rounds the numbers unconditionally?

LittleBeannie · 2024-08-08T15:02:18Z

Thanks for the explanation! I understand it better now. My question was specifically for the function to_integer(). I think it can be surprising that to_integer() is not guaranteed to produce integer output but depends on how close a number is to its rounded value, i.e., is_almost_k(n, round(n)). Does it make more sense if to_integer() rounds the numbers unconditionally?

I guess it is a numerical issue...where I am struggling with....

Following please find a toy example to demonstrate the logical of to_integer. Do you happen to have any suggestions where I could make the sample size exactly as integer?

x <- gsDesgn2::gs_design_ahr(
  analysis_time = c(18, 30),
  upper = gsDesgn2::gs_spending_bound,
  upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025, param = NULL),
  lower = gsDesgn2::gs_b,
  lpar = c(-Inf, -Inf))

# enrollment rate with non-integer sample size
enroll_rate <- x$enroll_rate
sample_size_noninteger <- sum(enroll_rate$rate * enroll_rate$duration)

# suppose it is equal randomization. make the sample size into integer
sample_size_integer <- ceiling(sample_size_noninteger / 2) * 2

# update the enroll rate so that sum(rate * duration) is the integer sample size
enroll_rate_integer <- enroll_rate %>% mutate(rate = rate * sample_size_integer / sample_size_noninteger)
sum(enroll_rate_integer$rate * enroll_rate_integer$duration) - 592

yihui · 2024-08-08T16:25:21Z

Okay, I understand the logic now. Unfortunately, I think that's impossible for computers---you can't expect that x / y * y == x or x * y / y == x (due to limitation of floating point arithmetic).

The above logic can be simplified to:

c = a * b
c2 = f(c)
a2 = a * c2 / c

What you were trying to achieve is a2 * b == c2, which is true in theory (and only in theory):

a2 * b = (a * c2 / c) * b
       = (a * c2 / (a * b)) * b
       = (c2 / b) * b
       = c2

Now the question boils down to whether $|x / y \cdot y - x| < \sqrt{\epsilon}$ is always true (where $\epsilon$ = .Machine$double.eps). The answer is no, especially when $x$ and $y$ are large, e.g.,

e = sqrt(.Machine$double.eps)
x = runif(10000, 0, 1e8)
y = runif(10000, 0, 1e8)
d = x / y * y - x
i = abs(d) < e
all(i)  # likely to be FALSE
plot(x, y, pch = ifelse(i, '.', 'x'))

So it's impossible to make the new rate * duration = integer sample size exactly. My suggestion is that you abandon the is_almost_k() checks, and always round n in to_integer(). Even if rate * duration - sample size is bigger than $\sqrt{\epsilon}$, the difference won't be so large as to worth concerns.

plot(abs(d), las = 1)

Perhaps you can check with Keaven to see if this makes sense.

LittleBeannie · 2024-08-08T19:52:52Z

Thanks, @yihui for the detailed clarification. I will follow-up with Keaven when he is back.

yihui mentioned this pull request Aug 7, 2024

Test and document forced evaluation of promises for parallel execution Merck/simtrial#262

Merged

jdblischak requested review from LittleBeannie and jdblischak August 7, 2024 16:06

LittleBeannie approved these changes Aug 7, 2024

View reviewed changes

LittleBeannie merged commit 9092288 into Merck:main Aug 7, 2024
7 checks passed

yihui deleted the patch-1 branch August 7, 2024 17:49

yihui mentioned this pull request Aug 13, 2024

Let to_integer() always produce integer analysis$n and analysis$event #452

Merged

LittleBeannie assigned yihui Aug 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix a typo: `seq_along()` should have been `seq_len()` #447

Fix a typo: `seq_along()` should have been `seq_len()` #447

yihui commented Aug 7, 2024

jdblischak commented Aug 7, 2024

jdblischak commented Aug 7, 2024

LittleBeannie left a comment

yihui commented Aug 7, 2024

yihui commented Aug 7, 2024

LittleBeannie commented Aug 8, 2024

yihui commented Aug 8, 2024 •

edited

Loading

LittleBeannie commented Aug 8, 2024

yihui commented Aug 8, 2024

LittleBeannie commented Aug 8, 2024

Fix a typo: seq_along() should have been seq_len() #447

Fix a typo: seq_along() should have been seq_len() #447

Conversation

yihui commented Aug 7, 2024

jdblischak commented Aug 7, 2024

jdblischak commented Aug 7, 2024

LittleBeannie left a comment

Choose a reason for hiding this comment

yihui commented Aug 7, 2024

yihui commented Aug 7, 2024

LittleBeannie commented Aug 8, 2024

yihui commented Aug 8, 2024 • edited Loading

LittleBeannie commented Aug 8, 2024

yihui commented Aug 8, 2024

LittleBeannie commented Aug 8, 2024

Fix a typo: `seq_along()` should have been `seq_len()` #447

Fix a typo: `seq_along()` should have been `seq_len()` #447

yihui commented Aug 8, 2024 •

edited

Loading