-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
case_when() Lacks Safe Handling for Unexpected Values #7653
Comments
In In SQL the
Same in
In
PS: I'm a simple user. |
library(dplyr)
replace_func <- function(x) {
res <- case_when(
x == "A" ~ 1,
x == "B" ~ 2,
x == "C" ~ 3,
# If there is a different value I want the function to throw an error
# and stop the execution
.default = NA
)
if (any(is.na(res))) {
stop(cat(paste0("Invalid value : ", x[which(is.na(res))]), sep="\n", fill=TRUE))
}
res
}
data <- tibble(x = c("A", "B", "A", "C"))
data1 <- tibble(x = c("A", "B", "A", "C", "D"))
data2 <- tibble(x = c("A", "B", "E" ,"A", "C" , "D" ,"F"))
data %>% mutate(new_x = replace_func(x))
#> # A tibble: 4 × 2
#> x new_x
#> <chr> <dbl>
#> 1 A 1
#> 2 B 2
#> 3 A 1
#> 4 C 3 data1 %>% mutate(new_x = replace_func(x))
#> Invalid value : D
#> Error in `mutate()`:
#> ℹ In argument: `new_x = replace_func(x)`.
#> Caused by error in `replace_func()`: data2 %>% mutate(new_x = replace_func(x))
#> Invalid value : E
#> Invalid value : D
#> Invalid value : F
#> Error in `mutate()`:
#> ℹ In argument: `new_x = replace_func(x)`.
#> Caused by error in `replace_func()`: |
@philibe Thank you. You are correct this does have the expected behavior but I think it's more of a workaround than a proper solution.
In these cases, what I ended up doing is using a named vector to do my replacements instead and then checking if there are any values not in the names of that vector. This is more code than it would be under Lastly, this workaround works fine inside a function but in more direct scenarios it becomes harder to implement. For example when doing mutate(new_x = function(x) {
x = case_when(...)
if (any(x) == 'default value') {
stop(...)
}
x
}) or: mutate(new_x = \(x) {
x = case_when(...)
if (any(x) == 'default value') {
stop(...)
}
x
}) or similar. Which when compared to mutate(new_x = case_when(..., .default = stop(...))) Seems like it is particularly convoluted. |
Anyway I wouldn't like that the stop you ask becomes the default behavior, but only an optional feature. I like to choose for each case I need. :) In a reprex environment on small R Shiny App I understand, but on my big app with many menus, I don't want that everything stop, and if it is not the case I create functions, in the lowest level of my application, which encapsulated original function with a conditionnal |
I am also dealing with this. I have data on surgical complications that are coming in as NA = "No surgery yet", 1 = "Yes", 2 = "No". My code to handle it looks like this:
Turning this into a function makes my head spin, and the current version will make my novice students' heads explode. So add me to the list of people looking for a built-in solution. |
Currently,
case_when()
does not provide a built-in way to validate categorical inputs and throw an error when an unexpected value is encountered. The function requires all return values to have the same type, making it impossible to safely use in cases where an unexpected value is encountered. The function is also incompatible in most cases withstop()
.This makes
case_when()
unsafe in cases where developers need both:Reproducible Example:
Currently, the only alternatives for handling unknown values in
case_when()
are:case_when()
, which is an imperfect solution with unnecessary complexity or.default = NA
, which can lead to silent failures—an unknown value that should have been handled explicitly might be mistakenly transformed intoNA
instead of triggering an error.Neither of these solutions is ideal.
Proposed Solution
I believe the default behavior should be something along the lines of
.default = stop(paste0("Unknown value: ", x))
. This would force users to explicitly handle unknown values within their program, ensuring safer data transformations. If users want to allow unknown values to default toNA
, they should be required to specify it explicitly by using.default = NA
orTRUE ~ NA
. This approach would provide better safety by default, preventing unintendedNA
values from propagating due to missing mappings incase_when()
.Would love to hear your thoughts on this!
The text was updated successfully, but these errors were encountered: