-
Notifications
You must be signed in to change notification settings - Fork 70
Ensure if()
is always fed objects of length 1
#442
Comments
Yeah, these need to get cleaned up eventually, but I'd hesitate believing that there's one-size-fits-all solution for every instance of this. Let's approach each variation specifically. Maybe we can identify them here as we find them? |
Okay. I think whether or not we review all Before I close I paste a snippet of a global search for "if(" in .R files. Just here I see four instances of |
Personally, I don't see an issue like this --with a reasonable goal, well thought options for how to fix them, and some preliminary investigation of where the fixes need to happen-- as tech debt, though I admittedly don't know nor really care so much what the precise definition of that is, so I could be wrong, but in any case, I would leave this issue open. I'm also not opposed to fixing these all at once, only opposed to blindly applying the same "fix" to all of them... as you've already pointed out, in some cases |
1: I think this qualifies as "tech debt" in the sense that the system (mostly) works now, but this could cause us problems in the future. 2: I agree that resolving this issue is going to be handled on a case-by-case basis. Maybe we can make a coordinated effort to tackle this, but If they haven't been causing (many) serious issues yet, I think that addressing these whenever we're touching those files anyway seems like a reasonable approach. |
@AlexAxthelm does that mean that this issue thread being open is tech debt, or that the problem that some |
I view the |
I dug into this today a bit. Just want to note here that in some places, especially when there's a default value, I actually prefer the warning in the example... test <- function(financial_timestamp) {
if (financial_timestamp == "2017Q4") {
"31.12.2017"
} else if (financial_timestamp == "2018Q4") {
"31.12.2018"
} else if (financial_timestamp == "2019Q4") {
"31.12.2019"
} else {
financial_timestamp
}
}
test2 <- function(financial_timestamp) {
if (identical(financial_timestamp, "2017Q4")) {
"31.12.2017"
} else if (identical(financial_timestamp, "2018Q4")) {
"31.12.2018"
} else if (identical(financial_timestamp, "2019Q4")) {
"31.12.2019"
} else {
financial_timestamp
}
}
test(c("2018Q4", "2019Q4"))
#> [1] "31.12.2018"
#> Warning messages:
#> 1: In if (financial_timestamp == "2017Q4") { :
#> the condition has length > 1 and only the first element will be used
#> 2: In if (financial_timestamp == "2018Q4") { :
#> the condition has length > 1 and only the first element will be used
test2(c("2018Q4", "2019Q4"))
#> [1] "2018Q4" "2019Q4" |
I think this is a good example of why it's important to document assumptions, even implicitly. for example, I would say that either of those functions is fine, with the following changes: # this code not tested, for illustrative purpose only.
validate_financial_timestamp <- function(financial_timestamp) {
# financial timestamp should be a single point in time, in form of YYYYQN,
# where YYYY is a 4 digit year, and N is a number 1-4 (inclusive)
stopifnot(length(financial_timestamp) == 1L) # could also use assert, if available
stopifnot(grepl(x = financial_timestamp, pattern = "^[:digit]{4}Q[1-4]$"))
return(invisible(financial_timestamp))
}
test <- function(financial_timestamp) {
validate_financial_timestamp(financial_timestamp)
if (financial_timestamp == "2017Q4") {
"31.12.2017"
} else if (financial_timestamp == "2018Q4") {
"31.12.2018"
} else if (financial_timestamp == "2019Q4") {
"31.12.2019"
} else {
financial_timestamp
}
} |
basically, validation functions should be quick to run, and therefore, we're willing to use them more frequently (basically every time we ingest an object through a public function). The other major alternative to this would be to take a more OOP based approach, but that's a can of worms that I don't know if we want to go down. |
#436 is one example of a problem I've seen multiple times. Maybe we should review all calls to
if()
and ensure they are all fed objects of length 1?This is a likely source of bugs and the fix is fairly straight forward. Nicely, it will force us to review the logic of our programs and decide when we should wrap in
all()
,any()
, useidentical()
, extract the first element, or do something else.We could continue to fix them as we bump into them but likely it's more efficient to fix them all in one go, allowing us to fix a bunch of bugs with relatively little code and time.
The text was updated successfully, but these errors were encountered: