Ensure `if()` is always fed objects of length 1 #442

maurolepore · 2021-04-02T14:20:36Z

#436 is one example of a problem I've seen multiple times. Maybe we should review all calls to if() and ensure they are all fed objects of length 1?

This is a likely source of bugs and the fix is fairly straight forward. Nicely, it will force us to review the logic of our programs and decide when we should wrap in all(), any(), use identical(), extract the first element, or do something else.

We could continue to fix them as we bump into them but likely it's more efficient to fix them all in one go, allowing us to fix a bunch of bugs with relatively little code and time.

The text was updated successfully, but these errors were encountered:

cjyetman · 2021-04-02T17:03:48Z

Yeah, these need to get cleaned up eventually, but I'd hesitate believing that there's one-size-fits-all solution for every instance of this. Let's approach each variation specifically. Maybe we can identify them here as we find them?

maurolepore · 2021-04-03T00:19:40Z

Okay. I think whether or not we review all if() statements or tackle them as we bump into them is a strategic decision about how we tackle technical debt and therefore a decision that @AlexAxthelm will likely prefer to make. I would then close this issue because it'll be sitting here for a while and we already know there is a lot of technical debt.

Before I close I paste a snippet of a global search for "if(" in .R files. Just here I see four instances of == inside an if() statement. Compared to identical(), using == inside an if() statement is suspicious because it can return objects of length > 1. Therefore, all such if() statements are good candidates for review.

cjyetman · 2021-04-03T15:11:40Z

Personally, I don't see an issue like this --with a reasonable goal, well thought options for how to fix them, and some preliminary investigation of where the fixes need to happen-- as tech debt, though I admittedly don't know nor really care so much what the precise definition of that is, so I could be wrong, but in any case, I would leave this issue open.

I'm also not opposed to fixing these all at once, only opposed to blindly applying the same "fix" to all of them... as you've already pointed out, in some cases all() or any() might be more appropriate than identical() or vector[[1]] == "something"... it might not be obvious which is most appropriate in each case and require deeper investigation.

AlexAxthelm · 2021-04-06T09:44:13Z

1: I think this qualifies as "tech debt" in the sense that the system (mostly) works now, but this could cause us problems in the future.

2: I agree that resolving this issue is going to be handled on a case-by-case basis. Maybe we can make a coordinated effort to tackle this, but If they haven't been causing (many) serious issues yet, I think that addressing these whenever we're touching those files anyway seems like a reasonable approach.

cjyetman · 2021-04-06T10:10:24Z

@AlexAxthelm does that mean that this issue thread being open is tech debt, or that the problem that some if statements are not well defined is tech debt?

AlexAxthelm · 2021-04-06T10:39:49Z

I view the if statements as tech debt. It opens up an avenue for errors, but it seems like #436 is the only one that has popped up as an actual problem so far.

cjyetman · 2021-11-29T15:51:40Z

I dug into this today a bit. Just want to note here that in some places, especially when there's a default value, I actually prefer the warning in the if statement because it reveals something that should be fixed upstream, rather than obscuring it by circumventing the problem and passing on garbage. For instance, in the code below, financial_timestamp conceptually should never be more than a single character element. If somehow we make a change that causes financial_timestamp to have more than one element under specific conditions, I would rather the below conditionals use the first element and throw a warning than simply pass garbage back and not leave any message.

https://github.com/2DegreesInvesting/PACTA_analysis/blob/e46ce20befa09278b2f3884340bf2795f97226c1/0_reporting_functions.R#L394-L404

example...

test <- function(financial_timestamp) {
  if (financial_timestamp == "2017Q4") {
    "31.12.2017"
  } else if (financial_timestamp == "2018Q4") {
    "31.12.2018"
  } else if (financial_timestamp == "2019Q4") {
    "31.12.2019"
  } else {
    financial_timestamp
  }
}

test2 <- function(financial_timestamp) {
  if (identical(financial_timestamp, "2017Q4")) {
    "31.12.2017"
  } else if (identical(financial_timestamp, "2018Q4")) {
    "31.12.2018"
  } else if (identical(financial_timestamp, "2019Q4")) {
    "31.12.2019"
  } else {
    financial_timestamp
  }
}

test(c("2018Q4", "2019Q4"))
#> [1] "31.12.2018"
#> Warning messages:
#> 1: In if (financial_timestamp == "2017Q4") { :
#>   the condition has length > 1 and only the first element will be used
#> 2: In if (financial_timestamp == "2018Q4") { :
#>   the condition has length > 1 and only the first element will be used

test2(c("2018Q4", "2019Q4"))
#> [1] "2018Q4" "2019Q4"

AlexAxthelm · 2021-11-29T20:10:35Z

I think this is a good example of why it's important to document assumptions, even implicitly.

for example, I would say that either of those functions is fine, with the following changes:

# this code not tested, for illustrative purpose only.
validate_financial_timestamp <- function(financial_timestamp) {
  # financial timestamp should be a single point in time, in form of YYYYQN, 
  # where YYYY is a 4 digit year, and N is a number 1-4 (inclusive)
  stopifnot(length(financial_timestamp) == 1L) # could also use assert, if available
  stopifnot(grepl(x = financial_timestamp, pattern = "^[:digit]{4}Q[1-4]$"))
  return(invisible(financial_timestamp))
}

test <- function(financial_timestamp) {
  validate_financial_timestamp(financial_timestamp)
  if (financial_timestamp == "2017Q4") {
    "31.12.2017"
  } else if (financial_timestamp == "2018Q4") {
    "31.12.2018"
  } else if (financial_timestamp == "2019Q4") {
    "31.12.2019"
  } else {
    financial_timestamp
  }
}

AlexAxthelm · 2021-11-29T20:13:12Z

basically, validation functions should be quick to run, and therefore, we're willing to use them more frequently (basically every time we ingest an object through a public function). The other major alternative to this would be to take a more OOP based approach, but that's a can of worms that I don't know if we want to go down.

maurolepore closed this as completed Apr 3, 2021

cjyetman reopened this Apr 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure `if()` is always fed objects of length 1 #442

Ensure `if()` is always fed objects of length 1 #442

maurolepore commented Apr 2, 2021 •

edited

Loading

cjyetman commented Apr 2, 2021

maurolepore commented Apr 3, 2021 •

edited

Loading

cjyetman commented Apr 3, 2021

AlexAxthelm commented Apr 6, 2021

cjyetman commented Apr 6, 2021

AlexAxthelm commented Apr 6, 2021

cjyetman commented Nov 29, 2021

AlexAxthelm commented Nov 29, 2021

AlexAxthelm commented Nov 29, 2021

Ensure if() is always fed objects of length 1 #442

Ensure if() is always fed objects of length 1 #442

Comments

maurolepore commented Apr 2, 2021 • edited Loading

cjyetman commented Apr 2, 2021

maurolepore commented Apr 3, 2021 • edited Loading

cjyetman commented Apr 3, 2021

AlexAxthelm commented Apr 6, 2021

cjyetman commented Apr 6, 2021

AlexAxthelm commented Apr 6, 2021

cjyetman commented Nov 29, 2021

AlexAxthelm commented Nov 29, 2021

AlexAxthelm commented Nov 29, 2021

Ensure `if()` is always fed objects of length 1 #442

Ensure `if()` is always fed objects of length 1 #442

maurolepore commented Apr 2, 2021 •

edited

Loading

maurolepore commented Apr 3, 2021 •

edited

Loading