Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider ways to work around stats::filter() when applying dplyr::filter() lints #2078

Closed
MichaelChirico opened this issue Aug 10, 2023 · 11 comments · Fixed by #2169
Closed
Labels
false-negative code that should lint, but doesn't feature a feature request or enhancement
Milestone

Comments

@MichaelChirico
Copy link
Collaborator

MichaelChirico commented Aug 10, 2023

Follow-up to #2077.

That PR is conservative about what filter() calls to include. We do not match filter() unless it's namespace-qualified, to avoid false positives involving stats::filter(). I'm not so familiar with stats::filter(), but my understanding is it'd be really weird to use & there too, but in any case the lint message would look strange.

We might parameterize this instead to increase the reach, e.g. assume_dplyr , when TRUE all filter() calls are matched, or allow_conjunct_filter, when TRUE all filter() calls are skipped.

Another less conservative option: assume filter() is dplyr::filter(), unless it's namespace-qualified as coming from another namespace. That gives users an out if needed, but defaults to assuming everything comes from dplyr (which seems most likely, among users who'd have this lint active)

@MichaelChirico MichaelChirico added feature a feature request or enhancement false-negative code that should lint, but doesn't labels Aug 10, 2023
@MichaelChirico
Copy link
Collaborator Author

Some tests I'd written about this future behavior:

# TODO(michaelchirico): shut these off to stay on the conservative side and
#   only lint for calls that we _know_ are coming from dplyr. consider
#   whether to use an argument to change this, or if we can improve the
#   logic to ensure dplyr::filter() is being used.
# using & in stats::filter() calls should be uncommon, but ensure
#   either dplyr:: is used or there's no namespace qualification
expect_lint(
  "stats::filter(A & B)",
  NULL,
  conjunct_test_linter()
)
expect_lint(
  "ns::filter(A & B)",
  NULL,
  conjunct_test_linter()
)
expect_lint(
  "DF %>% filter(A & B)",
  "Use dplyr::filter\\(DF, A, B\\) instead of dplyr::filter\\(DF, A & B\\)",
  conjunct_test_linter()
)

@AshesITR
Copy link
Collaborator

assume_dpylr_loaded = FALSE should be opt-in I think.

Or instead we do a dynamic check if the globally loaded / NAMESPACE-Imported filter() is from dplyr to remove the NS prefix condition in the XPath?

I think both approaches would greatly increase the amount of true positives.

Also good call we should never lint package-prefixed calls from other packages than dplyr.

@AshesITR
Copy link
Collaborator

I've also seen dplyr::filter(DF, A && B) a lot from beginners. Would this be the right place to also lint this common mistake?

@MichaelChirico
Copy link
Collaborator Author

I've also seen this... maybe more appropriate for vector_logic_linter()?

@MichaelChirico MichaelChirico added this to the 3.1.1 milestone Aug 22, 2023
@MichaelChirico
Copy link
Collaborator Author

MichaelChirico commented Sep 7, 2023

Now that we have allow_filter, maybe we should combine this issue with that FR.

WDYT about allow_filter = c("assume_dplyr", "strict_dplyr", "always")? The first applies to all unqualified filter() calls; the second only to dplyr::filter(); the third doesn't check filter() calls. cc @salim-b for viz.

@AshesITR
Copy link
Collaborator

AshesITR commented Sep 7, 2023

The options, especially "strict_dplyr" seem unintuitive to me.

Maybe "never", "not_dplyr", "always"?

@MichaelChirico
Copy link
Collaborator Author

"never" also sounds too strict, right? Since qualified stats::filter() would never lint. Maybe "qualified_dplyr" sounds better than "strict_dplyr"?

@AshesITR
Copy link
Collaborator

AshesITR commented Sep 7, 2023

But we dont "allow_filter" "qualified_dplyr" 😅

We allow everything but qualified dplyr calls.
Not happy with allow_filter = c("qualified"? "unqualified", "always"), but maybe it inspires you.

@AshesITR
Copy link
Collaborator

AshesITR commented Sep 7, 2023

Another idea: allow_dplyr_filter = c("never", "unqualified", "always")

@salim-b
Copy link
Contributor

salim-b commented Sep 7, 2023

Just my 5 cents: If it's possible for lintr to auto-detect this, i.e. to

do a dynamic check if the globally loaded / NAMESPACE-Imported filter() is from dplyr to remove the NS prefix condition in the XPath?

I'd strongly prefer that instead of extending parameters. 🙂

@MichaelChirico
Copy link
Collaborator Author

I'd strongly prefer that instead of extending parameters. 🙂

We might do that in addition to extending parameters, but requiring that is not appealing to me in general -- it requires the machine executing the linter to have dplyr installed, which may be true for developers working locally on their own package, but is not true by default in distributed work environments.

Another suggestion:

allow_filter = c("assuming_dplyr", "checking_dplyr", "always")

Otherwise, Maybe "never", "not_dplyr", "always"? is growing on me...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
false-negative code that should lint, but doesn't feature a feature request or enhancement
Projects
None yet
3 participants