Use file-level where possible for faster computation #2449
base: main
Some thoughts:
and a run phase
Not bad so far, but probably still some room for improvement:

```r
devtools::load_all()
system.time(lint_package())
# here:
#    user  system elapsed
#  89.799   0.460  90.362
# main:
#    user  system elapsed
# 104.109   0.473 104.654
```
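For scale, here is a quick check of what the elapsed times above amount to. It uses only the two wall-clock numbers reported in the comment:

```r
# Elapsed (wall-clock) seconds reported above for lint_package()
elapsed_main <- 104.654
elapsed_branch <- 90.362

# Relative reduction in wall-clock time, as a percentage
reduction_pct <- 100 * (elapsed_main - elapsed_branch) / elapsed_main
round(reduction_pct, 1)
#> [1] 13.7
```

That is roughly a 14% reduction on this run. It is a single measurement, so the result should be read as indicative only.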
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #2449      +/-   ##
==========================================
- Coverage   98.53%   98.13%   -0.41%
==========================================
  Files         126      126
  Lines        5676     5799     +123
==========================================
+ Hits         5593     5691      +98
- Misses         83      108      +25
```
Now I just have to fix all the unexpected problems and add some tests for the newly introduced branches :)
Caching seems broken, but it's hard to reproduce locally for some reason. Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲
Indeed, this looks amazing! I also wonder if we should add an option to get_source_expressions() to skip building the expression-level objects. This comes up after Rdatatable/data.table#5830 (comment), where a massive file spends the vast majority of compute time on this step. Anyway, I think it's prudent to save this PR for after the release: messing around with the caching seems like a minefield for hard-to-catch bugs. It would be good to let this hang around in dev for longer to see what bubbles up.
I had thought about building them lazily, but that doesn't help as long as not all linters support "batches", because the objects are still needed at least once anyway. Maybe there is a more performant way to create the objects under the assumption that the tree is read-only.
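A minimal sketch of what lazy construction could look like. It uses base R's `delayedAssign()` so that each expression-level object is built only on first access. `make_lazy_exprs()` and `build_expr()` are hypothetical stand-ins, not lintr functions. As noted in the comment, this only saves work when some objects are never requested:

```r
# Hypothetical: an environment holding one promise per expression-level object.
# Each promise runs build(j) only when it is first accessed.
make_lazy_exprs <- function(n, build) {
  env <- new.env()
  add_promise <- function(j) {
    force(j)  # capture the current index, not the final loop value
    delayedAssign(paste0("expr_", j), build(j), assign.env = env)
  }
  for (i in seq_len(n)) add_promise(i)
  env
}

# Hypothetical builder that counts how often it is called
built <- 0L
build_expr <- function(i) {
  built <<- built + 1L
  list(id = i)
}

exprs <- make_lazy_exprs(3L, build_expr)
built            # 0: nothing has been built yet
exprs$expr_2$id  # forces only the second promise
built            # 1
```

If every linter touches every expression-level object, all promises are forced anyway. That is why laziness alone does not pay off until linters can work on batches.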
There are only two linters incompatible with file-level lints (as evidenced by the hacky PR failures here):
All other linters could run on the single file-level source expression, with potentially huge gains from avoiding repeated function calls, loops, appends, ...
What needs to be solved is how to cache and retrieve lints in this run mode.
Once that's done, we can add a new attribute (`max_level`?) to `Linter()` that signals to `lint()` that the linter can handle parallel linting of all expression-level source expressions.

WDYT about the idea?
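To make the proposal concrete, here is a rough sketch of how `lint()` could dispatch on such an attribute. Everything in it is hypothetical. Linters are modelled as plain functions over character vectors of code. `make_linter()`, `run_linters()` and the attribute values `"expression"` / `"file"` are illustrative names, not lintr's API:

```r
# Hypothetical: tag a linter function with the highest level it can handle
make_linter <- function(fun, max_level = c("expression", "file")) {
  max_level <- match.arg(max_level)
  structure(fun, max_level = max_level)
}

# Hypothetical dispatcher: file-capable linters get one call on the whole file,
# all others are called once per expression-level source expression
run_linters <- function(linters, expr_level, file_level) {
  lints <- list()
  for (name in names(linters)) {
    linter <- linters[[name]]
    if (identical(attr(linter, "max_level"), "file")) {
      lints[[name]] <- linter(file_level)
    } else {
      lints[[name]] <- unlist(lapply(expr_level, linter), use.names = FALSE)
    }
  }
  lints
}

# Toy linter: flags expressions that use `=` for assignment
calls <- 0L
equals_linter <- function(code) {
  calls <<- calls + 1L
  code[grepl(" = ", code, fixed = TRUE)]
}

expr_level <- list("x = 1", "y <- 2", "z = 3")
file_level <- unlist(expr_level)

run_linters(list(equals = make_linter(equals_linter, "file")), expr_level, file_level)
calls  # 1: a single call covered the whole file
run_linters(list(equals = make_linter(equals_linter)), expr_level, file_level)
calls  # 4: three more calls, one per expression
```

Both runs produce the same lints. Only the number of linter calls differs, and that difference is where the gains from avoiding per-expression calls would come from.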
Do you have any ideas on the cache part?
I'm especially interested in the scenario where a cache entry is available for most, but not all individual expressions.
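One possible approach for the partially cached case, sketched under strong assumptions:

- The cache is keyed per expression-level source expression. Here the key is simply the non-empty expression text, standing in for a real hash.
- All cache misses are linted together in a single batch.
- Each expression's lints depend only on that expression. Linters that need file-level context would violate this assumption.

`lint_with_partial_cache()` and `lint_batch()` are hypothetical names, not lintr functions:

```r
# Hypothetical: look up each expression in the cache, lint only the misses
# in one batch, store their results, then return lints in the original order.
# Assumes keys are non-empty strings and lint_batch() returns one list element
# per input expression.
lint_with_partial_cache <- function(exprs, cache, lint_batch) {
  hit <- vapply(exprs, exists, logical(1), envir = cache, inherits = FALSE,
                USE.NAMES = FALSE)
  misses <- exprs[!hit]
  if (length(misses) > 0L) {
    fresh <- lint_batch(misses)
    for (i in seq_along(misses)) assign(misses[[i]], fresh[[i]], envir = cache)
  }
  lapply(exprs, get, envir = cache, inherits = FALSE)
}

# Toy batch linter that counts how many batches it has processed
batches <- 0L
lint_batch <- function(code) {
  batches <<- batches + 1L
  lapply(code, function(x) if (grepl(" = ", x, fixed = TRUE)) "use <-" else character(0))
}

cache <- new.env()
lint_with_partial_cache(c("x = 1", "y <- 2"), cache, lint_batch)  # both miss: one batch
res <- lint_with_partial_cache(c("x = 1", "y <- 2", "z = 3"), cache, lint_batch)
batches  # 2: the second call only linted "z = 3"
```

The open question from the comment remains: for linters that inspect the whole file, a batch made of only the misses is not equivalent to the full file. Those linters would need either a file-level cache key or a full re-lint.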