Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ruff check --fix as a formatter #758

Open
zmeir opened this issue Oct 14, 2024 · 9 comments
Open

Support ruff check --fix as a formatter #758

zmeir opened this issue Oct 14, 2024 · 9 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@zmeir
Copy link

zmeir commented Oct 14, 2024

#744 will add support for ruff format as a code-formatter.

Ruff also supports the ruff check --fix in its linter command, that allows it to automatically fix some linter errors. This essentially turns ruff check from a linter to a code-formatter.

It would be nice if we could also use ruff check --fix as a code-formatter with Darker.
There are also additional flags to control the auto-fixing behavior, that maybe need to be taken into account: --fix-only, --unsafe-fixes.

Why not just use Graylint?

Graylint is a perfect solution for running linter commands that do not modify the code.
However, with ruff check --fix, the "linter" will automatically modify the code the fix the linter errors. This means that ruff check --fix is not only a linter but also a code-formatter, and as such is incompatible with Graylint.

@akaihola akaihola added the enhancement New feature or request label Oct 19, 2024
@akaihola akaihola self-assigned this Oct 19, 2024
@akaihola akaihola added this to the Darker 3.1.0 milestone Oct 19, 2024
@akaihola
Copy link
Owner

@zmeir, there's one complication we need to think about.

The algorithm Darker uses to match code changes from 1) the user and 2) reformatting is not foolproof. It does a best effort of comparing and matching diff chunks, but in some edge cases it will get confused and reassemble the Python source code file in the wrong way.

To mitigate this, Darker always compares the AST before and after applying the reformatted chunks, and if the AST didn't stay intact, it will run the whole diff matching again but enlarging the chunks considered to have been edited by the user. This process is repeated until the AST matches, or until the entire file is considered as edited and thus all reformatting changes are applied.

Now, without investigation and testing, it's impossible to know whether the changes done by ruff check --fix will trigger such an edge case. One thing is sure though: the AST won't be unchanged after ruff's fixes, so AST verification can't be used to mitigate mistakes done by Darker's diff matching algorithm.

So investigation, testing and considering different mitigation strategies is needed.

For earlier discussions on the diff algorithm, see:

@zmeir
Copy link
Author

zmeir commented Oct 20, 2024

Thanks for the response @akaihola !

I was aware of the AST verification Darker does, but it hadn't occurred to me that ruff check --fix will modify the AST until you pointed it out. Indeed, rewriting code is more than just formatting...

I'll try to play around with graylint --lint="ruff check" and see if I can use the output to construct a command to ruff check --fix-only that will only target the specific issues that were reported in the diff.

@akaihola
Copy link
Owner

akaihola commented Oct 20, 2024

One obvious (but currently impossible) way to solve this would be to somehow have ruff check --fix report only the Nth fix, with N being accepted on the command line.

Darker could then start with fix 1, and only apply it if the lines it changes intersect with any of the user modified lines.

Darker would track changed line numbers of user modifications (we have code for this in Graylint), and proceed to fix 2 (if fix 1 was not applied) or the new fix 1 (if the original fix 1 was applied).

Unfortunately it's unrealistic to expect ruff to implement such a feature. The performance penalty might also be noticeable.

@akaihola
Copy link
Owner

I'm experimenting with a pyupgrade formatter plugin in #755, and it has exactly the same problem. Currently I'm just sidestepping the issue by disabling AST verificating and enforcing applying all pyupgrade changes to all user-modified files.

We could of course make it one notch smarter by only applying all changes to the file if at least one changed line intersects with user modifications.

@akaihola
Copy link
Owner

I've also pondered whether running a non-AST-preserving formatter first for the old, unmodified version of the code (e.g. main) and then again for the modified one (e.g. feature branch or working tree) would give enough information to figure out how to isolate changes touching the modified lines.

But it becomes rather complicated with many moving parts, and I suspect this approach won't fly either.

@akaihola
Copy link
Owner

I'll try to play around with graylint --lint="ruff check" and see if I can use the output to construct a command to ruff check --fix-only that will only target the specific issues that were reported in the diff.

Do you mean adding --fix-only and --select with a comma-separated list of specific rules?

That could indeed be another way to increase granularity – probably in most cases it would help only apply changes touching modified lines.

But the case in which it fails is when the same fix is applied both to lines which have been modified by the user and to lines which haven't. In that case I'm afraid the best we can do is to apply all instances of the same fix inside a file.

@zmeir
Copy link
Author

zmeir commented Oct 21, 2024

Do you mean adding --fix-only and --select with a comma-separated list of specific rules?

Yes. I was hoping to find other ruff configurations to help me narrow the scope even more, like an option to limit the line range to check, but I found nothing like this.

One obvious (but currently impossible) way to solve this would be to somehow have ruff check --fix report only the Nth fix, with N being accepted on the command line.

Obvious but impossible. That sums up what I'm looking for :)

I'm experimenting with a pyupgrade formatter plugin in #755 [...] We could of course make it one notch smarter by only applying all changes to the file if at least one changed line intersects with user modifications.

That sounds like a great solution. This combined with some narrowing of rules using --select sounds like it could work very well.

I've also pondered whether running a non-AST-preserving formatter first for the old, unmodified version of the code (e.g. main) and then again for the modified one (e.g. feature branch or working tree) would give enough information to figure out how to isolate changes touching the modified lines.

I had a similar thought. What if we do something like this?

  1. Run ruff check --fix on the unmodified code
  2. Run ruff check --fix on the modified code
  3. Save the diff before running step (2) on the modified code - call it "branch diff"
  4. Save the diff between the result of (1) and (2) - call it "fix diff"
  5. Keep any diff chunks from "branch diff" that intersect with at least one chunk in "fix diff"

I'm sure there are holes in this flow. Just a thought...

@akaihola
Copy link
Owner

Oops @zmeir, I didn't mean to edit your comment. I must have selected the wrong drop-down menu item when I was going to do Quote reply. I wonder if I can somehow recover the original content...

@zmeir
Copy link
Author

zmeir commented Oct 23, 2024

Oops @zmeir, I didn't mean to edit your comment. I must have selected the wrong drop-down menu item when I was going to do Quote reply. I wonder if I can somehow recover the original content...

No worries, I restored the original comment from the edit history.

As for your question:

What if both (1) and (2) make the same fix to different parts of the code, and (1) also adds an import which is needed for the fix? That import will appear in the "branch diff" but not in the "fix diff".

Yeah that's a fair point. I hadn't thought about a single fix that can possibly change multiple "chunks" of code. My mental model was that a fix is completely contained in a contiguous range of code lines, but that's because I'm used to looking at code changes of formatters, not linters...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

No branches or pull requests

2 participants