Replace non-custom flake8 linting with ruff #21037
|
If it's the same speed or faster, it's a +1 for me. Question: what's the perf like once we get rid of flake8 entirely? |
|
I did some basic experiments running
Running ruff outside of pants runs much faster (e.g. (Potentially there's some processes like discovering Python interpreters that are being included in the |
|
That's still a decent improvement, but clearly could be a lot more. There is something to also be said about creating sandboxes and whatnot for lint/check-based operations that won't mutate the environment. Maybe something like the option to run those in the new workspace environments 🤷🏽 That's a can of worms all on its own, but overall, to answer the original question - it seems like swapping over is a win all around! And will likely become more of a win as we optimize some more steps. |
# flake8-2020
"YTT",
# flake8-implicit-str-concat, but only on a single line (includes bytes)
"ISC001",
I think we also need to add ISC002 to match NIC002? If it includes bytes too, I only found one instance of that (in a test), so it might be easier to fix it there.
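For reference, a minimal sketch of the patterns these two rules target, as I understand ruff's flake8-implicit-str-concat rules under default settings: ISC001 covers implicit concatenation on a single line, ISC002 covers implicit concatenation continued across lines with a backslash. The variable names are purely illustrative and not taken from the Pants codebase.

```python
# ISC001: two string literals implicitly concatenated on one line.
single_line = "foo" "bar"

# ISC001 is also expected to apply to bytes literals.
single_line_bytes = b"foo" b"bar"

# ISC002: implicit concatenation continued onto the next line with a backslash.
multi_line = "foo" \
    "bar"

# Assumption: with ruff's default settings, implicit concatenation inside
# parentheses across lines is not reported by either rule.
parenthesized = (
    "foo"
    "bar"
)
```

All four expressions evaluate to the same value; the rules only differ in which source layout they report.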
One thing to remember is that some linters are just formatters that we run and then check whether they've modified any files. |
Yeah, case-by-case: take the optimizations where we can. For those formatters that don't come baked with a "check" option, we can default to comparing sandbox code. As much as I hate splitting up code paths, if there is any non-trivial perf improvement, I generally think build tools should tend towards those options. |
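A rough sketch of the "run the formatter on a copy and compare" fallback described above. This is not how Pants implements it; `strip_trailing_whitespace` is a hypothetical stand-in for a real formatter without a check mode, and `files_needing_format` is an illustrative helper name.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def strip_trailing_whitespace(path: Path) -> None:
    """Toy in-place formatter (hypothetical stand-in for a real tool)."""
    original = path.read_text()
    formatted = "".join(line.rstrip() + "\n" for line in original.splitlines())
    path.write_text(formatted)


def files_needing_format(
    files: Iterable[Path], formatter: Callable[[Path], None]
) -> list[Path]:
    """Run `formatter` on scratch copies and return files whose content would change."""
    changed = []
    with tempfile.TemporaryDirectory() as scratch:
        for index, source in enumerate(files):
            # Copy into a scratch directory so the original is never mutated.
            copy = Path(scratch) / f"{index}_{source.name}"
            shutil.copyfile(source, copy)
            formatter(copy)
            if copy.read_bytes() != source.read_bytes():
                changed.append(source)
    return changed
```

A formatter that ships its own check mode can skip the copy and compare step entirely, which is the split code path discussed above.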
|
I think we can get rid of a lot of the sandbox overhead for fast tools (i.e. where the overhead of building sandboxes is a high proportion of the total time) by creating fewer sandboxes/putting more files in each one. See analysis/ideas in #18570 (comment). In particular, running things in sandboxes has correctness advantages (e.g. handling of edits that happen concurrently with a running process, #21051 (comment)) that'd be nice to avoid losing by default, I think. |
|
From your linked comment, what is the cost of sandboxing X files on a modern computer? In my mind, if the cost to sandbox ever exceeds the cost of running the process, we'd probably want an escape hatch to run it in a workspace or skip the sandbox. With a single Your equation from that linked comment I think is the right way, so long as the minimum number of sandboxes could be 0, not 1 (by default, 0 sandboxes would need to be opt-in though, as it's riskier to your concurrent edit points above). Running fast tools without a sandbox in CI, for example, might be nice. And I can basically guarantee that I would always opt into whatever is fastest :) |
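To make the trade-off concrete, here is a toy cost model. It is not the equation from the linked comment, which isn't reproduced in this thread; the function name, parameters, and the fixed per-sandbox setup cost are all assumptions for illustration, with `files_per_sandbox=0` standing in for the opt-in "no sandbox" mode.

```python
import math


def estimated_lint_ms(
    num_files: int,
    files_per_sandbox: int,
    sandbox_setup_ms: float,
    per_file_ms: float,
    parallelism: int,
) -> float:
    """Rough wall-time estimate; files_per_sandbox=0 means run in place, unsandboxed."""
    if num_files == 0:
        return 0.0
    if files_per_sandbox == 0:
        # Hypothetical opt-in escape hatch: one process, no sandbox setup cost.
        return num_files * per_file_ms
    num_sandboxes = math.ceil(num_files / files_per_sandbox)
    # Sandboxes run `parallelism` at a time, in sequential waves.
    waves = math.ceil(num_sandboxes / parallelism)
    return waves * (sandbox_setup_ms + files_per_sandbox * per_file_ms)
```

With made-up numbers (1000 files, 50 ms setup, 0.5 ms per file, 8 parallel processes), batches of 10 files come out at 715 ms, batches of 250 at 175 ms, and the unsandboxed run at 500 ms, so under these assumptions larger batches can beat skipping the sandbox because they still run in parallel.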
|
Can I suggest we move the discussion of tweaking sandboxing behaviour to #18570 or somewhere else, doesn't feel relevant to this PR? |
|
Oh yeah, sure. I'm +1 for this change |
|
FYI: #22900 replaced pyupgrade with ruff-check - so, it's in the repo now, we're free to add away. Also, I don't think we need to be overly concerned with 1:1 matching what the existing rules/bugs are. If there are improvements to be had, we should add them. The custom rules are also ripe for re-evaluation.
An alternative is to not make Flake8 plugins, but just write a custom, small linter that surfaces Pants-only issues. Similar to how we have custom migrations and whatnot. |
|
Thanks for the PR, Huon! I took your work and added to it in #23330 |
This swaps all flake8 usage to ruff via the built-in Ruff backend, except for the custom PNT20 and PNT30 rules.
We still have to run flake8, and it does not run much faster (still ~10s). So, this is doing "strictly" more work than previously: run flake8 and ruff.
However, this change does open the door for easily enabling some additional lints that are built into ruff, like #21018, rather than requiring searching out and installing a flake8 plugin.
I did some basic experiments running:
main: pants --no-local-cache lint --stats-log --only=flake8 ::
pants --no-local-cache lint --stats-log --only=ruff-check ::
local_process_total_time_run_ms (B)
Running ruff outside of pants runs much faster (e.g. ruff check . takes like 300ms), so I'd expect a lot of the 7.7 seconds is Pants overhead, e.g. setting up sandboxes or unzipping pexes or whatever. #18570 is potentially related.
Thoughts?
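A small sketch of how the comparison above could be repeated: time a command's wall clock, then estimate what share of a total is overhead. Both helper names are hypothetical, and this measures plain wall time rather than the Pants stats-log counters.

```python
import subprocess
import time


def time_command_ms(argv: list[str]) -> float:
    """Wall-clock time of one run of `argv`, in milliseconds."""
    start = time.perf_counter()
    # check=False: a linter exiting non-zero on findings should still be timed.
    subprocess.run(argv, check=False, capture_output=True)
    return (time.perf_counter() - start) * 1000


def overhead_fraction(total_ms: float, tool_ms: float) -> float:
    """Share of `total_ms` not explained by the tool's own run time."""
    return (total_ms - tool_ms) / total_ms
```

For example, `time_command_ms(["ruff", "check", "."])` would time a direct ruff run (assuming ruff is on the PATH), and `overhead_fraction(7700, 300)` puts the overhead at roughly 96% for the 7.7 seconds and 300ms quoted above.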