Skip to content

Replace non-custom flake8 linting with ruff#21037

Closed
huonw wants to merge 4 commits into
mainfrom
huonw/ruff-not-flake8
Closed

Replace non-custom flake8 linting with ruff#21037
huonw wants to merge 4 commits into
mainfrom
huonw/ruff-not-flake8

Conversation

@huonw
Copy link
Copy Markdown
Contributor

@huonw huonw commented Jun 9, 2024

This swaps all flake8 usage to ruff via the built-in Ruff backend, except for the custom PNT20 and PNT30 custom rules.

We still have to run flake8, and it does not run much faster (still ~10s). So, this is doing "strictly" more work than previously: run flake8 and ruff.

However this change does open the door for easily enabling some additional lints that are built-in to ruff, like #21018, rather than requiring searching out and installing a flake8 plugin.

I did some basic experiments running

  • flake8 on main: pants --no-local-cache lint --stats-log --only=flake8 ::
  • ruff on this PR: pants --no-local-cache lint --stats-log --only=ruff-check ::
timing flake8 ruff
building requirements/PEX (A) 0.8 0.9
local_process_total_time_run_ms (B) 27.757 8.643
time spent actually running the linter (B - A) ~27 7.7

Running ruff outside of pants runs much faster (e.g. ruff check . takes like 300ms, so I'd expect a lot of the 7.7 seconds is Pants overhead, e.g. setting up sandboxes or unzipping pexes or whatever. #18570 is potentially related.

Thoughts?

@huonw huonw added the category:internal CI, fixes for not-yet-released features, etc. label Jun 9, 2024
@sureshjoshi
Copy link
Copy Markdown
Member

If it's the same speed or faster, it's a +1 for me.

Question: What's the perf like once we get rid of flake entirely?

@huonw
Copy link
Copy Markdown
Contributor Author

huonw commented Jun 11, 2024

I did some basic experiments running

  • flake8 on main: pants --no-local-cache lint --stats-log --only=flake8 ::
  • ruff on this PR: pants --no-local-cache lint --stats-log --only=ruff-check ::
timing flake8 ruff
building requirements/PEX (A) 0.8 0.9
local_process_total_time_run_ms (B) 27.757 8.643
time spent actually running the linter (B - A) ~27 7.7

Running ruff outside of pants runs much faster (e.g. ruff check . takes like 300ms, so I'd expect a lot of the 7.7 seconds is Pants overhead, e.g. setting up sandboxes or unzipping pexes or whatever. #18570 is potentially related.

(Potentially there's some processes like discovering Python interpreters that are being included in the local_process_total_time_run_ms counter, but looking at the [INFO] Completed: Lint with `ruff check` ... lines shows 7 invocations that finish about a second after the [INFO] Completed: Building ruff.pex ... line, so ~7s seems plausible)

@sureshjoshi
Copy link
Copy Markdown
Member

sureshjoshi commented Jun 11, 2024

That's still a decent improvement, but clearly could be a lot more. There is something to also be said about creating sandboxes and whatnot for lint/check-based operations that won't mutate the environment.

Maybe something like the option to run those in the new workspace environments 🤷🏽

That's a can of worms all on its own, but overall, to answer the original question - it seems like swapping over is a win all around! And will likely become more of a win as we optimize some more steps.

@huonw huonw marked this pull request as ready for review June 20, 2024 04:39
Comment thread pyproject.toml
# flake8-2020
"YTT",
# flake8-implicit-str-concat, but only on a single line (includes bytes)
"ISC001",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to add ISC002 to match NIC002 ? If it includes bytes too, I only found one instance of that (in a test), so it might be easier to fix it there

@lilatomic
Copy link
Copy Markdown
Contributor

That's still a decent improvement, but clearly could be a lot more. There is something to also be said about creating sandboxes and whatnot for lint/check-based operations that won't mutate the environment.

One thing to remember is that some linters are just formatter we run and check if they've modified any files

@sureshjoshi
Copy link
Copy Markdown
Member

One thing to remember is that some linters are just formatter we run and check if they've modified any files

Yeah, case-by-case - take the optimizations where we can, for those formatters that don't come baked with a "check" option, we can default to comparing sandbox code. As much as I hate splitting up code paths, if there is any non-trivial perf improvement, I generally think build tools should tend towards those options.

@huonw
Copy link
Copy Markdown
Contributor Author

huonw commented Jun 24, 2024

I think we can get rid of a lot of the sandbox overhead for fast tools (i.e. where the overhead of building sandboxes is a high proportion of the total time) by creating fewer sandboxes/putting more files in each one. See analysis/ideas in #18570 (comment)

In particular, running things in sandboxes has correctness advantages (e.g. handling of edits that happen concurrently with a running processes #21051 (comment)), that'd be nice to avoid losing by default, I think.

@sureshjoshi
Copy link
Copy Markdown
Member

From your linked comment, what is the cost of sandboxing X files on a modern computer? In my mind, if the cost to sandbox ever exceeds the cost of running the process, we'd probably want an escape hatch to run it in a workspace or skip the sandbox.

With a single format, it's not a huge deal, but a large monorepo with 20 tools across 4 languages - it really adds up.

Your equation from that linked comment I think is the right way, so long as the minimum number of sandboxes could be 0, not 1 (by default, 0 sandboxes would need to be opt-in though, as it's riskier to your concurrent edit points above).

Running fast tools without a sandbox in CI, for example, might be nice. And I can basically guarantee that I would always opt into whatever is fastest :)

@huonw
Copy link
Copy Markdown
Contributor Author

huonw commented Jun 24, 2024

Can I suggest we move the discussion of tweaking sandboxing behaviour to #18570 or somewhere else, doesn't feel relevant to this PR?

@sureshjoshi
Copy link
Copy Markdown
Member

Oh yeah, sure. I'm +1 for this change

@sureshjoshi
Copy link
Copy Markdown
Member

FYI: #22900 replaced pyupgrade with ruff-check - so, it's in the repo now, we're free to add away.

Also, I don't think we need to be overly concerned with 1:1 matching what the existing rules/bugs are. If there are improvements to be had, we should add them.

The custom rules are also ripe for re-evaluation.

  • PNT20: "Prefer softwrap over dedent in non-test code." - but we have like 200 "noqa" blocks for it. Just wondering if there is a different approach here.
  • PNT30: "Disallow 'await' in a loop." - this is likely still a good one.

An alternative is to not make Flake8 plugins, but just write a custom, small linter that surfaces Pants-only issues. Similar to how we have custom migrations and whatnot.

@sureshjoshi
Copy link
Copy Markdown
Member

Thanks for the PR Huon! I took your work added to it in #23330

@sureshjoshi sureshjoshi closed this May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category:internal CI, fixes for not-yet-released features, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants