
enable optimizer in tests#5155

Merged
alexpyattaev merged 1 commit into anza-xyz:master from alexpyattaev:enable_optimizer_in_tests
Mar 13, 2026


@alexpyattaev

@alexpyattaev alexpyattaev commented Mar 5, 2025

Problem

  • Unit tests are very slow: optimization is not enabled for them at all, so we wait hours for CI to complete.
  • Unit tests do not catch UB that only manifests under optimization (because the optimizer is not enabled), so much of the unsound unsafe code gets through them.

Summary of Changes

  • Enable the optimizer for unit tests
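The diff itself is not reproduced on this page. As a generic illustration only (not necessarily how this PR does it), optimization for `cargo test` builds can be raised in the workspace root `Cargo.toml` through the `test` profile; the `opt-level = 1` value below is an assumption:

```toml
# Workspace root Cargo.toml (sketch; the opt-level value is assumed)
[profile.test]
# `cargo test` builds with the `test` profile, which inherits from `dev`.
# Only the optimization level changes here: debug-assertions and
# overflow-checks keep their `dev` defaults (both enabled).
opt-level = 1
```

Because this setting also affects local `cargo test` runs, a CI-only variant can set the same value through Cargo's `CARGO_PROFILE_TEST_OPT_LEVEL` environment variable in the CI job instead.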

@alexpyattaev
Author

@willhickey @yihau please take a look

Comment thread accounts-db/src/append_vec.rs Outdated
Comment thread accounts-db/src/append_vec.rs Outdated
Comment thread Cargo.toml Outdated
Comment thread accounts-db/src/append_vec.rs Outdated
@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch 2 times, most recently from 0e9cab7 to cd25b7a March 10, 2025 14:58
Comment thread accounts-db/src/append_vec.rs Outdated
Comment thread accounts-db/src/append_vec.rs Outdated
@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch from cd25b7a to fb1a433 March 10, 2025 16:41
@alexpyattaev alexpyattaev marked this pull request as ready for review March 10, 2025 18:48
@alexpyattaev
Author

alexpyattaev commented Mar 10, 2025

Disabled optimizations for some crates until a more permanent solution can be found for the tests there. Now that #5212 has landed, accounts-db is good to go.
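Cargo's per-package profile overrides are one way to express "disabled optimizations for some crates"; this page does not show which mechanism or which crates the PR actually used, so `some-crate` below is a placeholder name:

```toml
# Sketch: keep one (hypothetical) crate unoptimized in test builds while the
# rest of the workspace uses the optimized `test` profile.
[profile.test.package.some-crate]
opt-level = 0
```

An override like this only affects the named package; its dependencies keep the profile's settings unless they are overridden as well.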

@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch from fb1a433 to f218da0 March 10, 2025 20:05
@alexpyattaev alexpyattaev added the automerge label (Merge this Pull Request automatically once CI passes) Mar 10, 2025
Member

@yihau yihau left a comment


LGTM. However, I think we have some unresolved discussions 🤔 @ilya-bobyr, do you still think we need to use a custom profile, or should we give this one a try? If things get worse, we can always revert.

Comment thread gossip/src/cluster_info.rs Outdated

@steviez steviez left a comment


Out of curiosity, do you have some numbers you can share for overall CI? No worries if not; mostly curious, and faster is better (as long as we aren't compromising on flakiness).

Comment thread Cargo.toml Outdated
Comment thread gossip/src/cluster_info.rs Outdated
@alexpyattaev
Author

To clarify the speed improvements (these are runtime numbers, not compile time), you may refer to some timings I've collected below.

TL;DR: CPU-bound tests become 4x faster, IO-bound tests become 25% faster.

HOWEVER, the actual time to complete all tests will not change much, as it is dominated by the slowest test in any given set.

In the end, this change will not cut CI time by 4x, but it will cut CPU usage in CI quite substantially, allowing us to run a larger volume of small tests in parallel.

As noted above, the compile-time difference for a full rebuild of agave is 15 seconds (10% more); for an incremental build of one unit test, the change is not perceptible.

time RUST_LOG="error" cargo nextest run -p solana-accounts-db
________________________optimized ________________________________
Executed in   30.38 secs    fish           external
   usr time   57.29 secs    0.00 micros   57.29 secs
   sys time   84.66 secs  895.00 micros   84.66 secs

______________________baseline__________________________________
Executed in   33.08 secs    fish           external
   usr time  255.77 secs    0.00 micros  255.77 secs
   sys time   89.46 secs  964.00 micros   89.46 secs

time RUST_LOG="error" cargo nextest run -p solana-core
_______________________optimized_________________________________
Executed in  411.47 secs    fish           external
   usr time   16.74 mins    0.00 micros   16.74 mins
   sys time    4.00 mins  906.00 micros    4.00 mins

_______________________baseline_________________________________
Executed in  411.63 secs    fish           external
   usr time   43.01 mins    0.00 micros   43.01 mins
   sys time    3.66 mins  896.00 micros    3.66 mins


time RUST_LOG="error" cargo nextest run -p solana-tpu-client-next
________________________optimized________________________________
Executed in    4.58 secs    fish           external
   usr time    1.09 secs  392.00 micros    1.09 secs
   sys time    0.71 secs  237.00 micros    0.71 secs

_____________________baseline___________________________________
Executed in    4.60 secs    fish           external
   usr time    1.25 secs  380.00 micros    1.25 secs
   sys time    0.82 secs  234.00 micros    0.82 secs
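Dividing baseline by optimized user CPU time gives the speedups implied by the three runs above (this is only arithmetic on the quoted numbers, not a new measurement):

$$
\frac{255.77\,\text{s}}{57.29\,\text{s}} \approx 4.46\ (\texttt{solana-accounts-db}), \qquad
\frac{43.01\,\text{min}}{16.74\,\text{min}} \approx 2.57\ (\texttt{solana-core}), \qquad
\frac{1.25\,\text{s}}{1.09\,\text{s}} \approx 1.15\ (\texttt{solana-tpu-client-next})
$$

The corresponding wall-clock ratios are 33.08/30.38 ≈ 1.09, 411.63/411.47 ≈ 1.00 and 4.60/4.58 ≈ 1.00, which is the gap between CPU usage and completion time described in this comment.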

@anza-team
Collaborator

😱 New commits were pushed while the automerge label was present.

@anza-team anza-team removed the automerge label (Merge this Pull Request automatically once CI passes) Mar 11, 2025
@illia-bobyr

Thank you for checking the incremental rebuild times.

I do not think our CI is running more than one job per build machine.
This is related to race conditions and time sensitivity in some tests.
This is my knowledge as of some time ago, when we actually tried to pack more jobs into our team dedicated test cluster, and it didn't work.
Things might have changed since then.

I do support more efficient resource usage.
But I just want to make sure this is still the right optimization.

The way I understand it right now is:

  • Wall clock for the CI execution will stay the same.
  • We are not using build agents for more than one job at a time, so the CPU savings will not be obtainable.
    (But, it would be nice to double-check this).
  • You also said that we may potentially degrade the interactive debugging experience.
    (Though, I do not know the extent of this degradation at the lower optimization levels).

Do you think it is still the right change to make?

@alexpyattaev
Author

> I do not think our CI is running more than one job per build machine. This is related to race conditions and time sensitivity in some tests. This is my knowledge as of some time ago, when we actually tried to pack more jobs into our team dedicated test cluster, and it didn't work. Things might have changed since then.

If CI is running tests sequentially, they will go 4x faster. If CI runs tests in parallel, the wall clock improvements will be marginal. From what I have seen so far, CI is running most tests in parallel.
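A simple model (an assumption that ignores scheduling overhead and resource contention) shows why parallel runs barely benefit. With per-test durations $t_i$ and $p$ parallel slots:

$$
T_{\text{seq}} = \sum_{i=1}^{n} t_i, \qquad
T_{\text{par}} \;\ge\; \max\!\Big( \max_{i} t_i,\; \frac{1}{p}\sum_{i=1}^{n} t_i \Big)
$$

Optimization shrinks the $t_i$ of CPU-bound tests, which lowers $T_{\text{seq}}$ and total CPU time, but $T_{\text{par}}$ stays pinned near the longest test when that test is slow for non-CPU reasons (IO, sleeps, timeouts). The solana-core run above fits this pattern: user CPU time fell from 43.01 to 16.74 min while wall time stayed at about 411 s.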

> Do you think it is still the right change to make?

It is 100% worth it just for the sake of not letting UB that only shows up under optimization pass the test suites. "You test what you ship" is a thing for a reason. Arguably, a separate CI profile should be made with the same build settings as --release.
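A minimal sketch of such a CI profile, assuming the hypothetical name `ci` (Cargo requires custom profiles to declare which profile they inherit from):

```toml
# Hypothetical CI-only profile: build tests with the same settings as --release.
[profile.ci]
inherits = "release"
# Optional: re-enable the runtime checks that dev/test builds have, at the
# cost of diverging slightly from what actually ships.
# debug-assertions = true
# overflow-checks = true
```

CI would then select it with `cargo nextest run --cargo-profile ci` (or `cargo test --profile ci`). Artifacts for a custom profile go to `target/ci`, so they do not overwrite developers' local `target/debug` builds.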

@illia-bobyr

I think having a separate CI profile is the best option.
That way, developers run local tests with low (or no) optimization, and CI runs the tests faster with a different optimization level.
I am still somewhat skeptical that any given optimization level is considerably more likely to catch UB than any other level.
But if we run the code at two different levels, we do have a chance to notice at least the deterministic things.

At the same time, adding a CI specific profile is probably more work than just changing the existing profile.
So I can understand the desire to merge a simpler change.
If you are going to explore an alternative profile approach - maybe we could try that instead of this PR.
But if you are not going to do it, maybe we could merge this change as is.

@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch from 09ac822 to 98b9da4 April 14, 2025 07:58
@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch 2 times, most recently from 55884a8 to 439165a April 24, 2025 14:23
@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch from 439165a to 45b08f9 August 8, 2025 20:23
@roryharr roryharr removed their request for review October 30, 2025 23:08
@github-actions github-actions Bot added the stale label Feb 12, 2026
@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch from 7740d7b to 4d6a474 February 12, 2026 13:04
@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch 2 times, most recently from a0c49fa to e7ae94e February 25, 2026 06:09
@codecov-commenter

codecov-commenter commented Feb 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.1%. Comparing base (a94c35e) to head (2fddaec).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5155   +/-   ##
=======================================
  Coverage    83.1%    83.1%           
=======================================
  Files         837      837           
  Lines      316869   316869           
=======================================
+ Hits       263476   263507   +31     
+ Misses      53393    53362   -31     

@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch from e7ae94e to c5e258f February 25, 2026 14:17
@anza-xyz anza-xyz deleted a comment from github-actions Bot Feb 25, 2026
@alexpyattaev alexpyattaev requested a review from steviez February 25, 2026 15:51
@alexpyattaev alexpyattaev requested a review from yihau March 7, 2026 21:53
yihau
yihau previously approved these changes Mar 9, 2026
Member

@yihau yihau left a comment


Thanks for reviving this improvement. Let's do it!

Comment thread Cargo.toml Outdated
Comment thread Cargo.toml Outdated
@alexpyattaev alexpyattaev force-pushed the enable_optimizer_in_tests branch from c5e258f to 2fddaec March 9, 2026 22:08
@alexpyattaev alexpyattaev changed the title from "enable optimizer in tests and procmacros" to "enable optimizer in tests" Mar 9, 2026
@alexpyattaev alexpyattaev requested a review from steviez March 9, 2026 22:09

@steviez steviez left a comment


I assume @yihau will be happy with the changes, but maybe we give him one more chance to look. Otherwise, I think we've addressed all known issues and this will make CI match "production" a bit more closely.

I wouldn't be surprised if some tests get more flaky as a result of this. But I'm inclined to think that any issues like that would be from the test being inherently flaky.

One minor nit - can you update the PR description to reflect the change in direction. Namely, the PR is now only impacting CI and no longer doing "Rebuilds are slow: optimizations for procmacros are not enabled either" for regular dev flow
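For context on the dropped part: by default Cargo compiles build scripts, proc macros, and their dependencies without optimization (`opt-level = 0`), even in optimized profiles. A sketch of how dev builds could optimize them through a build override (the value 3 is an assumption; the earlier revision of this PR is not shown here):

```toml
# Sketch: optimize build scripts and proc macros (and their dependencies)
# in dev builds so that macro-heavy crates expand faster during rebuilds.
[profile.dev.build-override]
opt-level = 3
```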

@alexpyattaev
Author

> I wouldn't be surprised if some tests get more flaky as a result of this. But, I'm inclined to think that any issues like that would be from the test being inherently flaky

That is sort of the point: if the optimizer exposes latent UB, we will have a chance to see it.

> One minor nit - can you update the PR description to reflect the change in direction. Namely, the PR is now only impacting CI and no longer doing "Rebuilds are slow: optimizations for procmacros are not enabled either" for regular dev flow

Done!

Member

@yihau yihau left a comment


🔥

@alexpyattaev alexpyattaev added this pull request to the merge queue Mar 13, 2026
Merged via the queue into anza-xyz:master with commit a436ddd Mar 13, 2026
63 checks passed
@alexpyattaev alexpyattaev deleted the enable_optimizer_in_tests branch March 13, 2026 13:53

9 participants