feat: scheduled repair configuration + deterministic test fixes (v2) #1490

Open
0rlych1kk4 wants to merge 17 commits into Ericsson:master from 0rlych1kk4:feature/scheduled-repair-v2

@0rlych1kk4
Contributor

Summary

This PR improves how the scheduled repair configuration is handled and stabilizes the related tests.

Changes

  • Refactored schedule configuration handling:

    • Always mounts schedule.yaml
    • Applies overrides only when explicitly provided
    • Preserves upstream/default behavior when configuration is empty
  • Fixed non-deterministic test behavior:

    • Hardened Awaitility usage (poll intervals + timeouts)
    • Removed race conditions in TestScheduleManager
    • Ensured deterministic job execution and validation
  • Cleaned up test framework interactions:

    • Avoided global scheduler side effects across tests
    • Improved isolation of configuration per test instance
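
The override rule in the first item (apply overrides only when explicitly provided, otherwise keep the defaults) can be sketched as a small merge helper. This is a minimal illustration and not code from the PR: the class name `ScheduleOverrides`, the method `apply`, and the string-keyed map representation are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not taken from the PR: layers optional overrides on top of defaults.
final class ScheduleOverrides {
    private ScheduleOverrides() {}

    /**
     * Returns a copy of the defaults when no overrides are given; otherwise
     * returns a copy of the defaults with only the provided keys replaced.
     * The input maps are never modified.
     */
    static Map<String, String> apply(Map<String, String> defaults, Map<String, String> overrides) {
        Objects.requireNonNull(defaults, "defaults");
        Map<String, String> effective = new LinkedHashMap<>(defaults);
        if (overrides == null || overrides.isEmpty()) {
            return effective; // empty configuration: default behavior is preserved
        }
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            if (entry.getValue() != null) { // a null value counts as "not provided"
                effective.put(entry.getKey(), entry.getValue());
            }
        }
        return effective;
    }
}
```

Because the helper copies rather than mutates, each test instance can build its own effective configuration without affecting the defaults seen by other tests.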

Motivation

While working on DatacenterAware multi-agent scenarios, I observed test instability and configuration side effects:

  • Race conditions in scheduling tests
  • Non-deterministic timing behavior
  • Global configuration leaking between tests

This PR addresses those issues to provide a stable foundation for:

  • Scheduled repair scenarios
  • Multi-agent concurrency tests

Validation

  • Full test suite executed locally:
    • mvn -pl core -am test
    • All tests passing (including Testcontainers / Cassandra integration tests)

Notes

  • This PR focuses on stability and correctness
  • Follow-up work will introduce scheduled repair concurrency scenarios for multi-agent tests

0rlych1kk4 requested a review from a team as a code owner on April 19, 2026 07:38
@0rlych1kk4
Contributor Author

0rlych1kk4 commented Apr 19, 2026

Hi @VictorCavichioli

Summary of CI Failures Investigation

It looks like the failing checks are related to timing sensitivity and environment differences in CI (multi-node Cassandra + parallel scheduling), rather than functional regressions.

Planned fixes:

  • Increase Awaitility timeouts and polling intervals to better handle CI latency
  • Ensure scheduler instances are fully isolated per test (no shared/static state)
  • Review clock usage to avoid reliance on system time where possible
  • Validate Testcontainers isolation across test runs
  • Strengthen assertions using eventual consistency patterns (await().untilAsserted)
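
The polling pattern behind the timeout and `untilAsserted` items can be sketched without any test library. The block below is a stdlib-only stand-in written for illustration, not Awaitility itself and not code from this PR; the class name `Eventually` and its method signature are assumptions.

```java
import java.time.Duration;

// Stdlib-only stand-in for an "until asserted" wait; all names here are hypothetical.
final class Eventually {
    private Eventually() {}

    /**
     * Re-runs the assertion every pollInterval until it stops throwing
     * AssertionError. Once the timeout has elapsed, the next failure is
     * wrapped and rethrown instead of being retried.
     */
    static void untilAsserted(Duration timeout, Duration pollInterval, Runnable assertion) {
        long deadline = System.nanoTime() + timeout.toNanos(); // monotonic, unaffected by wall-clock jumps
        while (true) {
            try {
                assertion.run();
                return; // assertion passed
            } catch (AssertionError failure) {
                if (System.nanoTime() - deadline >= 0) {
                    throw new AssertionError("Condition not met within " + timeout, failure);
                }
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new AssertionError("Interrupted while waiting", e);
            }
        }
    }
}
```

In a test that already uses Awaitility, the same idea would normally be written as `await().atMost(...).pollInterval(...).untilAsserted(...)`. The point of the sketch is only that the timeout bounds the total wait while the poll interval controls how often the assertion is retried, so both can be tuned for CI latency independently.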

Locally, tests are stable, but I’ll push updates to improve determinism under CI conditions.
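
For the clock and isolation items in the planned fixes, one possible shape is a test clock that only moves when the test advances it, combined with a scheduler that holds no static state. This is a rough sketch under those assumptions; `ManualClock` and `TickScheduler` are hypothetical names and do not correspond to classes in the project.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test clock: time only moves when the test calls advance().
final class ManualClock extends Clock {
    private Instant now;

    ManualClock(Instant start) { this.now = start; }

    void advance(Duration amount) { now = now.plus(amount); }

    @Override public Instant instant() { return now; }
    @Override public ZoneId getZone() { return ZoneOffset.UTC; }
    @Override public Clock withZone(ZoneId zone) { return this; } // zone is ignored in this sketch
}

// Hypothetical scheduler with no static state: each test creates its own instance.
final class TickScheduler {
    private static final class Job {
        final Instant runAt;
        final Runnable task;
        Job(Instant runAt, Runnable task) { this.runAt = runAt; this.task = task; }
    }

    private final Clock clock;
    private final List<Job> jobs = new ArrayList<>();

    TickScheduler(Clock clock) { this.clock = clock; }

    void schedule(Instant runAt, Runnable task) { jobs.add(new Job(runAt, task)); }

    /** Runs every job that is due according to the injected clock; returns how many ran. */
    int runDue() {
        Instant now = clock.instant();
        List<Job> due = new ArrayList<>();
        for (Job job : jobs) {
            if (!job.runAt.isAfter(now)) {
                due.add(job);
            }
        }
        jobs.removeAll(due); // remove before running so tasks may schedule new jobs safely
        for (Job job : due) {
            job.task.run();
        }
        return due.size();
    }
}
```

A test would create a `ManualClock`, schedule a job, call `advance(...)`, and then call `runDue()`, so the outcome does not depend on how fast the CI machine is or on what other tests have scheduled.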

Let me know if there are known CI constraints or preferred patterns for timing-sensitive tests.

@codecov-commenter

codecov-commenter commented Apr 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.74%. Comparing base (9f4bd4e) to head (049df93).
⚠️ Report is 651 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1490      +/-   ##
============================================
+ Coverage     77.45%   79.74%   +2.28%     
- Complexity     1308     1728     +420     
============================================
  Files           135      164      +29     
  Lines          5566     6565     +999     
  Branches        579      679     +100     
============================================
+ Hits           4311     5235     +924     
- Misses         1062     1087      +25     
- Partials        193      243      +50     


0rlych1kk4 force-pushed the feature/scheduled-repair-v2 branch from 049df93 to 015e639 on April 20, 2026 13:50
2 participants