Skip to content

CNDB-16394: Add SAI tests with more data#2269

Open
pkolaczk wants to merge 4 commits intomainfrom
c16394-more-data-in-sai-tests
Open

CNDB-16394: Add SAI tests with more data#2269
pkolaczk wants to merge 4 commits intomainfrom
c16394-more-data-in-sai-tests

Conversation

@pkolaczk
Copy link
Copy Markdown

  • CNDB-17012: Improve TrieMemoryIndex row count estimation
  • CNDB-16394: Fix possible deadlock if flush fails
  • CNDB-16394: Add tests for SAI that use a larger amount of random data

Piotr Kołaczkowski added 3 commits March 10, 2026 15:26
Fixes a bug in TermsDistribution#toBigDecimal which didn't preserve
order for some types and broke the assertion
in TrieMemoryIndex#estimateNumRowsMatchingRange.

Additionally, replaces the row count estimation algorithm with a better
one:
- simpler
- more efficient (no need to search and iterate the trie)
- more accurate (especially for wider ranges)
`JVMStabilityInspector.inspectThrowable(t)` can throw an exception.
In that case the control flow never reaches
`postFlush.latch.countDown()` and threads waiting for flush to finish
never unlock and never get a chance to propagate the exception
up the call chain. This ends up in a bad lockup with no error logged
anywhere, system appearing to be performing a pending flush,
but no flush actually running. This scenario has been observed in tests
when the system hit the limit of open files.

This commit moves `latch.countDown()` to a finally block
so it can never be skipped.
Some issues can never be found when testing SAI on a few manually
inserted rows we had until now.

During adding this set of tests it was already found that:
- flushes may deadlock under specific failure scenarios
- SAI tests were not cleaning the data after themselves which caused
  running out of file descriptors
- some query optimizer code was computing incorrect estimates
  for some query bounds (CNDB-17012)
@github-actions
Copy link
Copy Markdown

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@pkolaczk pkolaczk changed the title c16394 more data in sai tests CNDB-16394: Add SAI tests with more data Mar 12, 2026
@pkolaczk pkolaczk requested a review from adelapena March 12, 2026 14:42
@pkolaczk
Copy link
Copy Markdown
Author

The newly added tests passed after reducing row count to 1000: http://10.169.74.112:8081/job/ds-cassandra-build/2122/

@sonarqubecloud
Copy link
Copy Markdown

@cassci-bot
Copy link
Copy Markdown

❌ Build ds-cassandra-pr-gate/PR-2269 rejected by Butler


2 regressions found
See build details here


Found 2 new test failures

Test Explanation Runs Upstream
o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToOneCompaction[ec false] REGRESSION 🔴 0 / 24
o.a.c.index.sai.cql.VectorSiftSmallTest.testRerankKZeroOrderMatchesFullPrecisionSimilarity[ec true false] REGRESSION 🔴 0 / 24

Found 7 known test failures

@k-rus
Copy link
Copy Markdown
Member

k-rus commented Apr 16, 2026

@pkolaczk Can you add proper PR description and also explain non-test changes? Can you also rebase and resolve the conflict on non-test file?

@k-rus k-rus requested a review from a team April 16, 2026 10:17
@k-rus
Copy link
Copy Markdown
Member

k-rus commented Apr 16, 2026

I think this PR needs to be split into separate PRs, e.g., CNDB-17012 and CNDB-16394 are not related to each other. I am not sure which of them gains from CNDB-16394.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants