CNDB-15669: Fully off-heap memtable #2308

Open
blambov wants to merge 10 commits into main-5.0 from CNDB-15669

@blambov blambov commented Apr 7, 2026

What is the issue

https://github.com/riptano/cndb/issues/15669
https://github.com/riptano/cndb/issues/10302

What does this PR fix and why was it fixed

Implementation of the fully off-heap, tombstone-aware memtable.

The first commit is CNDB-10302 as reviewed in #2005, adding tombstone support. The second refactors some of the access interfaces to combine the cursor position into a single long for efficiency and extra flexibility; the third commit uses this to lift some restrictions on the kinds of ranges that the tries can support. The fifth commit extends the memtable trie all the way to individual cells, and the sixth makes it possible to store data in trie cells. When used with the offheap_objects allocation type, this memtable is fully off-heap, with ~100KiB of on-heap presence irrespective of data size.

Each commit should compile and pass tests, and comes with documentation in the included markdown files.

github-actions bot commented Apr 7, 2026

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@lesnik2u lesnik2u self-requested a review April 8, 2026 13:47
blambov and others added 10 commits April 9, 2026 10:28
Implements a row-level trie memtable that uses deletion-aware
tries to store deletions separately from live data, together
with the associated TrieBackedPartition and TriePartitionUpdate.

Refactors trie hierarchy to support multiple trie types:
- plain
- range, which stores range boundaries and is able to answer
  questions about the range that applies to every point in the
  trie
- deletion-aware, which combines a data part and a deletion range
  trie

Every trie type supports suitable operations, including merging
and intersection that make sense for the type of trie. In particular,
deletion-aware tries apply range branches to delete data during
merges.
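
As a rough illustration of the merge behaviour described above, the following sketch keeps live data and range deletions apart and lets deletions from either source remove covered data during a merge. It is not the PR's implementation: a `TreeMap` stands in for the trie, and `DeletionAwareSketch`, `Live` and `RangeDeletion` are hypothetical names. It assumes a deletion wins when its timestamp is greater than or equal to the data's timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for a deletion-aware trie: live data and range deletions are stored separately. */
final class DeletionAwareSketch {
    /** A live entry with its write timestamp (hypothetical type). */
    static final class Live {
        final String value;
        final long timestamp;
        Live(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** A deletion covering keys in [start, end), issued at the given timestamp (hypothetical type). */
    static final class RangeDeletion {
        final String start;
        final String end;
        final long timestamp;
        RangeDeletion(String start, String end, long timestamp) {
            this.start = start; this.end = end; this.timestamp = timestamp;
        }
        /** Assumption: a deletion removes data written at or before its own timestamp. */
        boolean deletes(String key, Live live) {
            return key.compareTo(start) >= 0 && key.compareTo(end) < 0 && timestamp >= live.timestamp;
        }
    }

    final TreeMap<String, Live> data = new TreeMap<>();
    final List<RangeDeletion> deletions = new ArrayList<>();

    /** Merges two sources; deletions from either side are applied to data from both. */
    static DeletionAwareSketch merge(DeletionAwareSketch a, DeletionAwareSketch b) {
        DeletionAwareSketch out = new DeletionAwareSketch();
        out.deletions.addAll(a.deletions);
        out.deletions.addAll(b.deletions);
        for (DeletionAwareSketch src : List.of(a, b)) {
            for (Map.Entry<String, Live> e : src.data.entrySet()) {
                boolean deleted = out.deletions.stream().anyMatch(d -> d.deletes(e.getKey(), e.getValue()));
                if (!deleted) // keep the newer of two live versions of the same key
                    out.data.merge(e.getKey(), e.getValue(), (x, y) -> x.timestamp >= y.timestamp ? x : y);
            }
        }
        return out;
    }
}
```

Because deletions live in their own structure, a reader that no longer needs tombstones can skip them without scanning past them in the data.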

Adds a new method to UnfilteredRowIterator that is implemented
by the new trie-backed partitions to ask them to stop issuing
tombstones. This is done on filtering (i.e. conversion from
UnfilteredRowIterator to RowIterator) where tombstones have already
done their job and are no longer needed.

Adds JMH tests of tombstones that demonstrate tombstone-independent
performance on memtable queries.

in a combined `encodedState` returned by advancing methods.
This saves megamorphic calls to `incomingTransition` and can
be augmented by further information at no cost.
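
One way to picture a combined state is to pack the cursor's depth and incoming transition byte into a single long, so that one advancing call returns both. The sketch below is only an assumption about how such packing could look: the class name `EncodedStateSketch` and the bit layout (transition in the low 8 bits, depth above it) are hypothetical, and the PR's actual encoding may differ.

```java
/** Hypothetical packing of a cursor position into one long; the real bit layout may differ. */
final class EncodedStateSketch {
    private static final int TRANSITION_BITS = 8;
    private static final long TRANSITION_MASK = (1L << TRANSITION_BITS) - 1;

    /** Packs the depth above the low byte, which holds the incoming transition (0..255). */
    static long encode(int depth, int incomingTransition) {
        return ((long) depth << TRANSITION_BITS) | (incomingTransition & TRANSITION_MASK);
    }

    /** Arithmetic shift keeps negative depths (for example an "exhausted" marker) intact. */
    static int depth(long encodedState) {
        return (int) (encodedState >> TRANSITION_BITS);
    }

    static int incomingTransition(long encodedState) {
        return (int) (encodedState & TRANSITION_MASK);
    }
}
```

With this shape a caller decodes both values from the returned primitive instead of making a second virtual call on the cursor, and spare high bits remain available for extra flags.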

This functionality has two main applications:
- it allows reverse walks that present prefix content in the correct
  byte-comparable order (i.e. prefixes after children)
- it makes it possible to have full control over what is and isn't
  included in a trie range (e.g. making it possible to have a branch
  set and nested ranges)
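
The first point can be illustrated with a toy trie: in forward order a key that is a prefix of another sorts first, so an exact reversal must report a node's own content only after all of its children. The sketch below uses a hypothetical `ReverseWalkSketch` class over characters rather than the PR's cursor interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy trie showing why a reverse walk must emit prefix content after the children. */
final class ReverseWalkSketch {
    static final class Node {
        String content; // null when no key ends at this node
        final TreeMap<Character, Node> children = new TreeMap<>();
    }

    final Node root = new Node();

    void put(String key) {
        Node n = root;
        for (char c : key.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new Node());
        n.content = key;
    }

    List<String> forward() {
        List<String> out = new ArrayList<>();
        walk(root, false, out);
        return out;
    }

    List<String> reverse() {
        List<String> out = new ArrayList<>();
        walk(root, true, out);
        return out;
    }

    private static void walk(Node n, boolean reverse, List<String> out) {
        // Forward: a prefix sorts before its extensions, so emit content first.
        if (!reverse && n.content != null)
            out.add(n.content);
        NavigableMap<Character, Node> kids = reverse ? n.children.descendingMap() : n.children;
        for (Node child : kids.values())
            walk(child, reverse, out);
        // Reverse: emit the prefix only after all of its children.
        if (reverse && n.content != null)
            out.add(n.content);
    }
}
```

For the keys `a`, `ab` and `b`, the forward walk yields `a, ab, b` and the reverse walk yields `b, ab, a`, which is the exact reversal.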

…and TrieMemtable to Stage3 version

Remove duplicate configuration object and add tests for stage 3

This change extends the coverage of the memtable trie to the
cell level, defining mappings of trie branches to and from the
legacy concepts of complex columns and rows.

This makes it possible to have a completely off-heap trie memtable,
where cell data is stored inside the trie structure if it is small
enough to fit, or placed in natively-allocated memory and referenced
by memory address.
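
The inline-or-referenced choice described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's layout: `CellStoreSketch`, the 8-byte slot, the 7-byte inline limit and the tag byte are all hypothetical, and a direct `ByteBuffer` with offsets stands in for natively-allocated memory referenced by address.

```java
import java.nio.ByteBuffer;

/** Hypothetical cell slot: small values live in the slot itself, larger ones off-heap. */
final class CellStoreSketch {
    static final int INLINE_LIMIT = 7;                    // assumed: 7 payload bytes plus a tag byte per 8-byte slot
    private static final long EXTERNAL_TAG = 0xFFL << 56; // top byte 0xFF marks an external reference
    private static final long PAYLOAD_MASK = (1L << 56) - 1;

    // Direct buffer used as a stand-in for natively-allocated memory.
    private final ByteBuffer nativeRegion;

    CellStoreSketch(int capacity) {
        nativeRegion = ByteBuffer.allocateDirect(capacity);
    }

    /** Returns a slot value that either contains the bytes or points at them. */
    long store(byte[] value) {
        if (value.length <= INLINE_LIMIT) {
            long slot = (long) value.length << 56;        // top byte holds the inline length
            for (int i = 0; i < value.length; i++)
                slot |= (value[i] & 0xFFL) << (8 * i);
            return slot;
        }
        int offset = nativeRegion.position();
        nativeRegion.putInt(value.length).put(value);     // length-prefixed copy outside the slot
        return EXTERNAL_TAG | offset;
    }

    static boolean isInline(long slot) {
        return (slot >>> 56) <= INLINE_LIMIT;
    }

    byte[] load(long slot) {
        if (isInline(slot)) {
            byte[] value = new byte[(int) (slot >>> 56)];
            for (int i = 0; i < value.length; i++)
                value[i] = (byte) (slot >>> (8 * i));
            return value;
        }
        int offset = (int) (slot & PAYLOAD_MASK);
        byte[] value = new byte[nativeRegion.getInt(offset)];
        ByteBuffer view = nativeRegion.duplicate();       // independent position, same memory
        view.position(offset + 4);
        view.get(value);
        return value;
    }
}
```

Either way the slot is a primitive long, so no per-cell Java object needs to stay on the heap.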

sonarqubecloud bot commented Apr 9, 2026

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2308 rejected by Butler


653 regressions found
See build details here


Found 653 new test failures

Showing only first 15 new test failures

| Test | Explanation | Runs | Upstream |
| --- | --- | --- | --- |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.datamodels.QueryRowDeletionsTest-_jdk11 | REGRESSION | 🔵🔴 | 0 / 30 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.datamodels.QueryTimeToLiveTest-_jdk11 | REGRESSION | 🔵🔴 | 0 / 30 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.datamodels.QueryWriteLifecycleTest-_jdk11 | REGRESSION | 🔵🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.SecondaryIndexOnMapEntriesTest.testShouldRecognizeAlteredOrDeletedMapEntries (compression) | REGRESSION | 🔵🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.SecondaryIndexOnStaticColumnTest.testIndexOnCollections (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.SecondaryIndexTest.testDeletions (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.SecondaryIndexTest.testUpdatesToMemtableData (compression) | REGRESSION | 🔵🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.StaticColumnsTest.testStaticColumns (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.UFJavaTest.testJavaSimpleCollections (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.UFJavaTest.testJavaTupleTypeCollection (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.UFJavaTest.testJavaUTCollections (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.UFJavaTest.testJavaUserType (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.UFJavaTest.testJavaUserTypeWithUse (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.entities.UFTypesTest.testComplexNullValues (compression) | REGRESSION | 🔴🔴 | 0 / 30 |
| o.a.c.cql3.validation.miscellaneous.TombstonesTest.initializationError (compression) | NEW | 🔴🔴 | 0 / 30 |

Found 22 known test failures


2 participants