CNDB-15669: Fully off-heap memtable by blambov · Pull Request #2308 · datastax/cassandra

blambov · 2026-04-07T11:50:34Z

What is the issue

https://github.com/riptano/cndb/issues/15669
https://github.com/riptano/cndb/issues/10302

What does this PR fix and why was it fixed

Implementation of the fully off-heap, tombstone-aware memtable.

The first commit is CNDB-10302 as reviewed in #2005, adding tombstone support. The second refactors some of the access interfaces to combine the cursor position into a single long for efficiency and extra flexibility, which the third commit uses to lift some restrictions on the kinds of ranges that the tries can support. The fifth commit extends the memtable trie all the way to individual cells, and the sixth makes it possible to store data in trie cells. When used with the `offheap_objects` allocation type, this memtable is fully off-heap, with ~100KiB of on-heap presence irrespective of data size.

Each commit should compile and pass tests, and comes with documentation in the included markdown files.

github-actions · 2026-04-07T11:50:52Z

Implements a row-level trie memtable that uses deletion-aware tries to store deletions separately from live data, together with the associated TrieBackedPartition and TriePartitionUpdate.

Refactors trie hierarchy to support multiple trie types:
- plain
- range, which stores range boundaries and is able to answer questions about the range that applies to every point in the trie
- deletion aware, which combines a data part and a deletion range trie

Every trie type supports suitable operations, including merging and intersection that make sense for the type of trie. In particular, deletion-aware tries apply range branches to delete data during merges.

Adds a new method to UnfilteredRowIterator that is implemented by the new trie-backed partitions to ask them to stop issuing tombstones. This is done on filtering (i.e. conversion from UnfilteredRowIterator to RowIterator) where tombstones have already done their job and are no longer needed.

Adds JMH tests of tombstones that demonstrate tombstone-independent performance on memtable queries.
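To make the "apply range branches to delete data during merges" behaviour concrete, here is a minimal sketch that uses a `TreeMap` as a stand-in for the data trie and a single half-open key range as a stand-in for the deletion branch. None of these names come from the PR: `RangeDeletionMergeSketch`, its `merge` method, and the timestamp rule are assumptions chosen for illustration only.

```java
import java.util.TreeMap;

// Illustrative only: a sorted map stands in for the data trie and one
// [start, end) range with a timestamp stands in for the deletion branch.
final class RangeDeletionMergeSketch {
    /**
     * Merges two sets of writes (key -> write timestamp) and then applies a
     * range deletion: entries inside [start, end) that are not newer than
     * deletionTime are dropped. All names here are hypothetical.
     */
    static TreeMap<String, Long> merge(TreeMap<String, Long> live,
                                       TreeMap<String, Long> incoming,
                                       String start, String end, long deletionTime) {
        TreeMap<String, Long> merged = new TreeMap<>(live);
        // On key collisions keep the newest write timestamp.
        incoming.forEach((key, ts) -> merged.merge(key, ts, Long::max));
        // The deletion range only affects keys it covers, and only older data.
        merged.subMap(start, true, end, false)
              .values()
              .removeIf(ts -> ts <= deletionTime);
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> live = new TreeMap<>();
        live.put("a", 1L); live.put("b", 2L); live.put("c", 3L);
        TreeMap<String, Long> incoming = new TreeMap<>();
        incoming.put("b", 5L); incoming.put("d", 1L);
        // Delete [b, d) at timestamp 3: "c" is removed, the newer "b" survives.
        System.out.println(merge(live, incoming, "b", "d", 3L).keySet());
    }
}
```

The point of the sketch is that deletions live in a separate structure and are applied at merge time, so live data never has to carry per-entry tombstone markers.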

Change trie interfaces to combine depth and incoming character in a combined `encodedState` returned by advancing methods. This saves megamorphic calls to `incomingTransition` and can be augmented by further information at no cost.
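As a rough illustration of how a depth and an incoming transition byte can travel together in one `long`, the sketch below packs the transition into the low 8 bits and the depth above it. The bit layout, the class name `EncodedStateSketch`, and the helper methods are assumptions; the PR does not state the actual encoding.

```java
// Hypothetical layout: low 8 bits hold the incoming transition byte,
// the remaining bits hold the depth. The real encoding may differ.
final class EncodedStateSketch {
    private static final int TRANSITION_BITS = 8;
    private static final long TRANSITION_MASK = (1L << TRANSITION_BITS) - 1;

    static long encode(int depth, int incomingTransition) {
        return ((long) depth << TRANSITION_BITS) | (incomingTransition & TRANSITION_MASK);
    }

    // Arithmetic shift keeps negative depths (e.g. an "exhausted" marker) intact.
    static int depth(long encodedState) {
        return (int) (encodedState >> TRANSITION_BITS);
    }

    static int incomingTransition(long encodedState) {
        return (int) (encodedState & TRANSITION_MASK);
    }

    public static void main(String[] args) {
        long state = encode(3, 0x7F);
        System.out.println(depth(state) + " " + incomingTransition(state)); // prints "3 127"
    }
}
```

With this shape, a caller gets both values from the single return value of an advancing call and decodes them with shifts and masks, with no second virtual call per step.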

This functionality has two main applications:
- it allows reverse walks that present prefix content in the correct byte-comparable order (i.e. prefixes after children)
- it makes it possible to have full control over what is and isn't included in a trie's ranges (e.g. making it possible to have a branch set and nested ranges)
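The ordering requirement for reverse walks can be shown on a toy trie. In the sketch below, the forward walk reports a node's content before its children and the reverse walk reports it after, so the reverse output is the exact mirror of the forward one. `ReverseWalkSketch` and its `Node` type are hypothetical and use recursion; they do not reflect the PR's cursor interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy trie used only to show the ordering; not the PR's cursor-based design.
final class ReverseWalkSketch {
    static final class Node {
        boolean hasContent;
        final TreeMap<Character, Node> children = new TreeMap<>();
    }

    static void put(Node root, String key) {
        Node node = root;
        for (char c : key.toCharArray())
            node = node.children.computeIfAbsent(c, x -> new Node());
        node.hasContent = true;
    }

    // Forward: a prefix sorts before everything that extends it.
    static void forward(Node node, String path, List<String> out) {
        if (node.hasContent) out.add(path);
        node.children.forEach((c, child) -> forward(child, path + c, out));
    }

    // Reverse: children in descending order first, then the prefix itself.
    static void reverse(Node node, String path, List<String> out) {
        node.children.descendingMap().forEach((c, child) -> reverse(child, path + c, out));
        if (node.hasContent) out.add(path);
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String key : new String[] { "a", "ab", "abc", "b" }) put(root, key);
        List<String> fwd = new ArrayList<>(), rev = new ArrayList<>();
        forward(root, "", fwd);
        reverse(root, "", rev);
        System.out.println(fwd + " " + rev); // [a, ab, abc, b] [b, abc, ab, a]
    }
}
```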

…and TrieMemtable to Stage3 version Remove duplicate configuration object and add tests for stage 3

This change extends the coverage of the memtable trie to the cell level, defining mappings of trie branches to and from the legacy concepts of complex columns and rows.

This makes it possible to have completely off-heap trie memtable, where cell data is stored inside the trie structure if it is small enough to fit, or placed in natively-allocated memory and referenced by memory address.
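One way to picture the "inline if small, otherwise native memory" choice is a 64-bit reference that either carries the bytes itself or points elsewhere. The sketch below is an assumption-laden illustration: the 7-byte inline limit, the tag byte, and the use of a list of direct `ByteBuffer`s in place of raw memory addresses are all hypothetical and not taken from the PR.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference layout: top byte is the inline length (0..7),
// or 0xFF to mark a handle to natively allocated memory in the low bits.
final class InlineOrOffHeapSketch {
    static final int MAX_INLINE = 7;      // assumed threshold, not from the PR
    private static final long OFF_HEAP_TAG = 0xFFL;
    // Stand-in for raw addresses: an index into a list of direct buffers.
    private final List<ByteBuffer> offHeap = new ArrayList<>();

    long store(byte[] value) {
        if (value.length <= MAX_INLINE) {
            long packed = (long) value.length << 56;
            for (int i = 0; i < value.length; i++)
                packed |= (value[i] & 0xFFL) << (8 * i);
            return packed;
        }
        ByteBuffer buffer = ByteBuffer.allocateDirect(value.length);
        buffer.put(value);
        buffer.flip();
        offHeap.add(buffer);
        return (OFF_HEAP_TAG << 56) | (offHeap.size() - 1);
    }

    boolean isInline(long ref) {
        return (ref >>> 56) != OFF_HEAP_TAG;
    }

    byte[] load(long ref) {
        if (!isInline(ref)) {
            // duplicate() gives an independent position over the same memory.
            ByteBuffer buffer = offHeap.get((int) (ref & 0xFFFFFFFFL)).duplicate();
            byte[] out = new byte[buffer.remaining()];
            buffer.get(out);
            return out;
        }
        int length = (int) (ref >>> 56);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++)
            out[i] = (byte) (ref >>> (8 * i));
        return out;
    }
}
```

Small values then cost no extra allocation at all, while larger ones cost one native allocation and a fixed-size reference inside the structure.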

sonarqubecloud · 2026-04-09T12:32:18Z

Quality Gate passed

Issues
75 New issues
0 Accepted issues

Measures
0 Security Hotspots
82.4% Coverage on New Code
4.5% Duplication on New Code


cassci-bot · 2026-04-09T12:38:16Z

❌ Build ds-cassandra-pr-gate/PR-2308 rejected by Butler

653 regressions found

Found 653 new test failures

Showing only first 15 new test failures

Test	Explanation	Runs	Upstream
junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.datamodels.QueryRowDeletionsTest-_jdk11	REGRESSION	🔵🔴	0 / 30
junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.datamodels.QueryTimeToLiveTest-_jdk11	REGRESSION	🔵🔴	0 / 30
junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.datamodels.QueryWriteLifecycleTest-_jdk11	REGRESSION	🔵🔴	0 / 30
o.a.c.cql3.validation.entities.SecondaryIndexOnMapEntriesTest.testShouldRecognizeAlteredOrDeletedMapEntries (compression)	REGRESSION	🔵🔴	0 / 30
o.a.c.cql3.validation.entities.SecondaryIndexOnStaticColumnTest.testIndexOnCollections (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.SecondaryIndexTest.testDeletions (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.SecondaryIndexTest.testUpdatesToMemtableData (compression)	REGRESSION	🔵🔴	0 / 30
o.a.c.cql3.validation.entities.StaticColumnsTest.testStaticColumns (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.UFJavaTest.testJavaSimpleCollections (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.UFJavaTest.testJavaTupleTypeCollection (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.UFJavaTest.testJavaUTCollections (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.UFJavaTest.testJavaUserType (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.UFJavaTest.testJavaUserTypeWithUse (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.entities.UFTypesTest.testComplexNullValues (compression)	REGRESSION	🔴🔴	0 / 30
o.a.c.cql3.validation.miscellaneous.TombstonesTest.initializationError (compression)	NEW	🔴🔴	0 / 30

Found 22 known test failures

blambov force-pushed the CNDB-15669 branch from fbca7ea to 1ffd142 on April 7, 2026 12:09

lesnik2u self-requested a review April 8, 2026 13:47

blambov and others added 10 commits April 9, 2026 10:28

Change trie interfaces to combine depth and incoming character

34043db

in a combined `encodedState` returned by advancing methods. This saves megamorphic calls to `incomingTransition` and can be augmented by further information at no cost.

Test fixes

2e5dd3e

Copy TrieBackedPartition, TriePartitionUpdate, TriePartritionUpdater …

40c0a77

…and TrieMemtable to Stage3 version Remove duplicate configuration object and add tests for stage 3

Implements cell-level trie

d362f65

This change extends the coverage of the memtable trie to the cell level, defining mappings of trie branches to and from the legacy concepts of complex columns and rows.

Test fixes

3652603

Permit in-memory tries to store bytes in the trie structure

6078584

This makes it possible to have completely off-heap trie memtable, where cell data is stored inside the trie structure if it is small enough to fit, or placed in natively-allocated memory and referenced by memory address.

Implement InMemoryRangeCursor.getNearestContent directly

ff3149b

Provide merged rows to indexer

0f133d7

blambov force-pushed the CNDB-15669 branch from bdb5b8c to 0f133d7 on April 9, 2026 11:19

blambov wants to merge 10 commits into main-5.0 from CNDB-15669


2 participants
