Track the dirty status of individual elements in `AtomicSparseBufferVec`. by pcwalton · Pull Request #24078 · bevyengine/bevy

pcwalton · 2026-05-02T13:19:00Z

Today, `AtomicSparseBufferVec` tracks the dirty status of individual *pages* of elements and performs a sparse upload when the number of modified pages is less than 15% of the total number of pages. (The default size of a page is 256 elements.) It tracks pages rather than individual elements because it was assumed that frequently changed elements would tend to cluster together, leading to low fragmentation. Unfortunately, this assumption has turned out to be false in practice. We extract meshes from the main world in parallel, so mesh instances end up scattered throughout the `MeshInputUniform` buffer as the various extraction threads send the meshes they extract over a shared channel. Because of this, real-world workloads tend to dirty a disproportionately large number of pages, even if they're only modifying a few mesh instances. The end result is that we rarely perform sparse updates unless no mesh instances have been updated at all, largely defeating the purpose of `AtomicSparseBufferVec`.
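To make the fragmentation problem concrete, here is a minimal sketch of the page-level heuristic described above. It is not the actual Bevy code: `PAGE_SIZE`, `SPARSE_THRESHOLD`, `dirty_page_count`, and `page_level_sparse_upload` are hypothetical names chosen for illustration, and only the 256-element page size and the 15% threshold come from the description.

```rust
use std::collections::BTreeSet;

/// Default page size from the description above (assumed constant name).
const PAGE_SIZE: usize = 256;
/// Sparse uploads happen only below this fraction of dirty pages (assumed name).
const SPARSE_THRESHOLD: f64 = 0.15;

/// Counts how many distinct pages contain at least one modified element.
fn dirty_page_count(modified: &[usize]) -> usize {
    modified
        .iter()
        .map(|&index| index / PAGE_SIZE)
        .collect::<BTreeSet<_>>()
        .len()
}

/// Returns true if a page-granular tracker would choose a sparse upload.
fn page_level_sparse_upload(modified: &[usize], total_elements: usize) -> bool {
    let total_pages = (total_elements + PAGE_SIZE - 1) / PAGE_SIZE;
    (dirty_page_count(modified) as f64) < SPARSE_THRESHOLD * total_pages as f64
}
```

With 65,536 elements (256 pages), 64 contiguous modified elements dirty a single page and qualify for a sparse upload. The same 64 modifications spread one per 1,024 elements dirty 64 pages (25% of the total), so the heuristic falls back to a full upload even though under 0.1% of the elements changed.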

This patch fixes the issue by tracking the dirty status of individual elements, not just of individual pages. For efficiency, we now use a two-level atomically-updated bit vector to track the dirty status of elements. The lower level, `dirty_bits`, is a simple flat list of bits, one for each element and grouped into 64-bit words, 0 for "not modified" and 1 for "modified". The higher level, `summary`, contains one bit for each 64-bit word in the lower level, which is 0 if no elements in that word have been modified and 1 if at least one element in that word has been modified. (In other words, each bit in `summary` represents the logical *or* of every bit in the corresponding word in `dirty_bits`.) When searching for modified elements to upload sparsely, we use bit manipulation instructions on the summary words to skip up to 64 words in `dirty_bits` (i.e. 64² = 4096 elements) at a time.
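The two-level scheme can be sketched roughly as follows. This is an illustrative reconstruction from the description, not the code in this PR: the `TwoLevelDirtyBits` type and its methods are hypothetical, and only the field names `dirty_bits` and `summary` come from the text. The sketch uses relaxed atomics and assumes the scan runs after all writers have finished (for example, after extraction threads are joined).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical two-level dirty tracker (illustrative, not Bevy's type).
pub struct TwoLevelDirtyBits {
    /// One bit per element, 64 elements per word.
    dirty_bits: Vec<AtomicU64>,
    /// One bit per word of `dirty_bits`, 64 words per summary word.
    summary: Vec<AtomicU64>,
}

impl TwoLevelDirtyBits {
    pub fn new(len: usize) -> Self {
        let words = (len + 63) / 64;
        let summary_words = (words + 63) / 64;
        Self {
            dirty_bits: (0..words).map(|_| AtomicU64::new(0)).collect(),
            summary: (0..summary_words).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Marks one element dirty; callable from many threads through `&self`.
    pub fn mark_dirty(&self, index: usize) {
        let word = index / 64;
        self.dirty_bits[word].fetch_or(1u64 << (index % 64), Ordering::Relaxed);
        self.summary[word / 64].fetch_or(1u64 << (word % 64), Ordering::Relaxed);
    }

    /// Returns dirty element indices in ascending order.
    /// Assumes no thread is still calling `mark_dirty` during the scan.
    pub fn dirty_indices(&self) -> Vec<usize> {
        let mut out = Vec::new();
        for (s, summary_word) in self.summary.iter().enumerate() {
            let mut summary_bits = summary_word.load(Ordering::Relaxed);
            // A zero summary word skips 64 words (4096 elements) at once.
            while summary_bits != 0 {
                let word = s * 64 + summary_bits.trailing_zeros() as usize;
                summary_bits &= summary_bits - 1; // clear lowest set bit
                let mut bits = self.dirty_bits[word].load(Ordering::Relaxed);
                while bits != 0 {
                    out.push(word * 64 + bits.trailing_zeros() as usize);
                    bits &= bits - 1;
                }
            }
        }
        out
    }
}
```

Here `trailing_zeros` stands in for the bit manipulation instructions mentioned above: it jumps straight to the next set bit in a summary word, so clean words in `dirty_bits` are never loaded.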

Because the bit manipulation that this PR performs is tricky, it's factored out into separate functions that are individually tested via `proptest` randomized testing. This caught several bugs, some of which I believe are also present in the existing code. Testing also verified that sparse buffer uploads are properly memory-bound as expected and that the dirty bit tracking has little overhead in practice.

The motivation for this PR was the discovery that `bevy_city` wasn't performing sparse uploads. Unfortunately, even with this patch, `bevy_city` still doesn't perform sparse uploads, because moving cars make up approximately 18% of the total mesh instances, which exceeds the 15% threshold, and so sparse uploads aren't useful. I believe that `bevy_city` should be changed to increase the ratio of static buildings to cars in order to represent a more realistic workload. Once that's done, this patch should help `bevy_city` scale, especially once transforms receive their own buffer.

pcwalton requested review from IceSentry, atlv24 and tychedelia May 2, 2026 13:19

pcwalton added the A-Rendering (Drawing game state to the screen) label May 2, 2026

github-project-automation Bot added this to Rendering May 2, 2026

github-project-automation Bot moved this to Needs SME Triage in Rendering May 2, 2026

pcwalton added the C-Performance (A change motivated by improving speed, memory usage or compile times), C-Bug (An unexpected or incorrect behavior), and S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) labels May 2, 2026

cart force-pushed the main branch from af894e5 to 017ffc5 May 4, 2026 23:35

Merge branch 'main' into per-element-sparse-buffers

ddafe6c

cart closed this May 5, 2026

github-project-automation Bot moved this from Needs SME Triage to Done in Rendering May 5, 2026

cart reopened this May 5, 2026

github-project-automation Bot moved this from Done to Needs SME Triage in Rendering May 5, 2026

pcwalton wants to merge 2 commits into bevyengine:main from pcwalton:per-element-sparse-buffers
