Track the dirty status of individual elements in AtomicSparseBufferVec. #24078
Open
pcwalton wants to merge 2 commits into bevyengine:main from
Today, `AtomicSparseBufferVec` tracks the dirty status of individual *pages* of elements and performs a sparse upload when the number of modified pages is less than 15% of the total number of pages. (The default size of a page is 256 elements.) The reason it doesn't track the dirty status of individual elements, only pages, is that frequently-changed elements were assumed to cluster together, leading to low fragmentation. Unfortunately, this assumption has turned out to be false in practice. We extract meshes from the main world in parallel, so mesh instances end up scattered throughout the `MeshInputUniform` buffer as the various extraction threads send the meshes they extract over a shared channel. Because of this, real-world workloads tend to dirty a disproportionately large number of pages even when only a few mesh instances are modified. The end result is that we rarely perform sparse updates unless no mesh instances have been updated at all, largely defeating the purpose of `AtomicSparseBufferVec`.

This patch fixes the issue by tracking the dirty status of individual elements, not just of individual pages. For efficiency, we now use a two-level atomically-updated bit vector to track the dirty status of elements. The lower level, `dirty_bits`, is a simple flat list of bits, one for each element and grouped into 64-bit words: 0 for "not modified" and 1 for "modified". The higher level, `summary`, contains one bit for each 64-bit word in the lower level, which is 0 if no elements in that word have been modified and 1 if at least one element in that word has been modified. (In other words, each bit in `summary` represents the logical *or* of every bit in the corresponding word in `dirty_bits`.) When searching for modified elements to upload sparsely, we use bit manipulation instructions on the summary words to skip up to 64 words in `dirty_bits` (i.e. 64² = 4096 elements) at a time.

Because the bit manipulation that this PR performs is tricky, it's factored out into separate functions that are individually tested via `proptest` randomized testing. This caught several bugs, some of which I believe are also present in the existing code. Testing also verified that sparse buffer uploads are properly memory-bound as expected and that the dirty bit tracking has little overhead in practice.

The motivation for this PR was the discovery that `bevy_city` wasn't performing sparse uploads. Unfortunately, even with this patch, `bevy_city` still doesn't perform sparse uploads, because the fraction of moving cars (approximately 18%) exceeds the 15% threshold of total mesh instances, so sparse uploads aren't useful. I believe that `bevy_city` should be changed to increase the ratio of static buildings to cars in order to represent a more realistic workload. Once that's done, this patch should help `bevy_city` scale, especially once transforms receive their own buffer.
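The two-level scheme described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the `DirtyBits` type, its method names, and the standalone layout are hypothetical, and the real implementation lives inside `AtomicSparseBufferVec` alongside the element storage.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical two-level atomic dirty-bit tracker (names are illustrative).
struct DirtyBits {
    /// One bit per element, grouped into 64-bit words: 1 = modified.
    dirty_bits: Vec<AtomicU64>,
    /// One bit per word of `dirty_bits`: the logical OR of that word.
    summary: Vec<AtomicU64>,
}

impl DirtyBits {
    fn new(element_count: usize) -> Self {
        let word_count = element_count.div_ceil(64);
        DirtyBits {
            dirty_bits: (0..word_count).map(|_| AtomicU64::new(0)).collect(),
            summary: (0..word_count.div_ceil(64))
                .map(|_| AtomicU64::new(0))
                .collect(),
        }
    }

    /// Marks one element dirty, updating both levels with atomic OR.
    fn mark_dirty(&self, index: usize) {
        let word = index / 64;
        self.dirty_bits[word].fetch_or(1 << (index % 64), Ordering::Relaxed);
        self.summary[word / 64].fetch_or(1 << (word % 64), Ordering::Relaxed);
    }

    /// Visits every dirty element in order. A zero summary bit skips a whole
    /// 64-element word, and a zero summary *word* skips 64 × 64 = 4096
    /// elements at once; `trailing_zeros` jumps straight to the next set bit.
    fn for_each_dirty(&self, mut visit: impl FnMut(usize)) {
        for (summary_index, summary_word) in self.summary.iter().enumerate() {
            let mut summary_bits = summary_word.load(Ordering::Relaxed);
            while summary_bits != 0 {
                let word =
                    summary_index * 64 + summary_bits.trailing_zeros() as usize;
                let mut bits = self.dirty_bits[word].load(Ordering::Relaxed);
                while bits != 0 {
                    visit(word * 64 + bits.trailing_zeros() as usize);
                    bits &= bits - 1; // clear the lowest set bit
                }
                summary_bits &= summary_bits - 1;
            }
        }
    }
}

fn main() {
    let dirty = DirtyBits::new(8192);
    for &i in &[3, 70, 4100] {
        dirty.mark_dirty(i);
    }
    let mut found = Vec::new();
    dirty.for_each_dirty(|i| found.push(i));
    assert_eq!(found, vec![3, 70, 4100]);
    println!("{found:?}");
}
```

Note the scan never touches a `dirty_bits` word whose summary bit is clear, which is what makes sparse scans over mostly-clean buffers cheap.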