perf: pre-initialized Blake2b cloning for hash hot paths #3089
Draft
wbbradley wants to merge 3 commits into wbbradley/resource-management from
Parallelize the two most expensive phases of blob encoding:

1. Primary encoding + leaf hashing (48% of CPU time): each column's RS-encode and Blake2b hash computation now runs in parallel via rayon, with per-task encoders and a sequential scatter phase for writing results back.
2. Merkle tree construction (46% of CPU time): the 2*n_shards independent Merkle tree builds in compute_metadata_from_symbol_hashes() now use into_par_iter().

Both encode_with_metadata() and compute_metadata() share the same parallel pattern: materialize all secondary slivers, then par_iter over all n_shards columns for primary encoding + hashing.

Also adds a --threads flag to the profile_encoding example for controlling rayon's thread pool size during profiling runs.
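The "parallel compute, sequential scatter" shape described in this commit message can be sketched roughly as follows. This is not the PR's code: it uses `std::thread::scope` from the standard library in place of rayon so the sketch has no dependencies, and `process_column` is a hypothetical placeholder for the real per-column RS-encode + Blake2b work.

```rust
use std::thread;

/// Hypothetical stand-in for the per-column work (RS-encode + leaf hashing).
/// It is a toy rolling checksum, chosen only so the sketch is self-contained.
fn process_column(col: &[u8]) -> u64 {
    col.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

/// Processes all columns on up to `n_threads` scoped threads, then writes the
/// results back in a single sequential pass.
fn process_columns_parallel(columns: &[Vec<u8>], n_threads: usize) -> Vec<u64> {
    let n_threads = n_threads.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((columns.len() + n_threads - 1) / n_threads).max(1);

    // Parallel phase: each worker owns its chunk and returns (index, result)
    // pairs, so no shared mutable state is touched here.
    let partials: Vec<Vec<(usize, u64)>> = thread::scope(|s| {
        let handles: Vec<_> = columns
            .chunks(chunk_len)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .map(|(i, col)| (chunk_idx * chunk_len + i, process_column(col)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Sequential scatter phase: place every result at its column index.
    let mut out = vec![0u64; columns.len()];
    for (idx, value) in partials.into_iter().flatten() {
        out[idx] = value;
    }
    out
}
```

Calling `process_columns_parallel(&columns, 4)` should return the same vector as mapping `process_column` over `columns` sequentially. With rayon, the parallel phase would typically collapse to a `par_iter().map(...).collect()` and the pool size would come from the configured thread pool rather than an explicit argument.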
Description
Every `leaf_hash` and `inner_hash` call creates a fresh Blake2b256 hasher through `new_with_params`, which runs IV setup and parameter validation. With ~3M hash calls per 32 MiB encode (1M leaf + ~2M inner), this accounts for 1.1% of self-time in profiling.

This change bypasses `HashFunctionWrapper` in hot paths by working with `blake2::Blake2b<U32>` directly. Two static `LazyLock` hashers are pre-initialized with the leaf/inner prefix already fed in. Each hash call clones the pre-initialized state (~200-byte memcpy) instead of running full construction.

Adds `leaf_hash_blake2b256` and `inner_hash_blake2b256` as specialized fast-path functions, and `MerkleTree::build_from_leaf_hashes_fast` for Merkle tree construction. The generic `leaf_hash<T>` and `inner_hash<T>` remain unchanged for non-hot-path usage (proofs, generic tree builds).

Test plan

`profile_encoding`; results are within noise of the previous commit, consistent with the small (~1%) targeted overhead.

Release notes
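For readers unfamiliar with the clone-instead-of-construct idea from the Description, here is a minimal sketch of it. It is not the PR's implementation: `std`'s `DefaultHasher` stands in for `blake2::Blake2b<U32>` so the example needs no external crates, the prefix values `[0]` and `[1]` are assumptions, and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::sync::LazyLock;

// Assumed domain-separation prefixes; the real values live in the crate.
const LEAF_PREFIX: [u8; 1] = [0];
const INNER_PREFIX: [u8; 1] = [1];

// Hashers built once, with the prefix already absorbed. DefaultHasher is only
// a stand-in: any hasher whose state is cheap to Clone works the same way.
static LEAF_HASHER: LazyLock<DefaultHasher> = LazyLock::new(|| {
    let mut h = DefaultHasher::new();
    h.write(&LEAF_PREFIX);
    h
});
static INNER_HASHER: LazyLock<DefaultHasher> = LazyLock::new(|| {
    let mut h = DefaultHasher::new();
    h.write(&INNER_PREFIX);
    h
});

/// Slow path: construct a hasher and feed the prefix on every call.
fn leaf_hash_slow(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&LEAF_PREFIX);
    h.write(data);
    h.finish()
}

/// Fast path: clone the pre-initialized state, then feed only the payload.
fn leaf_hash_fast(data: &[u8]) -> u64 {
    let mut h = LEAF_HASHER.clone();
    h.write(data);
    h.finish()
}

/// Slow path for inner nodes, kept for comparison.
fn inner_hash_slow(left: &[u8], right: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&INNER_PREFIX);
    h.write(left);
    h.write(right);
    h.finish()
}

/// Fast path for inner nodes: same clone trick with the inner prefix.
fn inner_hash_fast(left: &[u8], right: &[u8]) -> u64 {
    let mut h = INNER_HASHER.clone();
    h.write(left);
    h.write(right);
    h.finish()
}
```

Because the cloned hasher has seen exactly the same bytes as a freshly constructed one fed the prefix, `leaf_hash_fast(x)` and `leaf_hash_slow(x)` agree for every input; the only difference is that construction and prefix absorption happen once rather than per call.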