Optimize unsigned LEB128 decoding more#875
Merged
Override the default Reader::read_u8() implementation for EndianSlice with a direct split_first() call.

The default implementation goes through a rather long call chain (read_u8 -> read_u8_array::<[u8;1]> -> read_slice(&mut [u8;1]) -> inherent read_slice(1) + copy_from_slice), which some compiler/flag combinations fail to inline. When that happens, each byte read in the LEB128 hot loop compiles to an indirect function call (with duplicated bounds check etc.).

On compilers that already inlined properly, this change should have no negative effect, but on those that didn't the improvement can be quite significant:

  leb128 unsigned small: -72%
  leb128 unsigned large: -77%
  leb128 u16 small: -66%
  parse .debug_info expressions: -44%
  evaluate .debug_info expressions: -10%

Signed-off-by: Daniel Müller <deso@posteo.net>
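The idea behind this commit can be sketched as follows. This is a minimal illustration only: `SliceReader` and `UnexpectedEof` are hypothetical stand-ins invented for the example, not the crate's actual `EndianSlice` or error type. It shows a byte read expressed as a single `split_first()` call, so there is one bounds check and no intermediate array copy.

```rust
/// Hypothetical slice-backed reader used only for illustration;
/// this is NOT the actual `EndianSlice` type from the PR.
#[derive(Debug, Clone, Copy)]
pub struct SliceReader<'a> {
    bytes: &'a [u8],
}

/// Hypothetical error returned when the input is exhausted.
#[derive(Debug, PartialEq, Eq)]
pub struct UnexpectedEof;

impl<'a> SliceReader<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        SliceReader { bytes }
    }

    /// Read one byte with a single `split_first()` call: one bounds
    /// check, no temporary `[u8; 1]`, no `copy_from_slice`.
    #[inline]
    pub fn read_u8(&mut self) -> Result<u8, UnexpectedEof> {
        let (&first, rest) = self.bytes.split_first().ok_or(UnexpectedEof)?;
        self.bytes = rest;
        Ok(first)
    }

    /// Number of bytes still unread.
    pub fn remaining(&self) -> usize {
        self.bytes.len()
    }
}
```

Because the body is this small, it is a much easier inlining candidate than a chain of generic helper calls, which is the effect the commit message describes.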
In read::unsigned(), the shift == 63 overflow check ran on every iteration of the loop despite only being relevant on the 10th (and final) byte. Move it out by capping the loop with shift >= 63 and handling the last byte separately after the loop exits.

This has two effects: First, it removes a comparison and conditional branch from each iteration of the hot loop. Second -- and more impactful -- it gives the compiler a known upper bound on the iteration count, which can enable LLVM to fully unroll the loop.

I checked results on two systems, both showing positive outcomes:

System 1:
> leb128 unsigned small
>   time:   [459.40 ns 460.70 ns 462.23 ns]
>   change: [−4.9535% −4.4800% −4.0365%] (p = 0.00 < 0.05)
>   Performance has improved.
> leb128 unsigned large
>   time:   [104.40 ns 104.57 ns 104.82 ns]
>   change: [−15.018% −14.476% −13.628%] (p = 0.00 < 0.05)
>   Performance has improved.
> leb128 u16 small
>   time:   [461.00 ns 462.73 ns 464.39 ns]
>   change: [−22.716% −22.316% −21.947%] (p = 0.00 < 0.05)
>   Performance has improved.
> parse .debug_info expressions
>   time:   [63.729 µs 63.913 µs 64.141 µs]
>   change: [−0.8179% +0.5747% +1.9911%] (p = 0.43 > 0.05)
>   No change in performance detected.
> evaluate .debug_info expressions
>   time:   [517.71 µs 519.14 µs 520.82 µs]
>   change: [−1.5918% −0.8249% +0.0344%] (p = 0.04 < 0.05)

System 2:
> leb128 unsigned small
>   time:   [896.75 ns 902.94 ns 911.08 ns]
>   change: [−9.8227% −8.8646% −7.9089%] (p = 0.00 < 0.05)
>   Performance has improved.
> leb128 unsigned large
>   time:   [164.77 ns 166.96 ns 170.68 ns]
>   change: [−44.354% −43.307% −42.114%] (p = 0.00 < 0.05)
>   Performance has improved.
> leb128 u16 small
>   time:   [890.62 ns 898.04 ns 907.43 ns]
>   change: [−9.3899% −8.1392% −6.8999%] (p = 0.00 < 0.05)
>   Performance has improved.
> parse .debug_info expressions
>   time:   [128.24 µs 129.25 µs 130.30 µs]
>   change: [−13.582% −10.412% −7.0530%] (p = 0.00 < 0.05)
>   Performance has improved.
> evaluate .debug_info expressions
>   time:   [849.52 µs 854.60 µs 860.50 µs]
>   change: [−3.3028% +0.4156% +3.8783%] (p = 0.82 > 0.05)
>   No change in performance detected.

Signed-off-by: Daniel Müller <deso@posteo.net>
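The loop restructuring described in this commit can be sketched roughly as below. This is an illustrative reconstruction, not the code from the PR: `decode_unsigned`, `read_u8`, and `Leb128Error` are hypothetical names, and the input is a plain byte slice rather than the crate's `Reader` trait. The loop handles the first nine bytes (shifts 0 through 56) with no overflow check, and the tenth byte, which can contribute only bit 63, is validated once after the loop.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Leb128Error {
    UnexpectedEof,
    Overflow,
}

/// Pop one byte off the front of the slice (assumed helper).
#[inline]
fn read_u8(input: &mut &[u8]) -> Result<u8, Leb128Error> {
    let (&byte, rest) = input.split_first().ok_or(Leb128Error::UnexpectedEof)?;
    *input = rest;
    Ok(byte)
}

/// Decode an unsigned LEB128 value into a `u64`, advancing `input`.
pub fn decode_unsigned(input: &mut &[u8]) -> Result<u64, Leb128Error> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;

    // At most nine iterations (shift = 0, 7, ..., 56). No overflow
    // check is needed here: 7 payload bits always fit below bit 63.
    while shift < 63 {
        let byte = read_u8(input)?;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
    }

    // Tenth and final byte: only bit 63 is left, so the byte must be
    // 0x00 or 0x01 and must not have its continuation bit set.
    let byte = read_u8(input)?;
    if byte > 0x01 {
        return Err(Leb128Error::Overflow);
    }
    result |= u64::from(byte) << 63;
    Ok(result)
}
```

With the bound `shift < 63` visible in the loop condition, the trip count is statically limited to nine, which is what makes full unrolling an option for the optimizer.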
philipc (Collaborator) approved these changes on Apr 15, 2026 and left a comment:
Thanks! Yeah, those traits rely a lot on the optimiser; there might be more improvements to be found.
Optimize LEB128 decoding some more (after #795). Improvements are more significant as per my testing. Please refer to individual commits.