
Optimize unsigned LEB128 decoding more#875

Merged
philipc merged 2 commits into
gimli-rs:mainfrom
d-e-s-o:topic/optimize-leb128
Apr 15, 2026
Conversation

@d-e-s-o
Contributor

@d-e-s-o d-e-s-o commented Apr 14, 2026

Optimize unsigned LEB128 decoding further, following up on #795. In my testing, the improvements are more significant. Please refer to the individual commits for details.

d-e-s-o added 2 commits April 14, 2026 11:47
Override the default Reader::read_u8() implementation for EndianSlice
with a direct split_first() call. The default implementation goes
through a rather long call chain (read_u8 -> read_u8_array::<[u8;1]> ->
read_slice(&mut [u8;1]) -> inherent read_slice(1) + copy_from_slice),
which some compiler/flag combinations fail to inline. When that happens,
each byte read in the LEB128 hot loop compiles to an indirect function
call (with duplicated bounds check etc.). On compilers that already
inlined properly, this change should have no negative effect, but on
those that didn't the improvement can be quite significant:

  leb128 unsigned small:             -72%
  leb128 unsigned large:             -77%
  leb128 u16 small:                  -66%
  parse .debug_info expressions:     -44%
  evaluate .debug_info expressions:  -10%
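
For illustration, the difference between the two paths can be sketched as below. This is a minimal stand-in, not gimli's actual code: `SliceReader`, `UnexpectedEof`, and `read_u8_via_array` are hypothetical names, and the second method only approximates the shape of the default call chain (bounds check, split, then a copy into a one-byte array).

```rust
/// Hypothetical slice-backed reader, standing in for a type like EndianSlice.
#[derive(Debug, Clone, Copy)]
struct SliceReader<'a> {
    bytes: &'a [u8],
}

/// Hypothetical error type for running out of input.
#[derive(Debug, PartialEq, Eq)]
struct UnexpectedEof;

impl<'a> SliceReader<'a> {
    /// Direct path: one bounds check via split_first(), no intermediate buffer.
    #[inline]
    fn read_u8(&mut self) -> Result<u8, UnexpectedEof> {
        let (&first, rest) = self.bytes.split_first().ok_or(UnexpectedEof)?;
        self.bytes = rest;
        Ok(first)
    }

    /// Rough approximation of the generic path: take a 1-byte sub-slice,
    /// copy it into a [u8; 1], then read the byte back out of the array.
    fn read_u8_via_array(&mut self) -> Result<u8, UnexpectedEof> {
        let mut buf = [0u8; 1];
        if self.bytes.len() < buf.len() {
            return Err(UnexpectedEof);
        }
        let (head, rest) = self.bytes.split_at(buf.len());
        buf.copy_from_slice(head);
        self.bytes = rest;
        Ok(buf[0])
    }
}
```

Both methods return the same results; the point is only that the direct version leaves the optimizer much less to see through when it is called once per byte in a hot loop.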

Signed-off-by: Daniel Müller <deso@posteo.net>

In read::unsigned(), the shift == 63 overflow check ran on every
iteration of the loop despite only being relevant on the 10th (and
final) byte. Move it out by capping the loop with shift >= 63 and
handling the last byte separately after the loop exits.

This has two effects: First, it removes a comparison and conditional
branch from each iteration of the hot loop. Second -- and more
impactful -- it gives the compiler a known upper bound on the iteration
count, which can enable LLVM to fully unroll the loop.
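
A minimal sketch of the restructuring described above, assuming a plain `&[u8]` cursor rather than gimli's Reader trait. The names `Error`, `read_u8`, `decode_check_every_byte`, and `decode_capped` are hypothetical and chosen for illustration only; the actual read::unsigned() may differ in detail.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Error {
    UnexpectedEof,
    BadUnsignedLeb128,
}

const CONTINUATION_BIT: u8 = 0x80;

/// Pop one byte off the front of the cursor.
fn read_u8(bytes: &mut &[u8]) -> Result<u8, Error> {
    let (&first, rest) = bytes.split_first().ok_or(Error::UnexpectedEof)?;
    *bytes = rest;
    Ok(first)
}

/// Before: the overflow check is evaluated on every pass through the loop,
/// although it can only fire for the tenth byte (shift == 63).
fn decode_check_every_byte(bytes: &mut &[u8]) -> Result<u64, Error> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = read_u8(bytes)?;
        if shift == 63 && byte != 0x00 && byte != 0x01 {
            return Err(Error::BadUnsignedLeb128);
        }
        result |= u64::from(byte & !CONTINUATION_BIT) << shift;
        if byte & CONTINUATION_BIT == 0 {
            return Ok(result);
        }
        shift += 7;
    }
}

/// After: the loop runs at most nine times (shift = 0, 7, ..., 56) and
/// contains no overflow check; the tenth byte is handled once, after it.
fn decode_capped(bytes: &mut &[u8]) -> Result<u64, Error> {
    let mut result = 0u64;
    let mut shift = 0u32;
    while shift < 63 {
        let byte = read_u8(bytes)?;
        result |= u64::from(byte & !CONTINUATION_BIT) << shift;
        if byte & CONTINUATION_BIT == 0 {
            return Ok(result);
        }
        shift += 7;
    }
    // Only bit 63 is left to fill, so the final byte must be 0x00 or 0x01.
    let byte = read_u8(bytes)?;
    if byte > 0x01 {
        return Err(Error::BadUnsignedLeb128);
    }
    Ok(result | (u64::from(byte) << 63))
}
```

Both functions accept and reject the same inputs. In the second, the trip count of the loop is bounded by a constant, which is what gives the compiler the option to unroll it.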

I checked results on two systems, both showing positive outcomes:

System 1:
> leb128 unsigned small   time:   [459.40 ns 460.70 ns 462.23 ns]
>                         change: [−4.9535% −4.4800% −4.0365%] (p = 0.00 < 0.05)
>                         Performance has improved.
> leb128 unsigned large   time:   [104.40 ns 104.57 ns 104.82 ns]
>                         change: [−15.018% −14.476% −13.628%] (p = 0.00 < 0.05)
>                         Performance has improved.
> leb128 u16 small        time:   [461.00 ns 462.73 ns 464.39 ns]
>                         change: [−22.716% −22.316% −21.947%] (p = 0.00 < 0.05)
>                         Performance has improved.
> parse .debug_info expressions
>                         time:   [63.729 µs 63.913 µs 64.141 µs]
>                         change: [−0.8179% +0.5747% +1.9911%] (p = 0.43 > 0.05)
>                         No change in performance detected.
> evaluate .debug_info expressions
>                         time:   [517.71 µs 519.14 µs 520.82 µs]
>                         change: [−1.5918% −0.8249% +0.0344%] (p = 0.04 < 0.05)

System 2:
> leb128 unsigned small   time:   [896.75 ns 902.94 ns 911.08 ns]
>                         change: [−9.8227% −8.8646% −7.9089%] (p = 0.00 < 0.05)
>                         Performance has improved.
> leb128 unsigned large   time:   [164.77 ns 166.96 ns 170.68 ns]
>                         change: [−44.354% −43.307% −42.114%] (p = 0.00 < 0.05)
>                         Performance has improved.
> leb128 u16 small        time:   [890.62 ns 898.04 ns 907.43 ns]
>                         change: [−9.3899% −8.1392% −6.8999%] (p = 0.00 < 0.05)
>                         Performance has improved.
> parse .debug_info expressions
>                         time:   [128.24 µs 129.25 µs 130.30 µs]
>                         change: [−13.582% −10.412% −7.0530%] (p = 0.00 < 0.05)
>                         Performance has improved.
> evaluate .debug_info expressions
>                         time:   [849.52 µs 854.60 µs 860.50 µs]
>                         change: [−3.3028% +0.4156% +3.8783%] (p = 0.82 > 0.05)
>                         No change in performance detected.

Signed-off-by: Daniel Müller <deso@posteo.net>
Collaborator

@philipc philipc left a comment

Thanks! Yeah, those traits rely a lot on the optimiser, so there might be more improvements to be found.

@philipc philipc merged commit 3d74b4c into gimli-rs:main Apr 15, 2026
19 checks passed
@d-e-s-o d-e-s-o deleted the topic/optimize-leb128 branch April 15, 2026 00:41
