Skip to content

refactor: Use sonic for JSON un/marshaling#3588

Open
brbrr wants to merge 15 commits into
mainfrom
try/sonic-json
Open

refactor: Use sonic for JSON un/marshaling#3588
brbrr wants to merge 15 commits into
mainfrom
try/sonic-json

Conversation

@brbrr
Copy link
Copy Markdown
Contributor

@brbrr brbrr commented Apr 29, 2026

This PR replaces encoding/json with sonic for JSON un/marshaling.

Initially, I was planning to add it only for the RPCv10, but to reduce the complexity/conditioning, I decided to migrate all RPC versions.

( ! ) Note: It introduces breaking changes - the error messages (that user may encounter) are differs from what encoding/json were producing, also, they are different between ARM64 and AMD64 architectures.

Changes:

  • Introduce utils/jsonx as a centralized marshal/unmarshal/decoder wrapper around sonic.
  • Migrate many call sites from encoding/json to jsonx across VM, RPC versions, clients, compiler, and core hashing.
  • Rework LimitSlice decoding to use sonic’s lazy AST iteration.

TODO: Pretouch logging needs to be removed before merging the PR.

@brbrr brbrr self-assigned this Apr 29, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 77.33333% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.00%. Comparing base (60ecba4) to head (553042a).

Files with missing lines Patch % Lines
jsonrpc/server.go 88.33% 11 Missing and 10 partials ⚠️
clients/feeder/feeder.go 7.69% 1 Missing and 11 partials ⚠️
rpc/rpccore/limit_slice.go 73.68% 2 Missing and 3 partials ⚠️
rpc/v10/block_id.go 0.00% 4 Missing ⚠️
rpc/v10/simulation.go 0.00% 4 Missing ⚠️
rpc/v10/transaction.go 50.00% 1 Missing and 2 partials ⚠️
rpc/v10/sync.go 0.00% 2 Missing ⚠️
rpc/v8/sync.go 0.00% 2 Missing ⚠️
rpc/v8/transaction.go 66.66% 0 Missing and 2 partials ⚠️
rpc/v9/sync.go 0.00% 2 Missing ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3588      +/-   ##
==========================================
+ Coverage   75.81%   76.00%   +0.18%     
==========================================
  Files         387      388       +1     
  Lines       34975    35000      +25     
==========================================
+ Hits        26518    26600      +82     
+ Misses       6595     6530      -65     
- Partials     1862     1870       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@brbrr brbrr temporarily deployed to Development May 1, 2026 08:26 — with GitHub Actions Inactive
@brbrr brbrr marked this pull request as ready for review May 1, 2026 09:36
Copilot AI review requested due to automatic review settings May 1, 2026 09:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the codebase’s JSON handling (especially along the RPC path) to use a new utils/jsonx wrapper backed by bytedance/sonic, aiming to improve JSON encode/decode performance while pinning key behavioral semantics via tests.

Changes:

  • Introduce utils/jsonx as a centralized marshal/unmarshal/decoder wrapper around sonic.
  • Migrate many call sites from encoding/json to jsonx across VM, RPC versions, clients, compiler, and core hashing.
  • Rework rpccore.LimitSlice decoding to use sonic’s lazy AST iteration (plus add/adjust tests for sonic-specific behavior and error message differences).

Reviewed changes

Copilot reviewed 43 out of 44 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
vm/vm.go Switch state diff/trace/receipt JSON (un)marshal calls to jsonx.
vm/transaction.go Switch transaction envelope marshaling to jsonx.
vm/trace.go Switch custom MarshalJSON implementation to jsonx.
vm/class.go Switch class info marshaling to jsonx.
utils/jsonx/jsonx.go Add sonic-backed JSON helper package (Marshal, Unmarshal, Decoder, etc.).
utils/jsonx/jsonx_test.go Add tests pinning jsonx semantics (UseNumber, RawMessage, ordering, decoder surface, malformed input).
starknet/compiler/compiler.go Switch compiler request/response JSON handling to jsonx.
starknet/compiler/compile_ffi.go Switch FFI JSON handling to jsonx.
starknet/class.go Switch several JSON marshal/unmarshal paths to jsonx while keeping encoding/json types.
rpc/v9/transaction.go Switch gateway payload JSON handling to jsonx.
rpc/v9/trace.go Switch custom MarshalJSON to jsonx.
rpc/v9/sync.go Switch custom MarshalJSON to jsonx.
rpc/v9/subscriptions.go Use jsonx and replace map[string]any params with typed SubscriptionParams for stable field order.
rpc/v9/class.go Switch declared class decode to jsonx.
rpc/v9/block.go Switch BlockID unmarshal to jsonx.
rpc/v8/transaction.go Switch gateway payload JSON handling to jsonx.
rpc/v8/trace.go Switch custom MarshalJSON to jsonx.
rpc/v8/sync.go Switch custom MarshalJSON to jsonx.
rpc/v8/subscriptions.go Use jsonx and replace map[string]any params with typed SubscriptionParams.
rpc/v8/class.go Switch declared class decode to jsonx.
rpc/v8/block.go Switch BlockID unmarshal to jsonx.
rpc/v10/transaction_types.go Switch custom MarshalJSON to jsonx.
rpc/v10/transaction_test.go Add regression test to lock contract_class passthrough behavior with sonic.
rpc/v10/transaction.go Switch gateway payload JSON handling and class decode to jsonx.
rpc/v10/trace_invocation.go Switch custom MarshalJSON to jsonx.
rpc/v10/sync.go Switch custom MarshalJSON to jsonx.
rpc/v10/subscriptions.go Switch subscription response marshaling to jsonx and use typed params.
rpc/v10/subscription_types.go Introduce SubscriptionParams type (shared by v10 subscriptions).
rpc/v10/storage.go Switch response flag decode + response marshaling to jsonx.
rpc/v10/simulation.go Switch conditional response marshaling to jsonx.
rpc/v10/response_flags_test.go Update expected error substring for sonic-driven decode error.
rpc/v10/response_flags.go Switch flags/tags unmarshal to jsonx.
rpc/v10/events.go Switch AddressList unmarshal to jsonx.
rpc/v10/block_id.go Switch BlockID unmarshal to jsonx.
rpc/rpccore/limit_slice_test.go Update tests to use jsonx and add laziness/edge-case coverage for new AST-based decoder.
rpc/rpccore/limit_slice.go Reimplement LimitSlice.UnmarshalJSON using sonic AST lazy iteration to enforce limit early.
node/upgrader/upgrader.go Switch GitHub release JSON decode to jsonx decoder.
jsonrpc/server_test.go Adjust tests to tolerate sonic’s arch-dependent parse error text; add helper to ignore error.data contents where needed.
jsonrpc/server.go Switch decoding/encoding to jsonx, add sonic pretouch compilation, and log pretouch stats on registration.
core/class_hash.go Switch class definition marshaling to jsonx before hashing.
clients/gateway/gateway.go Switch request/response JSON handling to jsonx.
clients/feeder/feeder.go Switch feeder JSON decoding to jsonx decoder and class decode to jsonx.
go.mod Add github.com/bytedance/sonic (and related indirect deps).
go.sum Add checksums for sonic and its transitive dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread jsonrpc/server.go
Comment thread jsonrpc/server.go
Comment thread jsonrpc/server.go Outdated
Comment thread rpc/rpccore/limit_slice_test.go Outdated
@brbrr
Copy link
Copy Markdown
Contributor Author

brbrr commented May 6, 2026

here's the benchmarking data:

tl;dr: sonic uses less CPU (−12%) at the same throughput, holds more memory (+30%), and shows higher tail latency (p95 +9%). Median and p90 are flat to slightly better.

Metric sonic main delta
http_req_duration avg 63.1 ms 62.8 ms +0.5%
http_req_duration median 50.4 ms 51.3 ms −1.7%
http_req_duration p90 90.2 ms 92.4 ms −2.4%
http_req_duration p95 116.3 ms 106.6 ms +9.1%
http_req_duration max 2.11 s 720 ms +193%
iteration_duration avg 85.1 ms 84.5 ms +0.7%
iteration_duration p95 169.6 ms 159.8 ms +6.1%
%CPU mean 116% 133% −12.5%
%CPU p95 135% 149% −9.4%
RAM mean 910 MiB 696 MiB +30.7%
RAM drift / run +123 MiB +73 MiB +69%

Comment thread jsonrpc/server_test.go
Comment on lines +406 to +407
//nolint:lll // We can't break the json value
res: `[{"jsonrpc":"2.0","error":{"code":-32600,"message":"Invalid Request","data":"placeholder"},"id":null},{"jsonrpc":"2.0","error":{"code":-32600,"message":"Invalid Request","data":"placeholder"},"id":null}]`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Comment thread jsonrpc/server_test.go
if test.res != "" {
assert.JSONEq(t, test.res, string(res))
if test.ignoreErrorData {
assertResponseIgnoringErrorData(t, test.res, string(res))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems unnecessary, why aren't the JSONs populated with the right error data?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl:dr; AMD and ARM builds produce different error outputs because they use different parsers under the hood, so we do exact quality checks in our tests, they will fail in CI on different archs.

The idea is captured in the function doc comment:
https://github.com/NethermindEth/juno/pull/3588/changes#diff-a2e8c524803b22cc1bb50a1992a011374f1a29e432d04b54c984cb5ec9c7396dR24-R30

Comment thread jsonrpc/server_test.go
Comment on lines +897 to +906
// TestPretouchProductionHandlers registers Juno's real RPC methods
// (v8, v9, v10) on jsonrpc.Server instances and dumps the resulting
// PretouchedTypes list. The list is the actual set of Go types that
// sonic compiled encode/decode paths for at server startup, with no
// placeholder/stub handlers.
//
// Network and storage interfaces (bcReader, syncReader, vm) are passed
// nil — those are only touched when handlers run, not at registration.
func TestPretouchProductionHandlers(t *testing.T) {
logger := log.NewNopZapLogger()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests looks out of place, it should be in the RPC package

}

type SubscriptionParams struct {
Result any `json:"result"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Result can be Any?

Also define the sub type above the parent type

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how it's used currently - we pass various types as a Result. Potentially, we could use a generic type here, but I believe it's out of scope for this PR. Will move the type above the parent.

@brbrr brbrr force-pushed the try/sonic-json branch from 2c6b4ff to c50f1db Compare May 8, 2026 11:15
@brbrr brbrr temporarily deployed to Development May 8, 2026 11:58 — with GitHub Actions Inactive
@brbrr
Copy link
Copy Markdown
Contributor Author

brbrr commented May 8, 2026

Some bench data before/after f1e08fe (local arm64 run)

Metric Δ
ns/op −11.02%
allocs/op −7.85%
B/op +9.60%

The bytes regression is ast.Node's fixed parse-tree overhead — paid once per request regardless of element count. Multi-param requests amortize it; single-param requests pay full price. Empirically tested two alternatives ([]sonic.NoCopyRawMessage and bigger bufio buffer) — both regressed. ast.Node turns out to be the cheap option.

sec/op per axis

Axis base (µ) after (µ) Δ
no_params 1.377 1.145 −16.85%
no_params_ctx 1.586 1.386 ~
one_scalar_pos 1.850 1.927 ~
one_struct_pos 2.376 2.012 −15.32%
three_pos_mixed 3.222 2.610 −18.99%
three_named_required 3.530 3.005 −14.87%
named_opt_missing 2.137 2.272 ~
named_opt_present 3.372 2.978 −11.68%
validator_pass 2.383 2.107 −11.58%
validator_fail 2.926 2.643 −9.67%
type_error 3.123 2.957 ~
batch_small (5) 15.04 10.66 −29.13%
batch_large (50) 105.40 95.00 −9.87%
response_100_structs 12.63 12.29 ~

allocs/op per axis

Axis base after Δ
one_scalar_pos 29 28 −3.45%
one_struct_pos 32 28 −12.50%
three_pos_mixed 46 35 −23.91%
three_named_required 46 36 −21.74%
named_opt_missing 30 29 −3.33%
named_opt_present 46 36 −21.74%
validator_pass 33 29 −12.12%
validator_fail 45 41 −8.89%
type_error 43 42 −2.33%

(no-params, batch, response axes unchanged at 18 / 117 / 888 / 18)

cpu.out.zip
Here's the cpu profile data for the new bench tests added from amd64 arch. (had to archive the above file to upload it)

to access it, use: go tool pprof -http=:8080 cpu.out

Preview for one of the tests:

Screenshot 2026-05-08 at 17 24 18

@brbrr brbrr temporarily deployed to Development May 8, 2026 13:07 — with GitHub Actions Inactive
@brbrr
Copy link
Copy Markdown
Contributor Author

brbrr commented May 8, 2026

Allocations

main:
Screenshot 2026-05-08 at 22 58 52

sonic:
Screenshot 2026-05-08 at 22 58 30

And here's the GC metrics data (as reported by node) before and after running the tests:

main     requests=96094  cpu=  gc=0.0113134s  alloc=1858906912 bytes
         per-req:  cpu=0  alloc=19344.7 bytes  gc-time=1.17733e-07
sonic    requests=96108  cpu=  gc=0.00684841s  alloc=1689997264 bytes
         per-req:  cpu=0  alloc=17584.4 bytes  gc-time=7.12574e-08

@brbrr
Copy link
Copy Markdown
Contributor Author

brbrr commented May 8, 2026

CPU data from AMD64 arch running end-to-end tests

tl;dr: Un/Marshaling took 11% vs 7% overal cpu time (main vs sonic) or 5.5s vs 3.4s CPU time, so ~70% decrease.

main:
Screenshot 2026-05-08 at 23 08 42

sonic:
Screenshot 2026-05-08 at 23 08 57

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants