[4/n][guardian-integration] wid-keyed idempotency cache #463

Closed
0xsiddharthks wants to merge 1 commit into siddharth/guardian-integration from siddharth/guardian-wid-cache

0xsiddharthks commented Apr 17, 2026

Stacked on #423, #466, #449.

Summary

Retries of the same withdrawal (leader restart, leader rotation, lost response, or a seq-mismatch retry on the hashi side) previously debited the rate-limiter bucket once per retry, even though wid is deterministic and the actual on-chain withdrawal happens only once.

This PR adds an LRU cache of signed responses keyed by wid on EnclaveState (cap 1024). standard_withdrawal consults the cache before touching committee-sig verification, the limiter, or BTC signing; a hit returns the previously signed response untouched. Entries are inserted only after a successful S3 log commit, so the cache and bucket always agree.
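The cache-before-limiter ordering described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the PR's actual code: `WidCache`, `SignedResponse`, `Enclave`, the `bucket` counter, and the `log_commit_ok` flag (standing in for the S3 log commit result) are all hypothetical names, the placeholder signature bytes are not real signing, and the exact order of debit versus log commit on the live path is an assumption.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the guardian's signed response.
#[derive(Clone, Debug, PartialEq)]
struct SignedResponse {
    wid: u64,
    signature: Vec<u8>,
}

/// Minimal LRU keyed by wid (assumes cap >= 1).
/// `order` lists wids from least to most recently used.
struct WidCache {
    cap: usize,
    entries: HashMap<u64, SignedResponse>,
    order: VecDeque<u64>,
}

impl WidCache {
    fn new(cap: usize) -> Self {
        Self { cap, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Return a cached response and mark it most recently used.
    fn get(&mut self, wid: u64) -> Option<SignedResponse> {
        let hit = self.entries.get(&wid).cloned()?;
        self.touch(wid);
        Some(hit)
    }

    /// Insert a response, evicting the least recently used entry when full.
    fn insert(&mut self, resp: SignedResponse) {
        let wid = resp.wid;
        let is_new = self.entries.insert(wid, resp).is_none();
        if is_new && self.entries.len() > self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.touch(wid);
    }

    fn touch(&mut self, wid: u64) {
        self.order.retain(|w| *w != wid);
        self.order.push_back(wid);
    }
}

/// Hypothetical enclave state: a limiter bucket plus the wid cache.
struct Enclave {
    bucket: u64,
    cache: WidCache,
    /// Simulates whether the log commit succeeds (assumption for the sketch).
    log_commit_ok: bool,
}

impl Enclave {
    fn standard_withdrawal(&mut self, wid: u64, amount: u64) -> Result<SignedResponse, String> {
        // 1. Cache hit: return the earlier response without touching the bucket.
        if let Some(cached) = self.cache.get(wid) {
            return Ok(cached);
        }
        // 2. Live path: limiter check, then the (simulated) log commit.
        if amount > self.bucket {
            return Err("rate limited".to_string());
        }
        if !self.log_commit_ok {
            return Err("log commit failed".to_string());
        }
        self.bucket -= amount;
        // Placeholder bytes, not a real signature.
        let resp = SignedResponse { wid, signature: wid.to_le_bytes().to_vec() };
        // 3. Cache only after the commit succeeded, so failures are never cached.
        self.cache.insert(resp.clone());
        Ok(resp)
    }
}
```

In this sketch, calling `standard_withdrawal` twice with the same wid debits `bucket` once, and a call that fails at the commit step leaves both the bucket and the cache unchanged, which mirrors the two tests listed in this PR.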

Why wid

wid is already computed as u64::from_le_bytes(blake2b256(bcs(request_ids))[..8]) deterministically on the hashi side. The guardian re-derives the same value from the on-chain request set, so a retry from any leader at any epoch collides to the same wid.
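As a small illustration of the truncation step in that formula, the sketch below takes an already-computed 32-byte digest and reads its first 8 bytes as a little-endian `u64`. The helper name `wid_from_digest` is hypothetical, and the BCS encoding and Blake2b-256 hashing are deliberately left to the caller rather than reimplemented here.

```rust
/// Interpret the first 8 bytes of a 32-byte digest as a little-endian u64.
/// The digest is assumed to be blake2b256(bcs(request_ids)), computed elsewhere.
fn wid_from_digest(digest: &[u8; 32]) -> u64 {
    let mut first8 = [0u8; 8];
    first8.copy_from_slice(&digest[..8]);
    u64::from_le_bytes(first8)
}
```

Because the result depends only on the digest bytes, any party that hashes the same request set arrives at the same wid.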

What this unlocks

Tests

  • test_standard_withdrawal_wid_cache_is_idempotent — same wid twice returns an identical signed response; bucket reflects exactly one debit.
  • test_standard_withdrawal_failures_not_cached — failed withdrawals are NOT cached; the next attempt runs the live path.

Follow-ups (not in this PR)

  • S3-replay rehydration of the cache on guardian restart (PR-5 in the stack).

Retries of the same withdrawal (leader restart, leader rotation, lost
response, seq-mismatch retry on the client) previously debited the
bucket once per retry — even though wid is deterministic and the
"real" withdrawal only happens once on-chain.

Add an LRU cache of signed responses keyed by wid on EnclaveState
(cap 1024). standard_withdrawal consults the cache before touching
the committee sig, limiter, or BTC key; a hit returns the previously
signed response untouched. Entries are inserted only after a
successful S3 log commit, so the cache and bucket always agree.

Tests cover:
- same wid twice returns the same response and debits the bucket once
- a failed withdrawal is NOT cached; the next attempt runs the live
  path again
0xsiddharthks force-pushed the siddharth/guardian-wid-cache branch from 6d3c9ff to 229cd1e April 23, 2026 10:15
0xsiddharthks changed the title from "[3/n][guardian-integration] wid-keyed idempotency cache" to "[4/n][guardian-integration] wid-keyed idempotency cache" Apr 23, 2026
@0xsiddharthks

Deferred alongside #465. The wid-keyed response cache is a real correctness improvement — without it, cross-tick retries of the same withdrawal double-debit the bucket. That is a rate-limiter accuracy concern, not a safety one: BTC withdrawal remains exactly-once (enforced by Sui Move .withdrawal_txns state + MPC signing independent of the guardian).

For the MVP signet/devnet landing we are shipping #423 with both soft + hard guardian touchpoints and accepting the retry double-debit as a known limitation. This PR will be re-applied before we care about strict bucket accuracy (pre-scale / pre-mainnet).

See .claude/plans/golden-finding-castle.md for the full investigation + decision rationale. Branch siddharth/guardian-wid-cache preserved locally for the re-application.

0xsiddharthks added a commit that referenced this pull request Apr 23, 2026
Replaces the soft-reserve round-trip at Step 2 with a local
`capacity_at(ts)` check, and moves the hard reserve to post-MPC via
`validate_consume` → guardian `StandardWithdrawal` → verify Ed25519
response → `apply_consume`. Any rejection (seq mismatch, rate-limited,
unavailable) snaps local state to the guardian and bails so the next
leader tick retries cleanly. Serializes hard reserves to concurrency=1
when the guardian is configured so timestamps arrive monotonic; the
baseline (no guardian) keeps the configured cap.

- Step 2 in `process_approved_withdrawal_request_batch` skips the
  iteration when `capacity_at(checkpoint_timestamp_ms/1000)` is below
  the aggregate external-out amount. No round-trip.
- Step 3 runs `finalize_withdrawal_through_guardian` after MPC: picks
  `seq` from `LocalLimiter::validate_consume`, fans out
  `SignGuardianWithdrawalRequest` BLS signatures to the committee
  (each validator re-fetches the txn from chain and reconstructs the
  same `StandardWithdrawalRequest` deterministically), forwards the
  signed request to the guardian, verifies the response envelope,
  then `apply_consume`.
- New BridgeService RPC `SignGuardianWithdrawalRequest` +
  `build_guardian_withdrawal_request` / `compute_withdrawal_wid`
  helpers in `withdrawals.rs`.
- Guardian side: `RateLimiter::consume` now takes a `wid` (unused
  for now, prepping the idempotency cache in #463).
- E2E test `test_bitcoin_withdrawal_with_guardian_e2e_flow` asserts
  `local_limiter().snapshot() == guardian.state.limiter_state()`
  and `next_seq == 1` after a successful withdrawal.

Follow-ups (known gaps):
- Wid-keyed idempotency cache on `consume` (#463): transient RPC
  failures currently double-debit on retry.
- Guardian restart safety / S3 rehydrate (#465).
- Step 2/Step 3 timestamp unification via a Move-side change.
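
A rough sketch of the local-limiter flow this commit message describes (local `capacity_at` check, then `validate_consume` to pick `seq`, then `apply_consume` after the guardian responds). The method names come from the commit message, but everything else is assumed for illustration: the token-bucket model, the field names, the `snap_to` helper, and the error strings are hypothetical and may differ from the real `LocalLimiter`.

```rust
/// Hypothetical limiter snapshot; field names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
struct LimiterState {
    available: u64,
    last_ts: u64,
    next_seq: u64,
}

/// Hypothetical token-bucket limiter mirrored on the hashi side.
struct LocalLimiter {
    max: u64,
    refill_per_sec: u64,
    state: LimiterState,
}

impl LocalLimiter {
    /// Capacity available at `ts` (seconds), computed without mutating state.
    fn capacity_at(&self, ts: u64) -> u64 {
        let elapsed = ts.saturating_sub(self.state.last_ts);
        self.state
            .available
            .saturating_add(elapsed.saturating_mul(self.refill_per_sec))
            .min(self.max)
    }

    /// Check that a consume would succeed and return the seq to send.
    fn validate_consume(&self, ts: u64, amount: u64) -> Result<u64, String> {
        if ts < self.state.last_ts {
            return Err("timestamp went backwards".to_string());
        }
        if self.capacity_at(ts) < amount {
            return Err("rate limited".to_string());
        }
        Ok(self.state.next_seq)
    }

    /// Commit the debit once the guardian's response has been verified.
    fn apply_consume(&mut self, ts: u64, amount: u64, seq: u64) -> Result<(), String> {
        let expected = self.validate_consume(ts, amount)?;
        if seq != expected {
            return Err("seq mismatch".to_string());
        }
        self.state = LimiterState {
            available: self.capacity_at(ts) - amount,
            last_ts: ts,
            next_seq: seq + 1,
        };
        Ok(())
    }

    /// On any rejection, overwrite local state with the guardian's view.
    fn snap_to(&mut self, guardian: LimiterState) {
        self.state = guardian;
    }
}
```

Keeping `capacity_at` and `validate_consume` read-only is what lets the Step 2 check run without a round-trip; only `apply_consume` and `snap_to` change local state.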
0xsiddharthks added a commit that referenced this pull request Apr 26, 2026
0xsiddharthks added a commit that referenced this pull request Apr 28, 2026