[4/n][guardian-integration] wid-keyed idempotency cache #463
0xsiddharthks wants to merge 1 commit into siddharth/guardian-integration from
Retries of the same withdrawal (leader restart, leader rotation, lost response, seq-mismatch retry on the client) previously debited the bucket once per retry, even though wid is deterministic and the "real" withdrawal only happens once on-chain.

Add an LRU cache of signed responses keyed by wid on EnclaveState (cap 1024). standard_withdrawal consults the cache before touching the committee sig, limiter, or BTC key; a hit returns the previously signed response untouched. Entries are inserted only after a successful S3 log commit, so the cache and bucket always agree.

Tests cover:
- same wid twice returns the same response and debits the bucket once
- a failed withdrawal is NOT cached; the next attempt runs the live path again
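The cache-before-limiter ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SignedResponse`, `WidCache`, the `bucket` field, and the `log_commit_ok` flag are hypothetical stand-ins, a plain balance replaces the real rate limiter, and the S3 log commit is simulated by a boolean.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the guardian's signed response.
#[derive(Clone, Debug, PartialEq)]
pub struct SignedResponse {
    pub wid: u64,
    pub signature: Vec<u8>,
}

/// Minimal LRU keyed by wid; `order` runs from least to most recently used.
pub struct WidCache {
    cap: usize,
    entries: HashMap<u64, SignedResponse>,
    order: VecDeque<u64>,
}

impl WidCache {
    pub fn new(cap: usize) -> Self {
        Self { cap, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns a clone of the cached response and marks the wid as recently used.
    pub fn get(&mut self, wid: u64) -> Option<SignedResponse> {
        let hit = self.entries.get(&wid).cloned()?;
        self.touch(wid);
        Some(hit)
    }

    /// Inserts a response, evicting the least recently used entry when over capacity.
    pub fn insert(&mut self, resp: SignedResponse) {
        let wid = resp.wid;
        if self.entries.insert(wid, resp).is_some() {
            self.touch(wid);
            return;
        }
        self.order.push_back(wid);
        if self.entries.len() > self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
    }

    fn touch(&mut self, wid: u64) {
        self.order.retain(|w| *w != wid);
        self.order.push_back(wid);
    }
}

/// Toy enclave state: a plain balance stands in for the rate-limiter bucket.
pub struct EnclaveState {
    pub bucket: u64,
    pub cache: WidCache,
}

/// Consult the cache first, then run the live path; cache only after the
/// (simulated) log commit succeeds.
pub fn standard_withdrawal(
    state: &mut EnclaveState,
    wid: u64,
    amount: u64,
    log_commit_ok: bool,
) -> Result<SignedResponse, String> {
    if let Some(hit) = state.cache.get(wid) {
        return Ok(hit); // retry of a known wid: no second debit
    }
    if amount > state.bucket {
        return Err("rate limited".to_string());
    }
    if !log_commit_ok {
        return Err("log commit failed".to_string()); // nothing debited, nothing cached
    }
    state.bucket -= amount;
    // Placeholder "signature"; the real path would sign with the enclave key.
    let resp = SignedResponse { wid, signature: wid.to_le_bytes().to_vec() };
    state.cache.insert(resp.clone());
    Ok(resp)
}
```

In this sketch a second call with the same wid returns the stored response and leaves `bucket` unchanged, while a call that fails before the commit leaves the cache empty so the next attempt takes the live path again.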
Deferred alongside #465. The wid-keyed response cache is a real correctness improvement: without it, cross-tick retries of the same withdrawal double-debit the bucket. That is a rate-limiter accuracy concern, not a safety one: BTC withdrawal remains exactly-once (enforced by Sui Move …).

For the MVP signet/devnet landing we are shipping #423 with both soft + hard guardian touchpoints and accepting the retry double-debit as a known limitation. This PR will be re-applied before we care about strict bucket accuracy (pre-scale / pre-mainnet).
Replaces the soft-reserve round-trip at Step 2 with a local `capacity_at(ts)` check, and moves the hard reserve to post-MPC via `validate_consume` → guardian `StandardWithdrawal` → verify Ed25519 response → `apply_consume`. Any rejection (seq mismatch, rate-limited, unavailable) snaps local state to the guardian and bails so the next leader tick retries cleanly. Serializes hard reserves to concurrency=1 when the guardian is configured so timestamps arrive monotonic; the baseline (no guardian) keeps the configured cap.

- Step 2 in `process_approved_withdrawal_request_batch` skips the iteration when `capacity_at(checkpoint_timestamp_ms/1000)` is below the aggregate external-out amount. No round-trip.
- Step 3 runs `finalize_withdrawal_through_guardian` after MPC: picks `seq` from `LocalLimiter::validate_consume`, fans out `SignGuardianWithdrawalRequest` BLS signatures to the committee (each validator re-fetches the txn from chain and reconstructs the same `StandardWithdrawalRequest` deterministically), forwards the signed request to the guardian, verifies the response envelope, then `apply_consume`.
- New BridgeService RPC `SignGuardianWithdrawalRequest` + `build_guardian_withdrawal_request` / `compute_withdrawal_wid` helpers in `withdrawals.rs`.
- Guardian side: `RateLimiter::consume` now takes a `wid` (unused for now, prepping the idempotency cache in #463).
- E2E test `test_bitcoin_withdrawal_with_guardian_e2e_flow` asserts `local_limiter().snapshot() == guardian.state.limiter_state()` and `next_seq == 1` after a successful withdrawal.

Follow-ups (known gaps):
- Wid-keyed idempotency cache on `consume` (#463): transient RPC failures currently double-debit on retry.
- Guardian restart safety / S3 rehydrate (#465).
- Step 2/Step 3 timestamp unification via a Move-side change.
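To make the local check and the validate/apply split concrete, here is a rough sketch of a limiter with that shape. Only the method names `capacity_at`, `validate_consume`, `apply_consume`, and `snapshot` and the `next_seq` counter come from the description above. The token-bucket model, the field names, the `snap_to` helper, and all signatures are assumptions made for illustration and may not match the real `LocalLimiter`.

```rust
/// Hypothetical snapshot of limiter state; field names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct LimiterState {
    pub tokens: u64,
    pub last_ts: u64, // seconds
    pub next_seq: u64,
}

/// Assumed token-bucket model: refills linearly up to `max_tokens`.
pub struct LocalLimiter {
    state: LimiterState,
    max_tokens: u64,
    refill_per_sec: u64,
}

impl LocalLimiter {
    pub fn new(max_tokens: u64, refill_per_sec: u64) -> Self {
        let state = LimiterState { tokens: max_tokens, last_ts: 0, next_seq: 0 };
        Self { state, max_tokens, refill_per_sec }
    }

    /// Capacity available at `ts`, computed locally with no round-trip.
    pub fn capacity_at(&self, ts: u64) -> u64 {
        let elapsed = ts.saturating_sub(self.state.last_ts);
        let refilled = elapsed.saturating_mul(self.refill_per_sec);
        self.state.tokens.saturating_add(refilled).min(self.max_tokens)
    }

    /// Checks the request without mutating state; returns the seq to send.
    pub fn validate_consume(&self, amount: u64, ts: u64) -> Result<u64, String> {
        if ts < self.state.last_ts {
            return Err("timestamp went backwards".to_string());
        }
        if self.capacity_at(ts) < amount {
            return Err("rate limited".to_string());
        }
        Ok(self.state.next_seq)
    }

    /// Applies the debit once the guardian has accepted the request.
    pub fn apply_consume(&mut self, amount: u64, ts: u64, seq: u64) -> Result<(), String> {
        if seq != self.state.next_seq {
            return Err("seq mismatch".to_string());
        }
        let available = self.capacity_at(ts);
        if available < amount {
            return Err("rate limited".to_string());
        }
        self.state = LimiterState { tokens: available - amount, last_ts: ts, next_seq: seq + 1 };
        Ok(())
    }

    /// On any rejection, adopt the guardian's view so the next tick retries cleanly.
    pub fn snap_to(&mut self, guardian: LimiterState) {
        self.state = guardian;
    }

    pub fn snapshot(&self) -> LimiterState {
        self.state
    }
}
```

Keeping `validate_consume` read-only means a failed guardian call leaves local state untouched, and `snap_to` then replaces it wholesale with whatever state the guardian reports.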
Summary
Retries of the same withdrawal (leader restart, leader rotation, lost response, or a seq-mismatch retry on the hashi side) previously debited the bucket once per retry, even though `wid` is deterministic and the actual on-chain withdrawal only happens once.

This PR adds an LRU cache of signed responses keyed by `wid` on `EnclaveState` (cap 1024). `standard_withdrawal` consults the cache before touching committee-sig verification, the limiter, or BTC signing; a hit returns the previously signed response untouched. Entries are inserted only after a successful S3 log commit, so the cache and bucket always agree.

Why wid
`wid` is already computed as `u64::from_le_bytes(blake2b256(bcs(request_ids))[..8])` deterministically on the hashi side. The guardian re-derives the same value from the on-chain request set, so a retry from any leader at any epoch maps to the same wid.

What this unlocks
… `SoftReserveWithdrawal`) can rely on wid idempotency to dedupe soft-reservation attempts across nodes.

Tests
- `test_standard_withdrawal_wid_cache_is_idempotent`: same wid twice returns an identical signed response; bucket reflects exactly one debit.
- `test_standard_withdrawal_failures_not_cached`: failed withdrawals are NOT cached; the next attempt runs the live path.

Follow-ups (not in this PR)