Skip to content

[5/n][guardian-integration] wire explicit call to guardian during withdrawal flow#423

Open
0xsiddharthks wants to merge 5 commits intosiddharth/guardian-mpc-signing-validationfrom
siddharth/guardian-integration
Open

[5/n][guardian-integration] wire explicit call to guardian during withdrawal flow#423
0xsiddharthks wants to merge 5 commits intosiddharth/guardian-mpc-signing-validationfrom
siddharth/guardian-integration

Conversation

@0xsiddharthks
Copy link
Copy Markdown
Contributor

@0xsiddharthks 0xsiddharthks commented Apr 9, 2026

Summary

  • During Withdrawal flow:

    • Transaction Creation / Coin Selection:
      • Leader Step 2 batches up to the local limiter's currently-available capacity
      • This short-circuits the batching window when demand exceeds capacity.
      • Single oversized requests stay queued (warn once).
    • After MPC quorum, the leader aggregates a BLS cert over a StandardWithdrawalRequest and forwards it to the guardian for the enclave signature.
    • MPC concurrency capped to 1 when guardian is configured (keeps timestamps monotonic).
  • No-guardian deployments behave identically to main.

@0xsiddharthks 0xsiddharthks requested a review from bmwill as a code owner April 9, 2026 16:55
@0xsiddharthks 0xsiddharthks marked this pull request as draft April 9, 2026 18:14
@0xsiddharthks 0xsiddharthks changed the title guardian: integrate rate limiting into withdrawal flow [2/n][guardian-integration] integrate rate limiting into withdrawal flow Apr 17, 2026
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from 89e582f to fb20cc6 Compare April 17, 2026 11:30
@0xsiddharthks 0xsiddharthks changed the base branch from main to siddharth/deploy-guardian April 17, 2026 11:31
0xsiddharthks added a commit that referenced this pull request Apr 17, 2026
The hard-reserve path added in #423/PR-2 only debits the bucket AFTER
MPC — which means a leader can spend minutes doing committee fan-out
+ MPC signing only to discover the guardian is out of capacity. Add a
pre-commit soft reserve so that over-limit withdrawals abort cheaply
and do not churn leader iterations.

Guardian side (`hashi-types` + `hashi-guardian`):
- Extend RateLimiter with pending_reserves: HashMap<wid, PendingReserve>,
  a soft_reserve method (idempotent on wid, monotonic timestamp,
  capacity = refill - sum(pending)), and expire_pending for TTL sweep
  (SOFT_RESERVE_TTL_SECS = 5 min).
- consume now takes wid as well and drops any matching pending entry so
  a hard reserve converts the soft reserve atomically.
- Add SoftReserveWithdrawal RPC and handler. Soft reserves do not
  require a committee signature — the TTL bounds DoS blast radius and
  wid idempotency handles retries.
- Spawn a 1-second TTL sweep task in main so expired reservations free
  capacity promptly.

Hashi side:
- Add GuardianClient::soft_reserve_withdrawal wrapper.
- In leader::process_approved_withdrawal_request_batch, probe the
  guardian immediately after coin selection (build_withdrawal_tx_
  commitment) and before the committee BLS fan-out for commit.
  Rate-limited / unavailable aborts the iteration; next leader tick
  retries with the same wid so the reservation is reused.
- Extract compute_withdrawal_wid helper so both touchpoints derive
  the same deterministic identifier from request_ids.

Move package: no changes.
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/deploy-guardian branch from 370b474 to 942504b Compare April 23, 2026 10:08
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from a4e0082 to 2be020f Compare April 23, 2026 10:12
@0xsiddharthks 0xsiddharthks changed the base branch from siddharth/deploy-guardian to siddharth/guardian-e2e-harness April 23, 2026 10:12
0xsiddharthks added a commit that referenced this pull request Apr 23, 2026
The hard-reserve path added in #423/PR-2 only debits the bucket AFTER
MPC — which means a leader can spend minutes doing committee fan-out
+ MPC signing only to discover the guardian is out of capacity. Add a
pre-commit soft reserve so that over-limit withdrawals abort cheaply
and do not churn leader iterations.

Guardian side (`hashi-types` + `hashi-guardian`):
- Extend RateLimiter with pending_reserves: HashMap<wid, PendingReserve>,
  a soft_reserve method (idempotent on wid, monotonic timestamp,
  capacity = refill - sum(pending)), and expire_pending for TTL sweep
  (SOFT_RESERVE_TTL_SECS = 5 min).
- consume now takes wid as well and drops any matching pending entry so
  a hard reserve converts the soft reserve atomically.
- Add SoftReserveWithdrawal RPC and handler. Soft reserves do not
  require a committee signature — the TTL bounds DoS blast radius and
  wid idempotency handles retries.
- Spawn a 1-second TTL sweep task in main so expired reservations free
  capacity promptly.

Hashi side:
- Add GuardianClient::soft_reserve_withdrawal wrapper.
- In leader::process_approved_withdrawal_request_batch, probe the
  guardian immediately after coin selection (build_withdrawal_tx_
  commitment) and before the committee BLS fan-out for commit.
  Rate-limited / unavailable aborts the iteration; next leader tick
  retries with the same wid so the reservation is reused.
- Extract compute_withdrawal_wid helper so both touchpoints derive
  the same deterministic identifier from request_ids.

Move package: no changes.
@0xsiddharthks 0xsiddharthks changed the title [2/n][guardian-integration] integrate rate limiting into withdrawal flow [3/n][guardian-integration] integrate rate limiting into withdrawal flow Apr 23, 2026
0xsiddharthks added a commit that referenced this pull request Apr 23, 2026
The hard-reserve path added in #423/PR-2 only debits the bucket AFTER
MPC — which means a leader can spend minutes doing committee fan-out
+ MPC signing only to discover the guardian is out of capacity. Add a
pre-commit soft reserve so that over-limit withdrawals abort cheaply
and do not churn leader iterations.

Guardian side (`hashi-types` + `hashi-guardian`):
- Extend RateLimiter with pending_reserves: HashMap<wid, PendingReserve>,
  a soft_reserve method (idempotent on wid, monotonic timestamp,
  capacity = refill - sum(pending)), and expire_pending for TTL sweep
  (SOFT_RESERVE_TTL_SECS = 5 min).
- consume now takes wid as well and drops any matching pending entry so
  a hard reserve converts the soft reserve atomically.
- Add SoftReserveWithdrawal RPC and handler. Soft reserves do not
  require a committee signature — the TTL bounds DoS blast radius and
  wid idempotency handles retries.
- Spawn a 1-second TTL sweep task in main so expired reservations free
  capacity promptly.

Hashi side:
- Add GuardianClient::soft_reserve_withdrawal wrapper.
- In leader::process_approved_withdrawal_request_batch, probe the
  guardian immediately after coin selection (build_withdrawal_tx_
  commitment) and before the committee BLS fan-out for commit.
  Rate-limited / unavailable aborts the iteration; next leader tick
  retries with the same wid so the reservation is reused.
- Extract compute_withdrawal_wid helper so both touchpoints derive
  the same deterministic identifier from request_ids.

Move package: no changes.
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-e2e-harness branch from c124d70 to 2fcf91f Compare April 23, 2026 11:36
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from 52b6e4c to c61f080 Compare April 23, 2026 11:36
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-e2e-harness branch from 2fcf91f to 56510fb Compare April 23, 2026 11:54
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch 4 times, most recently from 55d5d8d to e02a373 Compare April 23, 2026 14:07
@0xsiddharthks 0xsiddharthks changed the base branch from siddharth/guardian-e2e-harness to siddharth/guardian-local-limiter April 23, 2026 14:08
@0xsiddharthks 0xsiddharthks changed the title [3/n][guardian-integration] integrate rate limiting into withdrawal flow [guardian] wire rate limit into withdrawal flow Apr 23, 2026
@0xsiddharthks 0xsiddharthks marked this pull request as ready for review April 23, 2026 15:10
@0xsiddharthks 0xsiddharthks changed the title [guardian] wire rate limit into withdrawal flow [4/n][guardian-integration] wire rate limit into withdrawal flow Apr 23, 2026
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from e02a373 to 51e9e41 Compare April 23, 2026 18:49
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-local-limiter branch from 178b639 to b3a5c1b Compare April 26, 2026 21:22
0xsiddharthks added a commit that referenced this pull request Apr 27, 2026
- Remove the unused `network` field and `network()` getter on
  `GuardianHarness`. The enclave already owns the network internally
  and no caller reads `network()`; `#[allow(dead_code)]` on a `pub`
  API is a smell since `pub` already opts out of dead-code lint.
- Add a TODO on `test_bitcoin_withdrawal_with_guardian_e2e_flow`
  noting it should fold back into a parameterized test on
  `with_guardian: bool` once #423 wires guardian into the actual
  withdrawal path, so the two near-clone flows can't drift.
0xsiddharthks added a commit that referenced this pull request Apr 27, 2026
- Remove the unused `network` field and `network()` getter on
  `GuardianHarness`. The enclave already owns the network internally
  and no caller reads `network()`; `#[allow(dead_code)]` on a `pub`
  API is a smell since `pub` already opts out of dead-code lint.
- Add a TODO on `test_bitcoin_withdrawal_with_guardian_e2e_flow`
  noting it should fold back into a parameterized test on
  `with_guardian: bool` once #423 wires guardian into the actual
  withdrawal path, so the two near-clone flows can't drift.
@0xsiddharthks 0xsiddharthks marked this pull request as draft April 27, 2026 12:10
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-local-limiter branch from f2e3300 to f826f4b Compare April 27, 2026 21:34
@0xsiddharthks 0xsiddharthks changed the title [4/n][guardian-integration] wire rate limit into withdrawal flow [4/n][guardian-integration] wire guardian into withdrawal flow Apr 28, 2026
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-local-limiter branch from f826f4b to 7ef840d Compare April 28, 2026 14:26
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from b277025 to 0a2b584 Compare April 28, 2026 14:49
@0xsiddharthks 0xsiddharthks changed the title [4/n][guardian-integration] wire guardian into withdrawal flow [5/n][guardian-integration] wire guardian into withdrawal flow Apr 28, 2026
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from 0a2b584 to 8d1d85e Compare April 28, 2026 16:55
@0xsiddharthks 0xsiddharthks changed the base branch from siddharth/guardian-local-limiter to siddharth/guardian-mpc-signing-validation April 28, 2026 16:55
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-mpc-signing-validation branch from a71b41c to bfb9bdb Compare April 28, 2026 17:05
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from 8d1d85e to 0eb3af4 Compare April 28, 2026 17:08
@0xsiddharthks 0xsiddharthks marked this pull request as ready for review April 28, 2026 17:54
@0xsiddharthks 0xsiddharthks changed the title [5/n][guardian-integration] wire guardian into withdrawal flow [5/n][guardian-integration] wire explicit call to guardian during withdrawal flow Apr 28, 2026
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-mpc-signing-validation branch from 47314bc to f85d4ea Compare April 28, 2026 22:31
Aggregates a BLS cert over a `StandardWithdrawalRequest` after MPC
signing quorum and forwards it to the guardian for the enclave
signature. The local limiter has already been advanced by every
committee member at MPC signing time (see PR #495), so this just
records the consume with the authoritative guardian.

- Proto: `SignGuardianWithdrawalRequest` message + RPC
- bridge_service: `sign_guardian_withdrawal_request` handler
- guardian_client: `standard_withdrawal` RPC
- withdrawals: `validate_and_sign_guardian_withdrawal_request`,
  `build_guardian_withdrawal_request`, `compute_withdrawal_wid`
- leader: post-MPC `finalize_withdrawal_through_guardian` + BLS
  fan-out helpers; `max_concurrent=1` when guardian is configured
- guardian/limiter, hashi-guardian/{enclave,withdraw}: `wid` parameter
  for the guardian-side idempotency cache
- e2e: assert local_state == guardian_state after withdrawal
Pre-filter the approved-withdrawal batch to the longest timestamp-sorted
prefix whose cumulative `btc_amount` fits the local limiter's currently
available capacity, and add a third "process now" trigger when that
prefix is shorter than the full pending set. This short-circuits the
batching window when accumulated demand has already filled the bucket
and avoids broadcasting a Step 2 BLS round that #495's per-node MPC
validation would just reject.

A request larger than the currently-available capacity stays at the
head of the queue (waiting for refill if possible, or stuck forever if
it exceeds `max_bucket_capacity`). Warn once per request id so the
operator notices a stuck head; the warn set is pruned each checkpoint
to the still-pending ids.

`process_approved_withdrawal_requests` becomes async since
`LocalLimiter::capacity_at` holds a tokio mutex; the only call site is
already in the leader's async loop.

No-guardian deployments take the `else` branch and behave bit-for-bit
as before.
@0xsiddharthks 0xsiddharthks force-pushed the siddharth/guardian-integration branch from ff408c1 to 0a803c7 Compare April 28, 2026 22:33
After submit_sign_withdrawal succeeds, wait for the watcher to observe
the on-chain signatures before returning from process_unsigned_withdrawal_txn.
Until this returns, the txn ID stays in inflight_withdrawal_signings,
so the next checkpoint can't respawn the task and re-call the guardian
RPC with seq+1, which would double-advance the authoritative limiter.

Gated on guardian_client.is_some() so non-guardian deployments behave
as before.
Local-limiter bootstrap is event-driven and per-node, so without an
explicit wait the first MPC sign request can race with bootstrap on
slow CI: a follower whose limiter is configured but a leader whose
limiter isn't (or vice versa) bails on the validate_consume guard,
MPC times out, and the leader retries.

Add HashiNodeHandle::wait_for_local_limiter and call it after the
guardian harness finalizes.
Tech-lead review feedback on the two previous commits:

leader::wait_until_signed_visible (renamed from wait_for_signed_withdrawal_visible):
- Drop the guardian_client gate at the call site. The same respawn race
  produces a duplicate sign_withdrawal submission without guardian, just
  with a different failure mode (noisy sui_tx_submissions_total). One
  unconditional wait covers both.
- Take &Hashi instead of &Arc<Hashi>; the helper never clones.
- Use the existing OnchainState::withdrawal_txn(id) getter instead of
  cloning the whole withdrawal_txns map and scanning it.
- Use is_some_and for the visibility check.
- Drop the redundant is_err() branch on the watch wakeup; loop body
  re-checks visibility on every iteration regardless.

e2e: drop the ?; Ok(()) tail on wait_for_local_limiter and parallelize
the per-node bootstrap waits with try_join_all.

info!("Checking rate limits.");
let consumed_amount_sats = request.utxos().external_out_amount().to_sat();
let wid = *request.wid();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a u64 if the actual id is 32 bytes?

Comment on lines +179 to +180
// Timestamp in unix seconds (used for guardian rate limiting).
uint64 timestamp_secs = 2;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is expected to be a semi-recent timestamp from a checkpoint correct?

Ok(hashi_types::guardian::StandardWithdrawalRequest::new(
wid,
utxos,
timestamp_secs,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp needs to be validated that it is within some acceptable range from this node's view of time.

Comment on lines +1351 to +1357
/// Deterministic 64-bit id for a withdrawal — leading bytes of Blake2b256 over
/// the BCS-encoded request ids. Stable across restarts and leader rotations.
pub fn compute_withdrawal_wid(request_ids: &[Address]) -> u64 {
let bytes = bcs::to_bytes(&request_ids).expect("serialization should succeed");
let hash = Blake2b256::digest(&bytes);
u64::from_le_bytes(hash.digest[..8].try_into().unwrap())
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this, a Withdrawal already has a stable ID, its UID. Lets use that instead of coming up with a new id mechanism.


pub async fn consume_from_limiter(
&self,
wid: u64,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wid is not actually used inside consume, no? is this for future logging purposes?

@dlukegordon dlukegordon marked this pull request as draft May 1, 2026 18:27
@dlukegordon dlukegordon marked this pull request as ready for review May 1, 2026 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants