Subscribe to the chain-index sync loop's `NamedAtomicStatus` via `tokio::sync::watch` so test setup wakes on the first `Ready` transition instead of after the next 2 s polling tick. A single `sleep(sync_timings.interval)` after rendezvous preserves the load-bearing teardown alignment at ~500 ms (down from 2 s); that shim disappears when the indexer gains `CancellationToken`-based shutdown. `wait_for_indexer_tip` is rewired against the same watch with a `tokio::time::timeout` budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hanges

Add `BlockchainSource::wait_for_change` (default: a never-resolving future) and have the success-path interval sleep `select!` between the timer and that future. `MockchainSource` overrides `wait_for_change` with a `Notify` pinged from `mine_blocks`, so the chain-index sync loop wakes immediately on a mocked block instead of after the next 500 ms tick.

Per-block test latency drops from ~510 ms (one full `interval`) to ~10 ms (sync work only). `sync_blocks_after_startup` ~12 s → ~1–2 s, hitting #1039's "under 2 s" target. Real backends keep the default and pace on the timer unchanged.

`NonFinalizedState::initialize` now takes `source.clone()` so the outer `source` survives the inner async block's `.await` and is reachable in the new `select!` arm.

Known regression: `get_mempool_stream_for_stale_snapshot` flips from a probabilistic flake (#1037) to a deterministic failure. The chain-index sync loop now wakes immediately on `mine_blocks`, but the mempool serve loop still polls on its 100 ms cadence, so the chain-index/mempool tip-skew window is now always observable. The principled fix (per-subscriber notify + `select!` in the mempool loop) is the work tracked under #1037 and follows in a TDD'd PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion #1 Rename `get_mempool_stream_for_stale_snapshot` to `skew_chain_index_ahead_returns_stale_stream` and add a doc-comment identifying it as direction #1 of the #1037 tip-skew suite. The companion test for the mempool-ahead direction follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
new `mempool_skew` mod (1:1 with `tests/mempool_skew.rs`); add the companion direction-#2 test alongside.

- `mempool_skew::chain_index_ahead_returns_stale_stream` — pre-existing direction-#1 spec, deterministically failing post-Option-2 for the same reason it was a flake before (#1037).
- `mempool_skew::mempool_ahead_rejects_fresh_snapshot` — new direction-#2 spec. Drives the source via `mine_blocks_silent` (no chain-index notify) so the mempool's poll observes the new tip first, exposing the symmetric false-negative case where the API rejects its own freshly-issued snapshot.

Test-only infrastructure to enable the new test:

- `MockchainSource::mine_blocks_silent` advances the active height without firing the change-notify, the only way to put the chain-index *behind* the mempool now that `mine_blocks` wakes the sync loop.
- `MempoolSubscriber::mempool_tip` (cfg(test)) clones the `mempool_chain_tip` watch receiver; forwarder on `NodeBackedChainIndexSubscriber::mempool_tip`.
- `wait_for_mempool_tip_change` helper in `mempool_skew.rs` waits on that receiver, gating direction #2 to act exactly in the skew window.
- `mockchain_tests::wait_for_indexer_tip` widened to `pub(super)` so the sibling mod can reuse it.

Direction #2 is expected to fail until the principled fix lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `tips_converge_within_bounded_time_after_mining` — a 20-sample
property test that mines one block per iteration, awaits the
chain-index tip via the existing event-driven helper, then samples the
mempool's tracked tip immediately and counts iterations where the
mempool still has the pre-mining hash. Asserts `lag_count == 0`.
Today the chain-index sync loop wakes on `mine_blocks` while the
mempool serve loop polls on its own 100 ms cadence, so the mempool
lags in the vast majority of iterations and the test fails
deterministically. Once the principled fix wakes both subsystems on
`mine_blocks`, both converge within a single iteration and `lag_count`
is zero.
Test-only support:
- `MockchainSource::active_block_hash` returns the block hash at the
current active height — needed to compare the mempool's tracked tip
against the expected post-mining hash.
- `wait_for_mempool_tip` helper waits for a specific tip hash on the
mempool's watch (companion to the existing
`wait_for_mempool_tip_change`); used to resync between iterations so
the test isn't chasing stacked updates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…shness (#1037 part 1/2)

`NodeBackedChainIndexSubscriber::get_mempool_stream` previously delegated the snapshot-freshness check to `Mempool`, which compared the snapshot's tip against the mempool serve loop's `mempool_chain_tip`. Because the chain-index sync loop and the mempool serve loop poll the source on independent cadences, the two views diverge for a window after every block — and the API exposed the divergence as caller-visible internal contradictions.

Move the freshness check into the chain-index subscriber and compare against the chain-index's *current* non-finalized tip (synchronous `arc-swap` load on `non_finalized_state`, the same source of truth as `snapshot_nonfinalized_state`). Pass `None` to `MempoolSubscriber::get_mempool_stream` so the mempool no longer gates on tip — staleness is the chain-index's responsibility.

This flips the two skew-direction regression tests added under #1037:

- `mempool_skew::chain_index_ahead_returns_stale_stream` (direction #1): FAIL → PASS. After `mine_blocks` + `wait_for_indexer_tip`, the chain-index has the new tip while the mempool still has the old one. The pre-fix check matched the stale snapshot against the mempool's lagging view and accepted it; the new check matches against the chain-index's advanced view and correctly returns `None`.
- `mempool_skew::mempool_ahead_rejects_fresh_snapshot` (direction #2): FAIL → PASS. After `mine_blocks_silent` + `wait_for_mempool_tip_change`, the mempool has the new tip while the chain-index still has the old one. The pre-fix check rejected the freshly-issued snapshot because the mempool's tip had moved past it; the new check matches against the chain-index's authoritative (lagging) view and correctly returns `Some`.

`mempool_skew::tips_converge_within_bounded_time_after_mining` still fails — the mempool serve loop still polls on its own cadence, so it lags the chain-index in raw timing. That is the next piece of work and lands in part 2/2 (per-subscriber wake on source change via `tokio::sync::broadcast`).

The dead `Some(Err(MempoolError::IncorrectChainTip { .. }))` arm in `get_mempool_stream`'s outer match is left in place as defensive code — the mempool can no longer return that error from this call site, but the arm doesn't cost anything and keeps the match exhaustive against future `Mempool` surface changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
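The relocated freshness rule reduces to a synchronous comparison against the chain-index's own tip. A minimal sketch, with `BlockHash`, `snapshot_is_fresh`, and the string stream stand-in all hypothetical:

```rust
// Hypothetical tip type standing in for the real chain-index block hash.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct BlockHash(u64);

/// Post-fix rule: a snapshot is fresh iff its tip matches the chain-index's
/// *current* non-finalized tip (a synchronous load), regardless of where the
/// mempool serve loop's own tip currently sits.
fn snapshot_is_fresh(snapshot_tip: BlockHash, chain_index_tip: BlockHash) -> bool {
    snapshot_tip == chain_index_tip
}

/// Sketch of the call-site decision: `None` for a stale snapshot (caller must
/// re-snapshot), `Some(stream)` otherwise.
fn get_mempool_stream(
    snapshot_tip: BlockHash,
    chain_index_tip: BlockHash,
    mempool_tip: BlockHash, // no longer consulted for freshness
) -> Option<&'static str> {
    let _ = mempool_tip; // staleness is the chain-index's responsibility now
    snapshot_is_fresh(snapshot_tip, chain_index_tip).then_some("stream")
}

fn main() {
    let old = BlockHash(1);
    let new = BlockHash(2);
    // Direction #1: chain-index ahead -> stale snapshot correctly rejected.
    assert_eq!(get_mempool_stream(old, new, old), None);
    // Direction #2: mempool ahead -> freshly-issued snapshot still accepted.
    assert_eq!(get_mempool_stream(old, old, new), Some("stream"));
    println!("both skew directions handled");
}
```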
…art 2/2)
Both the chain-index sync loop and the mempool serve loop pace
themselves on a fixed-cadence timer; only the chain-index used to wake
early on source state change (Option-2 `Notify`). After a `mine_blocks`,
the chain-index would converge in the same iteration but the mempool
would not pick up the new tip until its 100 ms timer expired. That
asymmetric lag was the failure mode behind
`mempool_skew::tips_converge_within_bounded_time_after_mining`.
Replace `BlockchainSource::wait_for_change` (a single shared
never-resolves future) with `change_subscribe` returning
`Option<broadcast::Receiver<()>>`. Each subscriber gets its own
buffered receiver, so a `mine_blocks` that fires while one subsystem
is mid-iteration is preserved on its receiver and consumed on the
next `recv().await` — no missed-wake gap. Push-capable sources (the
mockchain) override to return `Some`; poll-only sources (real
validators) keep the `None` default and fall through to the timer.
Fix the actual convergence bug: the post-`send_replace` cool-down at
the top of the mempool serve loop's tip-change branch was a bare
`tokio::time::sleep(100ms)` not wired to any wake source. After
iter-0's mine event the mempool would land in this sleep and stay
deaf to the broadcast for the next iter-1 mine — so the test saw 19
/ 20 events lag (iter 0 the only exception, since the mempool
started in the broadcast-aware bottom `select!` after init). Both
the cool-down and the inter-iteration sleep now go through the
shared helper.
DRY the now-three call sites of the
`match change_rx { Some => select!(sleep, recv), None => sleep }`
pattern into `source::wait_or_source_change(change_rx.as_mut(),
duration)`. The helper is `pub(super)` — only the chain-index sync
loop and the mempool serve loop need it.
`mempool_skew::tips_converge_within_bounded_time_after_mining`:
FAIL (19/20 lags) → PASS (0 lags). The two direction tests
(`chain_index_ahead_returns_stale_stream`,
`mempool_ahead_rejects_fresh_snapshot`) keep passing — `mine_blocks_silent`
still suppresses the broadcast, and the mempool's 100 ms timer alone
is enough to put it ahead of the chain-index for direction #2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`escalates_to_critical_after_persistent_failure` open-coded the exact
shape of `super::poll::poll_until`: a `loop { sleep; check; assert
elapsed < deadline }` with a redundant post-loop deadline assert.
Replace it with a single `poll_until` call — same probe cadence
(`timings.initial_backoff`), same deadline (`timings.max_backoff_window()
* 5`), with the deadline panic now handled inside the helper.
Behaviour is preserved with one minor tightening: the original's
post-loop `elapsed < max_time_to_critical` assert had a flake window
of up to one `poll_interval` over budget (inner assert passed at
iter N-1, then sleep + status==CriticalError break at iter N could
push elapsed past the bound). `poll_until` collapses the two
deadline checks into one panic on `Instant::now() >= deadline`,
matching the test's stated intent.
Brings this test in line with `tip_converges_after_burst_mine` in
the same module — same helper, same shape. `Instant` and `sleep`
imports stay; both are still used by
`survives_transient_source_failure`, which is a soak test (verify
status stays NOT-CriticalError for 2 s under a transient failure)
and intentionally does not fit the `poll_until` shape.
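The `poll_until` shape being adopted can be sketched as follows. The real helper is async and takes the module's timing config; this std-only version with a blocking sleep and hypothetical parameter names just shows the single-panic-site deadline handling.

```rust
use std::time::{Duration, Instant};

/// Probe on a fixed cadence until `check` succeeds; panic once the deadline
/// passes. One deadline check, one panic site — no post-loop re-assert with
/// its one-interval flake window.
fn poll_until(mut check: impl FnMut() -> bool, interval: Duration, deadline: Duration) {
    let start = Instant::now();
    loop {
        if check() {
            return;
        }
        if start.elapsed() >= deadline {
            panic!("condition not met within {deadline:?}");
        }
        std::thread::sleep(interval);
    }
}

fn main() {
    let start = Instant::now();
    // Stand-in for "status == CriticalError": becomes true after ~30 ms.
    poll_until(
        || start.elapsed() >= Duration::from_millis(30),
        Duration::from_millis(10),
        Duration::from_millis(500),
    );
    println!("converged in {:?}", start.elapsed());
}
```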
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop initial_backoff 25→10 ms, max_backoff 800→50 ms, and max_consecutive_failures 10→5 so escalates_to_critical_after_persistent_failure drives its backoff schedule to completion in ~120 ms (4 sleeps: 10 + 20 + 40 + 50, with the cap clamping the would-be 80 ms doubling) instead of ~4 s. The compressed schedule still exercises the same three observable behaviours as default(): counter→threshold, exponential doubling, and max_backoff clamp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
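The compressed schedule's arithmetic can be checked with a small sketch. `sleeps_to_critical` is a hypothetical name; the assumption (consistent with the "4 sleeps" count above) is that sleeps happen between consecutive failures, so there is one fewer sleep than failures.

```rust
/// The backoff sleeps between `max_failures` consecutive failures: doubling
/// from `initial_ms`, clamped at `max_ms`, one fewer sleep than failures.
fn sleeps_to_critical(initial_ms: u64, max_ms: u64, max_failures: u32) -> Vec<u64> {
    let mut out = Vec::new();
    let mut d = initial_ms;
    for _ in 1..max_failures {
        out.push(d);
        d = (d * 2).min(max_ms); // exponential doubling with the max_backoff clamp
    }
    out
}

fn main() {
    // New config: 10 ms initial, 50 ms cap, 5 consecutive failures.
    let sleeps = sleeps_to_critical(10, 50, 5);
    assert_eq!(sleeps, vec![10, 20, 40, 50]); // the would-be 80 ms doubling clamps to 50
    assert_eq!(sleeps.iter().sum::<u64>(), 120); // ~120 ms to CriticalError
    // Old config: 25 ms initial, 800 ms cap, 10 failures -> ~4 s.
    assert_eq!(sleeps_to_critical(25, 800, 10).iter().sum::<u64>(), 3975);
    println!("{sleeps:?}");
}
```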
…10 ms
The 100 ms ProptestMockchain delay keeps fs.sync_to_height from
initializing non_finalized_state before the passthrough_test body
runs — an invariant the :109 assertion on StillSyncingFinalizedState
depends on. 100 ms was a conservative gut number: at that rate,
sync_to_height(139) takes ~28 s before initializing nfs, orders of
magnitude more than any test body needs.
Drop to 10 ms. sync_to_height(139) now takes ~2.8 s before nfs
initializes, still far longer than any test body (longest is
passthrough_get_block_range at ~2 s of body work). The invariant is
preserved with comfortable headroom.
Direct savings:
- passthrough_get_block_range: its 10-block serial stream goes from
10 × 100 ms = 1 s per task → 10 × 10 ms = 100 ms. ~900 ms off wall.
- All seven passthrough_* tests: Mempool::spawn's three serial source
probes (get_mempool_txids, get_best_block_hash,
get_mempool_transactions) drop from 3 × 100 ms to 3 × 10 ms.
~270 ms off each.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l::spawn

Mempool::spawn had three serial startup probes: (1) retry-loop on get_mempool_txids waiting for the validator mempool to come online, (2) one-shot get_best_block_hash (fatal on error), (3) retry-loop on get_mempool_transactions for initial contents. (1) and (2) share no data — (2) doesn't depend on mempool liveness, and (1) discards its result. Serial ordering added one full source round-trip to every NodeBackedChainIndex::new for no semantic reason.

Run (1) and (2) concurrently via tokio::try_join!. If (2) hard-errors, try_join! cancels (1)'s retry-loop and returns the Critical error — same overall outcome as before, just without the wait. The 3 s retry backoff on (1) is preserved; (2)'s fatal-on-error semantics are preserved.

Savings per indexer init:
- tests (mock source at 10 ms): ~10 ms
- prod LAN: a few ms
- prod WAN: 50–200 ms

Every passthrough_* test, make_chain, and any other test or code path that spins up a NodeBackedChainIndex benefits.

(3) still runs serially after the parallel gather — extracting it for concurrent execution is a separate refactor because its helper is wired through the partially-built Mempool struct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l::spawn gather
Follow-up to the previous Mempool::spawn refactor, which parallelized
the wait-for-online and best-block-hash probes but left the initial
mempool-snapshot fetch running serially after them. Fold it in so all
three source-dependent startup calls run concurrently.
To make it poolable with the other two, extracted
Mempool::get_mempool_transactions (private method on Mempool<T>) into a
free fn fetch_mempool_snapshot<T: BlockchainSource>(fetcher: &T) — the
method only ever touched self.fetcher, so there was no real reason it
had to be a method. The serve() call site now uses the free fn too.
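The extraction pattern can be sketched as below, with synchronous hypothetical signatures (the real calls are async and fallible): a method that only reads one field becomes a free fn over that field's type, usable before the owning struct exists.

```rust
// Hypothetical stand-ins for the source trait and snapshot payload.
trait BlockchainSource {
    fn get_mempool_transactions(&self) -> Vec<u64>;
}

/// Extracted free fn: the old private method only ever touched `self.fetcher`,
/// so taking the fetcher directly makes it poolable with the other startup
/// probes — no partially-built Mempool required.
fn fetch_mempool_snapshot<T: BlockchainSource>(fetcher: &T) -> Vec<u64> {
    fetcher.get_mempool_transactions()
}

struct MockSource;
impl BlockchainSource for MockSource {
    fn get_mempool_transactions(&self) -> Vec<u64> {
        vec![1, 2, 3]
    }
}

fn main() {
    // Callable both during spawn (before the struct is built) and from serve().
    let snapshot = fetch_mempool_snapshot(&MockSource);
    assert_eq!(snapshot, vec![1, 2, 3]);
    println!("{snapshot:?}");
}
```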
Two minor behavioural cleanups enabled by this:
- Initial Mempool status is now constructed as Ready instead of Spawning
(the snapshot has already been fetched by the time we build the
struct), matching what the old code did on first-iteration success.
- Dropped the `state.notify(...)` call that was inside the snapshot
retry loop — during spawn there are no subscribers attached yet, so
the notify was unreachable.
Savings per indexer init:
- tests (mock source at 10 ms): 30 ms serial → 10 ms parallel, ~20 ms.
- prod LAN: a few ms.
- prod WAN: 100–400 ms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds on #1055; merge that first.
Fixes: #1038