
[kyoto] rebuild on consecutive PotentialStaleTip warnings #491

Closed

0xsiddharthks wants to merge 1 commit into siddharth/kyoto-opportunistic-wins from siddharth/kyoto-stale-tip-rebuild

@0xsiddharthks (Contributor)

Stacked on top of #490.

Summary

  • Add a third trigger to the connectivity supervisor's rebuild path. Today KYOTO_MAX_CONSECUTIVE_FAILURES catches "we cannot reach any peer", but not the case where peers are alive (and not flagging connection errors) yet kyoto has heard no new tip for an extended window. bip157 surfaces this as Warning::PotentialStaleTip, which previously only bumped a metric.
  • New constant KYOTO_MAX_CONSECUTIVE_STALE_TIPS = 3. The counter is incremented on each PotentialStaleTip and reset on any chain progress (BlockHeaderChanges::Connected, BlockHeaderChanges::Reorganized, or Event::FiltersSynced).
  • KyotoEventLoopExit::ConnectivityLost now carries a KyotoRebuildReason enum (ConsecutiveConnectionFailures or ConsecutiveStaleTips). The supervisor's restart log includes the reason via tracing, so dashboards/alerts can distinguish the two trigger paths.
  • Add hashi_kyoto_consecutive_stale_tips gauge for observability.
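The counter semantics in the bullets above can be sketched as a small tracker. This is a minimal, self-contained sketch, not the code in this PR: `StaleTipTracker` and its methods are hypothetical names, and a static `AtomicU64` stands in for the real `hashi_kyoto_consecutive_stale_tips` gauge so the example needs no metrics crate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Threshold from this PR.
const KYOTO_MAX_CONSECUTIVE_STALE_TIPS: u64 = 3;

// Hypothetical stand-in for the hashi_kyoto_consecutive_stale_tips gauge;
// a real build would update a registered metrics gauge instead.
static CONSECUTIVE_STALE_TIPS_GAUGE: AtomicU64 = AtomicU64::new(0);

// Hypothetical helper that counts consecutive PotentialStaleTip warnings.
#[derive(Debug, Default)]
struct StaleTipTracker {
    consecutive: u64,
}

impl StaleTipTracker {
    /// Record one Warning::PotentialStaleTip. Returns true once the
    /// threshold is reached, meaning the event loop should exit for a rebuild.
    fn on_stale_tip(&mut self) -> bool {
        self.consecutive += 1;
        CONSECUTIVE_STALE_TIPS_GAUGE.store(self.consecutive, Ordering::Relaxed);
        self.consecutive >= KYOTO_MAX_CONSECUTIVE_STALE_TIPS
    }

    /// Record chain progress (Connected, Reorganized, or FiltersSynced):
    /// the tip is moving again, so the count and the gauge drop to zero.
    fn on_progress(&mut self) {
        self.consecutive = 0;
        CONSECUTIVE_STALE_TIPS_GAUGE.store(0, Ordering::Relaxed);
    }
}
```

Calling `on_stale_tip()` three times without an intervening `on_progress()` returns `true` on the third call; a single progress event in between resets both the count and the gauge.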

Test

  • cargo nextest run -p hashi (274 passed)
  • make fmt && make clippy
  • Manual injection plan post-deploy: stop the bitcoind feeding kyoto headers and confirm kyoto_consecutive_stale_tips climbs to 3, then kyoto_restarts increments with the new consecutive_stale_tips reason in the log.

Follow-up

  • After this lands, sample the distribution of kyoto_consecutive_stale_tips and tune the threshold if 3 proves too eager or too patient for production cadence.

Add a third trigger for the connectivity supervisor's rebuild path.
Today KYOTO_MAX_CONSECUTIVE_FAILURES catches "we cannot reach any
peer", but it does not catch the case where peers are alive and not
reporting connection errors yet kyoto has heard no new tip for an
extended window. bip157 surfaces this as Warning::PotentialStaleTip.

  - New constant KYOTO_MAX_CONSECUTIVE_STALE_TIPS = 3.
  - Track consecutive PotentialStaleTip warnings in run_event_loop.
    Reset on any chain progress event (BlockHeaderChanges::Connected,
    Reorganized, or Event::FiltersSynced).
  - When the count reaches the threshold, return ConnectivityLost
    with a typed reason so the supervisor's log distinguishes
    failure-driven rebuilds from stale-tip-driven ones.
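A rough sketch of how `run_event_loop` could track both counters and return the typed exit reason. Assumptions: `NodeEvent` is a hypothetical, flattened stand-in for the bip157 event and warning types; the struct-variant shape of `ConnectivityLost` is assumed; `EventsExhausted` is a hypothetical variant for a stream that ends without a trigger; and the failure threshold of 5 is a placeholder, since this PR does not state the value of `KYOTO_MAX_CONSECUTIVE_FAILURES`.

```rust
// Threshold from this PR.
const KYOTO_MAX_CONSECUTIVE_STALE_TIPS: u32 = 3;
// Placeholder: the real KYOTO_MAX_CONSECUTIVE_FAILURES value is not stated in this PR.
const KYOTO_MAX_CONSECUTIVE_FAILURES: u32 = 5;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KyotoRebuildReason {
    ConsecutiveConnectionFailures,
    ConsecutiveStaleTips,
}

#[derive(Debug, PartialEq, Eq)]
enum KyotoEventLoopExit {
    // Assumed struct-variant shape carrying the typed reason.
    ConnectivityLost { reason: KyotoRebuildReason },
    // Hypothetical: the event stream ended without hitting either threshold.
    EventsExhausted,
}

// Hypothetical, flattened stand-in for the bip157 events the loop matches on.
#[derive(Debug, Clone, Copy)]
enum NodeEvent {
    HeaderConnected,   // BlockHeaderChanges::Connected
    Reorganized,       // BlockHeaderChanges::Reorganized
    FiltersSynced,     // Event::FiltersSynced
    PotentialStaleTip, // Warning::PotentialStaleTip
    ConnectionFailed,  // a connection-error warning
}

fn run_event_loop<I: IntoIterator<Item = NodeEvent>>(events: I) -> KyotoEventLoopExit {
    let mut consecutive_failures = 0u32;
    let mut consecutive_stale_tips = 0u32;
    for event in events {
        match event {
            // Any chain progress proves the tip is moving, so the stale count resets.
            NodeEvent::HeaderConnected | NodeEvent::Reorganized | NodeEvent::FiltersSynced => {
                consecutive_stale_tips = 0;
            }
            NodeEvent::PotentialStaleTip => {
                consecutive_stale_tips += 1;
                if consecutive_stale_tips >= KYOTO_MAX_CONSECUTIVE_STALE_TIPS {
                    return KyotoEventLoopExit::ConnectivityLost {
                        reason: KyotoRebuildReason::ConsecutiveStaleTips,
                    };
                }
            }
            // This sketch only counts failures up; their reset rules are out of scope.
            NodeEvent::ConnectionFailed => {
                consecutive_failures += 1;
                if consecutive_failures >= KYOTO_MAX_CONSECUTIVE_FAILURES {
                    return KyotoEventLoopExit::ConnectivityLost {
                        reason: KyotoRebuildReason::ConsecutiveConnectionFailures,
                    };
                }
            }
        }
    }
    KyotoEventLoopExit::EventsExhausted
}
```

Keeping the two counters independent means a run of stale-tip warnings still triggers a rebuild even when no connection errors are reported, which is the gap this trigger targets.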

KyotoEventLoopExit::ConnectivityLost now carries a KyotoRebuildReason
enum (ConsecutiveConnectionFailures or ConsecutiveStaleTips). The
supervisor's restart log includes the reason field, and both the rpc
and stale-tip counters are reset on rebuild.
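On the supervisor side, the typed reason can be turned into a stable log label and the counters reset on rebuild. A minimal sketch under stated assumptions: `Supervisor`, `LoopCounters`, and `handle_exit` are hypothetical; a `Vec<String>` stands in for `tracing` output and `restarts` for the `kyoto_restarts` metric; "rpc counter" is read as consecutive RPC/connection failures; and the `consecutive_connection_failures` label is assumed by analogy with the `consecutive_stale_tips` reason named in the test plan.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KyotoRebuildReason {
    ConsecutiveConnectionFailures,
    ConsecutiveStaleTips,
}

impl KyotoRebuildReason {
    // snake_case labels so dashboards and alerts can match on a stable string.
    fn as_str(self) -> &'static str {
        match self {
            KyotoRebuildReason::ConsecutiveConnectionFailures => "consecutive_connection_failures",
            KyotoRebuildReason::ConsecutiveStaleTips => "consecutive_stale_tips",
        }
    }
}

impl fmt::Display for KyotoRebuildReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

#[derive(Debug)]
enum KyotoEventLoopExit {
    ConnectivityLost { reason: KyotoRebuildReason },
    Shutdown, // hypothetical clean-exit variant
}

// Hypothetical per-node counters that start fresh for each rebuilt node.
#[derive(Debug, Default)]
struct LoopCounters {
    consecutive_rpc_failures: u32,
    consecutive_stale_tips: u32,
}

#[derive(Debug, Default)]
struct Supervisor {
    restarts: u64,          // stand-in for the kyoto_restarts metric
    counters: LoopCounters,
    log: Vec<String>,       // stand-in for tracing output
}

impl Supervisor {
    /// Returns true when the node should be rebuilt and the loop restarted.
    fn handle_exit(&mut self, exit: KyotoEventLoopExit) -> bool {
        match exit {
            KyotoEventLoopExit::ConnectivityLost { reason } => {
                self.restarts += 1;
                // With `tracing` this might be `warn!(%reason, "restarting kyoto")`.
                self.log.push(format!("restarting kyoto reason={reason}"));
                // Reset both counters so the rebuilt node starts from zero.
                self.counters = LoopCounters::default();
                true
            }
            KyotoEventLoopExit::Shutdown => false,
        }
    }
}
```

Emitting the reason as a fixed snake_case string rather than a free-form message keeps alert rules simple: one filter per trigger path.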

Adds the gauge hashi_kyoto_consecutive_stale_tips for observability.
0xsiddharthks requested a review from bmwill as a code owner on April 26, 2026, 21:56.
0xsiddharthks marked this pull request as draft on April 26, 2026, 21:58.
0xsiddharthks reopened this on April 27, 2026.