[kyoto] rebuild on consecutive PotentialStaleTip warnings #491
Closed
0xsiddharthks wants to merge 1 commit into siddharth/kyoto-opportunistic-wins from
Add a third trigger for the connectivity supervisor's rebuild path.
Today KYOTO_MAX_CONSECUTIVE_FAILURES catches "we cannot reach any
peer", but it does not catch the case where peers are alive and not
reporting connection errors yet kyoto has heard no new tip for an
extended window. bip157 surfaces this as Warning::PotentialStaleTip.
- New constant KYOTO_MAX_CONSECUTIVE_STALE_TIPS = 3.
- Track consecutive PotentialStaleTip warnings in run_event_loop.
Reset on any chain progress event (BlockHeaderChanges::Connected,
Reorganized, or Event::FiltersSynced).
- When the count reaches the threshold, return ConnectivityLost
with a typed reason so the supervisor's log distinguishes
failure-driven rebuilds from stale-tip-driven ones.
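The counting rule in the list above can be sketched as a small state machine. This is a minimal illustration rather than the PR's code: `LoopEvent` and `StaleTipTracker` are hypothetical stand-ins for the bip157 warning and event types that run_event_loop actually matches on, and only the threshold value and the reset conditions come from the description.

```rust
/// Threshold named in the PR description.
const KYOTO_MAX_CONSECUTIVE_STALE_TIPS: u32 = 3;

/// Hypothetical, simplified stand-in for the warnings and events the
/// event loop observes. The real loop matches on bip157 types instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoopEvent {
    /// Stands in for Warning::PotentialStaleTip.
    PotentialStaleTip,
    /// Stands in for BlockHeaderChanges::Connected.
    HeaderConnected,
    /// Stands in for BlockHeaderChanges::Reorganized.
    Reorganized,
    /// Stands in for Event::FiltersSynced.
    FiltersSynced,
    /// Anything else; leaves the counter untouched.
    Other,
}

/// Hypothetical helper that tracks consecutive stale-tip warnings.
#[derive(Debug, Default)]
struct StaleTipTracker {
    consecutive: u32,
}

impl StaleTipTracker {
    /// Feeds one event into the tracker. Returns true once the number of
    /// consecutive stale-tip warnings reaches the threshold, meaning the
    /// caller should request a rebuild.
    fn observe(&mut self, event: LoopEvent) -> bool {
        match event {
            LoopEvent::PotentialStaleTip => {
                self.consecutive += 1;
                self.consecutive >= KYOTO_MAX_CONSECUTIVE_STALE_TIPS
            }
            // Any chain progress resets the streak.
            LoopEvent::HeaderConnected
            | LoopEvent::Reorganized
            | LoopEvent::FiltersSynced => {
                self.consecutive = 0;
                false
            }
            LoopEvent::Other => false,
        }
    }
}
```

In this sketch an unrelated event does not break the streak; only chain progress does. Whether the real loop treats other warnings the same way is not stated in the description.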
KyotoEventLoopExit::ConnectivityLost now carries a KyotoRebuildReason
enum (ConsecutiveConnectionFailures or ConsecutiveStaleTips). The
supervisor's restart log includes the reason field, and rpc and stale
counters are both reset on rebuild.
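A rough sketch of how a typed reason might flow from the event loop to the supervisor follows. The enum and variant names are taken from the description; everything else is assumed for illustration: the `Shutdown` variant, the `Counters` struct, the `handle_exit` function, the log line format, and the `consecutive_connection_failures` label are hypothetical, and the real code logs through tracing rather than returning a string.

```rust
use std::fmt;

/// Why the event loop asked the supervisor for a rebuild.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KyotoRebuildReason {
    ConsecutiveConnectionFailures,
    ConsecutiveStaleTips,
}

impl fmt::Display for KyotoRebuildReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Labels are assumptions, except that the description mentions a
        // consecutive_stale_tips reason appearing in the log.
        let label = match self {
            Self::ConsecutiveConnectionFailures => "consecutive_connection_failures",
            Self::ConsecutiveStaleTips => "consecutive_stale_tips",
        };
        f.write_str(label)
    }
}

/// Simplified exit type. Shutdown is a hypothetical second variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KyotoEventLoopExit {
    ConnectivityLost(KyotoRebuildReason),
    Shutdown,
}

/// Hypothetical supervisor-side counters.
#[derive(Debug, Default)]
struct Counters {
    consecutive_failures: u32,
    consecutive_stale_tips: u32,
    restarts: u32,
}

/// Handles one event-loop exit. On ConnectivityLost it resets both
/// counters, bumps the restart count, and returns the line a supervisor
/// might log. On Shutdown it returns None and changes nothing.
fn handle_exit(exit: KyotoEventLoopExit, counters: &mut Counters) -> Option<String> {
    match exit {
        KyotoEventLoopExit::ConnectivityLost(reason) => {
            counters.consecutive_failures = 0;
            counters.consecutive_stale_tips = 0;
            counters.restarts += 1;
            Some(format!("restarting kyoto: reason={reason}"))
        }
        KyotoEventLoopExit::Shutdown => None,
    }
}
```

Carrying the reason in the exit value keeps the decision in the event loop and the reporting in the supervisor, so both trigger paths share one rebuild routine.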
Adds the gauge hashi_kyoto_consecutive_stale_tips for observability.
Summary
KYOTO_MAX_CONSECUTIVE_FAILURES catches "we cannot reach any peer", but it does not catch the case where peers are alive (and not flagging connection errors) yet kyoto has heard no new tip for a while. bip157 surfaces this as Warning::PotentialStaleTip, which previously only bumped a metric.
- KYOTO_MAX_CONSECUTIVE_STALE_TIPS = 3. Counter is incremented on each PotentialStaleTip and reset on any chain progress (BlockHeaderChanges::Connected, Reorganized, Event::FiltersSynced).
- KyotoEventLoopExit::ConnectivityLost now carries a KyotoRebuildReason enum (ConsecutiveConnectionFailures or ConsecutiveStaleTips). The supervisor's restart log includes the reason via tracing, so dashboards/alerts can distinguish the two trigger paths.
- hashi_kyoto_consecutive_stale_tips gauge for observability.
Test
- cargo nextest run -p hashi (274 passed)
- make fmt && make clippy
- kyoto_consecutive_stale_tips climbs to 3, then kyoto_restarts increments with the new consecutive_stale_tips reason in the log.
Follow-up
- kyoto_consecutive_stale_tips distribution; tune the threshold if 3 is too eager or too patient for production cadence.