
[kyoto] expose chain_tip and peer_info, add peer-flag metrics #490

Closed
0xsiddharthks wants to merge 1 commit into siddharth/kyoto-supervisor-calibration from siddharth/kyoto-opportunistic-wins




Stacked on top of #489.

Summary

  • Add MonitorClient::chain_tip() — synchronous query of kyoto's current tip, independent of the cached watch channel. Useful for healthz endpoints that need to distinguish "no new blocks for a while" from "kyoto is wedged".
  • Add MonitorClient::peer_info() — live snapshot of (AddrV2, ServiceFlags) for every connected peer. Lets us diagnose "we're connected but filter sync isn't progressing" cases without scraping logs.
  • Add two prometheus gauges populated by a 30s background poller in the event loop:
    • hashi_kyoto_peers_with_compact_filters — peers advertising NODE_COMPACT_FILTERS. If this drops to 0, we cannot make filter-sync progress, even when overall connection counts look healthy.
    • hashi_kyoto_peers_v2 — peers using BIP-324 (P2P V2) transport.
  • Both new client calls and the poller go through the existing rpc_workers JoinSet with a 5s timeout, so a wedged kyoto node cannot stall a worker. On timeout, or while a rebuild is in progress, the metrics keep their previous values until the next tick. The poller's interval uses MissedTickBehavior::Skip to avoid bursts of catch-up polls.
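To make the healthz distinction concrete, here is a minimal sketch of how an endpoint might combine a direct tip query with the age of the last tip change. `TipProbe`, `Health`, `classify` and the `stale_after` threshold are hypothetical names for illustration, not part of hashi or kyoto; the probe result is assumed to come from `MonitorClient::chain_tip()` wrapped in the 5s timeout.

```rust
use std::time::Duration;

/// Result of asking kyoto directly for its tip (hypothetical type). In practice
/// this would come from `MonitorClient::chain_tip()` under the 5s timeout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TipProbe {
    /// kyoto answered with its current tip height.
    Answered { height: u32 },
    /// kyoto did not answer before the timeout.
    TimedOut,
}

/// What a healthz endpoint could report (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    /// kyoto answered and the tip changed recently.
    Ok { height: u32 },
    /// kyoto answered, but the tip has not changed for at least `stale_after`:
    /// the node is alive, there are simply no new blocks.
    NoNewBlocks { height: u32 },
    /// kyoto did not answer the direct query, so it may be wedged.
    Wedged,
}

/// Combine a direct tip probe with the time since the tip last changed.
/// A stale cached tip alone cannot separate the last two cases; the probe can.
pub fn classify(probe: TipProbe, since_last_change: Duration, stale_after: Duration) -> Health {
    match probe {
        TipProbe::TimedOut => Health::Wedged,
        TipProbe::Answered { height } if since_last_change >= stale_after => {
            Health::NoNewBlocks { height }
        }
        TipProbe::Answered { height } => Health::Ok { height },
    }
}
```

The key design point is that a timeout wins regardless of how old the tip is: an unanswered probe means the node itself is unresponsive, whereas an answered probe with an old tip just means the chain (or our peers) are quiet.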

Test

  • cargo nextest run -p hashi (274 passed)
  • make fmt && make clippy

Follow-up

  • siddharth/kyoto-stale-tip-rebuild (next): use Warning::PotentialStaleTip as a third trigger for the connectivity-supervisor rebuild (today it only bumps a metric). Catches the "connected but stuck" case that KYOTO_MAX_CONSECUTIVE_FAILURES does not.

Add three diagnostic surfaces using APIs that landed in bip157 v0.5.0:

  - MonitorClient::chain_tip()  -> synchronous query of kyoto's tip,
    independent of the cached watch channel. Useful for healthz
    endpoints that want to confirm the kyoto node is alive (a stale
    cache cannot tell the difference between "no new blocks for a
    while" and "kyoto is wedged").

  - MonitorClient::peer_info()  -> live snapshot of (AddrV2,
    ServiceFlags) for every connected peer. Exposes which peers are
    advertising NODE_COMPACT_FILTERS and NODE_P2P_V2 — needed to
    diagnose "we're connected but filter sync isn't progressing"
    cases.

  - Two prometheus gauges populated by a 30s background poller:
      * hashi_kyoto_peers_with_compact_filters
      * hashi_kyoto_peers_v2
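A sketch of how one poll could turn a `peer_info()` snapshot into the two gauge values. To stay dependency-free it uses raw service bits instead of the `ServiceFlags` type and a plain string in place of `AddrV2`; `count_peer_flags` and `PeerFlagCounts` are hypothetical names. The bit values are the ones defined by BIP 157 (`NODE_COMPACT_FILTERS`, bit 6) and BIP 324 (`NODE_P2P_V2`, bit 11). Like the text above, it counts peers advertising each bit.

```rust
/// BIP 157 service bit: the peer serves compact block filters.
pub const NODE_COMPACT_FILTERS: u64 = 1 << 6;
/// BIP 324 service bit: the peer supports the v2 encrypted transport.
pub const NODE_P2P_V2: u64 = 1 << 11;

/// Values the two gauges would be set to after one poll (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct PeerFlagCounts {
    pub with_compact_filters: usize,
    pub v2: usize,
}

/// True if every bit in `flag` is set in `services`.
fn has_flag(services: u64, flag: u64) -> bool {
    (services & flag) == flag
}

/// Count peers advertising each bit. Each entry stands in for one
/// `(AddrV2, ServiceFlags)` pair from `peer_info()`: a printable address and
/// the raw service bits.
pub fn count_peer_flags(peers: &[(&str, u64)]) -> PeerFlagCounts {
    PeerFlagCounts {
        with_compact_filters: peers
            .iter()
            .filter(|(_, services)| has_flag(*services, NODE_COMPACT_FILTERS))
            .count(),
        v2: peers
            .iter()
            .filter(|(_, services)| has_flag(*services, NODE_P2P_V2))
            .count(),
    }
}
```

For example, three peers where only two set bit 6 yield `with_compact_filters: 2`; if that value reaches 0, filter sync cannot progress no matter how many peers are connected.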

Both new client calls and the poller go through the existing
rpc_workers JoinSet and use a 5s timeout, so a wedged kyoto node
cannot stall a worker forever; on timeout, or while a rebuild is in
progress, the metrics keep their previous values until the next tick.
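The timeout-and-keep-previous-value behaviour can be sketched with the standard library alone. This swaps tokio's `JoinSet` and timeout for a worker thread plus `mpsc::Receiver::recv_timeout`, and uses a hypothetical `Gauge` type instead of a prometheus gauge; `query_with_timeout` and `poll_tick` are illustrative names, and only the timeout path is modelled.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a prometheus gauge: it only changes when set.
#[derive(Debug, Default)]
pub struct Gauge {
    value: i64,
}

impl Gauge {
    pub fn set(&mut self, v: i64) {
        self.value = v;
    }
    pub fn get(&self) -> i64 {
        self.value
    }
}

/// Run `query` on a worker thread and wait at most `timeout` for the answer.
/// Returns None on timeout or if the worker dropped its sender (e.g. panicked).
pub fn query_with_timeout<T, F>(query: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: the receiver is gone if we already timed out.
        let _ = tx.send(query());
    });
    rx.recv_timeout(timeout).ok()
}

/// One poller tick: update both gauges only if the query answered in time.
pub fn poll_tick<F>(query: F, timeout: Duration, with_cf: &mut Gauge, v2: &mut Gauge)
where
    F: FnOnce() -> (i64, i64) + Send + 'static,
{
    if let Some((cf_count, v2_count)) = query_with_timeout(query, timeout) {
        with_cf.set(cf_count);
        v2.set(v2_count);
    }
    // On timeout the gauges keep their previous values until the next tick.
}
```

One difference from the tokio setup: a std thread cannot be cancelled, so a timed-out query in this sketch keeps running in the background, while the caller (the stand-in for the worker) is freed immediately.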

The poller arm uses MissedTickBehavior::Skip so a long-running select
arm doesn't trigger a burst of catch-up polls.
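The difference between `MissedTickBehavior::Skip` and tokio's default `Burst` can be modelled as a next-deadline rule. This is a simplified model of the documented behaviour (Skip resumes on the next multiple of the period from the original schedule), in milliseconds and without tokio; `next_deadline_burst`, `next_deadline_skip` and `immediate_ticks` are hypothetical names.

```rust
/// Burst (tokio's default): the next deadline is just the previous one plus
/// the period, so every missed deadline still fires until the schedule catches up.
pub fn next_deadline_burst(missed: u64, _now: u64, period: u64) -> u64 {
    missed + period
}

/// Skip: jump to the first deadline on the original schedule that lies after
/// `now`. Requires `now >= missed`.
pub fn next_deadline_skip(missed: u64, now: u64, period: u64) -> u64 {
    now + period - (now - missed) % period
}

/// How many ticks fire back to back at `now` when the loop was blocked past
/// the `missed` deadline, under the given next-deadline rule.
pub fn immediate_ticks(missed: u64, now: u64, period: u64, next: fn(u64, u64, u64) -> u64) -> u32 {
    let mut count = 1; // the missed tick itself fires immediately
    let mut deadline = next(missed, now, period);
    while deadline <= now {
        count += 1;
        deadline = next(deadline, now, period);
    }
    count
}
```

With a 30 000 ms period, a select arm that blocks from the 30 000 ms deadline until 95 000 ms yields three back-to-back polls under Burst, but a single poll under Skip, with the next one at 120 000 ms.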
