fix: reduce post-reallocation receipt rejection window during network subgraph polling gap by cargopete · Pull Request #973 · graphprotocol/indexer-rs

cargopete · 2026-03-13T10:21:42Z

Context

This PR addresses a partial contributor to the post-reallocation query outage reported by Ellipfra and others. The full outage lasts around 60 minutes and is primarily caused by gateway-side network subgraph lag: the gateway keeps issuing receipts with stale allocation IDs for the duration of its own propagation delay. That part cannot be addressed from indexer-rs.

However, there is a shorter initial window (~0–5 minutes) where indexer-service itself contributes errors due to two gaps in how recently-closed allocations are handled at the service layer. This PR attempts to fix those two gaps.

What indexer-service-rs is doing wrong

Indexer logs show:

Receipt allocation ID `0xblahblah...` is not eligible for this indexer

This error fires in the minutes immediately following a reallocation, then disappears - while gateway errors persist for the full 60 minutes. The indexer-service errors are caused by two issues:

Issue 1 — Attestation signer evicted too eagerly (`crates/monitor/src/attestation.rs`)

modify_signers unconditionally drops any signer not present in the current allocations map:

signers.retain(|id, _| allocations.contains_key(id));

Between the on-chain closure and the monitor's next successful poll, the signer is gone. If a receipt arrives during this window, the query is served but attestation_middleware returns attestation: null, producing BadResponse(bad attestation: ...) at the gateway.

Issue 2 — Receipt eligibility hard-rejects with no grace period (`crates/service/src/tap/checks/allocation_eligible.rs`)

AllocationEligible::check does a hard lookup against the same watch channel. If the allocation is transiently absent during the polling gap, the receipt is immediately rejected - no query is processed, no attestation is returned.

The existing recently_closed_allocation_buffer_secs config (default: 3600s) was designed to prevent exactly this, but it is only applied to the network subgraph query. It is never threaded to the signer eviction logic or the eligibility check at the service layer.

Fix

Fix 1 adds an evicted_at: HashMap<Address, Instant> tombstone map to modify_signers. Signers are kept alive for grace_period after first eviction rather than being dropped immediately.
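As a rough illustration of the tombstone idea, here is a minimal sketch, not the code in this PR. `Address`, `Allocation`, `Signer`, and the function name `retain_with_grace` are stand-ins invented for the example; only `evicted_at` and `grace_period` come from the description above. The current time is passed in as `now` purely to keep the sketch deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Stand-in types for illustration only; the real crate has its own
// address, allocation, and signer types.
type Address = [u8; 20];
struct Allocation;
struct Signer;

/// Keeps a signer for `grace_period` after its allocation first goes
/// missing from the snapshot, instead of dropping it immediately.
fn retain_with_grace(
    signers: &mut HashMap<Address, Signer>,
    evicted_at: &mut HashMap<Address, Instant>,
    allocations: &HashMap<Address, Allocation>,
    grace_period: Duration,
    now: Instant,
) {
    // An allocation that is present (again) needs no tombstone.
    evicted_at.retain(|id, _| !allocations.contains_key(id));

    signers.retain(|id, _| {
        if allocations.contains_key(id) {
            return true;
        }
        // Record the first time this signer's allocation went missing.
        let first_missing = *evicted_at.entry(*id).or_insert(now);
        now.duration_since(first_missing) < grace_period
    });

    // Drop tombstones whose signer has just been evicted.
    evicted_at.retain(|id, _| signers.contains_key(id));
}
```

With a 3600s grace period, a signer whose allocation vanishes at time t stays available until t + 3600s and is dropped on the first call at or after that point.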

Fix 2 adds a recently_seen: HashMap<Address, Instant> local cache to AllocationEligible. Before hard-rejecting, the check consults this cache. If the allocation was confirmed eligible within grace_period, the receipt is accepted.
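The eligibility fallback can be sketched along these lines. This is an assumption-laden illustration rather than the PR's implementation: `EligibilityCache`, `is_eligible`, and the `HashSet` snapshot are hypothetical stand-ins for `AllocationEligible` and its watch channel, and `Address` is a placeholder type. Only `recently_seen` and `grace_period` are taken from the description.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

// Placeholder address type for illustration only.
type Address = [u8; 20];

/// Hypothetical stand-in for the eligibility check's local state.
struct EligibilityCache {
    recently_seen: HashMap<Address, Instant>,
    grace_period: Duration,
}

impl EligibilityCache {
    fn new(grace_period: Duration) -> Self {
        Self { recently_seen: HashMap::new(), grace_period }
    }

    /// Returns true if a receipt for `id` should be accepted at `now`.
    /// `eligible` stands in for the latest allocation snapshot.
    fn is_eligible(&mut self, id: Address, eligible: &HashSet<Address>, now: Instant) -> bool {
        if eligible.contains(&id) {
            // Confirmed eligible: refresh the timestamp.
            self.recently_seen.insert(id, now);
            return true;
        }
        // Absent from the snapshot: accept only if it was confirmed
        // eligible less than `grace_period` ago.
        let within_grace = self
            .recently_seen
            .get(&id)
            .map_or(false, |seen| now.duration_since(*seen) < self.grace_period);
        if !within_grace {
            // Expired or never seen: forget any entry and hard-reject.
            self.recently_seen.remove(&id);
        }
        within_grace
    }
}
```

An allocation that was never confirmed eligible is still rejected immediately, so the cache only widens acceptance for allocations the service has already seen.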

Both fixes source grace_period from recently_closed_allocation_buffer_secs, so no new config surface is added. Behaviour beyond the grace period is identical to current behaviour.

What this does and doesn't fix

| Window | Before this PR | After this PR |
| --- | --- | --- |
| Minutes 0–5 (monitor polling gap) | Indexer-service rejects receipts and drops signers | Receipts accepted, responses correctly attested |
| Minutes 5–60 (gateway propagation lag) | Gateway errors persist | Gateway errors persist - not fixed here |

The 55-minute tail of the outage requires the gateway to either sync its network subgraph faster or stop issuing receipts with stale allocation IDs after a reallocation. That is a gateway-side fix.

Safety

No payment path changes. Accepting receipts for recently-closed allocations within the grace period is consistent with existing recently_closed_allocation_buffer_secs semantics - tap-agent already aggregates RAVs for these.
AttestationSigner construction is deterministic and purely local. Retaining it for the grace period does not affect signing correctness.
Hard-reject resumes after grace period expiry, identical to current behaviour.
AllocationEligible::new defaults to 3600s. Existing call sites require no changes.

Files changed

crates/monitor/src/attestation.rs
crates/service/src/tap/checks/allocation_eligible.rs
crates/service/src/service/router.rs
crates/service/src/tap.rs

fix: add grace period for attestation signers and allocation eligibil…

d09dc97

…ity checks

cargopete marked this pull request as ready for review March 13, 2026 10:26

Merge branch 'main' into fix/post-reallocation-query-outage

85546d8

cargopete wants to merge 2 commits into graphprotocol:main from cargopete:fix/post-reallocation-query-outage
