Skip to content

Move BlockProcessingResult match out of block lookups#9327

Open
dapplion wants to merge 5 commits into
sigp:unstablefrom
dapplion:gloas-lookup-sync-error-type
Open

Move BlockProcessingResult match out of block lookups#9327
dapplion wants to merge 5 commits into
sigp:unstablefrom
dapplion:gloas-lookup-sync-error-type

Conversation

@dapplion
Copy link
Copy Markdown
Collaborator

@dapplion dapplion commented May 21, 2026

Issue Addressed

As a result we would have to duplicate x3 the big match on BlockProcessingResult we currently have in block lookups mod.rs

This PR moves the match of BlockProcessingResult to sync_methods to reduce the diff of #9155. There are some subtle changes that deserve dedicated attention, and may be drowned in the bigger diff of #9155 otherwise:

Unstable This PR / #9115
Some error conditions immediately Drop the lookup (no retries). For example for "internal" errors like the BeaconChainError Retries ALL errors 4 times. I believe assuming some errors are internal is risky as dropping a lookup drops all its children potentially forcing the node to resync a lot of blocks because of an internal timeout

dapplion added 2 commits May 20, 2026 06:08
Reshape BlockProcessingResult from the AC-verdict-passthrough
Ok/Err/Ignored enum to Imported(info) | Error { penalty, reason }. The
producer (network_beacon_processor) translates beacon-chain
Result<AvailabilityProcessingStatus, BlockError> into this shape via a
new classify_processing_result(), so the consumer only has to resolve
the symbolic WhichPeerToPenalize against an in-scope PeerGroup.

- New WhichPeerToPenalize { BlockPeer, CustodyPeerForColumn(u64) } with
  an `apply(action, &peer_group, reason, cx)` helper. Penalty policy
  lives once in classify_processing_result instead of being duplicated
  across consumer arms.
- Producer emits stable identifiers via two const modules
  (processing_result_info, processing_result_reason) so producer and
  consumer never trade ad-hoc string literals.
- Ignored becomes Error { penalty: None, reason: "processor_overloaded" }
  with a producer-side warn!; the consumer drops the lookup as before.
- DuplicateFullyImported and GenesisBlock map to Imported (which makes
  the consumer's "successfully imported" branch fall out naturally and
  removes the per-BlockError policy block — net -88 lines in mod.rs).
- on_processing_result_inner now captures the downloaded block's
  parent_root up-front, before borrowing request_state mutably, so the
  parent_unknown branch keeps working for any R.

Test rig: three tests construct the new variants directly. Extracted
from sigp#9155 (gloas-lookup-sync) onto bare sigp/unstable.
@dapplion dapplion requested a review from jxs as a code owner May 21, 2026 00:37
@dapplion dapplion force-pushed the gloas-lookup-sync-error-type branch from 50076f0 to 76e344b Compare May 21, 2026 04:04
BlockError::BeaconChainError(_) | BlockError::InternalError(_) => None,
BlockError::DuplicateImportStatusUnknown(_) => None,
BlockError::AvailabilityCheck(inner) => match inner {
AvailabilityCheckError::InvalidColumn((Some(idx), _)) => Some((
Copy link
Copy Markdown
Member

@jimmygchen jimmygchen May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note here - if column_index is None, it falls to the Malicious category below and penalize all column peers - on unstable, no peer is penalised:

// If no index supplied this is an un-attributable fault. In practice
// this should never happen.
None => vec![],
}

However, it doesn't look like the None column index case is reachable?

return Err((

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The None column index case reaches block_peer_penalty, which penalizes all column peers it seems, the enum variant and mapping here is slightly confusing - we're using BlockPeer variant to penalise all column peers it seems? I could be wrong

WhichPeerToPenalize::BlockPeer => peer_group.all().copied().collect(),

// per-component failure counter, so `SINGLE_BLOCK_LOOKUP_MAX_ATTEMPTS` bounds the
// retry loop and eventually drops the lookup if the failure persists. Whether the
// peer should be downscored is the producer's call (encoded in `penalty`).
debug!(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously error, but given we have now collapsed some of the ignore case as well, so it may not worth error on all failures? do we lose any visibility on errors here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the apply function call below is supposed to handle those?

Copy link
Copy Markdown
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a pair review with @dapplion and the simplification looks good to me!

The impact of the simplication is described in the PR, to summarise:

  • Lookup now retries instead of dropping immediately on an errors, and collapse the Drop and Retry cases into just Retry
  • Penalty messages have been changed and number of unique messages are bound to the BlockError enum variants
  • The error categorization and penality looks correct to me:

match result {
Ok(AvailabilityProcessingStatus::Imported(_)) => Self::Imported(true, "imported"),
Ok(AvailabilityProcessingStatus::MissingComponents(_, _)) => {
Self::Imported(false, "missing_components")
}
Err(e) => {
let penalty = match &e {
BlockError::DuplicateFullyImported(_) => {
return Self::Imported(true, "duplicate");
}
BlockError::GenesisBlock => return Self::Imported(true, "genesis"),
BlockError::ParentUnknown { parent_root, .. } => {
return Self::ParentUnknown {
parent_root: *parent_root,
};
}
BlockError::BeaconChainError(_) | BlockError::InternalError(_) => None,
BlockError::DuplicateImportStatusUnknown(_) => None,
BlockError::AvailabilityCheck(inner) => match inner {
AvailabilityCheckError::InvalidColumn((Some(idx), _)) => Some((
PeerAction::MidToleranceError,
WhichPeerToPenalize::CustodyPeerForColumn(*idx),
(&e).into(),
)),
inner => match inner.category() {
AvailabilityCheckErrorCategory::Internal => None,
AvailabilityCheckErrorCategory::Malicious => block_peer_penalty(inner),
},
},
BlockError::ExecutionPayloadError(epe) => {
if epe.penalize_peer() {
block_peer_penalty(epe)
} else {
None
}
}
// Remaining invalid blocks: penalize the block peer. Listed explicitly so a
// new `BlockError` variant forces a compile error here.
BlockError::FutureSlot { .. }
| BlockError::StateRootMismatch { .. }
| BlockError::WouldRevertFinalizedSlot { .. }
| BlockError::NotFinalizedDescendant { .. }
| BlockError::BlockSlotLimitReached
| BlockError::IncorrectBlockProposer { .. }
| BlockError::UnknownValidator(_)
| BlockError::InvalidSignature(_)
| BlockError::BlockIsNotLaterThanParent { .. }
| BlockError::NonLinearParentRoots
| BlockError::NonLinearSlots
| BlockError::PerBlockProcessingError(_)
| BlockError::WeakSubjectivityConflict
| BlockError::InconsistentFork(_)
| BlockError::ParentExecutionPayloadInvalid { .. }
| BlockError::KnownInvalidExecutionPayload(_)
| BlockError::Slashable
| BlockError::EnvelopeBlockRootUnknown(_)
| BlockError::OptimisticSyncNotSupported { .. }
| BlockError::BlobNotRequired(_)
| BlockError::InvalidBlobCount { .. }
| BlockError::BidParentRootMismatch { .. } => block_peer_penalty(&e),
};
Self::Error {
penalty,
reason: format!("{e:?}"),
}
}
}

Just a small comment above @dapplion

@jimmygchen jimmygchen added the waiting-on-author The reviewer has suggested changes and awaits thier implementation. label May 21, 2026
Copy link
Copy Markdown
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Mostly nits

self.send_sync_message(SyncMessage::BlockComponentProcessed {
process_type,
result: crate::sync::manager::BlockProcessingResult::Ignored,
result: BlockProcessingResult::Error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we just retrying when the cpu is overloaded? does it simplify things somehow? seems wasteful to retry something we already have and just failed to process

// per-component failure counter, so `SINGLE_BLOCK_LOOKUP_MAX_ATTEMPTS` bounds the
// retry loop and eventually drops the lookup if the failure persists. Whether the
// peer should be downscored is the producer's call (encoded in `penalty`).
debug!(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the apply function call below is supposed to handle those?

debug!(
?block_root,
component = ?R::response_type(),
"Lookup component processing ignored, cpu might be overloaded"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this changes internal availability-processing errors from dropping the whole lookup to retrying.
But given that we are following this up with #9155 , we should be good since every sub state machine there handles its own drop semantics?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

syncing waiting-on-author The reviewer has suggested changes and awaits thier implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants