Move BlockProcessingResult match out of block lookups#9327
Conversation
Reshape BlockProcessingResult from the AC-verdict-passthrough
Ok/Err/Ignored enum to Imported(info) | Error { penalty, reason }. The
producer (network_beacon_processor) translates beacon-chain
Result<AvailabilityProcessingStatus, BlockError> into this shape via a
new classify_processing_result(), so the consumer only has to resolve
the symbolic WhichPeerToPenalize against an in-scope PeerGroup.
- New WhichPeerToPenalize { BlockPeer, CustodyPeerForColumn(u64) } with
an `apply(action, &peer_group, reason, cx)` helper. Penalty policy
lives once in classify_processing_result instead of being duplicated
across consumer arms.
- Producer emits stable identifiers via two const modules
(processing_result_info, processing_result_reason) so producer and
consumer never trade ad-hoc string literals.
- Ignored becomes Error { penalty: None, reason: "processor_overloaded" }
with a producer-side warn!; the consumer drops the lookup as before.
- DuplicateFullyImported and GenesisBlock map to Imported (which makes
the consumer's "successfully imported" branch fall out naturally and
removes the per-BlockError policy block — net -88 lines in mod.rs).
- on_processing_result_inner now captures the downloaded block's
parent_root up-front, before borrowing request_state mutably, so the
parent_unknown branch keeps working for any R.
Test rig: three tests construct the new variants directly. Extracted
from sigp#9155 (gloas-lookup-sync) onto bare sigp/unstable.
50076f0 to
76e344b
Compare
| BlockError::BeaconChainError(_) | BlockError::InternalError(_) => None, | ||
| BlockError::DuplicateImportStatusUnknown(_) => None, | ||
| BlockError::AvailabilityCheck(inner) => match inner { | ||
| AvailabilityCheckError::InvalidColumn((Some(idx), _)) => Some(( |
There was a problem hiding this comment.
Just a note here - if column_index is None, it falls to the Malicious category below and penalize all column peers - on unstable, no peer is penalised:
lighthouse/beacon_node/network/src/sync/block_lookups/mod.rs
Lines 689 to 692 in a9637c1
However, it doesn't look like the None column index case is reachable?
lighthouse/crypto/kzg/src/lib.rs
Line 243 in 815040d
There was a problem hiding this comment.
The None column index case reaches block_peer_penalty, which penalizes all column peers it seems, the enum variant and mapping here is slightly confusing - we're using BlockPeer variant to penalise all column peers it seems? I could be wrong
| // per-component failure counter, so `SINGLE_BLOCK_LOOKUP_MAX_ATTEMPTS` bounds the | ||
| // retry loop and eventually drops the lookup if the failure persists. Whether the | ||
| // peer should be downscored is the producer's call (encoded in `penalty`). | ||
| debug!( |
There was a problem hiding this comment.
This was previously error, but given we have now collapsed some of the ignore case as well, so it may not worth error on all failures? do we lose any visibility on errors here?
There was a problem hiding this comment.
I think the apply function call below is supposed to handle those?
jimmygchen
left a comment
There was a problem hiding this comment.
Did a pair review with @dapplion and the simplification looks good to me!
The impact of the simplication is described in the PR, to summarise:
- Lookup now retries instead of dropping immediately on an errors, and collapse the
DropandRetrycases into justRetry - Penalty messages have been changed and number of unique messages are bound to the
BlockErrorenum variants - The error categorization and penality looks correct to me:
lighthouse/beacon_node/network/src/network_beacon_processor/sync_methods.rs
Lines 1046 to 1112 in c5cf95e
Just a small comment above @dapplion
| self.send_sync_message(SyncMessage::BlockComponentProcessed { | ||
| process_type, | ||
| result: crate::sync::manager::BlockProcessingResult::Ignored, | ||
| result: BlockProcessingResult::Error { |
There was a problem hiding this comment.
Are we just retrying when the cpu is overloaded? does it simplify things somehow? seems wasteful to retry something we already have and just failed to process
| // per-component failure counter, so `SINGLE_BLOCK_LOOKUP_MAX_ATTEMPTS` bounds the | ||
| // retry loop and eventually drops the lookup if the failure persists. Whether the | ||
| // peer should be downscored is the producer's call (encoded in `penalty`). | ||
| debug!( |
There was a problem hiding this comment.
I think the apply function call below is supposed to handle those?
| debug!( | ||
| ?block_root, | ||
| component = ?R::response_type(), | ||
| "Lookup component processing ignored, cpu might be overloaded" |
There was a problem hiding this comment.
Just noting that this changes internal availability-processing errors from dropping the whole lookup to retrying.
But given that we are following this up with #9155 , we should be good since every sub state machine there handles its own drop semantics?
Issue Addressed
As a result we would have to duplicate x3 the big match on
BlockProcessingResultwe currently have in block lookups mod.rsThis PR moves the match of
BlockProcessingResulttosync_methodsto reduce the diff of #9155. There are some subtle changes that deserve dedicated attention, and may be drowned in the bigger diff of #9155 otherwise:Dropthe lookup (no retries). For example for "internal" errors like the BeaconChainError