Builder deposits optimisation#9311
Conversation
|
This is ready for review now. |
|
Some required checks have failed. Could you please take a look @pawanjay176? 🙏 |
eserilev
left a comment
There was a problem hiding this comment.
looks good on the whole, just some small suggestion, a question and a few nits
| let decompressed = deposit_data | ||
| .par_iter() | ||
| .enumerate() | ||
| .map(|(index, deposit)| { | ||
| deposit_pubkey_signature_message(deposit, spec) | ||
| .map(|(public_key, signature, message)| (index, public_key, signature, message)) | ||
| }) | ||
| .collect::<Vec<_>>(); |
There was a problem hiding this comment.
I think we should use one of the scoped rayon pools instead of the global pool
| let mut results = vec![false; decompressed.len()]; | ||
|
|
||
| let batch_results = decompressed | ||
| .par_chunks(DEPOSIT_SIGNATURE_BATCH_SIZE) |
There was a problem hiding this comment.
I think we should use a scoped rayon pool here as well
There was a problem hiding this comment.
I think instead of using the scoped rayon pool here, which would involve threading the task executor all the way to state_processing, we can instead spawn the tasks that trigger signature verification with the rayon pool. Implemented in 50cd378
There was a problem hiding this comment.
The only place where we might call rayon without a scoped pool is in process_deposit_requests when we have cache misses for the signature verification. I think that is okay.
| /// This can be significantly slower if there are many builder deposits | ||
| /// that need to be onboarded at the fork boundary. This variant should be used | ||
| /// for tests and other non-production paths. | ||
| FullVerification, |
There was a problem hiding this comment.
it looks like we are only running tests for GloasVerificationContext::FullVerification, would be nice to write tests for the other two variants if possible
|
@eserilev Removed the |
|
Some required checks have failed. Could you please take a look @pawanjay176? 🙏 |
| // it is `O(n * m)` where `n` is max 8192 and `m` is max 128M. | ||
| fn is_pending_validator<E: EthSpec>( | ||
| state: &BeaconState<E>, | ||
| #[instrument(skip_all, level = "debug")] |
There was a problem hiding this comment.
this may create a large number of spans, we probably don't need the span per validator?
|
Note to reviewer: Changed The observation was that with a bigger sized We now cache all reusable indices in a first sweep before reusing anything and that dropped the time to insert with big builder count much more manageable. Again, this is highly unlikely on mainnet. |
|
Some required checks have failed. Could you please take a look @pawanjay176? 🙏 |
| } | ||
|
|
||
| builder_deposit_keys.push(key); | ||
| builder_deposits.push(deposit_data); |
There was a problem hiding this comment.
i thought about whether its worth de-dup here, but it seems like the risk and potential impact is low?
There was a problem hiding this comment.
Yeah i was considering getting rid of cache_pending_deposits post fork so this could make it easier.
I'm happy to consolidate logic in one place though.
| } | ||
|
|
||
| /// Transform a `Fulu` state into a `Gloas` state. | ||
| #[instrument(skip_all)] |
There was a problem hiding this comment.
Do you see this span when you test it locally?
I think we might have to rename the advance_head span to lh_advance_head so it gets exported to tempo.
There was a problem hiding this comment.
I'm happy to just drop it too. I have already tested and benchmarked it with way worse cases than we'll ever see on mainnet and this happens just once at the fork transtion. I think its fine to remove it.
| // perform the signature verification in batches. | ||
| // We have until the fork transition for the cache to be used, so we use the low priority pool. | ||
| executor.spawn_blocking_with_rayon( | ||
| move || cache.add_new_pending_deposits::<T::EthSpec>(&state, &spec), |
There was a problem hiding this comment.
Is this pretty much a no-op after the fork?
There was a problem hiding this comment.
yeah it is. Hadn't considered it. can potentially only do this for pre-gloas states and then delete it post gloas
|
|
||
| for (index, builder) in state_builders.iter().enumerate() { | ||
| builder_index_map.insert(builder.pubkey, index as BuilderIndex); | ||
| if builder.withdrawable_epoch <= current_epoch && builder.balance == 0 { |
There was a problem hiding this comment.
if the builder tops up in the same block and its balance increases, then we could accidentally make this index reusable right? is this possible
lighthouse/consensus/state_processing/src/per_block_processing/process_operations.rs
Lines 1052 to 1058 in 70b8594
There was a problem hiding this comment.
Fixed in 656e70a and added a test as well. Really great catch. I'm going to try and upstream this to the EF tests
| // perform the signature verification in batches. | ||
| executor.spawn_blocking_with_rayon( | ||
| move || cache.cache_deposit_requests(&deposits, &spec), | ||
| task_executor::RayonPoolType::HighPriority, |
There was a problem hiding this comment.
This isn't work in the hot path, however i think its fine leaving it high prio, as we want to be ready asap in case if the payload arrive late in the slot? is this what you were thinking?
There was a problem hiding this comment.
Yeah pretty much. Better to have everything verified to reduce cache misses in case of late envelopes
| } | ||
|
|
||
| /// Helper to create a harness with Fulu genesis and gloas at a later epoch. | ||
| async fn get_fulu_harness_with_gloas_scheduled<E: EthSpec>( |
There was a problem hiding this comment.
nit: move this to the top near the other similar functions
|
Looks like the failing test here revealed a bug, and the invalid deposit got added to pending deposits. Might want to skip ( https://github.com/sigp/lighthouse/actions/runs/26260083391/job/77291404904?pr=9311 |
Issue Addressed
N/A
Proposed Changes
Adds an
OnboardBuildersCacheto the beacon chain to pre-verify and cache builder deposits. Caching is important in 2 places:onboard_builders_from_pending_depositsis a fork transition function that scales with the number of pending deposits. Under worst case, the pending deposits queue can be dos'd with a number of 1eth deposits to make nodes do more work verifying it at the fork boundary. Even though the pending_deposits queue is effectively capped by the gas limit, this cache makes even a theoretic attack ineffective by doing the full verification in miliseconds instead of seconds.Some numbers claude cooked up
process_operationsmay need to verify all builder deposits in the hot path. the engine api currently allows max 8192 deposit requests to be sent for block production, so in worst case, we may need to verify 8192 signatures during block processing. The deposits we need to process are received in the payload envelope ~6 seconds into the slot. we process these deposits when a new beacon block that builds on the payload arrives ~3 seconds into the next slot. So we have a lot of time to verify these signatures before we actually need to process themThe cache is threaded to both the
per_slot_processingfor the first case andper_block_processingfor the second case.Additional Info
tested with the following kurtosis config: