refactor: break circular dependency over net_processing and dkgsessionhandler #7314
knst wants to merge 11 commits into dashpay:develop
It exposes the hidden circular dependency and tidies up the list of includes
- removed method CDKGPendingMessages::Misbehaving(NodeId, int, PeerManager&); ProcessPendingMessageBatch now calls peerman.Misbehaving(...) directly
- renamed PushPendingMessage<Message>(NodeId, Message&, PeerManager&) to PushOwnPendingMessage to clearly distinguish the path with node=-1 (self-made messages)
…from PeerManager
Re-ordered initialization of PeerManager and ActiveContext / ObserverContext; PeerManager::make now takes nodeman as a raw pointer (or nullptr). This resolves several circular dependencies over net_processing and removes several unique_ptr<T>& work-arounds from PeerManager
It helps drop the dependency of llmq/dkgsessionhandler on network code
- moved the implementation of ProcessMessage and AlreadyHave to NetDKG
- dropped usages of MessageProcessingResult in CDKGSessionManager
- introduced a new helper, DoForHandler
Walkthrough
This PR refactors LLMQ DKG (Distributed Key Generation) handling in Dash by separating network management concerns from context objects. The main changes migrate DKG phase operations from enqueueing messages into …
Sequence Diagram(s)
The conditions for generating sequence diagrams are met. This PR introduces significant control flow changes with multi-component interactions across network message handling, phase execution, and context initialization.
sequenceDiagram
participant Node as DKG Node
participant NetDKG as NetDKG Handler
participant SessionMgr as CDKGSessionManager
participant SessionHdlr as CDKGSessionHandler
participant ActiveDKG as ActiveDKGSession
Node->>NetDKG: ProcessMessage(QCONTRIB)
NetDKG->>SessionMgr: ForEachHandler(route message)
SessionMgr->>SessionHdlr: PushPendingMessage(serialized)
Note over SessionHdlr: Batches message in queue
NetDKG->>NetDKG: HandleDKGRound()
loop per_phase
NetDKG->>SessionHdlr: ProcessPendingMessageBatch()
SessionHdlr->>ActiveDKG: Contribute()
ActiveDKG-->>SessionHdlr: std::optional<CDKGContribution>
SessionHdlr->>NetDKG: RelayInvToParticipants()
end
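The routing step in the diagram above (NetDKG finds a handler through the session manager and pushes a serialized message into its queue) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only the helper names `ForEachHandler` and `DoForHandler` come from the PR description, while their signatures and the `SessionHandler`, `SessionManager`, `RouteMessage` and `CountPending` names are assumptions made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DKG session handler: it only batches
// serialized messages and knows nothing about the network.
struct SessionHandler {
    int quorum_index{0};
    std::vector<std::string> pending;
    void PushPendingMessage(std::string msg) { pending.push_back(std::move(msg)); }
};

// Hypothetical stand-in for the session manager as a pure state class.
// The helper signatures below are assumptions, not the real API.
class SessionManager {
public:
    using Key = std::pair<std::uint8_t, int>; // (LLMQ type, quorum index)

    void Add(std::uint8_t type, int index) { m_handlers[Key{type, index}].quorum_index = index; }

    // Visit every handler.
    void ForEachHandler(const std::function<void(SessionHandler&)>& fn) {
        for (auto& entry : m_handlers) fn(entry.second);
    }

    // Visit one handler; returns false if no handler matches the key.
    bool DoForHandler(const Key& key, const std::function<void(SessionHandler&)>& fn) {
        auto it = m_handlers.find(key);
        if (it == m_handlers.end()) return false;
        fn(it->second);
        return true;
    }

private:
    std::map<Key, SessionHandler> m_handlers;
};

// Network-side code: route a serialized message to the matching handler.
bool RouteMessage(SessionManager& mgr, std::uint8_t type, int index, const std::string& payload) {
    return mgr.DoForHandler(SessionManager::Key{type, index},
                            [&](SessionHandler& h) { h.PushPendingMessage(payload); });
}

// Count queued messages across all handlers.
std::size_t CountPending(SessionManager& mgr) {
    std::size_t n = 0;
    mgr.ForEachHandler([&](SessionHandler& h) { n += h.pending.size(); });
    return n;
}
```

With this shape the handler type needs no network headers; only the caller of `RouteMessage` does.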
sequenceDiagram
participant Init as Initialization
participant ActiveCtx as ActiveContext
participant PeerMgr as PeerManager
participant NetDKG as NetDKG Handler
participant Spork as CSporkManager
Init->>ActiveCtx: construct(dmnman, qman, qsnapman, sigman)
Init->>PeerMgr: make(nodeman=active_ctx->nodeman.get())
Note over PeerMgr: m_nodeman set once
Init->>NetDKG: construct(sporkman, dkgsman, qman)
Init->>Spork: IsQuorumDKGEnabled()
Note over NetDKG: Check spork for DKG enabled
Init->>ActiveCtx: Start()
Note over ActiveCtx: No connman/peerman params
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🧹 Nitpick comments (3)
src/active/dkgsession.cpp (1)
106-111: ⚡ Quick win
Move the `sent*` debug updates to the actual enqueue/broadcast path.
These methods now only build and return a message. Setting `sentContributions`, `sentComplaint`, `sentJustification`, and `sentPrematureCommitment` here records a successful send before `NetDKG` has actually serialized and queued/broadcast the payload.
Also applies to: 292-297, 382-387, 539-544
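A rough sketch of the ordering this comment asks for: the builder only returns a message, and the flag is set by the code that actually queues it. All names here (`DebugStatus`, `Contribution`, `BuildContribution`, `Sender`) are hypothetical simplifications, not the types used in the PR.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, reduced debug status with a single "sent" bit.
struct DebugStatus {
    bool sentContributions{false};
};

// Hypothetical message type.
struct Contribution {
    std::string payload;
};

// Builder: constructs and returns the message, touches no debug state.
std::optional<Contribution> BuildContribution(bool is_member) {
    if (!is_member) return std::nullopt; // nothing to send
    return Contribution{"shares"};
}

// Send path: the flag flips only after the payload has been queued.
struct Sender {
    std::vector<std::string> queue;
    DebugStatus status;

    bool SendContribution(const std::optional<Contribution>& msg) {
        if (!msg) return false;
        queue.push_back(msg->payload);   // stand-in for serialize + enqueue/broadcast
        status.sentContributions = true; // recorded after the real send step
        return true;
    }
};
```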
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/active/dkgsession.cpp` around lines 106 - 111, The dkgDebugManager.UpdateLocalSessionStatus calls inside the message-builder functions (e.g., setting CDKGDebugSessionStatus::statusBits.sentContributions, sentComplaint, sentJustification, sentPrematureCommitment) must be removed from those builders (the functions that build and return qc/messages) and moved into the actual send path inside NetDKG, i.e., the code that serializes and enqueues/broadcasts the payload. Locate the UpdateLocalSessionStatus calls in the builders and delete them there, then add equivalent UpdateLocalSessionStatus updates immediately after NetDKG performs the serialization/queuing/broadcast so the debug flags reflect a real successful send.

src/llmq/debug.cpp (1)
213-228: 💤 Low value
Optional: make `MarkAborted` idempotent w.r.t. `nTime`.
`MarkAborted`'s lambda always returns `true`, so each call bumps `localStatus.nTime` even when the session was already marked aborted. `MarkPhaseAdvanced` already does the right thing (returns `changed`). For consistency and to avoid spurious timestamp updates if the helper is invoked more than once on the same aborted session, consider returning a real changed flag.
♻️ Proposed change
 void CDKGDebugManager::MarkAborted(Consensus::LLMQType llmqType, int quorumIndex)
 {
     UpdateLocalSessionStatus(llmqType, quorumIndex, [&](CDKGDebugSessionStatus& status) {
+        if (status.statusBits.aborted) return false;
         status.statusBits.aborted = true;
         return true;
     });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/debug.cpp` around lines 213 - 228, MarkAborted currently always returns true from its UpdateLocalSessionStatus lambda which forces localStatus.nTime to update every call; change the lambda in CDKGDebugManager::MarkAborted to compute a changed flag by comparing status.statusBits.aborted with the new value, set status.statusBits.aborted = true, and return that changed flag (i.e., return status.statusBits.aborted was previously false). This makes MarkAborted idempotent like MarkPhaseAdvanced and avoids spurious nTime updates.

src/llmq/net_dkg.cpp (1)
449-482: 💤 Low value
Inconsistent `dynamic_cast` usage between `Start()` and `Interrupt()`; consider tightening shutdown.
`Start()` uses the throwing reference form (`dynamic_cast<ActiveDKGSessionHandler&>`) while `Interrupt()` uses the safe pointer form. Both iterate the same handler set and both early-return on `m_active == nullptr`, so the invariant is identical and the two should agree.
The reference form also has a small resilience gap: if the cast were ever to throw mid-iteration, the threads already pushed into `m_phase_threads` would never be joined, because `~NetDKG()` only calls `DisconnectManagers()` (line 254), not `Stop()`. Either use the pointer form here as well, or have the destructor call `Stop()` defensively so a partially-initialized state still cleans up.
♻️ Proposed alignment with `Interrupt()`
 m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-    auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-    std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-    m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-        util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-    });
+    auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+    if (!Assume(handler != nullptr)) return;
+    std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type), handler->QuorumIndex());
+    m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+        util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+    });
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llmq/net_dkg.cpp` around lines 449 - 482, Start() uses dynamic_cast<ActiveDKGSessionHandler&> which can throw partway through filling m_phase_threads and leave threads unjoined; make Start() mirror Interrupt() by using the non-throwing pointer form (dynamic_cast<ActiveDKGSessionHandler*>) when iterating m_qdkgsman.ForEachHandler so you only create threads for valid handlers and avoid exceptions during the loop, ensuring m_phase_threads remains consistent for later Stop() join; update the lambda in NetDKG::Start to check the pointer, capture it safely, and call PhaseHandlerThread(handler) with the pointer/ref as appropriate.
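For readers unfamiliar with the two `dynamic_cast` forms discussed in this comment, here is a small standalone sketch of the standard C++ behaviour: the pointer form yields `nullptr` on a type mismatch, the reference form throws `std::bad_cast`. The handler types and function names are hypothetical and unrelated to the real `CDKGSessionHandler` hierarchy.

```cpp
#include <memory>
#include <typeinfo>
#include <vector>

// Hypothetical polymorphic hierarchy, for illustration only.
struct BaseHandler {
    virtual ~BaseHandler() = default;
};
struct ActiveHandler : BaseHandler {
    int index;
    explicit ActiveHandler(int i) : index(i) {}
};
struct PassiveHandler : BaseHandler {};

// Pointer form: a mismatch yields nullptr, so the loop can skip the
// element and keep going without any exception.
std::vector<int> CollectActive(const std::vector<std::unique_ptr<BaseHandler>>& handlers) {
    std::vector<int> out;
    for (const auto& base : handlers) {
        auto* active = dynamic_cast<ActiveHandler*>(base.get());
        if (active == nullptr) continue;
        out.push_back(active->index);
    }
    return out;
}

// Reference form: a mismatch throws std::bad_cast, which would abort a
// loop partway through unless the caller catches it.
bool ReferenceCastThrows(BaseHandler& base) {
    try {
        ActiveHandler& active = dynamic_cast<ActiveHandler&>(base);
        (void)active;
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```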
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 757eb414-ab77-46e9-b643-a3f32d98e788
📒 Files selected for processing (25)
- src/Makefile.am
- src/active/context.cpp
- src/active/context.h
- src/active/dkgsession.cpp
- src/active/dkgsession.h
- src/active/dkgsessionhandler.cpp
- src/active/dkgsessionhandler.h
- src/init.cpp
- src/llmq/debug.cpp
- src/llmq/debug.h
- src/llmq/dkgsession.h
- src/llmq/dkgsessionhandler.cpp
- src/llmq/dkgsessionhandler.h
- src/llmq/dkgsessionmgr.cpp
- src/llmq/dkgsessionmgr.h
- src/llmq/net_dkg.cpp
- src/llmq/net_dkg.h
- src/llmq/observer.cpp
- src/llmq/observer.h
- src/llmq/options.cpp
- src/llmq/options.h
- src/net_processing.cpp
- src/net_processing.h
- src/test/util/setup_common.cpp
- test/lint/lint-circular-dependencies.py
💤 Files with no reviewable changes (1)
- test/lint/lint-circular-dependencies.py
Issue being fixed or feature implemented
This PR is a continuation of #7247
This PR is not a direct dependency of the kernel project.
This PR aims to resolve the following issues:
The constructor of PeerManager takes references to unique_ptr for multiple objects that are initialized later, such as:
That is a fragile design that relies on multiple assumptions about which members are already initialized and about their lifetimes
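To make the fragility concrete, here is a minimal sketch contrasting the two approaches. It assumes a simplified `NodeManager` and two hypothetical `PeerManagerByRef` / `PeerManagerByPtr` classes; the real `PeerManager::make` signature is not reproduced here, only the idea from the commit description that it now receives a raw pointer (or nullptr) after the dependency has been constructed.

```cpp
#include <memory>

// Hypothetical dependency, for illustration only.
struct NodeManager {
    int id{42};
};

// Fragile variant: keeps a reference to someone else's unique_ptr, which
// may still be empty at construction time and can change or be reset later.
class PeerManagerByRef {
public:
    explicit PeerManagerByRef(const std::unique_ptr<NodeManager>& nodeman) : m_nodeman(nodeman) {}
    bool HasNodeman() const { return m_nodeman != nullptr; }

private:
    const std::unique_ptr<NodeManager>& m_nodeman;
};

// Re-ordered variant: the dependency is created first, then passed as a
// non-owning raw pointer (or nullptr when the node has none). The value is
// fixed at construction.
class PeerManagerByPtr {
public:
    explicit PeerManagerByPtr(NodeManager* nodeman) : m_nodeman(nodeman) {}
    bool HasNodeman() const { return m_nodeman != nullptr; }

private:
    NodeManager* const m_nodeman;
};
```

In the first variant the answer of `HasNodeman()` depends on when it is called relative to the owner's initialization; in the second it is decided once, at construction, which is what the re-ordered initialization relies on.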
What was done?
- CDKGSessionManager is reduced to a pure state class: it owns the DB and provides 2 new helpers, ForEachHandler / DoForHandler
- CDKGSessionHandler and ActiveDKGSessionHandler lose their threading and ProcessMessage members
- MessageProcessingResult usages are dropped from llmq/ consensus code
- NetDKG is introduced; it takes responsibility for p2p communication for DKG and for running threads
How Has This Been Tested?
Removed circular dependency over dkgsessionhandler <-> net_processing
Breaking Changes
N/A
Checklist: