
refactor: break circular dependency over net_processing and dkgsessionhandler#7314

Open
knst wants to merge 11 commits into dashpay:develop from knst:refactor-peermanager-handlers-dkg


knst (Collaborator) commented May 8, 2026

Issue being fixed or feature implemented

This PR is a continuation of #7247.
This PR is not a direct dependency of the kernel project.

This PR aims to resolve the following issues:

  1. The constructor of PeerManager takes references to unique_ptr for multiple objects that are initialized later, such as:

    const std::unique_ptr<ActiveContext>& active_ctx,
    const std::unique_ptr<CDeterministicMNManager>& dmnman,
    const std::unique_ptr<CJWalletManager>& cj_walletman,
    const std::unique_ptr<LLMQContext>& llmq_ctx,
    const std::unique_ptr<llmq::ObserverContext>& observer_ctx,
    

That is a fragile design: it makes multiple assumptions about which members are already initialized and about their lifetimes.

  2. The implementation of the DKG state machine and the p2p implementation are tightly coupled.
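A minimal sketch of issue 1, assuming heavily simplified stand-in types: NodeManSketch, PeerManagerFragile and PeerManagerSketch are hypothetical and are not Dash's classes. Holding a `const std::unique_ptr<T>&` means the object's behavior depends on whether the owner was filled in before or after construction. Passing a raw pointer (or nullptr) once, after reordering initialization, fixes that at construction time.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a masternode manager; not Dash's real class.
struct NodeManSketch {
    bool is_masternode{true};
};

// Fragile pattern: stores a reference to a unique_ptr that may still be empty
// when this object is constructed, so every use silently depends on
// initialization order and on the owner staying alive.
class PeerManagerFragile
{
public:
    explicit PeerManagerFragile(const std::unique_ptr<NodeManSketch>& nodeman) : m_nodeman{nodeman} {}
    bool IsMasternode() const { return m_nodeman != nullptr && m_nodeman->is_masternode; }

private:
    const std::unique_ptr<NodeManSketch>& m_nodeman;
};

// Reordered pattern: the owner is created first and a raw pointer
// (or nullptr on a regular node) is handed over exactly once.
class PeerManagerSketch
{
public:
    explicit PeerManagerSketch(const NodeManSketch* nodeman) : m_nodeman{nodeman} {}
    bool IsMasternode() const { return m_nodeman != nullptr && m_nodeman->is_masternode; }

private:
    const NodeManSketch* const m_nodeman;
};
```

In this sketch the raw-pointer form still assumes the pointee outlives the peer manager; the difference is that the assumption is made explicit at one call site instead of being implicit in every use.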

What was done?

  • CDKGSessionManager is reduced to a pure state class; it owns the DB and provides 2 new helpers: ForEachHandler / DoForHandler
  • CDKGSessionHandler and ActiveDKGSessionHandler lose their threading and ProcessMessage members
  • Usages of MessageProcessingResult are dropped from llmq/ consensus code
  • PeerManager no longer knows about ObserverContext/ActiveContext, and its constructor drops 2 unique_ptr& parameters
  • A new NetHandler, NetDKG, is introduced; it takes responsibility for DKG p2p communication and for running the DKG threads
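One way the two CDKGSessionManager helpers could look, sketched under assumptions: keying handlers by LLMQ type plus quorum index, the std::function signatures, and the bool return of DoForHandler are guesses for illustration, and SessionManagerSketch / HandlerSketch are hypothetical names. The idea is that the manager only owns and exposes state, while a network-side caller such as NetDKG decides what to do with each handler.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical, heavily simplified stand-in for a DKG session handler.
struct HandlerSketch {
    int llmq_type;
    int quorum_index;
    int pending{0};
};

// State-only manager: owns the handlers and lets callers visit them,
// without knowing anything about peers or message relay.
class SessionManagerSketch
{
public:
    void AddHandler(int llmq_type, int quorum_index)
    {
        m_handlers.emplace(std::make_pair(llmq_type, quorum_index), HandlerSketch{llmq_type, quorum_index});
    }

    // Calls func once for every handler.
    void ForEachHandler(const std::function<void(HandlerSketch&)>& func)
    {
        for (auto& entry : m_handlers) func(entry.second);
    }

    // Calls func for a single handler; returns false if it does not exist.
    bool DoForHandler(int llmq_type, int quorum_index, const std::function<void(HandlerSketch&)>& func)
    {
        auto it = m_handlers.find({llmq_type, quorum_index});
        if (it == m_handlers.end()) return false;
        func(it->second);
        return true;
    }

private:
    std::map<std::pair<int, int>, HandlerSketch> m_handlers;
};
```

A caller routing one incoming message would use DoForHandler, while start-up and shutdown code that touches every session would use ForEachHandler.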

How Has This Been Tested?

  • Ran unit and functional tests
  • Ran the test/lint/lint-circular-dependencies.py linter

Removed the circular dependency dkgsessionhandler <-> net_processing.

Breaking Changes

N/A

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

knst added 11 commits May 8, 2026 21:30
It shows the hidden circular dependency and tidies up the list of includes
- removed method CDKGPendingMessages::Misbehaving(NodeId, int, PeerManager&); ProcessPendingMessageBatch now calls peerman.Misbehaving(...) directly
- renamed PushPendingMessage<Message>(NodeId, Message&, PeerManager&) to PushOwnPendingMessage to clearly distinguish the path with node=-1 (self-made messages)
…from PeerManager

Reordered the initialization of PeerManager and ActiveContext / ObserverContext; PeerManager::make now takes a raw nodeman pointer (or nullptr).

It resolves several circular dependencies over net_processing and removes several unique_ptr<T>& workarounds from PeerManager.
It helps drop the dependency of llmq/dkgsessionhandler on network code
 - moved implementation of ProcessMessage and AlreadyHave to NetDKG
 - drop usages of MessageProcessingResult in CDKGSessionManager
 - introduced a new helper DoForHandler
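The split between PushPendingMessage and PushOwnPendingMessage in the commit notes can be sketched as a queue in which peer messages carry the sender's NodeId and self-made messages are tagged with -1. This is a hypothetical illustration: PendingMessagesSketch, PopBatch and the NodeId alias (assumed here to be a signed 64-bit integer) are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Assumption: NodeId modeled as a signed 64-bit integer; -1 means "this node".
using NodeId = std::int64_t;

// Hypothetical pending-message queue, not the PR's CDKGPendingMessages.
template <typename Message>
class PendingMessagesSketch
{
public:
    static constexpr NodeId SELF{-1};

    // Path for messages received from a peer: the sender id must be real.
    void PushPendingMessage(NodeId from, Message msg)
    {
        assert(from != SELF);
        m_queue.emplace_back(from, std::move(msg));
    }

    // Path for messages this node built itself: always tagged with SELF.
    void PushOwnPendingMessage(Message msg) { m_queue.emplace_back(SELF, std::move(msg)); }

    // Removes and returns up to max_count queued messages, oldest first.
    std::vector<std::pair<NodeId, Message>> PopBatch(std::size_t max_count)
    {
        std::vector<std::pair<NodeId, Message>> batch;
        while (!m_queue.empty() && batch.size() < max_count) {
            batch.push_back(std::move(m_queue.front()));
            m_queue.pop_front();
        }
        return batch;
    }

private:
    std::deque<std::pair<NodeId, Message>> m_queue;
};
```

Giving the self-made path its own name means the caller never has to pass a magic -1, and the peer path can reject it outright.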
knst added this to the 24 milestone May 8, 2026
github-actions Bot commented May 8, 2026

⚠️ Potential Merge Conflicts Detected

This PR has potential conflicts with the following open PRs:

Please coordinate with the authors of these PRs to avoid merge conflicts.

thepastaclaw commented May 8, 2026

Review Gate

Commit: 53be42b2

  • Debounce: 1729m ago (need 30m)

  • CI checks: build failure: linux64_multiprocess-build / Build source, mac-build / Build source, linux64_tsan-build / Build source

  • CodeRabbit review: comment found

  • Off-peak hours: off-peak (12:57 PM PT Saturday)

  • Run review now (check to override)

coderabbitai Bot commented May 8, 2026


Walkthrough

This PR refactors LLMQ DKG (Distributed Key Generation) handling in Dash by separating network management concerns from context objects. The main changes migrate DKG phase operations from enqueueing messages into CDKGPendingMessages to returning std::optional<Message> types; introduce a new NetDKG handler class that owns thread management and network routing; simplify context constructors by removing BLS worker, masternode metadata, and quorum block processor dependencies; and decouple PeerManager from ActiveContext/ObserverContext by using a raw CActiveMasternodeManager* pointer for masternode detection. Related infrastructure changes include adding debug manager tracking methods, a spork-based DKG enablement check, and build configuration updates to include the new network handler sources.

Sequence Diagram(s)

The conditions for generating sequence diagrams are met. This PR introduces significant control flow changes with multi-component interactions across network message handling, phase execution, and context initialization.

sequenceDiagram
  participant Node as DKG Node
  participant NetDKG as NetDKG Handler
  participant SessionMgr as CDKGSessionManager
  participant SessionHdlr as CDKGSessionHandler
  participant ActiveDKG as ActiveDKGSession
  
  Node->>NetDKG: ProcessMessage(QCONTRIB)
  NetDKG->>SessionMgr: ForEachHandler(route message)
  SessionMgr->>SessionHdlr: PushPendingMessage(serialized)
  Note over SessionHdlr: Batches message in queue
  
  NetDKG->>NetDKG: HandleDKGRound()
  loop per_phase
    NetDKG->>SessionHdlr: ProcessPendingMessageBatch()
    SessionHdlr->>ActiveDKG: Contribute()
    ActiveDKG-->>SessionHdlr: std::optional<CDKGContribution>
    SessionHdlr->>NetDKG: RelayInvToParticipants()
  end
sequenceDiagram
  participant Init as Initialization
  participant ActiveCtx as ActiveContext
  participant PeerMgr as PeerManager
  participant NetDKG as NetDKG Handler
  participant Spork as CSporkManager
  
  Init->>ActiveCtx: construct(dmnman, qman, qsnapman, sigman)
  Init->>PeerMgr: make(nodeman=active_ctx->nodeman.get())
  Note over PeerMgr: m_nodeman set once
  Init->>NetDKG: construct(sporkman, dkgsman, qman)
  Init->>Spork: IsQuorumDKGEnabled()
  Note over NetDKG: Check spork for DKG enabled
  Init->>ActiveCtx: Start()
  Note over ActiveCtx: No connman/peerman params

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • UdjinM6
  • kwvg
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.67%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and concisely describes the main refactoring objective: breaking the circular dependency between net_processing and dkgsessionhandler.
Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the issues being fixed, what was done, and how it was tested.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

coderabbitai Bot left a comment

🧹 Nitpick comments (3)
src/active/dkgsession.cpp (1)

106-111: ⚡ Quick win

Move the sent* debug updates to the actual enqueue/broadcast path.

These methods now only build and return a message. Setting sentContributions, sentComplaint, sentJustification, and sentPrematureCommitment here records a successful send before NetDKG has actually serialized and queued/broadcast the payload.

Also applies to: 292-297, 382-387, 539-544
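A minimal sketch of the suggested ordering, assuming a builder that returns std::optional (as the walkthrough describes) and a send path that owns the outbound queue. DebugStatusSketch, BuildContributionSketch and SendContributionSketch are hypothetical; the std::deque stands in for whatever serialization and enqueue/broadcast NetDKG actually performs.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified debug status bits.
struct DebugStatusSketch {
    bool sent_contributions{false};
};

// Builder: only constructs the payload and never touches debug state.
std::optional<std::string> BuildContributionSketch(bool is_member)
{
    if (!is_member) return std::nullopt;
    return std::string{"qcontrib"};
}

// Send path: the "sent" flag is recorded only after the payload is queued.
bool SendContributionSketch(bool is_member, std::deque<std::string>& outbound, DebugStatusSketch& status)
{
    auto msg = BuildContributionSketch(is_member);
    if (!msg) return false;
    outbound.push_back(std::move(*msg)); // stands in for serialize + enqueue/broadcast
    status.sent_contributions = true;    // set after the enqueue, not in the builder
    return true;
}
```

With this split, a builder that returns a message which is later dropped never leaves a stale sent flag behind.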

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/active/dkgsession.cpp` around lines 106 - 111, The
dkgDebugManager.UpdateLocalSessionStatus calls inside the message-builder
functions (e.g., setting CDKGDebugSessionStatus::statusBits.sentContributions,
sentComplaint, sentJustification, sentPrematureCommitment) must be removed from
those builders (the functions that build and return qc/messages) and moved into
the actual send path inside NetDKG — i.e., the code that serializes and
enqueues/broadcasts the payload. Locate the UpdateLocalSessionStatus calls in
the builders and delete them there, then add equivalent UpdateLocalSessionStatus
updates immediately after NetDKG performs the serialization/queuing/broadcast so
the debug flags reflect a real successful send.
src/llmq/debug.cpp (1)

213-228: 💤 Low value

Optional: make MarkAborted idempotent w.r.t. nTime.

MarkAborted's lambda always returns true, so each call bumps localStatus.nTime even when the session was already marked aborted. MarkPhaseAdvanced already does the right thing (returns changed). For consistency and to avoid spurious timestamp updates if the helper is invoked more than once on the same aborted session, consider returning a real changed flag.

♻️ Proposed change
 void CDKGDebugManager::MarkAborted(Consensus::LLMQType llmqType, int quorumIndex)
 {
     UpdateLocalSessionStatus(llmqType, quorumIndex, [&](CDKGDebugSessionStatus& status) {
+        if (status.statusBits.aborted) return false;
         status.statusBits.aborted = true;
         return true;
     });
 }
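The idempotency argument can be checked with a small self-contained model, assuming (as the comment states) that UpdateLocalSessionStatus bumps the timestamp only when the lambda returns true. DebugManagerSketch, SessionStatusSketch and the explicit `now` parameter are hypothetical simplifications, not CDKGDebugManager's real interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical simplified session status.
struct SessionStatusSketch {
    bool aborted{false};
};

class DebugManagerSketch
{
public:
    // Applies func and bumps the timestamp only when func reports a change.
    void UpdateLocalSessionStatus(const std::function<bool(SessionStatusSketch&)>& func, std::int64_t now)
    {
        if (func(m_status)) m_time = now;
    }

    void MarkAborted(std::int64_t now)
    {
        UpdateLocalSessionStatus([](SessionStatusSketch& status) {
            if (status.aborted) return false; // already aborted: report "no change"
            status.aborted = true;
            return true;
        }, now);
    }

    std::int64_t Time() const { return m_time; }
    bool Aborted() const { return m_status.aborted; }

private:
    SessionStatusSketch m_status;
    std::int64_t m_time{0};
};
```

A second MarkAborted call leaves the timestamp at the moment the session was first aborted.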
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/debug.cpp` around lines 213 - 228, MarkAborted currently always
returns true from its UpdateLocalSessionStatus lambda which forces
localStatus.nTime to update every call; change the lambda in
CDKGDebugManager::MarkAborted to compute a changed flag by comparing
status.statusBits.aborted with the new value, set status.statusBits.aborted =
true, and return that changed flag (i.e., return whether status.statusBits.aborted was
previously false). This makes MarkAborted idempotent like MarkPhaseAdvanced and
avoids spurious nTime updates.
src/llmq/net_dkg.cpp (1)

449-482: 💤 Low value

Inconsistent dynamic_cast usage between Start() and Interrupt(); consider tightening shutdown.

Start() uses the throwing reference form (dynamic_cast<ActiveDKGSessionHandler&>) while Interrupt() uses the safe pointer form. Both iterate the same handler set and both early-return on m_active == nullptr, so the invariant is identical and the two should agree.

The reference form also has a small resilience gap: if the cast were ever to throw mid-iteration, the threads already pushed into m_phase_threads would never be joined, because ~NetDKG() only calls DisconnectManagers() (line 254), not Stop(). Either use the pointer form here as well, or have the destructor call Stop() defensively so a partially-initialized state still cleans up.

♻️ Proposed alignment with `Interrupt()`
     m_qdkgsman.ForEachHandler([this](CDKGSessionHandler& base) {
-        auto& handler = dynamic_cast<ActiveDKGSessionHandler&>(base);
-        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler.params.type), handler.QuorumIndex());
-        m_phase_threads.emplace_back([this, name = std::move(thread_name), &handler] {
-            util::TraceThread(name.c_str(), [this, &handler] { PhaseHandlerThread(handler); });
-        });
+        auto* handler = dynamic_cast<ActiveDKGSessionHandler*>(&base);
+        if (!Assume(handler != nullptr)) return;
+        std::string thread_name = strprintf("llmq-%d-%d", std23::to_underlying(handler->params.type), handler->QuorumIndex());
+        m_phase_threads.emplace_back([this, name = std::move(thread_name), handler] {
+            util::TraceThread(name.c_str(), [this, handler] { PhaseHandlerThread(*handler); });
+        });
     });
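The shutdown concern matters because destroying a std::thread that is still joinable calls std::terminate. Below is a minimal sketch combining both suggestions: the non-throwing pointer form of dynamic_cast and a destructor that joins defensively. BaseHandlerSketch, ActiveHandlerSketch and NetHandlerSketch are hypothetical stand-ins, and each thread body is a trivial counter instead of a real phase loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical handler hierarchy standing in for the session handler classes.
struct BaseHandlerSketch {
    virtual ~BaseHandlerSketch() = default;
};
struct ActiveHandlerSketch : BaseHandlerSketch {
    std::atomic<int> runs{0};
};

class NetHandlerSketch
{
public:
    explicit NetHandlerSketch(std::vector<BaseHandlerSketch*> handlers) : m_handlers{std::move(handlers)} {}
    // A joinable std::thread must not be destroyed, so always join on teardown.
    ~NetHandlerSketch() { Stop(); }

    void Start()
    {
        for (BaseHandlerSketch* base : m_handlers) {
            // Pointer form returns nullptr instead of throwing std::bad_cast.
            auto* handler = dynamic_cast<ActiveHandlerSketch*>(base);
            if (handler == nullptr) continue;
            m_threads.emplace_back([handler] { ++handler->runs; });
        }
    }

    void Stop()
    {
        for (auto& t : m_threads) {
            if (t.joinable()) t.join();
        }
        m_threads.clear();
    }

    std::size_t ThreadCount() const { return m_threads.size(); }

private:
    std::vector<BaseHandlerSketch*> m_handlers;
    std::vector<std::thread> m_threads;
};
```

Calling Stop() from the destructor is cheap once the threads are already joined, so it costs little to keep as a safety net.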
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llmq/net_dkg.cpp` around lines 449 - 482, Start() uses
dynamic_cast<ActiveDKGSessionHandler&> which can throw partway through filling
m_phase_threads and leave threads unjoined; make Start() mirror Interrupt() by
using the non-throwing pointer form (dynamic_cast<ActiveDKGSessionHandler*>)
when iterating m_qdkgsman.ForEachHandler so you only create threads for valid
handlers and avoid exceptions during the loop, ensuring m_phase_threads remains
consistent for later Stop() join; update the lambda in NetDKG::Start to check
the pointer, capture it safely, and call PhaseHandlerThread(handler) with the
pointer/ref as appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 757eb414-ab77-46e9-b643-a3f32d98e788

📥 Commits

Reviewing files that changed from the base of the PR and between 5fd84aa and 53be42b.

📒 Files selected for processing (25)
  • src/Makefile.am
  • src/active/context.cpp
  • src/active/context.h
  • src/active/dkgsession.cpp
  • src/active/dkgsession.h
  • src/active/dkgsessionhandler.cpp
  • src/active/dkgsessionhandler.h
  • src/init.cpp
  • src/llmq/debug.cpp
  • src/llmq/debug.h
  • src/llmq/dkgsession.h
  • src/llmq/dkgsessionhandler.cpp
  • src/llmq/dkgsessionhandler.h
  • src/llmq/dkgsessionmgr.cpp
  • src/llmq/dkgsessionmgr.h
  • src/llmq/net_dkg.cpp
  • src/llmq/net_dkg.h
  • src/llmq/observer.cpp
  • src/llmq/observer.h
  • src/llmq/options.cpp
  • src/llmq/options.h
  • src/net_processing.cpp
  • src/net_processing.h
  • src/test/util/setup_common.cpp
  • test/lint/lint-circular-dependencies.py
💤 Files with no reviewable changes (1)
  • test/lint/lint-circular-dependencies.py
