
feature:Support server-side load balancing for TC-to-RM reverse communication#8102

Open
YvCeung wants to merge 8 commits into apache:2.x from YvCeung:load_balance_to_client


YvCeung commented May 15, 2026


Ⅰ. Describe what this PR did

This PR introduces server-side load balancing for TC-to-RM reverse communication (branch commit/rollback), addressing the issue where ChannelManager.getChannel() always selects the first available channel with no distribution strategy, causing requests to concentrate on specific RM instances.

Core changes:

  1. New ServerLoadBalance SPI interface

  2. Three LB strategy implementations:

    • ServerRandomLoadBalance
    • ServerRoundRobinLoadBalance
    • ServerLeastActiveLoadBalance
  3. Simplified configuration system (ServerLoadBalanceFactory) with only two configuration keys:

    • server.loadBalance.at.type — LB algorithm for AT mode
    • server.loadBalance.tcc.type — LB algorithm for TCC mode

    If a key is unset or blank, the original priority-based channel selection logic is used (fully backward compatible). If a key is set, the specified LB algorithm is loaded via SPI.

  4. XA and SAGA are explicitly excluded: XA's second-phase operations are bound to the original RM's local database connection, and SAGA's state machine execution context is held in memory without distributed lock protection. Both modes always use the original priority-based channel selection.

  5. ChannelManager refactoring: getChannel() collects all active candidate channels and applies the LB algorithm when configured; getRmChannels() applies LB per resourceId. The original getChannelByPriority() logic is preserved as the default fallback path.
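The following is a minimal, self-contained sketch of the SPI shape and the three strategies listed above. It illustrates the approach under stated assumptions and is not the PR's code. `Candidate` is a hypothetical stand-in for an RM channel plus its `RpcContext`. The single-method `select(List<Candidate>)` signature is also an assumption, because the actual `ServerLoadBalance` interface is not shown on this page. Callers are assumed to pass a non-empty candidate list.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an RM channel and its RpcContext; only the
// in-flight request counter matters to these strategies.
final class Candidate {
    final String id;
    final AtomicInteger activeCount = new AtomicInteger();

    Candidate(String id) {
        this.id = id;
    }
}

// Assumed SPI shape: pick one channel from a non-empty list of active candidates.
interface ServerLoadBalance {
    Candidate select(List<Candidate> candidates);
}

// Uniform random choice.
final class ServerRandomLoadBalance implements ServerLoadBalance {
    @Override
    public Candidate select(List<Candidate> candidates) {
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }
}

// Round-robin over the candidate list using a shared counter.
final class ServerRoundRobinLoadBalance implements ServerLoadBalance {
    private final AtomicInteger sequence = new AtomicInteger();

    @Override
    public Candidate select(List<Candidate> candidates) {
        // floorMod keeps the index non-negative after the counter overflows.
        int index = Math.floorMod(sequence.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }
}

// Least-active: pick the candidate with the fewest in-flight requests,
// breaking ties uniformly at random.
final class ServerLeastActiveLoadBalance implements ServerLoadBalance {
    @Override
    public Candidate select(List<Candidate> candidates) {
        int least = Integer.MAX_VALUE;
        int ties = 0;
        Candidate chosen = null;
        for (Candidate c : candidates) {
            int active = c.activeCount.get();
            if (active < least) {
                least = active;
                chosen = c;
                ties = 1;
            } else if (active == least) {
                // Reservoir sampling: each tied candidate ends up equally likely.
                ties++;
                if (ThreadLocalRandom.current().nextInt(ties) == 0) {
                    chosen = c;
                }
            }
        }
        return chosen;
    }
}
```

Round-robin uses `Math.floorMod` so the index stays valid after the counter wraps past `Integer.MAX_VALUE`. Least-active breaks ties randomly, so several idle RMs share traffic instead of the first one absorbing all of it.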

Ⅱ. Does this pull request fix one issue?

fix #7758

Ⅲ. Why don't you add test cases (unit test/integration test)?

Unit tests are included:

  • ServerLoadBalanceFactoryTest — verifies that XA/SAGA return null, unconfigured AT/TCC return null, SPI loading works for all three strategies, and invalid type names are handled gracefully
  • ServerLoadBalanceBehaviorTest — verifies randomness for ServerRandomLoadBalance, even distribution for ServerRoundRobinLoadBalance, least-active selection for ServerLeastActiveLoadBalance, and the single-candidate edge case

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

  • Backward compatibility: When no LB type is configured (the default), the behavior is identical to the original logic: getChannelByPriority() is used, so channel selection in existing deployments is unchanged.
  • Why not reuse the LoadBalance interface from the discovery module: LoadBalance operates on InetSocketAddress and lives in the discovery module, while ChannelManager is in core. Making core depend on discovery would create a circular dependency. The server-side context (RpcContext with channel, applicationId, activeCount) is also fundamentally different from the client-side context.
  • Why ConsistentHash is not included: TC-to-RM requests are driven by the TC server. Server-side load balancing is opt-in via configuration (e.g. server.loadBalance.tcc.type). When enabled, it selects directly from all active RM candidates; when disabled, it falls back to the original exact-priority matching logic. Session affinity based on xid is unnecessary for AT mode (its second phase is stateless) and does not fit TCC's current design, where LB and priority-based selection are mutually exclusive strategies rather than layered fallbacks.
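The backward-compatible resolution path described in this PR could look roughly like the sketch below. XA/SAGA are always excluded, and an unset or blank key falls back to priority-based selection. This is a hedged sketch, not the PR's `ServerLoadBalanceFactory`. `ChannelSelector`, `LoadBalanceResolver`, the `registry` map (standing in for SPI loading through `EnhancedServiceLoader`) and the `byPriority` supplier (standing in for `getChannelByPriority()`) are all hypothetical. The two configuration keys are the ones named in the PR. Treating an unknown type name as a fallback follows the Copilot file summary ("falls back when unset/invalid").

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Local copy of the four Seata branch types; only AT and TCC are LB-eligible.
enum BranchType { AT, TCC, SAGA, XA }

// Hypothetical selector over RM channel ids.
interface ChannelSelector {
    String select(List<String> candidates);
}

// Hypothetical resolver mirroring the fallback rules described in the PR.
final class LoadBalanceResolver {
    private final Map<String, Supplier<ChannelSelector>> registry; // stand-in for SPI loading
    private final Map<String, String> config;                      // stand-in for Seata configuration

    LoadBalanceResolver(Map<String, Supplier<ChannelSelector>> registry, Map<String, String> config) {
        this.registry = registry;
        this.config = config;
    }

    // Returns null when the original priority-based selection should be used.
    ChannelSelector resolve(BranchType type) {
        String key;
        switch (type) {
            case AT:
                key = "server.loadBalance.at.type";
                break;
            case TCC:
                key = "server.loadBalance.tcc.type";
                break;
            default:
                return null; // XA and SAGA always keep priority-based selection
        }
        String name = config.get(key);
        if (name == null || name.isBlank()) {
            return null; // unset or blank: backward-compatible default
        }
        Supplier<ChannelSelector> factory = registry.get(name.trim());
        return factory == null ? null : factory.get(); // unknown name: fall back as well
    }

    // Uses the configured strategy if there is one, otherwise the priority-based path.
    String choose(BranchType type, List<String> candidates, Supplier<String> byPriority) {
        ChannelSelector selector = resolve(type);
        return selector == null ? byPriority.get() : selector.select(candidates);
    }
}
```

Returning `null` instead of a "priority" strategy object keeps the default path literally the old code path, which is the property the backward-compatibility note relies on.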

Copilot AI review requested due to automatic review settings May 15, 2026 10:50

Copilot AI left a comment


Pull request overview

This PR adds an opt-in, server-side load-balancing mechanism for TC→RM reverse calls (branch commit/rollback) so the TC can distribute requests across multiple active RM connections instead of always picking the first available channel.

Changes:

  • Introduces a ServerLoadBalance SPI and factory with configurable strategies for AT/TCC (Random, RoundRobin, LeastActive), with XA/SAGA excluded by design.
  • Refactors ChannelManager channel selection to optionally apply server-side LB and updates the TC→RM sync request path to pass transaction context.
  • Adds RpcContext.activeCount tracking (for least-active) and unit tests validating factory behavior and strategy selection.
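Least-active selection is only meaningful if the counter is decremented on every exit path, including timeouts. Below is a minimal sketch of that bookkeeping. It assumes a hypothetical `ChannelContext` class holding an `AtomicInteger`. The PR's actual `RpcContext` field and its call site in `AbstractNettyRemotingServer` may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical per-channel context; the PR adds a comparable activeCount to RpcContext.
final class ChannelContext {
    private final AtomicInteger activeCount = new AtomicInteger();

    int getActiveCount() {
        return activeCount.get();
    }

    // Marks the request as in flight for the duration of the call and always
    // decrements afterwards, even if the call throws (e.g. on timeout), so
    // least-active selection never sees a permanently inflated count.
    <T> T trackActive(Supplier<T> call) {
        activeCount.incrementAndGet();
        try {
            return call.get();
        } finally {
            activeCount.decrementAndGet();
        }
    }
}
```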

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| server/src/main/java/org/apache/seata/server/coordinator/DefaultCoordinator.java | Uses AT-aware RM channel selection for undo-log delete routing. |
| server/src/main/java/org/apache/seata/server/coordinator/AbstractCore.java | Passes xid/branchType into the remoting sync request path for server-side LB eligibility. |
| core/src/main/java/org/apache/seata/core/rpc/RpcContext.java | Adds activeCount to support least-active strategy selection. |
| core/src/main/java/org/apache/seata/core/rpc/RemotingServer.java | Adds an overload (default method) for sending sync requests with transaction context. |
| core/src/main/java/org/apache/seata/core/rpc/netty/AbstractNettyRemotingServer.java | Overrides the new sync request overload and updates activeCount around sync calls. |
| core/src/main/java/org/apache/seata/core/rpc/netty/ChannelManager.java | Adds LB-aware channel selection and a branchType-aware getRmChannels. |
| core/src/main/java/org/apache/seata/core/rpc/netty/loadbalance/ServerLoadBalance.java | New SPI interface for server-side channel selection. |
| core/src/main/java/org/apache/seata/core/rpc/netty/loadbalance/ServerLoadBalanceFactory.java | Loads configured LB implementations for AT/TCC via SPI; falls back when unset/invalid. |
| core/src/main/java/org/apache/seata/core/rpc/netty/loadbalance/ServerRandomLoadBalance.java | Random strategy implementation. |
| core/src/main/java/org/apache/seata/core/rpc/netty/loadbalance/ServerRoundRobinLoadBalance.java | Round-robin strategy implementation. |
| core/src/main/java/org/apache/seata/core/rpc/netty/loadbalance/ServerLeastActiveLoadBalance.java | Least-active strategy implementation using RpcContext.activeCount. |
| core/src/main/resources/META-INF/services/org.apache.seata.core.rpc.netty.loadbalance.ServerLoadBalance | Registers the new LB SPI implementations. |
| core/src/test/java/org/apache/seata/core/rpc/netty/loadbalance/ServerLoadBalanceFactoryTest.java | Adds unit tests for factory/SPI behavior. |
| core/src/test/java/org/apache/seata/core/rpc/netty/loadbalance/ServerLoadBalanceBehaviorTest.java | Adds behavior tests for the three LB strategies. |
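Adding the transaction-context overload to `RemotingServer` as a default method keeps existing implementations source-compatible. Here is a simplified sketch of that pattern. It uses a hypothetical `SyncSender` interface with plain `String`/`Object` parameters instead of the real `RemotingServer` signatures, and a hypothetical `ContextAwareSender` in place of the Netty server.

```java
// Hypothetical, simplified stand-in for RemotingServer; real signatures differ.
interface SyncSender {
    Object sendSyncRequest(String resourceId, String clientId, Object msg);

    // New overload carrying transaction context. The default ignores the extra
    // arguments and delegates, so existing implementations behave as before.
    default Object sendSyncRequest(String resourceId, String clientId, Object msg,
                                   String xid, String branchType) {
        return sendSyncRequest(resourceId, clientId, msg);
    }
}

// Hypothetical LB-aware implementation that actually uses the extra context.
final class ContextAwareSender implements SyncSender {
    @Override
    public Object sendSyncRequest(String resourceId, String clientId, Object msg) {
        return "priority:" + resourceId;
    }

    @Override
    public Object sendSyncRequest(String resourceId, String clientId, Object msg,
                                  String xid, String branchType) {
        // Only AT and TCC are eligible for server-side load balancing.
        if ("AT".equals(branchType) || "TCC".equals(branchType)) {
            return "lb:" + resourceId + ":" + xid;
        }
        return sendSyncRequest(resourceId, clientId, msg);
    }
}
```

An LB-aware server (in this PR, `AbstractNettyRemotingServer`) overrides the five-argument form. Any other implementation inherits the delegating default unchanged.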


Comment on lines +43 to +48
@Test
public void testAtNotConfiguredReturnsNull() {
    // When no type is configured, should return null (use original logic)
    ServerLoadBalance loadBalance = ServerLoadBalanceFactory.getInstance(BranchType.AT);
    Assertions.assertNull(loadBalance);
}
Comment on lines +77 to +82
@Test
public void testSpiLoadInvalidTypeThrowsException() {
    // Loading a non-existent LB type should throw EnhancedServiceNotFoundException
    Assertions.assertThrows(
            Exception.class, () -> EnhancedServiceLoader.load(ServerLoadBalance.class, "NonExistentLoadBalance"));
}

codecov Bot commented May 15, 2026


Codecov Report

❌ Patch coverage is 39.39394% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.57%. Comparing base (2f3de1c) to head (b1db50a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rg/apache/seata/core/rpc/netty/ChannelManager.java | 11.94% | 55 Missing and 4 partials ⚠️ |
| ...ta/core/rpc/netty/AbstractNettyRemotingServer.java | 0.00% | 10 Missing ⚠️ |
| ...pc/netty/loadbalance/ServerLoadBalanceFactory.java | 58.33% | 4 Missing and 1 partial ⚠️ |
| ...netty/loadbalance/ServerRoundRobinLoadBalance.java | 62.50% | 1 Missing and 2 partials ⚠️ |
| ...ain/java/org/apache/seata/core/rpc/RpcContext.java | 66.66% | 2 Missing ⚠️ |
| ...etty/loadbalance/ServerLeastActiveLoadBalance.java | 93.33% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##                2.x    #8102      +/-   ##
============================================
- Coverage     72.92%   72.57%   -0.36%     
+ Complexity      883      879       -4     
============================================
  Files          1327     1332       +5     
  Lines         50769    50895     +126     
  Branches       6058     6091      +33     
============================================
- Hits          37025    36935      -90     
- Misses        10735    10957     +222     
+ Partials       3009     3003       -6     
| Files with missing lines | Coverage Δ |
|---|---|
| ...java/org/apache/seata/core/rpc/RemotingServer.java | 100.00% <100.00%> (ø) |
| ...rpc/netty/loadbalance/ServerRandomLoadBalance.java | 100.00% <100.00%> (ø) |
| .../apache/seata/server/coordinator/AbstractCore.java | 82.40% <100.00%> (+5.40%) ⬆️ |
| ...e/seata/server/coordinator/DefaultCoordinator.java | 70.58% <100.00%> (+2.35%) ⬆️ |
| ...etty/loadbalance/ServerLeastActiveLoadBalance.java | 93.33% <93.33%> (ø) |
| ...ain/java/org/apache/seata/core/rpc/RpcContext.java | 91.86% <66.66%> (-1.89%) ⬇️ |
| ...netty/loadbalance/ServerRoundRobinLoadBalance.java | 62.50% <62.50%> (ø) |
| ...pc/netty/loadbalance/ServerLoadBalanceFactory.java | 58.33% <58.33%> (ø) |
| ...ta/core/rpc/netty/AbstractNettyRemotingServer.java | 81.74% <0.00%> (-7.05%) ⬇️ |
| ...rg/apache/seata/core/rpc/netty/ChannelManager.java | 58.00% <11.94%> (-12.04%) ⬇️ |

... and 32 files with indirect coverage changes





Development

Successfully merging this pull request may close these issues.

Implement Server-side Connection Management with Load Balancing for Client Calls

3 participants