
executor, distsql: reduce the log of dispatching mpp tasks#68041

Open
gengliqi wants to merge 4 commits into pingcap:master from gengliqi:reduce-mpp-logs

Conversation

@gengliqi
Contributor

@gengliqi gengliqi commented Apr 24, 2026

What problem does this PR solve?

Issue Number: close #68040

Problem Summary:

What changed and how does it work?

Aggregate the "Dispatch mpp task" log from one line per task to one line per fragment.
The aggregated log now records task IDs and TiFlash store IDs instead of verbose addresses.
This PR also adds TiFlash compute nodes to the store info map in disaggregated mode, so zone-based traffic statistics will not miss them.
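
To make the aggregation concrete, below is a minimal, self-contained Go sketch of the idea. The names tiFlashStoreInfo, mppTask, and aggregateDispatchInfo are illustrative stand-ins, not the exact identifiers used in local_mpp_coordinator.go.

package main

import "fmt"

// tiFlashStoreInfo mirrors the zone + store ID pair described in this PR;
// the field names are assumptions for illustration.
type tiFlashStoreInfo struct {
	zone    string
	storeID uint64
}

// mppTask is a simplified stand-in for a per-task dispatch request.
type mppTask struct {
	ID      int64
	Address string
}

// aggregateDispatchInfo collects task IDs and TiFlash store IDs for all tasks
// of one fragment so a single consolidated log line can be emitted, instead of
// one "Dispatch mpp task" line per task.
func aggregateDispatchInfo(tasks []mppTask, stores map[string]tiFlashStoreInfo) ([]int64, []uint64) {
	taskIDs := make([]int64, 0, len(tasks))
	storeIDs := make([]uint64, 0, len(tasks))
	for _, t := range tasks {
		taskIDs = append(taskIDs, t.ID)
		if info, ok := stores[t.Address]; ok {
			storeIDs = append(storeIDs, info.storeID)
		} else {
			storeIDs = append(storeIDs, 0) // store info lookup failed; ID unknown
		}
	}
	return taskIDs, storeIDs
}

func main() {
	stores := map[string]tiFlashStoreInfo{
		"127.0.0.1:3931": {zone: "z1", storeID: 128},
		"127.0.0.1:3930": {zone: "z2", storeID: 129},
	}
	tasks := []mppTask{{ID: 3, Address: "127.0.0.1:3931"}, {ID: 4, Address: "127.0.0.1:3930"}}
	ids, storeIDs := aggregateDispatchInfo(tasks, stores)
	// Produces one line per fragment, e.g. IDs=[3 4] storeIDs=[128 129].
	fmt.Printf("Dispatch mpp tasks IDs=%v storeIDs=%v\n", ids, storeIDs)
}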

Run TPC-H Q9 with 2 TiFlash nodes
Before

[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=1] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="Table(nation)->Send(2, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=2] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="RightHashJoin{Recv(1, )->Table(supplier)}(tpch100.nation.n_nationkey,tpch100.supplier.s_nationkey)->Projection->Send(5, 6, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=3] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="Table(lineitem)->Send(5, 6, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=4] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="Table(lineitem)->Send(5, 6, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=5] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="RightHashJoin{Recv(2, )->Recv(3, 4, )}(tpch100.supplier.s_suppkey,tpch100.lineitem.l_suppkey)->Projection->Send(9, 10, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=6] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="RightHashJoin{Recv(2, )->Recv(3, 4, )}(tpch100.supplier.s_suppkey,tpch100.lineitem.l_suppkey)->Projection->Send(9, 10, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=7] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="Table(part)->Sel([like(tpch100.part.p_name, %dim%, 92)])->Send(9, 10, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=8] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="Table(part)->Sel([like(tpch100.part.p_name, %dim%, 92)])->Send(9, 10, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=9] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="LeftHashJoin{Recv(5, 6, )->Recv(7, 8, )}(tpch100.lineitem.l_partkey,tpch100.part.p_partkey)->Projection->Send(13, 14, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=10] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="LeftHashJoin{Recv(5, 6, )->Recv(7, 8, )}(tpch100.lineitem.l_partkey,tpch100.part.p_partkey)->Projection->Send(13, 14, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.151 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=11] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="Table(partsupp)->Send(13, 14, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=12] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="Table(partsupp)->Send(13, 14, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=13] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="LeftHashJoin{Recv(9, 10, )->Recv(11, 12, )}(tpch100.lineitem.l_suppkey,tpch100.partsupp.ps_suppkey)(tpch100.lineitem.l_partkey,tpch100.partsupp.ps_partkey)->Projection->Send(17, 18, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=14] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="LeftHashJoin{Recv(9, 10, )->Recv(11, 12, )}(tpch100.lineitem.l_suppkey,tpch100.partsupp.ps_suppkey)(tpch100.lineitem.l_partkey,tpch100.partsupp.ps_partkey)->Projection->Send(17, 18, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=15] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="Table(orders)->Send(17, 18, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=16] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="Table(orders)->Send(17, 18, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=17] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="LeftHashJoin{Recv(13, 14, )->Recv(15, 16, )}(tpch100.lineitem.l_orderkey,tpch100.orders.o_orderkey)->Projection->Projection->Projection->HashAgg->Send(19, 20, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=18] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="LeftHashJoin{Recv(13, 14, )->Recv(15, 16, )}(tpch100.lineitem.l_orderkey,tpch100.orders.o_orderkey)->Projection->Projection->Projection->HashAgg->Send(19, 20, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=19] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3931] [plan="Recv(17, 18, )->HashAgg->Projection->Projection->Send(-1, )"] [mpp-version=3] [exchange-compression-mode=NONE] [GatherID=1] [resource_group=default]
[2026/04/24 17:53:53.152 +08:00] [INFO] [local_mpp_coordinator.go:239] ["Dispatch mpp task"] [timestamp=465836292997120001] [ID=20] [QueryTs=1777024433145415964] [LocalQueryId=2] [ServerID=1253] [address=127.0.0.1:3930] [plan="Recv(17, 18, )->HashAgg->Projection->Projection->Send(-1, )"] [mpp-version=3] [exchange-compression-mode=NONE] [GatherID=1] [resource_group=default]

After

[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[1]"] [storeIDs="[128]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="Table(nation)->Send(2, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[2]"] [storeIDs="[128]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="RightHashJoin{Recv(1, )->Table(supplier)}(tpch100.nation.n_nationkey,tpch100.supplier.s_nationkey)->Projection->Send(5, 6, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[3,4]"] [storeIDs="[128,129]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="Table(lineitem)->Send(5, 6, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[5,6]"] [storeIDs="[128,129]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="RightHashJoin{Recv(2, )->Recv(3, 4, )}(tpch100.supplier.s_suppkey,tpch100.lineitem.l_suppkey)->Projection->Send(9, 10, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[7,8]"] [storeIDs="[129,128]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="Table(part)->Sel([like(tpch100.part.p_name, %dim%, 92)])->Send(9, 10, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[9,10]"] [storeIDs="[128,129]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="LeftHashJoin{Recv(5, 6, )->Recv(7, 8, )}(tpch100.lineitem.l_partkey,tpch100.part.p_partkey)->Projection->Send(13, 14, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[11,12]"] [storeIDs="[128,129]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="Table(partsupp)->Send(13, 14, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[13,14]"] [storeIDs="[128,129]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="LeftHashJoin{Recv(9, 10, )->Recv(11, 12, )}(tpch100.lineitem.l_suppkey,tpch100.partsupp.ps_suppkey)(tpch100.lineitem.l_partkey,tpch100.partsupp.ps_partkey)->Projection->Send(17, 18, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.728 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[15,16]"] [storeIDs="[129,128]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="Table(orders)->Send(17, 18, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.729 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[17,18]"] [storeIDs="[128,129]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="LeftHashJoin{Recv(13, 14, )->Recv(15, 16, )}(tpch100.lineitem.l_orderkey,tpch100.orders.o_orderkey)->Projection->Projection->Projection->HashAgg->Send(19, 20, )"] [mpp-version=3] [exchange-compression-mode=FAST] [GatherID=1] [resource_group=default]
[2026/04/29 01:25:42.729 +08:00] [INFO] [local_mpp_coordinator.go:270] ["Dispatch mpp tasks"] [timestamp=465933996569657356] [IDs="[19,20]"] [storeIDs="[128,129]"] [QueryTs=1777397142717349452] [LocalQueryId=2] [ServerID=1359] [plan="Recv(17, 18, )->HashAgg->Projection->Projection->Send(-1, )"] [mpp-version=3] [exchange-compression-mode=NONE] [GatherID=1] [resource_group=default]

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Refactor
    • Optimized multi-task dispatch coordination to support broader store info (including disaggregated compute stores), improve dispatch efficiency, reduce per-task logging, and consolidate dispatch reporting.
  • Tests
    • Updated unit tests to match the new dispatch and store-info handling.

Note: Internal optimizations only; no visible end-user changes.

Signed-off-by: gengliqi <gengliqiii@gmail.com>
@ti-chi-bot

ti-chi-bot Bot commented Apr 24, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 24, 2026
@gengliqi gengliqi marked this pull request as ready for review April 24, 2026 12:47
@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 24, 2026
@tiprow

tiprow Bot commented Apr 24, 2026

Hi @gengliqi. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

appendMPPDispatchReq now takes a map of TiFlash store info (zone + storeID) instead of zone-only mapping; it aggregates task IDs and store IDs per fragment, uses the new map for exchange zoning, and emits a single consolidated dispatch log. Execute builds the store-info map (including compute stores when disaggregated) before calling appendMPPDispatchReq.

Changes

Cohort / File(s) and Summary:

  • MPP Coordinator core (pkg/executor/internal/mpp/local_mpp_coordinator.go): appendMPPDispatchReq signature changed to accept map[string]tiFlashStoreInfo (zone + storeID). It caches sink tasks, accumulates task IDs and TiFlash store IDs per fragment, sets the exchange zone using the new store-info map, and replaces per-task dispatch logs with a single consolidated "Dispatch mpp tasks" log. Execute now builds the TiFlash store-info map and includes compute stores when disaggregated.
  • Tests / Helpers (pkg/executor/internal/mpp/local_mpp_coordinator_test.go): mockTaskZoneInfoHelper updated to populate allTiFlashStoreInfo by converting storeZoneMpp (map[string]string) into map[string]tiFlashStoreInfo{zone: ...}, and helper initialization adjusted to use the new field (a rough conversion sketch follows below).
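
As a rough, self-contained illustration of that test-helper conversion (the real code lives in local_mpp_coordinator_test.go; the type and function names below are assumptions), turning an address-to-zone map into an address-to-store-info map could look like:

package main

import "fmt"

// tiFlashStoreInfo is the zone + store ID pair from the walkthrough above;
// field names are assumptions for illustration.
type tiFlashStoreInfo struct {
	zone    string
	storeID uint64
}

// zonesToStoreInfo converts the old address -> zone mapping into the new
// address -> tiFlashStoreInfo mapping; in a mock helper the store ID can be
// left at its zero value.
func zonesToStoreInfo(storeZone map[string]string) map[string]tiFlashStoreInfo {
	out := make(map[string]tiFlashStoreInfo, len(storeZone))
	for addr, zone := range storeZone {
		out[addr] = tiFlashStoreInfo{zone: zone}
	}
	return out
}

func main() {
	fmt.Println(zonesToStoreInfo(map[string]string{"127.0.0.1:3931": "zone-east"}))
}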

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/M

Suggested reviewers

  • windtalker

Poem

🐰 I gathered stores and numbered each hare,
Zones and IDs tucked with tidy care.
One cheerful log now sings for the team,
No noisy hops to break the scheme.
Hooray — steady dispatch without a scare!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title clearly and concisely summarizes the main change: reducing the volume of logging when dispatching MPP tasks by aggregating logs.
  • Description check (✅ Passed): The description includes the required issue reference (close #68040), clearly explains the problem and the changes, and provides before-and-after log examples demonstrating the improvement.
  • Linked Issues check (✅ Passed): The PR addresses the linked issue #68040 by aggregating dispatch logs from per-task to per-fragment entries, reducing log volume while preserving relevant task IDs and store information.
  • Out of Scope Changes check (✅ Passed): All code changes are directly related to the stated objectives: aggregating MPP dispatch logging and including TiFlash compute nodes in the store info map for disaggregated mode.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed




@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/executor/internal/mpp/local_mpp_coordinator.go (1)

265-277: Aggregated dispatch log: consider a couple of small polish items.

The consolidation looks good and meets the PR goal. A few optional points worth considering:

  1. Log-string change is a breaking signal for operators. The message changed from the per-task "Dispatch mpp task" to "Dispatch mpp tasks". Any external log alerting/grep that matched the old string will silently go dark. Worth calling out in the release note (currently None) so operators can update dashboards/alerts.

  2. task-count is redundant with IDs. zap.Int64s("IDs", taskIDs) already encodes the count; zap.Int("task-count", len(taskIDs)) duplicates that info. Fine to keep for human readability, but you can drop it if terseness is preferred.

  3. Field-name style is mixed (timestamp, IDs, QueryTs, LocalQueryId, ServerID, addresses, mpp-version, GatherID, resource_group). This mirrors the legacy log so it’s not a regression, but while you’re touching this line it’s a cheap opportunity to pick one style (the rest of the file tends toward lower-kebab like mpp-version, task-id).

  4. firstTask.StartTs vs c.startTS. For a fragment dispatched by this coordinator these should always match; using c.startTS directly would avoid the implicit assumption and remove the need to read firstTask for that field.

No functional concerns — these are all nits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/internal/mpp/local_mpp_coordinator.go` around lines 265 - 277,
Restore the original log message and tidy redundant/mixed fields in the
consolidated dispatch log: in the log statement built in
local_mpp_coordinator.go (the logutil.BgLogger().Info call referencing
firstTask, taskIDs, pf.Sink, c.gatherID, rgName), change the message back to
"Dispatch mpp task" to avoid breaking alerts, remove the redundant
zap.Int("task-count", len(taskIDs)) since zap.Int64s("IDs", taskIDs) already
conveys count, replace firstTask.StartTs with c.startTS to avoid the implicit
dependency on firstTask, and normalize key names to a consistent lower-kebab
style (e.g., query-ts, local-query-id, server-id, gather-id, resource-group)
while keeping existing values (firstTask.MppQueryID.*, c.gatherID, rgName,
addresses, plannercore.ToString(pf.Sink), firstTask.MppVersion,
pf.Sink.GetCompressionMode()).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f3804ee6-fa49-4752-95de-249ffd50749a

📥 Commits

Reviewing files that changed from the base of the PR and between f2ebab0 and e1f8dff.

📒 Files selected for processing (1)
  • pkg/executor/internal/mpp/local_mpp_coordinator.go

@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 0% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.0719%. Comparing base (9d1ac81) to head (15870ae).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68041        +/-   ##
================================================
- Coverage   77.7558%   77.0719%   -0.6839%     
================================================
  Files          1990       1972        -18     
  Lines        551768     552429       +661     
================================================
- Hits         429032     425768      -3264     
- Misses       121816     126649      +4833     
+ Partials        920         12       -908     
Flag / Coverage Δ:

  • integration: 41.3648% <0.0000%> (+1.5630%) ⬆️

Flags with carried forward coverage won't be shown.

Component / Coverage Δ:

  • dumpling: 60.4888% <ø> (ø)
  • parser: ∅ <ø> (∅)
  • br: 50.0597% <ø> (-13.0338%) ⬇️

u
Signed-off-by: gengliqi <gengliqiii@gmail.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/executor/internal/mpp/local_mpp_coordinator.go (1)

265-277: Aggregated dispatch log — consider a brief comment on the invariant.

The aggregated log reuses firstTask.StartTs, firstTask.MppQueryID, and firstTask.MppVersion to represent the whole fragment. That is correct because all tasks produced for a single fragment share the same query/gather metadata, but it's not self-evident to a future reader skimming this method. A one-line comment above the log (e.g., "all tasks in the fragment share StartTs/MppQueryID/MppVersion") would make the intent obvious and prevent someone from later "fixing" this to a per-task loop.

Also note the positional correspondence between taskIDs[i] and addresses[i] is implicit; since this log is intended for operators rather than structured parsing, that's acceptable — just worth being aware of if downstream tooling ever starts consuming these fields.

✍️ Suggested clarifying comment
 	if len(tasks) > 0 {
+		// All tasks within a fragment share the same query-level metadata
+		// (StartTs/MppQueryID/MppVersion), so it's safe to take them from tasks[0].
+		// taskIDs[i] corresponds positionally to addresses[i].
 		firstTask := tasks[0]
 		logutil.BgLogger().Info("Dispatch mpp tasks", zap.Uint64("timestamp", firstTask.StartTs),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/internal/mpp/local_mpp_coordinator.go` around lines 265 - 277,
Add a one-line clarifying comment immediately above the aggregated dispatch log
in LocalMPPCoordinator (the block using firstTask, tasks, taskIDs, addresses,
and c.gatherID) stating the invariant that all tasks in this fragment share the
same StartTs, MppQueryID, and MppVersion so using firstTask values is
intentional; also note that taskIDs[index] corresponds positionally to
addresses[index] to avoid future accidental per-task refactors or
misinterpretation of the logged arrays.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f45b6296-ebb4-4063-bce9-6312f5dd607a

📥 Commits

Reviewing files that changed from the base of the PR and between e1f8dff and 546b053.

📒 Files selected for processing (1)
  • pkg/executor/internal/mpp/local_mpp_coordinator.go

u
Signed-off-by: gengliqi <gengliqiii@gmail.com>
@ti-chi-bot ti-chi-bot Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 28, 2026
@gengliqi
Contributor Author

/cc @windtalker

@ti-chi-bot ti-chi-bot Bot requested a review from windtalker April 28, 2026 17:27
@gengliqi
Contributor Author

/cc @xzhangxian1008

@ti-chi-bot ti-chi-bot Bot requested a review from xzhangxian1008 April 28, 2026 17:28
if config.GetGlobalConfig().DisaggregatedTiFlash {
	computeStores, getStoreErr := cache.GetTiFlashComputeStores(
		backoff.NewBackoffer(ctx, copr.CopNextMaxBackoff).TiKVBackoffer())
	if getStoreErr == nil {
Contributor


what happens if getStoreErr is not nil?

Contributor Author


Traffic that could cross zones might be counted as traffic within the same zone, and the store ID in the logs might be 0.

Contributor

@xzhangxian1008 xzhangxian1008 left a comment


LGTM

@ti-chi-bot

ti-chi-bot Bot commented Apr 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xzhangxian1008
Once this PR has been reviewed and has the lgtm label, please assign nrc for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 30, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 30, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-30 03:11:05.948148215 +0000 UTC m=+2826671.153508272: ☑️ agreed by xzhangxian1008.

@gengliqi gengliqi requested a review from windtalker April 30, 2026 06:36
Signed-off-by: gengliqi <gengliqiii@gmail.com>

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/executor/internal/mpp/local_mpp_coordinator.go (1)

988-995: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle GetTiFlashComputeStores failures explicitly instead of silent fallback.

When getStoreErr != nil, the code silently skips compute stores. That can leave allTiFlashStoreInfo incomplete, which later causes unknown zones / storeID=0 in aggregated dispatch logs and zone-traffic misclassification.

Suggested fix
 		if config.GetGlobalConfig().DisaggregatedTiFlash {
 			computeStores, getStoreErr := cache.GetTiFlashComputeStores(
 				backoff.NewBackoffer(ctx, copr.CopNextMaxBackoff).TiKVBackoffer())
-			if getStoreErr == nil {
+			if getStoreErr != nil {
+				logutil.BgLogger().Warn("failed to get tiflash compute stores for mpp dispatch",
+					zap.Error(getStoreErr),
+					zap.Uint64("txnStartTS", c.startTS),
+					zap.Uint64("gatherID", c.gatherID))
+			} else {
 				for _, tiflashStore := range computeStores {
 					addTiFlashStoreInfo(allTiFlashStoreInfo, tiflashStore)
 				}
 			}
 		}

As per coding guidelines: "Keep error handling actionable and contextual; avoid silently swallowing errors."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/internal/mpp/local_mpp_coordinator.go` around lines 988 - 995,
The code currently swallows errors from cache.GetTiFlashComputeStores causing
missing entries in allTiFlashStoreInfo; update the DisaggregatedTiFlash branch
to handle getStoreErr explicitly: check if getStoreErr != nil and then either
return or propagate the error (or at minimum log it with contextual information
including the backoffer, ctx/region/store identifiers and the fact it's
populating allTiFlashStoreInfo) before skipping the loop; keep the existing
successful path that calls addTiFlashStoreInfo for each tiflashStore, but do not
silently ignore failures from GetTiFlashComputeStores.
🧹 Nitpick comments (1)
pkg/executor/internal/mpp/local_mpp_coordinator.go (1)

282-293: ⚡ Quick win

Consider capping IDs/storeIDs log payload size for very large fragments.

Aggregating per fragment is great, but logging full arrays can still create huge single log entries at scale. Consider logging count + truncated samples to keep log cost predictable.

Possible approach
+		const maxLogged = 128
+		loggedTaskIDs := taskIDs
+		loggedStoreIDs := storeIDs
+		truncated := false
+		if len(loggedTaskIDs) > maxLogged {
+			loggedTaskIDs = loggedTaskIDs[:maxLogged]
+			loggedStoreIDs = loggedStoreIDs[:maxLogged]
+			truncated = true
+		}
 		logutil.BgLogger().Info("Dispatch mpp tasks", zap.Uint64("timestamp", firstTask.StartTs),
-			zap.Int64s("IDs", taskIDs), zap.Uint64s("storeIDs", storeIDs),
+			zap.Int("taskCount", len(taskIDs)),
+			zap.Bool("idsTruncated", truncated),
+			zap.Int64s("IDs", loggedTaskIDs), zap.Uint64s("storeIDs", loggedStoreIDs),
 			...
 		)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/internal/mpp/local_mpp_coordinator.go` around lines 282 - 293,
The "Dispatch mpp tasks" log is currently emitting entire taskIDs and storeIDs
arrays (variables taskIDs and storeIDs) which can blow up log size; change the
log to instead emit their counts and small truncated samples (e.g.,
len(taskIDs)/len(storeIDs) plus first N elements or a hash/summary) while
keeping the existing context fields (firstTask.MppQueryID.*,
firstTask.MppVersion, pf.Sink plan/compression, c.gatherID, rgName, sqlDigest,
planDigestStr) so callers like the Dispatch mpp tasks logging site produce
predictable, bounded-sized entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8d5b971d-e2d2-4071-91c0-92e96de9eadd

📥 Commits

Reviewing files that changed from the base of the PR and between e5c4c54 and 15870ae.

📒 Files selected for processing (1)
  • pkg/executor/internal/mpp/local_mpp_coordinator.go


Labels

  • needs-1-more-lgtm: Indicates a PR needs 1 more LGTM.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

executor, distsql: reduce the log of dispatching mpp tasks

3 participants