feat(drive): add +list-comments shortcut with anchor-aware filtering#1114

Draft
kyalpha313 wants to merge 4 commits into
larksuite:mainfrom
kyalpha313:feat/drive-list-comments-shortcut
Conversation

@kyalpha313

@kyalpha313 kyalpha313 commented May 26, 2026

Contributor

Summary

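Adds the drive +list-comments shortcut, which returns docx comments with smart defaults that match the comment cards users see in the Feishu UI side panel. See #1111 for background.

This PR has been rewritten on top of the fields documented in #1258 (need_relation=true / relation.relation.positionInfo.blockID / content_deleted / parent_type / parent_token). It replaces the earlier workaround of client-side quote text matching plus byte-offset projection.

CLI shape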

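# Default: filter = unresolved + valid anchor; output = comments with replies + reactions; order = anchor order of appearance
lark-cli drive +list-comments --doc <docx-url-or-token>

# Opt-in extensions
--include-orphaned   # also include comments with orphaned anchors (appended at the end)
--include-resolved   # also include resolved comments
--no-reactions       # skip fetching reactions (saves one API round trip)
--order=created      # sort by create_time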

Output contract

Reuses the original drive file.comments list response; each comment gains five derived fields:

{
  "comment_id": "...",
  "quote": "...",
  "is_solved": false,
  "is_whole": false,
  "anchor_state": "valid" | "structural" | "orphaned",
  "anchor_position": 12,
  "anchor_block_id": "...",                              // parsed from positionInfo.blockID
  "content_deleted": false,                              // server-side orphan signal
  "location_accuracy": "relation_exact"                  // see the table below
                      | "parent_resource_exact"
                      | "weak_inferred"
                      | "content_deleted"
                      | "whole_document",
  "reply_list": { "replies": [ { ..., "reactions": [...] } ] }
}

The outer envelope also carries counts: {total, valid, structural, orphaned} and file_token / wiki_token, so callers can assert on them.
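As a rough illustration of how a caller might consume this contract, the sketch below decodes a list of comment items and recomputes the per-state counts. The type and function names (CommentItem, Counts, decodeItems, tallyCounts) are hypothetical, and decoding a bare JSON array is an assumption: the PR does not state which envelope key holds the items.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CommentItem mirrors a subset of the fields listed in the output contract.
// The struct itself is hypothetical; only the JSON keys come from the PR.
type CommentItem struct {
	CommentID        string `json:"comment_id"`
	Quote            string `json:"quote"`
	IsSolved         bool   `json:"is_solved"`
	IsWhole          bool   `json:"is_whole"`
	AnchorState      string `json:"anchor_state"`
	AnchorPosition   int    `json:"anchor_position"`
	AnchorBlockID    string `json:"anchor_block_id"`
	ContentDeleted   bool   `json:"content_deleted"`
	LocationAccuracy string `json:"location_accuracy"`
}

// Counts mirrors the outer counts object.
type Counts struct {
	Total      int `json:"total"`
	Valid      int `json:"valid"`
	Structural int `json:"structural"`
	Orphaned   int `json:"orphaned"`
}

// decodeItems parses a bare JSON array of comment items (assumed shape).
func decodeItems(raw []byte) ([]CommentItem, error) {
	var items []CommentItem
	if err := json.Unmarshal(raw, &items); err != nil {
		return nil, err
	}
	return items, nil
}

// tallyCounts recomputes the per-state counts from anchor_state.
func tallyCounts(items []CommentItem) Counts {
	c := Counts{Total: len(items)}
	for _, it := range items {
		switch it.AnchorState {
		case "valid":
			c.Valid++
		case "structural":
			c.Structural++
		case "orphaned":
			c.Orphaned++
		}
	}
	return c
}

func main() {
	raw := []byte(`[
		{"comment_id": "a", "anchor_state": "valid", "location_accuracy": "relation_exact"},
		{"comment_id": "b", "anchor_state": "orphaned", "content_deleted": true}
	]`)
	items, err := decodeItems(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", tallyCounts(items))
}
```

A script could compare the recomputed tally with the counts object in the response as a cheap consistency check.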

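Location accuracy tiers

| location_accuracy | Criterion | Output semantics |
| --- | --- | --- |
| relation_exact | positionInfo.blockID is found in the block tree returned by docs +fetch --detail with-ids | Located exactly at a block |
| parent_resource_exact | Comment inside an embedded Sheet/Base/whiteboard; only the parent block is known | Located exactly at the embedded resource; drill into it with the matching skill |
| content_deleted | Server returns content_deleted=true | Quoted content was deleted; cannot be located |
| whole_document | Whole-document comment with is_whole=true | Document level, no anchor |
| weak_inferred | Fallback (should not trigger in the current MVP) | Inferred; the source of ambiguity must be stated |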
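The tiers above can be read as a small decision function. The following is a minimal sketch, not the PR's code: anchorFacts and locationAccuracy are hypothetical names, and the precedence (content_deleted first, then is_whole, then block lookup, then embedded parent) is an assumption, since the table does not define an order between the criteria.

```go
package main

import "fmt"

// anchorFacts collects the inputs the tiers depend on (hypothetical type).
type anchorFacts struct {
	ContentDeleted bool   // server-side content_deleted
	IsWhole        bool   // is_whole on the comment
	BlockID        string // positionInfo.blockID, empty if absent
	ParentType     string // parent_type, empty if absent
}

// Parent types named in the PR as embedded resources.
var embeddedParentTypes = map[string]bool{
	"SHEET_BLOCK":      true,
	"BITABLE_BLOCK":    true,
	"WHITEBOARD_BLOCK": true,
}

// locationAccuracy maps the facts to one tier. blockIndex maps a block ID
// to its position in the document. Precedence here is an assumption.
func locationAccuracy(f anchorFacts, blockIndex map[string]int) string {
	if f.ContentDeleted {
		return "content_deleted"
	}
	if f.IsWhole {
		return "whole_document"
	}
	if f.BlockID != "" {
		if _, ok := blockIndex[f.BlockID]; ok {
			return "relation_exact"
		}
	}
	if embeddedParentTypes[f.ParentType] {
		return "parent_resource_exact"
	}
	return "weak_inferred"
}

func main() {
	index := map[string]int{"blk1": 0, "blk2": 1}
	fmt.Println(locationAccuracy(anchorFacts{BlockID: "blk2"}, index))
	fmt.Println(locationAccuracy(anchorFacts{BlockID: "inner", ParentType: "SHEET_BLOCK"}, index))
	fmt.Println(locationAccuracy(anchorFacts{ContentDeleted: true, BlockID: "blk1"}, index))
}
```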

Internal flow

  1. Parse --doc; wiki URLs are unwrapped to a docx token via wiki/v2/spaces/get_node.
  2. Page through GET /open-apis/drive/v1/files/:file_token/comments, by default with need_relation=true + is_solved=false + need_reaction=true.
  3. A single POST /open-apis/docs_ai/v1/documents/:token/fetch pulls the body XML (format=xml, export_option.export_block_id=true) and builds a block ID index used for stable sorting.
  4. For each comment:
    • relation.content_deleted=true → orphaned + location_accuracy=content_deleted
    • Parse relation.relation (a nested JSON string) to get positionInfo.blockID → found in the block tree → valid + relation_exact; otherwise parent_resource_exact or a downgrade
    • parent_type ∈ {SHEET_BLOCK, BITABLE_BLOCK, WHITEBOARD_BLOCK} → match the embedded block by parent_token, downgrade to structural + parent_resource_exact
    • is_whole=true → valid + whole_document
  5. Sort by the order in which block IDs appear in the document (orphans last, optionally dropped); with --order=created, sort by creation time instead.
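Steps 4 and 5 can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the payload shape {"positionInfo":{"blockID":"..."}} is inferred from the field path relation.relation.positionInfo.blockID, the names relationPayload, parseAnchorBlockID, item, and sortByAnchor are hypothetical, and breaking ties by creation time is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// relationPayload models the nested JSON string in relation.relation.
// The shape is an assumption derived from the documented field path.
type relationPayload struct {
	PositionInfo struct {
		BlockID string `json:"blockID"`
	} `json:"positionInfo"`
}

// parseAnchorBlockID extracts the block ID; malformed or empty input
// yields "" so the caller can fall back to a weaker tier.
func parseAnchorBlockID(nested string) string {
	if nested == "" {
		return ""
	}
	var p relationPayload
	if err := json.Unmarshal([]byte(nested), &p); err != nil {
		return ""
	}
	return p.PositionInfo.BlockID
}

// item is a hypothetical, trimmed-down comment record.
type item struct {
	CommentID  string
	BlockID    string
	CreateTime int64
}

// sortByAnchor orders items by block position in the document; items whose
// block is not in the index go last. Ties fall back to creation time.
func sortByAnchor(items []item, blockIndex map[string]int) {
	sort.SliceStable(items, func(i, j int) bool {
		pi, oki := blockIndex[items[i].BlockID]
		pj, okj := blockIndex[items[j].BlockID]
		if oki != okj {
			return oki // anchored items sort before unanchored ones
		}
		if oki && pi != pj {
			return pi < pj
		}
		return items[i].CreateTime < items[j].CreateTime
	})
}

func main() {
	index := map[string]int{"blk1": 0, "blk2": 1}
	items := []item{
		{CommentID: "c1", BlockID: "gone", CreateTime: 1},
		{CommentID: "c2", BlockID: parseAnchorBlockID(`{"positionInfo":{"blockID":"blk2"}}`), CreateTime: 3},
		{CommentID: "c3", BlockID: "blk1", CreateTime: 2},
	}
	sortByAnchor(items, index)
	for _, it := range items {
		fmt.Println(it.CommentID, it.BlockID)
	}
}
```

Sorting on a block ID index rather than a text offset keeps the order stable when the same quote text occurs more than once in the document.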

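Live validation (real production docx)

One docx (originally 22 user-visible comments + 7 with orphaned anchors = 29):

| Item | Old algorithm (text matching) | New algorithm (this PR) |
| --- | --- | --- |
| Total comments | 29 | 32 (3 new) |
| Match with UI side panel | 22 valid, 2 of them wrong ("Known errors" misjudged as valid; "P2/P3 event clustering trigger" misjudged as orphan) | 22 valid + 2 structural, matching 21 of the original 22 UI comments (for the remaining 1, the presumed cause is that its block was deleted by a later document edit) |
| [Sticky note] | Wrongly sorted to the end of the list | Correctly sorted to #19 (the UI shows #17; the small gap comes from newer comments pushing it down) |
| "Experience-driven" multi-line callout | Wrongly sorted to #2 (position of the first text occurrence) | Correctly sorted to #18 |
| "Known errors" (short anchor occurring in several places) | Misjudged as valid | Correctly classified as orphan via content_deleted=true |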

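Known limitations

  • Ordering inside embedded resources: anchors of comments inside a Sheet/Base/whiteboard are collapsed into the parent block by the document XML. This PR deliberately downgrades them to parent_resource_exact rather than claiming exact internal order; drilling down is left to the sheet / bitable / whiteboard skills.
  • MVP scope: only docx (file_type=docx) is supported. Other file types are not located precisely via need_relation for now and should go through the skill for that resource.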

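Why a new shortcut instead of a flag on the existing command

  1. It leaves the existing behavior and output structure of drive file.comments list untouched, so scripts that depend on the raw counts do not break.
  2. The only comment-related shortcut under shortcuts/drive today is +add-comment; adding +list-comments establishes a naming pattern for later read/modify shortcuts such as +resolve-comment / +update-reply.
  3. The shortcut's smart defaults follow the precedent of +search / +inspect: by default it returns the view users most often want.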

Test Plan

  • go test -race -count=1 ./shortcuts/drive/...
  • go test -race -count=1 ./tests/cli_e2e/drive -run TestDriveListComments (dry-run E2E)
  • Real docx validation: ./lark-cli drive +list-comments --doc <real-docx-url> → 22 valid + 2 structural, composition consistent with the UI side panel (details under "Live validation" above)
  • gofmt -l shortcuts/drive/drive_list_comments.go shortcuts/drive/drive_list_comments_test.go tests/cli_e2e/drive/drive_list_comments_* produces no output
  • go vet ./shortcuts/drive/
  • go mod tidy leaves go.mod / go.sum unchanged
  • golangci-lint run --new-from-rev=origin/main reports no new issues
  • Opt-in live E2E: LARK_DRIVE_LIST_COMMENTS_E2E=1 go test ./tests/cli_e2e/drive -run TestDriveListCommentsWorkflow -v (user identity) passes; it creates a temporary docx → adds a block comment → lists and asserts relation → cleans up

Related Issues

🤖 Generated with Claude Code

@CLAassistant

CLAassistant commented May 26, 2026


CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths labels May 26, 2026
@coderabbitai

coderabbitai Bot commented May 26, 2026

📝 Walkthrough

Adds a new drive +list-comments shortcut that lists DOCX comments with smart defaults: it resolves wiki nodes to DOCX tokens, paginates comments, fetches DOCX XML, normalizes text for anchor matching, classifies anchors as valid/structural/orphaned, supports filtering/sorting, and includes dry-run and tests.

Changes

Drive +list-comments Shortcut Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Shortcut definition and input validation | shortcuts/drive/drive_list_comments.go | Defines the shortcut flags (--doc, --include-orphaned, --include-resolved, --no-reactions, --order) and implements parsing/validation for docx, wiki, and bare-token inputs. |
| API orchestration: planning, listing, and document fetch | shortcuts/drive/drive_list_comments.go | Dry-run emits multi-step plan (wiki resolve when needed), lists paginated comments to bounded limits, and fetches DOCX XML via docs AI fetch (format=xml). |
| Text normalization and anchor classification | shortcuts/drive/drive_list_comments.go | Normalizes XML/text (strip tags, unescape entities, collapse whitespace), projects raw positions to normalized coordinates, and classifies anchors: full-doc (valid), sticky (structural if readonly-block present, else orphaned), quote-based (valid if located, else orphaned). |
| Execution flow, sorting, and output formatting | shortcuts/drive/drive_list_comments.go | Main flow: resolve wiki → list comments → fetch/normalize document → build anchor-classified items → filter by orphan/resolved → sort (orphans last; anchor or created order) → output JSON with per-state counts. Includes map clone and quote truncation helpers. |
| Unit tests: input parsing and validation | shortcuts/drive/drive_list_comments_test.go | Tests parseListCommentsDocRef and validateListComments for URL/token parsing and validation error/success cases. |
| Unit tests: text normalization and anchor detection | shortcuts/drive/drive_list_comments_test.go | Tests normalizeDocContent/normalizeQuoteNeedle, buildCommentItem, sortCommentItems, and projectRawPosToNormalized (tag stripping, entity decoding, whitespace handling, first-line needle behavior, classification, orphan fallback, and sorting). |
| Integration tests: dry-run and end-to-end | shortcuts/drive/drive_list_comments_test.go | Tests DriveListComments.DryRun for docx/wiki (resolve step), filter-parameter behavior, and httpmock E2E tests verifying default orphan filtering and --include-orphaned behavior. |
| Shortcut registry integration | shortcuts/drive/shortcuts.go, shortcuts/drive/shortcuts_test.go | Registers DriveListComments in the shortcuts list and updates the test expectation to include "+list-comments". |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • larksuite/cli#947: Implements related wiki URL-to-{type,token} resolution used before listing DOCX comments.

Suggested reviewers

  • wittam-01
  • fangshuyu-768

Poem

🐰 I hopped through XML, tags set free,

I sniffed each quote where it ought to be,
Orphans tucked gently at the end,
Valid ones bloom, structural defend,
A rabbit's nod to tidy commentary.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: addition of a new +list-comments shortcut with anchor-aware filtering for the drive service.
Linked Issues check ✅ Passed The implementation successfully addresses all coding objectives from #1111: anchor-state detection (valid/structural/orphaned), document XML fetching, text normalization with HTML tag stripping, quote matching, sorting by anchor position, filtering flags, comprehensive test coverage, and proper output structure with derived fields.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the #1111 objectives: new shortcut implementation, comprehensive tests, and registration in the shortcuts module. No unrelated changes detected.
Description check ✅ Passed PR description is comprehensive and well-structured, covering summary, CLI usage, output contract, implementation details, test results, and known limitations.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@shortcuts/drive/drive_list_comments_test.go`:
- Around line 458-593: Both tests
(TestDriveListComments_E2E_FiltersOrphanedByDefault and
TestDriveListComments_E2E_IncludeOrphanedKeepsAll) rely on global config and
must isolate config state; add t.Setenv("LARKSUITE_CLI_CONFIG_DIR", t.TempDir())
at the start of each test (or in a shared test helper called before
cmdutil.TestFactory/mountAndRunDrive) so the tests use a temporary
LARKSUITE_CLI_CONFIG_DIR and avoid cross-test state leakage.
- Around line 86-132: Replace uses of common.TestNewRuntimeContext(...) in these
tests with the repo-standard test factory cmdutil.TestFactory(t,
&core.CliConfig{}); specifically, when building the runtime for
validateListComments calls (tests referencing newListCommentsCmd and
validateListComments), call cmdutil.TestFactory(t, &core.CliConfig{}) to obtain
the test factory/runtime instead of common.TestNewRuntimeContext, and update any
variable names as needed so validateListComments receives the factory/runtime
produced by cmdutil.TestFactory.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 177b228c-d297-4257-8756-8d8f50e92a48

📥 Commits

Reviewing files that changed from the base of the PR and between 1135fc2 and e35ed9e.

📒 Files selected for processing (4)
  • shortcuts/drive/drive_list_comments.go
  • shortcuts/drive/drive_list_comments_test.go
  • shortcuts/drive/shortcuts.go
  • shortcuts/drive/shortcuts_test.go

Comment thread shortcuts/drive/drive_list_comments_test.go
Comment thread shortcuts/drive/drive_list_comments_test.go
kyalpha313 added a commit to kyalpha313/lark-cli that referenced this pull request May 26, 2026
Add t.Setenv("LARKSUITE_CLI_CONFIG_DIR", t.TempDir()) to the two httpmock E2E
tests for drive +list-comments, matching the pattern used by
drive_inspect_test.go and AGENTS.md guidance for config-state isolation.

Addresses CodeRabbit review on larksuite#1114.
Comment thread shortcuts/drive/drive_list_comments.go Outdated
hasStructuralAnchor := strings.Contains(docXML, structuralAnchorTag)
// Approximate position of the FIRST sticky-note anchor in normalized space.
// All sticky-anchored comments share this position for sorting purposes.
structuralPos := projectRawPosToNormalized(docXML, normalized, structuralAnchorTag)

Collaborator

This computes a single structuralPos from the first <readonly-block> and then reuses that same position for every sticky-note comment. As a result, multiple structural comments do not actually sort by document position under the default order=anchor; they all tie on the same anchor position and then fall back to create_time. That does not match the PR description about approximating sticky-note order from XML position. If we cannot map individual sticky-note comments to distinct structural anchors, we should either avoid claiming anchor-order fidelity for them or make this limitation explicit in output and tests.

Comment thread shortcuts/drive/drive_list_comments.go Outdated
return ""
}
decoded := html.UnescapeString(firstLine)
return whitespacePattern.ReplaceAllString(decoded, "")

Collaborator

Rich-text quotes containing inline markup will be misclassified here. The PR description says quote matching should strip HTML tags before comparison, but normalizeQuoteNeedle only unescapes entities and removes whitespace. If the comments API returns formatting tags inside quote (for example <b> or <i>), the needle will still contain markup and fail to match the tag-stripped document text, causing a still-valid comment to be classified as orphaned and filtered out by default. Please strip tags here as well, and add a test where the quote itself contains inline markup.
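One way the suggested fix could look is sketched below. It is an assumption-laden illustration rather than the PR's code: normalizeNeedleSketch and tagPattern are hypothetical names, and the regular expression is a simple tag matcher that does not handle every HTML edge case. Tags are stripped before entities are unescaped so that literal text such as `&lt;b&gt;` is not mistaken for markup.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// tagPattern is a simple matcher for inline tags such as <b> or </i>.
	tagPattern = regexp.MustCompile(`<[^>]*>`)
	// whitespacePattern matches any run of whitespace.
	whitespacePattern = regexp.MustCompile(`\s+`)
)

// normalizeNeedleSketch takes the first line of a quote, strips inline
// tags, unescapes entities, and removes all whitespace.
func normalizeNeedleSketch(quote string) string {
	firstLine, _, _ := strings.Cut(strings.TrimSpace(quote), "\n")
	if firstLine == "" {
		return ""
	}
	stripped := tagPattern.ReplaceAllString(firstLine, "")
	decoded := html.UnescapeString(stripped)
	return whitespacePattern.ReplaceAllString(decoded, "")
}

func main() {
	fmt.Println(normalizeNeedleSketch("<b>Hello</b> &amp; <i>world</i>\nsecond line"))
}
```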

@wittam-01

wittam-01 commented May 28, 2026

Collaborator

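Thanks for this PR. I looked at the implementation: it currently matches each comment's quote against the body text and positions the comment by its starting offset in the XML content. There are a few problems:

  1. Comments on blocks (for example a comment on a callout block, whose quote is "[高亮块]", the callout placeholder) match nothing and are filtered out;
  2. If only a position is returned, later edits to the document shift offsets, and that position no longer locates the comment.

Also, next week the comments API will start returning the blockID where each comment sits. That is more accurate and covers block comments too, so you may want to wait for next week's release.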

@kyalpha313 kyalpha313 marked this pull request as draft May 30, 2026 00:51
@kyalpha313

Contributor Author

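@wittam-01 Thanks for the review and the detailed feedback. Both points are spot on:

  1. Block placeholders such as [高亮块] are indeed missed by my text matching. I only special-cased [Sticky note] (which maps to <readonly-block>); other block-type anchors are not covered.
  2. The unstable position is also a fundamental weakness of this approach. It only works for "fetch and sort immediately"; anything persisted will drift.

Good to hear the API will return blockID next week. That is the real fix: it identifies the anchor uniquely, covers block comments naturally, and even removes the need to fetch the body. I will move this PR to draft for now and rewrite it on top of blockID once the new API ships, which should also cut the code by more than half.

No need to show me the new field schema in advance; I will look at it next week. In the meantime, if there is anything I can help with (for example trying the new API early once it ships, or splitting out other small tools around it), ping me any time.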

…tests

Revert the t.Setenv("LARKSUITE_CLI_CONFIG_DIR", ...) added in 64e65b6. Per the
repo learning from PR larksuite#343, tests built via cmdutil.TestFactory(t, config) use
an in-memory config closure and never touch the filesystem, so isolating
CONFIG_DIR has no effect. Only the real NewDefault() factory path needs it.

Verified: cmdutil.TestFactory sets Config: func() (*core.CliConfig, error) {
return config, nil } — no filesystem access.
…#1258

Replaces the client-side quote/text-matching orphan detection with the
server-supplied anchor identifiers from `need_relation=true`, following
the field design documented in larksuite#1258.

Behavior:
- list comments with `need_relation=true` to obtain `relation.relation`
  (containing `positionInfo.blockID`) and `content_deleted`
- fetch docs XML with `export_block_id=true`; build a block-id index
  for stable anchor ordering (replaces the prior text-position
  projection that broke on multi-occurrence and tag-broken anchors)
- use `content_deleted` as the orphan signal (resolves the prior
  "short anchor multi-occurrence" false-positive limitation)
- use `parent_type` / `parent_token` to place embedded-resource
  comments (Sheet / Base / Whiteboard); intentionally downgrade these
  to `parent_resource_exact` rather than claiming exact internal
  ordering, since docs XML only identifies the embedding block

Output contract additions:
- `anchor_block_id`: per-comment anchor (from positionInfo.blockID)
- `location_accuracy`: relation_exact | parent_resource_exact |
  weak_inferred | content_deleted | whole_document
- `content_deleted`: server-side orphan signal

Tests:
- unit: relation parsing, malformed-JSON fallback, content-deleted
  orphan classification, relation-exact / parent-resource-exact
  anchoring, embedded-relation downgrade, sorted output, dry-run
  request shape, httpmock E2E
- dry-run E2E (`tests/cli_e2e/drive/drive_list_comments_dryrun_test.go`):
  asserts `need_relation=true`, `need_reaction=true`, `is_solved=false`,
  and fetch body `format=xml` with `export_block_id=true`
- opt-in live E2E
  (`tests/cli_e2e/drive/drive_list_comments_workflow_test.go`):
  gated by `LARK_DRIVE_LIST_COMMENTS_E2E=1`; user identity; creates
  a docx, adds a block comment, lists, asserts relation metadata,
  cleans up
- coverage doc updated to 11 / 32 = 34.4%

Live validation against a real docx: composition now matches the
Lark UI side panel for 21 of 22 originally observed comments (the
prior algorithm had a 2-item swap error from text-matching false
positives); `[Sticky note]` and embedded `<callout>` comments now
land at their correct positions.
@kyalpha313 kyalpha313 force-pushed the feat/drive-list-comments-shortcut branch from b0267d3 to b86407a Compare June 8, 2026 10:18
@kyalpha313

Contributor Author

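@wittam-01 Progress update:

PR #1114 has been rewritten to follow the field design in #1258 and force-pushed (based on a fresh rebase onto upstream/main@ee5113f). Key points:

  • Comments are listed with need_relation=true, and relation.relation (nested JSON) is parsed to get positionInfo.blockID
  • content_deleted is used directly as the orphan signal (replacing the earlier client-side text-matching heuristic)
  • parent_type + parent_token handle comments on embedded Sheet/Base/whiteboard resources, downgraded per the rule in your doc: "exact at the parent, drill down for the inside"
  • The output gains three new fields: anchor_block_id / location_accuracy / content_deleted. location_accuracy uses the tiers you defined, relation_exact / parent_resource_exact / content_deleted / whole_document (plus a weak_inferred fallback that should not trigger in the MVP)

Tested on a production docx (originally 29 comments): the new algorithm yields 22 valid + 2 structural + 8 orphan and matches 21/22 of the UI side panel (the remaining 1 is presumably a block deleted by a later edit; with content_deleted=true it falls into orphan naturally). The two earlier swap errors ("Known errors" / "P2/P3 event clustering trigger") are fully resolved, and [Sticky note] and the embedded callout now land at the correct positions.

The PR stays in draft for now. I will mark it ready once #1258 merges (or once you confirm these field assumptions are stable), in case the fields still change. I will update this thread after CI passes.

No rush; reply whenever convenient.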

@wittam-01

Collaborator

My PR is expected to merge tomorrow; it is still being verified. Is this PR still needed? You can already get the comment list with lark-cli drive file.comments list.

Labels

domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] drive +list-comments: return comments by default as users actually see them, with orphaned-anchor detection and position-based sorting

4 participants