fix(parser): enhance handling of custom tags and markdown syntax #1904

Open
meet-student wants to merge 1 commit into ant-design:main from meet-student:patch

Conversation

Member

@meet-student meet-student commented May 19, 2026

🤔 This is a ...

  • 🆕 New feature
  • 🐞 Bug fix
  • 📝 Site / documentation improvement
  • 📽️ Demo improvement
  • 💄 Component style improvement
  • 🤖 TypeScript definition improvement
  • 📦 Bundle size optimization
  • ⚡️ Performance optimization
  • ⭐️ Feature enhancement
  • 🌐 Internationalization
  • 🛠 Refactoring
  • 🎨 Code style optimization
  • ✅ Test Case
  • 🔀 Branch merge
  • ⏩ Workflow
  • ⌨️ Accessibility improvement
  • ❓ Other (about what?)

🔗 Related Issues

  • Describe the source of related requirements, such as links to relevant issue discussions.
  • For example: close #xxxx, fix #xxxx

💡 Background and Solution

  • The specific problem to be addressed.
  • List the final API implementation and usage if needed.
  • If there are UI/interaction changes, consider providing screenshots or GIFs.

📝 Change Log

| Language | Changelog |
| --- | --- |
| 🇺🇸 English | enhance handling of custom tags and markdown syntax |
| 🇨🇳 Chinese | - |

Summary by CodeRabbit

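Release Notes

https://codesandbox.io/p/devbox/zi-ding-yi-zu-jian-antd-6-1-1-forked-3gsml5?file=%2Fdemo.tsx%3A52%2C36&workspaceId=ws_SZaFiNzC93UXcaQ8qB75mT

  • Tests

    • Added test coverage for custom component content handling, including Markdown syntax and streaming parse scenarios.
  • Bug Fixes

    • Improved how the Markdown parser handles custom component content, so that text inside a custom component is preserved as plain text and is not misparsed as Markdown syntax.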

@dosubot dosubot Bot added the bug Something isn't working label May 19, 2026
Contributor

coderabbitai Bot commented May 19, 2026

📝 Walkthrough

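Overview

This PR reworks the protection mechanism for custom Markdown tags. It adds native HTML tag detection and improves the placeholder strategy so that custom component content is always passed through as plain text rather than being processed by the Markdown parser.

Changes

Custom tag protection mechanism

| Layer / File | Summary |
| --- | --- |
| Native tag detection and placeholder infrastructure: packages/x-markdown/src/XMarkdown/core/Parser.ts | Adds the NATIVE_HTML_TAGS constant set (lines 37-151) to distinguish native HTML tags, defines the CustomTagPlaceholder type to constrain the placeholder structure, and adjusts tag-name collection so that protection applies only to non-native tags. |
| Rewrite of the core protectCustomTags logic: packages/x-markdown/src/XMarkdown/core/Parser.ts | Rewrites the placeholder filling and return branches (lines 376-400). The complete opening tag, inner content, and closing tag are wrapped together into one placeholder, and unclosed tags also return a protected result through a placeholder, which improves the granularity and consistency of protection. |
| Placeholder restoration and parse flow update: packages/x-markdown/src/XMarkdown/core/Parser.ts | Changes restorePlaceholders to replace placeholders one at a time with split/join (lines 412-416), and changes parse to drop the conditional protection logic in favor of an unconditional protect, parse, restore flow (lines 477-479). |
| Parser unit tests: packages/x-markdown/src/XMarkdown/__tests__/Parser.test.ts | Adds and updates the protectCustomTagNewlines test cases (lines 84-111, 123-131) to verify that Markdown syntax inside custom tags stays plain text, that unclosed tags are handled, and that native HTML tags do not trigger protection. |
| Integration tests and streaming scenarios: packages/x-markdown/src/XMarkdown/__tests__/index.test.tsx | Adds integration tests verifying that custom components receive props.children as plain text (lines 179-200, 202-221), and covers the streaming case where a tag is not yet closed, ensuring the rendered output contains no unintended link elements. |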
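
To make the protect, parse, restore flow summarized above concrete, here is a minimal sketch. It is not the code from Parser.ts: `renderBold` is a stand-in for the real Markdown renderer, and the function names, the `\u0000PH<n>\u0000` placeholder format, and the simplified tag regex are all assumptions made for illustration.

```typescript
// Hypothetical sketch of a protect -> parse -> restore pipeline.
type Placeholders = Map<string, string>;

// Stand-in for a Markdown renderer: it only understands **bold**.
function renderBold(src: string): string {
  return src.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
}

// Replace each custom element (opening tag, content, closing tag) with one
// placeholder. An unclosed tag, as seen while streaming, is protected up to
// the end of the input. Tag names are assumed to be plain identifiers.
function protectCustomTags(
  src: string,
  customTags: string[],
): { protectedText: string; placeholders: Placeholders } {
  const placeholders: Placeholders = new Map();
  let protectedText = src;
  for (const tag of customTags) {
    const pattern = new RegExp(
      `<${tag}(?=[\\s/>])[^>]*>[\\s\\S]*?(?:</${tag}>|$)`,
      "g",
    );
    protectedText = protectedText.replace(pattern, (match: string) => {
      const key = `\u0000PH${placeholders.size}\u0000`;
      placeholders.set(key, match);
      return key;
    });
  }
  return { protectedText, placeholders };
}

function parse(src: string, customTags: string[]): string {
  const { protectedText, placeholders } = protectCustomTags(src, customTags);
  let html = renderBold(protectedText);
  // Put the original, untouched custom elements back.
  placeholders.forEach((original, key) => {
    html = html.split(key).join(original);
  });
  return html;
}
```

With this sketch, `parse("**a** <Demo>**b**</Demo>", ["Demo"])` renders the bold text outside the element and leaves `**b**` inside `<Demo>` untouched.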

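Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Custom tags arrive, and placeholders stand guard,
Native HTML is told apart, stream protection is unconditional,
Plain text stays inside, links can no longer hide,
Tests cover it all, and the code stands steadier!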

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: improved handling of custom tags and Markdown syntax, consistent with the core content of all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies the markdown parser to protect custom component tags from being processed as markdown, ensuring their content is treated as plain text. Key changes include the introduction of a native HTML tag exclusion list, support for protecting unclosed tags during streaming, and a transition to a more robust placeholder system. Feedback highlights a performance concern in the restorePlaceholders method, where a regex-based approach is recommended over the current iterative split().join() implementation. Additionally, it was noted that the protectCustomTagNewlines option is now bypassed, potentially causing unintended breaking changes for users who rely on markdown parsing within custom components.

Comment on lines +412 to +416
    let restored = content;
    placeholders.forEach((value, placeholder) => {
      restored = restored.split(placeholder).join(value);
    });
    return restored;
medium

The current implementation of restorePlaceholders is inefficient for large documents or many placeholders because it performs a full string traversal (split().join()) for every single placeholder. Since placeholders are unique and non-overlapping, using a single regex replacement is much more performant.

    return content.replace(/\u0000XMDPLACEHOLDER\d+\u0000/g, (match) => {
      return placeholders.get(match) ?? match;
    });
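
For comparison, the two restoration strategies can be put side by side in a small standalone sketch. It assumes placeholder keys have the form `\u0000XMDPLACEHOLDER<n>\u0000`, which is only inferred from the regex in the suggestion above; the function names are hypothetical.

```typescript
// Strategy from the diff: one full pass over the string per placeholder.
function restoreBySplitJoin(content: string, placeholders: Map<string, string>): string {
  let restored = content;
  placeholders.forEach((value, key) => {
    restored = restored.split(key).join(value);
  });
  return restored;
}

// Suggested strategy: a single pass that looks each match up in the map.
// Unknown keys are left in place rather than dropped.
function restoreByRegex(content: string, placeholders: Map<string, string>): string {
  return content.replace(
    /\u0000XMDPLACEHOLDER\d+\u0000/g,
    (match: string) => placeholders.get(match) ?? match,
  );
}
```

Both variants insert the stored value literally, so characters such as `$1` in the protected content are not interpreted as replacement patterns. One behavioral difference to keep in mind: the split/join loop would also expand a placeholder key that happens to appear inside an already restored value, while the single regex pass does not.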

Comment on lines +477 to +479
    const { protected: protectedContent, placeholders } = this.protectCustomTags(content);
    const parsed = this.markdownInstance.parse(protectedContent) as string;
    return this.restorePlaceholders(parsed, placeholders);
medium

The protectCustomTagNewlines option is now effectively ignored as protectCustomTags is called unconditionally in the parse method. This represents a significant change in behavior: markdown syntax will no longer be parsed inside any custom components provided in the components prop.

If this is the intended new default behavior, the protectCustomTagNewlines prop should be deprecated or its documentation updated to reflect that it no longer controls this protection. Otherwise, the logic should respect the flag to allow users to opt-out of this behavior if they want markdown parsing within their custom components.
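
The opt-out variant described in this comment could look roughly like the following sketch. Everything except the option name `protectCustomTagNewlines` is hypothetical: the renderer and the protect function are passed in as plain callbacks, the placeholder format is assumed, and treating the flag as off by default is an assumption rather than the library's documented default.

```typescript
interface SketchOptions {
  protectCustomTagNewlines?: boolean;
}

type Protect = (src: string) => { protectedText: string; placeholders: Map<string, string> };

// Run the protect/restore steps only when the flag is set; otherwise hand the
// content straight to the renderer, so Markdown inside custom tags is parsed.
function parseWithFlag(
  src: string,
  render: (s: string) => string,
  protect: Protect,
  options: SketchOptions = {},
): string {
  if (!options.protectCustomTagNewlines) {
    return render(src);
  }
  const { protectedText, placeholders } = protect(src);
  return render(protectedText).replace(
    /\u0000PH\d+\u0000/g,
    (key: string) => placeholders.get(key) ?? key,
  );
}

// Demo helpers: a bold-only renderer and protection for a <Demo> element.
const bold = (s: string): string => s.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");

const protectDemo: Protect = (src) => {
  const placeholders = new Map<string, string>();
  const protectedText = src.replace(/<Demo>[\s\S]*?<\/Demo>/g, (match: string) => {
    const key = `\u0000PH${placeholders.size}\u0000`;
    placeholders.set(key, match);
    return key;
  });
  return { protectedText, placeholders };
};
```

Callers that rely on Markdown being parsed inside their components keep that behavior unless they pass `{ protectCustomTagNewlines: true }`.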


codecov Bot commented May 19, 2026

Bundle Report

Changes will increase total bundle size by 185.07kB (5.63%) ⬆️⚠️, exceeding the configured threshold of 5%.

| Bundle name | Size | Change |
| --- | --- | --- |
| x-markdown-array-push | 1.37MB | 2.17kB (0.16%) ⬆️ |
| antdx-array-push | 2.11MB | 182.9kB (9.5%) ⬆️⚠️ |

Affected Assets, Files, and Routes:

view changes for bundle: x-markdown-array-push

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| latex.min.js | 2.17kB | 264.89kB | 0.83% |
| latex.min.css | -6 bytes | 24.39kB | -0.02% |

view changes for bundle: antdx-array-push

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| antdx.min.js | 182.9kB | 2.11MB | 9.5% ⚠️ |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/x-markdown/src/XMarkdown/core/Parser.ts (1)

290-405: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

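The placeholder scan here can hit code snippets by mistake, and the opening-tag match is not robust enough.

protectCustomTags() now runs a global regex scan directly over the raw Markdown without skipping fenced code or inline code. An example such as `<Demo>**x**</Demo>` is first replaced with a placeholder; after marked generates <code>...</code>, restorePlaceholders() turns it back into a real tag, so the code example ends up unescaped. In addition, an opening tag with > inside an attribute value, such as <Demo title="a > b">, is cut off early. It would be better to switch to token-based protection, or at least to exclude code spans and code blocks before matching tags.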

Also applies to: 408-416
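
One regex-level way to approach both points, short of a token-based rewrite, is sketched below. It is an illustration only: `Demo` stands in for a custom tag name, the placeholder format is assumed, the code detection is deliberately simplified (backtick fences and single-backtick spans only), and self-closing tags are ignored.

```typescript
// Group 1 matches code (fenced or inline) so that it can be skipped. The second
// alternative matches <Demo ...>...</Demo>, where attribute values may be
// quoted and may therefore contain ">".
const CODE_OR_DEMO =
  /(`{3}[\s\S]*?`{3}|`[^`\n]*`)|<Demo(?=[\s/>])(?:"[^"]*"|'[^']*'|[^'">])*>[\s\S]*?<\/Demo>/g;

function protectOutsideCode(src: string): {
  protectedText: string;
  placeholders: Map<string, string>;
} {
  const placeholders = new Map<string, string>();
  const protectedText = src.replace(
    CODE_OR_DEMO,
    (match: string, code: string | undefined) => {
      // Code spans and fences are returned unchanged, so the Markdown
      // renderer still sees and escapes them as code.
      if (code !== undefined) return match;
      const key = `\u0000PH${placeholders.size}\u0000`;
      placeholders.set(key, match);
      return key;
    },
  );
  return { protectedText, placeholders };
}
```

Because code is matched first and left in place, a tag that appears inside a code span never receives a placeholder, and the quoted-attribute alternatives keep `<Demo title="a > b">` in one piece.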

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/x-markdown/src/XMarkdown/core/Parser.ts` around lines 290 - 405,
protectCustomTags currently scans raw markdown and mis-identifies custom tags
inside code spans/blocks and breaks open-tag matching when attribute values
contain >; update protectCustomTags to first detect and replace fenced code
blocks and inline code spans with temporary placeholders (reused by
restorePlaceholders) before running the custom-tag scan, or switch to a
token-based approach if a Markdown tokenizer is available; also strengthen the
openTagRegex used in protectCustomTags to allow attributes with quoted " or '
characters (e.g., match attributes with a pattern that accepts quoted strings)
so opening tags like <Demo title="a > b"> are not truncated, and continue to use
createPlaceholder / placeholders map for protected content so
restorePlaceholders can reinstate originals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/x-markdown/src/XMarkdown/__tests__/Parser.test.ts`:
- Around line 123-132: The protectCustomTagNewlines option is unused: either
remove it from ParserOptions and from all constructor invocations (including
tests like the cases in Parser.test.ts that pass protectCustomTagNewlines) and
delete the two redundant tests, or explicitly mark it deprecated by adding a
JSDoc `@deprecated` to the protectCustomTagNewlines field in the
ParserOptions/type and update the tests to assert deprecation (or keep a comment
noting it's noop); ensure you also remove any references in the Parser
constructor signature and any default options handling (symbols to check:
Parser, ParserOptions, protectCustomTagNewlines, protectCustomTags()) so the
code and tests stay consistent.

In `@packages/x-markdown/src/XMarkdown/core/Parser.ts`:
- Around line 477-479: The parser currently always calls
protectCustomTags/restorePlaceholders inside parse(), ignoring the
ParserOptions.protectCustomTagNewlines flag and causing a silent breaking
change; update parse() to read the ParserOptions.protectCustomTagNewlines (or
this.options.protectCustomTagNewlines) and only run the protect/restore flow
(calls to protectCustomTags and restorePlaceholders) when that flag is true,
otherwise skip those calls and pass content straight to markdownInstance.parse;
keep references to the existing methods protectCustomTags, restorePlaceholders,
and the ParserOptions property protectCustomTagNewlines so callers' behavior
remains configurable.

---

Outside diff comments:
In `@packages/x-markdown/src/XMarkdown/core/Parser.ts`:
- Around line 290-405: protectCustomTags currently scans raw markdown and
mis-identifies custom tags inside code spans/blocks and breaks open-tag matching
when attribute values contain >; update protectCustomTags to first detect and
replace fenced code blocks and inline code spans with temporary placeholders
(reused by restorePlaceholders) before running the custom-tag scan, or switch to
a token-based approach if a Markdown tokenizer is available; also strengthen the
openTagRegex used in protectCustomTags to allow attributes with quoted " or '
characters (e.g., match attributes with a pattern that accepts quoted strings)
so opening tags like <Demo title="a > b"> are not truncated, and continue to use
createPlaceholder / placeholders map for protected content so
restorePlaceholders can reinstate originals.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b6859f1f-9fb7-453f-8f0b-ea6c6e4159fa

📥 Commits

Reviewing files that changed from the base of the PR and between 557c127 and f1cd335.

📒 Files selected for processing (3)
  • packages/x-markdown/src/XMarkdown/__tests__/Parser.test.ts
  • packages/x-markdown/src/XMarkdown/__tests__/index.test.tsx
  • packages/x-markdown/src/XMarkdown/core/Parser.ts

Comment thread packages/x-markdown/src/XMarkdown/__tests__/Parser.test.ts
Comment thread packages/x-markdown/src/XMarkdown/core/Parser.ts
Contributor

Div627 commented May 19, 2026

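@meet-student Hi, x-markdown currently follows the CommonMark specification, so this behavior is expected and is not a bug. Under the spec, content inside non-block-level HTML tags continues to be parsed as Markdown.

If what you need is for the content inside HTML tags to be treated as plain text (for example, with no further Markdown parsing), consider adding an optional setting: when enabled, it filters against a whitelist built from the passed-in components and handles matching tags with custom rules. That keeps the parser spec-compatible while still covering this special case.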

@meet-student
Member Author

So should we define an API for this?

@meet-student
Member Author

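Since these tags are already defined as custom components, shouldn't the content inside a custom component be a plain string that is left for the user to handle?

And should HTML tags in the content that have no matching custom component be filtered out?

Content outside custom components would still follow the CommonMark specification, where the current behavior is as expected.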

Contributor

Div627 commented May 19, 2026

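@meet-student Understood. I suggest enabling this explicitly through an option (for example rawCustomComponents) rather than making it the default behavior:

  1. Avoid a breaking change: enabling it by default would break the existing parsing of nested custom components;
  2. Stay spec-compatible: ordinary HTML still follows CommonMark, and only tags that match the components whitelist are treated as plain text.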
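
As a rough illustration of this proposal, the whitelist could be derived as follows. This is a sketch under assumptions: `rawCustomComponents` is only the example name floated in this thread and not an existing option, `SketchProps` and `rawTagNames` are hypothetical, and the native tag set is a tiny subset standing in for the NATIVE_HTML_TAGS set added by the PR.

```typescript
// Small illustrative subset of native HTML tag names.
const NATIVE_TAGS_SUBSET = new Set(["a", "div", "p", "span", "code", "pre"]);

interface SketchProps {
  components?: Record<string, unknown>;
  // Hypothetical opt-in flag; off by default to avoid a breaking change.
  rawCustomComponents?: boolean;
}

// Names of the tags whose content would be passed through as plain text.
// Nothing is raw unless the flag is on, and native HTML tags are never raw,
// so they keep following CommonMark.
function rawTagNames(props: SketchProps): string[] {
  if (!props.rawCustomComponents) return [];
  return Object.keys(props.components ?? {}).filter(
    (name) => !NATIVE_TAGS_SUBSET.has(name.toLowerCase()),
  );
}
```

The resulting list would then be the only set of tag names handed to the protection step, which leaves the default parse path unchanged for existing users.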

Labels

bug Something isn't working

2 participants