fix(parser): enhance handling of custom tags and markdown syntax#1904
meet-student wants to merge 1 commit into
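📝 Walkthrough
Overview: This PR refactors the protection mechanism for Markdown custom tags. It introduces recognition of native HTML tags and improves the placeholder strategy, so that custom component content is always passed through as plain text and is never processed by the Markdown parser.
Changes: custom tag protection mechanism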
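To make "recognition of native HTML tags" concrete, one plausible rule is that a tag is protected only if it is registered in the components map and is not a native HTML element. The sketch below assumes that rule; `NATIVE_HTML_TAGS` (deliberately abbreviated) and `shouldProtectTag` are hypothetical names, not code from the PR.

```typescript
// Hypothetical, deliberately abbreviated list of native HTML tag names.
const NATIVE_HTML_TAGS = new Set<string>([
  'a', 'b', 'br', 'code', 'div', 'em', 'h1', 'h2', 'h3', 'hr', 'i', 'img',
  'li', 'ol', 'p', 'pre', 'span', 'strong', 'table', 'td', 'th', 'tr', 'ul',
]);

// True when `tagName` is a registered custom component and not a native HTML
// element, i.e. its content should be passed through untouched.
function shouldProtectTag(tagName: string, components: Record<string, unknown>): boolean {
  if (NATIVE_HTML_TAGS.has(tagName.toLowerCase())) return false;
  return Object.prototype.hasOwnProperty.call(components, tagName);
}

// Usage: `div` is registered but is still treated as native HTML.
const components: Record<string, unknown> = { think: () => null, Demo: () => null, div: () => null };
const results = ['think', 'Demo', 'div', 'Other'].map((name) => shouldProtectTag(name, components));
// results → [true, true, false, false]
```

Excluding native names even when they appear in `components` keeps an override such as `{ a: MyLink }` from silently turning every link into raw text.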
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add
Code Review
This pull request modifies the markdown parser to protect custom component tags from being processed as markdown, ensuring their content is treated as plain text. Key changes include the introduction of a native HTML tag exclusion list, support for protecting unclosed tags during streaming, and a transition to a more robust placeholder system. Feedback highlights a performance concern in the restorePlaceholders method, where a regex-based approach is recommended over the current iterative split().join() implementation. Additionally, it was noted that the protectCustomTagNewlines option is now bypassed, potentially causing unintended breaking changes for users who rely on markdown parsing within custom components.
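The regex-based restore recommended here can be sketched as a small placeholder store. This is an illustration under assumptions, not the PR's code: `PlaceholderStore` and its methods are hypothetical names, and only the `\u0000XMDPLACEHOLDER<n>\u0000` token shape is taken from the review suggestion.

```typescript
// Same token shape as the review suggestion: NUL, marker, numeric id, NUL.
const PLACEHOLDER_RE = /\u0000XMDPLACEHOLDER\d+\u0000/g;

class PlaceholderStore {
  private readonly map = new Map<string, string>();
  private counter = 0;

  // Remembers `original` and returns a unique token that stands in for it.
  create(original: string): string {
    const token = `\u0000XMDPLACEHOLDER${this.counter++}\u0000`;
    this.map.set(token, original);
    return token;
  }

  // One scan over the content instead of one split()/join() per placeholder.
  restore(content: string): string {
    return content.replace(PLACEHOLDER_RE, (match) => this.map.get(match) ?? match);
  }
}

// Usage: tokens survive a (pretend) markdown pass and are swapped back afterwards.
const store = new PlaceholderStore();
const a = store.create('<think>**raw**</think>');
const b = store.create('<Demo />');
const restored = store.restore(`<p>${a}</p>\n<p>${b}</p>`);
// restored === '<p><think>**raw**</think></p>\n<p><Demo /></p>'
```

Because every token has the same fixed shape, one global regex finds all of them in a single pass, and an unknown token falls back to itself rather than being dropped.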
```ts
let restored = content;
placeholders.forEach((value, placeholder) => {
  restored = restored.split(placeholder).join(value);
});
return restored;
```
The current implementation of restorePlaceholders is inefficient for large documents or many placeholders because it performs a full string traversal (split().join()) for every single placeholder. Since placeholders are unique and non-overlapping, using a single regex replacement is much more performant.
```ts
return content.replace(/\u0000XMDPLACEHOLDER\d+\u0000/g, (match) => {
  return placeholders.get(match) ?? match;
});
```

```ts
const { protected: protectedContent, placeholders } = this.protectCustomTags(content);
const parsed = this.markdownInstance.parse(protectedContent) as string;
return this.restorePlaceholders(parsed, placeholders);
```
The protectCustomTagNewlines option is now effectively ignored as protectCustomTags is called unconditionally in the parse method. This represents a significant change in behavior: markdown syntax will no longer be parsed inside any custom components provided in the components prop.
If this is the intended new default behavior, the protectCustomTagNewlines prop should be deprecated or its documentation updated to reflect that it no longer controls this protection. Otherwise, the logic should respect the flag to allow users to opt-out of this behavior if they want markdown parsing within their custom components.
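One way to respect the flag, as suggested above, is to keep the protect/restore round trip behind it. The sketch below is standalone and assumption-based: `ParserOptionsSubset`, `parseWithOptionalProtection`, and the toy helpers are hypothetical stand-ins for the Parser class, `markdownInstance.parse`, `protectCustomTags`, and `restorePlaceholders`. It also assumes that a missing flag means "off".

```typescript
// Hypothetical subset of the parser options: only the flag discussed in the review.
interface ParserOptionsSubset {
  protectCustomTagNewlines?: boolean;
}

type Protected = { protected: string; placeholders: Map<string, string> };
type ParseFn = (src: string) => string;
type ProtectFn = (src: string) => Protected;
type RestoreFn = (src: string, placeholders: Map<string, string>) => string;

// Runs the protect/restore round trip only when the flag is on (missing = off).
function parseWithOptionalProtection(
  content: string,
  options: ParserOptionsSubset,
  parse: ParseFn,
  protect: ProtectFn,
  restore: RestoreFn,
): string {
  if (!options.protectCustomTagNewlines) return parse(content);
  const { protected: protectedContent, placeholders } = protect(content);
  return restore(parse(protectedContent), placeholders);
}

// Toy stand-ins: "parse" only handles **bold**; "protect" only handles <think>...</think>.
const toyParse: ParseFn = (src) => src.replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>');
const toyProtect: ProtectFn = (src) => {
  const placeholders = new Map<string, string>();
  const out = src.replace(/<think>[\s\S]*?<\/think>/g, (m) => {
    const key = `\u0000XMDPLACEHOLDER${placeholders.size}\u0000`;
    placeholders.set(key, m);
    return key;
  });
  return { protected: out, placeholders };
};
const toyRestore: RestoreFn = (src, placeholders) =>
  src.replace(/\u0000XMDPLACEHOLDER\d+\u0000/g, (m) => placeholders.get(m) ?? m);

const input = '<think>**a**</think> **b**';
const on = parseWithOptionalProtection(input, { protectCustomTagNewlines: true }, toyParse, toyProtect, toyRestore);
const off = parseWithOptionalProtection(input, {}, toyParse, toyProtect, toyRestore);
// on  === '<think>**a**</think> <strong>b</strong>'
// off === '<think><strong>a</strong></think> <strong>b</strong>'
```

With the flag off, the output is exactly what the plain Markdown pass produces, so existing users keep the old behavior.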
Bundle Report
Changes will increase total bundle size by 185.07kB (5.63%) ⬆️
Affected Assets, Files, and Routes:
Bundle x-markdown-array-push: Assets Changed:
Bundle antdx-array-push: Assets Changed:
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/x-markdown/src/XMarkdown/core/Parser.ts (1)
290-405: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
The placeholder scan here incorrectly affects code snippets, and the opening-tag matching is not robust enough.
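protectCustomTags() now runs a global regex scan directly over the raw markdown without skipping fenced code blocks or inline code. An example such as `<Demo>**x**</Demo>` is first replaced with a placeholder. After marked generates <code>...</code>, restorePlaceholders() turns it back into a real tag, so the code example is no longer escaped. In addition, an opening tag whose attribute value contains >, such as <Demo title="a > b">, is cut off too early. Ideally this should switch to token-based protection, or at least exclude code spans and code blocks before matching tags. Also applies to: 408-416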
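Below is a minimal sketch of both fixes. It assumes a regex-based approach rather than the token-based one the review prefers. Code ranges are computed first, and custom tags that start inside them are skipped. The opening-tag pattern accepts quoted attribute values that contain >. All names are hypothetical. The fence and code-span patterns are simplified relative to CommonMark: the closing fence must match the opening one exactly, and nested tags with the same name are not handled. An unclosed tag is protected up to the end of the input, which mirrors the streaming behavior the PR describes.

```typescript
// Opening tag whose attribute values may be quoted with " or ' and contain '>'.
// Groups: 1 = tag name, 2 = attributes, 3 = "/" for self-closing tags.
const OPEN_TAG_RE =
  /<([A-Za-z][\w-]*)((?:\s+[^\s"'>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?)*)\s*(\/?)>/;
// Simplified fenced code block (``` or ~~~) and inline code span patterns.
const FENCED_CODE_RE = /^(`{3,}|~{3,})[^\n]*\n[\s\S]*?^\1[ \t]*$/gm;
const INLINE_CODE_RE = /(`+)[\s\S]*?[^`]\1(?!`)/g;

// Returns [start, end) ranges covered by fenced blocks or inline code spans.
function codeRanges(src: string): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (const m of src.matchAll(FENCED_CODE_RE)) {
    const start = m.index ?? 0;
    ranges.push([start, start + m[0].length]);
  }
  // Blank out fenced blocks (keeping offsets) so their backticks are not re-read as spans.
  const blanked = src.replace(FENCED_CODE_RE, (block) => block.replace(/[^\n]/g, ' '));
  for (const m of blanked.matchAll(INLINE_CODE_RE)) {
    const start = m.index ?? 0;
    ranges.push([start, start + m[0].length]);
  }
  return ranges;
}

// Replaces registered custom tags (outside code) with placeholders.
function protectCustomTags(src: string, names: Set<string>) {
  const ranges = codeRanges(src);
  const inCode = (i: number) => ranges.some(([s, e]) => i >= s && i < e);
  const placeholders = new Map<string, string>();
  const openRe = new RegExp(OPEN_TAG_RE.source, 'g');
  let out = '';
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = openRe.exec(src)) !== null) {
    const [openTag, name, , selfClosing] = m;
    if (!names.has(name) || inCode(m.index)) continue; // not ours, or inside code
    let end = m.index + openTag.length;
    if (!selfClosing) {
      const close = src.indexOf(`</${name}>`, end);
      // Unclosed tag (e.g. mid-stream): protect everything up to the end of input.
      end = close === -1 ? src.length : close + name.length + 3;
    }
    const key = `\u0000XMDPLACEHOLDER${placeholders.size}\u0000`;
    placeholders.set(key, src.slice(m.index, end));
    out += src.slice(last, m.index) + key;
    last = end;
    openRe.lastIndex = end;
  }
  return { protected: out + src.slice(last), placeholders };
}
```

Code inside backticks or fences is left for marked to escape as usual, so restorePlaceholders() never injects a live tag into a <code> element.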
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/x-markdown/src/XMarkdown/core/Parser.ts` around lines 290 - 405, protectCustomTags currently scans raw markdown and mis-identifies custom tags inside code spans/blocks and breaks open-tag matching when attribute values contain >; update protectCustomTags to first detect and replace fenced code blocks and inline code spans with temporary placeholders (reused by restorePlaceholders) before running the custom-tag scan, or switch to a token-based approach if a Markdown tokenizer is available; also strengthen the openTagRegex used in protectCustomTags to allow attributes with quoted " or ' characters (e.g., match attributes with a pattern that accepts quoted strings) so opening tags like <Demo title="a > b"> are not truncated, and continue to use createPlaceholder / placeholders map for protected content so restorePlaceholders can reinstate originals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/x-markdown/src/XMarkdown/__tests__/Parser.test.ts`:
- Around line 123-132: The protectCustomTagNewlines option is unused: either
remove it from ParserOptions and from all constructor invocations (including
tests like the cases in Parser.test.ts that pass protectCustomTagNewlines) and
delete the two redundant tests, or explicitly mark it deprecated by adding a
JSDoc `@deprecated` to the protectCustomTagNewlines field in the
ParserOptions/type and update the tests to assert deprecation (or keep a comment
noting it's noop); ensure you also remove any references in the Parser
constructor signature and any default options handling (symbols to check:
Parser, ParserOptions, protectCustomTagNewlines, protectCustomTags()) so the
code and tests stay consistent.
In `@packages/x-markdown/src/XMarkdown/core/Parser.ts`:
- Around line 477-479: The parser currently always calls
protectCustomTags/restorePlaceholders inside parse(), ignoring the
ParserOptions.protectCustomTagNewlines flag and causing a silent breaking
change; update parse() to read the ParserOptions.protectCustomTagNewlines (or
this.options.protectCustomTagNewlines) and only run the protect/restore flow
(calls to protectCustomTags and restorePlaceholders) when that flag is true,
otherwise skip those calls and pass content straight to markdownInstance.parse;
keep references to the existing methods protectCustomTags, restorePlaceholders,
and the ParserOptions property protectCustomTagNewlines so callers' behavior
remains configurable.
---
Outside diff comments:
In `@packages/x-markdown/src/XMarkdown/core/Parser.ts`:
- Around line 290-405: protectCustomTags currently scans raw markdown and
mis-identifies custom tags inside code spans/blocks and breaks open-tag matching
when attribute values contain >; update protectCustomTags to first detect and
replace fenced code blocks and inline code spans with temporary placeholders
(reused by restorePlaceholders) before running the custom-tag scan, or switch to
a token-based approach if a Markdown tokenizer is available; also strengthen the
openTagRegex used in protectCustomTags to allow attributes with quoted " or '
characters (e.g., match attributes with a pattern that accepts quoted strings)
so opening tags like <Demo title="a > b"> are not truncated, and continue to use
createPlaceholder / placeholders map for protected content so
restorePlaceholders can reinstate originals.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b6859f1f-9fb7-453f-8f0b-ea6c6e4159fa
📒 Files selected for processing (3)
packages/x-markdown/src/XMarkdown/__tests__/Parser.test.ts
packages/x-markdown/src/XMarkdown/__tests__/index.test.tsx
packages/x-markdown/src/XMarkdown/core/Parser.ts
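@meet-student Hi, x-markdown currently follows the CommonMark specification, so this behavior is expected and is not a bug. According to the specification, content inside non-block-level HTML tags continues to be parsed as Markdown. If you need the content inside HTML tags to be treated as plain text (for example, with no further Markdown parsing), consider adding an optional configuration. When enabled, it would filter against an allowlist built from the passed-in components and handle matching tags according to custom rules. This keeps the parser spec-compliant while also covering this special case.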
So should we define an API for this?
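Since these tags have already been defined as custom-component HTML, shouldn't the content inside a custom component be a plain string that the user handles? And HTML tags in content that doesn't belong to a custom component would then be filtered? Content outside custom components follows the CommonMark specification, and the behavior there is as expected.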
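@meet-student Understood. I suggest enabling this explicitly through a configuration option (such as rawCustomComponents) rather than making it the default behavior: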
Chinese template
🤔 This is a ...
🔗 Related Issues
💡 Background and Solution
📝 Change Log
Summary by CodeRabbit
Release Notes
https://codesandbox.io/p/devbox/zi-ding-yi-zu-jian-antd-6-1-1-forked-3gsml5?file=%2Fdemo.tsx%3A52%2C36&workspaceId=ws_SZaFiNzC93UXcaQ8qB75mT
Tests
Bug Fixes