Markdown: fix table formatting alignment for CJK and full-width characters #3478
sxin0 wants to merge 2 commits into JetBrains:master from
CJK characters (Chinese, Japanese, Korean) occupy two display-width units in monospace fonts, but the table formatter used string length (`String.length`, a count of UTF-16 code units, not display columns) for column width calculation. This caused misaligned columns when tables contained CJK characters.

- Add `TableCharacterWidthUtils` for East Asian Width-aware display width calculation
- Update `TableFormattingUtils` and `TableModificationUtils` to use display width instead of string length
- Add `MarkdownCharacterGridCustomizer` to enable Character Grid mode in the editor for Markdown files, ensuring CJK characters render at exactly 2x ASCII character width
- Add unit tests for width calculation and an integration test for Chinese table formatting

Co-Authored-By: jiangshengxin <jiangshengxin@tal.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```kotlin
codePoint in 0x1F1E0..0x1F1FF -> true // Regional Indicator Symbols

else -> false
}
```
Missing CJK Symbols and Punctuation full-width range
Medium Severity
`isFullWidthCharacter` is missing the CJK Symbols and Punctuation range (U+3000–U+303F), which contains frequently used East Asian Wide and Fullwidth characters such as the ideographic space (U+3000), 、 (U+3001), 。 (U+3002), 「」 brackets, and 〈〉 angle brackets. These fall through to the `else -> 1` default in `getCharacterWidth` instead of returning width 2, so tables containing common CJK punctuation will still have misaligned columns. Because this function also drives the editor's `DoubleWidthCharacterStrategy`, the mismatch would compound in rendering.
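A reduced sketch of the reported gap and its fix, assuming two hypothetical classifiers (`isFullWidthBefore`, `isFullWidthAfter`) in place of the PR's `isFullWidthCharacter`: without the CJK Symbols and Punctuation block, 。 is measured as one column; with it, two. The sketch stops at U+303E because U+303F (IDEOGRAPHIC HALF FILL SPACE) is listed as Neutral in the Unicode `EastAsianWidth.txt` data, whereas the follow-up commit adds the whole U+3000–U+303F block.

```kotlin
// Hypothetical classifier before the fix: ideographs only.
fun isFullWidthBefore(cp: Int): Boolean = cp in 0x4E00..0x9FFF // CJK Unified Ideographs

// Hypothetical classifier after the fix: CJK punctuation counts as wide too.
fun isFullWidthAfter(cp: Int): Boolean = when (cp) {
    in 0x3000..0x303E -> true // CJK Symbols and Punctuation (U+303F is Neutral)
    in 0x4E00..0x9FFF -> true // CJK Unified Ideographs
    else -> false
}

// Sums 2 columns for code points the classifier marks wide, 1 otherwise.
fun widthOf(s: String, isWide: (Int) -> Boolean): Int {
    var width = 0
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        width += if (isWide(cp)) 2 else 1
        i += Character.charCount(cp)
    }
    return width
}
```

For the cell `中文、「表」` the first classifier yields 9 columns and the second 12; those three missing columns are the misalignment the review describes.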
```kotlin
codePoint in 0xA960..0xA97F -> true

// Hangul Jamo Extended-B
codePoint in 0xD7B0..0xD7FF -> true
```
Hangul Jamo range over-classifies narrow characters as wide
Low Severity
The Hangul Jamo range 0x1100..0x11FF is entirely classified as full-width, but per the Unicode East Asian Width property only 0x1100..0x115F (leading consonants) are Wide — characters 0x1160..0x11FF (vowels and trailing consonants) are Neutral/narrow. Similarly, Hangul Jamo Extended-B (0xD7B0..0xD7FF) is entirely Neutral but classified as full-width here. This over-counts the display width for these characters.
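A sketch of the narrowed Hangul ranges described in the review, assuming a hypothetical Hangul-only classifier `isWideHangul` and helper `hangulWidth` (the Extended-A range is copied from the PR's diff context). All of these code points are in the Basic Multilingual Plane, so iterating `Char`s is enough here.

```kotlin
// Hypothetical Hangul-only classifier following the review's ranges.
fun isWideHangul(cp: Int): Boolean = when (cp) {
    in 0x1100..0x115F -> true // Hangul Jamo leading consonants: Wide
    in 0xA960..0xA97F -> true // Hangul Jamo Extended-A (range as in the PR)
    in 0xAC00..0xD7A3 -> true // Precomposed Hangul Syllables: Wide
    // 0x1160..0x11FF (vowels, trailing consonants) and 0xD7B0..0xD7FF
    // (Jamo Extended-B) are Neutral, so they fall through to narrow.
    else -> false
}

// Width of BMP-only Hangul text: 2 columns for wide code points, 1 otherwise.
fun hangulWidth(s: String): Int =
    s.map { ch -> if (isWideHangul(ch.code)) 2 else 1 }.sum()
```

With these ranges, `hangulWidth("한국어")` is 6, while a vowel jamo such as U+1161 or a Jamo Extended-B code point such as U+D7B0 is counted as one column.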
- Add missing CJK Symbols and Punctuation range (U+3000-U+303F)
- Narrow Hangul Jamo to U+1100-U+115F (only leading consonants are Wide)
- Remove Hangul Jamo Extended-B (U+D7B0-U+D7FF, entirely Neutral)
- Add tests for CJK punctuation characters


Summary
- CJK characters (Chinese, Japanese, Korean) occupy two display-width units in monospace fonts, but the table formatter used `String.length` for column width calculation, causing misaligned columns
- Add `TableCharacterWidthUtils` for East Asian Width-aware display width calculation
- Update `TableFormattingUtils` and `TableModificationUtils` to use display width instead of string length
- Add `MarkdownCharacterGridCustomizer` to enable Character Grid mode for Markdown files in the editor, ensuring CJK characters render at exactly 2x ASCII character width (the same mechanism used by the terminal)

Before / After
Before fix (columns misaligned with CJK characters):
After fix (columns properly aligned):
Test plan
- `TableCharacterWidthUtilsTest`: unit tests for width calculation (ASCII, CJK, mixed, emoji, Japanese, Korean, full-width ASCII)
- `MarkdownTablePostFormatProcessorTest#chinese table test`: integration test for Chinese table formatting
- Open a `.md` file, paste a table with CJK characters, run Reformat Code (Opt+Cmd+L), and verify column alignment
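The integration test checks that formatted rows line up visually. A minimal sketch of the padding step, assuming a simplified width rule and the hypothetical helpers `cellWidth` and `formatTable` (not the PR's `TableFormattingUtils` API): each cell is padded by `columnWidth - displayWidth(cell)` spaces instead of a `String.length` difference, so every rendered line has the same display width even though the strings' lengths differ.

```kotlin
// Hypothetical, simplified width rule: CJK ideographs, Hangul syllables and
// fullwidth ASCII variants take 2 columns; everything else takes 1.
fun cellWidth(s: String): Int {
    var width = 0
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        val wide = cp in 0x4E00..0x9FFF || cp in 0xAC00..0xD7A3 || cp in 0xFF01..0xFF60
        width += if (wide) 2 else 1
        i += Character.charCount(cp)
    }
    return width
}

// Hypothetical formatter: the first row is the header (rows must be non-empty).
// Each cell is padded to its column's maximum display width, and a delimiter
// row of matching width is inserted after the header.
fun formatTable(rows: List<List<String>>): List<String> {
    val columns = rows.maxOf { it.size }
    val widths = IntArray(columns) { c ->
        rows.maxOf { row -> cellWidth(row.getOrElse(c) { "" }) }
    }
    fun render(cells: List<String>): String =
        (0 until columns).joinToString(" | ", "| ", " |") { c ->
            val cell = cells.getOrElse(c) { "" }
            cell + " ".repeat(widths[c] - cellWidth(cell))
        }
    val delimiter = (0 until columns).joinToString("-|-", "|-", "-|") { c -> "-".repeat(widths[c]) }
    return listOf(render(rows.first())) + delimiter + rows.drop(1).map { render(it) }
}
```

For the rows `名称 | value`, `a | 中文字`, `键 | x`, every output line is 17 columns wide, while the header line is only 15 `Char`s long and the delimiter line is 17; padding by `String.length` would therefore have shifted the header's closing pipe.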