Markdown: fix table formatting alignment for CJK and full-width characters #3478

Open
sxin0 wants to merge 2 commits into JetBrains:master from sxin0:fix/markdown-cjk-table-alignment

Conversation

sxin0 commented Apr 3, 2026

Summary

  • CJK characters (Chinese, Japanese, Korean) occupy two display-width units in monospace fonts, but the Markdown table formatter used String.length for column width calculation, causing misaligned columns
  • Add TableCharacterWidthUtils for East Asian Width-aware display width calculation
  • Update TableFormattingUtils and TableModificationUtils to use display width instead of string length
  • Add MarkdownCharacterGridCustomizer to enable Character Grid mode for Markdown files in the editor, ensuring CJK characters render at exactly 2x ASCII character width (same mechanism used by the terminal)
  • Add unit tests and integration test with Chinese table test data
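
The display-width idea in the summary above can be sketched roughly as follows. This is a minimal illustration, not the PR's TableCharacterWidthUtils: the names `isWideCodePoint`, `displayWidth`, and `padToDisplayWidth` are hypothetical, and the range list is a small, approximate subset of the East Asian Wide and Fullwidth code points.

```kotlin
// Hypothetical sketch: names and ranges are illustrative assumptions,
// not the actual TableCharacterWidthUtils from this PR.

/** Returns true for a deliberately small, approximate subset of wide code points. */
fun isWideCodePoint(codePoint: Int): Boolean = when (codePoint) {
    in 0x3040..0x30FF -> true // Hiragana and Katakana (approximate block range)
    in 0x4E00..0x9FFF -> true // CJK Unified Ideographs
    in 0xAC00..0xD7A3 -> true // Hangul Syllables
    in 0xFF01..0xFF60 -> true // Fullwidth ASCII variants
    else -> false
}

/** Sums 2 cells per wide code point and 1 cell for everything else. */
fun displayWidth(text: String): Int {
    var width = 0
    var index = 0
    while (index < text.length) {
        // Iterate by code point so surrogate pairs are counted once.
        val codePoint = text.codePointAt(index)
        width += if (isWideCodePoint(codePoint)) 2 else 1
        index += Character.charCount(codePoint)
    }
    return width
}

/** Pads a cell with spaces until it occupies targetWidth display cells. */
fun padToDisplayWidth(text: String, targetWidth: Int): String =
    text + " ".repeat((targetWidth - displayWidth(text)).coerceAtLeast(0))
```

With this sketch, `displayWidth("中文")` is 4 while `"中文".length` is 2, which is the gap that padding based on String.length misses.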

Before / After

Before fix (columns misaligned with CJK characters):

[Screenshot: IntelliJ IDEA 2026-04-03 23 04 00]

After fix (columns properly aligned):

[Screenshot: PhpStorm 2026-04-03 23 04 15]

Test plan

  • TableCharacterWidthUtilsTest — unit tests for width calculation (ASCII, CJK, mixed, emoji, Japanese, Korean, full-width ASCII)
  • MarkdownTablePostFormatProcessorTest#chinese table test — integration test for Chinese table formatting
  • Manual: open a .md file, paste a table with CJK characters, run Reformat Code (Opt+Cmd+L), verify column alignment

…cters

CJK characters (Chinese, Japanese, Korean) occupy two display-width
units in monospace fonts, but the table formatter used string length
(UTF-16 code unit count) for column width calculation. This caused
misaligned columns when tables contained CJK characters.

- Add TableCharacterWidthUtils for East Asian Width-aware display
  width calculation
- Update TableFormattingUtils and TableModificationUtils to use
  display width instead of string length
- Add MarkdownCharacterGridCustomizer to enable Character Grid mode
  in the editor for Markdown files, ensuring CJK characters render
  at exactly 2x ASCII character width
- Add unit tests for width calculation and integration test for
  Chinese table formatting

Co-Authored-By: jiangshengxin <jiangshengxin@tal.com>

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

codePoint in 0x1F1E0..0x1F1FF -> true // Regional Indicator Symbols

else -> false
}

Missing CJK Symbols and Punctuation full-width range

Medium Severity

isFullWidthCharacter is missing the CJK Symbols and Punctuation range (U+3000–U+303F), which contains frequently-used East Asian Wide characters like ideographic space (U+3000), 、(U+3001), 。(U+3002), 「」brackets, and 〈〉angle brackets. These fall through to getCharacterWidth's else -> 1 default instead of returning width 2. Tables containing common CJK punctuation will still have misaligned columns. Since this function also drives the editor's DoubleWidthCharacterStrategy, the mismatch would compound in rendering.
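
To make the fall-through concrete, here is a reduced sketch. It assumes a classifier shaped like the `when` expression in the diff context; the function names below are hypothetical and the range lists are cut down to a single ideograph block for brevity. The U+3000..U+303F range is taken as stated in this review comment.

```kotlin
// Hypothetical reduction of the classifier under review; not the PR's code.

/** Classifier without the CJK Symbols and Punctuation range. */
fun isWideWithoutPunctuation(codePoint: Int): Boolean = when (codePoint) {
    in 0x4E00..0x9FFF -> true // CJK Unified Ideographs
    else -> false             // U+3002 and friends land here
}

/** Same classifier with the range proposed in the review added. */
fun isWideWithPunctuation(codePoint: Int): Boolean = when (codePoint) {
    in 0x3000..0x303F -> true // CJK Symbols and Punctuation (range as stated above)
    in 0x4E00..0x9FFF -> true // CJK Unified Ideographs
    else -> false
}

/** Computes display width using whichever classifier is passed in. */
fun widthOf(text: String, isWide: (Int) -> Boolean): Int {
    var width = 0
    var index = 0
    while (index < text.length) {
        val codePoint = text.codePointAt(index)
        width += if (isWide(codePoint)) 2 else 1
        index += Character.charCount(codePoint)
    }
    return width
}
```

For the cell text `你好。`, the first classifier yields a width of 5 and the second yields 6, so a column padded with the first one would come up one cell short.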


codePoint in 0xA960..0xA97F -> true

// Hangul Jamo Extended-B
codePoint in 0xD7B0..0xD7FF -> true

Hangul Jamo range over-classifies narrow characters as wide

Low Severity

The Hangul Jamo range 0x1100..0x11FF is entirely classified as full-width, but per the Unicode East Asian Width property only 0x1100..0x115F (leading consonants) are Wide — characters 0x1160..0x11FF (vowels and trailing consonants) are Neutral/narrow. Similarly, Hangul Jamo Extended-B (0xD7B0..0xD7FF) is entirely Neutral but classified as full-width here. This over-counts the display width for these characters.
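
A small sketch of the narrowing described here, under the assumption that the check is a plain range test. Both function names are hypothetical, and the ranges are the ones quoted in this comment rather than an independent reading of the Unicode data.

```kotlin
// Hypothetical before/after predicates for Hangul Jamo; not the PR's code.

/** Over-broad version: whole Hangul Jamo block plus Extended-B treated as wide. */
fun isWideJamoOverBroad(codePoint: Int): Boolean =
    codePoint in 0x1100..0x11FF || codePoint in 0xD7B0..0xD7FF

/** Narrowed version: only the leading consonants U+1100..U+115F are wide. */
fun isWideJamoNarrowed(codePoint: Int): Boolean =
    codePoint in 0x1100..0x115F

/** Cell width of a single code point under a given predicate. */
fun jamoWidth(codePoint: Int, isWide: (Int) -> Boolean): Int =
    if (isWide(codePoint)) 2 else 1
```

Under the narrowed predicate, U+1100 (a leading consonant) still counts as 2 cells, while U+1161 (a vowel) and U+D7B0 (Extended-B) drop to 1.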


- Add missing CJK Symbols and Punctuation range (U+3000-U+303F)
- Narrow Hangul Jamo to U+1100-U+115F (only leading consonants are Wide)
- Remove Hangul Jamo Extended-B (U+D7B0-U+D7FF, entirely Neutral)
- Add tests for CJK punctuation characters



sxin0 commented Apr 8, 2026

@JB-Dmitry


3 participants