Skip to content

[Repo Assist] Fix Unicode combining character column tracking (#2945)#3344

Merged
nojaf merged 5 commits intomainfrom
repo-assist/fix-issue-2945-unicode-column-tracking-v2-d609b15d49dde407
Apr 15, 2026
Merged

[Repo Assist] Fix Unicode combining character column tracking (#2945)#3344
nojaf merged 5 commits intomainfrom
repo-assist/fix-issue-2945-unicode-column-tracking-v2-d609b15d49dde407

Conversation

@github-actions
Copy link
Copy Markdown
Contributor

🤖 This PR was created by Repo Assist, an automated AI assistant.

Summary

Fixes #2945 — Unicode strings containing combining characters (e.g. diacritics like U+036E, U+0312, U+036B) caused incorrect column tracking in the formatter's WriterModel, leading to wrong line-break decisions.

Root Cause

WriterModel.update advanced the column counter using String.length s, which counts UTF-16 code units. Combining characters have no visual advance — they attach to the preceding base character. A string like "l\u036e\u0312\u036b" has String.length = 4 but visually occupies only 1 column. This caused Fantomas to over-estimate line width and make wrong line-break decisions.

Fix

  • Added String.visualWidth in Utils.fs/Utils.fsi, backed by System.Globalization.StringInfo.LengthInTextElements, which counts grapheme clusters rather than UTF-16 code units.
  • Changed WriterModel.update (Write/WriteTrivia cases in Context.fs) to use String.visualWidth instead of String.length.
  • Added a fast path for pure-ASCII strings to avoid allocating StringInfo on every token write (which is the common case for typical F# source code).

Trade-offs

The fast path means no overhead for ASCII tokens (the common case). StringInfo is only allocated for tokens with non-ASCII characters.

Test Status

  • dotnet build fantomas.sln — ✅ Build succeeded
  • dotnet test src/Fantomas.Core.Tests/ — ✅ 2838 passed, 0 failed, 11 skipped

Two new tests added:

  1. String.visualWidth counts grapheme clusters not UTF-16 code units, 2945 (UtilsTests.fs)
  2. string with Unicode combining characters should not affect formatting decisions, 2945 (StringTests.fs)

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@7ee2b60744abf71b985bead4599640f165edcd93

Use String.visualWidth (backed by StringInfo.LengthInTextElements) instead
of String.length when advancing the column counter in WriterModel.update.
Combining characters attach to a preceding base character and have no
visual width; counting them as separate code units caused the formatter to
over-estimate line length and make wrong line-break decisions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@nojaf nojaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit I guess

@nojaf nojaf marked this pull request as ready for review April 13, 2026 06:45
@nojaf
Copy link
Copy Markdown
Contributor

nojaf commented Apr 13, 2026

/repo-assist format the code

@nojaf nojaf closed this Apr 13, 2026
@nojaf nojaf reopened this Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor Author

github-actions bot commented Apr 13, 2026

✗ Repo Assist encountered failed, see workflow run.

@nojaf nojaf merged commit c8729b1 into main Apr 15, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unicode leads to incorrect indentation

1 participant