fix(vi): use Vi-standard word boundary rules for w/b/e motions #1059

Open
sim590 wants to merge 5 commits into nushell:main from sim590:vim-word-boundaries-fix

Conversation

sim590 commented Apr 21, 2026

Summary

Fixes Vi small-word motions (w, b, e) and their operator variants (dw, de, db, cw, ce, cb, yw, ye, yb) to use standard Vi word boundary rules instead of Unicode UAX #29 segmentation.

Note on scope: The initial goal was to fix Vi word boundary rules (#563, #667), but the existing code shared word boundary functions and EditCommand variants between Vi and Emacs modes. Vi and Emacs have fundamentally different word semantics (Vi uses a three-class keyword/punctuation/whitespace system, while Emacs mode here uses Unicode UAX #29 boundaries), so a clean fix required decoupling the two paths. This PR introduces dedicated vi_word_* functions and Vi-prefixed EditCommand variants and leaves the existing Emacs behavior untouched. The unprefixed word_* functions are deprecated in favor of explicit emacs_* aliases to prevent future conflation.

Problem

The w, e, and b motions in Vi mode shared the same word boundary functions as Emacs mode, which use unicode_segmentation::split_word_bound_indices() (UAX #29). This treats punctuation inconsistently — for example, in foo.bar, Vi should see three words (foo, ., bar), but reedline treated it as one.

Additionally, inclusive Vi operator commands (de, dE, ce, cE, ye, yE) shared EditCommand variants with exclusive Emacs commands (Alt+d), causing incorrect cut/copy ranges.
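
To illustrate the inclusive/exclusive distinction, here is a minimal sketch. The helper names `exclusive_range` and `inclusive_range` are hypothetical and are not taken from reedline. It assumes byte offsets that fall on character boundaries, and it advances by one `char` where the PR's commits mention grapheme-aware handling (`grapheme_right_index_from_pos`), so treat it as a simplification.

```rust
use std::ops::Range;

/// Byte range covered by an exclusive motion: everything from `cursor` up to,
/// but not including, the character at `target`.
fn exclusive_range(cursor: usize, target: usize) -> Range<usize> {
    cursor..target
}

/// Byte range covered by an inclusive motion: the character starting at
/// `target` belongs to the range, so the end is pushed past that character.
/// Simplification: steps over one `char`, not one grapheme cluster.
fn inclusive_range(text: &str, cursor: usize, target: usize) -> Range<usize> {
    let last_len = text[target..].chars().next().map_or(0, char::len_utf8);
    cursor..target + last_len
}
```

With the cursor at offset 0 in `foo bar`, the `e` motion lands on offset 2 (the last `o`). Treating that target as exclusive selects `fo`, while treating it as inclusive selects `foo`, which is what `de` is expected to cut.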

Solution

  • Introduced a three-class character classification (Keyword, Punctuation, Whitespace) matching the Vim/POSIX standard, and a vi_word_segments() iterator that segments text accordingly.
  • Fully decoupled Vi and Emacs word boundary paths:
    • Added vi_word_* functions (vi_word_right_index, vi_word_right_start_index, vi_word_right_end_index, vi_word_left_index, vi_current_word_range) using Vi three-class segmentation.
    • Added emacs_* aliases (emacs_word_right_index, emacs_word_right_start_index, etc.) for the existing UAX #29-based functions. The unprefixed word_* variants are deprecated to encourage callers to choose explicitly.
  • Added new EditCommand variants for Vi motions (MoveViWordRightStart, MoveViWordRightEnd, MoveViWordLeft) and Vi-specific cut/copy operations (CutViWordLeft, CopyViWordLeft, CutViWordRightEnd, CutViBigWordRightEnd, CopyViWordRightEnd, CopyViBigWordRightEnd).
  • Updated Vi command dispatch to use the new Vi-specific variants throughout.
  • No changes to Emacs behavior: all Emacs commands continue to use the original UAX #29 word boundaries. All internal callers have been migrated to emacs_* to avoid deprecation warnings.
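
The three-class segmentation described in the list above can be sketched roughly as follows. This is not the PR's `vi_word_segments()` implementation, which is not shown in this description. The names `CharClass`, `classify`, and `segments` are hypothetical, and the keyword rule (alphanumerics plus underscore) is an assumption modeled on Vim's usual default for ASCII text.

```rust
/// Character classes used by Vi small-word motions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CharClass {
    Keyword,
    Punctuation,
    Whitespace,
}

/// Classify one character. Assumption: keyword characters are alphanumerics
/// plus underscore; every other non-whitespace character is punctuation.
fn classify(c: char) -> CharClass {
    if c.is_whitespace() {
        CharClass::Whitespace
    } else if c.is_alphanumeric() || c == '_' {
        CharClass::Keyword
    } else {
        CharClass::Punctuation
    }
}

/// Split `text` into maximal runs of a single class. Each entry holds the
/// run's starting byte offset, its slice, and its class.
fn segments(text: &str) -> Vec<(usize, &str, CharClass)> {
    let mut out = Vec::new();
    let mut run_start = 0;
    let mut run_class: Option<CharClass> = None;
    for (i, c) in text.char_indices() {
        let class = classify(c);
        match run_class {
            // Same class as the current run: keep extending it.
            Some(prev) if prev == class => {}
            // Class changed: close the current run and start a new one.
            Some(prev) => {
                out.push((run_start, &text[run_start..i], prev));
                run_start = i;
                run_class = Some(class);
            }
            // First character of the input.
            None => run_class = Some(class),
        }
    }
    // Close the final run, if the input was non-empty.
    if let Some(class) = run_class {
        out.push((run_start, &text[run_start..], class));
    }
    out
}
```

Under these assumptions, `segments("foo.bar")` yields three runs (`foo`, `.`, `bar`), matching the three words Vi is expected to see in the Problem section's example.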

Tests

  • 7 unit tests for vi_word_segments()
  • 16 parameterized cases for Vi word motions with punctuation (w, e, b)
  • 24 basic/edge-case tests for vi_word_right_start_index, vi_word_right_end_index, vi_word_left_index, vi_word_right_index, and vi_current_word_range
  • 14 cases for Vi cut/copy operations in editor (de, dE, db, ye, yE, yb)
  • All existing Emacs tests unchanged and passing (904 pass, 0 fail)
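
The parameterized cases themselves are not reproduced in this description. As a rough illustration of what a `w` case table could look like, the sketch below pairs a stand-in `next_word_start` function with a few `(text, cursor, expected)` rows. Every name here (`class_of`, `next_word_start`, `W_CASES`) is hypothetical, the classification rule is the same assumption as above (alphanumerics plus underscore count as keyword characters), and the behavior at the end of the buffer (returning `text.len()`) is a choice made for this sketch, not something the PR states.

```rust
/// 0 = whitespace, 1 = keyword, 2 = punctuation (assumed classification).
fn class_of(c: char) -> u8 {
    if c.is_whitespace() {
        0
    } else if c.is_alphanumeric() || c == '_' {
        1
    } else {
        2
    }
}

/// Byte index where a `w` motion starting at `pos` would land: the first
/// non-whitespace character that begins a new run after the cursor.
/// `pos` must lie on a character boundary. Returns `text.len()` when there
/// is no later word (an assumption of this sketch).
fn next_word_start(text: &str, pos: usize) -> usize {
    let mut chars = text[pos..]
        .char_indices()
        .map(|(i, c)| (pos + i, class_of(c)));
    let mut prev = match chars.next() {
        Some((_, class)) => class,
        None => return text.len(),
    };
    for (i, class) in chars {
        // A class change onto a non-whitespace character starts a new word.
        if class != prev && class != 0 {
            return i;
        }
        prev = class;
    }
    text.len()
}

/// Hypothetical case table: (text, cursor, expected landing index).
const W_CASES: &[(&str, usize, usize)] = &[
    ("foo.bar", 0, 3),  // keyword -> punctuation
    ("foo.bar", 3, 4),  // punctuation -> keyword
    ("foo bar", 0, 4),  // whitespace is skipped
    ("foo  bar", 3, 5), // starting on whitespace
    ("a_b-c", 0, 3),    // underscore counts as a keyword character
    ("foo", 0, 3),      // no later word
];
```

Each row can be checked by calling `next_word_start(text, cursor)` and comparing the result with the expected index.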

Note on other open PRs

This PR introduces a structural change to how Vi word motions are handled — the Vi and Emacs word boundary paths are now fully decoupled. Open PRs that touch Vi motions or EditCommand variants (notably #1016 and #960) may need to rebase after this lands. PRs that only touch Emacs mode or other areas are unaffected.

Closes #563, closes #667, addresses #788 (point 2)

sim590 added 5 commits April 21, 2026 00:27
Replace unicode_segmentation word boundaries with a Vi-compatible
three-class system (keyword, punctuation, whitespace) for w/e/b
motions. Separate inclusive Vi cut/copy (de/dE/ce/cE/ye/yE) from
exclusive Emacs word commands (Alt+d) by adding dedicated EditCommand
variants.

Closes: nushell#563, nushell#667
Addresses: nushell#788 (point 2)

…iases

Add emacs_word_right_index, emacs_word_right_end_index,
emacs_word_right_start_index, emacs_word_left_index, and
emacs_current_word_range as the canonical Emacs-path functions.
Deprecate the unprefixed word_* variants to encourage callers to
choose explicitly between emacs_* and vi_* word boundaries.

All internal callers migrated to emacs_* variants.

Cover basic cases (spaces, single word, empty string, edge positions)
for vi_word_right_start_index, vi_word_right_end_index,
vi_word_left_index, vi_word_right_index, and vi_current_word_range.

Add Vi prefix to CutWordRightEnd, CutBigWordRightEnd,
CopyWordRightEnd, and CopyBigWordRightEnd for consistency with
other Vi-specific EditCommand variants.

Add tests for cut_vi_big_word_right_end (dE), copy_vi_word_right_end
(ye), and copy_vi_big_word_right_end (yE) which contain non-trivial
inclusive-end logic via grapheme_right_index_from_pos.

Contributor

fdncred commented Apr 21, 2026

What LLM did you use? The description is verbose and repetitive, which makes me concerned that the code will be the same.

Author

sim590 commented Apr 21, 2026

I'm not sure where the description is repetitive. But the LLM used is Claude Opus 4.6.


Development

Successfully merging this pull request may close these issues.

vi mode word definition shouldn't include the period .
Incorrect word boundary in VI mode

2 participants