fix(vi): use Vi-standard word boundary rules for w/b/e motions#1059
Open
sim590 wants to merge 5 commits intonushell:mainfrom
Open
fix(vi): use Vi-standard word boundary rules for w/b/e motions#1059sim590 wants to merge 5 commits intonushell:mainfrom
sim590 wants to merge 5 commits intonushell:mainfrom
Conversation
Replace unicode_segmentation word boundaries with a Vi-compatible three-class system (keyword, punctuation, whitespace) for w/e/b motions. Separate inclusive Vi cut/copy (de/dE/ce/cE/ye/yE) from exclusive Emacs word commands (Alt+d) by adding dedicated EditCommand variants. Closes: nushell#563, nushell#667 Addresses: nushell#788 (point 2)
…iases Add emacs_word_right_index, emacs_word_right_end_index, emacs_word_right_start_index, emacs_word_left_index, and emacs_current_word_range as the canonical Emacs-path functions. Deprecate the unprefixed word_* variants to encourage callers to choose explicitly between emacs_* and vi_* word boundaries. All internal callers migrated to emacs_* variants.
Cover basic cases (spaces, single word, empty string, edge positions) for vi_word_right_start_index, vi_word_right_end_index, vi_word_left_index, vi_word_right_index, and vi_current_word_range.
Add Vi prefix to CutWordRightEnd, CutBigWordRightEnd, CopyWordRightEnd, and CopyBigWordRightEnd for consistency with other Vi-specific EditCommand variants.
Add tests for cut_vi_big_word_right_end (dE), copy_vi_word_right_end (ye), and copy_vi_big_word_right_end (yE) which contain non-trivial inclusive-end logic via grapheme_right_index_from_pos.
Contributor
|
What LLM did you use? The description is verbose and repetitive which makes me concerned that the code will be the same. |
Author
|
I'm not sure where the description is repetitive. But the LLM used is Claude Opus 4.6. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes Vi small-word motions (
w,b,e) and their operator variants (dw,de,db,cw,ce,cb,yw,ye,yb) to use standard Vi word boundary rules instead of Unicode UAX #29 segmentation.Note on scope: The initial goal was to fix Vi word boundary rules (#563, #667), but the existing code shared word boundary functions and
EditCommandvariants between Vi and Emacs modes. Since Vi and Emacs have fundamentally different word semantics (Vi uses a three-class keyword/punctuation/whitespace system, while Emacs uses Unicode UAX #29 boundaries), a clean fix required decoupling the two paths. This PR introduces dedicatedvi_word_*functions andVi-prefixedEditCommandvariants, preserving the existing Emacs behavior untouched. The unprefixedword_*functions are deprecated in favor of explicitemacs_*aliases to prevent future conflation.Problem
The
w,e, andbmotions in Vi mode shared the same word boundary functions as Emacs mode, which useunicode_segmentation::split_word_bound_indices()(UAX #29). This treats punctuation inconsistently — for example, infoo.bar, Vi should see three words (foo,.,bar), but reedline treated it as one.Additionally, inclusive Vi operator commands (
de,dE,ce,cE,ye,yE) sharedEditCommandvariants with exclusive Emacs commands (Alt+d), causing incorrect cut/copy ranges.Solution
Keyword,Punctuation,Whitespace) matching the Vim/POSIX standard, and avi_word_segments()iterator that segments text accordingly.vi_word_*functions (vi_word_right_index,vi_word_right_start_index,vi_word_right_end_index,vi_word_left_index,vi_current_word_range) using Vi three-class segmentation.emacs_*aliases (emacs_word_right_index,emacs_word_right_start_index, etc.) for the existing UAX added a slightly-fancy prompt #29-based functions. The unprefixedword_*variants are deprecated to encourage callers to choose explicitly.EditCommandvariants for Vi motions (MoveViWordRightStart,MoveViWordRightEnd,MoveViWordLeft) and Vi-specific cut/copy operations (CutViWordLeft,CopyViWordLeft,CutViWordRightEnd,CutViBigWordRightEnd,CopyViWordRightEnd,CopyViBigWordRightEnd).emacs_*to avoid deprecation warnings.Tests
vi_word_segments()w,e,b)vi_word_right_start_index,vi_word_right_end_index,vi_word_left_index,vi_word_right_index, andvi_current_word_rangede,dE,db,ye,yE,yb)Note on other open PRs
This PR introduces a structural change to how Vi word motions are handled — the Vi and Emacs word boundary paths are now fully decoupled. Open PRs that touch Vi motions or
EditCommandvariants (notably #1016 and #960) may need to rebase after this lands. PRs that only touch Emacs mode or other areas are unaffected.Closes #563, closes #667, addresses #788 (point 2)