
Hyphenate Braille using pyphen #19916

Open
LeonarddeR wants to merge 32 commits into nvaccess:master from
LeonarddeR:pyphen


@LeonarddeR
Collaborator

@LeonarddeR LeonarddeR commented Apr 7, 2026

Link to issue number:

Closes #17010

Summary of the issue:

Word wrap is sometimes pretty aggressive, especially on shorter braille displays.

Description of user facing changes:

The boolean "word wrap" option in the braille settings has been replaced with a four-valued Text wrap option, giving finer-grained control over how words are broken when they don't fit on the display. The four choices are:

  • Off — Wrap at the raw edge of the display, cutting words in the middle if necessary. No visual indication that a word was cut.
  • Show mark when words are cut — Wrap at the raw edge, but whenever a word is cut mid-way, replace the last cell of the row with a continuation mark (braille dots 7-8) so the reader knows the word continues on the next row.
  • At word boundaries — Prefer breaking at spaces. If no space fits on the row, fall back to cutting the word and showing the continuation mark.
  • At word or syllable boundaries — As above, but when a word is too long to fit, try to split it at a syllable boundary (using hyphenation dictionaries from the pyphen library) so less of the word spills onto the next row. NVDA marks the split with braille dots 7-8, not a printed hyphen, because braille conventions use word division rather than print-style hyphenation.

Whenever a word is cut mid-way across rows — regardless of which mode is selected — the cut is now marked with the continuation symbol. This makes it easy to tell at a glance whether a row ends cleanly at a space or carries over into the next row.
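As a rough illustration of how the four modes differ, the sketch below picks the end of a single row. It is not NVDA's `_calculateWindowRowBufferOffsets`: the `WrapMode` enum, the `rowEnd` helper, and the `syllableBreaks` argument (absolute offsets where a word may be divided) are hypothetical names. It also assumes that the continuation mark always takes the last cell of the row, and that the syllable mode splits the word straddling the display edge before falling back to the last space.

```python
from enum import Enum


class WrapMode(Enum):
    """Hypothetical stand-in for BrailleTextWrapFlag (DEFAULT omitted)."""

    NONE = 1
    MARK_WORD_CUTS = 2
    AT_WORD_BOUNDARIES = 3
    AT_WORD_OR_SYLLABLE_BOUNDARIES = 4


def _cutsWord(text, pos):
    """True if breaking just before text[pos] would split a word."""
    return 0 < pos < len(text) and not text[pos - 1].isspace() and not text[pos].isspace()


def rowEnd(text, start, numCols, mode, syllableBreaks=()):
    """Return (end, showMark) for the row that starts at `start`.

    `end` is exclusive. `showMark` means the cell after `end` shows the
    continuation mark instead of text.
    """
    edge = start + numCols
    if edge >= len(text):
        return len(text), False  # the remaining text fits on this row
    if not _cutsWord(text, edge) or mode is WrapMode.NONE or numCols < 2:
        return edge, False  # clean break, or marking is off / impossible
    markedCut = edge - 1  # reserve the last cell for the continuation mark
    if mode is WrapMode.MARK_WORD_CUTS:
        return markedCut, _cutsWord(text, markedCut)
    # Both remaining modes prefer whitespace: find the last one on this row.
    lastSpace = max((i for i in range(start, edge) if text[i].isspace()), default=-1)
    wordStart = lastSpace + 1 if lastSpace >= 0 else start
    if mode is WrapMode.AT_WORD_OR_SYLLABLE_BOUNDARIES:
        fitting = [p for p in syllableBreaks if wordStart < p <= markedCut]
        if fitting:
            return max(fitting), True  # divide the word as late as possible
    if lastSpace >= 0:
        return lastSpace + 1, False  # break after the last space
    return markedCut, True  # no whitespace on the row: cut and mark
```

With the text `"reading documentation"` on a 12-cell row and an allowed division after "doc", the syllable mode keeps `reading doc` plus the mark on the first row, whereas the word-boundary mode ends the row after `reading `.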

Existing user profiles with the old wordWrap = True / wordWrap = False setting are automatically upgraded: True becomes "At word boundaries" and False becomes "Off".
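The mapping described above could look roughly like the following. This is a sketch only: the real step is `upgradeConfigFrom_22_to_23`, whose body is not shown here. The function name `upgradeWordWrap`, the plain-dict profile, and the stored strings `"AT_WORD_BOUNDARIES"` and `"NONE"` are assumptions made for illustration.

```python
def upgradeWordWrap(profile: dict) -> None:
    """Map a legacy braille.wordWrap boolean onto a textWrap value, in place."""
    braille = profile.get("braille", {})
    if "wordWrap" not in braille:
        return  # nothing to migrate; textWrap keeps its default
    old = braille["wordWrap"]
    if isinstance(old, str):
        # Assumption: an unvalidated profile may hold the boolean as text.
        old = old.strip().lower() == "true"
    # Assumed storage format: the enum member name as a string.
    braille["textWrap"] = "AT_WORD_BOUNDARIES" if old else "NONE"
```

The old key is deliberately left in place in this sketch; whether the real upgrade step removes it is not stated in the description.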

Description of developer facing changes:

The deprecated braille.wordWrap boolean is bridged to the new braille.textWrap feature flag in both directions via _linkDeprecatedValues, so add-ons that still read or write the old key keep working (with a deprecation warning).
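To picture what "bridged in both directions" means for an add-on, here is a toy model. It does not reproduce `_linkDeprecatedValues`: the `BrailleSection` class is hypothetical, and the reverse mapping (which `textWrap` values read back as `wordWrap = True`) is an assumption, since the description only states the forward mapping.

```python
import warnings

# Assumed reverse mapping: only the two word-aware modes count as "word wrap on".
_WRAPPING_MODES = {"AT_WORD_BOUNDARIES", "AT_WORD_OR_SYLLABLE_BOUNDARIES"}
_DEPRECATION = "braille.wordWrap is deprecated; use braille.textWrap"


class BrailleSection(dict):
    """Hypothetical config section that keeps wordWrap and textWrap in sync."""

    def __setitem__(self, key, value):
        if key == "wordWrap":
            warnings.warn(_DEPRECATION, DeprecationWarning, stacklevel=2)
            # Old key written: mirror it onto the new key.
            super().__setitem__("textWrap", "AT_WORD_BOUNDARIES" if value else "NONE")
        elif key == "textWrap":
            # New key written: keep the old key readable for legacy add-ons.
            super().__setitem__("wordWrap", value in _WRAPPING_MODES)
        super().__setitem__(key, value)

    def __getitem__(self, key):
        if key == "wordWrap":
            warnings.warn(_DEPRECATION, DeprecationWarning, stacklevel=2)
        return super().__getitem__(key)
```

An add-on that sets `section["wordWrap"] = True` would then see `section["textWrap"] == "AT_WORD_BOUNDARIES"`, and a later write to `textWrap` is reflected when the old key is read.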

Description of development approach:

  • Feature flag. Added BrailleTextWrapFlag with members DEFAULT, NONE, MARK_WORD_CUTS, AT_WORD_BOUNDARIES, AT_WORD_OR_SYLLABLE_BOUNDARIES. The default behaviour is AT_WORD_OR_SYLLABLE_BOUNDARIES.
  • Unified continuation marker. The continuation mark consistently means "a word was cut here" across all modes.
  • Hyphenation module. New textUtils.hyphenation module wraps the pyphen library. getHyphenPositions(text, locale) returns an empty tuple for locales without a pyphen dictionary (logging once at debug level per locale), so the wrap logic falls back cleanly to word-boundary behaviour without raising.
  • Region language tracking. Region._languageIndexes records language changes within a braille region so hyphenation can be performed in the correct language when regions contain multilingual content.
  • Frozen builds. A py2exe hook (_hook_pyphen in source/setup.py) bundles pyphen's *.dic files into dist/pyphenDictionaries/ and rewrites pyphen's dictionary lookup path at freeze time. Only the .dic files are included — README files are skipped.
  • Profile upgrade. upgradeConfigFrom_22_to_23 maps the old wordWrap boolean to the new textWrap string enum.

Testing strategy:

Automated unit tests cover:

  • All four wrap modes in _calculateWindowRowBufferOffsets, including the case where no whitespace fits on the row, the syllable-boundary success path, the fallback when no syllable boundary fits before the display edge, and the unknown-language case.
  • Continuation-marker rendering in _get_windowBrailleCells.
  • Region language-index bookkeeping: default language lookup, _addFieldText inserting switch/restore entries when a field is in a different language, _addTextWithFields handling a formatChange command with a language attribute, and TextInfoRegion.update resetting the language index across updates.
  • textUtils.hyphenation.getHyphenPositions for both a known language (en_US) and an unknown one (returns () without raising).
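The language-index bookkeeping these tests exercise can be thought of as a sorted list of "from this offset on, the language is X" entries. The sketch below is a simplified model, not `Region._languageIndexes` itself: the `LanguageIndex` class and its method names are hypothetical, and the real data structure may differ.

```python
import bisect


class LanguageIndex:
    """Hypothetical model: which language applies at each offset of a region."""

    def __init__(self, defaultLanguage: str):
        # Offset 0 always maps to the region's default language.
        self._offsets = [0]
        self._languages = [defaultLanguage]

    def addSwitch(self, offset: int, language: str) -> None:
        """Record that text from `offset` onward is in `language`."""
        if offset < self._offsets[-1]:
            raise ValueError("switches must be added in increasing offset order")
        if offset == self._offsets[-1]:
            self._languages[-1] = language  # replace a switch at the same offset
        else:
            self._offsets.append(offset)
            self._languages.append(language)

    def languageAt(self, offset: int) -> str:
        """Return the language in effect at `offset`."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        # The last switch at or before `offset` decides the language.
        i = bisect.bisect_right(self._offsets, offset) - 1
        return self._languages[i]
```

A field in another language then amounts to a switch entry at its start and a restore entry at its end, for example `addSwitch(6, "fr")` followed by `addSwitch(13, "en")`.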

Manual testing: loaded a pre-upgrade profile with wordWrap = True/False and confirmed the profile upgrade writes the expected textWrap value and that the braille settings panel shows the matching label; confirmed scons.bat dist produces dist/pyphenDictionaries/ containing only hyph_*.dic files.

Known issues with pull request:

Unit tests were written by AI and are a bit difficult to parse, though the behavior has been manually tested too and the unit tests ensure that the behavior stays stable.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@seanbudd seanbudd added the conceptApproved label (similar to 'triaged' for issues: PR accepted in theory, implementation needs review) Apr 10, 2026
@seanbudd seanbudd requested a review from SaschaCowley April 10, 2026 00:50
@cary-rowen
Contributor

Once the Chinese word segmentation PR is merged, will it be possible to use its rules to handle Chinese line breaks within this new text wrap framework?

@LeonarddeR
Collaborator Author

I have no idea honestly. That is something we'd need to find out after that is merged.

Replace BrailleTextWrap IntEnum with BrailleTextWrapFlag feature flag stored via featureFlag config spec, mirroring reviewRoutingMovesSystemCaret. Rename members to NONE, MARK_WORD_CUTS, AT_WORD_BOUNDARIES, AT_WORD_OR_SYLLABLE_BOUNDARIES for clarity (braille uses word division, not print hyphenation). Unify continuation-marker semantics under rule A: the marker now fires on any mid-word row end regardless of mode, including the no-whitespace fallback in AT_WORD_BOUNDARIES/AT_WORD_OR_SYLLABLE_BOUNDARIES. Handle unknown languages gracefully in getHyphenPositions by returning an empty tuple and logging once per locale. Update profile upgrade, deprecation bridge for wordWrap, settings dialog (FeatureFlagCombo), and user guide.
… region language, and hyphenation

Update test_calculateWindowRowBufferOffsets for the renamed BrailleTextWrapFlag feature flag and add tests #1-#8 covering NONE, MARK_WORD_CUTS, AT_WORD_BOUNDARIES (including the rule-A marker fix for the no-whitespace fallback), and AT_WORD_OR_SYLLABLE_BOUNDARIES (success, empty positions, past-edge position, unknown language). Add test_windowBrailleCells for CONTINUATION_SHAPE rendering (#9-#10). Add test_regionLanguageIndexes for Region._languageIndexes defaults, _addFieldText switch/restore entries, _addTextWithFields formatChange language handling, and TextInfoRegion.update reset (#11-#14). Add test_hyphenation for getHyphenPositions with known and unknown locales (#15-#16).
@LeonarddeR LeonarddeR changed the title from "Proof of concept: Hyphenate Braille using pyphen" to "Hyphenate Braille using pyphen" Apr 20, 2026
@LeonarddeR
Collaborator Author

I guess that when the Chinese work is merged, we can fall back to that in the hyphenation module.

LeonarddeR and others added 6 commits April 25, 2026 14:41
Patch auto-properties (rawToBraillePos, brailleToRawPos) on the buffer
instance instead of the class — they are non-data Getter descriptors via
AutoPropertyObject, so instance attributes shadow them directly. Add
comments explaining the mocking strategy for syllable-boundary isolation
and the side_effect=RuntimeError pattern for halting update() mid-method.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…ject

rawToBraillePos/brailleToRawPos are non-data Getter descriptors, so
instance assignment shadows them directly. Cleanup in tearDown.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
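
The mocking strategy in these commit messages relies on a general Python rule: a descriptor that defines only `__get__` (a non-data descriptor) is shadowed by an instance attribute of the same name. The sketch below demonstrates that rule in isolation. `Getter` and `Buffer` here are minimal stand-ins written for this example, not NVDA's `AutoPropertyObject` machinery.

```python
class Getter:
    """Minimal non-data descriptor: defines __get__ only (illustrative stand-in)."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.fget(instance)


class Buffer:
    @Getter
    def rawToBraillePos(self):
        return [0, 1, 2]  # the "real" computed value


buf = Buffer()
computed = buf.rawToBraillePos  # resolved through the descriptor

# Patch on the instance: the instance __dict__ wins over a non-data descriptor.
buf.rawToBraillePos = [0, 2, 4]
patched = buf.rawToBraillePos

# Cleanup, as a tearDown would do: removing the instance attribute
# exposes the descriptor again.
del buf.rawToBraillePos
restored = buf.rawToBraillePos

# Other instances were never affected, since the class was not touched.
untouched = Buffer().rawToBraillePos
```

Had `Getter` also defined `__set__`, it would be a data descriptor and the assignment would go through the descriptor instead of creating an instance attribute.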
@LeonarddeR LeonarddeR marked this pull request as ready for review May 2, 2026 14:50
@LeonarddeR LeonarddeR requested review from a team as code owners May 2, 2026 14:50
@LeonarddeR LeonarddeR requested review from Qchristensen and Copilot May 2, 2026 14:50
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a new braille “Text wrap” setting with optional continuation markers and syllable-aware wrapping (via pyphen), replacing the old boolean wordWrap while preserving add-on compatibility through deprecated-key bridging.

Changes:

  • Add BrailleTextWrapFlag feature-flag setting and update braille wrapping logic to support 4 wrap modes plus a continuation indicator.
  • Add locale-aware hyphenation support via new textUtils.hyphenation wrapper around pyphen, including py2exe bundling for frozen builds.
  • Update GUI, documentation, config schema upgrade, and add unit tests for wrap behavior, continuation rendering, hyphenation, and language tracking.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| uv.lock | Adds pyphen dependency to the locked environment. |
| pyproject.toml | Declares pyphen dependency for builds. |
| source/textUtils/hyphenation.py | New hyphenation utility module wrapping pyphen with locale fallback/logging. |
| source/braille.py | Implements new wrap modes, continuation marker rendering, and region language tracking for correct hyphenation locale. |
| source/config/featureFlagEnums.py | Adds BrailleTextWrapFlag enum with display strings for the GUI. |
| source/config/configSpec.py | Bumps schema version and adds braille.textWrap featureFlag (keeps deprecated wordWrap). |
| source/config/profileUpgradeSteps.py | Adds upgrade step mapping old wordWrap to new textWrap. |
| source/config/__init__.py | Enables and implements deprecated config key bridging between wordWrap and textWrap. |
| source/gui/settingsDialogs.py | Replaces old checkbox with a FeatureFlagCombo for Text wrap. |
| source/louisHelper.py | Adds helper to get braille table language for default region language. |
| source/setup.py | Adds py2exe hook to bundle pyphen dictionaries and rewrite lookup path in frozen builds. |
| user_docs/en/userGuide.md | Documents the new Text wrap setting and its behaviors. |
| user_docs/en/changes.md | Adds changelog and deprecations notes for text wrap changes. |
| tests/unit/test_hyphenation.py | Tests hyphenation positions for known/unknown locales. |
| tests/unit/test_braille/test_calculateWindowRowBufferOffsets.py | Expands tests to cover all wrap modes and syllable-boundary behavior. |
| tests/unit/test_braille/test_windowBrailleCells.py | Tests continuation marker rendering in window cells. |
| tests/unit/test_braille/test_regionLanguageIndexes.py | Tests language index tracking for multilingual regions. |

Comment thread source/braille.py
Comment thread source/textUtils/hyphenation.py Outdated
Comment thread source/braille.py
Comment thread source/braille.py
Comment thread source/braille.py Outdated
Comment thread source/braille.py Outdated
Comment thread source/braille.py Outdated
Comment thread source/config/__init__.py Outdated
Comment thread user_docs/en/userGuide.md Outdated
Comment thread source/braille.py
LeonarddeR and others added 3 commits May 2, 2026 18:26
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@LeonarddeR LeonarddeR marked this pull request as draft May 2, 2026 16:33
LeonarddeR and others added 5 commits May 2, 2026 18:34
- braille.py: guard MARK_WORD_CUTS continuation mark behind `end < bufferEnd`
  to prevent a phantom mark on the final row when the buffer ends exactly at
  the display edge
- config/__init__.py: fix wordWrap→textWrap bridge writing a raw string into
  _cache; now validates through the spec so the cache holds a proper
  FeatureFlag object, matching what __setitem__ normally stores
- userGuide.md: rephrase "Off" description — text is cut at the display edge
  (not "not wrapped"), just without a continuation mark
- test_calculateWindowRowBufferOffsets.py: fix two tests that expected end
  positions without room for the continuation marker; _get_windowBrailleCells
  only appends the marker when remaining > 0, so end must be numCols - 1
  to leave space

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
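
The two guards mentioned in this commit message (`end < bufferEnd` and `remaining > 0`) can be illustrated with a small row renderer. This is a sketch, not `_get_windowBrailleCells`: `renderRow` and its parameters are hypothetical. The value `0xC0` for the continuation shape follows from the usual dot-to-bit encoding, where dot 7 is `0x40` and dot 8 is `0x80`.

```python
CONTINUATION_SHAPE = 0xC0  # dots 7 and 8 (0x40 | 0x80)


def renderRow(cells, start, end, numCols, bufferEnd, cutsWord):
    """Return the numCols cells shown for buffer[start:end].

    The continuation mark is appended only when the row cuts a word,
    more buffer follows (end < bufferEnd), and a free cell remains.
    """
    row = list(cells[start:end])
    remaining = numCols - len(row)
    if cutsWord and end < bufferEnd and remaining > 0:
        row.append(CONTINUATION_SHAPE)
        remaining -= 1
    row.extend([0] * remaining)  # pad the rest of the row with blank cells
    return row
```

This shows why the offset calculation has to stop at `numCols - 1` when a mark is wanted: if the row is already full, there is no cell left for the mark. It also shows why the final row never gets a phantom mark, because nothing follows it in the buffer.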
Introduce cacheVal alongside val so the wordWrap→textWrap bridge can
store a string in the profile and a validated FeatureFlag in the cache
without duplicating the _getUpdateSection/_cache calls or returning early.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LeonarddeR LeonarddeR marked this pull request as ready for review May 5, 2026 21:19

Labels

conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add hyphenation support to braille using Pyphen

5 participants