diff --git a/.gitignore b/.gitignore index 8bd1de02..728eda4f 100644 --- a/.gitignore +++ b/.gitignore @@ -52,3 +52,9 @@ ktrace.out mcsh.man test.c po/ + +# Security reports and local analysis +.gemini_security/ + +# AI/Agent Planning artifacts +conductor/ diff --git a/ISSUES.md b/ISSUES.md index 94110fab..eb5fb50b 100644 --- a/ISSUES.md +++ b/ISSUES.md @@ -8,7 +8,7 @@ See `PLAN.md` for the full phased execution plan derived from this log. --- -## Completed work (2026-04-22, round 5 — PR #4 Copilot review fixes) +## Completed work (2026-04-21, round 5 — PR #4 Copilot review fixes) ### PR #4 Copilot inline review comments resolved ✓ @@ -31,7 +31,7 @@ See `PLAN.md` for the full phased execution plan derived from this log. --- -## Completed work (2026-04-22, round 4 — upstream carry-forward sweep) +## Completed work (2026-04-21, round 4 — upstream carry-forward sweep) ### Upstream tcsh-org/tcsh bug fixes applied ✓ @@ -83,7 +83,6 @@ Items **not applied** (rejected upstream or out of scope): Items **still open upstream and tracked in mcsh** (see "Remaining open items"): - **#119** — `unshare --user --pid` hang (critical) -- **#117 / #121** — Unicode/wide-char regression (critical) - **#93** — `ls-F` colour with `CLICOLOR_FORCE` (low) - **#102 / #82** — Acute accent lintian; man page pipe workaround (low) - **#123** — Syntax improvement/alias multi-line (feature request; tracking) @@ -91,7 +90,7 @@ Items **still open upstream and tracked in mcsh** (see "Remaining open items"): --- -## Completed work (2026-04-22, round 3 — PR3 CodeRabbit round-2 review fixes) +## Completed work (2026-04-21, round 3 — PR3 CodeRabbit round-2 review fixes) ### Phase 8 (round 3) — CodeRabbit PR3 review fixes ✓ @@ -351,9 +350,6 @@ integration (no raw ESC bypass). - **#119** (`sh.proc.c`) — `unshare --user --pid` hang. Fork retry loop sleeps with interrupts disabled. Fix: use `SIGALRM`-based timeout or `nanosleep` with signal unblocking. 
-- **#117 / #121** (`sh.lex.c`, `sh.dol.c`) — Unicode regression: emoji/wide - chars stripped from filenames and variable assignments since 6.24.14. Root - cause: byte vs. character length confusion in the wide-string path. - **#93** (`tw.color.c`) — `ls-F` colour failures with `CLICOLOR_FORCE`, `LSCOLORS`, `LS_COLORS`. Audit colour detection and environment-variable precedence. @@ -367,12 +363,6 @@ integration (no raw ESC bypass). wide-character input or terminal resize. Full fix: integrate ghost rendering into the `Refresh()` pipeline. -### 4. Test suite - -- `tests/` not yet initialised. Minimum suite required: startup file order, - `$mcsh`/`$tcsh` variable correctness, unicode filename round-trip, expression - overflow, job-count prompt, `cd -N` stack navigation. - ### 5. Scope of this consolidation push Present on the branch: @@ -392,6 +382,198 @@ Present on the branch: Explicitly deferred / excluded: - **Native Windows support** — dropped. -- **Test suite (`tests/`)** — deferred. - **Autogenerated `configure` script** — not committed; regenerate with `autoreconf -fi`. + +--- + +## Round 6 — Review response: three flagged weaknesses (dev4) (2026-04-21) + +Addresses all findings from the deep-dive analysis (paste_1 / paste_2) and +Gemini PR #5 inline comments. + +### 1. Short-circuit evaluation (`sh.dol.c`) + +**Root cause (confirmed):** `Dfix()` in `sh.sem.c` expands all `$` tokens +*before* `doif` calls `expr()`. The expression evaluator in `sh.exp.c` already +implements correct `TEXP_IGNORE` short-circuit at the `exp0`/`exp1` level, but +`Dfix` runs unconditionally before evaluation, so `"$a"` in +`if ($?a && "$a" != "")` threw `ERR_UNDVAR` before `&&` could suppress it. + +**Fix (`sh.dol.c:643`):** In `Dgetdol()`, when a variable is unset and not +found in the environment, instead of calling `udvar()` (which throws +`ERR_UNDVAR`), set `dolp = STRNULL` and jump to `eatbrac`. 
This makes unset +`$varname` silently expand to `""` — matching bash/zsh double-quote semantics +— so the expression evaluator receives `"" != ""` (false) rather than dying. +`$?varname` continues to work correctly via the existing `bitset` path. + +**Test:** `t003_shortcircuit.sh` — `unset a; if ($?a && "$a" != "") echo yes` +must produce no output and exit 0. + +### 2. Unicode regression (`sh.lex.c`, `sh.dol.c`) — fixed + +**Scope:** Inherited from tcsh 6.24.14. Byte-vs-character length confusion in +the wide-string expansion path caused multi-byte characters (emoji, CJK, Latin +Extended) to be dropped or corrupted during filename glob expansion and variable +assignment. Affected any locale where `MB_CUR_MAX > 1`. + +**Affected upstream issues:** tcsh #117, #121. + +**Root cause:** Two `mbtowc` accumulation loops compared the partial-byte count +against `MB_LEN_MAX` (compile-time worst case across all locales, 16 on glibc) +instead of `MB_CUR_MAX` (runtime maximum for the current locale: 6 for UTF-8 on glibc, 4 on musl). +When `mbtowc` returned `-1` for a stray invalid byte, the loop continued reading +up to 15 additional bytes of lookahead before giving up, swallowing valid +multi-byte sequences that immediately followed. + +**Fix (two-line change):** +- `sh.lex.c` `wide_read()` — `(partial - i) < MB_LEN_MAX` → `(partial - i) < + (size_t)MB_CUR_MAX`. Covers script-file reads, stdin pipes, and backquote + command substitution. +- `sh.dol.c` `Dgetdol()` `$<` accumulation loop — `cbp < MB_LEN_MAX` → `cbp < + (size_t)MB_CUR_MAX`. Covers the `$<` line-read primitive. + +The corrected pattern matches the existing reference implementation at +`ed.inputl.c:814`. Buffer declarations (`char cbuf[MB_LEN_MAX]`) are unchanged +because they must size for the worst case across all platforms.
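The bounded-lookahead rule can be illustrated in isolation. The following is a minimal sketch, not mcsh's `wide_read()`: `decode_chunk()` and the `ESCAPE_BYTE()` marker (U+DC00 plus the byte value, the "surrogate escape" convention) are hypothetical names, and the sketch assumes a UTF-8 locale has already been selected with `setlocale()`. It takes the lookahead bound as a parameter so the old (`MB_LEN_MAX`) and fixed (`(size_t)MB_CUR_MAX`) behaviours can be compared on the same input.

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <stddef.h>  /* size_t, wchar_t */
#include <stdlib.h>  /* mbtowc, MB_CUR_MAX */

/* Hypothetical escape for an undecodable byte b: U+DC00 + b.
 * mcsh tags invalid bytes in its own way; this is only for illustration. */
#define ESCAPE_BYTE(b) ((wchar_t)(0xDC00 + (unsigned char)(b)))

/*
 * decode_chunk (hypothetical): convert buf[0..len) to wide characters.
 *   bound      - cap on how short an undecodable tail must be before we
 *                wait for more input: MB_LEN_MAX (old) or MB_CUR_MAX (fixed)
 *   more_input - nonzero if a later read() may append more bytes
 * Returns the number of wide characters stored in out; *consumed is the
 * number of bytes used. Unconsumed bytes are a possibly incomplete
 * character to retry after the next read.
 */
static size_t
decode_chunk(const char *buf, size_t len, size_t bound, int more_input,
             wchar_t *out, size_t outmax, size_t *consumed)
{
    size_t i = 0, n = 0;

    (void)mbtowc(NULL, NULL, 0);            /* reset conversion state */
    while (i < len && n < outmax) {
        int r = mbtowc(&out[n], buf + i, len - i);

        if (r == -1) {
            (void)mbtowc(NULL, NULL, 0);    /* state is unspecified after -1 */
            /* A tail shorter than `bound` might be a character split
             * across reads: stop here and wait for more input. */
            if (len - i < bound && more_input)
                break;
            /* Otherwise buf[i] is invalid: escape it and resume at i + 1. */
            out[n] = ESCAPE_BYTE(buf[i]);
            r = 1;
        } else if (r == 0) {
            r = 1;                          /* an embedded NUL is one byte */
        }
        i += (size_t)r;
        n++;
    }
    *consumed = i;
    return n;
}
```

With more input pending, the bytes `a`, a stray `0xFF`, a 4-byte emoji, and seven ASCII letters stop after `a` when the bound is glibc's `MB_LEN_MAX` (16), because the 12-byte tail is treated as a possibly incomplete character. With `MB_CUR_MAX` as the bound, the bad byte is escaped immediately and the emoji and the ASCII letters decode normally.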
+ +**Tests:** `tests/t009_unicode_vars.sh` through `tests/t014_unicode_script_source.sh` +cover variable round-trip, `$%` character count, glob expansion, `$<` stdin read, +backquote substitution, invalid-byte recovery, and sourced-script Unicode. + +### 3. Test suite — initial suite created (`tests/`) + +`tests/` directory created with 8 regression scripts and a `Makefile`: + +| Script | What it tests | +|--------|---------------| +| `t001_vars.sh` | `$mcsh` and `$tcsh` are set on startup | +| `t002_overflow.sh` | `@ x = (1 << 31)` yields `2147483648` (unsigned left-shift) | +| `t003_shortcircuit.sh` | `$?a && "$a" != ""` is silent when `$a` unset | +| `t004_pipe_to_var.sh` | `echo foo \| set x` assigns `x=foo` | +| `t005_cd_stack.sh` | `pushd`/`cd -1` navigates directory stack correctly | +| `t006_function_builtin.sh` | `function` builtin stores and executes body | +| `t007_arith_rsh.sh` | `@ x = (-8 >> 1)` yields `-4` (signed right-shift) | +| `t008_unset_modifiers.sh` | `${unset:h}` and `$#unset` don't error when var is unset | + +Run with: `make -C tests MCSH=./mcsh check` + +### 4. Gemini PR #5 inline comment — `cache_store()` goto removed (`ed.syntax.c`) + +Rewrote `cache_store()` with two explicit loops: first pass finds an empty +slot; second pass (only if needed) scans for the LRU victim. No `goto`. Logic +and semantics are identical: the victim index starts at `-1`, the first pass +sets it only when it finds an empty slot (and then breaks early), and the +second pass runs only if it is still `-1`. + +### 5. Gemini PR #5 inline comment — magic number `2` replaced (`tc.prompt.c`) + +Added `#define GIT_POLL_INTERVAL 2` near the top of the file and replaced the +literal `2` in the throttle check with the named constant. + +--- + +## Round 7 — PR #5 Copilot + Gemini review response (2026-04-21) + +### 1.
`sh.dol.c` — unset variable modifier handling fixed + +**Copilot + Gemini finding:** The unset-variable expansion path jumped directly to +`eatbrac` without calling `fixDolMod()`, causing `${unset:h}` and similar +modifier expressions to crash with "Missing }" because the `:h` was left in +the input stream. Also, `$#unset` (dimen) and `$%unset` (length) did not return +a sensible value. + +**Fix:** Call `fixDolMod()` before branching to `eatbrac`, consume modifiers +properly, and return `0` for both `$#unset` and `$%unset` (consistent with +treating an unset variable as empty/zero-length). The comment now correctly +states this applies to all variable expansions, not only double-quoted ones. + +**Test:** `t008_unset_modifiers.sh` — `${unset:h}` must not error; `$#unset` must yield `0`. + +### 2. `tests/run_tests.sh` — portability hardening + +**Copilot findings:** +- Header comment incorrectly stated scripts "print PASS or FAIL"; they actually + exit 0/non-zero with optional failure output. +- Glob `t*.sh` could iterate the literal pattern on a `/bin/sh` with no + matching files; now guarded with `set -- t*.sh; [ -e "$1" ] || exit`. +- `echo "$result"` with arbitrary content is non-portable (leading `-n` or + backslash sequences); replaced with `printf '%s\n' "$result"` throughout. + +### 3. `tests/t006_function_builtin.sh` — mktemp portability + +**Copilot finding:** `mktemp /tmp/t006.XXXXXX.csh` fails on BSD/macOS because +`mktemp` requires the template to end with X characters (suffixes after the X +block are rejected). Removed `.csh` suffix — the shell interpreter is set by +the heredoc content, not the filename. + +### 4. `tests/Makefile` — was already present + +The `tests/Makefile` was created in Round 6 and supports `make check` and +`make MCSH=/path/to/mcsh check`. All documentation references to +`tests/run_tests.sh` are therefore accurate. + +### 5. 
`ed.syntax.c` — syntax highlighting improvements + +- Fixed `in_table()`: removed unused loop variable `i` (loop is pointer-based). +- Fixed `ST_VARIABLE` state machine: `$$`, `$!`, `$<` are single-character + special variables and now correctly transition to `ST_NORMAL` after being + coloured. Previously the redundant inner check re-tested `buf[i]` (same as + `ch`) causing `$$` to sometimes stay in variable state. +- Extended redirection operator colouring to cover `>!`, `>>!`, `>|`, `>>&` + (the noclobber-override forms, plus `>>&`, which appends both stdout and stderr). + +### 6. `ed.chared.c` — command- and file-aware predictive autocomplete + +`predict_from_history()` now falls through to two additional predictors when +no history match is found: + +- **`predict_file()`** — fires when the current word starts with `/`, `./`, or + `~/`. Splits the word into directory + basename prefix, does a single + `opendir()` scan, and sets GhostBuf to the unique suffix. Directories get a + trailing `/`. Ambiguous or no matches produce no ghost. +- **`predict_cmd()`** — fires when the word is at the command position in the + input line (no prior non-space characters, or immediately after `;`/`|`/`&`/`(`/`)`). + Scans all `$PATH` directories for a uniquely matching executable and sets + GhostBuf to the suffix. + +Priority: history > command name > file path (the order in which +`predict_from_history()` tries them). Existing Tab completion is +unchanged; the new predictors are ghost-text only (accept with right-arrow / +Ctrl-F). + +### 7. `tests/t008_unset_modifiers.sh` — new regression test + +Covers the `${unset:h}` modifier fix and `$#unset` == 0 behaviour. + +--- + +## Round 9 — PR #5 review response part 2 (Apr 2026) + +### 1. `ed.syntax.c` — redirection coloring tightened + +**CodeRabbit finding:** Redirection continuation loop was over-broad for `<`. +Fixed to only allow `!` and `|` when the opening operator is `>`. Operators +opened by either `<` or `>` still accept `&`, `-`, `>`, and `<` as continuation characters. + +### 2.
`ed.chared.c` — predictive completion enhancements + +- **Caching:** Added caching for `predict_file()` and `predict_cmd()` to avoid + redundant filesystem scans on every keystroke. `f_cache` tracks directory + mtime; `c_cache` tracks the `$PATH` string. +- **User Toggle:** All predictive logic gated behind `set predict` shell + variable. +- **`~user` Expansion:** `predict_file()` now supports `~user/` expansion via + `getpwnam()`. +- **Empty PATH components:** `predict_cmd()` now treats empty components in + `$PATH` as the current directory (`.`), matching `cmd_on_path()`. + +### 3. Test suite hardening + +- **`tests/run_tests.sh`:** Now recognizes exit code `77` as `SKIP` and reports + it in the summary. Fixed a literal newline in an error message. +- **`tests/lib_locale.sh`:** Updated to use portable ERE (`grep -E`) and exit + code `77` for skips. +- **`tests/t006_function_builtin.sh`:** Added cleanup `trap` and exit status + verification. +- **`tests/t008_unset_modifiers.sh`:** Escaped `$` in failure message and + switched to portable `grep -E`. diff --git a/PLAN.md b/PLAN.md index b72a1bff..08dc88de 100644 --- a/PLAN.md +++ b/PLAN.md @@ -153,7 +153,7 @@ Status: **partial** | **#99** | High | `configure.ac`, `tc.func.c` | `undefined reference to 'crypt'` on modern glibc. Fix: `AC_SEARCH_LIBS([crypt], [crypt xcrypt])`. | | **#101** (PR) | Medium | `sh.exp.c` | Signed integer overflow: `@ x = (1 << 63)` raises "Badly formed number". Fix: unsigned arithmetic with overflow detection. | | **#110** | Medium | `tc.prompt.c` | `%j` job-count in prompt counted all proclist entries. Fix: counts only live job leaders (`p_procid == p_jobid` && `PRUNNING\|PSTOPPED`). | -| **#107** (PR) | Medium | `sh.exp.c`, `sh.sem.c` | `$?a && "$a" != ""` throws if `a` is unset. Fix: `Dfix()` skips expansion for expression-evaluating builtins; expansion deferred until after short-circuit. | +| **#107** (PR) | Medium | `sh.exp.c`, `sh.dol.c` | `$?a && "$a" != ""` throws if `a` is unset. 
Fix: `Dgetdol()` expands unset variables to `STRNULL` (the empty string) in all expansion contexts instead of raising `ERR_UNDVAR`, so the operand that `&&` short-circuits away evaluates harmlessly. | +| **#116** | Medium | `sh.file.c` | 32-bit `wcscoll` type mismatch: cast through `(const wchar_t *)(const void *)`. | +| **#115** | Low | `config_f.h`, `sh.h` | Shift-JIS: `SIZEOF_WCHAR_T < 4` → `<= 4`; `AUTOSET_KANJI` removes non-macro `CODESET` guard. | +| **#103** | Low | `nls/Makefile.in` | Greek locale `el` (ISO 639-1). Already correct in mcsh. | @@ -227,7 +227,7 @@ Features developed natively for mcsh, with no upstream tcsh counterpart. | Feature | `set` variable | Primary Files | Notes | |---------|----------------|---------------|-------| -| Fish-style predictive autocomplete | *(always active)* | `ed.chared.c`, `ed.refresh.c`, `ed.inputl.c` | Scans `Histlist` for prefix match; ghost text rendered dimmed after cursor. Right-Arrow / `^F` accepts. | +| Fish-style predictive autocomplete | `set predict` | `ed.chared.c`, `ed.refresh.c`, `ed.inputl.c` | History matches take priority; otherwise a unique file path (`predict_file`) or `$PATH` command name (`predict_cmd`) is suggested. Ghost text rendered dimmed after cursor. Right-Arrow / `^F` accepts. | +| Native git branch prompt escapes | *(always active)* | `tc.prompt.c` | `%g` = branch name; `%G` = branch + operation state. Cached per-CWD with independent HEAD and state-marker mtime tracking. | | Filetype colouring in completion | `set color` | `tw.color.c`, `sh.set.c` | Drives `ls-F` completion listings via `LSCOLORS`/`LS_COLORS`. | | **Interactive syntax highlighting** | **`set syntax`** | **`ed.syntax.c/h`, `ed.screen.c`, `ed.refresh.c`, `ed.inputl.c`, `sh.set.c`** | Virtual-display pipeline integration. Single-pass tokeniser fills `SyntaxColor[]`; `Draw()` propagates token colour into `Vdisplay[]` via `SYN_PACK()`; `so_write()` emits ANSI SGR per cell via `SetSGRColor()`. 32-entry LRU command cache avoids per-keystroke `stat(2)`.
| @@ -276,4 +276,8 @@ Key fixes: | 2026-04-21 | Phase 4b + Phase 8 (round 1): all Gemini + CodeRabbit PR3 review items addressed. `vms.termcap.c` repurposed as portable termcap shim; `sh.func.c` `doif` widened to `tcsh_number_t`; `configure.ac` fixes; `dot.mcshrc` rewritten. README, PLAN, ISSUES updated. | | 2026-04-21 | Phase 9: native interactive syntax highlighting (`set syntax`) landed. Virtual-display pipeline: `ed.syntax.c/h` tokeniser + LRU cache; `SYN_PACK`/`SYN_TOK`/`SYN_GLYPH` bit-packing into `Vdisplay Char`; `SetSGRColor()` per-cell ANSI SGR. | | 2026-04-21 | Phase 8 (rounds 2–3): remaining Copilot review findings resolved — `TCSH_BASELINE_VERSION` string literal; git cache marker-mtime independence; `SetSGRColor`/`DrawGhost` `ESC[22;39m` SGR fix; `ed.inputl.c` no double `Refresh()`; zsh-style pushd/popd tree display and `cd -N` navigation added. All documentation updated to reflect current state. | -| 2026-04-22 | Phase 4 upstream sweep: full audit of all tcsh-org/tcsh open + recently closed issues/PRs. Applied: `sh.file.c` 32-bit `wcscoll` cast (#116); `config_f.h` Shift-JIS `<= 4` condition (#115); `sh.h` `AUTOSET_KANJI` CODESET guard removal (#115). Confirmed already-present: #103, #104, #99, #101, #110. Rejected (upstream closed/not merged): #118 FIONREAD, #114 Shift-JIS runtime check. ISSUES.md, PLAN.md, README.md all updated. | +| 2026-04-21 | Phase 4/6 review response (dev4): Three flagged weaknesses addressed. Short-circuit fix: `sh.dol.c` Dgetdol() expands unset vars to STRNULL instead of throwing ERR_UNDVAR. Unicode regression documented (no upstream fix available). Test suite created: `tests/` with 7 scripts + Makefile. Gemini inline fixes: `cache_store()` goto removed (two-loop LRU); `GIT_POLL_INTERVAL` named constant. README Known Limitations section added. ISSUES.md Round 6 appended. | +| 2026-04-21 | Phase 4 upstream sweep: full audit of all tcsh-org/tcsh open + recently closed issues/PRs. 
Applied: `sh.file.c` 32-bit `wcscoll` cast (#116); `config_f.h` Shift-JIS `<= 4` condition (#115); `sh.h` `AUTOSET_KANJI` CODESET guard removal (#115). Confirmed already-present: #103, #104, #99, #101, #110. Rejected (upstream closed/not merged): #118 FIONREAD, #114 Shift-JIS runtime check. ISSUES.md, PLAN.md, README.md all updated. | +| 2026-04-21 | PR #5 Round 7 (Copilot + Gemini full review): `sh.dol.c` unset-var modifier fix (call `fixDolMod()` before eatbrac; return 0 for `$#unset`/`$%unset`). `tests/run_tests.sh` portability (glob guard, printf, comment fix). `t006` mktemp portability. `ed.syntax.c` ST_VARIABLE state machine fix ($$/$!/`$<` single-char specials; `>!` / `>>!` / `>\|` operator colouring; remove unused `in_table` var). `ed.chared.c` command+file-aware ghost-text predictor (`predict_file`, `predict_cmd`). New test `t008_unset_modifiers.sh`. ISSUES.md Round 7 appended. | +| 2026-04-27 | PR #5 Round 9 review response: `ed.syntax.c` redirection coloring fix; `ed.chared.c` caching, `set predict` toggle, `~user` expansion, and empty PATH component handling; test suite hardening (SKIP support, portability, robustness). ISSUES.md Round 9 appended. | diff --git a/README.md b/README.md index f0f48784..55898a6f 100644 --- a/README.md +++ b/README.md @@ -47,7 +47,7 @@ mcsh is a drop-in replacement for tcsh and csh: | Feature | `set` variable | Description | |---------|----------------|-------------| -| **Fish-style predictive autocomplete** | *(always active)* | As you type, the most recent matching history entry is shown as inline ghost text (dimmed). Press Right-Arrow or `^F` to accept the full suggestion. | +| **Fish-style predictive autocomplete** | `set predict` | As you type, the most recent matching history entry is shown as inline ghost text (dimmed); if no history entry matches, a unique matching file path or command name is shown instead. Press Right-Arrow or `^F` to accept the full suggestion. File-path and command lookups are cached, so the filesystem and `$PATH` are not rescanned on every keystroke.
| | **Interactive syntax highlighting** | `set syntax` | Per-keystroke ANSI colour highlighting of keywords, builtins, commands (ok/bad), operators, variables, strings (double/single/backtick), comments, and unmatched-quote errors. A 32-entry LRU cache avoids repeated `stat(2)` calls per `$PATH` lookup. | | **Filetype colouring in completion** | `set color` | Coloured filetype indicators in tab-completion listings, driven by `LSCOLORS` / `LS_COLORS`. | @@ -286,6 +286,51 @@ Sections and what they provide: --- +## Resolved regressions + +### Unicode / wide-character handling (Round 6) + +Multi-byte characters — emoji, CJK, Latin Extended, and any character whose +UTF-8 encoding is longer than one byte — were previously silently dropped or +corrupted during filename glob expansion and variable assignment. This was a +byte-vs-character length bug inherited from tcsh 6.24.14 (tracked as tcsh +issues #117 / #121). + +**Resolution (`sh.lex.c`, `sh.dol.c`):** the `mbtowc` accumulation loops in +`wide_read()` and the `$<` line-read primitive now bound partial-byte +lookahead by the runtime `MB_CUR_MAX` instead of the compile-time +`MB_LEN_MAX`. After a stray invalid byte the loop no longer over-reads up to +15 bytes of subsequent valid UTF-8. + +**Tests:** `tests/t009_unicode_vars.sh` through +`tests/t014_unicode_script_source.sh` cover variable round-trip, +`$%` character count, glob expansion, `$<` stdin read, backquote +substitution, invalid-byte recovery, and sourced-script Unicode. +See `ISSUES.md` Round 6 for details. + +### Short-circuit evaluation (dev4, improved in Round 7) + +`if ($?a && "$a" != "")` previously threw "Undefined variable" whenever +`$a` was unset, because `Dfix()` expanded all `$` tokens before `&&` +short-circuiting could suppress evaluation. Fixed in `sh.dol.c`: unset +variables now silently expand to `""`, matching bash/zsh semantics.
+Modifier expressions like `${unset:h}` and dimen expressions like `$#unset` +also no longer error — they return `""` and `0` respectively. +See `ISSUES.md` Rounds 6–7. + +### Test coverage + +An initial regression suite in `tests/` covers startup variables, +arithmetic overflow and shift semantics (including signed right-shift), +short-circuit evaluation, pipe-to-var, the directory stack, the `function` +builtin, unset-variable modifier handling, and multi-byte/Unicode handling +(`t009`–`t014`). It covers the core new features but is not exhaustive. Run with: + +```sh +tests/run_tests.sh +``` + +--- + ## Licensing mcsh is BSD 3-Clause (see `LICENSE`). The upstream tcsh / etcsh source is also BSD 3-Clause (see `UPSTREAM-COPYRIGHT`). Redistribution must carry both notices — see `NOTICE` for details. diff --git a/dot.mcshrc b/dot.mcshrc index 35ed3b86..693dde79 100644 --- a/dot.mcshrc +++ b/dot.mcshrc @@ -47,6 +47,7 @@ set autoexpand # Expand history (!$) automatically set autocorrect # Structural spell correction set color # Enable color in builtins (ls-F completion coloring) set syntax # Native interactive syntax highlighting +set predict # Fish-style predictive autocomplete set correct = cmd # Command spell correction set ellipsis # Use ... for truncated path prefix in prompt set filec # Filename completion diff --git a/ed.chared.c b/ed.chared.c index 50e98684..479eff47 100644 --- a/ed.chared.c +++ b/ed.chared.c @@ -73,6 +73,9 @@ #include "ed.h" #include "tw.h" #include "ed.defns.h" +#include +#include +#include /* #define SDEBUG */ @@ -3862,6 +3865,477 @@ e_page_down(Char c) return (CC_ERROR); } +/* + * set_ghost(suffix) — copy suffix into GhostBuf, NUL-terminated. + * Truncates silently if it wouldn't fit. + */ +static void +set_ghost(const Char *suffix) +{ + Char *p = GhostBuf; + while (*suffix && p < GhostBuf + INBUFSIZE - 1) + *p++ = *suffix++; + *p = '\0'; +} + +/* + * Caching for predictive completion to avoid expensive FS/PATH scans.
+ */ +static struct { + char cwd[1024]; + char prefix[256]; + char dir[512]; + time_t mtime; + char no_match_prefix[256]; + Char ghost[512]; + int valid; +} f_cache; + +static struct { + char cwd[1024]; + char prefix[256]; + char path[4096]; + size_t path_len; + Char ghost[512]; + int valid; +} c_cache; + +static char trusted_dirs[16][1024]; +static int trusted_count = 0; +static size_t no_match_prefix_len = 0; + +/* + * predict_cache_clear — invalidate both predictive completion caches. + */ +void +predict_cache_clear(void) +{ + f_cache.valid = 0; + c_cache.valid = 0; + trusted_count = 0; + no_match_prefix_len = 0; +} + +static int +is_word_break(int c) +{ + return (c == ' ' || c == '\t' || c == ';' || c == '|' || + c == '&' || c == '(' || c == ')' || c == '`' || c == '<' || c == '>'); +} + +/* + * predict_file — try to complete the current word as a filesystem path. + * + * Extracts the word under the cursor (from last whitespace/operator to + * LastChar), then looks for exactly one file/directory entry whose name + * starts with the word's basename portion. On a unique match it fills + * GhostBuf with the missing suffix. Directories get a trailing '/'. + * Returns 1 if a ghost was set, 0 otherwise. 
+ */ +static int +predict_file(void) +{ + char word[512]; + const Char *wp; + size_t i, wlen; + const char *slash, *prefix, *dirpath; + char dirbuf[512]; + DIR *dp; + struct dirent *de; + struct stat st; + char match[512]; + int nmatch = 0; + int is_dir_match = 0; + size_t pfxlen; + + /* Extract the last whitespace/operator-delimited word from InputBuf */ + wp = LastChar; + while (wp > InputBuf) { + int c = (int)((wp[-1]) & CHAR); + if (is_word_break(c)) + break; + wp--; + } + + wlen = (size_t)(LastChar - wp); + if (wlen == 0 || wlen >= sizeof(word)) + return 0; + + { + Char *temp = xmalloc((wlen + 1) * sizeof(Char)); + char *mb; + for (i = 0; i < wlen; i++) + temp[i] = wp[i] & CHAR; + temp[wlen] = '\0'; + mb = short2str(temp); + strncpy(word, mb, sizeof(word) - 1); + word[sizeof(word) - 1] = '\0'; + xfree(temp); + } + + /* Tilde expansion for ghost text purposes */ + if (word[0] == '~') { + if (word[1] == '/' || word[1] == '\0') { + const char *home = getenv("HOME"); + char expanded[512]; + if (!home) + return 0; + if (xsnprintf(expanded, sizeof(expanded), "%s%s", home, word + 1) >= (int)sizeof(expanded)) + return 0; + strncpy(word, expanded, sizeof(word) - 1); + word[sizeof(word) - 1] = '\0'; + } else { + /* ~user expansion */ + char user[128]; + const char *s = word + 1; + size_t ul = 0; + struct passwd *pw; + static char last_user[128] = ""; + static char last_pw_dir[1024] = ""; + while (*s && *s != '/' && ul < sizeof(user) - 1) + user[ul++] = *s++; + user[ul] = '\0'; + if (last_user[0] != '\0' && strcmp(user, last_user) == 0) { + char expanded[512]; + if (xsnprintf(expanded, sizeof(expanded), "%s%s", last_pw_dir, s) >= (int)sizeof(expanded)) + return 0; + strncpy(word, expanded, sizeof(word) - 1); + word[sizeof(word) - 1] = '\0'; + } else { + pw = getpwnam(user); + if (pw) { + char expanded[512]; + strncpy(last_user, user, sizeof(last_user) - 1); + last_user[sizeof(last_user) - 1] = '\0'; + strncpy(last_pw_dir, pw->pw_dir, sizeof(last_pw_dir) - 1); + 
last_pw_dir[sizeof(last_pw_dir) - 1] = '\0'; + if (xsnprintf(expanded, sizeof(expanded), "%s%s", pw->pw_dir, s) >= (int)sizeof(expanded)) + return 0; + strncpy(word, expanded, sizeof(word) - 1); + word[sizeof(word) - 1] = '\0'; + } else { + return 0; + } + } + } + } + + /* Split word into directory and basename prefix */ + slash = strrchr(word, '/'); + if (slash) { + prefix = slash + 1; + size_t dlen = (size_t)(slash - word) + 1; /* include trailing / */ + if (dlen >= sizeof(dirbuf)) + return 0; + memcpy(dirbuf, word, dlen); + dirbuf[dlen] = '\0'; + dirpath = dirbuf; + } else { + prefix = word; + /* Trailing slash so the "%s%s" join below produces "./foo", + * not ".foo" (which would refer to a hidden file). */ + dirpath = "./"; + } + + pfxlen = strlen(prefix); + + /* Get current working directory for cache key */ + char current_cwd[1024] = ""; + if (getcwd(current_cwd, sizeof(current_cwd)) == NULL) { + current_cwd[0] = '\0'; + f_cache.valid = 0; + } + + /* Check cache */ + if (f_cache.valid && strcmp(f_cache.cwd, current_cwd) == 0 && + strcmp(f_cache.prefix, prefix) == 0 && + strcmp(f_cache.dir, dirpath) == 0) { + if (stat(dirpath, &st) == 0 && st.st_mtime == f_cache.mtime) { + if (f_cache.ghost[0]) { + set_ghost(f_cache.ghost); + return 1; + } + return 0; + } + } + + dp = opendir(dirpath); + if (!dp) + return 0; + if (stat(dirpath, &st) == 0) + f_cache.mtime = st.st_mtime; + else + f_cache.mtime = 0; + + while ((de = readdir(dp)) != NULL) { + const char *name = de->d_name; + /* skip dot files unless prefix starts with dot */ + if (name[0] == '.' 
&& (pfxlen == 0 || prefix[0] != '.')) + continue; + if (pfxlen > 0 && strncmp(name, prefix, pfxlen) != 0) + continue; + if (strcmp(name, prefix) == 0) + continue; /* exact match — nothing to complete */ + nmatch++; + if (nmatch == 1) { + strncpy(match, name + pfxlen, sizeof(match) - 1); + match[sizeof(match) - 1] = '\0'; + /* check if it's a directory to append trailing slash */ + char fullpath[1024]; + xsnprintf(fullpath, sizeof(fullpath), "%s%s", dirpath, name); + is_dir_match = (stat(fullpath, &st) == 0 && S_ISDIR(st.st_mode)); + } else { + break; + } + } + closedir(dp); + + /* Update cache */ + strncpy(f_cache.cwd, current_cwd, sizeof(f_cache.cwd) - 1); + f_cache.cwd[sizeof(f_cache.cwd) - 1] = '\0'; + strncpy(f_cache.prefix, prefix, sizeof(f_cache.prefix) - 1); + f_cache.prefix[sizeof(f_cache.prefix) - 1] = '\0'; + strncpy(f_cache.dir, dirpath, sizeof(f_cache.dir) - 1); + f_cache.dir[sizeof(f_cache.dir) - 1] = '\0'; + f_cache.valid = 1; + + if (nmatch != 1) { + f_cache.ghost[0] = '\0'; + if (nmatch == 0) { + strncpy(f_cache.no_match_prefix, prefix, sizeof(f_cache.no_match_prefix) - 1); + f_cache.no_match_prefix[sizeof(f_cache.no_match_prefix) - 1] = '\0'; + } else { + f_cache.no_match_prefix[0] = '\0'; + } + return 0; /* zero or ambiguous — no ghost */ + } + + f_cache.no_match_prefix[0] = '\0'; + + /* Build ghost: suffix + optional / */ + { + Char *wide_match = str2short(match); + size_t gi = 0; + if (wide_match == NULL) + return 0; + const Char *sp = wide_match; + while (*sp && gi < sizeof(f_cache.ghost) / sizeof(f_cache.ghost[0]) - 2) + f_cache.ghost[gi++] = *sp++; + if (is_dir_match && gi < sizeof(f_cache.ghost) / sizeof(f_cache.ghost[0]) - 1) + f_cache.ghost[gi++] = '/'; + f_cache.ghost[gi] = '\0'; + set_ghost(f_cache.ghost); + } + return 1; +} + +/* + * predict_cmd — suggest a command name from $PATH when at command position. + * + * Scans $PATH directories for executables whose name starts with the current + * word prefix. 
Only fires when the word is at command position (start of the + * line, or right after `;`, `|`, `&`, `(` or `)`). Returns 1 if a ghost was set. + */ +static int +predict_cmd(void) +{ + char prefix[256]; + const Char *wp; + size_t i, wlen; + const char *pathenv, *p, *q; + size_t dlen, pfxlen; + DIR *dp; + struct dirent *de; + struct stat st; + char match[256]; + int nmatch = 0; + char fullpath[2048]; + char match_dir[1024] = ""; + + /* Walk backward from LastChar to find the start of the word under + * the cursor (mirrors predict_file). The break-set is the set of + * characters that terminate a word (see is_word_break()): whitespace, + * the command separators `;`, `|`, `&`, the parens `(` / `)`, the + * backquote, and the redirection characters `<` / `>` (a closing + * paren also terminates a word, so prediction can fire right + * after `)` once that subshell has been closed). */ + wp = LastChar; + while (wp > InputBuf) { + int c = (int)((wp[-1]) & CHAR); + if (is_word_break(c)) + break; + wp--; + } + + /* Scan backward from wp, skipping whitespace, to find the previous + * non-whitespace character. We are at command position iff that + * character is start-of-buffer (no preceding text) or one of the + * recognised command separators. This correctly accepts + * "echo hi; ls" while still rejecting "echo hi ls".
*/ + { + const Char *prev = wp; + while (prev > InputBuf) { + int c = (int)(prev[-1] & CHAR); + if (c != ' ' && c != '\t') + break; + prev--; + } + if (prev > InputBuf) { + int c = (int)(prev[-1] & CHAR); + if (c != ';' && c != '|' && c != '&' && + c != '(' && c != ')') + return 0; + } + } + + wlen = (size_t)(LastChar - wp); + if (wlen == 0 || wlen >= sizeof(prefix)) + return 0; + + /* word must not contain / (otherwise use predict_file) */ + { + Char *temp = xmalloc((wlen + 1) * sizeof(Char)); + char *mb; + for (i = 0; i < wlen; i++) { + if ((int)(wp[i] & CHAR) == '/') { + xfree(temp); + return 0; + } + temp[i] = wp[i] & CHAR; + } + temp[wlen] = '\0'; + mb = short2str(temp); + strncpy(prefix, mb, sizeof(prefix) - 1); + prefix[sizeof(prefix) - 1] = '\0'; + xfree(temp); + } + /* Byte length of the multibyte prefix: wlen counts Chars, which + * differs from the byte count for non-ASCII input. */ + pfxlen = strlen(prefix); + + pathenv = getenv("PATH"); + if (!pathenv) + return 0; + + /* Get current working directory for cache key */ + char current_cwd[1024] = ""; + if (getcwd(current_cwd, sizeof(current_cwd)) == NULL) { + current_cwd[0] = '\0'; + c_cache.valid = 0; + } + + /* Check cache */ + if (c_cache.valid && strcmp(c_cache.cwd, current_cwd) == 0 && + strcmp(c_cache.prefix, prefix) == 0 && + strcmp(c_cache.path, pathenv) == 0) { + if (c_cache.ghost[0]) { + set_ghost(c_cache.ghost); + return 1; + } + return 0; + } + + p = pathenv; + while (p && nmatch <= 1) { + char dirpath[1024]; + q = strchr(p, ':'); + dlen = q ? (size_t)(q - p) : strlen(p); + if (dlen == 0) { + dirpath[0] = '.'; + dirpath[1] = '\0'; + } else if (dlen < sizeof(dirpath)) { + memcpy(dirpath, p, dlen); + dirpath[dlen] = '\0'; + } else { + p = q ?
q + 1 : NULL; + continue; + } + + dp = opendir(dirpath); + if (dp) { + int trust_readdir = 0; + for (int k = 0; k < trusted_count; k++) { + if (strcmp(trusted_dirs[k], dirpath) == 0) { + trust_readdir = 1; + break; + } + } + if (!trust_readdir && trusted_count < 16) { + strncpy(trusted_dirs[trusted_count], dirpath, sizeof(trusted_dirs[0]) - 1); + trusted_dirs[trusted_count][sizeof(trusted_dirs[0]) - 1] = '\0'; + trusted_count++; + trust_readdir = 1; + } + + while ((de = readdir(dp)) != NULL && nmatch <= 1) { + const char *name = de->d_name; + if (strncmp(name, prefix, pfxlen) != 0) continue; + if (strcmp(name, prefix) == 0) continue; + if (trust_readdir) { + if (nmatch == 0) { + strncpy(match, name + pfxlen, sizeof(match) - 1); + match[sizeof(match) - 1] = '\0'; + strncpy(match_dir, dirpath, sizeof(match_dir) - 1); + match_dir[sizeof(match_dir) - 1] = '\0'; + nmatch = 1; + } else if (strcmp(match, name + pfxlen) != 0) { + /* Different suffix — truly ambiguous. */ + nmatch = 2; + break; + } + } else { + xsnprintf(fullpath, sizeof(fullpath), "%s/%s", dirpath, name); + if (stat(fullpath, &st) == 0 && S_ISREG(st.st_mode) && + access(fullpath, X_OK) == 0) { + if (nmatch == 0) { + strncpy(match, name + pfxlen, sizeof(match) - 1); + match[sizeof(match) - 1] = '\0'; + strncpy(match_dir, dirpath, sizeof(match_dir) - 1); + match_dir[sizeof(match_dir) - 1] = '\0'; + nmatch = 1; + } else if (strcmp(match, name + pfxlen) != 0) { + /* Different suffix — truly ambiguous. */ + nmatch = 2; + break; + } + } + } + } + closedir(dp); + } + p = q ? 
q + 1 : NULL; + } + + /* Update cache */ + strncpy(c_cache.cwd, current_cwd, sizeof(c_cache.cwd) - 1); + c_cache.cwd[sizeof(c_cache.cwd) - 1] = '\0'; + strncpy(c_cache.prefix, prefix, sizeof(c_cache.prefix) - 1); + c_cache.prefix[sizeof(c_cache.prefix) - 1] = '\0'; + strncpy(c_cache.path, pathenv, sizeof(c_cache.path) - 1); + c_cache.path[sizeof(c_cache.path) - 1] = '\0'; + c_cache.valid = 1; + + if (nmatch == 1) { + /* Always validate the final candidate, even if it came from a trusted dir. */ + xsnprintf(fullpath, sizeof(fullpath), "%s/%s%s", match_dir, prefix, match); + if (stat(fullpath, &st) != 0 || !S_ISREG(st.st_mode) || access(fullpath, X_OK) != 0) + nmatch = 0; + } + + if (nmatch != 1) { + c_cache.ghost[0] = '\0'; + return 0; + } + + { + Char *wide_match = str2short(match); + size_t gi = 0; + const Char *sp = wide_match; + while (*sp && gi < sizeof(c_cache.ghost) / sizeof(c_cache.ghost[0]) - 1) + c_cache.ghost[gi++] = *sp++; + c_cache.ghost[gi] = '\0'; + set_ghost(c_cache.ghost); + } + return 1; +} + void predict_from_history(void) { @@ -3872,6 +4346,9 @@ predict_from_history(void) GhostBuf[0] = '\0'; + if (adrof(STRpredict) == NULL) + return; + if (Cursor != LastChar) return; @@ -3879,6 +4356,7 @@ predict_from_history(void) if (inputlen == 0) return; + /* 1. History-based prediction (highest priority, exact-prefix match) */ { int limit = 500; /* cap scan depth to bound latency on large histories */ for (hp = Histlist.Hnext; hp != NULL && limit-- > 0; hp = hp->Hnext) { @@ -3901,8 +4379,16 @@ predict_from_history(void) } } } + + /* 2. Command prediction: fires when at the command position in the line */ + if (predict_cmd()) + return; + + /* 3. 
File-path prediction: fires for any word (relative or absolute) */ + (void)predict_file(); } + CCRETVAL e_predict_accept(Char c) { diff --git a/ed.decls.h b/ed.decls.h index e44e6c24..b1589cf6 100644 --- a/ed.decls.h +++ b/ed.decls.h @@ -287,5 +287,6 @@ extern unsigned char *unparsestring (const CStr *, const Char *); extern void syntax_colorize (void); extern void syntax_clear (void); extern void syntax_cache_clear(void); +extern void predict_cache_clear(void); #endif /* _h_ed_decls */ diff --git a/ed.syntax.c b/ed.syntax.c index 2dc03657..7dbdf44a 100644 --- a/ed.syntax.c +++ b/ed.syntax.c @@ -181,22 +181,30 @@ cache_lookup(const char *name) static void cache_store(const char *name, int found) { - int i, victim = 0; + int i, victim = -1; unsigned oldest_age; if (!cmd_cache_init) cache_init(); - /* Prefer an empty slot; otherwise evict the least-recently-used entry. */ - oldest_age = cmd_cache[0].age; + + /* First pass: prefer an empty slot. */ for (i = 0; i < CMD_CACHE_SIZE; i++) { if (cmd_cache[i].found < 0) { victim = i; - goto store; + break; } - if (cmd_cache[i].age < oldest_age) { - oldest_age = cmd_cache[i].age; - victim = i; + } + + /* Second pass: if no empty slot, evict the least-recently-used entry. 
*/ + if (victim < 0) { + oldest_age = cmd_cache[0].age; + victim = 0; + for (i = 1; i < CMD_CACHE_SIZE; i++) { + if (cmd_cache[i].age < oldest_age) { + oldest_age = cmd_cache[i].age; + victim = i; + } } } -store: + strncpy(cmd_cache[victim].name, name, CMD_CACHE_NAMELEN - 1); cmd_cache[victim].name[CMD_CACHE_NAMELEN - 1] = '\0'; cmd_cache[victim].found = found; @@ -273,7 +281,6 @@ cmd_on_path(const char *word) static int in_table(const char * const *table, const char *word, size_t len) { - size_t i; for (; *table; table++) { size_t tl = strlen(*table); if (tl == len && strncmp(*table, word, len) == 0) @@ -394,19 +401,17 @@ syntax_colorize(void) } else if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9') || - ch == '_' || ch == '?' || ch == '#' || - ch == '$' || ch == '!' || ch == '<') { + ch == '_') { + /* ordinary identifier character: stay in variable mode */ SyntaxColor[i] = SYN_VARIABLE; - /* '?' and '#' are single-char special-variable prefixes; - * keep state as ST_VARIABLE so the following alphanumeric - * characters remain part of the variable name (e.g. $?path). */ - if (ch == '?' || ch == '#') { - /* stay in ST_VARIABLE to absorb trailing name chars */ - } else if (!((buf[i] & CHAR) >= 'a' && (buf[i] & CHAR) <= 'z') && - !((buf[i] & CHAR) >= 'A' && (buf[i] & CHAR) <= 'Z') && - !((buf[i] & CHAR) >= '0' && (buf[i] & CHAR) <= '9') && - (buf[i] & CHAR) != '_') - state = ST_NORMAL; + } else if (ch == '?' || ch == '#') { + /* $?var / $#var — single-char modifier prefix; absorb and + * keep state so the trailing name is also coloured. */ + SyntaxColor[i] = SYN_VARIABLE; + } else if (ch == '$' || ch == '!' 
|| ch == '<') { + /* $$, $!, $< — single-character special variables */ + SyntaxColor[i] = SYN_VARIABLE; + state = ST_NORMAL; } else { state = ST_NORMAL; /* reprocess this char in normal mode */ @@ -525,12 +530,14 @@ syntax_colorize(void) /* Redirection */ if (ch == '>' || ch == '<') { + int opener = ch; if (in_word) in_word = 0; SyntaxColor[i] = SYN_OPERATOR; - /* >> >& >>& etc. */ + /* >> >>! >>& >& >| >! < << */ while (i + 1 < len) { int nc = (int)(buf[i+1] & CHAR); - if (nc == '>' || nc == '&' || nc == '-') + if (nc == '>' || nc == '<' || nc == '&' || nc == '-' || + (opener == '>' && (nc == '!' || nc == '|'))) SyntaxColor[++i] = SYN_OPERATOR; else break; diff --git a/nls/Makefile.in b/nls/Makefile.in index 9b4ccdf0..0ef20f68 100644 --- a/nls/Makefile.in +++ b/nls/Makefile.in @@ -21,7 +21,7 @@ all: ${CATALOGS} INSTALLED+=${localedir}/C/LC_MESSAGES/tcsh.cat ${localedir}/C/LC_MESSAGES/tcsh.cat: C.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ C.cat: ${srcdir}/C/charset ${srcdir}/C/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -29,7 +29,7 @@ C.cat: ${srcdir}/C/charset ${srcdir}/C/*set[0-9]* INSTALLED+=${localedir}/et/LC_MESSAGES/tcsh.cat ${localedir}/et/LC_MESSAGES/tcsh.cat: et.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ et.cat: ${srcdir}/et/charset ${srcdir}/et/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -37,7 +37,7 @@ et.cat: ${srcdir}/et/charset ${srcdir}/et/*set[0-9]* INSTALLED+=${localedir}/fi/LC_MESSAGES/tcsh.cat ${localedir}/fi/LC_MESSAGES/tcsh.cat: finnish.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ finnish.cat: ${srcdir}/finnish/charset ${srcdir}/finnish/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -45,7 +45,7 @@ finnish.cat: ${srcdir}/finnish/charset ${srcdir}/finnish/*set[0-9]* INSTALLED+=${localedir}/fr/LC_MESSAGES/tcsh.cat ${localedir}/fr/LC_MESSAGES/tcsh.cat: french.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ french.cat: ${srcdir}/french/charset 
${srcdir}/french/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -53,7 +53,7 @@ french.cat: ${srcdir}/french/charset ${srcdir}/french/*set[0-9]* INSTALLED+=${localedir}/de/LC_MESSAGES/tcsh.cat ${localedir}/de/LC_MESSAGES/tcsh.cat: german.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ german.cat: ${srcdir}/german/charset ${srcdir}/german/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -61,7 +61,7 @@ german.cat: ${srcdir}/german/charset ${srcdir}/german/*set[0-9]* INSTALLED+=${localedir}/el/LC_MESSAGES/tcsh.cat ${localedir}/el/LC_MESSAGES/tcsh.cat: greek.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ greek.cat: ${srcdir}/greek/charset ${srcdir}/greek/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -69,7 +69,7 @@ greek.cat: ${srcdir}/greek/charset ${srcdir}/greek/*set[0-9]* INSTALLED+=${localedir}/it/LC_MESSAGES/tcsh.cat ${localedir}/it/LC_MESSAGES/tcsh.cat: italian.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ italian.cat: ${srcdir}/italian/charset ${srcdir}/italian/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -77,7 +77,7 @@ italian.cat: ${srcdir}/italian/charset ${srcdir}/italian/*set[0-9]* INSTALLED+=${localedir}/ja/LC_MESSAGES/tcsh.cat ${localedir}/ja/LC_MESSAGES/tcsh.cat: ja.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ ja.cat: ${srcdir}/ja/charset ${srcdir}/ja/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -85,7 +85,7 @@ ja.cat: ${srcdir}/ja/charset ${srcdir}/ja/*set[0-9]* INSTALLED+=${localedir}/pl/LC_MESSAGES/tcsh.cat ${localedir}/pl/LC_MESSAGES/tcsh.cat: pl.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ pl.cat: ${srcdir}/pl/charset ${srcdir}/pl/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -93,7 +93,7 @@ pl.cat: ${srcdir}/pl/charset ${srcdir}/pl/*set[0-9]* INSTALLED+=${localedir}/ru/LC_MESSAGES/tcsh.cat ${localedir}/ru/LC_MESSAGES/tcsh.cat: russian.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ russian.cat: 
${srcdir}/russian/charset ${srcdir}/russian/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -101,7 +101,7 @@ russian.cat: ${srcdir}/russian/charset ${srcdir}/russian/*set[0-9]* INSTALLED+=${localedir}/es/LC_MESSAGES/tcsh.cat ${localedir}/es/LC_MESSAGES/tcsh.cat: spanish.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ spanish.cat: ${srcdir}/spanish/charset ${srcdir}/spanish/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> @@ -109,7 +109,7 @@ spanish.cat: ${srcdir}/spanish/charset ${srcdir}/spanish/*set[0-9]* INSTALLED+=${localedir}/ru_UA/LC_MESSAGES/tcsh.cat ${localedir}/ru_UA/LC_MESSAGES/tcsh.cat: ukrainian.cat $(MKDIR_P) $(@D) - $(INSTALL_DATA) $> $^ $@ + $(INSTALL_DATA) $^ $@ ukrainian.cat: ${srcdir}/ukrainian/charset ${srcdir}/ukrainian/*set[0-9]* @${CATGEN} $(GENCAT) $@ $^ $> diff --git a/sh.dol.c b/sh.dol.c index 893408e6..a65967ff 100644 --- a/sh.dol.c +++ b/sh.dol.c @@ -483,7 +483,7 @@ Dgetdol(void) len = normal_mbtowc(&wc, cbuf, cbp); if (len == -1) { reset_mbtowc(); - if (cbp < MB_LEN_MAX) + if (cbp < (size_t)MB_CUR_MAX) continue; /* Maybe a partial character */ wc = (unsigned char)*cbuf | INVALID_BYTE; } @@ -640,8 +640,25 @@ Dgetdol(void) } goto eatbrac; } - udvar(name->s); - /* NOTREACHED */ + /* Unset variable: expand to empty string rather than aborting. + * This applies to all variable expansions (quoted and unquoted) and + * allows short-circuit expressions like + * if ($?a && "$a" != "") ... + * to work correctly: Dfix runs before expr(), so $a must silently + * yield "" when unset instead of raising ERR_UNDVAR. Also matches + * bash/zsh semantics for unset variable expansion. + * + * NOTE: fixDolMod() must be called first to consume any modifiers + * (e.g. ${unset:h}) and avoid a spurious "Missing }" error. 
*/ + cleanup_until(name); + fixDolMod(); + if (dimen || length) { + /* $#unset and $%unset both return 0 */ + addla(putn((tcsh_number_t)0)); + } else { + setDolp(STRNULL); + } + goto eatbrac; } cleanup_until(name); c = DgetC(0); diff --git a/sh.exec.c b/sh.exec.c index 8789fc49..88931417 100644 --- a/sh.exec.c +++ b/sh.exec.c @@ -30,6 +30,7 @@ * SUCH DAMAGE. */ #include "sh.h" +#include "ed.h" #include "tc.h" #include "tw.h" @@ -678,6 +679,7 @@ dohash(Char **vv, struct command *c) (void) getusername(NULL); /* flush the tilde cashe */ tw_cmd_free(); + predict_cache_clear(); havhash = 1; if (v == NULL) return; diff --git a/sh.lex.c b/sh.lex.c index 027a6fe4..73247e68 100644 --- a/sh.lex.c +++ b/sh.lex.c @@ -1615,7 +1615,7 @@ wide_read(int fildes, Char *buf, size_t nchars, int use_fclens) tlen = normal_mbtowc(buf + res, cbuf + i, partial - i); if (tlen == -1) { reset_mbtowc(); - if ((partial - i) < MB_LEN_MAX && r > 0) + if ((partial - i) < (size_t)MB_CUR_MAX && r > 0) /* Maybe a partial character and there is still a chance to read more */ break; diff --git a/sh.set.c b/sh.set.c index 94cef23e..3ded9957 100644 --- a/sh.set.c +++ b/sh.set.c @@ -69,6 +69,7 @@ update_vars(Char *vp) exportpath(p->vec); dohash(NULL, NULL); syntax_cache_clear(); + predict_cache_clear(); } } else if (eq(vp, STRnoclobber)) { diff --git a/tc.const.c b/tc.const.c index 90f90488..312c1cf3 100644 --- a/tc.const.c +++ b/tc.const.c @@ -436,6 +436,7 @@ Char STRrmstar[] = { 'r', 'm', 's', 't', 'a', 'r', '\0' }; Char STRrm[] = { 'r', 'm', '\0' }; Char STRhighlight[] = { 'h', 'i', 'g', 'h', 'l', 'i', 'g', 'h', 't', '\0' }; Char STRsyntax[] = { 's', 'y', 'n', 't', 'a', 'x', '\0' }; +Char STRpredict[] = { 'p', 'r', 'e', 'd', 'i', 'c', 't', '\0' }; Char STRimplicitcd[] = { 'i', 'm', 'p', 'l', 'i', 'c', 'i', 't', 'c', 'd', '\0' }; diff --git a/tc.prompt.c b/tc.prompt.c index 542e357e..5ff3b899 100644 --- a/tc.prompt.c +++ b/tc.prompt.c @@ -45,6 +45,8 @@ * 29-Dec-96 added rprompt support */ +#define 
GIT_POLL_INTERVAL 2 /* default seconds between filesystem mtime polls */ + + static const char *month_list[12]; static const char *day_list[7]; @@ -782,10 +784,17 @@ tprintf(int what, const Char *fmt, const char *str, time_t tim, ptr_t info) }; if (!need_refresh) { /* Throttle mtime stat() calls: only poll the - * filesystem at most once every 2 seconds. + * filesystem at most once every GIT_POLL_INTERVAL seconds, + * or $GIT_POLL_INTERVAL seconds if that environment variable is set. * CWD/validity changes bypass the throttle. */ time_t _now = time(NULL); - if (_now - git_last_stattime >= 2) { + int poll_interval = GIT_POLL_INTERVAL; + const char *env_interval = getenv("GIT_POLL_INTERVAL"); + if (env_interval) { + poll_interval = atoi(env_interval); + if (poll_interval < 0) poll_interval = 0; + } + if (_now - git_last_stattime >= poll_interval) { /* Check HEAD mtime and state-marker mtimes * independently so a live MERGE_HEAD whose * mtime differs from HEAD's always triggers diff --git a/tests/lib_locale.sh b/tests/lib_locale.sh new file mode 100644 index 00000000..bd30592e --- /dev/null +++ b/tests/lib_locale.sh @@ -0,0 +1,21 @@ +# Shared helper sourced by the t009..t014 Unicode regression tests. +# +# Sets $utf8_locale to a UTF-8 locale that is actually installed on the +# host (preferring en_US.UTF-8, then C.UTF-8, then any UTF-8 entry from +# `locale -a`). If no UTF-8 locale is available, prints a SKIP message +# and exits with code 77 so the runner recognizes it as a skip. + +# Try preferred locales first (en_US.UTF-8, then C.UTF-8), allowing +# optional @modifier suffixes (e.g. en_US.UTF-8@euro). Fall back to +# any UTF-8 locale reported by `locale -a`.
+utf8_locale=$(locale -a 2>/dev/null | grep -Ei '^en_US\.UTF-?8(@.*)?$' | head -n 1) +if [ -z "$utf8_locale" ]; then + utf8_locale=$(locale -a 2>/dev/null | grep -Ei '^C\.UTF-?8(@.*)?$' | head -n 1) +fi +if [ -z "$utf8_locale" ]; then + utf8_locale=$(locale -a 2>/dev/null | grep -Ei 'UTF-?8' | head -n 1) +fi +if [ -z "$utf8_locale" ]; then + echo "SKIP: no UTF-8 locale available" + exit 77 +fi diff --git a/tests/run_tests.sh b/tests/run_tests.sh new file mode 100755 index 00000000..4dd18c56 --- /dev/null +++ b/tests/run_tests.sh @@ -0,0 +1,58 @@ +#!/bin/sh +# mcsh regression test runner +# Usage: sh run_tests.sh [path-to-mcsh] +# +# Each t*.sh script exits 0 on pass, 77 on skip, and non-zero on fail. +# It may emit an optional failure message on stdout/stderr for display. + +# Change into the directory containing this script so that the t*.sh glob +# works regardless of where the runner is invoked from. +SCRIPT_DIR=$(dirname "$0") +cd "$SCRIPT_DIR" || { printf 'ERROR: cannot cd to %s\n' "$SCRIPT_DIR"; exit 2; } + +MCSH="${1:-../mcsh}" + +if [ ! -x "$MCSH" ]; then + printf 'ERROR: mcsh binary not found at %s\n' "$MCSH" + printf 'Build first with: make -C .. -j4\n' + exit 2 +fi + +export MCSH +pass=0 +fail=0 +skip=0 +total=0 + +# Guard against no test files matching the glob +set -- t*.sh +if [ ! -e "$1" ]; then + printf 'No test scripts found (t*.sh)\n' + exit 1 +fi + +for t in "$@"; do + total=$((total + 1)) + result=$(sh "$t" 2>&1) + status=$? 
+ if [ $status -eq 0 ]; then + printf 'PASS %s\n' "$t" + pass=$((pass + 1)) + elif [ $status -eq 77 ]; then + printf 'SKIP %s\n' "$t" + if [ -n "$result" ]; then + printf ' (%s)\n' "$result" | head -1 + fi + skip=$((skip + 1)) + else + printf 'FAIL %s\n' "$t" + if [ -n "$result" ]; then + printf '%s\n' "$result" | head -5 + fi + fail=$((fail + 1)) + fi +done + +printf '\nResults: %d passed, %d failed, %d skipped out of %d tests\n' \ + "$pass" "$fail" "$skip" "$total" +[ $fail -eq 0 ] diff --git a/tests/t001_vars.sh b/tests/t001_vars.sh new file mode 100755 index 00000000..2d6038ac --- /dev/null +++ b/tests/t001_vars.sh @@ -0,0 +1,13 @@ +#!/bin/sh +# t001_vars.sh — $mcsh and $tcsh variables must be set on startup + +out=$("$MCSH" -f -c 'if ($?mcsh && $?tcsh) echo ok' 2>&1) +status=$? +if [ $status -ne 0 ]; then + printf 'mcsh exited %d; output: %s\n' "$status" "$out" + exit 1 +fi +case "$out" in + ok) exit 0 ;; + *) printf "expected 'ok', got: %s\n" "$out"; exit 1 ;; +esac diff --git a/tests/t002_overflow.sh b/tests/t002_overflow.sh new file mode 100755 index 00000000..63bcfdf7 --- /dev/null +++ b/tests/t002_overflow.sh @@ -0,0 +1,14 @@ +#!/bin/sh +# t002_overflow.sh — left-shift overflow must not invoke UB (uses unsigned path) +# @ x = (1 << 31) must produce 2147483648, not a negative number or crash. + +out=$("$MCSH" -f -c '@ x = (1 << 31); echo $x' 2>&1) +status=$? +if [ $status -ne 0 ]; then + printf 'mcsh exited %d; output: %s\n' "$status" "$out" + exit 1 +fi +case "$out" in + 2147483648) exit 0 ;; + *) printf "expected 2147483648, got: %s\n" "$out"; exit 1 ;; +esac diff --git a/tests/t003_shortcircuit.sh b/tests/t003_shortcircuit.sh new file mode 100755 index 00000000..4730d955 --- /dev/null +++ b/tests/t003_shortcircuit.sh @@ -0,0 +1,19 @@ +#!/bin/sh +# t003_shortcircuit.sh — $?a && "$a" must not raise "Undefined variable" +# when $a is unset. Should produce no output and exit 0. 
+ +out=$("$MCSH" -f -c 'unset a; if ($?a && "$a" != "") echo yes; endif' 2>&1) +if [ $? -ne 0 ] || [ -n "$out" ]; then + echo "expected silence and exit 0, got: $out" + exit 1 +fi + +# ${undef:q} must not leave the modifier in the input stream +# (regression: fixDolMod() must be called before eatbrac for unset vars) +out=$("$MCSH" -f -c 'unset b; set x = "${b:q}"; echo ok' 2>&1) +if [ $? -ne 0 ] || [ "$out" != "ok" ]; then + echo "expected 'ok' from \${undef:q}, got: $out" + exit 1 +fi + +exit 0 diff --git a/tests/t004_pipe_to_var.sh b/tests/t004_pipe_to_var.sh new file mode 100755 index 00000000..fd2cbeff --- /dev/null +++ b/tests/t004_pipe_to_var.sh @@ -0,0 +1,14 @@ +#!/bin/sh +# t004_pipe_to_var.sh — "set x" reads from piped stdin (pipe-to-variable feature) +# The form tested: pipe data into the shell, then "set x" reads from stdin. + +out=$(echo "foo" | "$MCSH" -f -c 'set x; echo $x' 2>&1) +status=$? +if [ $status -ne 0 ]; then + printf 'mcsh exited %d; output: %s\n' "$status" "$out" + exit 1 +fi +case "$out" in + foo) exit 0 ;; + *) printf "expected 'foo', got: %s\n" "$out"; exit 1 ;; +esac diff --git a/tests/t005_cd_stack.sh b/tests/t005_cd_stack.sh new file mode 100755 index 00000000..6cfe86d8 --- /dev/null +++ b/tests/t005_cd_stack.sh @@ -0,0 +1,27 @@ +#!/bin/sh +# t005_cd_stack.sh — pushd/popd directory stack, cd -1 navigation + +tmpdir=$(mktemp -d) +dir1="$tmpdir/d1" +dir2="$tmpdir/d2" + +# Ensure tmpdir is removed even on early exit or interrupt +trap 'rm -rf "$tmpdir"' EXIT INT TERM + +mkdir "$dir1" "$dir2" + +# Resolve symlinks on both sides so /tmp vs /private/tmp mismatches don't fail +out=$("$MCSH" -f -c "pushd $dir1; pushd $dir2; cd -1; echo \$cwd" 2>&1 | tail -1) +expected=$(cd "$dir1" && pwd -P) +# Also canonicalize the mcsh output in case $cwd contains a symlink prefix +out_canon=$(cd "$out" 2>/dev/null && pwd -P) +if [ -n "$out_canon" ]; then + out="$out_canon" +fi + +if [ "$out" = "$expected" ]; then + exit 0 +else + printf "expected '%s', 
got: %s\n" "$expected" "$out" + exit 1 +fi diff --git a/tests/t006_function_builtin.sh b/tests/t006_function_builtin.sh new file mode 100755 index 00000000..503f2945 --- /dev/null +++ b/tests/t006_function_builtin.sh @@ -0,0 +1,25 @@ +#!/bin/sh +# t006_function_builtin.sh — "function" builtin stores and executes body + +tmpscript=$(mktemp /tmp/t006.XXXXXX) +trap 'rm -f "$tmpscript"' EXIT INT TERM + +cat > "$tmpscript" << 'MCSH_SCRIPT' +function greet +echo hello +return +greet +MCSH_SCRIPT + +out=$("$MCSH" -f "$tmpscript" 2>&1) +rc=$? + +if [ $rc -ne 0 ]; then + echo "FAIL: mcsh exited with status $rc: $out" + exit 1 +fi + +case "$out" in + hello) exit 0 ;; + *) echo "expected 'hello', got: $out"; exit 1 ;; +esac diff --git a/tests/t007_arith_rsh.sh b/tests/t007_arith_rsh.sh new file mode 100755 index 00000000..70c5f073 --- /dev/null +++ b/tests/t007_arith_rsh.sh @@ -0,0 +1,14 @@ +#!/bin/sh +# t007_arith_rsh.sh — right-shift must preserve sign (arithmetic shift semantics) +# @ x = (-8 >> 1) should give -4 on all supported platforms + +out=$("$MCSH" -f -c '@ x = (-8 >> 1); echo $x' 2>&1) +status=$? 
+if [ $status -ne 0 ]; then + printf 'mcsh exited %d; output: %s\n' "$status" "$out" + exit 1 +fi +case "$out" in + -4) exit 0 ;; + *) printf "expected -4, got: %s\n" "$out"; exit 1 ;; +esac diff --git a/tests/t008_unset_modifiers.sh b/tests/t008_unset_modifiers.sh new file mode 100755 index 00000000..c1af1dd5 --- /dev/null +++ b/tests/t008_unset_modifiers.sh @@ -0,0 +1,18 @@ +#!/bin/sh +# t008_unset_modifiers.sh — unset variable with modifiers and $# must not error + +# ${unset:h} should produce "" (not "Missing }") +out=$("$MCSH" -f -c 'unset x; echo "${x:h}"' 2>&1) +if echo "$out" | grep -qEi 'Missing|Undefined'; then + echo "FAIL: modifier on unset var caused error: $out" + exit 1 +fi + +# $#unset should produce 0 (not error) +out=$("$MCSH" -f -c 'unset x; echo $#x' 2>&1) +case "$out" in + 0) ;; + *) echo "FAIL: \$#unset expected 0, got: $out"; exit 1 ;; +esac + +exit 0 diff --git a/tests/t009_unicode_vars.sh b/tests/t009_unicode_vars.sh new file mode 100755 index 00000000..18eb00d2 --- /dev/null +++ b/tests/t009_unicode_vars.sh @@ -0,0 +1,37 @@ +#!/bin/sh +# t009_unicode_vars.sh — multibyte variable assignment and $% character count + +# Discover an available UTF-8 locale (skips the test if none exists). +. 
"$(dirname "$0")/lib_locale.sh" + +# Round-trip variable assignment +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set v = "café"; echo $v' 2>&1) +case "$out" in + café) ;; + *) echo "FAIL: café roundtrip: got '$out'"; exit 1 ;; +esac + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set v = "漢字"; echo $v' 2>&1) +case "$out" in + 漢字) ;; + *) echo "FAIL: 漢字 roundtrip: got '$out'"; exit 1 ;; +esac + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set v = "😀"; echo $v' 2>&1) +case "$out" in + 😀) ;; + *) echo "FAIL: emoji roundtrip: got '$out'"; exit 1 ;; +esac + +# $% must count Unicode characters, not bytes +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set v = "café"; echo $%v' 2>&1) +case "$out" in + 4) ;; + *) echo "FAIL: \$%café expected 4 chars, got '$out'"; exit 1 ;; +esac + +exit 0 diff --git a/tests/t010_unicode_glob.sh b/tests/t010_unicode_glob.sh new file mode 100755 index 00000000..02dd8a94 --- /dev/null +++ b/tests/t010_unicode_glob.sh @@ -0,0 +1,33 @@ +#!/bin/sh +# t010_unicode_glob.sh — glob expansion with Unicode filenames + +. 
"$(dirname "$0")/lib_locale.sh" + +tmpdir=$(mktemp -d /tmp/mcsh_unicode_XXXXXX) || exit 2 +trap 'rm -rf "$tmpdir"' EXIT + +touch "$tmpdir/café.txt" "$tmpdir/漢字.txt" "$tmpdir/emoji😀.txt" + +# * glob +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + "set f = ($tmpdir/café*); echo \$#f \$f[1]" 2>&1) +case "$out" in + "1 $tmpdir/café.txt") ;; + *) echo "FAIL: café glob: got '$out'"; exit 1 ;; +esac + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + "set f = ($tmpdir/漢字*); echo \$#f" 2>&1) +case "$out" in + 1) ;; + *) echo "FAIL: 漢字 glob: got '$out'"; exit 1 ;; +esac + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + "set f = ($tmpdir/emoji*); echo \$#f" 2>&1) +case "$out" in + 1) ;; + *) echo "FAIL: emoji glob: got '$out'"; exit 1 ;; +esac + +exit 0 diff --git a/tests/t011_unicode_dollar_lt.sh b/tests/t011_unicode_dollar_lt.sh new file mode 100755 index 00000000..6b6e67f5 --- /dev/null +++ b/tests/t011_unicode_dollar_lt.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# t011_unicode_dollar_lt.sh — $< stdin read with multibyte content + +. "$(dirname "$0")/lib_locale.sh" + +# Use command substitution; $< reads from stdin of the child shell +out=$(echo 'café' | LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set x = $<; echo $x' 2>&1) +case "$out" in + café) ;; + *) echo "FAIL: \$< café: got '$out'"; exit 1 ;; +esac + +out=$(echo '漢字' | LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set x = $<; echo $x' 2>&1) +case "$out" in + 漢字) ;; + *) echo "FAIL: \$< 漢字: got '$out'"; exit 1 ;; +esac + +exit 0 diff --git a/tests/t012_unicode_backtick.sh b/tests/t012_unicode_backtick.sh new file mode 100755 index 00000000..162ceb98 --- /dev/null +++ b/tests/t012_unicode_backtick.sh @@ -0,0 +1,20 @@ +#!/bin/sh +# t012_unicode_backtick.sh — backquote command substitution with multibyte output + +. 
"$(dirname "$0")/lib_locale.sh" + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set x = `echo café`; echo $x' 2>&1) +case "$out" in + café) ;; + *) echo "FAIL: backtick café: got '$out'"; exit 1 ;; +esac + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f -c \ + 'set x = `echo 漢字`; echo $x' 2>&1) +case "$out" in + 漢字) ;; + *) echo "FAIL: backtick 漢字: got '$out'"; exit 1 ;; +esac + +exit 0 diff --git a/tests/t013_unicode_invalid_recovery.sh b/tests/t013_unicode_invalid_recovery.sh new file mode 100755 index 00000000..26a7a55f --- /dev/null +++ b/tests/t013_unicode_invalid_recovery.sh @@ -0,0 +1,27 @@ +#!/bin/sh +# t013_unicode_invalid_recovery.sh — stray invalid byte does not corrupt +# subsequent valid multibyte characters. +# This directly exercises the MB_LEN_MAX→MB_CUR_MAX regression fix in +# wide_read() (sh.lex.c) and the $< loop (sh.dol.c). + +. "$(dirname "$0")/lib_locale.sh" + +tmpdir=$(mktemp -d /tmp/mcsh_invbyte_XXXXXX) || exit 2 +trap 'rm -rf "$tmpdir"' EXIT + +# Create a script file containing: set v = "<0x80>café" +# 0x80 is a lone UTF-8 continuation byte (invalid as a sequence start). +# With the bug, wide_read() over-reads up to MB_LEN_MAX-1=15 bytes of what +# follows 0x80, dropping 'é' (bytes C3 A9) into the discard window. +script="$tmpdir/script.csh" +# Use octal escapes (POSIX printf) — dash's builtin printf does not +# support \x hex escapes, but \NNN octal is portable. 
+printf 'set v = "\200caf\303\251"\nif ($v =~ *\303\251) echo ok\n' > "$script" + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f "$script" 2>&1) +case "$out" in + ok) ;; + *) echo "FAIL: invalid-byte recovery: é was dropped; got '$out'"; exit 1 ;; +esac + +exit 0 diff --git a/tests/t014_unicode_script_source.sh b/tests/t014_unicode_script_source.sh new file mode 100755 index 00000000..03fee82e --- /dev/null +++ b/tests/t014_unicode_script_source.sh @@ -0,0 +1,29 @@ +#!/bin/sh +# t014_unicode_script_source.sh — Unicode variable values in a script file +# +# Verifies that a script containing multibyte values is parsed and compared +# correctly when executed via "$MCSH -f script" (csh script-file mode, not +# the `source` builtin). Variable names themselves stay ASCII because +# tcsh's set/varname grammar restricts names to ASCII identifiers; only +# the values exercise the multibyte path. + +. "$(dirname "$0")/lib_locale.sh" + +tmpdir=$(mktemp -d /tmp/mcsh_src_XXXXXX) || exit 2 +trap 'rm -rf "$tmpdir"' EXIT + +cat > "$tmpdir/script.csh" << 'EOF' +set greeting = "Héllo" +set japanese = "日本語" +if ($greeting != "Héllo") exit 1 +if ($japanese != "日本語") exit 1 +echo ok +EOF + +out=$(LANG="$utf8_locale" LC_ALL="$utf8_locale" "$MCSH" -f "$tmpdir/script.csh" 2>&1) +case "$out" in + ok) ;; + *) echo "FAIL: script file Unicode: got '$out'"; exit 1 ;; +esac + +exit 0
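The command-position test that `predict_cmd()` performs before scanning `$PATH` can be exercised in isolation. This is a minimal sketch, not the patch itself: it assumes a plain `char` buffer in place of tcsh's `Char`/`InputBuf`/`LastChar`, and `sketch_is_word_break()` and `at_command_position()` are hypothetical names. The break set follows the comment in `predict_cmd()` (whitespace plus `;`, `|`, `&`, `(`, `)`), since the real `is_word_break()` is not shown in this diff.

```c
#include <stddef.h>

/* Hypothetical stand-in for is_word_break(): whitespace plus the
 * command separators ; | & and the parens ( ). */
static int
sketch_is_word_break(int c)
{
    return c == ' ' || c == '\t' || c == ';' || c == '|' ||
           c == '&' || c == '(' || c == ')';
}

/* Returns 1 if the word ending at buf[len] is in command position:
 * the nearest non-blank character before it is either absent
 * (start of buffer) or one of ; | & ( ). Returns 0 otherwise. */
static int
at_command_position(const char *buf, size_t len)
{
    size_t w = len, prev;

    /* Walk back to the start of the word under the cursor. */
    while (w > 0 && !sketch_is_word_break((unsigned char)buf[w - 1]))
        w--;

    /* Skip blanks between the previous token and this word. */
    prev = w;
    while (prev > 0 && (buf[prev - 1] == ' ' || buf[prev - 1] == '\t'))
        prev--;

    if (prev == 0)
        return 1;               /* nothing precedes the word */
    switch (buf[prev - 1]) {
    case ';': case '|': case '&': case '(': case ')':
        return 1;               /* word follows a command separator */
    default:
        return 0;               /* word is an argument, e.g. "echo hi ls" */
    }
}
```

With this logic `"echo hi; ls"` and `"cat foo | gr"` are accepted while `"echo hi ls"` is rejected, which is the distinction the comment in `predict_cmd()` describes.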
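The rewritten victim selection in `cache_store()` (`ed.syntax.c`) replaces the `goto store` with two passes: take the first empty slot if one exists, otherwise evict the entry with the smallest `age`. The sketch below isolates that selection. It assumes a simplified, hypothetical `struct sketch_slot` holding only the `found` and `age` fields the selection reads, and an arbitrary `SKETCH_CACHE_SIZE`. The real `CMD_CACHE_SIZE` and entry layout are not shown in this diff.

```c
#include <stddef.h>

#define SKETCH_CACHE_SIZE 4    /* hypothetical; the real CMD_CACHE_SIZE may differ */

/* Hypothetical slot with only the fields victim selection reads. */
struct sketch_slot {
    int found;          /* < 0 marks an empty slot, as in cmd_cache[] */
    unsigned age;       /* smaller value = less recently used */
};

/* Return the index cache_store() would overwrite. */
static int
pick_victim(const struct sketch_slot *slots, int n)
{
    int i, victim = -1;
    unsigned oldest_age;

    /* First pass: prefer any empty slot. */
    for (i = 0; i < n; i++) {
        if (slots[i].found < 0) {
            victim = i;
            break;
        }
    }

    /* Second pass: no empty slot, so evict the least-recently-used entry. */
    if (victim < 0) {
        oldest_age = slots[0].age;
        victim = 0;
        for (i = 1; i < n; i++) {
            if (slots[i].age < oldest_age) {
                oldest_age = slots[i].age;
                victim = i;
            }
        }
    }
    return victim;
}
```

An empty slot always wins, even when a full slot has an older `age`. The LRU scan runs only when the cache is completely full.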
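The widened redirection scan in `syntax_colorize()` colours the opener (`>` or `<`) and then greedily absorbs `>`, `<`, `&`, and `-`. It also absorbs `!` and `|`, but only after a `>` opener, so that `>!`, `>>!`, and `>|` are covered. The sketch below reproduces that scan as a hypothetical helper, `redir_op_len()`, which returns the operator's length instead of writing `SYN_OPERATOR` into `SyntaxColor[]`. It assumes a plain `char` buffer rather than tcsh's `Char`.

```c
#include <stddef.h>

/* Number of characters starting at s[i] that the colourizer would mark
 * as one redirection operator; 0 if s[i] is not '>' or '<'. */
static size_t
redir_op_len(const char *s, size_t i, size_t len)
{
    int opener = (unsigned char)s[i];
    size_t n = 1;

    if (opener != '>' && opener != '<')
        return 0;
    while (i + n < len) {
        int nc = (unsigned char)s[i + n];
        /* '!' and '|' only extend operators that start with '>' */
        if (nc == '>' || nc == '<' || nc == '&' || nc == '-' ||
            (opener == '>' && (nc == '!' || nc == '|')))
            n++;
        else
            break;
    }
    return n;
}
```

For example, `cmd >>& log` yields a three-character operator at index 4, while `<!` stops after the `<` because `!` is accepted only after a `>` opener.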