
Implement shell parser and AST for /bin/sh#887

Merged: dburkart merged 2 commits into main from m875-shell-parser on May 5, 2026

@dburkart (Owner) commented May 5, 2026

Closes #875

Summary

  • Add recursive-descent parser (base/sh/src/parser.rs) that consumes the lexer's Token stream and produces an AST
  • AST types: SimpleCommand (words + assignments + redirections), Pipeline (commands connected by |), List (pipelines connected by ;/&&/||), Subshell (parenthesized lists with optional redirections)
  • Operator precedence is structural: | binds inside Pipeline, while &&/||/; are connectors between pipelines in the flat List — matching POSIX.1-2024 section 2.10 left-to-right evaluation semantics
  • Handles all 7 redirection operators (<, >, >>, <<, >&, <&, <>), fd-number prefixes (e.g. 2>&1), and variable assignments preceding commands
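The left-to-right evaluation of a flat list can be illustrated with a minimal sketch. It is not the PR's code: the `ListOp` variant names and the `eval_flat_list` helper are assumptions, and each pipeline is modelled only by the exit status it would produce if run.

```rust
/// Connectors between pipelines in a flat list (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ListOp {
    Seq, // ;
    And, // &&
    Or,  // ||
}

/// Walks a flat list left to right (hypothetical helper).
/// `statuses[i]` is the exit status pipeline `i` would produce if it ran;
/// `ops[i]` connects pipeline `i` to pipeline `i + 1`.
/// Returns the indices of the pipelines that ran and the final exit status.
pub fn eval_flat_list(statuses: &[i32], ops: &[ListOp]) -> (Vec<usize>, i32) {
    assert_eq!(ops.len() + 1, statuses.len(), "need one connector between each pair");
    let mut ran = Vec::new();
    let mut last = 0;
    for (i, &status) in statuses.iter().enumerate() {
        // The first pipeline always runs; later ones depend on the connector
        // to their left and on the most recent exit status.
        let should_run = match i.checked_sub(1).map(|j| ops[j]) {
            None | Some(ListOp::Seq) => true,
            Some(ListOp::And) => last == 0,
            Some(ListOp::Or) => last != 0,
        };
        if should_run {
            ran.push(i);
            last = status;
        }
    }
    (ran, last)
}
```

With this model, `false && a || b` runs the first and third pipelines only, which is the same result as grouping `&&` and `||` left-associatively with equal precedence.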

Test plan

  • cargo test --manifest-path base/sh/Cargo.toml — 95 tests pass (49 lexer + 46 parser), zero warnings
  • cargo xtask build — clean
  • cargo xtask smoke — all 42 markers present
  • cargo xtask test: the blocking_sync::rwlock_concurrent_readers flake is pre-existing on main (confirmed by running the same test on the main branch)

Add a recursive-descent parser that consumes the lexer's Token stream
and produces an abstract syntax tree. AST types: SimpleCommand (words,
assignments, redirections), Pipeline (commands connected by |), List
(pipelines connected by ;, &&, ||), and Subshell (parenthesized lists).
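As a rough picture of how these AST types might fit together, here is a minimal sketch. The type names follow the PR, but every field name, the `RedirectOp` and `ListOp` variant names, and the `example` helper are assumptions made for illustration; the real definitions live in base/sh/src/parser.rs.

```rust
/// The seven redirection operators (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RedirectOp {
    Input,     // <
    Output,    // >
    Append,    // >>
    HereDoc,   // <<
    DupOutput, // >&
    DupInput,  // <&
    ReadWrite, // <>
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Redirect {
    pub fd: Option<u32>, // explicit fd-number prefix, e.g. the 2 in 2>&1
    pub op: RedirectOp,
    pub target: String,
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SimpleCommand {
    pub assignments: Vec<(String, String)>,
    pub words: Vec<String>,
    pub redirects: Vec<Redirect>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    Simple(SimpleCommand),
    Subshell { body: List, redirects: Vec<Redirect> },
}

/// Commands connected by `|`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pipeline {
    pub commands: Vec<Command>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ListOp {
    Seq, // ;
    And, // &&
    Or,  // ||
}

/// Pipelines connected by `;`, `&&`, or `||`, kept flat.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct List {
    pub first: Pipeline,
    pub rest: Vec<(ListOp, Pipeline)>,
}

/// Hand-built tree for `FOO=1 echo hi 2>&1 | wc -l` (hypothetical helper).
pub fn example() -> List {
    let echo = SimpleCommand {
        assignments: vec![("FOO".into(), "1".into())],
        words: vec!["echo".into(), "hi".into()],
        redirects: vec![Redirect { fd: Some(2), op: RedirectOp::DupOutput, target: "1".into() }],
    };
    let wc = SimpleCommand { words: vec!["wc".into(), "-l".into()], ..Default::default() };
    List {
        first: Pipeline { commands: vec![Command::Simple(echo), Command::Simple(wc)] },
        rest: Vec::new(),
    }
}
```

The recursion through `Command::Subshell` needs no `Box` in this sketch because `Pipeline` already stores its commands in a `Vec`.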

Operator precedence is structural: | binds inside Pipeline, while
&&/||/; are connectors between pipelines in the flat List. The parser
handles all seven redirection operators including fd-number prefixes
(e.g. 2>&1), variable assignments preceding commands, and nested
subshells.
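The idea of structural precedence can be sketched with a toy recursive-descent parser: the pipeline level consumes `|`, and the list level only ever sees the connectors between finished pipelines. Everything here (the `Tok` enum, the whitespace-splitting `lex`, the function names) is a simplified assumption, not the PR's lexer or parser, and redirections, assignments, and subshells are left out.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok { Word(String), Pipe, AndIf, OrIf, Semi }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ListOp { Seq, And, Or }

/// One or more commands joined by `|`; each command is just its words here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pipeline(pub Vec<Vec<String>>);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct List { pub first: Pipeline, pub rest: Vec<(ListOp, Pipeline)> }

/// Toy lexer: tokens must be separated by whitespace.
pub fn lex(src: &str) -> Vec<Tok> {
    src.split_whitespace()
        .map(|s| match s {
            "|" => Tok::Pipe,
            "&&" => Tok::AndIf,
            "||" => Tok::OrIf,
            ";" => Tok::Semi,
            w => Tok::Word(w.to_string()),
        })
        .collect()
}

/// command := WORD+
fn parse_command(toks: &[Tok], pos: &mut usize) -> Option<Vec<String>> {
    let mut words = Vec::new();
    while let Some(Tok::Word(w)) = toks.get(*pos) {
        words.push(w.clone());
        *pos += 1;
    }
    if words.is_empty() { None } else { Some(words) }
}

/// pipeline := command ('|' command)*
fn parse_pipeline(toks: &[Tok], pos: &mut usize) -> Option<Pipeline> {
    let mut cmds = vec![parse_command(toks, pos)?];
    while toks.get(*pos) == Some(&Tok::Pipe) {
        *pos += 1;
        cmds.push(parse_command(toks, pos)?);
    }
    Some(Pipeline(cmds))
}

/// list := pipeline (('&&' | '||' | ';') pipeline)*
/// Simplification: a trailing `;` is rejected here.
pub fn parse_list(toks: &[Tok]) -> Option<List> {
    let mut pos = 0;
    let first = parse_pipeline(toks, &mut pos)?;
    let mut rest = Vec::new();
    while let Some(tok) = toks.get(pos) {
        let op = match tok {
            Tok::AndIf => ListOp::And,
            Tok::OrIf => ListOp::Or,
            Tok::Semi => ListOp::Seq,
            _ => return None,
        };
        pos += 1;
        rest.push((op, parse_pipeline(toks, &mut pos)?));
    }
    Some(List { first, rest })
}
```

For `a | b && c ; d`, this yields a first pipeline with two commands followed by two `(connector, pipeline)` pairs, so `|` never competes with `&&`, `||`, or `;` for operands.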

46 host-side unit tests cover simple commands, pipelines, compound
lists, subshells, all redirection forms, fd-number redirects, syntax
errors, and realistic command lines.

Closes #875

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: be94a5c5-6d09-45b3-a887-766df697effb

📥 Commits

Reviewing files that changed from the base of the PR and between 9fc75a2 and c7f8f71.

📒 Files selected for processing (1)
  • base/sh/src/parser.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • base/sh/src/parser.rs

📝 Walkthrough

A new parser module adds a recursive-descent POSIX shell parser that consumes Lexer tokens and produces an AST (List, Pipeline, Command, Redirect) with precedence, subshells, and full redirection support; unit tests cover nesting, precedence, redirections, and error cases.

Changes

Shell Parser Implementation

| Layer | File(s) | Summary |
|---|---|---|
| Module Wiring | base/sh/src/main.rs | Declared new internal module: mod parser;. |
| Data Shape | base/sh/src/parser.rs (lines 30–115) | Added AST types: Redirect, RedirectOp, SimpleCommand, Command (Simple/Subshell), Pipeline, ListOp, List. |
| Error Types | base/sh/src/parser.rs (lines 118–151) | Added ParseError enum for lex errors, unexpected tokens, missing redirect targets, unmatched (, empty commands; Display and From<LexError> impls. |
| Core Parser Entry | base/sh/src/parser.rs (lines 153–173) | Parser::new and Parser::parse entry points; top-level parsing workflow. |
| List / Precedence Parsing | base/sh/src/parser.rs (lines 174–245) | Recursive-descent handling of lists and precedence (;/newline, &&/` |
| Pipeline / Command Parsing | base/sh/src/parser.rs (lines 246–346) | Pipeline parsing and command forms: simple commands and parenthesized subshells (with optional trailing redirects). |
| Simple Command Semantics | base/sh/src/parser.rs (lines 274–346) | Parsing of mixed assignments, words, and redirects; fd-number detection when a digit-only word precedes a redirect operator. |
| Redirection Parsing | base/sh/src/parser.rs (lines 349–423) | Parsing of redirect operators and targets, support for dup/close forms, heredoc/append variants, and fd-number validation. |
| Token Navigation & Helpers | base/sh/src/parser.rs (lines 424–505) | Lookahead/bump/skip utilities, redirect/assignment detection helpers, parse(input: &str) convenience wrapper. |
| Tests & Validation | base/sh/src/parser.rs (lines 509–1084) | Comprehensive unit tests: empty inputs, simple/multi-word commands, assignments, pipelines, list connector precedence, subshells, many redirect forms (including fd-number and dup), and syntax/lex error cases. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

  • dburkart/vibix#885: Adds the lexer and Token types that this parser consumes to build the AST.

Poem

🐰 I hopped through tokens, one by one, with cheer,
Pipes and subshells now stand crystal clear.
Redirects curl like ribbons in the sun,
AST roots planted—parsing's work is done.
Hooray, dear shell, your grammar's blooming here!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Implement shell parser and AST for /bin/sh' clearly and specifically summarizes the main change: adding a complete parser and AST types for shell command processing. |
| Description check | ✅ Passed | The description is well-structured and clearly related to the changeset, covering the parser implementation, AST types, operator precedence, redirection handling, and comprehensive test results. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #875 are met: AST types defined (SimpleCommand, Pipeline, List, Subshell), recursive-descent parser implemented, operator precedence enforced, I/O redirections with fd-numbers supported, and comprehensive unit tests added. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the parser implementation and AST definition. The only modifications are the parser module declaration and the new parser.rs file containing the parser logic and AST types. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@base/sh/src/parser.rs`:
- Around line 354-372: The code consumes a digit word via self.bump() as soon as
try_parse_fd(w) succeeds, then bails out if the following token isn't a
redirect, which drops input like "(echo) 2"; instead, only consume the digit
when a redirect operator actually follows. Change the flow in the parse loop so
you check for a redirect operator before calling self.bump() — e.g., use a
lookahead/peek of self.current (or a peek method) with
is_redirect_token(&self.current) or similar, and only call self.bump() then
parse_redirect_op() and expect_redirect_target() to push into redirects; if no
redirect operator is present, do not bump/consume the digit and simply break.
Ensure you update the branch that currently calls self.bump() and then checks
is_redirect_token so it peeks first and preserves the digit token when no
redirect follows.
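
The peek-before-consume flow suggested in this comment might look roughly like the sketch below. It borrows the helper names `try_parse_fd` and `is_redirect_token` from the comment, but their bodies, the two-operator `Tok` enum, and the slice-and-index cursor are assumptions; the real parser works on `self.current` and `self.bump()`.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Word(String),
    Great, // >
    Less,  // <
}

/// Returns the fd number if `w` consists only of ASCII digits (assumed behavior).
pub fn try_parse_fd(w: &str) -> Option<u32> {
    if !w.is_empty() && w.bytes().all(|b| b.is_ascii_digit()) {
        w.parse().ok()
    } else {
        None
    }
}

pub fn is_redirect_token(t: &Tok) -> bool {
    matches!(t, Tok::Great | Tok::Less)
}

/// Parses zero or more redirects starting at `pos`.
/// Returns the redirects as (fd, operator, target) and the position of the
/// first token that was NOT consumed.
pub fn parse_redirect_list(toks: &[Tok], mut pos: usize) -> (Vec<(Option<u32>, Tok, String)>, usize) {
    let mut out = Vec::new();
    loop {
        let mut fd = None;
        let mut op_at = pos;
        if let Some(Tok::Word(w)) = toks.get(pos) {
            // Look one token ahead BEFORE committing to the digit word.
            match (try_parse_fd(w), toks.get(pos + 1)) {
                (Some(n), Some(next)) if is_redirect_token(next) => {
                    fd = Some(n);
                    op_at = pos + 1;
                }
                _ => break, // not an fd prefix: leave the word unconsumed
            }
        }
        let op = match toks.get(op_at) {
            Some(t) if is_redirect_token(t) => t.clone(),
            _ => break,
        };
        let target = match toks.get(op_at + 1) {
            Some(Tok::Word(w)) => w.clone(),
            _ => break, // a real parser would report a missing target here
        };
        out.push((fd, op, target));
        pos = op_at + 2;
    }
    (out, pos)
}
```

In this sketch a lone `2` after a subshell is left in the token stream for the caller to handle, so it is never silently dropped.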

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d4287abc-7a24-44e9-b436-6e63710fafdc

📥 Commits

Reviewing files that changed from the base of the PR and between 159d210 and 9fc75a2.

📒 Files selected for processing (2)
  • base/sh/src/main.rs
  • base/sh/src/parser.rs

Comment thread base/sh/src/parser.rs
When parse_redirect_list consumed a digit-only word speculatively and
the following token was not a redirect operator, the word was silently
dropped. Now return UnexpectedToken instead, preventing inputs like
"(echo) 2" from parsing as if the "2" never existed.

Adds two tests: one for the error case and one confirming that
fd-number redirects after subshells (e.g. "(echo) 2> err") still work.
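
The behavior described in this commit message can be approximated by the following sketch, which keeps the speculative consume but turns a dangling digit into an error. Only the `UnexpectedToken` name comes from the commit message; the string-slice token model, the `MissingRedirectTarget` variant name, the `Redirect` fields, and the `parse_trailing_redirects` helper are assumptions. The input stands for the tokens that follow a closing `)`.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    UnexpectedToken(String),
    MissingRedirectTarget, // variant name is an assumption
}

#[derive(Debug, PartialEq, Eq)]
pub struct Redirect {
    pub fd: Option<u32>,
    pub op: String,
    pub target: String,
}

const REDIRECT_OPS: [&str; 7] = ["<", ">", ">>", "<<", ">&", "<&", "<>"];

fn is_digits(w: &str) -> bool {
    !w.is_empty() && w.bytes().all(|b| b.is_ascii_digit())
}

/// Parses the tokens after a subshell as a list of redirects (hypothetical helper).
pub fn parse_trailing_redirects(toks: &[&str]) -> Result<Vec<Redirect>, ParseError> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < toks.len() {
        let mut fd = None;
        if is_digits(toks[i]) {
            let word = toks[i];
            i += 1; // speculative consume of the digit-only word
            match toks.get(i) {
                Some(t) if REDIRECT_OPS.contains(t) => {
                    let n = word
                        .parse::<u32>()
                        .map_err(|_| ParseError::UnexpectedToken(word.to_string()))?;
                    fd = Some(n);
                }
                // No redirect operator follows: report the digit, do not drop it.
                _ => return Err(ParseError::UnexpectedToken(word.to_string())),
            }
        }
        let op = toks[i];
        if !REDIRECT_OPS.contains(&op) {
            return Err(ParseError::UnexpectedToken(op.to_string()));
        }
        let target = toks.get(i + 1).ok_or(ParseError::MissingRedirectTarget)?;
        out.push(Redirect { fd, op: op.to_string(), target: target.to_string() });
        i += 2;
    }
    Ok(out)
}
```

The two cases mirror the tests mentioned above: `["2"]` is rejected, while `["2", ">", "err"]` still parses as an fd-number redirect.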

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dburkart dburkart merged commit f8b1407 into main May 5, 2026
15 checks passed
@dburkart dburkart deleted the m875-shell-parser branch May 5, 2026 17:23
2 participants