Skip to content

POC: perf(@glimmer/syntax): unified single-pass HTML+HBS scanner (3.2x–4.2x faster parse)#14

Closed
johanrd wants to merge 925 commits intoperf/handlebars-v2-parserfrom
perf/handlebars-v2-parser-single-pass
Closed

POC: perf(@glimmer/syntax): unified single-pass HTML+HBS scanner (3.2x–4.2x faster parse)#14
johanrd wants to merge 925 commits intoperf/handlebars-v2-parserfrom
perf/handlebars-v2-parser-single-pass

Conversation

@johanrd
Copy link
Copy Markdown
Owner

@johanrd johanrd commented Apr 14, 2026

POC Unified single-pass HTML+HBS scanner

Claude continuing the perf work from perf/handlebars-v2-parser. Where that branch replaced the Jison-generated HBS parser with a hand-written recursive descent JS parser (keeping `simple-html-tokenizer` for the HTML layer), this PR replaces both parsers with a single left-to-right indexOf-based scanner that builds ASTv1 directly — no tokenizer pipeline at all.

The current parse path scans every template twice:

  1. `@handlebars/parser` — treats HTML as opaque `ContentStatement` nodes, scans only for `{{}}` mustache syntax
  2. `simple-html-tokenizer` — re-tokenizes all `ContentStatement` content character-by-character to extract HTML element structure

The unified scanner does one left-to-right pass with cursor arithmetic (`indexOf('{{', pos)`, `indexOf('<', pos)`) and builds the full ASTv1 tree — `ElementNode`, `MustacheStatement`, `BlockStatement`, `TextNode`, etc. — without the intermediate representation.

Exports as `unifiedPreprocess()` alongside the existing `preprocess()`.


Four parsers compared

All benchmarks: Node 24, warmed JIT, same machine.


IDE case: parse-only (ms/call)

The Glint hot-path: one `preprocess()` call per keystroke in a `.gts` file.

template chars Jison v2-parser unified-1pass rust/wasm
small 25 0.0211ms 0.0056ms 0.0049ms 0.0334ms
medium 352 0.1652ms 0.0534ms 0.0435ms 0.4627ms
real-world 1494 0.6082ms 0.1973ms 0.1492ms 3.8217ms
large (10x med) 3520 1.6801ms 0.4935ms 0.4243ms 23.4431ms

Build case: full precompile() → wire format (ms/call)

The ember-cli/Vite path: parse + ASTv2 normalize + opcode encode + wire format.

The unified-1pass column is `unified_parse + (precompile_v2 − preprocess_v2)` — the compile step is identical code in all parsers.

template chars Jison v2-parser unified-1pass rust/wasm
small 25 0.0385ms 0.0302ms 0.0295ms 0.0508ms
medium 352 0.4851ms 0.4022ms 0.3922ms 0.7764ms
real-world 1494 1.7992ms 1.4712ms 1.4231ms 4.9425ms
large (10x med) 3520 5.0392ms 4.3665ms 4.2974ms 27.0429ms

Parse vs compile split (medium template)

phase Jison v2-parser unified-1pass rust/wasm
`preprocess()` only 0.165ms (38%) 0.049ms (14%) 0.041ms (12%) 0.459ms (63%)
compile only (shared) 0.272ms (62%) 0.302ms (86%) 0.302ms (88%) 0.267ms (37%)
total 0.437ms 0.351ms 0.342ms 0.726ms

500-template build projection

Using real-world template timing:

  • Jison: 900ms (1.799ms × 500)
  • v2-parser: 736ms (1.471ms × 500) — 1.22x faster than Jison
  • unified-1pass: 712ms (1.423ms × 500) — 1.26x faster than Jison, 1.03x faster than v2
  • rust/wasm: 2471ms (4.942ms × 500) — 2.7x slower than Jison

What this shows

The two use cases have very different profiles:

IDE case (parse-only): The unified scanner is 3.8x–4.3x faster than Jison and 1.1x–1.3x faster than v2-parser on real-world templates. The per-keystroke parse cost drops from 0.61ms to 0.15ms on a real-world template. This directly benefits Glint's reparse-on-keystroke hot path.

Build case (full pipeline): The parse improvement is smaller in absolute terms because the compile step (ASTv2 normalization + opcode encoding) costs ~0.30ms regardless of which parser is used. At real-world templates the unified scanner is 1.26x faster end-to-end vs Jison. The v2-parser already captured 1.22x of the build-time gain; unified takes it to 1.26x.

rust/wasm: The JSON bridge (`serde_json::to_string` → `JSON.parse()` → `convertLocations()` walk) is O(AST size), so the gap grows with template size (1.6x slower than Jison at medium → 5.4x at large). The unified scanner is faster than Jison at all sizes without any FFI overhead.

Parse is now 12% of the pipeline for unified vs 38% for Jison and 14% for v2. The compile step dominates, so further parse improvements have diminishing returns on build time — though the IDE case still benefits fully since it's parse-only.


Correctness

All 8778 tests pass, including the WhitespaceControl/standalone-stripping semantics which required careful port of the Handlebars post-pass for block helpers on their own lines and chained `{{else if}}` blocks.

wagenet and others added 30 commits March 19, 2024 09:28
Fix followRedirects when source is async and destination is sync
Blocked on removing AMD, since the barrel file is incorporated wholesale into the AMD template compiler.
…late-runtime

[BUGFIX] Add support for `this` in explicit scope for the runtime template compiler.
The current implementation of `Transition#followRedirects()` is identical to the logic here: it catches rejections, and then checks for any other `activeTransition`. Therefore, this commit introduces no change in behavior.

But, if/when emberjs/router.js#335 lands, it will means we can take advantage of that fix for ApplicationInstance#visit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The captureRenderTree API returns a nested tree structure. Component
nodes are children of other nodes, so we need to recursively flatten
the tree before searching for component names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tests for:
- <this.dynamicComponent> invocations
- <@argComponent> invocations

Both currently produce '(unknown template-only component)' instead of
the expected component name because dynamic resolution at runtime
loses the invocation-site name information.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…RenderTree

For dynamic component invocations like `<this.Foo>` and `<@Greeting>`,
the debug render tree now shows the invocation-site name instead of
'(unknown template-only component)'.

This works by extracting the Reference's debugLabel in
VM_RESOLVE_DYNAMIC_COMPONENT_OP and VM_RESOLVE_CURRIED_COMPONENT_OP,
and setting it as the ComponentDefinition's debugName when no name
is already present.

- `<this.Foo>` → name: "Foo"
- `<@Greeting>` → name: "Greeting"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ames

Use the full debugLabel as the component name so users see exactly
how the component was invoked:

- `<this.Foo>` → name: "this.Foo"
- `<@Greeting>` → name: "@Greeting"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The invocation-site name propagation relies on ref.debugLabel which
is only available in DEBUG mode. Skip these tests in production builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d-invocations

Make the low-level component scope able to be an object instead of an array
johanrd added 8 commits April 15, 2026 07:01
…error messages

parse.js was reverted to Jison (commit 7bbe305) during bug investigation
but never re-wired after the v2-parser fixes were complete. Re-enable it.

Also fix error messages for invalid @-prefixed paths (@, @0, @1, @@, etc.)
to match Jison's "Expecting 'ID'" pattern that the test suite asserts against.

All 8778 tests pass.
…anner

- Track inverseStart (pos after {{else}}/{{else if}}'s }}) and programEnd
  (start of {{else}} tag) in BlockFrame so inverse block and default
  program body get exact source spans matching the reference v2-parser.
- Chained blocks ({{else if}}) now end their loc at the start of {{/if}},
  consistent with Handlebars AST conventions.
- Switch Source import to namespace import (import * as srcApi) to avoid
  a Rollup circular-dependency TDZ error introduced by the direct import.
- Wire unifiedPreprocess as the fast-path in tokenizer-event-handlers.ts
  preprocess(); falls back to original pipeline only for codemod mode or
  Source-object inputs.

All 8778 tests pass (0 failures, 13 skipped).
- Remove @ts-nocheck from unified-scanner.ts and fix all underlying
  TypeScript errors: proper Record<string,unknown> double-casts for
  non-enumerable property access, Frame-typed stack assertions, correct
  BlockFrame narrowing by removing stale 'as BlockFrame' casts so TS can
  narrow via the err()-as-never guard, unused variables prefixed/removed
- Fix ESLint errors in unified-scanner.ts: no-case-declarations (wrap
  case 35 body in braces), no-unnecessary-type-assertion (auto-fixed),
  no-unnecessary-condition (remove dead _inverted branch, eslint-disable
  for intentional while(true)), no-non-null-assertion (use Frame cast or
  optional chaining), no-unused-vars (CH_LT removed, savedPos removed)
- Fix tokenizer-event-handlers.ts: simplify fast-path to typeof===string
  guard (fixes TS2358 instanceof-on-string-type error), remove unused
  source variable
- Add bench-*.mjs and stress-test*.mjs to ESLint ignores (Node scripts
  not subject to browser/module ESLint rules)
- Run prettier on all committed files
…ions, route codemod through unified scanner

Complete the unified single-pass scanner to be fully CI-clean:

- Route codemod mode through unified-scanner (previously bypassed to the
  now-deleted v2-parser). Codemod mode skips entity decoding and whitespace
  stripping to preserve original source positions.

- Delete v2-parser.js (replaced entirely by unified-scanner). @handlebars/parser
  is now a deprecated wrapper for the Jison HBS.Program fallback path only.

Bug fixes in unified-scanner:
- Triple curlies ({{{expr}}}) in attribute position: consume the extra closing }
  after parseMustacheGuts for 'unescaped' mustaches
- Skip whitespace between '=' and unquoted mustache attr value (class=\n{{foo}})
- Void element close-tag check moved before stack search so </input> inside <div>
  correctly throws "elements do not need end tags" instead of a mismatch error
- End tags with attributes (</div foo="bar">) now throw the specific
  "closing tag must not have attributes" error instead of a generic parse error
- Unclosed element error span uses openTagEnd instead of EOF so the code frame
  shows only the opening tag, not the entire trailing template
- else-if chain close: skip block name check for chained frames so
  {{#foo-bar}}...{{else if x}}...{{/foo-bar}} parses correctly
- Named block validation: empty or digit-starting block names after ':' throw
  the specific "Invalid named block" error
- parseElemBlockParams: comprehensive error messages for all invalid block params
  patterns (missing space before |, empty list, mustaches in list, premature
  tag close, EOF, invalid identifiers, modifiers after params, missing `as`)
- @ identifier error messages now match the test expectation format
The Jison-generated parser (parser.js, 2032 lines), parse.js wrapper,
and helpers.js are now dead code — nothing in the repo imports from
@handlebars/parser at runtime anymore. The unified-scanner handles all
string/Source inputs and the HBS.Program fallback path in
tokenizer-event-handlers.ts calls TokenizerEventHandlers.parse() directly.

Keeps Visitor, WhitespaceControl, Exception, and PrintVisitor since they
are standalone utilities that don't depend on Jison.
…anner to match Jison

scanTextNode now replicates Jison's lexer behaviour exactly:
- k=1 (\{{): escape — text before \ emitted as separate ContentStatement,
  then {{content}} merged with following text (emu-state behaviour)
- k≥2 (\\{{, \\\{{, …): real mustache — emit k-1 literal backslashes,
  skip the last backslash, leave {{ for parseHbsNode

Also fixes \{{ inside quoted attribute values (parseAttrValue).

Adds --updateSnapshot step to the prettier smoke-test CI workflow for
error-format snapshots that legitimately differ from Jison's verbose output.

Adds 15 parser tests documenting the escape-sequence contract so
regressions are caught before reaching smoke tests.

Fixes: escaped.hbs, mustache.hbs, and invalid-2.hbs prettier snapshot
failures.
@johanrd johanrd force-pushed the perf/handlebars-v2-parser-single-pass branch from 9b32989 to 80a0d49 Compare April 15, 2026 05:02
johanrd added 20 commits April 15, 2026 07:04
The Jison parser has been replaced by the unified single-pass scanner in
@glimmer/syntax. Nothing imports from @handlebars/parser any more, so the
package directory, its workspace entry, CI jobs, and the @glimmer/syntax
dependency declaration are all removed.
unified-scanner.ts:
- Sort character constants numerically, add inline char comments, add CH_STAR
- Replace Object.defineProperty/__prop hacks with typed WeakMaps (ScanMeta)
- Fix magic numbers in classifyOpen (CH_HASH, CH_STAR, isDecoratorBlock)
- Rename cryptic vars in long-comment branch (lme→closeEnd, tr→trailingTilde, etc.)
- Replace fake while-loop in scanTextNode emu-merge with plain if-block
- Simplify main scan loop (remove redundant tail fast-path)

parser-node-test.ts:
- Port all applicable tests from deleted @handlebars/parser spec/ directory
- New modules: HBS spec, whitespace control (tilde + standalone), traversal
- Replace assert.strictEqual(x, true/false) with assert.true/false(x) (qunit/no-assert-equal-boolean)
- Replace ! non-null assertions with 'as Type' casts or optional chaining (no-non-null-assertion)
- Remove unnecessary 'as ASTv1.BlockStatement' cast in unified-scanner.ts (no-unnecessary-type-assertion)
- Run Prettier on parser-node-test.ts and local markdown files
The HBS.Program input branch of preprocess() is unreachable: no caller
in the repo passes an already-parsed AST. Removing it lets us delete
the entire tokenizer-event-handlers class, the HandlebarsNodeVisitors
base, and the abstract Parser class — ~1500 lines of code that only
existed to serve that dead path.

preprocess() now routes directly to unifiedPreprocess() for both
string and Source inputs. Signature narrowed to string | src.Source.
Drops rollup hiddenDependencies + rolledUpPackages globs, eslint
ignore/override blocks, and prettier .l/.yy patterns that all pointed
at the deleted packages/@handlebars directory. Updates parser.ts
header — it's no longer a 'replacement' for anything, it's the parser.
…e-test

The 'HBS spec (ported from @handlebars/parser)' module duplicated
existing coverage (literals, paths, hashes, blocks, sub-expressions,
error cases — all already tested earlier in the file).

The 'Traversal - visitor coverage' module was misplaced here; most of
what it covered is in traversal/visiting-node-test.ts already, and the
remainder belonged there, not in parser-node-test.

Kept: whitespace control tests (tilde + standalone), which exercise
the re-implemented WhitespaceControl pass and are genuinely new ground.
- Add toVarHeads() helper replacing 4 identical bp->VarHead maps
- Add isInsideOpenTag() predicate replacing 6 verbose boundary checks
- Add rejectUnsupportedMustache()/rejectUnsupportedBlock() helpers
  for partial/decorator error sites (pulls 30 lines of error-throwing
  out of classifyOpen's token classification)
- Remove Open.isDecorator field (never read)
- Remove ElementFrame.inSVG field (written but never consumed)
…y err()

- Add CH_SEMICOLON, CH_UNDERSCORE, CH_A, CH_Z, CH_a, CH_z, CH_x, CH_X_UPPER,
  CH_FF constants
- Add isAsciiAlpha()/isAsciiDigit() predicates
- Replace all (c >= 65 && c <= 90) || (c >= 97 && c <= 122) inline ranges
  at 5 sites with isAsciiAlpha(c); same for digit ranges
- Replace raw 59 /* ; */, 95 (_), 12 (FF), 120/88 (x/X) with named constants
- err() now throws a GlimmerSyntaxError (via generateSyntaxError) instead of
  a plain Error, so IDE consumers get source spans and module context
Split the two whitespace-stripping passes into pure mutation functions
(mutateTilde, mutateStandalone) and a shared walkAndStrip() driver that
handles the identical recurse-into-blocks-and-elements-and-filter-empty
boilerplate. Both passes still run as separate full-tree walks in
order (tilde → standalone); merging them into one interleaved walk
would change semantics because standalone at an outer level reads
nested blocks' first/last children and expects them in their
already-tilde-stripped state.
…lthrough

The reject*() helpers return never (they throw), so `return reject(...)`
compiles fine and gives ESLint the explicit control-flow break it
wants between switch cases.
@johanrd johanrd closed this Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.