feat(strands-command): add /strands-ts multi-agent TypeScript runner #68

Open
yonib05 wants to merge 29 commits into strands-agents:main from yonib05:feat/strands-ts-command

yonib05 commented Jun 12, 2026
Member

Summary

Adds a new, parallel /strands-ts <command> runner built on the TypeScript SDK, with a multi-agent PR reviewer as the first mode. The existing Python command is untouched; the two run side by side so the TS path can be validated before any cutover.

Architecture

  • Agents-as-tools reviewer: an orchestrator agent dispatches five tuned, SOP-form specialist reviewers (adherence, api, bug, history, test — prompts are version-controlled markdown under sops/) plus a custom_reviewer meta-agent for concerns no tuned SOP covers (explicitly second-choice).
  • Confidence filtering: findings are zod-validated structured output, then deterministically filtered in code — score ≥ 80, dedupe, cap 15. Empty result posts a "No issues found" template; malformed model output fails the run loudly instead of posting a misleading clean review.
  • Dynamic model selection per agent with precedence: user config (STRANDS_TS_AGENTS JSON: {model?, sop?} per agent key, supplied via workflow input / repo variable) > orchestrator's per-dispatch modelTier (haiku/sonnet/opus/fable or raw Bedrock id) > default.
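
As an illustration of the precedence in the last bullet, here is a minimal TypeScript sketch. The function names (`selectModel`, `resolveTier`, `parseAgentConfig`), the `TIER_MODELS` table, and every model id in it are placeholders invented for this example, not the runner's actual code. It also assumes that a user-supplied `model` may be either a tier name or a raw id.

```typescript
// Per-agent override, shaped like the STRANDS_TS_AGENTS JSON: {model?, sop?}.
interface AgentOverride {
  model?: string;
  sop?: string;
}
type AgentConfig = Record<string, AgentOverride>;

// Placeholder ids for illustration only; these are not real Bedrock model ids.
const TIER_MODELS: Record<string, string> = {
  haiku: "example.haiku-id",
  sonnet: "example.sonnet-id",
  opus: "example.opus-id",
  fable: "example.fable-id",
};
const DEFAULT_MODEL = "example.default-id"; // assumption: the real default is not stated

// Own-property check, so names such as "constructor" are never resolved
// through the prototype chain. Unknown names pass through as raw ids.
function resolveTier(tierOrId: string): string {
  if (Object.prototype.hasOwnProperty.call(TIER_MODELS, tierOrId)) {
    return TIER_MODELS[tierOrId] as string;
  }
  return tierOrId;
}

// Parses the user config; anything other than a JSON object is rejected.
function parseAgentConfig(json: string | undefined): AgentConfig {
  if (json === undefined || json.trim() === "") return {};
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("STRANDS_TS_AGENTS must be a JSON object");
  }
  return parsed as AgentConfig;
}

// Precedence: user config > orchestrator's per-dispatch modelTier > default.
function selectModel(agentKey: string, config: AgentConfig, modelTier?: string): string {
  const override = Object.prototype.hasOwnProperty.call(config, agentKey)
    ? config[agentKey]
    : undefined;
  if (override?.model) return resolveTier(override.model);
  if (modelTier) return resolveTier(modelTier);
  return DEFAULT_MODEL;
}
```

With this sketch, `selectModel("bug", parseAgentConfig('{"bug":{"model":"opus"}}'), "haiku")` resolves to the opus placeholder, because the user config outranks the orchestrator's tier.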

Safeguards (mirrors and tightens the Python model)

  • Agent runs read-only (GITHUB_WRITE=false): all write tool calls are recorded to a JSONL artifact; a separate finalize action replays them with the only write-capable token. Replay is allowlisted and repo-pinned (foreign-repo ops rejected, undefined repo pinned).
  • The reviewer agent has exactly 4 read-only GitHub tools — no shell, no file editing, no arbitrary HTTP. Sub-agents have no tools at all.
  • Example consumer workflow ships with permissions: {} at workflow level and a per-job split: the agent job can never hold pull-requests: write; finalize only runs on agent success.
  • Hardening from review: dot-segment/URL-encoding guards on path inputs, SOP path-traversal guard (separator-pinned), prototype-chain-safe tier lookup, paginated listings with a surfaced cap (no silent truncation), realpath-checked entry guards with explicit non-zero exit.
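
The deferred-write replay in the first bullet could look roughly like the sketch below. The record shape (`RecordedWrite`), the operation names in `ALLOWED_OPS`, and the `planReplay` function are hypothetical; the PR does not show the actual JSONL schema or allowlist. The sketch only decides which recorded writes are eligible for replay and performs no GitHub calls itself.

```typescript
// Hypothetical shape of one line in the JSONL artifact.
interface RecordedWrite {
  op: string;
  repo?: string; // "owner/name"; may be absent in a recorded call
  args: Record<string, unknown>;
}

interface ReplayPlan {
  accepted: Required<RecordedWrite>[];
  rejected: { line: number; reason: string }[];
}

// Hypothetical allowlist; the real operation names are not given in the PR.
const ALLOWED_OPS = new Set(["create_review", "add_comment"]);

function planReplay(jsonl: string, pinnedRepo: string): ReplayPlan {
  const plan: ReplayPlan = { accepted: [], rejected: [] };
  jsonl.split("\n").forEach((text, index) => {
    if (text.trim() === "") return; // ignore blank lines
    const line = index + 1;

    let parsed: unknown;
    try {
      parsed = JSON.parse(text);
    } catch {
      plan.rejected.push({ line, reason: "invalid JSON" });
      return;
    }
    const entry = (typeof parsed === "object" && parsed !== null
      ? parsed
      : {}) as Partial<RecordedWrite>;

    // Allowlist: anything not explicitly permitted is rejected.
    const op = entry.op;
    if (typeof op !== "string" || !ALLOWED_OPS.has(op)) {
      plan.rejected.push({ line, reason: "operation not allowlisted" });
      return;
    }
    // Repo pinning: a foreign repo is rejected, a missing repo is pinned.
    if (entry.repo !== undefined && entry.repo !== pinnedRepo) {
      plan.rejected.push({ line, reason: "foreign repository" });
      return;
    }
    plan.accepted.push({ op, repo: pinnedRepo, args: entry.args ?? {} });
  });
  return plan;
}
```

Keeping the decision in a pure function like this means the only code that touches the write-capable token is whatever iterates over `accepted` afterwards.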

Components

  • strands-command/scripts/typescript/ — the runner (Node 20, NodeNext ESM, committed lockfile, strict npm ci)
  • strands-command/actions/strands-ts-runner/ — read-only agent run + artifact upload
  • strands-command/actions/strands-ts-finalize/ — deferred-write replay
  • strands-command/examples/strands-ts-command.yml — consumer workflow showing the auth gate + permission split

Test plan

Reviewer evaluation

Review quality was validated against two purpose-built pull requests, each with a pre-authored answer key of planted defects and deliberate false-positive baits. For reference, the same PRs were reviewed by the existing single-agent Python /strands review command and by an independent Claude Code review session.

All three reviewers ran on Claude Opus 4.8, for an apples-to-apples comparison.

Methodology: each reviewer was run exactly once on a pristine PR with no prior review comments — a reviewer that can read existing threads is no longer measuring independent recall.

Scenario A — new feature, convention-heavy

A new public SDK-style module added to a package shipping a committed CONVENTIONS.md. 9 planted defects: two correctness bugs (a 1000x token-refill unit error; an off-by-one window guard), a breaking public-signature change, and six convention violations (ms-in-API, _internal re-export, bare Callable vs Protocol, typing.Optional, f-string logging, missing test). 4 false-positive baits.

| Reviewer | Planted defects found | False positives | Inline comments |
| --- | --- | --- | --- |
| strands-ts | 9 / 9 + 1 unplanted | 0 / 4 | 8, line-anchored |
| Python /strands | 9 / 9 | 0 / 4 | 8 |
| Claude Code (reference) | 9 / 9 + 1 unplanted | 0 / 4 | n/a (CLI) |

strands-ts and the Claude Code reference both additionally caught an unplanted real bug (the bucket seeds its clock in __init__ but accepts an injected now in acquire(), so the first call sees negative elapsed time). Every convention finding cited the exact CONVENTIONS.md clause.
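
To make the unplanted bug concrete, here is a small TypeScript analogue. The reviewed module itself is not shown in this PR, so both class names and the `elapsed` method are hypothetical, and the lazy-seeding variant is only one possible fix, not necessarily the one the reviewers suggested.

```typescript
// Analogue of the bug: the clock is seeded from the wall clock at construction,
// but callers may inject their own `now` (for example a test clock starting at 0).
class SeededInConstructor {
  private last = Date.now() / 1000; // seconds since the Unix epoch

  elapsed(now: number = Date.now() / 1000): number {
    const dt = now - this.last; // negative when `now` comes from another clock
    this.last = now;
    return dt;
  }
}

// One possible fix: seed from the first `now` the bucket actually observes,
// so the seed and later readings always come from the same clock.
class SeededOnFirstCall {
  private last: number | undefined;

  elapsed(now: number = Date.now() / 1000): number {
    const dt = this.last === undefined ? 0 : now - this.last;
    this.last = now;
    return dt;
  }
}

const injected = 100; // a test clock far behind the wall clock
console.log(new SeededInConstructor().elapsed(injected) < 0); // prints true
console.log(new SeededOnFirstCall().elapsed(injected)); // prints 0
```

A negative elapsed time multiplied by a refill rate would subtract tokens on the first call, which is why this counts as a correctness bug and not a style issue.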

Scenario B — refactor hiding a regression

A PR described as "pure cleanup, no behavior change" that actually moves tax onto the pre-discount subtotal (contradicting the class's own preserved docstring) and edits the existing test to expect the wrong value, so the suite stays green. 4 defects, 5 false-positive baits (legitimate refactors that must not be flagged).

| Reviewer | Defects found | Regression + masking test linked | False positives |
| --- | --- | --- | --- |
| strands-ts | 3 / 4 | yes | 0 / 5 |
| Python /strands | 3 / 4 | yes | 0 / 5 |
| Claude Code (reference) | 3 / 4 | yes | 0 / 5 |

All three caught the regression, identified that the test was rewritten to mask it, and left the 5 legitimate refactors alone. All three missed only the weakest signal (a commit-message vs. PR-description mismatch).
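
The Scenario B regression is easiest to see with numbers. The sketch below is illustrative only: the function names, the amounts, and the 10% rate are made up for this example and are not taken from the evaluation PR. Amounts are integer cents and the rate is in basis points to avoid floating-point rounding.

```typescript
// Documented behaviour: tax is charged on the subtotal after the discount.
function totalTaxAfterDiscount(subtotal: number, discount: number, taxBps: number): number {
  const taxable = subtotal - discount;
  return taxable + Math.round((taxable * taxBps) / 10000);
}

// Regressed behaviour: tax is charged on the pre-discount subtotal.
function totalTaxBeforeDiscount(subtotal: number, discount: number, taxBps: number): number {
  const tax = Math.round((subtotal * taxBps) / 10000);
  return subtotal - discount + tax;
}

// $100.00 subtotal, $20.00 discount, 10% tax.
console.log(totalTaxAfterDiscount(10000, 2000, 1000)); // prints 8800
console.log(totalTaxBeforeDiscount(10000, 2000, 1000)); // prints 9000
```

A test that is edited from expecting 8800 to expecting 9000 keeps the suite green while locking in the wrong behaviour, which is the masking pattern the reviewers had to link to the regression.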

Why the multi-agent design helps

  • High recall with zero false positives across both scenarios: 12 of the 13 planted defects were found (9 / 9 in Scenario A, 3 / 4 in Scenario B), and the deterministic confidence filter (drop < 80, dedupe, cap) held precision at 100%, ignoring all 9 baits including a Decimal-for-money scope trap.
  • Convention enforcement is real: the orchestrator fetches governance docs that never appear in the diff, and every adherence finding cites the exact clause.
  • Line-anchored inline comments: findings post on their specific changed lines, not only as a summary.
  • Parallel specialized lenses surfaced an unplanted correctness bug the single-agent baseline missed.
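
The deterministic confidence filter mentioned in the first bullet can be sketched as follows. This is a simplified stand-in and not the PR's code: the `Finding` shape, the dedupe key, and the sort-before-cap ordering are assumptions, and the hand-written `isFinding` check stands in for the zod schema the runner actually uses. The threshold (80) and cap (15) are the values stated in this description.

```typescript
// Hypothetical finding shape; the real zod schema is not shown in this PR.
interface Finding {
  file: string;
  line: number;
  message: string;
  score: number; // reviewer confidence, 0 to 100
}

const MIN_SCORE = 80;
const MAX_FINDINGS = 15;

function isFinding(x: unknown): x is Finding {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    typeof r["file"] === "string" &&
    typeof r["line"] === "number" &&
    typeof r["message"] === "string" &&
    typeof r["score"] === "number"
  );
}

// Throws on malformed output so a bad model response fails the run
// instead of being mistaken for an empty, clean review.
function parseFindings(raw: unknown): Finding[] {
  if (!Array.isArray(raw)) throw new Error("findings must be an array");
  return raw.map((item, i) => {
    if (!isFinding(item)) throw new Error(`finding ${i} is malformed`);
    return item;
  });
}

// Drop low-confidence findings, dedupe, and cap. No model call is involved,
// so the same input always produces the same output.
function filterFindings(findings: Finding[]): Finding[] {
  const ranked = [...findings].sort((a, b) => b.score - a.score);
  const seen = new Set<string>();
  const kept: Finding[] = [];
  for (const f of ranked) {
    if (f.score < MIN_SCORE) continue;
    const key = `${f.file}:${f.line}:${f.message}`; // assumed dedupe key
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(f);
    if (kept.length === MAX_FINDINGS) break;
  }
  return kept;
}
```

An empty array returned from `filterFindings` is the case that would post the "No issues found" template, while an exception from `parseFindings` is the case that should fail the run.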

Evidence

The trued-up evaluation runs are left open for inspection:

yonib05 added 29 commits June 12, 2026 10:46
Drop the version.ts smoke-test artifact (RUNNER_NAME had no production
use) and the unused ReviewOutput type export. Remove three change-detector
tests that asserted config data or a constant's value rather than behavior.
Replace the near-uniform per-tier maxTokens map + fallback with a single
default and a haiku special-case. Extract the duplicated path-traversal
guard into one safeSegments() helper so the security check has a single
definition.
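
For readers unfamiliar with this kind of guard, a helper in the spirit of `safeSegments()` might look like the sketch below. Only the helper's name comes from the commit message; its signature and the exact rules here (decode once, reject leftover percent signs, reject empty and dot segments) are assumptions and may differ from the committed implementation.

```typescript
// Splits a repo-relative path into segments, rejecting anything that could
// escape the intended directory. Throws instead of silently sanitising.
function safeSegments(input: string): string[] {
  let decoded: string;
  try {
    decoded = decodeURIComponent(input);
  } catch {
    throw new Error("invalid percent-encoding in path");
  }
  // A '%' that survives one decode suggests double encoding (e.g. %252e).
  if (decoded.includes("%")) {
    throw new Error("unexpected percent sign in path");
  }
  if (decoded.includes("\\") || decoded.includes("\0")) {
    throw new Error("backslash or NUL byte in path");
  }
  const segments = decoded.split("/");
  for (const segment of segments) {
    // An empty segment covers leading, trailing, and doubled slashes.
    if (segment === "" || segment === "." || segment === "..") {
      throw new Error(`unsafe path segment in ${JSON.stringify(input)}`);
    }
  }
  return segments;
}

console.log(safeSegments("src/index.ts")); // prints [ 'src', 'index.ts' ]
```

Having a single definition means every caller gets the same rejection rules, which is the point the commit message makes about deduplicating the check.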