feat(strands-command): add /strands-ts multi-agent TypeScript runner by yonib05 · Pull Request #68 · strands-agents/devtools

yonib05 · 2026-06-12T23:54:21Z

Summary

Adds a new, parallel /strands-ts <command> runner built on the TypeScript SDK, with a multi-agent PR reviewer as the first mode. The existing Python command is untouched; the two run side by side so the TS path can be validated before any cutover.

Architecture

Agents-as-tools reviewer: an orchestrator agent dispatches five tuned, SOP-form specialist reviewers (adherence, api, bug, history, test — prompts are version-controlled markdown under sops/) plus a custom_reviewer meta-agent for concerns no tuned SOP covers (explicitly second-choice).
Confidence filtering: findings are zod-validated structured output, then deterministically filtered in code (score ≥ 80, dedupe, cap at 15). An empty result posts a "No issues found" template; malformed model output fails the run loudly instead of posting a misleading clean review.
Dynamic model selection per agent with precedence: user config (STRANDS_TS_AGENTS JSON: {model?, sop?} per agent key, supplied via workflow input / repo variable) > orchestrator's per-dispatch modelTier (haiku/sonnet/opus/fable or raw Bedrock id) > default.
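The precedence rule above can be sketched as follows. This is a minimal illustration, not the PR's code: `parseAgentConfig`, `resolveModel`, the tier table, and the placeholder model ids are all hypothetical names chosen for the example. Only the precedence order, the tier names, and the `{model?, sop?}` shape come from the description. Using a `Map` for the tier table is one way to keep the lookup off the prototype chain.

```typescript
// Hypothetical sketch: names and model ids are placeholders, not the PR's actual code.
type AgentOverride = { model?: string; sop?: string };

const DEFAULT_MODEL = "example-opus-model-id"; // placeholder, not a real Bedrock id

// A Map never consults Object.prototype, so a tier such as "constructor"
// cannot accidentally resolve to an inherited property.
const TIER_TO_MODEL = new Map<string, string>([
  ["haiku", "example-haiku-model-id"],
  ["sonnet", "example-sonnet-model-id"],
  ["opus", DEFAULT_MODEL],
  ["fable", "example-fable-model-id"],
]);

// Parse the STRANDS_TS_AGENTS JSON value; an unset value means "no overrides".
function parseAgentConfig(raw: string | undefined): Record<string, AgentOverride> {
  if (!raw) return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("STRANDS_TS_AGENTS must be a JSON object keyed by agent name");
  }
  return parsed as Record<string, AgentOverride>;
}

// Precedence: user config > orchestrator's per-dispatch modelTier > default.
function resolveModel(
  agentKey: string,
  userConfig: Record<string, AgentOverride>,
  modelTier?: string,
): string {
  const override = Object.prototype.hasOwnProperty.call(userConfig, agentKey)
    ? userConfig[agentKey]
    : undefined;
  if (override?.model) return override.model;
  if (modelTier !== undefined) {
    // A known tier maps to its id; anything else is treated as a raw model id.
    return TIER_TO_MODEL.get(modelTier) ?? modelTier;
  }
  return DEFAULT_MODEL;
}
```

In this sketch a user-supplied `model` is passed through as a raw id; whether the real runner also resolves tier names there is not stated in the description.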

Safeguards (mirrors and tightens the Python model)

Agent runs read-only (GITHUB_WRITE=false): all write tool calls are recorded to a JSONL artifact; a separate finalize action replays them with the only write-capable token. Replay is allowlisted and repo-pinned (foreign-repo ops rejected, undefined repo pinned).
The reviewer agent has exactly 4 read-only GitHub tools — no shell, no file editing, no arbitrary HTTP. Sub-agents have no tools at all.
Example consumer workflow ships with permissions: {} at workflow level and a per-job split: the agent job can never hold pull-requests: write; finalize only runs on agent success.
Hardening from review: dot-segment/URL-encoding guards on path inputs, SOP path-traversal guard (separator-pinned), prototype-chain-safe tier lookup, paginated listings with a surfaced cap (no silent truncation), realpath-checked entry guards with explicit non-zero exit.
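The deferred-write safeguard described above might look roughly like the sketch below. Everything here is an assumption made for illustration: the `DeferredWrite` record shape, the `planReplay` name, and the contents of the allowlist (only `addPrComment` appears in the commit titles; the other entries are invented). It also assumes that "undefined repo pinned" means an entry without a repo is replayed against the pinned repo.

```typescript
// Hypothetical sketch of allowlisted, repo-pinned replay; not the PR's actual code.
interface DeferredWrite {
  op: string;
  repo?: string; // "owner/name"; may be absent in a recorded entry
  args: Record<string, unknown>;
}

interface ReplayPlan {
  accepted: DeferredWrite[];
  rejected: { line: number; reason: string }[];
}

// Assumed allowlist; only addPrComment is named in the commit list.
const ALLOWED_OPS = new Set(["addPrComment", "addInlineComment", "addLabel"]);

// Read-only phase: the agent's write calls are appended as JSONL lines, not executed.
function recordWrite(log: string[], write: DeferredWrite): void {
  log.push(JSON.stringify(write));
}

// Finalize phase: decide which recorded writes may run with the write-capable token.
function planReplay(jsonl: string, pinnedRepo: string): ReplayPlan {
  const plan: ReplayPlan = { accepted: [], rejected: [] };
  jsonl.split("\n").forEach((raw, index) => {
    if (raw.trim() === "") return;
    const line = index + 1;
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      plan.rejected.push({ line, reason: "malformed JSON" });
      return;
    }
    if (typeof parsed !== "object" || parsed === null) {
      plan.rejected.push({ line, reason: "entry is not an object" });
      return;
    }
    const entry = parsed as Partial<DeferredWrite>;
    const op = entry.op;
    if (typeof op !== "string" || !ALLOWED_OPS.has(op)) {
      plan.rejected.push({ line, reason: "operation not allowlisted" });
      return;
    }
    if (entry.repo !== undefined && entry.repo !== pinnedRepo) {
      plan.rejected.push({ line, reason: "foreign repo" });
      return;
    }
    // A missing repo is pinned rather than trusted to a downstream default.
    plan.accepted.push({ op, repo: pinnedRepo, args: entry.args ?? {} });
  });
  return plan;
}
```

The point of splitting planning from execution is that the agent job never needs a write token: it only produces the JSONL artifact, and the finalize job decides what is safe to replay.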

Components

strands-command/scripts/typescript/ — the runner (Node 20, NodeNext ESM, committed lockfile, strict npm ci)
strands-command/actions/strands-ts-runner/ — read-only agent run + artifact upload
strands-command/actions/strands-ts-finalize/ — deferred-write replay
strands-command/examples/strands-ts-command.yml — consumer workflow showing the auth gate + permission split

Test plan

57 unit tests (vitest), tsc --noEmit clean — covers the deterministic core (filter, deferred writes, replay guards, SOP loader, pagination, formatter) and the runReviewer seam with a mocked orchestrator
Runner smoke test: unknown command errors loudly with non-zero exit
Live validation: ran /strands-ts review end-to-end on test PRs (read-only agent run, artifact replay, posted summary + inline comments, label lifecycle); see yonib05/devtools#12 ("feat: add token-bucket limiter and align window units") and yonib05/devtools#13 ("refactor: tidy pricing engine internals")

Reviewer evaluation

Review quality was validated against two purpose-built pull requests, each with a pre-authored answer key of planted defects and deliberate false-positive baits. For reference, the same PRs were reviewed by the existing single-agent Python /strands review command and by an independent Claude Code review session.

All three reviewers ran on Claude Opus 4.8, for an apples-to-apples comparison.

Methodology: each reviewer was run exactly once on a pristine PR with no prior review comments, because a run that can read existing threads no longer measures independent recall.

Scenario A — new feature, convention-heavy

A new public SDK-style module added to a package shipping a committed CONVENTIONS.md. 9 planted defects: two correctness bugs (a 1000x token-refill unit error; an off-by-one window guard), a breaking public-signature change, and six convention violations (ms-in-API, _internal re-export, bare Callable vs Protocol, typing.Optional, f-string logging, missing test). 4 false-positive baits.

Reviewer	Planted defects found	False positives	Inline comments
strands-ts	9 / 9 + 1 unplanted	0 / 4	8, line-anchored
Python `/strands`	9 / 9	0 / 4	8
Claude Code (reference)	9 / 9 + 1 unplanted	0 / 4	n/a (CLI)

strands-ts and the Claude Code reference both additionally caught an unplanted real bug (the bucket seeds its clock in __init__ but accepts an injected now in acquire(), so the first call sees negative elapsed time). Every convention finding cited the exact CONVENTIONS.md clause.
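To make the unplanted bug concrete, here is a TypeScript analogue of the pattern described. It is a hypothetical reconstruction, not the code from the test PR (which is Python): the class names, fields, and the `wallClock` parameter are invented for illustration. The bucket seeds `last` from one clock at construction, then subtracts it from a caller-supplied `now` that may come from a different clock.

```typescript
// Hypothetical reconstruction of the clock-mismatch pattern; not the test PR's code.
class TokenBucket {
  protected tokens: number;
  protected last: number;
  readonly capacity: number;
  readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, wallClock: () => number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.last = wallClock(); // seeded from the constructor's clock
  }

  protected elapsed(now: number): number {
    return now - this.last; // negative if `now` comes from a clock that started earlier
  }

  acquire(now: number): boolean {
    // A negative elapsed time drains tokens instead of refilling them.
    this.tokens = Math.min(this.capacity, this.tokens + this.elapsed(now) * this.refillPerSec);
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One possible fix: never let elapsed time go below zero.
class ClampedTokenBucket extends TokenBucket {
  protected override elapsed(now: number): number {
    return Math.max(0, now - this.last);
  }
}

const seededAt1000 = () => 1000; // stands in for a wall clock read in the constructor
const buggyFirstCall = new TokenBucket(5, 1, seededAt1000).acquire(0);
const clampedFirstCall = new ClampedTokenBucket(5, 1, seededAt1000).acquire(0);
console.log({ buggyFirstCall, clampedFirstCall });
```

In this sketch a full bucket refuses its very first request when the injected time is earlier than the seeded time, while the clamped variant grants it. Seeding `last` lazily from the first `now` would be another reasonable fix.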

Scenario B — refactor hiding a regression

A PR described as "pure cleanup, no behavior change" that actually moves tax onto the pre-discount subtotal (contradicting the class's own preserved docstring) and edits the existing test to expect the wrong value, so the suite stays green. 4 defects, 5 false-positive baits (legitimate refactors that must not be flagged).

Reviewer	Defects found	Regression + masking test linked	False positives
strands-ts	3 / 4	yes	0 / 5
Python `/strands`	3 / 4	yes	0 / 5
Claude Code (reference)	3 / 4	yes	0 / 5

All three caught the regression, identified that the test was rewritten to mask it, and left the 5 legitimate refactors alone. All three missed only the weakest signal (a commit-message vs. PR-description mismatch).

Why the multi-agent design helps

Near-full recall with zero false positives across both scenarios: 12 of 13 planted defects were found, the one miss being the weakest signal in Scenario B. The deterministic confidence filter (drop < 80, dedupe, cap) held precision at 100%, ignoring all 9 baits including a Decimal-for-money scope trap.
Convention enforcement is real: the orchestrator fetches governance docs that never appear in the diff, and every adherence finding cites the exact clause.
Line-anchored inline comments: findings post on their specific changed lines, not only as a summary.
Parallel specialized lenses surfaced an unplanted correctness bug the single-agent baseline missed.
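The deterministic filter credited in the first point above (drop below 80, dedupe, cap at 15) can be sketched in a few lines. This is an illustration under assumptions: the `Finding` fields, the dedupe key (file, line, normalized title), and the choice to keep the highest-scoring duplicate are all hypothetical. The description only fixes the threshold, the cap, and the fact that the step runs in code rather than in the model.

```typescript
// Hypothetical sketch of the score / dedupe / cap filter; field names are assumptions.
interface Finding {
  file: string;
  line: number;
  title: string;
  body: string;
  confidence: number; // 0..100, as scored by a specialist reviewer
}

const MIN_CONFIDENCE = 80;
const MAX_FINDINGS = 15;

function filterFindings(findings: readonly Finding[]): Finding[] {
  const bestByKey = new Map<string, Finding>();
  for (const finding of findings) {
    // 1. Threshold: anything below the cut-off never reaches the PR.
    if (finding.confidence < MIN_CONFIDENCE) continue;
    // 2. Dedupe: the same location and title reported by several lenses collapses to one.
    const key = `${finding.file}:${finding.line}:${finding.title.trim().toLowerCase()}`;
    const existing = bestByKey.get(key);
    if (!existing || finding.confidence > existing.confidence) {
      bestByKey.set(key, finding);
    }
  }
  // 3. Cap: keep the highest-confidence findings, at most MAX_FINDINGS of them.
  return Array.from(bestByKey.values())
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, MAX_FINDINGS);
}
```

Because the function is pure and has no model in the loop, the same input always yields the same posted findings, which is what makes it straightforward to unit test.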

Evidence

The trued-up evaluation runs are left open for inspection:

Scenario A: feat: add token-bucket limiter and align window units yonib05/devtools#12
Scenario B: refactor: tidy pricing engine internals yonib05/devtools#13


Drop the version.ts smoke-test artifact (RUNNER_NAME had no production use) and the unused ReviewOutput type export. Remove three change-detector tests that asserted config data or a constant's value rather than behavior.

Replace the near-uniform per-tier maxTokens map + fallback with a single default and a haiku special-case. Extract the duplicated path-traversal guard into one safeSegments() helper so the security check has a single definition.
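A single-definition path guard of the kind this commit describes could look like the sketch below. Only the helper name `safeSegments()` comes from the commit message; its signature, the exact rejection rules, and the error messages are assumptions for illustration. In particular, rejecting every `%` is a deliberately strict choice made here, and the real helper may handle encoding differently.

```typescript
// Hypothetical sketch of a shared path guard; the real safeSegments() may differ.
function safeSegments(path: string): string[] {
  if (path.length === 0) {
    throw new Error("empty path");
  }
  // Pin the separator to "/": backslashes and NUL bytes are never valid here.
  if (path.includes("\\") || path.includes("\0")) {
    throw new Error(`unsafe character in path ${JSON.stringify(path)}`);
  }
  // Refuse percent-encoding so "%2e%2e" cannot smuggle a dot-segment past the check.
  if (path.includes("%")) {
    throw new Error(`percent-encoding not allowed in path ${JSON.stringify(path)}`);
  }
  const segments = path.split("/");
  for (const segment of segments) {
    // Empty segments cover leading "/", trailing "/", and "//".
    if (segment === "" || segment === "." || segment === "..") {
      throw new Error(`unsafe segment in path ${JSON.stringify(path)}`);
    }
  }
  return segments;
}

// Example use: build a URL path from validated segments only.
const encodedSopPath = safeSegments("sops/bug.md").map(encodeURIComponent).join("/");
console.log(encodedSopPath);
```

Having both the SOP loader and the GitHub contents tool call one helper means a future tightening of the rule lands in one place.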

yonib05 added 29 commits June 12, 2026 10:46

chore(strands-ts): scaffold typescript project

877f64e

test(strands-ts): add vitest smoke test

20544cc

fix(strands-ts): use NodeNext module resolution for direct node execution

4506596

feat(strands-ts): add Finding/ReviewOutput zod schemas

deae676

feat(strands-ts): add deterministic score/dedupe/cap filter

b169320

feat(strands-ts): add deferred-write safeguard wrapper

7b9d536

fix(strands-ts): async artifact writes, schema comment, test coverage

d1fe60f

feat(strands-ts): add github read tools and deferred addPrComment

57670b1

feat(strands-ts): add writeExecutor finalize replay

c6d22c1

fix(strands-ts): encode contents url, pin replay repo, pagination + tests

920518f

feat(strands-ts): add model factory, lens SOPs, and SOP loader

d82da69

feat(strands-ts): add specialist and orchestrator agent builders

39e628b

feat(strands-ts): add reviewer mode, registry, runner entry

2f56b25

fix(strands-ts): harden sop path guard, tier lookup, finding range validation

26fe579

feat(strands-ts): add read-only runner action

b86367e

feat(strands-ts): add finalize action and example consumer workflow

457f330

fix(strands-ts): gate finalize on agent success, env-pass pr number

2435eba

fix(strands-ts): decode file contents, give orchestrator the head ref

cf5dade

fix(strands-ts): reject dot-segment paths, require commit_id for inline comments

0c3505f

fix(strands-ts): paginate listings, harden entry guards, cover runner seams

a5e18d9

refactor(strands-ts): restructure SOPs to devtools house style

3fdbef2

feat(strands-ts): post inline review comments per finding

cfd45fe

fix(strands-ts): treat inline-comment 422 as skipped, not a run failure

a301b09

fix(strands-ts): orchestrator fetches governance docs for the adherence lens

4c83da7

fix(strands-ts): keep masking-test findings separate, clamp inline anchors

14eabf6

fix(strands-ts): score verified/doc-cited findings at the posting threshold

1a41528

feat(strands-ts): default to opus tier and add strands-running label lifecycle

5313a60

yonib05 mentioned this pull request Jun 15, 2026

ci: add /strands-ts command handler strands-agents/harness-sdk#2793

Open

yonib05 mentioned this pull request Jun 15, 2026

ci: add /strands-ts command handler strands-agents/evals#266

Open

yonib05 wants to merge 29 commits into strands-agents:main from yonib05:feat/strands-ts-command
