feat(strands-command): add /strands-ts multi-agent TypeScript runner #68
Open
yonib05 wants to merge 29 commits into
Drop the version.ts smoke-test artifact (RUNNER_NAME had no production use) and the unused ReviewOutput type export. Remove three change-detector tests that asserted config data or a constant's value rather than behavior.
Replace the near-uniform per-tier maxTokens map + fallback with a single default and a haiku special-case. Extract the duplicated path-traversal guard into one safeSegments() helper so the security check has a single definition.
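The commit above names a `safeSegments()` helper but does not show its body. As a rough illustration only, a single-definition traversal guard might look like the sketch below. The signature, error messages and exact rejection rules are assumptions, not the PR's code.

```typescript
// Hypothetical sketch: only the name safeSegments() comes from the commit message.
// Splits a caller-supplied relative path into segments and rejects anything
// that could escape the directory it will later be joined onto.
function safeSegments(input: string): string[] {
  if (input.length === 0) throw new Error("empty path");
  if (input.includes("\0")) throw new Error("NUL byte in path");

  // Treat both separators alike so "..\\secret" cannot slip past the check.
  const normalized = input.replace(/\\/g, "/");

  // Reject POSIX absolute paths and Windows drive-letter paths.
  if (normalized.startsWith("/") || /^[A-Za-z]:/.test(normalized)) {
    throw new Error(`absolute path rejected: ${input}`);
  }

  // Drop empty and "." segments, which do not change the target.
  const segments = normalized.split("/").filter((s) => s !== "" && s !== ".");
  if (segments.length === 0) throw new Error("path has no segments");

  for (const segment of segments) {
    if (segment === "..") throw new Error(`path traversal rejected: ${input}`);
  }
  return segments;
}
```

Returning segments rather than a joined string lets each caller join them onto its own base directory while the check itself stays in one place.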
Summary
Adds a new, parallel `/strands-ts <command>` runner built on the TypeScript SDK, with a multi-agent PR reviewer as the first mode. The existing Python command is untouched; the two run side by side so the TS path can be validated before any cutover.
Architecture
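The Architecture notes in this description give a model-selection precedence: a per-agent entry in the `STRANDS_TS_AGENTS` JSON, then the orchestrator's per-dispatch `modelTier`, then a default. A minimal sketch of that chain follows. Only the precedence order is taken from the description; the function names, the tier table and the model ids are placeholders.

```typescript
// Hypothetical names throughout; only the precedence order comes from the PR text.
type AgentOverride = { model?: string; sop?: string };

// Assumed tier aliases. The ids are placeholders, not real Bedrock model ids.
const TIER_TO_MODEL = new Map<string, string>([
  ["haiku", "example.haiku-model-id"],
  ["sonnet", "example.sonnet-model-id"],
  ["opus", "example.opus-model-id"],
]);
const DEFAULT_MODEL = "example.sonnet-model-id";

// Parse the STRANDS_TS_AGENTS value; an unset variable means "no overrides".
function parseOverrides(raw: string | undefined): Record<string, AgentOverride> {
  if (!raw) return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("STRANDS_TS_AGENTS must be a JSON object keyed by agent");
  }
  return parsed as Record<string, AgentOverride>;
}

// Precedence: per-agent override > orchestrator modelTier > default.
function resolveModel(
  agentKey: string,
  overrides: Record<string, AgentOverride>,
  modelTier?: string,
): string {
  const override = overrides[agentKey]?.model;
  if (override) return override;
  // An unknown tier name is assumed to be a raw model id and passed through.
  if (modelTier) return TIER_TO_MODEL.get(modelTier) ?? modelTier;
  return DEFAULT_MODEL;
}
```

For example, with `{"security": {"model": "custom-model"}}` as the override JSON, `resolveModel("security", overrides, "haiku")` returns `custom-model`, while any other agent dispatched with `haiku` gets the haiku placeholder id.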
`sops/`) plus a `custom_reviewer` meta-agent for concerns no tuned SOP covers (explicitly second-choice).
`STRANDS_TS_AGENTS` JSON: `{model?, sop?}` per agent key, supplied via workflow input / repo variable) > orchestrator's per-dispatch `modelTier` (haiku/sonnet/opus/fable or raw Bedrock id) > default.
Safeguards (mirrors and tightens the Python model)
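The read-only safeguard in this section (write calls recorded to a JSONL artifact, then replayed by the finalize action under an allowlist and a repo pin) can be sketched roughly as below. The record shape, tool names and function names are assumptions made for illustration; the description does not show the artifact schema.

```typescript
// Hypothetical record shape; the PR text does not show the real schema.
type DeferredWrite = { tool: string; repo?: string; args: Record<string, unknown> };

// Assumed allowlist of replayable tool names (illustrative only).
const REPLAY_ALLOWLIST = new Set(["create_review_comment", "add_label", "post_summary"]);

// Agent side: with writes disabled, serialize the call instead of executing it.
function recordWrite(log: string[], call: DeferredWrite): void {
  log.push(JSON.stringify(call));
}

// Finalize side: decide which recorded calls may run against the pinned repo.
function planReplay(
  jsonl: string,
  pinnedRepo: string,
): { accepted: DeferredWrite[]; rejected: string[] } {
  const accepted: DeferredWrite[] = [];
  const rejected: string[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const call = JSON.parse(line) as DeferredWrite;
    if (!REPLAY_ALLOWLIST.has(call.tool)) {
      rejected.push(`tool not allowlisted: ${call.tool}`);
    } else if (call.repo !== undefined && call.repo !== pinnedRepo) {
      rejected.push(`foreign repo rejected: ${call.repo}`);
    } else {
      // A call with no repo is pinned to the workflow's own repository.
      accepted.push({ ...call, repo: pinnedRepo });
    }
  }
  return { accepted, rejected };
}
```

Keeping the accept/reject decision separate from the actual API calls means the guard can be unit-tested without any token, which fits the "replay guards" item in the test plan.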
`GITHUB_WRITE=false`): all write tool calls are recorded to a JSONL artifact; a separate finalize action replays them with the only write-capable token. Replay is allowlisted and repo-pinned (foreign-repo ops rejected, undefined repo pinned).
`permissions: {}` at workflow level and a per-job split: the agent job can never hold `pull-requests: write`; finalize only runs on agent success.
Components
`strands-command/scripts/typescript/`: the runner (Node 20, NodeNext ESM, committed lockfile, strict `npm ci`)
`strands-command/actions/strands-ts-runner/`: read-only agent run + artifact upload
`strands-command/actions/strands-ts-finalize/`: deferred-write replay
`strands-command/examples/strands-ts-command.yml`: consumer workflow showing the auth gate + permission split
Test plan
`tsc --noEmit` clean; covers the deterministic core (filter, deferred writes, replay guards, SOP loader, pagination, formatter) and the `runReviewer` seam with a mocked orchestrator
`/strands-ts review` end-to-end on test PRs (read-only agent run, artifact replay, posted summary + inline comments, label lifecycle); see "feat: add token-bucket limiter and align window units" (yonib05/devtools#12) and "refactor: tidy pricing engine internals" (yonib05/devtools#13)
Reviewer evaluation
Review quality was validated against two purpose-built pull requests, each with a pre-authored answer key of planted defects and deliberate false-positive baits. For reference, the same PRs were reviewed by the existing single-agent Python `/strands review` command and by an independent Claude Code review session.
All three reviewers ran on Claude Opus 4.8, for an apples-to-apples comparison.
Methodology: each reviewer was run exactly once on a pristine PR with no prior review comments — a reviewer that can read existing threads is no longer measuring independent recall.
Scenario A: new feature, convention-heavy
A new public SDK-style module added to a package shipping a committed `CONVENTIONS.md`. 9 planted defects: two correctness bugs (a 1000x token-refill unit error; an off-by-one window guard), a breaking public-signature change, and six convention violations (ms-in-API, `_internal` re-export, bare `Callable` vs `Protocol`, `typing.Optional`, f-string logging, missing test). 4 false-positive baits.
strands-ts and the Claude Code reference both additionally caught an unplanted real bug (the bucket seeds its clock in `__init__` but accepts an injected `now` in `acquire()`, so the first call sees negative elapsed time). Every convention finding cited the exact `CONVENTIONS.md` clause.
Scenario B: refactor hiding a regression
A PR described as "pure cleanup, no behavior change" that actually moves tax onto the pre-discount subtotal (contradicting the class's own preserved docstring) and edits the existing test to expect the wrong value, so the suite stays green. 4 defects, 5 false-positive baits (legitimate refactors that must not be flagged).
All three caught the regression, identified that the test was rewritten to mask it, and left the 5 legitimate refactors alone. All three missed only the weakest signal (a commit-message vs. PR-description mismatch).
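To make the Scenario B regression concrete, the sketch below contrasts taxing the post-discount amount with taxing the pre-discount subtotal. The function names and figures are invented for illustration and are not taken from the test PR; amounts are integer cents and the rate is in basis points so the arithmetic stays exact.

```typescript
// Illustrative numbers only; the PR under review is not reproduced here.

// Documented behavior (assumed): tax is charged on the discounted amount.
function totalTaxAfterDiscount(subtotalCents: number, discountCents: number, taxBps: number): number {
  const discounted = subtotalCents - discountCents;
  return discounted + Math.round((discounted * taxBps) / 10000);
}

// Regressed behavior: tax is charged on the pre-discount subtotal.
function totalTaxBeforeDiscount(subtotalCents: number, discountCents: number, taxBps: number): number {
  const discounted = subtotalCents - discountCents;
  return discounted + Math.round((subtotalCents * taxBps) / 10000);
}

// $100.00 subtotal, $20.00 discount, 10% tax.
const documentedTotal = totalTaxAfterDiscount(10000, 2000, 1000); // 8000 + 800 = 8800
const regressedTotal = totalTaxBeforeDiscount(10000, 2000, 1000); // 8000 + 1000 = 9000

console.log({ documentedTotal, regressedTotal });
```

With these figures, a test whose expected value was edited from 8800 to 9000 would still pass against the regressed code. Only comparing the test change against the documented behavior exposes the defect, which is why the suite stays green.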
Why the multi-agent design helps
Evidence
The trued-up evaluation runs are left open for inspection: