diff --git a/.flue/PLAN.md b/.flue/PLAN.md deleted file mode 100644 index 4df2e5d1d..000000000 --- a/.flue/PLAN.md +++ /dev/null @@ -1,262 +0,0 @@ -# Investigate bot — design & rewrite plan - -Status: design locked, rewrite not started. Supersedes the original PR #1090 scaffolding. - -## Goal - -A bot a maintainer invokes (by label) to investigate an EmDash issue end to end. The bot reproduces the bug, optionally writes and verifies a fix, posts evidence including screenshots, and asks the reporter to confirm before any PR enters our queue. - -Modelled on Astro's `.flue/agents/issue-triage.ts` flow, with three intentional differences: - -- **Maintainer-initiated, not auto-fired** — we trigger via a `bot:repro` label, not on every `issues.opened`. -- **No draft PRs in the queue** — the bot pushes branches and asks for verification; PRs are only opened after the reporter confirms. -- **Inline screenshots via an orphan artifact branch** — no Cloudflare token, no external bucket. - -## Why this design - -Calibrated against a 100-issue appraisal of our recent queue: - -- ~38% AUTO candidates (fix tractable, testable on existing harness) -- ~24% ASSIST candidates (reproduce + failing test is high value, fix is risky) -- ~38% HUMAN candidates (features, cross-isolate architecture, ops) - -The bot's value is concentrated in two activities: - -1. **Reproduction.** Even when the bot can't fix a bug, the failing test it leaves behind is a permanent regression check we'd otherwise skip writing. -2. **Reporter-verified fixes on tractable bugs.** The reporter, not the maintainer, is the only person who can reliably verify that a fix solves the actual problem they reported. - -Browser access (`agent-browser` against `pnpm dev`) shifts ~6 admin-UI issues from HUMAN to ASSIST/AUTO. Worth the complexity. - -## Trigger and state - -One label triggers the bot. The bot then manages its own label as the investigation progresses. Labels are mutually exclusive on a single issue. 
- -| Label | Set by | Meaning | -| -------------------------- | ---------- | ---------------------------------------------------- | -| `bot:repro` | Maintainer | Investigation requested | -| `triage/reproducing` | Bot | Investigation in progress | -| `triage/reproduced` | Bot | Reproduced; no fix attempted (low confidence) | -| `triage/awaiting-reporter` | Bot | Reproduced + fix attempted; reporter asked to verify | -| `triage/verified` | Bot | Reporter confirmed; PR opened | -| `triage/not-reproduced` | Bot | Could not observe the reported behaviour | -| `triage/skipped` | Bot | Declined (needs user data, host-specific, etc.) | -| `triage/failed` | Bot | Gave up after retries | - -A GitHub Project board mirrors these as columns via saved label queries. One-time UI setup; the bot does not touch the board directly. - -## Workflows - -Three GitHub Actions workflows. - -### 1. `investigate.yml` - -Triggered by `issues.labeled` with `bot:repro`. Runs the Flue `investigate` workflow inside a GH Actions runner. - -Steps: - -1. Mint App installation token (scoped: `issues: write, contents: write, pull-requests: write` on this repo only). -2. Workflow YAML transitions label: `bot:repro` → `triage/reproducing`. -3. Checkout, setup pnpm + node, install, build. -4. `pnpm exec flue run investigate` with payload `{ issueNumber, issueTitle, issueBody, owner, repo, retryContext? }`. -5. Workflow YAML reads structured output JSON; pushes branches; posts comment; transitions label to terminal state. - -Bot's stages inside the workflow (each a separate skill, structured output between): - -1. **Reproduce** — sub-skill chosen by classifier output (`repro-api` / `repro-admin` / `repro-public`). Returns `{ reproduced, approach, notes, screenshots }`. -2. **Diagnose** — read the code paths that explain the reproduction. Returns `{ rootCause, confidence: 'high' | 'medium' | 'low' }`. -3. **Verify** — is this actually a bug or intended behaviour? 
Returns `{ verdict: 'bug' | 'intended-behavior' | 'unclear' }`. If `intended-behavior`, the pipeline short-circuits. -4. **Fix** — conditional on `verify.verdict === 'bug'` AND `diagnose.confidence === 'high'`. Writes the fix, runs the failing test, confirms it now passes. Returns `{ fixed, commitMessage, filesChanged }`. - -Terminal label after a run: - -- Reproduce skipped → `triage/skipped` -- Reproduce returned `reproduced: false` → `triage/not-reproduced` -- Verify returned `intended-behavior` → `triage/reproduced` (with explanation in comment) -- Fix attempted and `fixed: false` → `triage/reproduced` -- Fix attempted and `fixed: true` → `triage/awaiting-reporter` - -### 2. `reporter-reply.yml` - -Triggered by `issue_comment.created`. Filters: issue carries `triage/awaiting-reporter`, comment author is the issue author. - -Steps: - -1. Mint App token. -2. Classify the reply with a cheap kimi call: `positive | negative | unclear`. -3. **positive** — open PR from `bot/fix-`. Apply `triage/verified`. Remove `triage/awaiting-reporter`. Comment on the new PR with the reporter quote. -4. **negative** — increment retry counter (stored in a hidden HTML comment on the issue, or in a per-issue gist; TBD during build). If < 3 retries, re-invoke `investigate.yml` via `workflow_dispatch` with `retryContext: `. If ≥ 3, apply `triage/failed` and ping the maintainer who originally added `bot:repro`. -5. **unclear** — post a one-sentence clarifying question. No state change. - -### 3. `bot-cleanup.yml` - -Two triggers: - -- `issues.closed` — delete `bot/fix-` if no PR was ever opened from it; delete `bot/artifacts-` unconditionally. -- Daily cron — list all `bot/artifacts-*` branches; for each whose newest commit is older than 90 days, delete it. Catches stale artifacts on issues that stay open forever. - -## Token model - -Two distinct tokens, mirroring Astro's two-token split. 
- -| Token | Scope | Used by | Holds what | -| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- | -| Sandbox token | Default `secrets.GITHUB_TOKEN`; job permissions: `contents: read, issues: read` | The agent's `local()` sandbox, exposed as `GH_TOKEN` to `bash` | Inside the agent's shell only | -| Orchestrator token | App installation token via `actions/create-github-app-token`, scoped to `emdash` repo with `issues: write, contents: write, pull-requests: write` | The workflow YAML and any TS orchestrator code that does writes (label changes, comments, branch pushes, PR creation) | Workflow `process.env`; never passed to `local({ env })` | - -A jailbroken agent's bash gets only the sandbox token — can read issues, can clone, can run `gh issue view`. Cannot comment, label, push branches, or open PRs. The orchestrator token never crosses into the sandbox. - -## Branches - -Two branches per investigation, both managed by the orchestrator: - -- **`bot/fix-`** — code changes only. This becomes a PR if the reporter verifies. Lives until either the PR merges or the issue closes. -- **`bot/artifacts-`** — orphan branch with screenshots only. Never merged. Referenced from comment markdown as `https://raw.githubusercontent.com/emdash-cms/emdash/bot/artifacts-/.png`. Deleted when the issue closes. - -The artifact branch is created with `git checkout --orphan` so it shares no history with `main`; its only commits are screenshot uploads. - -## Skills - -Seven markdown files. Bundled as Flue 0.8 imported skills, so they ship with the build and work identically in CI (Node) and any future deploy. 
- -| Skill | Purpose | -| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `investigate/SKILL.md` | Parent skill. Reads the issue, classifies type, dispatches to the right sub-skill, orchestrates the 4 stages. | -| `repro-api.md` | Reproduce API/CLI/MCP/migration bugs. No browser. Uses `pnpm test`, direct API hits via `gh` or `curl localhost:4321`, fixture setup via CLI. | -| `repro-admin.md` | Reproduce admin UI bugs. Starts `pnpm dev` via `bgproc`, uses dev-bypass for auth, drives `agent-browser`, captures screenshots, observes console + network. | -| `repro-public.md` | Reproduce public-page rendering bugs. Seeds content via CLI, then `agent-browser open http://localhost:4321/`. | -| `diagnose.md` | Read the code paths that explain the reproduction. Output `rootCause` (file:line plus prose) + `confidence`. | -| `verify.md` | Decide whether the reproduced behaviour is actually a bug. Compare against documented intent. | -| `fix.md` | Write the fix. Run the failing test. Confirm it now passes. Commit to `bot/fix-`. | - -The parent skill includes the dispatch logic: classify the issue into `kind: bug/enhancement/documentation/question` and `area: api/admin/public/migration/build/other`, then read the appropriate sub-skill. - -## Classifier - -A shared `createAgent(...)` factory in `lib/classifier.ts` used by: - -- The parent `investigate` skill, to decide which `repro-*` sub-skill to load. -- The `reporter-reply.yml` workflow, to classify reporter replies as positive/negative/unclear. -- The local prototype runner, to iterate on the classification prompt. - -Model: `cloudflare/@cf/moonshotai/kimi-k2.6` routed through our AI Gateway. Cheap and consistent for a structured classification task. - -## Screenshots - -The bot saves screenshots locally during reproduction (`./.bot-artifacts/.png`). 
After the run completes, the workflow YAML: - -1. Checks out the orphan `bot/artifacts-` branch (creating if absent). -2. Copies `.bot-artifacts/*` over the working tree. -3. Commits, force-pushes with the App token. -4. The agent's structured output already includes a `screenshots: [{ filename, description }]` array; the orchestrator interpolates `![desc](raw URL)` into the final comment body. - -This means the agent doesn't need to know GitHub's URL format. It just writes files and describes them. The orchestrator does the URL construction. - -## File layout - -``` -.flue/ -├── lib/ -│ ├── classifier.ts # createAgent factory + classifyReply helper -│ └── github.ts # Octokit wrappers (comment, label, branch push, PR open) -├── skills/ -│ ├── investigate/SKILL.md # Parent -│ ├── repro-api.md -│ ├── repro-admin.md -│ ├── repro-public.md -│ ├── diagnose.md -│ ├── verify.md -│ └── fix.md -├── workflows/ -│ └── investigate.ts # Single workflow with 4 stages -├── scripts/ -│ └── run-local.ts # Local prototype runner; spawns `flue run investigate` with a fixture payload -├── fixtures/ # 5 real issues (#1021, #1042, #1046, #1049, #1080) -├── package.json # Flue 0.8 -├── pnpm-workspace.yaml -├── pnpm-lock.yaml -├── tsconfig.json -└── README.md - -.github/workflows/ -├── investigate.yml # bot:repro → investigate workflow -├── reporter-reply.yml # author replies → classify + act -└── bot-cleanup.yml # issues.closed + daily cron - -scripts/ # repo-level (new) -└── setup-bot-labels.mjs # One-shot: idempotent gh label create for the 8 bot:* labels -``` - -Removed from the current PR #1090 tree: - -- `.flue/agents/triage-label.ts`, `.flue/agents/triage-issue.ts`, `.flue/agents/repro-issue.ts` -- `.flue/app.ts` -- `.flue/wrangler.jsonc` -- `.flue/lib/verify-signature.ts` (no webhook → no HMAC) -- `.github/workflows/auto-repro.yml` (replaced) -- `skills/reproduce/SKILL.md` at the repo root (replaced by `.flue/skills/repro-*.md`) - -## Local prototype runner - 
-`scripts/run-local.ts` keeps the shape it has now. Spawns `flue run investigate` with a fixture payload and prints the structured output. No GitHub writes (no token in env), no branches pushed; the orchestrator's role is simulated by just dumping the result. - -Required env, same as today: - -``` -CLOUDFLARE_ACCOUNT_ID -CLOUDFLARE_GATEWAY_ID -CLOUDFLARE_API_TOKEN -``` - -## Build order - -Three phases. Each phase committed separately so the diff is reviewable. - -### Phase 1 — tear down + restructure (~15 min) - -- Delete `agents/`, `app.ts`, `wrangler.jsonc`, `lib/verify-signature.ts`, `alchemy.run.ts` (already gone), the existing `auto-repro.yml`. -- Move `skills/reproduce/SKILL.md` from repo root into `.flue/skills/`. -- Create empty `workflows/`, expanded `skills/`. -- Bump `package.json` to `@flue/runtime@0.8.0` and `@flue/cli@0.8.0`. -- Update `tsconfig.json` includes. - -Commit: `refactor(triage): tear down phase 1 scaffolding for full pipeline rewrite` - -### Phase 2 — build the Flue workflow + skills (~90 min) - -- Write `workflows/investigate.ts` with the 4-stage pipeline. Stage-to-stage handoff via structured valibot schemas. -- Write `lib/classifier.ts` with the shared classifier factory + `classifyReply` helper. -- Write the 7 skill markdown files. -- Update `scripts/run-local.ts` to invoke `investigate` against fixtures. -- Verify `pnpm typecheck` + `flue build --target node` + `flue build --target cloudflare` all clean. - -Commit: `feat(triage): four-stage investigate workflow with classifier-dispatched sub-skills` - -### Phase 3 — wire GH Actions (~60 min) - -- Write `investigate.yml` with App-token minting, sandbox-token isolation, label state transitions, branch pushes. -- Write `reporter-reply.yml` with the kimi classification step and retry logic. -- Write `bot-cleanup.yml` with the two triggers. -- Write `scripts/setup-bot-labels.mjs` (idempotent label creation). -- Document the one-time GitHub Project board setup in `README.md`. 
- -Commit: `feat(ci): investigate workflows with reporter-reply loop and artifact cleanup` - -### Phase 4 — README, PR description (~30 min) - -- Rewrite `.flue/README.md` for the new architecture -- Rewrite the PR #1090 description -- Add a section explaining the one-time setup (App, labels, Project board) - -Commit: `docs(triage): rewrite README and PR description for the investigate flow` - -Total estimate: ~3.5 hours focused work. - -## Open questions before starting - -1. **Where does the retry counter live?** Options: hidden HTML comment on the issue (works, ugly), a per-issue gist (one extra API call), a custom Project board field. Will decide during Phase 3 — leaning toward HTML comment for simplicity, can upgrade later. - -2. **Should `verify.yml` (the PR-side check) ship in this rewrite?** I left it out of the plan. Could be a small follow-up PR — same workflow, different ingress (`bot:verify` label on a PR), checks out the PR's head, runs the reproduce stage against the PR's branch. Adds value but is independent. - -3. **Naming for the artifact branch.** `bot/artifacts-` is what I've used. Alternatives: `bot/screenshots-` (more specific), `_bot-artifacts/` (underscore prefix makes it sort separately). Slight preference for `bot/artifacts-` but happy to change. - -4. **First test issue.** Once the bot is live, we need to add `bot:repro` to a real issue and watch what happens. Candidates from the AUTO bucket appraisal: #994 (mirror 6 endpoints, very predictable), #1188 (CLI envelope mismatch, well diagnosed), or #1062 (one-line template fix). I'd start with #1062 — lowest blast radius, fastest feedback. diff --git a/.flue/README.md b/.flue/README.md index 371b056eb..a27514870 100644 --- a/.flue/README.md +++ b/.flue/README.md @@ -2,7 +2,7 @@ Experimental Flue-powered investigation bot for `emdash-cms/emdash` issues. Runs as a GitHub Actions workflow when a maintainer applies the `bot:repro` label. Not deployed as a Cloudflare Worker. 
-For the design rationale, see [PLAN.md](./PLAN.md) and the [PR description](https://github.com/emdash-cms/emdash/pull/1090). Astro's analogous setup (`.flue/agents/issue-triage.ts` in `withastro/astro`) is the closest reference. +For the design rationale, see the [PR description](https://github.com/emdash-cms/emdash/pull/1090). Astro's analogous setup (`.flue/agents/issue-triage.ts` in `withastro/astro`) is the closest reference. ## What it does @@ -13,24 +13,25 @@ When a maintainer adds `bot:repro` to an issue: - `repro-api` — `pnpm test`, CLI commands, direct API hits, no browser - `repro-admin` — `agent-browser` against `pnpm dev` with the dev-bypass auth shortcut - `repro-public` — `agent-browser` against the rendered public site -3. **Diagnose** — read the source paths that explain the symptom, rate confidence honestly. +3. **Diagnose** — read the source paths that explain the symptom, rate confidence in the root cause, choose a fix approach (`mechanical` / `clear-best-option` / `needs-design-decision`), and write a concrete proposed fix. 4. **Verify** — decide whether the behaviour is a bug or intended-by-design. Gates the fix stage. -5. **Fix** — conditional on `verdict=bug` AND `confidence=high`. Writes the change, runs the reproduce test, runs the broader package tests, typecheck, lint, format. Stages but does not commit. +5. **Fix** — conditional on `verdict=bug`, `confidence!=low`, and `fixApproach!=needs-design-decision`. Runs on a cheaper model (kimi-k2.6) in its own session — diagnose already produced the plan, so this stage is guided implementation. Writes the change, runs the reproduce test, the broader package tests, typecheck, lint, format. Stages but does not commit. The orchestrator (`.github/workflows/investigate.yml`) reads the structured JSON output and performs all GitHub writes — labels, comments, branch pushes, PR creation. The agent itself has no write access to GitHub. 
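As a rough sketch of the fix gate described in step 5, assuming the verify and diagnose results expose `verdict`, `confidence`, and `fixApproach` as named above. The `shouldRunFix` helper and the two input interfaces are hypothetical names for illustration, not functions in this repo; the real workflow may structure the check differently.

```typescript
// Hypothetical helper: field names mirror the stage outputs described
// above; the function itself is an illustration, not the repo's code.
type Verdict = "bug" | "intended-behavior" | "unclear";
type Confidence = "high" | "medium" | "low";
type FixApproach = "mechanical" | "clear-best-option" | "needs-design-decision";

interface VerifyGateInput {
  verdict: Verdict;
}

interface DiagnoseGateInput {
  confidence: Confidence;
  fixApproach: FixApproach;
}

// The fix stage runs only when all three conditions hold:
// a confirmed bug, a cause pinned at medium or high confidence,
// and a fix that does not need a maintainer's design decision.
function shouldRunFix(verify: VerifyGateInput, diagnose: DiagnoseGateInput): boolean {
  return (
    verify.verdict === "bug" &&
    diagnose.confidence !== "low" &&
    diagnose.fixApproach !== "needs-design-decision"
  );
}
```

Under this sketch, a `medium`-confidence diagnosis with a `clear-best-option` fix proceeds to the fix stage, while a `high`-confidence diagnosis that `needs-design-decision` stops at a comment-only outcome.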
## Trigger and label state -| Label | Set by | Meaning | -| -------------------------- | ---------- | ------------------------------------------------ | -| `bot:repro` | Maintainer | Investigation requested | -| `triage/reproducing` | Bot | Investigation in progress | -| `triage/reproduced` | Bot | Reproduced; no fix attempted (or fix abandoned) | -| `triage/awaiting-reporter` | Bot | Fix pushed; reporter asked to verify | -| `triage/verified` | Bot | Reporter confirmed; PR opened | -| `triage/not-reproduced` | Bot | Could not observe the reported behaviour | -| `triage/skipped` | Bot | Declined (non-bug, requires external data, etc.) | -| `triage/failed` | Bot | Gave up after retries | +| Label | Set by | Meaning | +| -------------------------- | ---------- | ------------------------------------------------------------ | +| `bot:repro` | Maintainer | Investigation requested | +| `triage/reproducing` | Bot | Investigation in progress | +| `triage/reproduced` | Bot | Confirmed bug; needs a maintainer (no fix, or fix abandoned) | +| `triage/by-design` | Bot | Reproduced, but the behaviour appears intentional | +| `triage/awaiting-reporter` | Bot | Fix pushed; reporter asked to verify | +| `triage/verified` | Bot | Reporter confirmed; PR opened | +| `triage/not-reproduced` | Bot | Could not observe the reported behaviour | +| `triage/skipped` | Bot | Declined (non-bug, requires external data, etc.) | +| `triage/failed` | Bot | Gave up after retries | The bot owns every label except `bot:repro`. Maintainers don't manage state directly — they trigger by adding `bot:repro` and re-trigger by removing/re-adding it. diff --git a/.flue/skills/_INVESTIGATE.md b/.flue/skills/_INVESTIGATE.md index 09042014e..d2023f5a1 100644 --- a/.flue/skills/_INVESTIGATE.md +++ b/.flue/skills/_INVESTIGATE.md @@ -53,7 +53,7 @@ If the reproduce stage returns `skipped: true`, do not run diagnose or fix. Run ### 3. Diagnose -Follow `../diagnose.md`. Feed it the reproduce notes. 
It returns a root cause (file plus approximate line plus prose), a confidence rating, and hypothesis notes if confidence is lower than `high`. +Follow `../diagnose.md`. Feed it the reproduce notes. It returns a root cause (file plus approximate line plus prose), a confidence rating in that cause, a fix approach (`mechanical`, `clear-best-option`, or `needs-design-decision`), a concrete proposed fix, and hypothesis notes covering alternative causes. Confidence rates the _cause_; fix approach rates the _fix_ -- the two are independent, so a confidently-located bug whose fix is one clear backwards-compatible change is `high` + `clear-best-option`, not `medium`. If the reproduce stage failed to reproduce (`reproduced: false`, not skipped), still run diagnose -- often the issue text alone is enough to identify the code path, and the bot's comment is more useful with a guess than without one. Diagnose should lower its own confidence accordingly. @@ -63,14 +63,15 @@ Follow `../verify.md`. It looks at the diagnosed code, the surrounding documenta ### 5. Fix (conditional) -Only run `../fix.md` when **both** of the following hold: +Only run `../fix.md` when **all** of the following hold: - `verify.verdict === 'bug'` -- `diagnose.confidence === 'high'` +- `diagnose.confidence !== 'low'` (the cause is pinned with at least medium confidence) +- `diagnose.fixApproach !== 'needs-design-decision'` (the fix is `mechanical` or `clear-best-option`) -Any other combination: skip fix. The bot will post the diagnosis and verify reasoning as a comment, and a human takes it from there. Attempting a fix at medium or low confidence wastes runner minutes and produces noisy diffs that have to be thrown away. +Any other combination: skip fix. The bot posts the diagnosis (including the proposed fix or, for a design decision, the options) and verify reasoning as a comment, and a human takes it from there. 
The gate is deliberately broader than the old `confidence === 'high'` rule, which conflated "is the cause certain?" with "is the fix obvious?" and starved the fix stage of real, fixable bugs. The output is not a merge -- it is a candidate branch the reporter is asked to verify and a maintainer reviews -- so a clear, test-backed fix is worth attempting even when it is more than a one-line change. -When you do invoke fix, carry its result forward. Fix returns whether the change actually built and tested clean, a conventional-commit-style message, the list of files changed, and notes. The orchestrator is responsible for committing and pushing -- you do not. +The fix stage runs on a cheaper model than the reasoning stages: diagnose has already produced a concrete plan, so fix is guided implementation rather than open-ended investigation. Carry its result forward. Fix returns whether the change actually built and tested clean, a conventional-commit-style message, the list of files changed, and notes. The orchestrator is responsible for committing and pushing -- you do not. ## Output diff --git a/.flue/skills/diagnose/SKILL.md b/.flue/skills/diagnose/SKILL.md index 719114859..412dd6219 100644 --- a/.flue/skills/diagnose/SKILL.md +++ b/.flue/skills/diagnose/SKILL.md @@ -35,19 +35,27 @@ You read code. You do not modify it. No edits, no test runs, no demo boots. The - Lingui `t` called at module scope. - Physical Tailwind class (`ml-*`, `text-left`) where a logical class belongs. 5. **Pin the location.** Identify the file and the smallest range of lines that contain the bug. A single line is ideal; a function-sized range is acceptable when the bug is structural. If you cannot get below file-level, you do not yet have a diagnosis -- search more. -6. **Rate confidence honestly.** - - **High** -- the root cause is mechanical and obvious. There is one line or a tightly-scoped block that, when changed in a specific way, would fix the bug without ambiguity. 
A junior engineer pointed at this code would arrive at the same fix. - - **Medium** -- you have identified the right code, but the correct fix involves design choices (which behaviour is the right one, whether to add a new parameter, whether to change the contract). A maintainer needs to decide before code is written. - - **Low** -- there are multiple plausible causes and you cannot rule them out without instrumentation or further testing. Or the candidate code is the right area but no specific bug is visible in it. - Rate down, not up. The fix stage only runs at `high`; over-rating produces wasted runs and rejected diffs. -7. **Write hypothesis notes when confidence is below high.** What else might be going on? What would you test to find out? This is the most valuable part of the comment for a maintainer reading a `medium` or `low` diagnosis. +6. **Rate your confidence in the root cause.** This axis is only about how sure you are that you have found the code responsible -- _not_ about how easy the fix is. Keep the two separate; the next step rates the fix. + - **High** -- you traced the symptom to a specific file and line range and can explain the mechanism end to end. Another engineer reading your diagnosis would agree this is the cause. + - **Medium** -- you have the right area and a strong candidate, but you could not fully confirm the mechanism (reproduce was skipped or failed, or there is a second plausible cause you cannot rule out by reading alone). + - **Low** -- multiple plausible causes you cannot distinguish without instrumentation, or the candidate code is the right area but no specific defect is visible in it. + Rate honestly in both directions. The fix stage does not run at `low`, but it _does_ run at `medium` when the fix is clear, so do not reflexively rate down -- a confidently-located cause is `high` even when the fix involves choosing between options. That choice is the next field's job, not this one's. +7. 
**Choose a fix approach.** This is independent of confidence. Judge how clear the _fix_ is, given the cause: + - **mechanical** -- there is one obviously-correct change: a single line or tightly-scoped block, no judgement calls. (A missing `await`, a wrong comparison operator, a missing `locale` filter.) + - **clear-best-option** -- the fix is bigger than a one-liner, or several shapes exist, but one is clearly the right call: it is backwards-compatible, matches patterns already in the codebase, and the reproduce test can confirm it. Name that option and say why it beats the alternatives. (Example: issue #1178 hard-codes `c.title` in a SELECT; probing the column list and selecting `title` only when it exists is backwards-compatible and matches the bug's shape, whereas every alternative either breaks the documented API or is a larger redesign. The sibling code in the same file is often direct evidence of intended behaviour -- if one branch already does the right thing, mirroring it is `clear-best-option`, not a design decision.) + - **needs-design-decision** -- choosing correctly requires a judgement only a maintainer should make: a new public API or option, a shared component that does not exist yet, a behavioural-contract change, or a security / performance tradeoff. Do not guess; lay out the options. + The fix stage runs for `mechanical` and `clear-best-option` and defers `needs-design-decision` to a human. Do not retreat to `needs-design-decision` just because more than one fix is conceivable -- reserve it for when the _right_ choice genuinely belongs to a maintainer. +8. **Write the proposed fix, always.** For `mechanical` / `clear-best-option`: describe the specific change -- which file, what to add/remove/change, and how the reproduce test proves it -- in enough detail that the fix stage can implement it directly without re-deriving your reasoning. (A cheaper model implements it; the more concrete your plan, the better the result.) 
For `needs-design-decision`: lay out the viable options and the tradeoff that distinguishes them, and name your recommendation if you have one. This becomes the maintainer's starting point. +9. **Write hypothesis notes for alternative _causes_.** Distinct from the proposed fix (which is about the remedy): what _other_ root causes did you consider, and how did you rule them in or out? Empty only when the cause is genuinely unambiguous. This is the most valuable part of the comment for a maintainer reading a `medium` or `low` diagnosis. ## Output Return: - A root cause: the file path with approximate line number (e.g. `packages/core/src/api/handlers/menus.ts:142`), followed by prose explaining what is wrong and why it produces the reported symptom. -- A confidence rating: `high`, `medium`, or `low`. -- Hypothesis notes: empty if confidence is `high`; otherwise a short paragraph listing the alternative causes you considered and what would distinguish them. +- A confidence rating in the root cause: `high`, `medium`, or `low`. +- A fix approach: `mechanical`, `clear-best-option`, or `needs-design-decision`. +- A proposed fix: the concrete change to make (`mechanical` / `clear-best-option`) or the options a maintainer must choose between (`needs-design-decision`). Never empty. +- Hypothesis notes: the alternative _causes_ you considered and what distinguishes them; empty only when the cause is unambiguous. Be specific. "Probably in the menu code somewhere" is not a diagnosis. "`resolveContentUrl` in `packages/core/src/menus/index.ts:87` issues three queries per item and the third is the missing-locale fallback path -- on a primary-locale request it is dead code, but it still runs" is. 
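The five output fields above can be pictured as a single typed record. The sketch below is illustrative only: the field names and length bounds follow the result schema in `.flue/workflows/investigate.ts`, but the `isWellFormed` helper is hypothetical, and the `fixApproach` and `proposedFix` values in the example are invented to show the shape, not taken from a real diagnosis.

```typescript
type Confidence = "high" | "medium" | "low";
type FixApproach = "mechanical" | "clear-best-option" | "needs-design-decision";

interface DiagnoseResult {
  rootCause: string; // file:line plus prose
  confidence: Confidence; // certainty in the cause
  fixApproach: FixApproach; // clarity of the fix, an independent axis
  proposedFix: string; // never empty
  hypothesisNotes: string; // may be empty when the cause is unambiguous
}

// Illustrative value only. The root cause reuses the example sentence
// from the skill text; the fix approach and proposed fix are assumptions.
const example: DiagnoseResult = {
  rootCause:
    "packages/core/src/menus/index.ts:87 -- `resolveContentUrl` issues three queries per item; " +
    "the third is the missing-locale fallback, which still runs on primary-locale requests.",
  confidence: "high",
  fixApproach: "mechanical",
  proposedFix:
    "Hypothetical: skip the fallback query when the request is already for the primary locale.",
  hypothesisNotes: "",
};

// Hypothetical hand-written check mirroring the schema's string bounds
// (minimum 10 characters for rootCause and proposedFix, maximum 2000 for all).
function isWellFormed(r: DiagnoseResult): boolean {
  const inRange = (s: string, min: number): boolean => s.length >= min && s.length <= 2000;
  return inRange(r.rootCause, 10) && inRange(r.proposedFix, 10) && inRange(r.hypothesisNotes, 0);
}
```

Keeping `confidence` and `fixApproach` as separate fields is what lets a confidently located bug with a larger but clearly correct fix still reach the fix stage.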
diff --git a/.flue/skills/fix/SKILL.md b/.flue/skills/fix/SKILL.md index ee6e31624..4df9e2a50 100644 --- a/.flue/skills/fix/SKILL.md +++ b/.flue/skills/fix/SKILL.md @@ -5,7 +5,11 @@ description: Write the fix when verify says bug and diagnose says high confidenc # Fix -You are only here because verify returned `bug` and diagnose returned `high` confidence. The orchestrator decided this is worth attempting an automated fix. Your job is to write that fix, prove it works, leave the working tree in a state the orchestrator can commit and push, and report what you did. +You are here because verify returned `bug`, diagnose pinned the cause with at least `medium` confidence, and diagnose rated the fix `mechanical` or `clear-best-option`. Diagnose handed you a **proposed fix** -- a concrete plan naming the file and the change. Your job is to implement that plan, prove it works, leave the working tree in a state the orchestrator can commit, and report what you did. The hard reasoning is already done; do not re-litigate the diagnosis unless reading the code convinces you it is wrong (in which case abandon -- see below). + +Read diagnose's proposed fix first and treat it as your spec. Implement that change. If, once you are in the code, the plan turns out to be wrong or incomplete, do not improvise a different large change -- abandon with `fixed: false` and say why, so a human can re-diagnose. + +**What your output is, and is not.** You are not merging anything, and you are not even opening a PR. The orchestrator pushes your staged change to a `bot/fix-` branch and asks the original reporter to install a preview build and confirm it resolves their issue. A maintainer reviews before anything lands on `main`. So the bar is "a correct, conventions-respecting change that makes the reproduce test pass" -- not "a perfect, unimprovable patch." A clear, test-backed fix is worth shipping for verification even when it is more than a one-liner. 
Equally: do not gold-plate, do not expand scope, do not refactor beyond the diagnosed bug. You can edit source. You can run tests, lint, typecheck, and format. You cannot commit, push, open a PR, or touch any GitHub state. @@ -23,7 +27,7 @@ You can edit source. You can run tests, lint, typecheck, and format. You cannot 1. **Re-read the diagnose root cause.** That is your target. The fix should land in the file and approximate line diagnose named. If your work drifts to a different file, stop and reconsider -- diagnose may have been wrong, in which case the right answer is to abandon, not to wander. 2. **Establish a regression test where one is feasible.** Reproduce confirmed the bug through agent-browser, not a test, so there is usually no failing test on disk yet. If the bug is unit- or integration-testable (a handler, a query, a pure function, an API route), write a `vitest` test now that fails for the reported reason -- run it with `pnpm --filter test ` and confirm it fails before you touch the fix. A bug with a testable surface and no regression test is not fixed. If the bug only manifests in the browser (admin UI interaction, rendered output), do not write a browser test -- the bot cannot run one reliably here; instead verify the fix through agent-browser and describe the manual verification in your notes so the maintainer can add a durable test when landing it. -3. **Write the smallest fix that resolves the bug.** Follow EmDash's conventions: +3. **Implement diagnose's proposed fix -- the smallest change that fully resolves the bug.** Start from the plan diagnose gave you; the change should land in the file and approximate line it named. Follow EmDash's conventions: - Internal imports end with `.js`. Type-only imports use `import type`. - Routes that change state start with `export const prerender = false;`. - Never interpolate values into SQL. 
Use Kysely's `sql` tagged template; use `sql.ref()` for identifiers; validate dynamic identifiers with `validateIdentifier()` before any `sql.raw()`. diff --git a/.flue/skills/verify/SKILL.md b/.flue/skills/verify/SKILL.md index 225944791..abcd7e32f 100644 --- a/.flue/skills/verify/SKILL.md +++ b/.flue/skills/verify/SKILL.md @@ -45,4 +45,4 @@ Return: - A verdict: `bug`, `intended-behavior`, or `unclear`. - Reasoning: the prose that supports the verdict, with paths to the comments, docs, or tests you relied on. -The orchestrator uses your verdict as a gate. `bug` plus a `high`-confidence diagnose triggers the fix stage. Anything else stops here and produces a comment-only outcome. +The orchestrator uses your verdict as a gate. `bug` triggers the fix stage when diagnose also pinned the cause (confidence not `low`) and rated the fix `mechanical` or `clear-best-option`. A `bug` whose fix `needs-design-decision`, an `unclear` verdict, or `intended-behavior` all stop here and produce a comment-only outcome for a maintainer. diff --git a/.flue/workflows/investigate.ts b/.flue/workflows/investigate.ts index 4dfecf127..960576147 100644 --- a/.flue/workflows/investigate.ts +++ b/.flue/workflows/investigate.ts @@ -15,10 +15,12 @@ // and rate confidence. // 4. Verify -- decide whether the diagnosed behaviour is actually a // bug or intended. Gates the fix stage. -// 5. Fix -- only when verify=='bug' AND diagnose.confidence=='high'. -// Writes the change, runs the reproduce test, runs the broader -// package tests, typecheck, lint, format. Stages but does not -// commit -- the YAML orchestrator does that. +// 5. Fix -- runs when verify=='bug', diagnose.confidence!='low', and +// diagnose.fixApproach!='needs-design-decision'. Runs on a cheaper +// model in its own session (the reasoning is already done; it +// implements diagnose's proposedFix). Writes the change, runs the +// reproduce test, the broader package tests, typecheck, lint, +// format. 
Stages but does not commit -- the YAML orchestrator does. // // Every stage uses session.skill() with a valibot result schema. The // orchestrator (the GH Actions workflow) reads the final JSON via jq @@ -95,9 +97,23 @@ const reproduceResultSchema = v.object({ }); type ReproduceResult = v.InferOutput<typeof reproduceResultSchema>; +// `confidence` rates certainty in the *root cause* (have we found the +// code responsible?). `fixApproach` rates clarity of the *fix*, an +// independent axis -- a bug can have an unambiguous cause but a fix +// shape that needs a maintainer's design call, or a clearly-correct +// fix that happens to be larger than one line. The old single-axis +// `high` rating conflated the two and starved the fix stage of real, +// fixable bugs (see issues #1178, #1199). `as const` preserves the +// literal unions under valibot's inference, same reason as verify. const diagnoseResultSchema = v.object({ rootCause: v.pipe(v.string(), v.minLength(10), v.maxLength(2000)), - confidence: v.picklist(["high", "medium", "low"]), + confidence: v.picklist(["high", "medium", "low"] as const), + fixApproach: v.picklist(["mechanical", "clear-best-option", "needs-design-decision"] as const), + // Always populated: the concrete change to make (mechanical / + // clear-best-option) or the options a maintainer must choose + // between (needs-design-decision). Fed into the fix stage as its + // target, and surfaced in the maintainer comment when fix defers. + proposedFix: v.pipe(v.string(), v.minLength(10), v.maxLength(2000)), hypothesisNotes: v.pipe(v.string(), v.maxLength(2000)), }); type DiagnoseResult = v.InferOutput<typeof diagnoseResultSchema>; @@ -174,36 +190,74 @@ const classifierAgent = createAgent(() => ({ ].join("\n"), })); -// Investigator: opus + local() sandbox + the six stage skills registered. -// The sandbox cwd is pinned to GITHUB_WORKSPACE so skill resolution and -// shell commands land in the EmDash checkout, not in .flue/. +// Shared local() sandbox config.
Both the investigator and the fix +// agent run shell commands against the same EmDash checkout, so they +// use identical sandbox settings -- cwd pinned to GITHUB_WORKSPACE (so +// skill resolution and bash land in the checkout, not in .flue/) and a +// read-only GH token. Because both sandboxes point at the same cwd, +// edits the fix agent stages on disk are exactly what the orchestrator +// later commits, even though fix runs in its own session. +function investigateSandbox(cwd: string) { + return local({ + cwd, + env: { + // Read-only token. The agent can clone and read issues; it + // cannot comment, label, or push. The orchestrator owns + // every write. + GH_TOKEN: process.env.AGENT_GH_TOKEN, + CI: "true", + NODE_ENV: "test", + // Used by bgproc when the repro-admin or repro-public skill + // boots `pnpm dev`. Standard Node convention. + NODE_OPTIONS: process.env.NODE_OPTIONS, + }, + }); +} + +// Investigator: opus + local() sandbox. Runs the reasoning-heavy +// stages -- reproduce, diagnose, verify. The fix stage runs on a +// separate, cheaper agent (below), so fix is intentionally NOT in this +// agent's skill set. const investigatorAgent = createAgent(() => { const cwd = process.env.GITHUB_WORKSPACE ?? process.cwd(); return { model: process.env.FLUE_INVESTIGATE_MODEL ?? "cloudflare-ai-gateway/claude-opus-4-7", cwd, - sandbox: local({ - cwd, - env: { - // Read-only token. The agent can clone and read issues; it - // cannot comment, label, or push. The orchestrator owns - // every write. - GH_TOKEN: process.env.AGENT_GH_TOKEN, - CI: "true", - NODE_ENV: "test", - // Used by bgproc when the repro-admin or repro-public skill - // boots `pnpm dev`. Standard Node convention. 
- NODE_OPTIONS: process.env.NODE_OPTIONS, - }, - }), + sandbox: investigateSandbox(cwd), instructions: [ "You are EmDash's investigation bot.", - "You walk a four-stage pipeline (reproduce -> diagnose -> verify -> fix) on one GitHub issue at a time.", + "You walk the reasoning stages (reproduce -> diagnose -> verify) on one GitHub issue at a time.", "You return read-only on GitHub: no comments, no labels, no branch pushes. The orchestrator does all writes after you finish.", "At every stage you obey the skill's hard prohibitions and produce strictly schema-conformant output.", "When you guess, say you guessed; when you skip, say why.", ].join(" "), - skills: [reproApi, reproAdmin, reproPublic, diagnose, verify, fix], + skills: [reproApi, reproAdmin, reproPublic, diagnose, verify], + }; +}); + +// Fix implementer: a cheaper model (Kimi) is enough here because the +// expensive reasoning is already done -- diagnose hands over a concrete +// `proposedFix`, and this stage only runs for `mechanical` / +// `clear-best-option` approaches. Its job is guided implementation: +// write the change, make the reproduce test pass, run lint / typecheck +// / format, and `git add`. It runs in its own session (fresh context, +// fed the diagnosis explicitly via args) with the same local() sandbox +// so it has real pnpm / git / gh on PATH and the EmDash checkout as cwd. +const fixAgent = createAgent(() => { + const cwd = process.env.GITHUB_WORKSPACE ?? process.cwd(); + return { + model: + process.env.FLUE_FIX_MODEL ?? "cloudflare-ai-gateway/workers-ai/@cf/moonshotai/kimi-k2.6", + cwd, + sandbox: investigateSandbox(cwd), + instructions: [ + "You are EmDash's fix implementer.", + "Diagnose has already found the root cause and written a proposed fix; your job is to implement that plan, not to re-investigate from scratch.", + "You return read-only on GitHub: no comments, no labels, no commits, no branch pushes. 
You stage changes with `git add` and stop; the orchestrator commits and pushes.", + "Obey the fix skill's hard prohibitions and produce strictly schema-conformant output.", + "If reading the code convinces you the proposed fix is wrong, abandon with `fixed: false` and explain why in notes rather than forcing a change you don't believe in.", + ].join(" "), + skills: [fix], }; }); @@ -430,11 +484,33 @@ async function runImpl({ }; } - // --- Stage 4: fix (gated on bug verdict + high diagnose confidence) --- - - const shouldFix = verifyOut.verdict === "bug" && diagnoseOut.confidence === "high"; + // --- Stage 4: fix (conditional) --- + // + // Gate on two independent axes, not the old single `confidence === + // "high"`: + // - verify says it's a bug, + // - diagnose pinned the root cause with at least medium confidence + // (a `low` cause is too shaky to write code against), and + // - the fix is `mechanical` or `clear-best-option` -- i.e. there + // is a correct change to make that doesn't require a maintainer's + // design call. + // `needs-design-decision` defers to a human even when the cause is + // certain (e.g. the fix needs a new public API or a component that + // doesn't exist yet). + const shouldFix = + verifyOut.verdict === "bug" && + diagnoseOut.confidence !== "low" && + diagnoseOut.fixApproach !== "needs-design-decision"; if (!shouldFix) { + // Explain precisely why no fix was attempted, since the reason + // now varies (unclear verdict / shaky cause / design decision). + const notAttemptedReason = + verifyOut.verdict !== "bug" + ? "The bot could not conclusively confirm this is a bug (`unclear` verdict), so it did not attempt an automated fix." + : diagnoseOut.confidence === "low" + ? "The root cause is not pinned down with enough confidence to write a fix against it." + : "The fix needs a design decision a maintainer should make, so the bot did not attempt it automatically. 
The proposed options are above."; return { skipped: false, reproduced: true, @@ -445,13 +521,15 @@ async function runImpl({ notes: [ `**Root cause (\`${diagnoseOut.confidence}\` confidence):** ${diagnoseOut.rootCause}`, "", - diagnoseOut.confidence === "high" - ? "" - : `**Hypotheses considered:** ${diagnoseOut.hypothesisNotes}`, + `**Proposed fix:** ${diagnoseOut.proposedFix}`, + "", + diagnoseOut.hypothesisNotes + ? `**Alternative causes considered:** ${diagnoseOut.hypothesisNotes}` + : "", "", `**Verdict:** \`${verifyOut.verdict}\` — ${verifyOut.reasoning}`, "", - "The bot reproduced the bug but did not attempt a fix. The fix stage requires `verdict: bug` AND `confidence: high`.", + notAttemptedReason, ] .filter(Boolean) .join("\n"), @@ -465,7 +543,13 @@ async function runImpl({ }; } - const { data: fixOut } = await investigatorSession.skill(fix, { + // Fix runs on its own (cheaper) agent and a fresh session. It is fed + // the diagnosis -- including the concrete `proposedFix` -- via args, + // and operates on the same on-disk checkout, so its staged edits are + // what the orchestrator commits. + const fixHarness = await init(fixAgent, { name: "fix" }); + const fixSession = await fixHarness.session(); + const { data: fixOut } = await fixSession.skill(fix, { args: { issueContext: issueContext(payload), classification, diff --git a/.github/workflows/investigate.yml b/.github/workflows/investigate.yml index 0b69a8845..02ac199de 100644 --- a/.github/workflows/investigate.yml +++ b/.github/workflows/investigate.yml @@ -132,7 +132,7 @@ jobs: run: | set -euo pipefail # Remove any existing bot:* label; swallow 404s (label may not be present). 
- for L in bot:repro triage/reproducing triage/reproduced triage/awaiting-reporter triage/verified triage/not-reproduced triage/skipped triage/failed; do + for L in bot:repro triage/reproducing triage/reproduced triage/by-design triage/awaiting-reporter triage/verified triage/not-reproduced triage/skipped triage/failed; do gh issue edit "$ISSUE_NUMBER" --repo emdash-cms/emdash --remove-label "$L" >/dev/null 2>&1 || true done gh issue edit "$ISSUE_NUMBER" --repo emdash-cms/emdash --add-label "triage/reproducing" @@ -354,7 +354,12 @@ jobs: run: | set -euo pipefail NOTES="$(jq -r '.notes // ""' /tmp/agent-result.json)" - gh issue edit "$ISSUE_NUMBER" --repo emdash-cms/emdash --remove-label "triage/reproducing" --add-label "triage/reproduced" + # `triage/by-design` (not `triage/reproduced`): the bot + # reproduced the described behavior but believes it is + # intentional. This is a "likely close / convert to discussion" + # signal, the opposite follow-up from a confirmed bug, so it + # gets its own label rather than sharing triage/reproduced. + gh issue edit "$ISSUE_NUMBER" --repo emdash-cms/emdash --remove-label "triage/reproducing" --add-label "triage/by-design" { echo "The investigation bot reproduced the described behavior, but it appears to be intended." echo @@ -443,8 +448,8 @@ jobs: # Configure git identity and a GIT_ASKPASS shim so the app # token is never visible on a process command line. 
- git config --global user.name "emdash-bot[bot]" - git config --global user.email "bot@emdashcms.com" + git config --global user.name "emdashbot[bot]" + git config --global user.email "emdashbot[bot]@users.noreply.github.com" export GIT_ASKPASS="$RUNNER_TEMP/git-askpass.sh" printf '#!/bin/sh\necho "%s"\n' "$APP_TOKEN" > "$GIT_ASKPASS" chmod +x "$GIT_ASKPASS" @@ -480,8 +485,8 @@ jobs: ( cd "$ART_TMP" git init -q -b "$ART_BRANCH" - git config user.name "emdash-bot[bot]" - git config user.email "bot@emdashcms.com" + git config user.name "emdashbot[bot]" + git config user.email "emdashbot[bot]@users.noreply.github.com" git add .bot-artifacts git commit -q -m "screenshots for #${ISSUE_NUMBER}" git remote add origin "$ORIGIN_URL" diff --git a/.github/workflows/preview-releases.yml b/.github/workflows/preview-releases.yml index abcf0c08d..d211dc553 100644 --- a/.github/workflows/preview-releases.yml +++ b/.github/workflows/preview-releases.yml @@ -33,4 +33,13 @@ jobs: cache: pnpm - run: pnpm install --frozen-lockfile - run: pnpm build - - run: pnpm dlx pkg-pr-new publish --pnpm './packages/core' './packages/admin' './packages/auth' './packages/blocks' './packages/cloudflare' './packages/create-emdash' './packages/gutenberg-to-portable-text' './packages/x402' './packages/plugins/ai-moderation' './packages/plugins/atproto' './packages/plugins/audit-log' './packages/plugins/color' './packages/plugins/embeds' './packages/plugins/forms' './packages/plugins/webhook-notifier' + # Publish a preview for every public package via globs rather than a + # hardcoded list. This keeps preview installs self-consistent: when + # `emdash`'s preview references a sibling like @emdash-cms/registry-client + # via `workspace:*`, pkg.pr.new can only rewrite that to a matching + # preview URL if the sibling is published in the same run. Omitting one + # makes the dep fall back to npm's released version, which breaks when the + # source has drifted (e.g. 
a new exports subpath added without a release). + # pkg.pr.new skips `private: true` packages automatically, so the test + # fixtures under packages/plugins/* are excluded without enumerating them. + - run: pnpm exec pkg-pr-new publish --pnpm './packages/*' './packages/plugins/*' diff --git a/.github/workflows/triage-project-sync.yml b/.github/workflows/triage-project-sync.yml index 93b3fde5d..1e7a58893 100644 --- a/.github/workflows/triage-project-sync.yml +++ b/.github/workflows/triage-project-sync.yml @@ -61,6 +61,7 @@ jobs: "bot:repro": "Queued", "triage/reproducing": "Reproducing", "triage/reproduced": "Reproduced", + "triage/by-design": "By design", "triage/awaiting-reporter": "Awaiting reporter", "triage/verified": "Verified", "triage/not-reproduced": "Not reproduced", @@ -72,6 +73,7 @@ jobs: "triage/verified", "triage/awaiting-reporter", "triage/reproduced", + "triage/by-design", "triage/not-reproduced", "triage/failed", "triage/skipped", diff --git a/package.json b/package.json index caf9eddbf..e6822491a 100644 --- a/package.json +++ b/package.json @@ -47,6 +47,7 @@ "oxfmt": "^0.34.0", "oxlint": "^1.66.0", "oxlint-tsgolint": "^0.23.0", + "pkg-pr-new": "^0.0.75", "prettier": "^3.8.1", "prettier-plugin-astro": "^0.14.1", "typescript": "6.0.0-beta" diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 8c60fa566..04065352e 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -309,6 +309,9 @@ importers: oxlint-tsgolint: specifier: ^0.23.0 version: 0.23.0 + pkg-pr-new: + specifier: ^0.0.75 + version: 0.0.75 prettier: specifier: ^3.8.1 version: 3.8.1 @@ -375,7 +378,7 @@ importers: devDependencies: '@cloudflare/vite-plugin': specifier: 'catalog:' - version: 1.36.3(vite@8.0.11(@types/node@24.10.13)(esbuild@0.27.3)(jiti@2.6.1)(yaml@2.9.0))(workerd@1.20260526.1)(wrangler@4.95.0) + version: 1.36.3(vite@8.0.11(@types/node@24.10.13)(esbuild@0.27.3)(jiti@2.6.1)(yaml@2.9.0))(workerd@1.20260507.1)(wrangler@4.95.0) '@cloudflare/vitest-pool-workers': specifier: 'catalog:' 
version: 0.16.3(@vitest/runner@4.1.5)(@vitest/snapshot@4.1.5)(vitest@4.1.5) @@ -9724,6 +9727,10 @@ packages: resolution: {integrity: sha512-wQ0b/W4Fr01qtpHlqSqspcj3EhBvimsdh0KlHhH8HRZnMsEa0ea2fTULOXOS9ccQr3om+GcGRk4e+isrZWV8qQ==} engines: {node: '>=16.20.0'} + pkg-pr-new@0.0.75: + resolution: {integrity: sha512-u9mdErTewKSMsr+ceCt8VcNuNP0ro5AXiPXhUVApuEyqr2Zlvt+DdCFBcm+yGWN8mhOdZJ27meIDbnoZgfzpOw==} + hasBin: true + pkg-types@1.3.1: resolution: {integrity: sha512-/Jm5M4RvtBFVkKWRu2BLUTNP8/M2a+UwuAX+ae4770q1qVGtfjG+WTCupoZixokjmHiry8uI+dlY8KXYV5HVVQ==} @@ -13362,9 +13369,9 @@ snapshots: - utf-8-validate - workerd - '@cloudflare/vite-plugin@1.36.3(vite@8.0.11(@types/node@24.10.13)(esbuild@0.27.3)(jiti@2.6.1)(yaml@2.9.0))(workerd@1.20260526.1)(wrangler@4.95.0)': + '@cloudflare/vite-plugin@1.36.3(vite@8.0.11(@types/node@24.10.13)(esbuild@0.27.3)(jiti@2.6.1)(yaml@2.9.0))(workerd@1.20260507.1)(wrangler@4.95.0)': dependencies: - '@cloudflare/unenv-preset': 2.16.1(unenv@2.0.0-rc.24)(workerd@1.20260526.1) + '@cloudflare/unenv-preset': 2.16.1(unenv@2.0.0-rc.24)(workerd@1.20260507.1) miniflare: 4.20260507.1 unenv: 2.0.0-rc.24 vite: 8.0.11(@types/node@24.10.13)(esbuild@0.27.3)(jiti@2.6.1)(yaml@2.9.0) @@ -16781,7 +16788,7 @@ snapshots: sirv: 3.0.2 tinyglobby: 0.2.16 tinyrainbow: 3.1.0 - vitest: 4.1.5(@opentelemetry/api@1.9.0)(@types/node@24.10.13)(@vitest/browser-playwright@4.1.5)(@vitest/ui@4.1.5)(jsdom@26.1.0)(vite@8.0.11(@types/node@24.10.13)(esbuild@0.27.3)(jiti@2.6.1)(yaml@2.9.0)) + vitest: 4.1.5(@opentelemetry/api@1.9.0)(@types/node@24.10.13)(@vitest/browser-playwright@4.1.5)(@vitest/ui@4.1.5)(jsdom@26.1.0)(vite@8.0.14(@types/node@24.10.13)(esbuild@0.27.3)(jiti@2.6.1)(yaml@2.9.0)) '@vitest/utils@4.1.5': dependencies: @@ -20679,6 +20686,8 @@ snapshots: pkce-challenge@5.0.1: {} + pkg-pr-new@0.0.75: {} + pkg-types@1.3.1: dependencies: confbox: 0.1.8