feat(agents): add PR Walkthrough narrative orientation agent #1947

Open
dfinson wants to merge 11 commits into microsoft:main from dfinson:dfinson/feat-pr-review-narrative-walkthrough

@dfinson dfinson commented Jun 14, 2026

Summary

Adds a PR Walkthrough agent that produces narrative-driven PR orientations. After reading the output, a reviewer understands what changed, why, how the pieces connect, which files carry architectural weight, and where human judgment is required.

This is not a findings tool — it builds the reviewer's mental model so they can review efficiently and notice what matters.

Motivation

As agent-generated code becomes the norm, PRs are growing larger (10–50+ files) and the bottleneck has shifted from writing code to reviewing it. A narrative walkthrough — structured like a tech blog rather than a robotic file list — makes large diffs tractable by establishing a mental model before the reviewer opens the diff.

This agent distills ~2 months of personal experimentation with 'review as narrative' into a generalizable flow that works for PRs of any size.

What's included

| File | Purpose |
| --- | --- |
| `.github/agents/hve-core/pr-walkthrough.agent.md` | The agent definition |
| `collections/hve-core.collection.yml` | Registration in hve-core collection |
| `collections/hve-core-all.collection.yml` | Registration in hve-core-all collection |
| `plugins/` (generated) | Regenerated plugin outputs |

Modes

  • Standalone: invoke directly with a base branch comparison
  • Orchestrated: reads `diff-state.json` when called as a subagent of PR Review
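
As a rough illustration of how the two modes could be told apart, here is a minimal Python sketch. It is not taken from the agent definition: the helper name `resolve_diff_source`, the `origin/main` default, and the assumption that `diff-state.json` sits in the repository root are all hypothetical. The file's schema is not shown in this PR, so the sketch loads it as opaque JSON.

```python
import json
import subprocess
from pathlib import Path


def resolve_diff_source(repo_dir: str, base_ref: str = "origin/main",
                        state_file: str = "diff-state.json") -> dict:
    """Pick orchestrated or standalone mode (hypothetical helper, not the agent's code)."""
    state_path = Path(repo_dir) / state_file
    if state_path.is_file():
        # Orchestrated: a parent agent has already prepared the diff state.
        # The schema is unknown here, so the content is passed through untouched.
        state = json.loads(state_path.read_text(encoding="utf-8"))
        return {"mode": "orchestrated", "state": state}

    # Standalone: compare HEAD against its merge base with the base branch.
    merge_base = subprocess.run(
        ["git", "merge-base", "HEAD", base_ref],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()
    return {"mode": "standalone",
            "diff_command": ["git", "diff", merge_base, "HEAD"]}
```

Diffing against the merge base rather than the tip of the base branch keeps unrelated commits that landed on the base branch out of the walkthrough.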

Key design decisions

  • Follows the idea of the change, not the file list
  • Every claim anchored to quoted code fragments
  • Proportional output (small PRs get brief treatment)
  • Surfaces design forks and implicit bets for human judgment without prescribing answers
  • Mandatory contextual research step with self-verification gate

Testing

Tested on 10 PRs of varying sizes (3 to 1074 lines) across the hve-core, VS Code, and TypeScript repos. See example outputs in issue #1946.

Relates to #1946

@dfinson dfinson requested a review from a team as a code owner June 14, 2026 13:04
@dfinson dfinson requested a review from Copilot June 14, 2026 13:47

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new “PR Walkthrough” agent to the HVE Core ecosystem and wires it into plugin packaging and collection indexes, alongside broad markdown table reformatting in collection docs.

Changes:

  • Added .github/agents/hve-core/pr-walkthrough.agent.md and registered it in hve-core / hve-core-all collections.
  • Added plugin agent pointer files for pr-walkthrough in both hve-core and hve-core-all.
  • Reformatted agent/prompt/instruction/skill tables across multiple collection markdown files (likely to improve rendering/consistency).

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| plugins/hve-core/agents/hve-core/pr-walkthrough.md | Adds plugin-level pointer to the central pr-walkthrough agent definition. |
| plugins/hve-core/README.md | Documents the new pr-walkthrough agent in plugin README tables. |
| plugins/hve-core-all/agents/hve-core/pr-walkthrough.md | Adds plugin-level pointer to the central pr-walkthrough agent definition. |
| plugins/hve-core-all/README.md | Documents the new pr-walkthrough agent in plugin README tables. |
| collections/security.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| collections/project-planning.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| collections/jira.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| collections/installer.collection.md | Reformats auto-generated tables (instructions/skills). |
| collections/hve-core.collection.yml | Registers the new PR Walkthrough agent in the core collection. |
| collections/hve-core.collection.md | Adds pr-walkthrough to the core collection markdown listing + table reformat. |
| collections/hve-core-all.collection.yml | Registers the new PR Walkthrough agent in the “all” collection. |
| collections/hve-core-all.collection.md | Adds pr-walkthrough to the “all” collection markdown listing + table reformat. |
| collections/gitlab.collection.md | Reformats auto-generated tables (instructions/skills). |
| collections/github.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| collections/experimental.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| collections/design-thinking.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| collections/data-science.collection.md | Reformats auto-generated tables (agents/prompts/instructions). |
| collections/coding-standards.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| collections/ado.collection.md | Reformats auto-generated tables (agents/prompts/instructions/skills). |
| .github/agents/hve-core/pr-walkthrough.agent.md | Introduces the PR Walkthrough agent’s full instruction set and workflow. |

Comments suppressed due to low confidence (1)

.github/agents/hve-core/pr-walkthrough.agent.md:1

  • This line contains an em dash character (U+2014) while also stating they are banned. If the repository-style rule is meant to apply to authored markdown artifacts as well (not just the agent’s generated output), this file violates it. Consider replacing the literal em dash character with a textual description (e.g., “em dash”) to avoid introducing the banned glyph into the repo.
---

Comment thread .github/agents/hve-core/pr-walkthrough.agent.md
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
@dfinson dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from abb6356 to 5f942cb Compare June 14, 2026 15:49
codecov-commenter commented Jun 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.81%. Comparing base (a847cfa) to head (1c9edc2).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1947      +/-   ##
==========================================
- Coverage   80.82%   80.81%   -0.01%     
==========================================
  Files         117      117              
  Lines       19095    19095              
==========================================
- Hits        15433    15432       -1     
- Misses       3662     3663       +1     

| Flag | Coverage Δ |
| --- | --- |
| pester | 84.63% <ø> (-0.02%) ⬇️ |
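
The project-level percentages in the report can be checked by hand from the Hits and Lines rows. The sketch below is only an illustration, not Codecov's code. It assumes the percentage is truncated to two decimals rather than rounded to nearest, which is what the reported figures suggest, since 15432 / 19095 is about 80.817% and would round to 80.82. The function name `coverage_percent` is hypothetical.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the numbers in the report.
    """
    # Work in hundredths of a percent with integer division, then scale back.
    return (hits * 10_000 // lines) / 100


base = coverage_percent(15433, 19095)  # main
head = coverage_percent(15432, 19095)  # this PR
print(base, head, round(head - base, 2))  # prints: 80.82 80.81 -0.01
```

A single line moving from hit to missed, out of 19095, is enough to shift the displayed figure by 0.01 under this truncation.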

see 1 file with indirect coverage changes

@dfinson dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch 3 times, most recently from 248d457 to 80a7a1a Compare June 14, 2026 19:32
jkim323 commented Jun 15, 2026

Collaborator

I love how this pr-walkthrough agent focuses on building the reviewer’s mental model. That output feels genuinely valuable.

One suggestion: have you considered modeling it as a subagent of pr-review rather than as a new top-level peer agent? To me, the strongest value here is orientation before inspection: helping the reviewer understand the shape of the diff, triage important files, and surface design forks before pr-review produces findings.

Keeping it under pr-review would also make the product boundary clearer: pr-review remains the user-facing coordinator for review findings and merge readiness, while pr-walkthrough provides the narrative orientation artifact. It could also reuse the existing diff/CI/tracking pipeline instead of duplicating that setup.

If we go this route, I’d suggest marking the subagent with maturity: experimental in the collection manifest since this capability is still new and being validated.

With this said, I would like to invite @agreaves-ms and @WilliamBerryiii for other perspectives regarding this thought! Thank you!

dfinson commented Jun 15, 2026

Author

Thanks for the feedback @jkim323. I tested this and want to share some context on this exact design tension I've been wrestling with.

I ran the walkthrough against 9 merged PRs and fed the output to a model acting as pr-review for focus-zone extraction. All 9 extractions were high-confidence, so the subagent model works mechanically. But there's a philosophical problem. This agent exists in its current form (385 lines of attitude: professional and focused, but still designed as a strong personality) because without the opinionated voice and strong point of view, the model immediately regresses to gluing English prose between diff hunks, using the hunks themselves as narrative scaffolding. IMO that kind of output isn't worth much because it doesn't capture human attention, doesn't abstract by ideas, and doesn't structure around decisions. That style makes more sense if the agent is passing judgment (which PR Review already does), less so if it needs to act as a lens for human attention. The personality itself seems to be what forces architectural thinking rather than line-by-line summarization.

Which means this thing embodies a fundamentally different philosophy than classical AI code review: PR Review scans the diff and formulates its own judgments, while this agent explicitly bans itself from judgment. Its job is to focus the human on what needs their judgment, not to replace it. It's for the person staring down a 45-file diff who needs orientation before they can effectively form their own opinions.

The problem with combining them sequentially is that the walkthrough says "here are the design forks, you decide" and then PR Review immediately says "this is wrong, fix it." In about 4 of 9 test cases the neutrality reads as theater once the verdict follows. They serve different audiences at different moments in the review lifecycle.

I considered three options: (1) standalone peer, different audiences, already interoperable via diff-state.json; (2) subagent of code-review-full, technically works but creates voice-whiplash in security/governance PRs; (3) dual registration, subagent AND standalone, users choose invocation path.

Based on @katriendg's review feedback, the branch now reflects option 1: standalone peer. The subagent registration and coding-standards collection entry have been removed. The agent stays in the hve-core collection only, marked maturity: experimental, with documentation at docs/agents/pr-walkthrough/. Pipeline interop via diff-state.json and the shared pr-reference skill remains wired for future orchestration if the team decides to revisit.

@bindsi bindsi left a comment

Member

Approved: the PR Walkthrough agent registration and generated artifacts are consistent with existing agent patterns. No actionable issues found.

@katriendg katriendg left a comment

Contributor

Thanks @dfinson, this is a very interesting addition to the platform, and I especially appreciate the fact you have been experimenting with this before submitting the contribution.
I look forward to fully testing it once it's been added to the repo and merged in.
Experimental is the right maturity fit for user testing.

I've left a few inline comments. First, I think there is some confusion between Code Review and PR Review: this new agent belongs with PR Review, not Code Review, which is an orchestrator for coding/programming rather than a generic PR reviewer. Let's keep the new agent outside of Coding Standards.

One important addition is needed before merging: the new agent must be documented. Add it to CUSTOM-AGENTS.md and, more importantly, give it its own dedicated page under the ./docs/agents/README.md docs.

Comment thread .github/agents/coding-standards/code-review-full.agent.md Outdated
Comment thread collections/coding-standards.collection.yml Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
@dfinson dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch 2 times, most recently from 0db99c2 to dbe46e6 Compare June 15, 2026 14:39
jkim323 commented Jun 16, 2026

Collaborator

Please ensure you ran these checks:

AI Artifact Contributions

  • Used /prompt-analyze to review contribution
  • Addressed all feedback from prompt-builder review
  • Verified contribution follows common standards and type-specific requirements

@bindsi bindsi left a comment

Member

Approved: the PR Walkthrough agent, documentation, and collection/plugin wiring are now consistent. I did not find actionable packaging or documentation issues in the current head.

@dfinson dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from 45d39a1 to bfb5283 Compare June 16, 2026 12:22
dfinson commented Jun 16, 2026

Author

AI Artifact Contribution Checks

Re: @jkim323's checklist:

  • Used `/prompt-analyze` to review contribution: ran a full prompt-builder quality audit, which found 4 critical + 2 major issues.
  • Addressed all feedback from `prompt-builder` review: fixed across 4 commits, each A/B tested against 5 PRs with independent scoring subagents.
  • Verified contribution follows common standards: agent description under 120 chars, asterisk bullets, standard section naming (## Required Steps), no run-together paragraphs.

Remaining ALL CAPS in the file are legitimate: `BAD:`/`GOOD:` (example labels), `WEAKEN`/`KILL`/`COUNTER` (enum action labels), and git/code placeholders (`MERGE_BASE`, `HEAD`, `AUTHOR`).
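
For readers unfamiliar with the "avg delta" figures quoted in the commit messages, one plausible reading is the mean of per-PR score differences between the old and new prompt. The sketch below illustrates that reading only. The scoring scale, the function name `average_delta`, and the sample scores are all hypothetical and are not the data from these experiments.

```python
from statistics import mean


def average_delta(baseline: list[float], candidate: list[float]) -> float:
    """Mean of per-PR score differences (candidate minus baseline)."""
    if len(baseline) != len(candidate):
        raise ValueError("each PR needs one baseline and one candidate score")
    return mean(c - b for b, c in zip(baseline, candidate))


# Hypothetical scores for 5 PRs, invented purely for illustration.
baseline_scores = [7.0, 6.5, 8.0, 7.5, 6.0]
candidate_scores = [7.5, 6.5, 8.5, 7.5, 7.0]
print(average_delta(baseline_scores, candidate_scores))
```

Pairing scores per PR, rather than comparing two overall averages, keeps differences in PR difficulty from leaking into the comparison.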

dfinson and others added 10 commits June 16, 2026 16:30
- Add pr-walkthrough.agent.md for narrative-driven PR review orientation
- Register in hve-core and hve-core-all collections
- Add generated plugin symlinks

Relates to microsoft#1946

🚀 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nd orchestrated path

- Replace angle-bracket placeholders with shell-safe variable patterns
- Clarify orchestrated mode still performs Step 1 hunk analysis
- Use command substitution for merge-base in fallback diff commands

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- add value proposition sentence establishing the agent's core purpose
- add BAD/GOOD editorial example demonstrating tradeoff presentation
- add stage-aware calibration for scaffold vs production code
- add COUNTER as 4th self-verification verdict for author-pushback prediction
- add quantity/softening refusal items
- add 'What Done Looks Like' 11-item completion checklist

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d mark experimental

- Add PR Walkthrough to code-review-full agents list
- Add maturity: experimental to hve-core and hve-core-all collection entries
- Register pr-walkthrough in coding-standards collection as subagent dependency
- Regenerate plugins

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove subagent registration from code-review-full (not a code review tool)
- Remove from coding-standards collection (standalone agent, not a subagent)
- Fix description to remove subagent-of-PR-Review claim
- Fix shell placeholder (use literal MERGE_BASE variable, not prompt input syntax)
- Add voice convention note explaining why output voice differs from repo style
- Add documentation page in docs/agents/pr-walkthrough/
- Add entry to CUSTOM-AGENTS.md
- Regenerate plugins and extension manifests

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace dash bullets with asterisk bullets per repo convention
- Rename Pipeline section to Required Steps per protocol patterns
- Trim description to under 120 chars
- Fix run-together paragraphs (missing line breaks between bold items)
- Add sentence breaks between concatenated prose blocks

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Reword 'has opinions' and 'never editorialize' to remove contradiction
- Soften one ALL CAPS instance to bold emphasis
- A/B tested across 5 PRs: avg delta -0.10 (within noise)

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ules

- Convert '* **Title.** Description' to plain '* Description' format
- A/B tested across 5 PRs: avg delta +0.37 (net improvement)

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Convert THIS IS A BLOG POST to bold emphasis
- Replace NOT/YOUR/ISOLATING/PRESENTING with lowercase or italic
- Keep BAD/GOOD (example labels) and WEAKEN/KILL/COUNTER (enum labels)
- A/B tested across 5 PRs: avg delta +0.20 (no regression)

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tions.md

- Move ~70 lines of voice/wit/rhetoric guidance from agent to instructions file
- Agent file references extracted instructions via auto-attach (applyTo pattern)
- Register new instructions in hve-core and hve-core-all collections
- Regenerate plugins and extension manifests
- A/B experiment (10 PRs) confirmed no quality regression from extraction

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dfinson dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from ace3988 to 7bb8797 Compare June 16, 2026 13:33
The hve-core-all regenerator dropped maturity: experimental from
sssc-planner.instructions.md and supply-chain-security skill entries
during rebase conflict resolution.

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

6 participants