docs: update CHANGELOG.md for 0.1.1 #1
Merged
AI Code Review Summary
PR: #1 (docs: update CHANGELOG.md for 0.1.1)

Overall Assessment
No blocking issue was detected in the reviewed diff; keep focused regression testing before merge.

Major Findings by Severity
No major issues identified from the reviewed diff.

Actionable Suggestions
Potential Risks
Test Suggestions

File-Level Coverage Notes
No file-level notes.

Inline Downgraded Items (processed but not inline)
Coverage Status
Uncovered list:
No-patch covered list:
Runtime/Budget
jorben added a commit that referenced this pull request Jun 12, 2026
…llback

Add four integration tests in agent_session_execution.rs for the Judge-prompt context builders (build_task_board_summary, build_process_compliance_summary) covering absent boards, active/abandoned board filtering, review-only helper filtering, status symbol mapping, and 200-char input truncation.

Add six unit tests in runtime-thread-surface-state.test.ts for mapRunSummaryToContextUsage covering null input, explicit contextSize precedence, fallback to per-bucket sum, and full-field passthrough.

Addresses review feedback from PR #227 (round 4):
- #1 New DB-backed Judge summary functions lack tests
- #4/#8 contextSize parsing/fallback logic lack unit tests
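The 200-char input truncation covered by these tests is the kind of operation that can panic in Rust when done by byte index on multi-byte UTF-8 text. The following is a minimal sketch of character-safe truncation, not the project's actual code: the helper name `truncate_chars` is hypothetical, and only the standard library is used.

```rust
/// Truncate `input` to at most `max_chars` characters without ever
/// splitting a multi-byte UTF-8 sequence.
/// Hypothetical helper for illustration; not taken from the repository.
fn truncate_chars(input: &str, max_chars: usize) -> &str {
    match input.char_indices().nth(max_chars) {
        // `idx` is the byte offset of the first character past the limit,
        // so it is always a valid char boundary and safe to slice at.
        Some((idx, _)) => &input[..idx],
        // The input has `max_chars` characters or fewer: return it unchanged.
        None => input,
    }
}

fn main() {
    // Seven CJK characters, three bytes each in UTF-8.
    let text = "目标完成度检查";
    // A byte slice such as `&text[..4]` would panic here, because byte 4
    // falls in the middle of the second character.
    println!("{}", truncate_chars(text, 2));
    // Short ASCII input is returned as-is.
    println!("{}", truncate_chars("abc", 200));
}
```

Returning a borrowed slice avoids an allocation; a caller that needs an owned `String` (for example, to append an ellipsis) can call `.to_string()` on the result.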
jorben added a commit that referenced this pull request Jun 17, 2026
… Judge subagent (#227)

* feat(goal): ✨ replace self-attestation goal_scored with independent Judge acceptance agent

Remove the `goal_scored` tool that allowed the main agent to self-attest goal completion, replacing it with an `agent_judge` built-in subagent that independently verifies goal attainment against the project's current state.

Key changes:
- Add `SubagentProfile::Judge` with read-only file tools and diagnostic-only shell (soft constraint via prompt)
- Add `JudgeReport` structured contract (passed, completeness_pct, findings, summary) with safe fallback parsing
- Add `agent_judge` tool injection only for the main agent when an unverified goal exists; runtime gate blocks subagent/parallel recursion into Judge
- Add DB migration for `judge_passed`, `judge_completeness`, `judge_findings`, `judge_summary`, `judge_evaluated_run_id` columns with backfill for legacy `status='complete'` goals
- Replace continuation stop condition: `Complete && judge_passed` instead of `goal_scored`-driven status flip
- Rewrite continuation prompt to instruct main agent to call `agent_judge` and follow findings on rejection
- Add Judge prompt surface, templates, and output contract
- Update `active_goal.tpl.md` to reflect Judge acceptance flow
- Extend goal lifecycle tests for Judge pass/fail/legacy compat

* refactor(goal): ♻️ remove mark_complete and complete verdict

Remove the mark_complete pathway from goals as completion will be handled through a different mechanism:
- Remove mark_complete method from GoalManager
- Remove "complete" from GoalEvaluateResult verdict type
- Remove mark_complete test cases (evidence validation, etc.)
- Update subagent surface comments to include judge

BREAKING CHANGE: GoalEvaluateResult.verdict no longer includes "complete"

* docs: 📝 update and reorder README feature list

Update the feature descriptions and reorder the bullet points in both README.md and README_zh.md to better reflect the current product capabilities and improve readability.

Changes include:
- Reordering features to highlight persistent goal management, real-time streaming, and extensibility earlier in the list
- Updating descriptions for several features to be more accurate
- Maintaining consistency between English and Chinese versions
- Keeping the overall structure while improving flow

These are documentation-only changes that do not affect functionality.

* refactor(goal): ♻️ extract resolveGoalStatusKey for testability

- Extract inline status key resolution into a pure exported function so the complete→verified (judgePassed) branch can be unit-tested without mounting the component
- Add unit tests covering all status mappings and judgePassed variants
- Add test for skipped verdict passthrough in goalEvaluate

* refactor(subagent): 🔧 increase builtin default max delegation depth to 5

Raise `BUILTIN_DEFAULT_MAX_DELEGATION_DEPTH` from 3 to 5 to match the existing `GLOBAL_MAX_DELEGATION_DEPTH`, allowing built-in subagents (explore/review) to be delegated to the same depth as custom profiles. Update delegation validation tests to reflect the new depth limits.

* docs: 📝 remove obsolete design document

* docs(judge): 📝 add size-first verification strategy and delegation guidelines

* refactor(goal): ♻️ remove goal-level time_used_seconds in favor of run-level elapsed tracking

* feat(judge): ✨ redesign Judge evaluation for independence and completeness

* fix(subagent): 🐛 make task field optional and fix UTF-8 safe truncation

- Downgrade Judge prompt versions from 2 to 1 (likely a revert of unintended bump)
- Change `task` field from required to optional in Judge tool schema, with updated description clarifying it is an optional note
- Replace byte-based truncation with character-safe truncation to avoid panicking on multi-byte UTF-8 in process compliance summary
- Simplify Judge request validation to only check input validity, discarding the parsed result used only for backward compatibility
- Skip abandoned task boards when building summary to focus on relevant goal state

* chore(deps): 🔧 align tiycore to 0.2.10-rc.2 and adopt Usage::context_size()

Cherry-pick the master commit (a03d9ba) that bumps tiycore from 0.2.9 to 0.2.10-rc.2 and unifies context_size semantics across RunUsageDto / frontend badge / auto-compression, removing the old initial_context_calibration heuristic path.

No file conflict with the Judge work in this branch: the 25 files touched here do not overlap with the 6 Judge files resolved in the previous merge.

* refactor(goal): ♻️ centralize status transitions to explicit commands and Judge verdicts

* fix(agent): 🐛 fix timestamp slicing panic and add has_process_requirements tests

Replace byte-index slicing with char-aware truncation to prevent panics on multi-byte UTF-8 boundaries in timestamp formatting. Add unit tests for `has_process_requirements()` covering English and CJK keywords, substring match behaviour, edge cases, and case-insensitive matching.

* feat(compression): ✨ reserve 20% context window for auto-compression trigger

Backend: replace fixed 16,384 token reserve with 20% of model context window (min floor 16,384). Small-window models keep the floor; GPT-4o class windows reserve ~25.6K, Claude-class ~40K, 1M-window ~200K.

Frontend: add dashed threshold marker at 80% position in the thread header context pill so users can see when auto-compression will fire.

* fix(run): 🐛 record elapsed running time when interrupting active runs

* test: cover Judge summary builders and mapRunSummaryToContextUsage fallback

Add four integration tests in agent_session_execution.rs for the Judge-prompt context builders (build_task_board_summary, build_process_compliance_summary) covering absent boards, active/abandoned board filtering, review-only helper filtering, status symbol mapping, and 200-char input truncation.

Add six unit tests in runtime-thread-surface-state.test.ts for mapRunSummaryToContextUsage covering null input, explicit contextSize precedence, fallback to per-bucket sum, and full-field passthrough.

Addresses review feedback from PR #227 (round 4):
- #1 New DB-backed Judge summary functions lack tests
- #4/#8 contextSize parsing/fallback logic lack unit tests
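The auto-compression reserve rule described in the commit message above (20% of the model context window, with a floor of 16,384 tokens) can be checked with a short calculation. This is a sketch under stated assumptions rather than the backend's implementation: the names `MIN_RESERVE_TOKENS`, `compression_reserve`, and `compression_trigger` are hypothetical, and the window sizes in `main` (128,000, 200,000, 1,000,000) are assumed round figures for the model classes the message mentions.

```rust
/// Floor for the reserve, taken from the commit message (16,384 tokens).
/// The constant name is hypothetical.
const MIN_RESERVE_TOKENS: u64 = 16_384;

/// Reserve 20% of the context window, but never less than the floor.
/// Hypothetical function; integer division by 5 stands in for "20%".
fn compression_reserve(context_window: u64) -> u64 {
    (context_window / 5).max(MIN_RESERVE_TOKENS)
}

/// Token count at which auto-compression would fire under this sketch:
/// the window minus the reserve, clamped at zero for tiny windows.
fn compression_trigger(context_window: u64) -> u64 {
    context_window.saturating_sub(compression_reserve(context_window))
}

fn main() {
    // Assumed window sizes, chosen to match the classes named in the commit.
    for window in [32_768u64, 128_000, 200_000, 1_000_000] {
        println!(
            "window={window} reserve={} trigger={}",
            compression_reserve(window),
            compression_trigger(window)
        );
    }
}
```

With these assumed sizes the reserves come out to 25,600, 40,000, and 200,000 tokens, matching the approximate figures in the commit message, while the 32,768-token window stays at the 16,384 floor. Note that whenever the floor applies, the trigger in this sketch falls below 80% of the window.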
Auto-generated changelog update for release 0.1.1.
Comparing changes from 0.0.1 to 0.1.1.

This PR was automatically created by the changelog workflow.