docs: update CHANGELOG.md for 0.1.1 by jorben · Pull Request #1 · tiylabs/tiycode

jorben · 2026-04-07T00:29:32Z

Auto-generated changelog update for release 0.1.1.
Comparing changes from 0.0.1 to 0.1.1.

This PR was automatically created by the changelog workflow.

github-actions · 2026-04-07T00:29:59Z

AI Code Review Summary

PR: #1 (docs: update CHANGELOG.md for 0.1.1)
Preferred language: English

Overall Assessment

No blocking issues were detected in the reviewed diff; run focused regression tests before merging.

Major Findings by Severity

No major issues identified from the reviewed diff.

Actionable Suggestions

Address the highest severity findings first and add targeted tests for changed logic.

Potential Risks

Potential hidden risks remain in edge cases not covered by the current diff context.

Test Suggestions

Add happy-path + boundary + failure-path tests for touched modules.

File-Level Coverage Notes

No file-level notes.

Inline Downgraded Items (processed but not inline)

None

Coverage Status

Target files: 0
Covered files: 0
Uncovered files: 0
No-patch/binary covered as file-level: 0
Findings with unknown confidence (N/A): 0

Uncovered list:

None

No-patch covered list:

None

Runtime/Budget

Rounds used: 0/4
Planned batches: 0
Executed batches: 0
Sub-agent runs: 0
Planner calls: 0
Reviewer calls: 0
Model calls: 0/64
Structured-output summary-only degradation: NO

github-actions

Automated PR review completed.

Findings kept: 0
Findings with unknown confidence: 0
Inline comments attempted: 0
Target files: 0
Covered files: 0
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

… Judge subagent (#227)

* feat(goal): ✨ replace self-attestation goal_scored with independent Judge acceptance agent

  Remove the `goal_scored` tool that allowed the main agent to self-attest goal completion, replacing it with an `agent_judge` built-in subagent that independently verifies goal attainment against the project's current state.

  Key changes:
  - Add `SubagentProfile::Judge` with read-only file tools and diagnostic-only shell (soft constraint via prompt)
  - Add `JudgeReport` structured contract (passed, completeness_pct, findings, summary) with safe fallback parsing
  - Add `agent_judge` tool injection only for the main agent when an unverified goal exists; runtime gate blocks subagent/parallel recursion into Judge
  - Add DB migration for `judge_passed`, `judge_completeness`, `judge_findings`, `judge_summary`, `judge_evaluated_run_id` columns with backfill for legacy `status='complete'` goals
  - Replace continuation stop condition: `Complete && judge_passed` instead of `goal_scored`-driven status flip
  - Rewrite continuation prompt to instruct main agent to call `agent_judge` and follow findings on rejection
  - Add Judge prompt surface, templates, and output contract
  - Update `active_goal.tpl.md` to reflect Judge acceptance flow
  - Extend goal lifecycle tests for Judge pass/fail/legacy compat

* refactor(goal): ♻️ remove mark_complete and complete verdict

  Remove the mark_complete pathway from goals as completion will be handled through a different mechanism:
  - Remove mark_complete method from GoalManager
  - Remove "complete" from GoalEvaluateResult verdict type
  - Remove mark_complete test cases (evidence validation, etc.)
  - Update subagent surface comments to include judge

  BREAKING CHANGE: GoalEvaluateResult.verdict no longer includes "complete"

* docs: 📝 update and reorder README feature list

  Update the feature descriptions and reorder the bullet points in both README.md and README_zh.md to better reflect the current product capabilities and improve readability.

  Changes include:
  - Reordering features to highlight persistent goal management, real-time streaming, and extensibility earlier in the list
  - Updating descriptions for several features to be more accurate
  - Maintaining consistency between English and Chinese versions
  - Keeping the overall structure while improving flow

  These are documentation-only changes that do not affect functionality.

* refactor(goal): ♻️ extract resolveGoalStatusKey for testability

  - Extract inline status key resolution into a pure exported function so the complete→verified (judgePassed) branch can be unit-tested without mounting the component
  - Add unit tests covering all status mappings and judgePassed variants
  - Add test for skipped verdict passthrough in goalEvaluate

* refactor(subagent): 🔧 increase builtin default max delegation depth to 5

  Raise `BUILTIN_DEFAULT_MAX_DELEGATION_DEPTH` from 3 to 5 to match the existing `GLOBAL_MAX_DELEGATION_DEPTH`, allowing built-in subagents (explore/review) to be delegated to the same depth as custom profiles. Update delegation validation tests to reflect the new depth limits.

* docs: 📝 remove obsolete design document

* docs(judge): 📝 add size-first verification strategy and delegation guidelines

* refactor(goal): ♻️ remove goal-level time_used_seconds in favor of run-level elapsed tracking

* feat(judge): ✨ redesign Judge evaluation for independence and completeness

* fix(subagent): 🐛 make task field optional and fix UTF-8 safe truncation

  - Downgrade Judge prompt versions from 2 to 1 (likely a revert of unintended bump)
  - Change `task` field from required to optional in Judge tool schema, with updated description clarifying it is an optional note
  - Replace byte-based truncation with character-safe truncation to avoid panicking on multi-byte UTF-8 in process compliance summary
  - Simplify Judge request validation to only check input validity, discarding the parsed result used only for backward compatibility
  - Skip abandoned task boards when building summary to focus on relevant goal state

* chore(deps): 🔧 align tiycore to 0.2.10-rc.2 and adopt Usage::context_size()

  Cherry-pick the master commit (a03d9ba) that bumps tiycore from 0.2.9 to 0.2.10-rc.2 and unifies context_size semantics across RunUsageDto / frontend badge / auto-compression, removing the old initial_context_calibration heuristic path. No file conflict with the Judge work in this branch: the 25 files touched here do not overlap with the 6 Judge files resolved in the previous merge.

* refactor(goal): ♻️ centralize status transitions to explicit commands and Judge verdicts

* fix(agent): 🐛 fix timestamp slicing panic and add has_process_requirements tests

  Replace byte-index slicing with char-aware truncation to prevent panics on multi-byte UTF-8 boundaries in timestamp formatting. Add unit tests for `has_process_requirements()` covering English and CJK keywords, substring match behaviour, edge cases, and case-insensitive matching.

* feat(compression): ✨ reserve 20% context window for auto-compression trigger

  Backend: replace fixed 16,384 token reserve with 20% of model context window (min floor 16,384). Small-window models keep the floor; GPT-4o class windows reserve ~25.6K, Claude-class ~40K, 1M-window ~200K.

  Frontend: add dashed threshold marker at 80% position in the thread header context pill so users can see when auto-compression will fire.

* fix(run): 🐛 record elapsed running time when interrupting active runs

* test: cover Judge summary builders and mapRunSummaryToContextUsage fallback

  Add four integration tests in agent_session_execution.rs for the Judge-prompt context builders (build_task_board_summary, build_process_compliance_summary) covering absent boards, active/abandoned board filtering, review-only helper filtering, status symbol mapping, and 200-char input truncation.

  Add six unit tests in runtime-thread-surface-state.test.ts for mapRunSummaryToContextUsage covering null input, explicit contextSize precedence, fallback to per-bucket sum, and full-field passthrough.

  Addresses review feedback from PR #227 (round 4):
  - #1 New DB-backed Judge summary functions lack tests
  - #4/#8 contextSize parsing/fallback logic lack unit tests
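
For readers skimming the commit list, two of the mechanical changes it describes (the 20% auto-compression reserve with a 16,384-token floor, and character-safe truncation) could be sketched roughly as below. This is an illustrative sketch only, not the project's code: the names `MIN_RESERVE_TOKENS`, `compression_reserve`, `compression_threshold`, and `truncate_chars` are hypothetical, and the example window sizes (128,000 and 200,000 tokens) are assumptions chosen to match the "~25.6K" and "~40K" figures in the commit message.

```rust
/// Hypothetical name for the minimum reserve mentioned in the commit message.
const MIN_RESERVE_TOKENS: u64 = 16_384;

/// Reserve 20% of the context window, but never less than the floor.
/// (Assumed shape; the real backend function is not shown in this PR.)
fn compression_reserve(context_window: u64) -> u64 {
    (context_window / 5).max(MIN_RESERVE_TOKENS)
}

/// Token usage at which auto-compression would fire under this sketch:
/// the window minus the reserve, i.e. the 80% mark for large windows.
fn compression_threshold(context_window: u64) -> u64 {
    context_window.saturating_sub(compression_reserve(context_window))
}

/// Truncate to at most `max_chars` characters without slicing through a
/// multi-byte UTF-8 sequence. Slicing by a raw byte index such as
/// `&input[..max_chars]` panics when the index is not a char boundary.
fn truncate_chars(input: &str, max_chars: usize) -> &str {
    match input.char_indices().nth(max_chars) {
        // `byte_idx` is the start of the first character we want to drop,
        // so it is always a valid char boundary.
        Some((byte_idx, _)) => &input[..byte_idx],
        // Fewer than or exactly `max_chars` characters: keep everything.
        None => input,
    }
}
```

Under these assumptions a 32,000-token window keeps the 16,384 floor, while a 128,000-token window reserves 25,600 tokens and triggers compression at 102,400, which lines up with the 80% marker described for the frontend. `truncate_chars` counts Unicode scalar values rather than grapheme clusters, which is enough to avoid the panic but may still split a combined emoji or accented sequence.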

docs: update CHANGELOG.md for 0.1.1

88f46e7

jorben added the documentation and automated labels Apr 7, 2026

github-actions Bot reviewed Apr 7, 2026

jorben merged commit ab9c46e into master Apr 7, 2026
3 of 4 checks passed

jorben deleted the changelog/0.1.1 branch April 7, 2026 02:53

HayWolf mentioned this pull request Apr 27, 2026

perf(core): optimize agent runtime event processing #137

Open

jorben merged 1 commit into master from changelog/0.1.1