docs: update CHANGELOG.md for 0.1.1 #1

Merged: jorben merged 1 commit into master from changelog/0.1.1 on Apr 7, 2026

@jorben commented on Apr 7, 2026

Auto-generated changelog update for release 0.1.1.
Comparing changes from 0.0.1 to 0.1.1.

This PR was automatically created by the changelog workflow.

@jorben added the documentation and automated labels on Apr 7, 2026

github-actions Bot commented Apr 7, 2026

AI Code Review Summary

PR: #1 (docs: update CHANGELOG.md for 0.1.1)
Preferred language: English

Overall Assessment

No blocking issue was detected in the reviewed diff; keep focused regression testing before merge.

Major Findings by Severity

No major issues identified from the reviewed diff.

Actionable Suggestions

  • Address the highest severity findings first and add targeted tests for changed logic.

Potential Risks

  • Potential hidden risks remain in edge cases not covered by the current diff context.

Test Suggestions

  • Add happy-path + boundary + failure-path tests for touched modules.

File-Level Coverage Notes

No file-level notes.

Inline Downgraded Items (processed but not inline)

  • None

Coverage Status

  • Target files: 0
  • Covered files: 0
  • Uncovered files: 0
  • No-patch/binary covered as file-level: 0
  • Findings with unknown confidence (N/A): 0

Uncovered list:

  • None

No-patch covered list:

  • None

Runtime/Budget

  • Rounds used: 0/4
  • Planned batches: 0
  • Executed batches: 0
  • Sub-agent runs: 0
  • Planner calls: 0
  • Reviewer calls: 0
  • Model calls: 0/64
  • Structured-output summary-only degradation: NO

@github-actions github-actions Bot left a comment

Automated PR review completed.

  • Findings kept: 0
  • Findings with unknown confidence: 0
  • Inline comments attempted: 0
  • Target files: 0
  • Covered files: 0
  • Uncovered files: 0

See the summary comment for detailed analysis and coverage details.

@jorben merged commit ab9c46e into master on Apr 7, 2026
3 of 4 checks passed
@jorben deleted the changelog/0.1.1 branch on April 7, 2026 at 02:53
jorben added a commit that referenced this pull request Jun 12, 2026
…llback

Add four integration tests in agent_session_execution.rs for the
Judge-prompt context builders (build_task_board_summary,
build_process_compliance_summary) covering absent boards, active/abandoned
board filtering, review-only helper filtering, status symbol mapping, and
200-char input truncation.

Add six unit tests in runtime-thread-surface-state.test.ts for
mapRunSummaryToContextUsage covering null input, explicit contextSize
precedence, fallback to per-bucket sum, and full-field passthrough.

Addresses review feedback from PR #227 (round 4):
- #1 New DB-backed Judge summary functions lack tests
- #4/#8 contextSize parsing/fallback logic lack unit tests
jorben added a commit that referenced this pull request Jun 17, 2026
… Judge subagent (#227)

* feat(goal): ✨ replace self-attestation goal_scored with independent Judge acceptance agent

Remove the `goal_scored` tool that allowed the main agent to
self-attest goal completion, replacing it with an `agent_judge`
built-in subagent that independently verifies goal attainment
against the project's current state.

Key changes:
- Add `SubagentProfile::Judge` with read-only file tools and
  diagnostic-only shell (soft constraint via prompt)
- Add `JudgeReport` structured contract (passed, completeness_pct,
  findings, summary) with safe fallback parsing
- Add `agent_judge` tool injection only for the main agent when
  an unverified goal exists; runtime gate blocks subagent/parallel
  recursion into Judge
- Add DB migration for `judge_passed`, `judge_completeness`,
  `judge_findings`, `judge_summary`, `judge_evaluated_run_id`
  columns with backfill for legacy `status='complete'` goals
- Replace continuation stop condition: `Complete && judge_passed`
  instead of `goal_scored`-driven status flip
- Rewrite continuation prompt to instruct main agent to call
  `agent_judge` and follow findings on rejection
- Add Judge prompt surface, templates, and output contract
- Update `active_goal.tpl.md` to reflect Judge acceptance flow
- Extend goal lifecycle tests for Judge pass/fail/legacy compat
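
The new continuation stop condition (`Complete && judge_passed`) can be sketched roughly as follows. This is only an illustration: the `GoalStatus` enum, the `Goal` struct, and the function name are hypothetical, and modelling `judge_passed` as an `Option<bool>` (with `None` meaning "not yet evaluated") is an assumption, not something the commit states.

```rust
/// Hypothetical goal status; the real enum may have more variants.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GoalStatus {
    Active,
    Complete,
}

/// Hypothetical goal record. `judge_passed` mirrors the new DB column;
/// `None` is assumed to mean the Judge has not evaluated the goal yet.
#[allow(dead_code)]
struct Goal {
    status: GoalStatus,
    judge_passed: Option<bool>,
}

/// Continuation stops only when the goal is marked complete AND the
/// Judge has accepted it. A missing or failed verdict keeps the loop going.
#[allow(dead_code)]
fn should_stop_continuation(goal: &Goal) -> bool {
    goal.status == GoalStatus::Complete && goal.judge_passed == Some(true)
}
```

Under these assumptions, a goal that is `Complete` but has no Judge verdict does not stop continuation, which is the behavioural difference from the old self-attested `goal_scored` flow.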

* refactor(goal): ♻️ remove mark_complete and complete verdict

Remove the mark_complete pathway from goals as completion will be
handled through a different mechanism:

- Remove mark_complete method from GoalManager
- Remove "complete" from GoalEvaluateResult verdict type
- Remove mark_complete test cases (evidence validation, etc.)
- Update subagent surface comments to include judge

BREAKING CHANGE: GoalEvaluateResult.verdict no longer includes "complete"

* docs: 📝 update and reorder README feature list

Update the feature descriptions and reorder the bullet points in both
README.md and README_zh.md to better reflect the current product
capabilities and improve readability. Changes include:

- Reordering features to highlight persistent goal management, real-time
  streaming, and extensibility earlier in the list
- Updating descriptions for several features to be more accurate
- Maintaining consistency between English and Chinese versions
- Keeping the overall structure while improving flow

These are documentation-only changes that do not affect functionality.

* refactor(goal): ♻️ extract resolveGoalStatusKey for testability

- Extract inline status key resolution into a pure exported function
  so the complete→verified (judgePassed) branch can be unit-tested
  without mounting the component
- Add unit tests covering all status mappings and judgePassed variants
- Add test for skipped verdict passthrough in goalEvaluate

* refactor(subagent): 🔧 increase builtin default max delegation depth to 5

Raise `BUILTIN_DEFAULT_MAX_DELEGATION_DEPTH` from 3 to 5 to match the
existing `GLOBAL_MAX_DELEGATION_DEPTH`, allowing built-in subagents
(explore/review) to be delegated to the same depth as custom profiles.

Update delegation validation tests to reflect the new depth limits.

* docs: 📝 remove obsolete design document

* docs(judge): 📝 add size-first verification strategy and delegation guidelines

* refactor(goal): ♻️ remove goal-level time_used_seconds in favor of run-level elapsed tracking

* feat(judge): ✨ redesign Judge evaluation for independence and completeness

* fix(subagent): 🐛 make task field optional and fix UTF-8 safe truncation

- Downgrade Judge prompt versions from 2 to 1 (likely a revert of unintended bump)
- Change `task` field from required to optional in Judge tool schema, with updated description clarifying it is an optional note
- Replace byte-based truncation with character-safe truncation to avoid panicking on multi-byte UTF-8 in process compliance summary
- Simplify Judge request validation to only check input validity, discarding the parsed result used only for backward compatibility
- Skip abandoned task boards when building summary to focus on relevant goal state
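
For context on the truncation fix: slicing a Rust `&str` at a byte index that falls inside a multi-byte character panics. A minimal sketch of character-safe truncation is shown below; the helper name `truncate_chars` is hypothetical and this is not necessarily how the commit implements it.

```rust
/// Return at most `max_chars` characters of `input` without ever
/// splitting a multi-byte UTF-8 sequence.
#[allow(dead_code)]
fn truncate_chars(input: &str, max_chars: usize) -> &str {
    // `char_indices` yields the byte offset of every character, so the
    // offset of the first character to drop is always a valid boundary.
    match input.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &input[..byte_idx],
        // Fewer than `max_chars` characters: nothing to cut.
        None => input,
    }
}
```

By contrast, `&"日本語"[..4]` would panic, because byte 4 lies in the middle of the second character.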

* chore(deps): 🔧 align tiycore to 0.2.10-rc.2 and adopt Usage::context_size()

Cherry-pick the master commit (a03d9ba) that bumps tiycore from 0.2.9
to 0.2.10-rc.2 and unifies context_size semantics across
RunUsageDto / frontend badge / auto-compression, removing the old
initial_context_calibration heuristic path. No file conflict with
the Judge work in this branch — the 25 files touched here do not
overlap with the 6 Judge files resolved in the previous merge.

* refactor(goal): ♻️ centralize status transitions to explicit commands and Judge verdicts

* fix(agent): 🐛 fix timestamp slicing panic and add has_process_requirements tests

Replace byte-index slicing with char-aware truncation to prevent
panics on multi-byte UTF-8 boundaries in timestamp formatting.

Add unit tests for `has_process_requirements()` covering English
and CJK keywords, substring match behaviour, edge cases, and
case-insensitive matching.
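
A case-insensitive substring check over mixed English and CJK keywords, of the kind the tests above exercise, might look like the sketch below. The keyword list and the function name `mentions_process_keyword` are invented for illustration; the actual keywords used by `has_process_requirements()` are not given in this commit message.

```rust
/// Hypothetical keyword list; the real one is not shown in the source.
const PROCESS_KEYWORDS: &[&str] = &["workflow", "must follow", "流程", "必须"];

/// Returns true when `text` contains any keyword as a substring,
/// ignoring ASCII/Unicode letter case. CJK text has no case, so
/// lowercasing leaves it unchanged.
#[allow(dead_code)]
fn mentions_process_keyword(text: &str) -> bool {
    let lowered = text.to_lowercase();
    PROCESS_KEYWORDS.iter().any(|kw| lowered.contains(*kw))
}
```

Because this is plain substring matching, a word such as "workflows" also matches the keyword "workflow".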

* feat(compression): ✨ reserve 20% context window for auto-compression trigger

Backend: replace fixed 16,384 token reserve with 20% of model context
window (min floor 16,384).  Small-window models keep the floor; GPT-4o
class windows reserve ~25.6K, Claude-class ~40K, 1M-window ~200K.

Frontend: add dashed threshold marker at 80% position in the thread
header context pill so users can see when auto-compression will fire.
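
The backend reserve rule described above (20% of the context window, with a floor of 16,384 tokens) can be sketched as follows. The function and constant names are hypothetical, integer division is an assumed rounding choice, and treating the trigger point as "window minus reserve" is an assumption inferred from the 80% marker.

```rust
/// Minimum reserve in tokens, per the commit message.
const MIN_RESERVE_TOKENS: u64 = 16_384;

/// Reserve 20% of the context window, but never less than the floor.
#[allow(dead_code)]
fn compression_reserve(context_window: u64) -> u64 {
    (context_window / 5).max(MIN_RESERVE_TOKENS)
}

/// Assumed trigger point: usage at or above this many tokens would
/// start auto-compression. Saturates at zero for very small windows.
#[allow(dead_code)]
fn compression_trigger(context_window: u64) -> u64 {
    context_window.saturating_sub(compression_reserve(context_window))
}
```

With these definitions a 128,000-token window reserves 25,600 tokens, a 200,000-token window reserves 40,000, and a 1,000,000-token window reserves 200,000, matching the figures quoted in the commit. A 32,768-token window falls back to the 16,384 floor, so its trigger sits at 50% rather than 80%.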

* fix(run): 🐛 record elapsed running time when interrupting active runs

* test: cover Judge summary builders and mapRunSummaryToContextUsage fallback

Add four integration tests in agent_session_execution.rs for the
Judge-prompt context builders (build_task_board_summary,
build_process_compliance_summary) covering absent boards, active/abandoned
board filtering, review-only helper filtering, status symbol mapping, and
200-char input truncation.

Add six unit tests in runtime-thread-surface-state.test.ts for
mapRunSummaryToContextUsage covering null input, explicit contextSize
precedence, fallback to per-bucket sum, and full-field passthrough.

Addresses review feedback from PR #227 (round 4):
- #1 New DB-backed Judge summary functions lack tests
- #4/#8 contextSize parsing/fallback logic lack unit tests

Labels

automated documentation Improvements or additions to documentation
