v0.20 PR 5: Evolution (Phase A) + Memory dashboard tabs #75
Adds scroll and countPoints primitives to QdrantClient. Mirrors the existing
request pattern for timeout, headers, and error handling. Scroll returns
normalized { points, nextOffset } so callers never see raw Qdrant wire shape.
EpisodicStore, SemanticStore, and ProceduralStore each gain scroll, getById,
deleteById, and count methods that wrap the new Qdrant primitives and handle
payload mapping via their existing payloadToX helpers.
MemorySystem exposes scrollEpisodes / scrollFacts / scrollProcedures,
getEpisodeById / getFactById / getProcedureById, deleteEpisode / deleteFact /
deleteProcedure, and countEpisodes / countFacts / countProcedures so the
Memory explorer API can talk to MemorySystem instead of reaching into each
store or the Qdrant client.
Tests cover scroll happy path, pagination, filter and orderBy passthrough,
Qdrant error propagation, and countPoints shape.
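For reviewers, a minimal sketch of the normalization step described above. It assumes Qdrant's scroll result carries `points` and `next_page_offset`; the type names and the `normalizeScroll` helper are hypothetical and are not the code in `qdrant-client.ts`.

```typescript
// Hypothetical types for illustration; only `points` and `nextOffset`
// are named in the PR text.
type PointId = string | number;

interface RawScrollResult {
  points: Array<{ id: PointId; payload?: Record<string, unknown> | null }>;
  next_page_offset?: PointId | null;
}

interface ScrollPage {
  points: Array<{ id: PointId; payload: Record<string, unknown> }>;
  nextOffset: PointId | null;
}

// Map the wire shape to the camelCase shape callers consume.
// A missing payload becomes {} and a missing offset becomes null.
function normalizeScroll(raw: RawScrollResult): ScrollPage {
  return {
    points: raw.points.map((p) => ({ id: p.id, payload: p.payload ?? {} })),
    nextOffset: raw.next_page_offset ?? null,
  };
}
```

Keeping the mapping in one place means the stores only ever see `nextOffset` and never branch on the snake_case field.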
Three read-only handlers for the Evolution dashboard tab (Phase A). Wraps EvolutionEngine.getCurrentVersion/getEvolutionLog/getMetrics/getEvolutionConfig plus EvolutionQueue.listPoisonPile for the poison banner. Reads diff file content from phantom-config/ with a 64 KB preview cap; current_size is the real byte length so the UI can flag truncated previews. No rollback endpoint. No snapshot storage. Those ship in Phase B. Tests cover the three endpoints including the metrics reshape, poison queue integration, historical version synthesis of parent = n - 1, large-file preview truncation, and every error path (400, 404, 405, 422).
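A sketch of how the 64 KB preview cap and the real `current_size` could fit together. The `buildPreview` helper, the `PREVIEW_CAP_BYTES` constant, and the `truncated` flag are assumptions for illustration; the PR text only names the cap and `current_size`.

```typescript
// Hypothetical names throughout; only the 64 KB cap and `current_size`
// come from the commit message.
const PREVIEW_CAP_BYTES = 64 * 1024;

interface FilePreview {
  content: string;      // at most `cap` bytes of the file, decoded as UTF-8
  current_size: number; // real byte length of the file on disk
  truncated: boolean;   // convenience flag derived from the two values above
}

function buildPreview(bytes: Uint8Array, cap: number = PREVIEW_CAP_BYTES): FilePreview {
  const isTruncated = bytes.length > cap;
  const slice = isTruncated ? bytes.subarray(0, cap) : bytes;
  return {
    // A cut in the middle of a multi-byte character decodes to U+FFFD
    // rather than throwing, because the decoder is not in fatal mode.
    content: new TextDecoder("utf-8").decode(slice),
    current_size: bytes.length,
    truncated: isTruncated,
  };
}
```

Because `current_size` is taken before slicing, the UI can compare it with the cap and flag a truncated preview even without the extra flag.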
Four handlers under /ui/api/memory. When the list endpoint has a query it runs hybrid recall via MemorySystem.recall*; without a query it falls back to the new Qdrant scroll ordered by recency. Detail and delete route through the new MemorySystem get*ById / delete* helpers. Detail is checked before delete so unknown ids 404 instead of silently succeeding. Ids are validated to reject control characters. Health reports counts per type and tolerates individual count failures via Promise.allSettled so one bad collection doesn't blank the whole strip. Tests cover health variants (qdrant down, ollama down, count failure), list scroll vs recall, type validation, offset passthrough, detail happy path and 404, delete happy path and pre-check 404, and method restrictions.
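The health-strip behaviour can be sketched as follows. `collectCounts`, `MemoryType`, and the `null` placeholder for a failed count are hypothetical; the commit message only states that `Promise.allSettled` is used so one failing collection does not blank the others.

```typescript
// Hypothetical helper; the real handler lives in src/ui/api/memory.ts.
type MemoryType = "episodes" | "facts" | "procedures";
type Counter = () => Promise<number>;

async function collectCounts(
  counters: Record<MemoryType, Counter>,
): Promise<Record<MemoryType, number | null>> {
  const types: MemoryType[] = ["episodes", "facts", "procedures"];
  // The async wrapper turns a synchronous throw into a rejection too.
  const settled = await Promise.allSettled(types.map(async (t) => counters[t]()));
  const out: Record<MemoryType, number | null> = {
    episodes: null,
    facts: null,
    procedures: null,
  };
  types.forEach((t, i) => {
    const result = settled[i];
    // A rejected count stays null instead of failing the whole response.
    out[t] = result !== undefined && result.status === "fulfilled" ? result.value : null;
  });
  return out;
}
```

With `Promise.all` the first rejection would discard the two counts that did succeed, which is the failure mode the commit message describes avoiding.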
Evolution timeline tab for the operator dashboard. Version cards expand in place to show the per-file diff (summary, rationale, current content preview, source-session pills that link back to the Sessions tab). Metrics strip surfaces current version, total sessions, success rate 7d, drains with tier mix, reflection cost, and invariant fails. Poison banner appears when the queue has rows. Sparkline of drains per day over the loaded timeline window. Phase A is pure read. No rollback button, no write actions. The "Phase A read-only" chip sits in the header so the operator knows rollback is a future PR. Wires EvolutionEngine / EvolutionQueue into src/ui/serve.ts via new setEvolutionEngine / setEvolutionQueue seams, wired from src/index.ts after the engine and queue are constructed. Route name is now live in dashboard.js and index.html. Shared CSS primitives .dash-timeline*, .dash-diff*, .dash-poison-banner, .dash-session-pill, .dash-tab-switcher, .dash-chip land in dashboard.css. All operator-controlled fields flow through esc() for inline text and textContent for file-content previews inside <pre> nodes.
Memory explorer tab: split pane over episodes, facts, and procedures. Left rail has the type switcher (tabs: Episodes / Facts / Procedures), a debounced search input, and a recency-ordered scroll list with type-specific row layouts. Right pane paints the full record with a copy-as-JSON button and a delete button that routes through an explicit confirmation modal. Cancel has initial focus so Enter never deletes by accident. Hybrid recall when the search box has a query, Qdrant scroll when it's empty. Contradicted facts are greyed out and sorted last. Source episode IDs become cross-tab session pills that navigate to #/sessions/<key>. Responsive: below 720px the split collapses to a single full-width pane with a back button that returns to the list. The global / shortcut focuses the search input whenever the hash starts with #/memory. Delete confirmation uses ctx.openModal with aria-dialog semantics. Every operator-authored text field (summary, detail, trigger, natural language, lessons, step action, expected outcome) renders via textContent inside <pre class="dash-memory-text"> nodes because payloads may contain any characters. Shorter identifiers flow through esc().
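One way to express "contradicted facts sorted last" on top of a recency-ordered list is a two-key comparator. The `FactRow` shape and its field names are assumptions for illustration and are not taken from `memory.js`.

```typescript
// Hypothetical row shape; field names are illustrative only.
interface FactRow {
  id: string;
  contradicted: boolean;
  updatedAt: number; // epoch milliseconds
}

// Non-contradicted facts first, newest first within each group.
function orderFacts(rows: readonly FactRow[]): FactRow[] {
  return [...rows].sort((a, b) => {
    if (a.contradicted !== b.contradicted) return a.contradicted ? 1 : -1;
    return b.updatedAt - a.updatedAt;
  });
}
```

Copying before sorting leaves the fetched page untouched, so the same array can be re-rendered when the type switcher changes.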
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a41b4f77da
    // entries as the outer window so `before_version` pagination works without a
    // full disk walk.
    function readTimelineWindow(engine: EvolutionEngine): EvolutionLogEntry[] {
      const log = engine.getEvolutionLog(TIMELINE_SCAN_CAP);
Page timeline against full evolution log
buildTimeline computes pagination from readTimelineWindow, but that helper only reads the last 500 rows. In deployments with more than 500 evolution entries, before_version requests eventually return has_more=false even though older rows still exist, so the dashboard cannot load older generations and history is silently truncated.
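The effect is easy to reproduce in isolation. The sketch below is not the PR's `buildTimeline`; `pageTimeline` and `LogEntry` are hypothetical stand-ins that page a log by `before_version` and derive `has_more` from whatever slice they are handed.

```typescript
// Hypothetical stand-ins for the real timeline code.
interface LogEntry {
  version: number;
}

interface TimelinePage {
  entries: LogEntry[];
  has_more: boolean;
}

function pageTimeline(
  log: readonly LogEntry[],
  limit: number,
  beforeVersion?: number,
): TimelinePage {
  const newestFirst = [...log].sort((a, b) => b.version - a.version);
  const eligible =
    beforeVersion === undefined
      ? newestFirst
      : newestFirst.filter((e) => e.version < beforeVersion);
  // has_more can only be as accurate as the slice of the log passed in.
  return { entries: eligible.slice(0, limit), has_more: eligible.length > limit };
}

// Ten versions on disk, but only the latest four make it into the window.
const fullLog: LogEntry[] = Array.from({ length: 10 }, (_, i) => ({ version: i + 1 }));
const cappedWindow = fullLog.slice(-4);

const fromWindow = pageTimeline(cappedWindow, 2, 8); // has_more is false
const fromFullLog = pageTimeline(fullLog, 2, 8);     // has_more is true
```

Paging the capped window reports the end of history at version 7 even though versions 1 through 6 still exist, which is the silent truncation described above.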
    const allLog = deps.engine.getEvolutionLog(TIMELINE_SCAN_CAP);
    const match = allLog.find((e) => e.version === versionNumber) ?? null;

    if (!match && versionNumber !== current.version) {
      return json({ error: "Version not found" }, { status: 404 });
Resolve version lookups beyond the 500-entry window
The version detail route searches only getEvolutionLog(TIMELINE_SCAN_CAP) and then returns 404 when the target version is not in that slice. Once the log grows beyond 500 rows, valid historical versions outside the latest window are reported as missing, breaking direct links and inspection of older generations.
    const absolute = join(configDir, relPath);
    if (!existsSync(absolute)) return { content: "", size: 0 };
Constrain diff preview paths to config_dir
readFilePreview joins config_dir with a log-provided relative path without checking for traversal. Because details[].file is accepted from evolution-log.jsonl rows, a malformed or poisoned row containing ../ can escape config_dir and expose arbitrary local files through /ui/api/evolution/version/:n previews.
…ination caps)
P1 (reviewer): CSS duplicate .dash-chip at dashboard.css:2589 overrode the primary-tinted chip at :748 used by Settings/Hooks/Skills/Subagents. Renamed the memory-specific neutral pill to .dash-memory-chip and updated the four call sites in memory.js so the shared primitive is not clobbered.
P1 (reviewer): path traversal in evolution.ts readFilePreview. Log entries carry an agent-written 'file' path which was joined onto config_dir without normalization, so a poisoned row like '../../etc/passwd' could disclose arbitrary host files up to 64 KB through GET /ui/api/evolution/version/:n. Now resolve both sides and reject when the relative path escapes config_dir. Added a regression test that plants a sentinel file outside config_dir and confirms it is not returned.
P1 (Codex): timeline pagination and version lookup were capped at 500 entries (TIMELINE_SCAN_CAP). After ~500 drains, before_version requests silently returned has_more=false and historical versions outside the window 404'd. Raised to 100_000 since entries are ~1-2 KB JSONL rows; an agent would take years to cross this ceiling. If it ever does, switch to a streaming reader.
P1 (reviewer): memory scroll Load More never rendered because Qdrant's order_by disables the cursor API (next_page_offset always null). Raised LIST_DEFAULT_LIMIT from 30 to 100 (matches LIST_MAX_LIMIT) so the browse view surfaces the freshest 100 without requiring Load More. Proper cursor-style pagination over order_by is a documented follow-up.
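A minimal sketch of the "resolve both sides and reject" check for the path-traversal fix, using only `node:path`. The `resolveInside` helper is hypothetical and is not the code in `evolution.ts`; it also does not follow symlinks, which a stricter version might handle with `realpath`.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical helper: returns the absolute path when `relPath` stays
// inside `configDir`, or null when it escapes (or names the directory itself).
function resolveInside(configDir: string, relPath: string): string | null {
  const root = resolve(configDir);
  const target = resolve(root, relPath);
  const rel = relative(root, target);
  const escapes =
    rel === "" ||                // the directory itself, not a file in it
    rel === ".." ||              // the parent directory
    rel.startsWith(".." + sep) || // anything above the root
    isAbsolute(rel);             // e.g. a different drive on Windows
  return escapes ? null : target;
}
```

Comparing the computed relative path, rather than string-matching `../` in the input, also covers absolute inputs such as `/etc/passwd` and segments that cancel out such as `a/../b.md`.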
CI caught a pre-existing flake that only fires when the test runs
near midnight UTC: hoursAgo(3) at 02:53 UTC resolves to 23:53 the
previous day, so SQLite's date(created_at) = date('now') buckets it
as yesterday and the 'today sum' assertion receives 1.0 instead of
3.0.
Clamp hoursAgo to 5 minutes ago whenever h < 24 would cross into a
different UTC date. daysAgo intentionally crosses the boundary and
stays as-is.
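The clamp can be sketched like this. The injectable `nowMs` parameter and the `utcDate` helper are assumptions added so the behaviour is reproducible; the real test helper may differ.

```typescript
// UTC calendar date (YYYY-MM-DD) for a timestamp in milliseconds.
function utcDate(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Hypothetical version of the test helper with an injectable clock.
function hoursAgo(h: number, nowMs: number = Date.now()): Date {
  const candidate = nowMs - h * 3_600_000;
  // For sub-day offsets, stay on today's UTC date so date('now')
  // buckets the row as "today".
  if (h < 24 && utcDate(candidate) !== utcDate(nowMs)) {
    return new Date(nowMs - 5 * 60_000);
  }
  return new Date(candidate);
}

// Illustrative clock value: 02:53 UTC on an arbitrary date.
const at0253 = Date.UTC(2025, 0, 2, 2, 53);
const clamped = hoursAgo(3, at0253);   // 02:48 on the same UTC date
const unclamped = hoursAgo(2, at0253); // 00:53 on the same UTC date
```

A run within the first five minutes after midnight UTC would still cross the boundary with this clamp, so the window for the flake shrinks rather than disappears.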
Bumps the version to 0.20.0 in every place it's referenced:
- package.json (1)
- src/core/server.ts VERSION constant
- src/mcp/server.ts MCP server identity
- src/cli/index.ts phantom --version output
- README.md version + tests badges
- CLAUDE.md tagline + bun test count
- CONTRIBUTING.md test count
Tests: 1,799 pass / 10 skip / 0 fail. Typecheck and lint clean. No 0.19.1 or 1,584-tests references remain in source, docs, or badges.
v0.20 shipped eight PRs on top of v0.19.1:
- #71 entrypoint dashboard sync + / redirect + /health HTML
- #72 Sessions dashboard tab
- #73 Cost dashboard tab
- #74 Scheduler tab + create-job + Sonnet describe-assist
- #75 Evolution Phase A + Memory explorer tabs
- #76 Settings page restructure (phantom.yaml, 6 sections)
- #77 Agent avatar upload across 14 identity surfaces
- #79 Landing page redesign (hero, starter tiles, live pages list)
Summary
Two dashboard tabs shipped together, both built on the shared primitives that landed with PRs 2-4.
Evolution (Phase A, read-only)
- phantom-config/meta/evolution-log.jsonl, newest first.
- #/sessions/<key>.
Memory explorer
- #/sessions/<key>.
- / focuses the search input whenever the hash starts with #/memory.
New infrastructure
- QdrantClient.scroll(collection, { limit, offset?, filter?, orderBy?, withPayload? }) returns { points, nextOffset }.
- QdrantClient.countPoints(collection) returns an exact count for the health strip.
- scroll, getById, deleteById, and count methods on Episodic / Semantic / Procedural stores.
- MemorySystem.scroll* / get*ById / delete* / count* facade for the handler.
Files
- public/dashboard/evolution.js
- public/dashboard/memory.js
- src/ui/api/evolution.ts
- src/ui/api/memory.ts
- src/ui/api/__tests__/evolution.test.ts
- src/ui/api/__tests__/memory.test.ts
- src/memory/qdrant-client.ts (delta)
- src/memory/__tests__/qdrant-client.test.ts (delta)
The CSS delta is larger than the target because the two tabs share several new primitives (.dash-tab-switcher, .dash-session-pill, .dash-chip) that are genuinely reusable, and .dash-timeline* / .dash-diff* / .dash-memory-* / .dash-split-pane are all first-class layouts with responsive rules. Under the 280 CSS ceiling individually nothing fits, so the blocks got promoted to shared territory.
JS modules both under their individual ceilings (700 / 600). Backend handlers both well under ceiling (350 / 400).
Test plan
- bun test src/memory/__tests__/qdrant-client.test.ts - 17 pass (8 new scroll/count tests, 9 prior)
- bun test src/ui/api/__tests__/evolution.test.ts - 20 pass
- bun test src/ui/api/__tests__/memory.test.ts - 23 pass
- bun test full suite - 1744 pass, 0 new failures. One pre-existing flake in cost.test.ts unrelated to this PR (date-boundary test, reproducible on main).
- bun run lint - clean
- bun run typecheck - clean
- ctx.esc() (inline) or textContent (multi-KB previews inside <pre>). innerHTML is only used to stamp trusted template strings.
- ctx.openModal's handler.
- #/sessions/<url-encoded-key> on both tabs.
- var(--color-*) tokens.
- / focuses memory search, Enter/Space toggles evolution cards, Escape clears search.
What's deferred
Phase B for Evolution: writeSnapshot / readSnapshot / rollbackTo in versioning.ts, POST /ui/api/evolution/rollback endpoint, rollback confirmation modal in the UI. Keeping them in a separate PR because they cross the "dashboard changes files on disk" boundary and deserve their own review pass.
CSS classes .dash-sidebar-item-soon and .dash-sidebar-soon-pill are now unused in markup but retained in dashboard.css in case future tabs need the soon-label affordance.
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com