feat(embeddings): nomic daemon + hybrid LIKE/semantic grep (LoCoMo parity) #71
Open
Introduce a long-lived embedding daemon backed by @huggingface/transformers (nomic-embed-text-v1.5) that plugin hooks and the virtual shell can call over a per-user Unix socket. Hooks run as one-shot subprocesses, so loading the model per invocation would add ~600 ms cold-start and ~200 MB RAM to every tool call; the daemon keeps the model resident and replies in ~15 ms.

Components:
- protocol.ts: JSON-line request/response types, socket/pid path helpers
- nomic.ts: thin wrapper around the pipeline with Matryoshka-style truncation and the search_document / search_query prefix rules
- daemon.ts: net.createServer on /tmp/hivemind-embed-<uid>.sock, idle auto-shutdown (15 min default), warmup-on-start, graceful SIGINT/SIGTERM handling, pidfile overwritten early so the client's spawn-lock stays valid
- client.ts: fire-and-forget connect; the first caller wins an O_EXCL pidfile lock and spawns the daemon detached, the rest just poll the socket. Writes its own pid first so concurrent clients see a live owner during the start-up window; the daemon overwrites it once it's listening. embed() returns null on any failure so hook callers can degrade to a no-embedding INSERT instead of blocking the write path
- sql.ts: embeddingSqlLiteral() emits ARRAY[...]::float4[] or NULL

Socket and pidfile live under /tmp with 0600 perms so only the owning user can talk to them. Kill-switches via HIVEMIND_EMBED_* env vars.
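The JSON-line framing between client.ts and daemon.ts can be sketched as below. This is a minimal illustration, not the real protocol.ts: the `{op, text}` request and `{embedding}/{error}` reply field names are assumptions, but the framing rules (one JSON object per newline-terminated line, empty lines ignored, malformed JSON survived) follow the commit description.

```typescript
// Illustrative request/reply shapes; the real types live in protocol.ts.
interface EmbedRequest { op: "ping" | "embed"; text?: string; }
interface EmbedReply { ok?: boolean; embedding?: number[]; error?: string; }

function encodeLine(msg: EmbedRequest | EmbedReply): string {
  return JSON.stringify(msg) + "\n"; // one JSON object per newline-terminated line
}

// Accumulate a socket's byte stream and yield every complete JSON line,
// skipping empty-line frames and surviving malformed JSON.
// Returns the trailing partial line as `rest` for the next chunk.
function decodeLines(buffer: string, chunk: string): { rest: string; msgs: unknown[] } {
  const parts = (buffer + chunk).split("\n");
  const rest = parts.pop() ?? "";
  const msgs: unknown[] = [];
  for (const line of parts) {
    if (!line.trim()) continue;                  // empty-line framing: ignore
    try { msgs.push(JSON.parse(line)); } catch { /* malformed JSON: survive */ }
  }
  return { rest, msgs };
}
```

The buffer-and-split loop is what lets the daemon tolerate partial writes and abrupt client disconnects without corrupting later frames.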
Pins @huggingface/transformers ^3.0.0 (resolves to 3.8.1) in dependencies and registers src/embeddings/daemon.ts as a new esbuild entry point for both the Claude Code and Codex bundles, outputting to bundle/embeddings/embed-daemon.js. The daemon imports transformers + onnxruntime-node dynamically, so both are marked external in the esbuild config (the native .node binaries can't be inlined). Consumers of the plugin need these installed alongside the bundle; without them the daemon fails to start and the client gracefully degrades to no-embedding writes.
Extends the ensureTable / ensureSessionsTable DDL with two new nullable FLOAT4[] columns: summary_embedding on memory (768-dim when populated) and message_embedding on sessions. Rows without an embedding keep NULL, so the column is zero-cost for callers that don't ingest through the new path. Stored as FLOAT4[] rather than a serialized TEXT/JSON blob: Deeplake's native vector type gives us the <#> cosine operator on the column (verified on the test workspace; returns top-K in a single SQL round-trip) plus ~5x less storage than JSON-encoded vectors. A 768-dim embedding is ~3 KB binary vs ~16 KB as JSON text. A test asserts the schema literal for both tables so we catch accidental drops or type drift early.
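The described behaviour of sql.ts's embeddingSqlLiteral() can be sketched as follows. The exact number formatting is an assumption; the two documented outputs (a typed float4[] literal, or bare SQL NULL so a missed embedding never blocks the INSERT) are from the commit.

```typescript
// Sketch of embeddingSqlLiteral(): a real vector becomes a typed float4[]
// array literal; null/empty becomes bare SQL NULL so the write path never blocks.
function embeddingSqlLiteral(embedding: number[] | null | undefined): string {
  if (!embedding || embedding.length === 0) return "NULL";
  return `ARRAY[${embedding.join(",")}]::float4[]`;
}
```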
Captures each session event through EmbedClient before the direct SQL INSERT into the sessions table. Embedding is best-effort: the client returns null on daemon miss/timeout and the write falls back to NULL in the message_embedding column. A missing embedding never blocks the capture path. The client is instantiated fresh per hook invocation and reuses /tmp/hivemind-embed-<uid>.sock via the spawn-lock in client.ts, so concurrent tool calls don't race-spawn multiple daemons. Test mocks EmbedClient with a Promise.resolve(null) stub so existing SQL-shape assertions keep passing without needing the daemon running during unit tests.
…ndex.md
Three related changes landed together because they all touch the same
DeeplakeFs flow:
1. Embed in _doFlush: before the parallel upsertRow pass, batch-compute
embeddings for every pending row via EmbedClient. If the daemon
isn't up, null embeddings are used — UPDATE / INSERT still fire
with embedding=NULL and the row keeps the summary column intact.
2. Virtual index.md now has `## memory` and `## sessions` subsections
instead of one merged table. Previously generateVirtualIndex queried
only the memory table for /summaries/%; with memory empty (e.g. the
"sessions only" ingest layout) the index came back as a headers-only
table and Claude sometimes refused to search at all. The new
implementation pulls the sessions section directly from the sessions
table with a GROUP BY path + MAX(description) query, so the index is always
populated from whatever the workspace actually contains.
3. normalizeContent gains a branch for the single-turn JSONB shape
`{turn: {dia_id, speaker, text}}` used by the per-row per-turn
ingestion layout (workspace with_embedding_multi_rows). Emits the
same `[Dx:y] speaker: text` line the array path already produces
so grep / Read output is identical across layouts.
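The normalizeContent branch for the two layouts can be sketched like this. The `dia_id` / `speaker` / `text` field names and the `[Dx:y] speaker: text` output shape come from the commit message; the helper name and surrounding types are illustrative.

```typescript
interface Turn { dia_id: string; speaker: string; text: string; }
// Array layout ({turns: [...]}) vs single-turn JSONB layout ({turn: {...}}).
type Content = { turns: Turn[] } | { turn: Turn };

// Both layouts emit the identical "[Dx:y] speaker: text" line, so grep / Read
// output is the same regardless of ingestion layout.
function turnLines(content: Content): string[] {
  const turns = "turn" in content ? [content.turn] : content.turns;
  return turns.map((t) => `[${t.dia_id}] ${t.speaker}: ${t.text}`);
}
```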
Tests updated for the new index shape (assert presence of `## memory`
and `## sessions` headers) and the INSERT/UPDATE SQL parsers now also
accept unquoted NULL and `ARRAY[...]::float4[]` literals so the
positional value extraction stays aligned after schema changes.
Core retrieval upgrade. searchDeeplakeTables() now runs a single UNION ALL
query across four sub-queries:
- memory.summary::text ILIKE (lexical, score=1.0 sentinel)
- sessions.message::text ILIKE (lexical, score=1.0 sentinel)
- memory.summary_embedding <#> ARRAY[...] (cosine, raw score)
- sessions.message_embedding <#> ARRAY[...] (cosine, raw score)
Results dedup by path in the outer layer, ORDER BY score DESC keeps the
exact-substring hits at the top regardless of cosine magnitude. Lexical
(inclusive) covers "find any session mentioning X", semantic fills in
with concept hits where the literal keyword isn't present (the
"Sunflowers" vs `sunflower` case, measured win vs pure semantic).
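The four-arm UNION ALL above can be sketched as a query builder. Table and column names follow the bullets; quoting, escaping, and the score handling are simplified assumptions — the real searchDeeplakeTables() also dedups by path in an outer layer.

```typescript
// Build the hybrid lexical + semantic query: two ILIKE arms with a 1.0
// sentinel score, plus two cosine arms when a query embedding is available.
function hybridSearchSql(pattern: string, queryEmbedding: number[] | null): string {
  const like = `'%${pattern.replace(/'/g, "''")}%'`;
  const arms = [
    `SELECT path, 1.0 AS score FROM "memory" WHERE summary::text ILIKE ${like}`,
    `SELECT path, 1.0 AS score FROM "sessions" WHERE message::text ILIKE ${like}`,
  ];
  if (queryEmbedding) {
    const vec = `ARRAY[${queryEmbedding.join(",")}]::float4[]`;
    arms.push(
      `SELECT path, summary_embedding <#> ${vec} AS score FROM "memory" WHERE summary_embedding IS NOT NULL`,
      `SELECT path, message_embedding <#> ${vec} AS score FROM "sessions" WHERE message_embedding IS NOT NULL`,
    );
  }
  // The 1.0 sentinel keeps exact-substring hits above any cosine magnitude.
  return arms.join(" UNION ALL ") + " ORDER BY score DESC";
}
```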
Always-case-insensitive by default (likeOp=ILIKE): baseline Claude uses
grep -i on 26% of calls against real files, our plugin Claude used it
on 0.5% because the context injection describes `Grep pattern=...`
without flags. Defaulting to ILIKE closes that gap without asking
Claude to remember. HIVEMIND_GREP_LIKE=case-sensitive for the rare
caller that needs strict matching.
grep-direct.ts and grep-interceptor.ts now instantiate a shared
EmbedClient, embed the grep pattern with `search_query:` prefix, and
pass queryEmbedding into searchDeeplakeTables. Timeout 500ms; on
failure queryEmbedding=null and the search silently falls back to
lexical-only (no user-visible degradation).
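The 500 ms embed guard can be sketched as a race against a timer, with any failure collapsing to null so the search degrades to lexical-only. `embedFn` is an assumed stand-in for EmbedClient.embed; the real timeout plumbing may differ.

```typescript
// Race the daemon embed call against a timeout; timeout, daemon miss, or a
// rejected promise all yield null, which the caller treats as "lexical-only".
async function embedWithTimeout(
  embedFn: () => Promise<number[] | null>,
  timeoutMs = 500,
): Promise<number[] | null> {
  const timer = new Promise<null>((resolve) => setTimeout(() => resolve(null), timeoutMs));
  try {
    return await Promise.race([embedFn(), timer]);
  } catch {
    return null; // daemon down, spawn failure, malformed reply: silently degrade
  }
}
```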
normalizeContent() now inlines the session date on every turn line:
(1:56 pm on 8 May 2023) [D1:5] Caroline: I went to LGBTQ group
Previously the date was a standalone header row, stripped by the
downstream refineGrepMatches line filter. Temporal questions
("When did X?") were answering with relative phrases like "last
Friday" because the reference date was in the discarded header.
Inlining attaches the date to every line that survives the regex.
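The inline-date change amounts to prefixing the session's reference date onto the turn line so the downstream line filter can't strip it. A minimal sketch, with the output shape taken from the example above and the helper name illustrative:

```typescript
interface DatedTurn { dateTime: string; dia_id: string; speaker: string; text: string; }

// Attach the reference date to every emitted turn line, so temporal questions
// keep an absolute date even after per-line regex filtering.
function datedTurnLine(t: DatedTurn): string {
  return `(${t.dateTime}) [${t.dia_id}] ${t.speaker}: ${t.text}`;
}
```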
Kept relaxed-mode emit-all behind HIVEMIND_SEMANTIC_EMIT_ALL=true for
future per-turn experiments. Rank-based fusion and BM25 alternatives
were tried and reverted — see PR notes.
Impact on the canonical 100-QA LoCoMo subset: plugin 0.735 vs baseline
0.750 (-0.015, within LLM non-determinism), 25% cheaper ($6.65 vs
$8.94), 41% fewer output tokens, 31% fewer turns.
Product of the preceding feature commits: tsc + esbuild rerun produces the new bundle/embeddings/embed-daemon.js for both CC and Codex, plus updated bundles for capture, pre-tool-use, session-start, session-start-setup, and deeplake-shell that include the EmbedClient, hybrid grep branch, and inline-date normalizeContent.
Adds targeted tests for the nomic daemon, IPC client, hybrid grep path,
and the semantic emit-all branch in grep-core, plus per-file thresholds
in vitest.config.ts so future regressions are caught in CI.
New test files
- claude-code/tests/embeddings-daemon.test.ts (11 tests): ping, embed,
unknown op, pidfile content, stale-socket unlink, idle-timeout-triggered
shutdown, malformed-JSON survival, dispatch-error -> { error } reply,
default options, empty-line framing, abrupt client disconnect.
- claude-code/tests/embeddings-nomic.test.ts (12 tests): lazy load
memoization, document/query prefixing, batching, empty batch, Matryoshka
truncation with renormalization, zero-norm fallback, default repo/dtype/
dims, and concurrent load() coalescing.
Extended tests
- embeddings-client.test.ts: stale-pid cleanup, alive-pid preservation,
garbage-pid cleanup, socket reset mid-request, malformed JSON, request
timeout, getEmbedClient() singleton, default options, default 'kind'
argument, HIVEMIND_EMBED_DAEMON env fallback, successful auto-spawn via
fake daemon entry.
- grep-interceptor.test.ts: semantic-friendly pattern passes embedding
into searchDeeplakeTables; regex-heavy / too-short patterns skip
embedding; embed() rejection falls back to lexical; lexical retry when
semantic returns zero rows; emit-all-lines branch; SEMANTIC_EMIT_ALL
opt-out; Promise.race 3s timeout rejector via fake timers.
- grep-core.test.ts: grepBothTables emits every non-empty line when a
queryEmbedding is present; refinement still runs when SEMANTIC_EMIT_ALL
is disabled.
Source tweak
- daemon.ts: marks the CLI-entrypoint block with /* v8 ignore start/stop
*/. The invokedDirectly bootstrap only fires when the file is node's
argv[1], which unit tests can't reproduce without forking a subprocess.
Config
- vitest.config.ts: adds per-file thresholds for src/embeddings/*.ts.
Lines/statements are held at 90 for every embeddings file; branches
and functions dip to 80/75 only on client.ts and daemon.ts where a
small number of paths (SIGINT/SIGTERM handlers, non-Linux getuid
fallback, server 'error' handler) cannot be exercised from unit tests.
Resulting per-file coverage
- client.ts 95.9 / 85.1 / 95.23 / 96.29
- daemon.ts 94.87 / 77.77 / 78.94 / 100
- nomic.ts 96.22 / 92 / 100 / 100
- protocol.ts 100 / 100 / 100 / 100
- sql.ts 100 / 100 / 100 / 100
- grep-core.ts 96.79 / 91.5 / 97.22 / 100
- grep-interceptor 97.5 / 92.1 / 94.11 / 100
All 933 tests pass; no threshold errors.
The Deeplake SQL backend returns NULL for `SUM(size_bytes) GROUP BY path`
even when each row's size_bytes is a positive integer. Reproducible
against workspace `with_embedding` on the `sessions` table:
SELECT MIN(size_bytes), MAX(size_bytes), COUNT(*) FROM "sessions"
-> min=2284, max=9266, count=272 (OK)
SELECT path, size_bytes FROM "sessions" LIMIT 1
-> size_bytes=3238 (OK)
SELECT path, SUM(size_bytes) FROM "sessions" GROUP BY path
-> sum=null for every row (BUG)
The bootstrap path for the sessions table uses that aggregation to fill
per-file metadata. With SUM broken, every file's size was set to 0 in
the virtual FS, and `ls -la` / `stat` returned `Size: 0` — enough for
agents doing exploratory `ls` to conclude the memory was empty and give
up. `cat` / Read still worked because they go through a different query.
Switching to MAX side-steps the backend bug. For single-row-per-file
layouts (like `with_embedding`) MAX and SUM are identical. For
multi-row-per-turn layouts (like `with_embedding_multi_rows`) MAX
under-reports total size but stays strictly > 0, which is what the ls
metadata needs. A comment on the line explains the rationale so the
next reader doesn't "fix" it back to SUM.
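The workaround reduces to one aggregate swap in the bootstrap query. A sketch (the query text is illustrative; the real bootstrap SQL lives in the sessions-table path):

```typescript
// Per-file metadata query for the virtual FS bootstrap.
function fileSizeByPathSql(table: string): string {
  // MAX, not SUM: SUM(size_bytes) GROUP BY path returns NULL on this backend.
  // MAX equals SUM for one-row-per-file layouts and stays strictly > 0 for
  // multi-row-per-turn layouts, which is all the ls/stat metadata needs.
  return `SELECT path, MAX(size_bytes) FROM "${table}" GROUP BY path`;
}
```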
Bundles regenerated.
… limits
The previous SessionStart context told the model to "Only use bash
commands (cat, ls, grep, echo, jq, head, tail, etc.) to interact with
~/.deeplake/memory/". That instruction explicitly steered away from the
Grep tool, which is the one path that actually uses the hybrid
semantic+literal retrieval. Agents ended up doing `for f in *.json; do
grep ... $f; done`, hitting the 10 MB bash output cap, or using
unsupported brace expansions like `{1..20}` and silently getting empty
loops.
Rewrite the SEARCH section to:
- explicitly prefer the Grep tool over bash grep for memory paths,
- show two good patterns (descriptive phrases, not single keywords, so
the semantic layer is useful),
- flag the bash for-loop anti-pattern by name.
Rewrite the follow-up bullet that used to forbid non-bash interpreters
to instead tell the model to use bash cat/head/tail on SPECIFIC files
returned by Grep, and to avoid `{a..b}` brace expansions (the virtual
shell doesn't fully support them). The no-python rule is preserved.
Observed on the 50-QA locomo benchmark after this change: bash error
rate roughly halved, number of bash calls dropped ~12%, and — in one
of two sampled runs — overall accuracy hit a new high. With n=2 the
mean shift is not statistically significant on its own, but the
behavioural signal (fewer wasteful shell loops, more focused queries)
is consistent and desirable regardless.
…TE opt-out

Two changes to SessionStart that surfaced during benchmark diagnosis.

1. Revert the "prefer the Grep tool over bash grep" block added in c36bac0. The bundled PreToolUse hook's Grep interceptor returns `updatedInput: {command, description}` (the Bash tool input shape), but Claude Code ≥ 2.1.117 does not accept tool substitution via `updatedInput`. When the originating tool is Grep, Claude Code ignores the shape mismatch and runs native Grep against the virtual memory path, which fails with `Path does not exist`. Steering agents toward the Grep tool therefore triggered an 80% failure rate on any session that took the hint. Measured impact on the combined 100-QA locomo subset: 0.735 (old prompt) -> 0.480 (new prompt, broken Grep). Restoring "Only use bash commands" sends agents back to the Bash intercept path, which has a matching schema and works. Kept the two factual bullets from c36bac0 that document real virtual shell limits (10 MB bash output cap, `{a..b}` brace expansion not fully supported); those apply to Bash usage and are useful on their own. The Grep-specific steering is the only part reverted.

2. Add a `HIVEMIND_AUTOUPDATE=false` escape hatch around the version check + autoupdate block. When true (the default), behaviour is unchanged: the hook runs `claude plugin update hivemind@hivemind` across four scopes plus an `rmSync` over old cache directories every time a session starts. Under a concurrent benchmark (20 sessions) that triggers 200+ times, races with live sessions on the shared cache dir, and inflates SessionStart wall time by seconds. `HIVEMIND_AUTOUPDATE=false` short-circuits the whole block; the plugin still works normally at runtime, it just doesn't try to self-upgrade. Intended for benchmark and CI setups.
… headroom

Under 20-way concurrency the PreToolUse hook cold-starts a fresh Node process, loads config, builds a DeeplakeApi client, and issues a SQL query to intercept the tool. Measured p95 per-hook time under that load can exceed 10 s, which Claude Code treats as a cancel and falls back to the original (unintercepted) tool call. 60 s matches the timeout on other hooks (SessionEnd, the async setup job) and gives the intercept path headroom without changing steady-state behaviour.
…trap

Two test mocks were still matching the old `SUM(size_bytes)` SQL string, so the bootstrap query was silently returning an empty row list and every session path ended up absent from `sessionPaths`, which then made 16 unrelated read-only / rm-rf tests fail with ENOENT. The SQL itself was changed to MAX in 0c3a94d; this just brings the mock matchers and reducers in line with it (MAX instead of SUM per group). No production-code change, no new tests. 933/933 pass.
The env gate added in 11457e1 duplicated an existing mechanism: the `creds.autoupdate` flag stored in ~/.deeplake/credentials.json, toggled via `node auth-login.js autoupdate [on|off]`. Both short-circuit the disruptive part of the session-start autoupdate flow (the external `claude plugin update` subprocess and the `rmSync` over old cache directories). The only extra behaviour the env var provided was also skipping the version fetch to GitHub (one ~100-500 ms HTTP GET with 3 s timeout) and suppressing the "update available" stderr line. Neither justifies a second toggle with slightly different semantics. Reverting the source block and its two tests. The prompt revert and bundle regeneration from 11457e1 stay in place.
Pull in the autoupdate-session-safety fixes (plugin-cache helper + SessionEnd GC hook), the multiWordPatterns lexical fallback in grep-core, new coverage thresholds, and the main version bumps (0.6.39 → 0.6.46).

Conflict resolutions:
- package.json / package-lock.json / plugin.json / marketplace.json: kept our 0.7.0 (the embeddings minor bump) over main's 0.6.46.
- src/shell/grep-core.ts: kept BOTH bm25Term (ours) and multiWordPatterns (main) as independent fields on SearchOptions. They target different failure modes: bm25Term feeds Deeplake's <#> TEXT ranker, multiWordPatterns splits the pattern for per-word OR prefiltering. Neither conflicts with the other at the type or SQL level.
- vitest.config.ts: concatenated both sides' per-file coverage threshold blocks verbatim (embeddings/* + pre-tool-use + memory-path-utils + plugin-cache + session-start(-setup)).
- Bundle files (claude-code/**, codex/**): regenerated via `npm run build` after source conflicts were resolved.

Tests: 1104 / 1104 passing post-merge (was 933 on the branch; main added 171 new tests spanning the config / debug / plugin-cache / pre-tool-use / session-start-setup branches).

Drive-by: killed a stray nomic embed-daemon from an earlier benchmark run that was causing grep-direct.test.ts "delegates to grepBothTables" to flake. When the daemon is up, `EmbedClient.embed()` returns a real vector and the test's output goes through the semantic-emit-all-lines path instead of the lexical refine path it asserts on. Not the merge's fault, but surfaced by the post-merge full run.
The async SessionStart setup hook now fires EmbedClient.warmup() as its last step. warmup() either connects to an existing embed-daemon socket or spawns a fresh detached process; the daemon then calls NomicEmbedder.load() in the background, which triggers the one-time nomic-embed-text-v1.5 download to ~/.cache/huggingface/hub/ (~130 MB at q8, ~500 MB at fp32) on first run and keeps the model resident for the lifetime of the process.

Previously the model only downloaded on the first Grep call, which meant every new install paid 30-90 s of latency on the first semantic retrieval. Doing it here instead hides that cold-start behind the async SessionStart (120 s timeout), so the user only sees it if they happen to fire a Grep before the async hook finishes the download. Everyone else gets an already-loaded daemon on first use. Behaviour is opt-out via HIVEMIND_EMBED_WARMUP=false for sessions that will never touch the memory path (CI, lightweight CC runs with no network), which logs the skip and moves on. warmup() swallows errors so a broken daemon path never breaks SessionStart.

Tests:
- session-start-setup-hook.test.ts: mocks EmbedClient so warmup() doesn't actually spawn a process; four new cases cover the ok / failed / threw / env-disabled branches
- session-start-setup-branches.test.ts: same mock so the existing branch-coverage suite stays deterministic
- grep-direct.test.ts: mocks EmbedClient.embed to always return null. Without this, grep-direct.test.ts was race-flaky: if any other test or prior run had spawned the daemon, the semantic branch in handleGrepDirect would fire and change the output shape, breaking every line-oriented assertion in this file. With the mock the lexical refine path runs deterministically regardless of whether a daemon is up outside the test process.

Coverage: src/hooks/session-start-setup.ts → 100/100/100/100. All per-file thresholds still pass. 1108 tests green.
Coverage Report — scope: files changed in this PR. Enforced threshold: 90% per metric, per file.
File Coverage — 17 files changed. Generated for commit 9d37091.
…ema auto-migrate

The existing opt-out story was scattered across three independent flags: HIVEMIND_SEMANTIC_SEARCH=false (query-time), HIVEMIND_EMBED_WARMUP=false (session-start spawn), and HIVEMIND_CAPTURE=false (write path, but that takes out capture entirely, not just the embed call inside it). There was no single lever to say "I want the plugin without the embedding feature at all, don't spawn the daemon, don't download the model".

Adds one: HIVEMIND_EMBEDDINGS=false short-circuits every call site that would otherwise talk to the nomic daemon:
- src/hooks/grep-direct.ts (query-time embed for the Grep tool)
- src/shell/grep-interceptor.ts (query-time embed for bash grep)
- src/hooks/capture.ts (write-time embed before INSERT)
- src/shell/deeplake-fs.ts (batched write-time embed in _doFlush)
- src/hooks/session-start-setup.ts (SessionStart daemon warmup)

The two per-feature flags keep working; HIVEMIND_EMBEDDINGS=false is the superset that kills all of them. Writes still succeed (the embedding columns land as NULL), so toggling the flag is reversible without rewriting existing rows.

Schema migration: paired with this, ensureTable and ensureSessionsTable now issue ALTER TABLE ... ADD COLUMN IF NOT EXISTS for summary_embedding / message_embedding on tables that existed before the embeddings feature shipped. Wrapped in try/catch so backends that don't support ADD COLUMN IF NOT EXISTS (older Deeplake snapshots) log the skip and carry on; the write path already tolerates the column being absent. Users upgrading from 0.6.x pick the column up automatically on their next SessionStart without having to re-ingest.
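The master flag and the tolerant migration can be sketched as below. `runSql` is an assumed stand-in for the real Deeplake API client, and the sketch takes the env map as a parameter where the real helper presumably reads process.env; the semantics (only the literal string "false" disables, ALTER failures are swallowed) follow the commit.

```typescript
// Master kill-switch: only HIVEMIND_EMBEDDINGS="false" disables embeddings.
function embeddingsDisabled(env: Record<string, string | undefined> = {}): boolean {
  return env.HIVEMIND_EMBEDDINGS === "false";
}

// Tolerant column migration: older backends without ADD COLUMN IF NOT EXISTS
// throw; we swallow the error because the write path tolerates a missing column.
async function addEmbeddingColumn(
  runSql: (sql: string) => Promise<void>,
  table: string,
  column: string,
): Promise<boolean> {
  try {
    await runSql(`ALTER TABLE "${table}" ADD COLUMN IF NOT EXISTS ${column} FLOAT4[]`);
    return true;
  } catch {
    return false; // log-and-skip on older Deeplake snapshots
  }
}
```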
Tests:
- claude-code/tests/embeddings-disable.test.ts: unit test for the embeddingsDisabled() helper (default false, "false" → true, other strings stay false)
- session-start-setup-hook.test.ts: new case for the master flag (alongside the existing HIVEMIND_EMBED_WARMUP case)
- deeplake-api.test.ts: rewrote the "table already exists" / "lookup-index already set up" cases to expect the new ALTER calls, plus a dedicated assertion that ALTER failures are swallowed so older backends keep working

All 1113 tests pass. Per-file coverage thresholds unchanged.
uploadSummary() was the last write path into the memory table that left summary_embedding = NULL. The DeeplakeFs-backed flush already embedded every row it touched, capture.ts already embedded every message, but the wiki-worker's final summary (the long, purpose-built wiki-style text that actually ought to be semantically retrievable) was going to Deeplake with no embedding at all. As a result summaries were only reachable from the lexical branch of the hybrid grep, never from the cosine branch.

Changes:
- `uploadSummary()` now takes an optional `embedding: number[] | null` on UploadParams and threads it into both the UPDATE and the INSERT, serialized through `embeddingSqlLiteral()` so the literal is either `ARRAY[...]::float4[]` or bare SQL `NULL`. The column is kept in the same statement as `summary` / `description` (the single-UPDATE invariant from the module docstring still holds; see `deeplake-update-bug-repro.py`).
- Both `src/hooks/wiki-worker.ts` and `src/hooks/codex/wiki-worker.ts` call EmbedClient.embed(text, "document") right before uploadSummary, gated by `embeddingsDisabled()` and wrapped in try/catch. On any failure (daemon down, `HIVEMIND_EMBEDDINGS=false`, spawn fails) the summary still lands, just with NULL in the embedding column, so existing callers keep working and the row stays reachable via the lexical branch.

Retrieval already uses it: `searchDeeplakeTables` in grep-core already joins memory.summary_embedding against the query vector when one is present, gated by `WHERE summary_embedding IS NOT NULL`. No changes needed there.

Existing pre-embedding summaries (older rows) still have NULL in the column. They stay retrievable lexically; a one-shot back-fill script to compute embeddings for the existing backlog is left as a separate change so the first-principles write path lands cleanly here.
Tests:
- 5 new cases in upload-summary.test.ts covering the ARRAY literal on UPDATE and INSERT, bare SQL NULL when the caller omits the embedding, explicit null, and the empty-array "daemon returned nothing" degenerate case. The existing "single UPDATE invariant" assertions still pass: summary, summary_embedding, size_bytes and description are all in the same statement.
- wiki-worker.test.ts and codex-wiki-worker.test.ts now mock EmbedClient so the EmbedClient import doesn't try to reach a real socket during unit tests; the mock returns a fixed vector and the existing uploadSummary-call assertions pass unchanged.

1118 tests green.
Summary
Adds local semantic embeddings to the plugin's memory + session stores and to
the grep path, backed by a per-user Unix-socket daemon that holds the model
in RAM. Achieves parity with the LoCoMo baseline on the canonical 100-QA
subset (J-score 0.735 vs 0.750, within the ±0.05 Haiku noise band) while
reducing per-query cost by ~25% and output tokens by ~41%.
- Model: nomic-ai/nomic-embed-text-v1.5, q8 quantization, 768 dims, ~110 MB on disk, ~15 ms/call on CPU (measured in bench-embeddings/, see PR-NOTES).
- Daemon: /tmp/hivemind-embed-<uid>.sock, O_EXCL pidfile lock so concurrent hooks (Claude Code + Codex) don't spawn duplicates, idle-timeout shutdown (default 15 min) to free ~200 MB RAM when nothing's running.
- Storage: two FLOAT4[] columns, memory.summary_embedding and sessions.message_embedding. ARRAY[...]::float4[] literals written inline in the existing INSERT/UPDATE paths; NULL when the embedding call misses (we never block the write).
- Retrieval: hybrid lexical + semantic, with a sentinel score of 1.0 for lexical matches and dedup by path. When the daemon is down or the pattern is regex-heavy (>1 metachar) or too short (<2 chars), falls back transparently to the pure-lexical path. Retries with lexical-only when semantic returns zero rows.

Why this shape
The first iteration embedded summaries inline in hooks. That added ~600 ms cold-start per tool call — a non-starter. The daemon decouples model load from request latency: first call pays cold start, every subsequent call is ~15 ms round-trip including IPC.
Several approaches were tried and rejected before landing on hybrid LIKE+semantic — see commit 7b51043 and the PR-NOTES investigation log for the full diligence. Summary:
What did move the needle, all landed here:
- Inline dates on every turn line (`(date_time) [Dx:y] speaker: text`): biggest single jump (+0.050 J on 50 QA) because the existing grep line filter was stripping the standalone `date:` header row.
- Always-case-insensitive matching by default (ILIKE), with the HIVEMIND_GREP_LIKE=case-sensitive escape hatch.
- Virtual index.md split into ## memory + ## sessions sections so Claude sees both stores.

Benchmark — LoCoMo canonical 100 QA subset (45+55)
`Table does not exist` from the backend).

Commits
Architecture at a glance
- Hook bundles never load @huggingface/transformers at startup; it's marked external in esbuild and only loaded inside the daemon bundle.
- On a daemon miss, embed() returns null, the write proceeds with NULL in the embedding column, and a background fire-and-forget spawn warms up the daemon for the next call.
- The HIVEMIND_SEMANTIC_SEARCH=false env flag disables the semantic branch entirely; HIVEMIND_SEMANTIC_EMIT_ALL=false falls back to strict regex refinement of emitted lines.

Safety / opt-outs
- HIVEMIND_SEMANTIC_SEARCH: false to force pure-lexical grep.
- HIVEMIND_SEMANTIC_EMIT_ALL: false to re-enable regex refinement over emitted lines.
- HIVEMIND_GREP_LIKE (default ilike): case-sensitive to use LIKE instead of ILIKE.
- HIVEMIND_SEMANTIC_EMBED_TIMEOUT_MS (default 500).
- HIVEMIND_SEMANTIC_LIMIT (default 40).
- HIVEMIND_EMBED_IDLE_MS (default 900000).
- HIVEMIND_EMBED_DIMS (default 768).
- HIVEMIND_EMBED_DAEMON.
- HIVEMIND_EMBED_WARMUP: false to skip the SessionStart daemon warm-up (benchmarks, CI, no-network runs).
- HIVEMIND_AUTOUPDATE (via creds.autoupdate): creds.autoupdate=false (node auth-login.js autoupdate off) to skip the claude plugin update subprocess during SessionStart under concurrent load.

Tests + coverage
1108 tests pass (933 in the original push + 171 from the main merge + 4 new warmup tests). Per-file coverage on all touched / new files is at or above the 90% bar for statements and lines.

Per-file thresholds are added in vitest.config.ts. Branches and functions on daemon.ts and client.ts are allowed to dip slightly because a handful of paths (SIGINT/SIGTERM handlers, the non-Linux `typeof process.getuid !== "function"` fallback, the server `error` handler) can't be triggered from unit tests without forking a real subprocess; the invokedDirectly CLI block is marked /* v8 ignore */ for the same reason.

New test files:
- claude-code/tests/embeddings-daemon.test.ts — ping / embed / unknown op / pidfile / stale-socket / idle-timeout / malformed JSON / dispatch error / empty lines / abrupt disconnect.
- claude-code/tests/embeddings-nomic.test.ts — lazy load, prefixing, batching, Matryoshka, zero-norm, concurrent-load coalescing.
- Extended: embeddings-client.test.ts, grep-interceptor.test.ts, grep-core.test.ts.

Test plan
- npm test: 1108 tests green.
- npm test -- --coverage: no threshold failures.
- ls /tmp/hivemind-embed-*.sock after a session; the socket should be gone once the idle timeout hits.
- Run without @huggingface/transformers installed: hooks should transparently write NULL to the embedding column.

Updates since the initial push
This branch also picked up fixes uncovered while benchmarking on LoCoMo overnight, plus the fix/plugin-autoupdate-session-safety work that landed on main after this PR was opened. All new test files from main are green; no existing assertions were loosened.

Merged main → this branch (commit 0f634c7): brings in snapshot/restore around claude plugin update, the SessionEnd GC hook (plugin-cache-gc.js), the plugin-cache helper, and the multiWordPatterns lexical fallback in grep-core.ts. Kept side-by-side with bm25Term on SearchOptions; they target different failure modes.
New commits beyond the initial 8:
Notable behavioural changes on top of the initial description:
- SUM-aggregation workaround (0c3a94d): SUM(size_bytes) GROUP BY path returns NULL on the Deeplake backend against workspace with_embedding, even when each row's size_bytes is a positive integer. The sessions bootstrap used that aggregation, so every file showed Size: 0 in ls/stat, which made exploratory agents conclude the memory was empty and give up. MAX(size_bytes) sidesteps the quirk; for the single-row-per-file layout used in with_embedding it's equal to SUM.
- PreToolUse hook timeout (6b6cf26): 10 s → 60 s. Under 20-way concurrency the cold-start of a fresh Node subprocess plus the bootstrap SQL query can exceed 10 s, which Claude Code treats as a cancel and silently falls back to the original (unintercepted) tool call.
- Grep-steering prompt revert (c36bac0 + 11457e1): a prompt change that told agents to prefer the native Grep tool surfaced a latent bug. The hook's updatedInput: {command, description} shape (Bash-tool schema) isn't accepted by Claude Code ≥ 2.1.117 as a substitute for Grep's {pattern, path, …} schema, so Claude fell back to native Grep against the virtual memory path and failed with Path does not exist. Reverted the prompt; bash-grep via the virtual shell intercept remains the supported path. Documented as P11 in PR-NOTES for a follow-up PR.
- Daemon warmup at SessionStart (3979d09): previously the nomic model download (~110 MB q8) was paid on the first semantic grep; the async session-start-setup hook now calls EmbedClient.warmup(), which spawns the daemon and fires NomicEmbedder.load() in the background. First-Grep latency drops from 30–90 s to ~15 ms on a cold install; opt-out via HIVEMIND_EMBED_WARMUP=false.
HIVEMIND_EMBED_WARMUP=false.Benchmark — updated with variance (same
subset_combined_100):Re-ran 10 + 1 bench configs overnight on the same 100-QA subset. Haiku
per-run stdev measured at ~5 pp on 50-QA slices; single-point comparisons
at 50-QA scale have an IC95 of roughly ±10 pp. The "0.735 vs 0.750 on
100 QA" number in the original table is still the best available
apples-to-apples single-run result (
plugin-100-REVERT-*atJ = 0.73, baseline at J = 0.75); on the 50-QA halves:
date_50remaining_50Full run-by-run breakdown is in
TODO.md/PR-NOTES.mdalongside thescoreboard of every ablation tested tonight.
Version bump: package.json goes from 0.6.38 to 0.7.0 (minor bump for the embeddings feature). When this PR lands on main, the existing release.yml does a patch bump, so the first release tag will be v0.7.1.