Semantic memory and code intelligence as an MCP plugin for Claude Code agents. 9 tools that give Claude persistent memory, semantic code search, import graph traversal, and symbol-level navigation — all running locally.
| Tool | Description |
|---|---|
| `recall(query)` | Semantic search across stored memories |
| `remember(content)` | Store a memory with type / scope / tags / importance |
| `search_code(query)` | Hybrid RAG over the indexed codebase (4 modes, reranker, name filter) |
| `find_usages(symbol_id)` | Find callers/references of a symbol (lexical + semantic, self-excluded) |
| `get_file_context(file_path)` | Read a file and list indexed symbols with UUIDs for `find_usages` |
| `get_dependencies(file_path)` | Import graph traversal (forward / reverse / transitive) |
| `project_overview()` | 3-level directory tree, entry points, top imports |
| `forget(memory_id)` | Delete a memory permanently |
| `stats()` | Memory and index statistics |
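Since the server speaks MCP over stdio, each tool call arrives as a JSON-RPC `tools/call` request. As an illustrative sketch (envelope shape per the MCP spec; the tool name and the `query` argument come from the table above — this is not the plugin's internal code):

```typescript
// The JSON-RPC envelope a client sends over stdio for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// e.g. a recall() call as it would appear on the wire
const req = makeToolCall(1, "recall", { query: "auth bug last week" });
console.log(JSON.stringify(req));
```

Claude Code constructs these envelopes for you; they are shown here only to make the stdio transport concrete.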
- Qdrant — vector database (Rust, production-ready)
- Ollama — local embeddings (`embeddinggemma:300m`)
- tree-sitter — multi-language code parser (TypeScript, JavaScript, Go, Rust)
- MCP — Model Context Protocol (stdio transport)
Install: https://ollama.com/download

```shell
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS — download the app from:
# https://ollama.com/download/mac

# Windows — download the installer from:
# https://ollama.com/download/windows
```

Pull the embedding model:

```shell
ollama pull embeddinggemma:300m
```

Option A — Docker Compose (recommended)
A ready-to-use `docker-compose.yml` is included in this repo:

```shell
docker compose up -d
```

Exposes ports 6333 (REST) and 6334 (gRPC). Data persists in a named volume `qdrant-data`.
Option B — Docker run

```shell
docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v qdrant-data:/qdrant/storage \
  qdrant/qdrant
```

Option C — Qdrant Cloud

https://cloud.qdrant.io/ — set `qdrant-url` in `.memory.json` to your cluster endpoint.
From npm (recommended):

```shell
npm install -g @13w/local-rag
```

From source:

```shell
git clone https://github.com/13W/local-rag.git
cd local-rag
npm install && npm run build
```

Option A — `claude mcp add` with npx (no global install needed)

Per-project (stored in `.mcp.json`, shared with the team):

```shell
claude mcp add memory -- npx -y @13w/local-rag serve --config .memory.json
```

Global — available in all projects on this machine:

```shell
claude mcp add memory -s user -- npx -y @13w/local-rag serve --config .memory.json
```

Option B — `.mcp.json` directly
```json
{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
    }
  }
}
```

Option C — After global `npm install -g`

```shell
claude mcp add memory -- local-rag serve --config .memory.json
```

Run `init` once in your project root after registering the MCP plugin.
It configures hooks and registers the MCP server. All protocol guidance is delivered via MCP server instructions on handshake — no files are written into `.claude/rules/`.

```shell
npx @13w/local-rag init

# If installed globally
local-rag init
```

Output:
```
configured .claude/settings.local.json
```
What each file does:

| File | Purpose |
|---|---|
| `settings.local.json` | Registers the MCP server and hooks in Claude Code |

Commit `.claude/settings.json` to share the MCP server configuration with your team.
Create `.memory.json` in your project root (auto-discovered if present):

```json
{
  "project-id": "my-project",
  "project-root": ".",
  "qdrant-url": "http://localhost:6333",
  "embed-provider": "ollama",
  "embed-model": "embeddinggemma:300m",
  "ollama-url": "http://localhost:11434"
}
```

| Key | Default | Description |
|---|---|---|
| `project-id` | `"default"` | Isolates memories and code index per project |
| `project-root` | config file directory | Root path for code indexing |
| `qdrant-url` | `http://localhost:6333` | Qdrant REST API URL |
| `embed-provider` | `"ollama"` | Embedding provider: `ollama`, `openai`, `voyage` |
| `embed-model` | provider default¹ | Embedding model name |
| `embed-dim` | `1024` | Embedding vector dimension |
| `embed-api-key` | `""` | API key for OpenAI / Voyage embed providers — falls back to `OPENAI_API_KEY` / `VOYAGE_API_KEY` env var |
| `embed-url` | `""` | Custom embedding API endpoint |
| `ollama-url` | `http://localhost:11434` | Ollama API URL |
| `agent-id` | `"default"` | Agent identifier (for multi-agent setups) |
| `llm-provider` | `"ollama"` | LLM provider: `ollama`, `anthropic`, `openai` |
| `llm-model` | provider default² | LLM model for reranking / description generation |
| `llm-api-key` | `""` | API key for Anthropic / OpenAI LLM providers — falls back to `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` env var |
| `llm-url` | `""` | Custom LLM API endpoint |
| `include-paths` | `[]` | Glob patterns to limit indexing scope (monorepos) |
| `generate-descriptions` | `false` | Auto-generate LLM descriptions for code chunks (slow) |
| `dashboard` | `true` | Enable the live dashboard HTTP server |
| `dashboard-port` | `0` | Dashboard HTTP port; `0` lets the OS pick a random port |
| `collection-prefix` | `""` | String prepended to all Qdrant collection names (useful on shared Qdrant instances) |
| `no-watch` | `false` | Disable automatic file re-indexing when files change (applies during `serve`) |
¹ `embed-model` defaults: `ollama` → `embeddinggemma:300m`, `openai` → `text-embedding-3-small`, `voyage` → `voyage-code-3`
² `llm-model` defaults: `ollama` → `gemma3n:e2b`, `anthropic` → `claude-haiku-4-5-20251001`, `openai` → `gpt-4o-mini`

Resolution order (highest to lowest priority): CLI flag → `.memory.json` value → environment variable → built-in default.

API key environment variables are provider-specific:

| Provider | `embed-api-key` env var | `llm-api-key` env var |
|---|---|---|
| `openai` | `OPENAI_API_KEY` | `OPENAI_API_KEY` |
| `voyage` | `VOYAGE_API_KEY` | — |
| `anthropic` | — | `ANTHROPIC_API_KEY` |

All other keys can also be passed as CLI flags (e.g. `--project-id foo`). CLI flags override config file values. `include-paths` is config-file only.
`search_code` supports four modes via the `search_mode` parameter:

| Mode | Description |
|---|---|
| `hybrid` (default) | 3-way RRF fusion: code vector + description vector + lexical text leg |
| `code` | Code vector only — exact structural similarity |
| `semantic` | Description vector only — conceptual search when you don't know the name |
| `lexical` | Text index filter — only chunks where query terms literally appear in name or content |
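The hybrid mode's 3-way RRF fusion can be sketched as follows. The constant `k = 60` is the common reciprocal-rank-fusion default and an assumption here; the plugin's exact weighting may differ:

```typescript
// Reciprocal-rank fusion: each leg contributes 1 / (k + rank + 1) per chunk,
// so a chunk ranked decently by all legs beats one that tops a single leg.
function rrfFuse(legs: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of legs) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

const fused = rrfFuse([
  ["chunkA", "chunkB", "chunkC"], // code-vector leg
  ["chunkB", "chunkA"],           // description-vector leg
  ["chunkB", "chunkC"],           // lexical leg
]);
// chunkB ranks high in all three legs, so it fuses first
```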
After vector retrieval, an optional cross-encoder pass (Xenova/bge-reranker-base) re-scores and reorders results for higher precision:

```
search_code("embedOne", rerank=true, rerank_k=50, top=5)
# Fetches 50 ANN candidates, scores all 50 with the cross-encoder, returns top 5
```
| Parameter | Default | Description |
|---|---|---|
| `rerank` | `false` | Enable cross-encoder reranking |
| `rerank_k` | `50` | ANN candidates to fetch before reranking |
| `top` | `limit` | Results to return after reranking |
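The interplay of `rerank_k` and `top` can be sketched like this (the scorer is a stand-in for the cross-encoder; the real plugin uses Xenova/bge-reranker-base):

```typescript
// Rerank flow: take rerank_k ANN candidates, re-score every one with a
// cross-encoder, and keep only the top results in the new order.
type Candidate = { id: string; annScore: number };

function rerank(
  candidates: Candidate[],          // the rerank_k ANN hits
  score: (c: Candidate) => number,  // cross-encoder relevance score
  top: number,                      // results to return
): Candidate[] {
  return candidates
    .map((c) => ({ c, s: score(c) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, top)
    .map(({ c }) => c);
}

const hits: Candidate[] = [
  { id: "a", annScore: 0.9 },
  { id: "b", annScore: 0.8 },
  { id: "c", annScore: 0.7 },
];
// The cross-encoder disagrees with ANN order and ranks "c" most relevant:
const top2 = rerank(hits, (c) => (c.id === "c" ? 1.0 : 0.1), 2);
// → "c" is promoted to first place
```

This is why a larger `rerank_k` can surface results the ANN pass alone would have buried below the cutoff.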
```
search_code("embed vector", name_pattern="embed")
# Only returns chunks whose name contains "embed" (prefix-tokenized index)
```
Symbol UUIDs surfaced by `search_code` and `get_file_context` feed directly into the symbol tools:

```
# From search
search_code("parse imports typescript")
# → id: abc-123-... file: src/parser.ts name: extractImports

# From file listing
get_file_context("src/parser.ts")
# → function extractImports (lines 248–264) id: abc-123-...

# Find all callers / references
find_usages("abc-123-...", limit=20)
# Returns [lexical] hits (literal name match) + [semantic] hits (conceptual match), self-excluded
```
Before `search_code` and `get_file_context` can return results, index the project:

```shell
# Index once
npx @13w/local-rag index . --config .memory.json

# Watch mode — re-indexes on file changes
npx @13w/local-rag watch . --config .memory.json

# If installed globally
local-rag index . --config .memory.json
local-rag watch . --config .memory.json
```

Other indexer commands:

```shell
local-rag clear --config .memory.json    # remove all indexed chunks
local-rag stats --config .memory.json    # show collection statistics
local-rag file <abs-path> <root>         # index a single file
local-rag repair . --config .memory.json # fix empty symbol names (payload-only, no re-embedding)
```

`repair` is useful after updating to a version with improved parser extraction logic: it patches only the `name` field for affected chunks without regenerating embeddings or descriptions.
`local-rag serve` automatically opens a browser dashboard on a local HTTP port.
It displays real-time tool call statistics (calls, bytes, latency, errors per tool),
a scrolling request log, a server info bar (project, branch, version, watch status),
and an interactive tool playground for testing calls manually.
The port is OS-assigned by default (printed to stderr as `[dashboard] http://localhost:PORT`).
To use a fixed port or disable the dashboard:

```json
{ "dashboard-port": 4242 }
```

```json
{ "dashboard": false }
```

| Type | Use for | Decay |
|---|---|---|
| `episodic` | Events, bugs, incidents | Time-decayed |
| `semantic` | Facts, architecture, decisions | Long-lived |
| `procedural` | Patterns, conventions, how-to | Long-lived |
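Time decay for episodic memories might look roughly like the sketch below. Both the exponential curve and the 30-day half-life are assumptions for illustration; the table above only states that episodic memories decay while the other types are long-lived:

```typescript
// Decay-weighted recall score: episodic memories fade with age,
// semantic and procedural memories keep their raw similarity.
function decayedScore(
  similarity: number, // raw vector similarity from recall()
  ageDays: number,
  type: "episodic" | "semantic" | "procedural",
  halfLifeDays = 30,  // assumed half-life, not a documented value
): number {
  if (type !== "episodic") return similarity; // long-lived: no decay
  return similarity * Math.pow(0.5, ageDays / halfLifeDays);
}

// A month-old bug report scores half as high as a fresh one:
const fresh = decayedScore(0.9, 0, "episodic");   // → 0.9
const stale = decayedScore(0.9, 30, "episodic");  // → 0.45
```

The practical takeaway is the same either way: store durable facts and conventions as `semantic` or `procedural` so they keep ranking well, and reserve `episodic` for events whose relevance should fade.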
Run `local-rag init` (see Agent workflow setup) to install the full
RECALL → SEARCH_CODE → THINK → ACT → REMEMBER protocol into your project.
The hooks fire automatically — no manual prompting required.