36 commits
f4f2ef2
feat(ai): add memory types
AlemTuzlak May 10, 2026
fca4624
feat(ai): add memory helper functions
AlemTuzlak May 10, 2026
42904a2
feat(ai): expose @tanstack/ai/memory subpath
AlemTuzlak May 10, 2026
474eb4a
test(ai): add failing memory middleware tests
AlemTuzlak May 10, 2026
397098c
feat(ai): add memoryMiddleware
AlemTuzlak May 10, 2026
c60faa0
fix(ai): tighten memory middleware test types for noUncheckedIndexedA…
AlemTuzlak May 10, 2026
f9945a7
feat(ai-event-client): add memory devtools events
AlemTuzlak May 10, 2026
c88c65d
feat(ai): emit memory devtools events from middleware
AlemTuzlak May 10, 2026
ab7dc97
feat(ai-memory): scaffold new package
AlemTuzlak May 10, 2026
01ba8a8
test(ai-memory): add shared adapter contract suite
AlemTuzlak May 10, 2026
72cc2b6
feat(ai-memory): add inMemoryMemoryAdapter
AlemTuzlak May 10, 2026
40be462
fix(ai-memory): tighten in-memory adapter lint compliance
AlemTuzlak May 10, 2026
055cd50
feat(ai-memory): add redisMemoryAdapter
AlemTuzlak May 10, 2026
d6df979
docs(ai): add tanstack-ai-memory skill
AlemTuzlak May 10, 2026
e0913b2
docs(ai-memory): add in-memory adapter skill
AlemTuzlak May 10, 2026
32b15de
docs(ai-memory): add redis adapter skill
AlemTuzlak May 10, 2026
74e7136
docs: add memory middleware concept and quickstart pages
AlemTuzlak May 10, 2026
1dd988a
chore: changeset for memory middleware
AlemTuzlak May 10, 2026
c93e7f6
chore: final formatting
AlemTuzlak May 10, 2026
ecd38ac
fix(ai, ai-memory): clean up lint and knip findings
AlemTuzlak May 10, 2026
d1fb337
fix(ai, ai-memory): address whole-feature audit findings
AlemTuzlak May 10, 2026
6576f7c
ci: apply automated fixes
autofix-ci[bot] May 10, 2026
54bec71
fix(ai): address CR Round 1 core middleware findings
AlemTuzlak May 10, 2026
2c3588c
fix(ai-memory): redis adapter scope semantics
AlemTuzlak May 10, 2026
5600b3b
feat(ai-memory): nodeRedisAsRedisLike helper for node-redis v4+
AlemTuzlak May 10, 2026
2eb1425
test(ai, ai-memory): tighten flaky and vacuous CR assertions
AlemTuzlak May 10, 2026
ac100b8
chore(ai-memory): set initial version to 0.0.0 for first publish
AlemTuzlak May 10, 2026
9fcb483
fix(ai, ai-memory): address CR Round 2 bucket-a findings
AlemTuzlak May 10, 2026
64a3872
fix(ai): close error-path observability gaps in memory middleware
AlemTuzlak May 10, 2026
59ec97e
fix(ai, ai-memory): close remaining scope-value-validation gaps
AlemTuzlak May 10, 2026
8a8d599
fix(ai-memory): escape _ in scope values to prevent placeholder colli…
AlemTuzlak May 10, 2026
94359d1
chore: refresh pnpm-lock.yaml for ai-memory ioredis peer dep
AlemTuzlak May 10, 2026
d17ae31
ci: apply automated fixes
autofix-ci[bot] May 10, 2026
ed23b50
docs: consolidate memory pages into a top-level Memory section
AlemTuzlak May 10, 2026
224f805
fix(ai, ai-memory): address CodeRabbit code review feedback
AlemTuzlak May 10, 2026
b478e8e
docs, chore: address CodeRabbit polish feedback
AlemTuzlak May 10, 2026
24 changes: 24 additions & 0 deletions .changeset/memory-middleware.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
---
'@tanstack/ai': minor
'@tanstack/ai-event-client': minor
'@tanstack/ai-memory': minor
---

**Add server-side memory support via `memoryMiddleware`.**

A new `memoryMiddleware` from `@tanstack/ai/memory` retrieves relevant memories at chat init and persists user/assistant turns + tool results at finish. The middleware injects a rendered system prompt before the model call and runs persistence via `ctx.defer` so streaming is never blocked.

`@tanstack/ai`:

- New subpath `@tanstack/ai/memory` exporting `memoryMiddleware`, the `MemoryAdapter` / `MemoryRecord` / `MemoryScope` types, the `MemoryOp` union, helpers (`scopeMatches`, `cosine`, `lexicalOverlap`, `recencyScore`, `defaultRenderMemory`, `defaultScoreHit`, `isExpired`).
- Middleware extension hooks: `shouldRetrieve`, `rerank`, `shouldRemember`, `extractMemories`, `onToolResult`, `afterPersist`, plus app-level `events.*` callbacks and a `strict` mode.

`@tanstack/ai-event-client`:

- Five new events on `AIDevtoolsEventMap`: `memory:retrieve:started`, `memory:retrieve:completed`, `memory:persist:started`, `memory:persist:completed`, `memory:error`.

`@tanstack/ai-memory` (new package):

- `inMemoryMemoryAdapter()` — zero-dep adapter for dev/tests.
- `redisMemoryAdapter({ redis, prefix? })` — production adapter for plain Redis (`redis` listed as optional peer dependency).
- Both adapters pass a shared contract suite covering scope isolation, expiry, cursor pagination, kinds filtering, lexical-only ranking, semantic ranking with embeddings, and serialization round-trip (Redis).
18 changes: 18 additions & 0 deletions docs/config.json
@@ -164,6 +164,24 @@
}
]
},
{
"label": "Middlewares",
"children": [
{
"label": "Memory",
"to": "middlewares/memory"
}
]
},
{
"label": "Guides",
"children": [
{
"label": "Memory Quickstart",
"to": "guides/memory-quickstart"
}
]
},
{
"label": "Advanced",
"children": [
136 changes: 136 additions & 0 deletions docs/guides/memory-quickstart.md
@@ -0,0 +1,136 @@
---
title: Memory Quickstart
id: memory-quickstart
order: 1
description: "Add cross-session memory to a TanStack AI chat() call in five steps — install the package, pick an adapter, wire memoryMiddleware, optionally add an embedder, and derive scope server-side."
keywords:
- tanstack ai
- memory
- quickstart
- in-memory adapter
- redis adapter
- chat middleware
---

You have a working `chat()` call and you want it to remember context across turns or sessions. By the end of this guide, you'll have `memoryMiddleware` retrieving relevant records into the prompt and persisting new turns through a real adapter, with scope derived safely from your server-validated session.

> **Want the full contract first?** See the [Memory Middleware](../middlewares/memory) concept page for the adapter interface, hooks, and devtools events.

## Step 1 — Install the package

`@tanstack/ai` is already installed. Add the adapter package:

```bash
pnpm add @tanstack/ai-memory
```

`@tanstack/ai-memory` exports the built-in `inMemoryMemoryAdapter` and `redisMemoryAdapter`. The middleware itself (`memoryMiddleware`) and the type contract (`MemoryAdapter`, `MemoryScope`, `MemoryRecord`, ...) live on the `@tanstack/ai/memory` subpath of the core package — no extra install required for those.

## Step 2 — Pick an adapter

> **In-memory** — `inMemoryMemoryAdapter()` is zero-dependency and stores records in a `Map`. Use it for local development, Vitest / Playwright tests, and single-process demos. Records vanish on process restart.

> **Redis** — `redisMemoryAdapter({ redis })` persists across restarts and shares state across processes. Use it for production. Bring your own Redis client (`ioredis`, `redis`, Upstash, ...) — the adapter is BYO-client.

Custom adapters implement the `MemoryAdapter` interface from `@tanstack/ai/memory`.

## Step 3 — Wire `memoryMiddleware` into `chat()`

Start with the in-memory adapter — it's the fastest path to a working setup:

```ts
import { chat } from '@tanstack/ai'
import { openaiText } from '@tanstack/ai-openai'
import { memoryMiddleware } from '@tanstack/ai/memory'
import { inMemoryMemoryAdapter } from '@tanstack/ai-memory'

const memory = inMemoryMemoryAdapter()

const stream = chat({
adapter: openaiText('gpt-4o'),
messages,
middleware: [
memoryMiddleware({
adapter: memory,
scope: { tenantId: 'demo', userId: 'alice' },
}),
],
})
```

That's a working setup. Each turn, the middleware retrieves relevant records into the system prompt (lexical search by default), then deferred-persists the user message and the assistant response after the stream finishes.
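
The "lexical search by default" behavior can be pictured as simple token overlap between the query and each stored record. The sketch below is illustrative only — the actual `lexicalOverlap` helper shipped in `@tanstack/ai/memory` may tokenize and weight differently:

```typescript
// Illustrative token-overlap scorer — not the library's exact algorithm.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? [])
}

function lexicalScore(query: string, recordText: string): number {
  const q = tokenize(query)
  const r = tokenize(recordText)
  if (q.size === 0) return 0
  let shared = 0
  for (const token of q) if (r.has(token)) shared++
  return shared / q.size // fraction of query tokens found in the record
}

lexicalScore('favorite color', 'The user said their favorite color is teal') // → 1
lexicalScore('billing address', 'The user likes teal') // → 0
```

This is why lexical-only retrieval works well when queries share vocabulary with stored records, and degrades when they don't — the motivation for the optional embedder in Step 4.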

When you're ready to ship, swap the adapter and keep everything else the same:

```ts
import Redis from 'ioredis'
import { redisMemoryAdapter } from '@tanstack/ai-memory'

const redis = new Redis(process.env.REDIS_URL!)
const memory = redisMemoryAdapter({ redis })

memoryMiddleware({ adapter: memory, scope })
```

## Step 4 — Add an embedder (optional)

The middleware accepts an `embedder` for semantic search. **Add one when you need it; skip it when you don't:**

- **Skip** if your scopes are small (a few hundred records per user) — lexical scoring handles this fine and there is no embedding cost or latency.
- **Add** when scopes grow large or queries don't share keywords with stored records, and your adapter supports vector search (Redis with vector ops, hosted vector DBs, custom adapters).

```ts
import { memoryMiddleware } from '@tanstack/ai/memory'

memoryMiddleware({
adapter: memory,
scope,
embedder: {
async embed(text) {
// Use any embedding model — OpenAI, Cohere, a local model, etc.
const result = await embeddings.create({ input: text })
return result.data[0].embedding
},
},
})
```

The embedder is invoked on the retrieval path (to embed the query) and may be invoked again on the persist path (to embed assistant text or extracted facts). Implementations should be idempotent.
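
Because the embedder may run more than once per turn, a small cache makes repeat calls free. This is a sketch under the assumption that `embed` takes a string and returns a number array; `rawEmbed` is a hypothetical stand-in for your provider call:

```typescript
// Hypothetical provider call — replace with OpenAI, Cohere, a local model, etc.
async function rawEmbed(text: string): Promise<number[]> {
  // Stand-in: a deterministic fake embedding, for illustration only.
  return [text.length, text.split(/\s+/).length]
}

// Memoizing wrapper: identical inputs hit the cache, so re-invocation
// on the persist path costs nothing and stays idempotent.
function memoizedEmbedder() {
  const cache = new Map<string, Promise<number[]>>()
  return {
    embed(text: string): Promise<number[]> {
      let hit = cache.get(text)
      if (!hit) {
        hit = rawEmbed(text)
        cache.set(text, hit)
      }
      return hit
    },
  }
}
```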

## Step 5 — Derive scope server-side

`scope` is the isolation boundary. Static scopes are fine for fixtures, but in any real multi-tenant app you must derive scope per request from server-validated session data — never from the request body.

```ts
import { chat } from '@tanstack/ai'
import { memoryMiddleware } from '@tanstack/ai/memory'

type AppCtx = { session: { tenantId: string; userId: string; activeThreadId: string } }

const stream = chat({
adapter: openaiText('gpt-4o'),
messages,
context: { session }, // attached by your auth middleware, not from req.body
middleware: [
memoryMiddleware({
adapter: memory,
scope: (ctx) => {
const { session } = ctx.context as AppCtx
return {
tenantId: session.tenantId,
userId: session.userId,
threadId: session.activeThreadId,
}
},
}),
],
})
```

If you accept `userId` or `tenantId` from the client, one user can read or overwrite another user's memory. The function form of `scope` is the safer default — it executes per request and sees only what your server attached to the chat context.

## Where to go next

- [Memory Middleware](../middlewares/memory) — adapter contract, hooks reference, devtools events, failure modes
- [In-memory adapter skill](https://github.com/TanStack/ai) — `tanstack-ai-memory-in-memory` (when to use, capacity limits)
- [Redis adapter skill](https://github.com/TanStack/ai) — `tanstack-ai-memory-redis` (vector search, key layout, ops)
Contributor
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace placeholder repo-root links with direct targets.

Lines 135-136 currently send users to the repository root, not the adapter skill docs.

Proposed fix
-- [In-memory adapter skill](https://github.com/TanStack/ai) — `tanstack-ai-memory-in-memory` (when to use, capacity limits)
-- [Redis adapter skill](https://github.com/TanStack/ai) — `tanstack-ai-memory-redis` (vector search, key layout, ops)
+- [In-memory adapter skill](https://github.com/TanStack/ai/blob/main/packages/typescript/ai-memory/skills/tanstack-ai-memory-in-memory/SKILL.md) — `tanstack-ai-memory-in-memory` (when to use, capacity limits)
+- [Redis adapter skill](https://github.com/TanStack/ai/blob/main/packages/typescript/ai-memory/skills/tanstack-ai-memory-redis/SKILL.md) — `tanstack-ai-memory-redis` (vector search, key layout, ops)

As per coding guidelines, "Verify documentation links are valid via pnpm test:docs command."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/memory-quickstart.md` around lines 135 - 136, The two markdown
links titled "[In-memory adapter skill]" and "[Redis adapter skill]" currently
point to the repository root; update their hrefs to the specific adapter skill
documentation pages for the packages `tanstack-ai-memory-in-memory` and
`tanstack-ai-memory-redis` respectively so the links go directly to each
adapter's docs (replace the root URLs in those link entries with the correct doc
targets) and run the pnpm test:docs validation to confirm they are valid.

170 changes: 170 additions & 0 deletions docs/middlewares/memory.md
@@ -0,0 +1,170 @@
---
title: Memory Middleware
id: memory-middleware
order: 1
description: "Persist and recall context across turns and sessions in TanStack AI — the memoryMiddleware retrieves relevant records into the prompt, then deferred-persists user, assistant, and tool turns through a pluggable adapter."
keywords:
- tanstack ai
- memory
- long-term memory
- retrieval
- persistence
- middleware
- rag
- personalization
---

`memoryMiddleware` plugs server-side memory into a `chat()` run. It retrieves relevant records from a pluggable adapter into the system prompt before the model runs, then asynchronously persists what should be remembered after the run finishes. It is the right tool when you need recall **across turns or across sessions** — not for keeping recent messages in the same request.

> **Want a copy-paste setup before reading the contract?** See the [Memory Quickstart](../guides/memory-quickstart) guide.

## When to reach for it

| Need | Use this |
|------|----------|
| "Remember what the user told me last week" | Memory middleware + persistent adapter |
| "Each tenant or user has its own context" | Memory middleware with scoped adapter calls |
| "Cache expensive tool results across requests" | Memory middleware with `onToolResult` + `kind: 'tool-result'` |
| Keep the last N turns in the same request | Just pass them in `messages` — memory is overkill |

Memory is for cross-turn / cross-session recall. The `messages` array on `chat()` already covers within-turn history.

## Adapter contract

Adapters are thin storage layers: they persist, fetch, search, and isolate by scope — they do not decide what to remember or how to render hits. Every backend exposes a `name` plus the same seven methods:

| Method | Purpose |
|--------|---------|
| `name` | Stable identifier used in logs and devtools. |
| `add(records)` | Upsert one or many records by `id`. Same id replaces. |
| `get(id, scope)` | Fetch a single record. Returns `undefined` for missing, out-of-scope, or expired records. |
| `update(id, scope, patch)` | Patch a record in place. Preserves `id`/`scope`/`createdAt`, bumps `updatedAt`. |
| `search(query)` | Relevance-ranked search within a scope. Strategy (lexical / semantic / hybrid) is adapter-defined. |
| `list(scope, options)` | Non-relevance browsing — for inspectors, admin tools, exports. |
| `delete(ids, scope)` | Remove ids within a scope. Out-of-scope ids are silently skipped. |
| `clear(scope)` | Wipe everything matching a scope. Empty scope (`{}`) is treated as misuse. |

Three invariants every adapter MUST uphold: **scope isolation** (no cross-scope reads or writes), **expiry filtering** (records past their `expiresAt` are excluded from all reads), and **id uniqueness** across all scopes.

Built-in adapters live in `@tanstack/ai-memory`:

```ts
import { inMemoryMemoryAdapter, redisMemoryAdapter } from '@tanstack/ai-memory'
```

Custom adapters implement `MemoryAdapter` from `@tanstack/ai/memory`.
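
A compressed sketch of the shape a custom adapter takes, showing only `add`, `get`, and `delete` against local stand-in types (a real adapter implements the `MemoryAdapter` type from `@tanstack/ai/memory` with all seven methods; the `scopeMatches` below is an illustrative guess at the exported helper's semantics, not its actual implementation):

```typescript
// Local stand-in types — real code implements MemoryAdapter / MemoryRecord /
// MemoryScope from '@tanstack/ai/memory'.
type Scope = { tenantId?: string; userId?: string }
type Rec = { id: string; scope: Scope; text: string; expiresAt?: number }

// Scope check: every defined key in the query scope must match the record.
function scopeMatches(query: Scope, record: Scope): boolean {
  return Object.entries(query).every(
    ([k, v]) => v === undefined || record[k as keyof Scope] === v,
  )
}

// Minimal Map-backed store honoring the three invariants: scope isolation,
// expiry filtering, and id uniqueness (one global id keyspace).
function sketchAdapter(now: () => number = Date.now) {
  const store = new Map<string, Rec>()
  return {
    name: 'sketch',
    add(records: Array<Rec>) {
      for (const r of records) store.set(r.id, r) // same id replaces
    },
    get(id: string, scope: Scope): Rec | undefined {
      const r = store.get(id)
      if (!r) return undefined
      if (!scopeMatches(scope, r.scope)) return undefined // out of scope
      if (r.expiresAt !== undefined && r.expiresAt <= now()) return undefined
      return r
    },
    delete(ids: Array<string>, scope: Scope) {
      for (const id of ids) {
        const r = store.get(id)
        if (r && scopeMatches(scope, r.scope)) store.delete(id) // skip out-of-scope ids
      }
    },
  }
}
```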

## Scope and security

`MemoryScope` is the isolation boundary. Every key is optional and orthogonal — the adapter rejects cross-scope reads and writes:

```ts
import type { MemoryScope } from '@tanstack/ai/memory'

type MemoryScope = {
tenantId?: string
userId?: string
sessionId?: string
threadId?: string
namespace?: string
}
```

**Always derive scope server-side from trusted state.** Accepting `tenantId` or `userId` from the request body is how one user reads another user's memory. The function form on `scope` is the recommended pattern — it runs per request and has access to the validated chat context:

```ts
memoryMiddleware({
adapter,
scope: (ctx) => {
const session = (ctx.context as AppCtx).session // server-validated
return {
tenantId: session.tenantId,
userId: session.userId,
threadId: session.activeThreadId,
}
},
})
```

Pass the validated session through `chat({ context: { session } })`. The static form (`scope: { tenantId: 'acme' }`) is fine for single-tenant or test fixtures, but the function form is safer in any multi-tenant deployment.

## Retrieval flow

Retrieval runs once per `chat()` invocation, during the `init` phase:

1. `shouldRetrieve({ userText, scope })` — optional gate. Return `false` to skip retrieval entirely for this turn.
2. `adapter.search({ scope, text, embedding?, topK, minScore, kinds })` — the adapter decides whether to use the embedding (semantic), the text (lexical), or both (hybrid).
3. `rerank(hits, { scope, query, ctx })` — optional re-rank between search and render. Plug in MMR, RRF, or a cross-encoder.
4. `render(hits)` — formats the final hit set into a string injected into the prompt. Defaults to `defaultRenderMemory`.

An `embedder` is **optional**. Adapters that support semantic search (Redis with vector ops, hosted vector DBs) need one; lexical-only setups don't.
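
Semantic ranking ultimately reduces to comparing the query embedding against stored embeddings, typically with cosine similarity. The package exports a `cosine` helper; the function below is an illustrative stand-in, not its source:

```typescript
// Cosine similarity between two equal-length vectors: 1 means same
// direction, 0 means orthogonal, -1 means opposite.
function cosineSim(a: Array<number>, b: Array<number>): number {
  let dot = 0
  let na = 0
  let nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += (a[i] ?? 0) * (b[i] ?? 0)
    na += (a[i] ?? 0) ** 2
    nb += (b[i] ?? 0) ** 2
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb)
  return denom === 0 ? 0 : dot / denom
}

cosineSim([1, 0], [1, 0]) // → 1
cosineSim([1, 0], [0, 1]) // → 0
```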

## Persistence flow

Persistence is **deferred** via `ctx.defer` — it runs after the chat stream finishes and never blocks the response:

1. `shouldRemember({ message, responseText })` — optional gate on whether to write at all this turn.
2. The middleware persists user and assistant turns as `kind: 'message'`.
3. `extractMemories({ userText, responseText, scope, adapter })` — return a `MemoryOp[]` (mixed add/update/delete) or `MemoryRecord[]` (treated as all-add) to capture facts, preferences, or summaries.
4. For each completed tool call, `onToolResult({ toolName, toolCallId, args, result, scope, adapter })` — same return shape, typically used to persist results as `kind: 'tool-result'`.
5. `afterPersist({ newRecords, scope, adapter })` — fires after `adapter.add` commits, with newly-added records (not updates or deletes).

## Extension hooks

| Hook | Phase | Use for |
|------|-------|---------|
| `shouldRetrieve` | before search | Skip retrieval for cheap turns or content-gated requests |
| `rerank` | between search and render | MMR, RRF, recency boosts, cross-encoder rerankers |
| `shouldRemember` | before persist | Drop short, sensitive, or transient messages |
| `extractMemories` | after model finishes | Mem0-style consolidation — extract facts and preferences |
| `onToolResult` | per completed tool call | Persist tool outputs as `kind: 'tool-result'` |
| `afterPersist` | after `adapter.add` commits | Background work — summarization, eviction, indexing |

`extractMemories` and `onToolResult` may return `MemoryRecord[]` (shorthand: all-add) or `MemoryOp[]` (mixed `add` / `update` / `delete`).
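
A sketch of what an `extractMemories`-style hook returning the mixed-op shape might look like. The types here are local stand-ins (the real `MemoryOp` and `MemoryRecord` unions carry more fields), and the record ids are hypothetical:

```typescript
// Local stand-in shapes — the real MemoryOp / MemoryRecord unions come
// from '@tanstack/ai/memory'.
type Op =
  | { type: 'add'; record: { id: string; text: string; kind: string } }
  | { type: 'delete'; id: string }

// Pull a "my name is X" fact out of the user text and, as an example of a
// mixed batch, delete a hypothetical stale record alongside the add.
function extractFacts(userText: string): Array<Op> {
  const ops: Array<Op> = []
  const name = /my name is (\w+)/i.exec(userText)?.[1]
  if (name) {
    ops.push({
      type: 'add',
      record: { id: 'fact:name', text: `User's name is ${name}`, kind: 'fact' },
    })
    ops.push({ type: 'delete', id: 'fact:name-unconfirmed' }) // hypothetical stale id
  }
  return ops
}
```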

## Devtools events

The middleware emits five events on `aiEventClient` (from `@tanstack/ai-event-client`):

| Event | When |
|-------|------|
| `memory:retrieve:started` | Retrieval path begins (after `shouldRetrieve` returns true) |
| `memory:retrieve:completed` | Final hit set is ready (post-rerank, pre-render) |
| `memory:persist:started` | Persist path is about to call `adapter.add` |
| `memory:persist:completed` | `adapter.add` succeeded |
| `memory:error` | Retrieval, persistence, or extraction threw |

Hits and records carry a 200-character `preview` only — full text is never streamed by default, so devtools never leak full memory contents.

For application telemetry that should not depend on devtools being installed, use the `events.*` callbacks on `MemoryMiddlewareOptions` (`onRetrieveStart`, `onRetrieveEnd`, `onPersistStart`, `onPersistEnd`, `onError`).

## Failure modes

By default `strict: false` — retrieval and persistence failures emit `memory:error` (and call `events.onError`), but the chat run continues with degraded memory. Set `strict: true` when memory correctness is more important than uptime, for example in compliance-sensitive deployments or in tests where a missed write is worse than a failed turn.
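
The `strict` switch boils down to "rethrow, or report and continue". A sketch of the pattern under hypothetical names — the middleware's real internals differ:

```typescript
// Hypothetical wrapper showing the strict semantics: in strict mode a
// memory failure aborts the turn; otherwise it is reported and swallowed.
async function runMemoryStep<T>(
  step: () => Promise<T>,
  opts: { strict: boolean; onError: (err: unknown) => void },
): Promise<T | undefined> {
  try {
    return await step()
  } catch (err) {
    opts.onError(err) // always surfaced (memory:error / events.onError)
    if (opts.strict) throw err // strict: memory correctness over uptime
    return undefined // non-strict: degrade and let the chat continue
  }
}
```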

## TypeScript types

```ts
import type {
MemoryAdapter,
MemoryRecord,
MemoryRecordPatch,
MemoryScope,
MemoryQuery,
MemorySearchResult,
MemoryListOptions,
MemoryListResult,
MemoryHit,
MemoryKind,
MemoryRole,
MemoryEmbedder,
MemoryOp,
MemoryMiddlewareOptions,
} from '@tanstack/ai/memory'
```

## Next steps

- [Memory Quickstart](../guides/memory-quickstart) — wire the middleware into a real `chat()` call in five steps
- [Middleware](../advanced/middleware) — the underlying `chat()` middleware lifecycle and hooks
- [Observability](../advanced/observability) — subscribe to `memory:*` events for tracing
3 changes: 3 additions & 0 deletions knip.json
@@ -37,6 +37,9 @@
"packages/typescript/ai-client": {
"ignoreDependencies": ["@standard-schema/spec"]
},
"packages/typescript/ai-memory": {
"ignoreDependencies": ["redis"]
},
"packages/typescript/ai-react-ui": {
"ignoreDependencies": ["react-dom"]
},