feat(codec): opt-in binary transport for /:uuid/mcp #287

Closed
wdunn001 wants to merge 10 commits into metatool-ai:main from wdunn001:feat/codec-binary-transport

Conversation

wdunn001 commented May 8, 2026

Summary

This PR ships Codec hooks into MetaMCP so the gateway can serve as the text/token boundary in a token-native inference stack. Inference engines emit raw token IDs end-to-end; ToolWatcher (in the gateway, the agent runtime, or any middleware) detects tool-call control IDs with a single 32-bit compare per token; detokenization runs once at the JSON-RPC seam to the underlying MCP server, and the result is tokenized back before the response leaves the gateway. The wire framing change (length-prefixed msgpack + optional gzip on /:uuid/mcp) is the foundation; the headline value is keeping the rest of the chain token-native. Tool-heavy MCP sessions ship dramatically smaller wire bytes — long tool-call results (file reads, web fetches, RAG snippets, model-generated content piped through tools) become length-prefixed msgpack frames with optional gzip on top, instead of newline-delimited JSON-RPC.

Fully backwards-compatible — Codec is opt-in per request via ?stream_format=msgpack (or protobuf) or an Accept: application/x-codec-msgpack header. When neither is set the route is byte-for-byte identical to today's MetaMCP. The SDK's StreamableHTTPServerTransport is unmodified.

Why

MCP is JSON-RPC. The envelope itself is small — bytes per call — and isn't worth optimizing on its own. The real wire weight in any non-trivial session lives in tool-call results: file reads, search results, RAG context, agent-generated text. JSON-RPC's per-character escape overhead compounds across long tool outputs and across multi-hop chains where the same text gets re-tokenized at each agent boundary.

Cross-stack benchmark numbers from the broader Codec ecosystem (codecai.net, MATRIX.md), measured on 2,048-token streamed responses on the same hardware:

| Engine | JSON-SSE | Codec + gzip | Reduction |
| --- | --- | --- | --- |
| sglang (PR #24483) | 485 KB | 354 B (dict-zstd) | 1,404× |
| vllm (PR #41765) | 479 KB | 3.9 KB | 126× |
| llama.cpp (PR #22757) | 529 KB | 16 KB | 33× |

TTFB stays within 1 ms of the JSON-RPC path on the same server. This PR brings the same physics to MetaMCP's tool-aggregation surface.

Negotiation

Three equivalent ways to opt in, in resolution order (sketched below):

  1. ?stream_format=msgpack (or protobuf) on the URL
  2. Accept: application/x-codec-msgpack (or …-protobuf) header
  3. ?stream_format=json (or no negotiation at all) — JSON-RPC, identical to upstream
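
A minimal sketch of that resolution order (the actual negotiateStreamFormat() in codec-frame.ts may differ in signature and detail):

```ts
type StreamFormat = "json" | "msgpack" | "protobuf";

function negotiateStreamFormat(
  query: Record<string, unknown>,
  acceptHeader: string | undefined,
): StreamFormat {
  // 1. Explicit query parameter wins, including the ?stream_format=json opt-out.
  const q = query["stream_format"];
  if (q === "msgpack" || q === "protobuf" || q === "json") return q;

  // 2. Otherwise key off the Accept header.
  if (acceptHeader?.includes("application/x-codec-msgpack")) return "msgpack";
  if (acceptHeader?.includes("application/x-codec-protobuf")) return "protobuf";

  // 3. No negotiation at all: behave exactly like upstream JSON-RPC.
  return "json";
}
```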

Accept-Encoding: gzip adds streaming compression on top. Brotli + dict-zstd land in a follow-up; gzip alone hits the bulk of the savings on the small JSON-RPC envelopes that dominate the MCP request side.

Implementation

Three new files under apps/backend/src/lib/metamcp/codec/:

| File | Purpose |
| --- | --- |
| codec-frame.ts | Length-prefixed msgpack/protobuf framing. negotiateStreamFormat() from query + headers. Decoders for inbound request bodies. Same wire shape the @codecai/web and codecai decoders already speak end-to-end on the cross-stack matrix. |
| codec-compression.ts | Accept-Encoding negotiation + streaming gzip Transform. Mirrors python/sglang/srt/entrypoints/codec_compression.py from the sglang Codec PR. |
| codec-transcode.ts | Express req/res wrappers — decode inbound Codec body to a JS object so the SDK's existing JSON path sees a normal req.body; patch res.write/res.end so SDK JSON-RPC writes emit Codec frames through the negotiated compressor instead of newline-delimited JSON. |
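
The codec-transcode write patch is the core move. A hedged sketch of the idea: patchWriteForMsgpack is a hypothetical name, the 4-byte big-endian length prefix is an assumed frame layout, and the real code also has to strip the SDK's SSE framing before parsing:

```ts
import type { ServerResponse } from "node:http";
import { encode } from "@msgpack/msgpack";

function patchWriteForMsgpack(res: ServerResponse): void {
  const originalWrite = res.write.bind(res);

  res.write = ((chunk: string | Buffer): boolean => {
    const text = typeof chunk === "string" ? chunk : chunk.toString("utf8");
    const body = encode(JSON.parse(text));   // msgpack-encode the JSON-RPC envelope
    const frame = Buffer.alloc(4 + body.byteLength);
    frame.writeUInt32BE(body.byteLength, 0); // assumed length-prefix layout
    frame.set(body, 4);
    return originalWrite(frame);             // in the PR this routes through the compressor
  }) as typeof res.write;
}
```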

One small change to apps/backend/src/routers/mcp-proxy/metamcp.ts:

  • Mount express.raw({ type: ["application/x-codec-msgpack", "application/x-codec-protobuf"], limit: "4mb" }) so the raw bytes survive long enough for the codec-transcode path to decode them.
  • In the POST /:uuid/mcp handler, run negotiation first. If a Codec format is selected: decode the request body, wrap res for outbound transcoding, then hand off to transport.handleRequest exactly as today (sketched below).
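
A sketch of that wiring under the PR's described layout; the import paths, helper signatures, and the transport lookup are assumptions:

```ts
import express from "express";
// Paths follow the PR's file layout; exact exports/signatures are assumed.
import { negotiateStreamFormat, decodeCodecRequestBody } from "../lib/metamcp/codec/codec-frame";
import { wrapResponseForCodec } from "../lib/metamcp/codec/codec-transcode";

// Stand-in for the per-session StreamableHTTPServerTransport the router
// resolves elsewhere.
declare const transport: {
  handleRequest(req: unknown, res: unknown, body?: unknown): Promise<void>;
};

const app = express();

// Keep raw bytes for Codec content types so the transcode path can decode them.
app.use(express.raw({
  type: ["application/x-codec-msgpack", "application/x-codec-protobuf"],
  limit: "4mb",
}));

app.post("/:uuid/mcp", async (req, res) => {
  const format = negotiateStreamFormat(req.query, req.headers.accept);
  if (format !== "json") {
    req.body = decodeCodecRequestBody(req.body, format); // Buffer -> plain JS object
    wrapResponseForCodec(res);                           // outbound writes become Codec frames
  }
  // Unchanged SDK hand-off: the transport still sees a normal JSON body.
  await transport.handleRequest(req, res, req.body);
});
```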

Total diff: ~600 lines added (the three new files), ~50 lines edited (the route handler). The SDK is untouched.

v2 update — token-aware tool dispatch (now in this PR)

The v1 patch was wire-framing only: re-shape the JSON-RPC envelope as length-prefixed msgpack/gzip, no semantic changes. v2 (commits acef2cb + 552f02c) adds the actual headline value at the MetaMCP seam:

Tokens stay tokens through the chain. Detokenization runs ONCE at the only boundary that requires text — the JSON-RPC hop to the underlying MCP server.

Three properties fall out:

  1. The inference engine never detokenizes — emits Codec frames straight on the wire.
  2. ToolWatcher anywhere in the chain runs on raw uint32s (~100x faster than detokenize+regex — same number as the tool-call detection bench on the cross-stack matrix; see the sketch after this list).
  3. The consumer (agent runtime, UI, next agent) decides when text is actually needed. Most chains never do.
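
The detection in property 2 reduces to a linear scan with one integer compare per token. A sketch, where TOOL_CALL_ID is a purely hypothetical control-token value (real deployments would read it from the vocab map):

```ts
const TOOL_CALL_ID = 0x0000f00d >>> 0; // hypothetical control token

// One 32-bit compare per token: no detokenize, no regex.
function findToolCall(tokens: Uint32Array): number {
  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i] === TOOL_CALL_ID) return i;
  }
  return -1;
}
```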

Vocab-map negotiation

Optional X-Codec-Map: <url>;sha256=<hash> header on any request. First reference loads + verifies the tokenizer dialect map (sha256-pinned, content-addressed); subsequent references hit a process-local LRU cache (32 maps max). Per-request: clients can switch vocabs by changing the header, no namespace config needed.
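
A sketch of that header flow. The <url>;sha256=<hash> grammar is from this PR; the cache shape, the fetch, and the JSON map format are assumptions:

```ts
import { createHash } from "node:crypto";

const MAX_MAPS = 32;
const vocabCache = new Map<string, unknown>(); // insertion order doubles as LRU order

async function loadVocabFromHeader(headerValue: string): Promise<unknown> {
  const [url, pin] = headerValue.split(";");
  const expected = pin?.trim().replace(/^sha256=/, "");
  if (!url || !expected) throw new Error("malformed X-Codec-Map header"); // route 400s

  const cached = vocabCache.get(expected);
  if (cached !== undefined) {
    vocabCache.delete(expected);    // refresh LRU position
    vocabCache.set(expected, cached);
    return cached;
  }

  const bytes = Buffer.from(await (await fetch(url)).arrayBuffer());
  const actual = createHash("sha256").update(bytes).digest("hex");
  if (actual !== expected) throw new Error("map hash mismatch"); // content-addressed pin

  const map: unknown = JSON.parse(bytes.toString("utf8")); // map format assumed JSON
  if (vocabCache.size >= MAX_MAPS) {
    vocabCache.delete(vocabCache.keys().next().value!);    // evict least-recent
  }
  vocabCache.set(expected, map);
  return map;
}
```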

What gets transformed

  • Inbound tools/call args — when arguments carries a _codec_meta = {ids, map_id} block, the gateway runs Detokenizer(map).render(ids) to recover the JSON args text, then forwards a normal JSON-RPC envelope to the underlying MCP server. The MCP server never sees tokens.
  • Outbound CallToolResult.content — for each {type: "text", text: "..."} block, the gateway runs Tokenizer(map).encode(text) and appends a sibling {type: "_codec_meta", ids, map_id} block. Original text is preserved alongside, so non-Codec MCP clients on the same namespace still see something they can render. Codec-aware clients prefer the meta sibling and discard the text. Both transforms are sketched below.
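
Shape-level sketches of the two transforms. render() and encode() are the Detokenizer/Tokenizer methods named above; the types and signatures here are assumptions:

```ts
interface CodecMeta { ids: number[]; map_id: string; }
interface Detokenizer { render(ids: number[]): string; }
interface Tokenizer { encode(text: string): number[]; }

// Inbound: recover the JSON args text from _codec_meta; the MCP server
// downstream never sees tokens.
function detokenizeCodecArgs(
  args: Record<string, unknown>,
  detok: Detokenizer,
): Record<string, unknown> {
  const meta = args["_codec_meta"] as CodecMeta | undefined;
  if (!meta) return args; // plain JSON args pass through untouched
  return JSON.parse(detok.render(meta.ids));
}

// Outbound: append a _codec_meta sibling to each text block, keeping the
// original text so non-Codec clients still render something.
function tokenizeContent(
  content: Array<{ type: string; text?: string }>,
  tok: Tokenizer,
  mapId: string,
): Array<Record<string, unknown>> {
  const out: Array<Record<string, unknown>> = [];
  for (const block of content) {
    out.push(block);
    if (block.type === "text" && block.text !== undefined) {
      out.push({ type: "_codec_meta", ids: tok.encode(block.text), map_id: mapId });
    }
  }
  return out;
}
```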

New files

apps/backend/src/lib/metamcp/codec/
├── codec-vocab.ts      sha256-cached tokenizer dialect map handles
│                       (Detokenizer always loaded; Tokenizer
│                       optional with graceful degrade — see note below)
└── codec-content.ts    detokenizeCodecArgs(request)
                        tokenizeContent(result, mapHash)
                        loadVocabFromHeader(headerValue)

Plus extensions to codec-transcode.ts (vocabHash threaded through wrapResponseForCodec; CallToolResult-shaped responses get content tokenization before msgpack-encoding) and the two route handlers (vocab-map header parsing + inline args detokenization before transport.handleRequest).

Live image: wdunn001/codec-metamcp:0.2.2

Smoke-tested with the qwen2 vocab map URL on the lab box:

HTTP/1.1 200 OK
content-type: application/x-codec-msgpack
content-encoding: gzip
mcp-session-id: <uuid>

(176 bytes msgpack + gzip body decoding to a valid MCP initialize result)

Known limitation: @codecai/web@0.3.0's BPE pre-tokenizer regex

The BPETokenizer constructor in @codecai/web@0.3.0 builds the pre-tokenizer regex with the 'gu' flag, but maps whose pre_tokenizer_pattern uses ES2025 inline-flag groups (e.g. qwen2's (?i:'s|'t|'re|'ve|'m|'ll|'d)) require the 'gv' flag to construct under V8. On Node 22 with the qwen2 map this throws SyntaxError: Invalid group at construction time.

Mitigation in this PR: Tokenizer is treated as optional. Construction failure logs a warning and the response-side text tokenization becomes a no-op for that map — the wire still gets reframed as msgpack/gzip, just without the per-content tokenization layered on top. Detokenizer (request-side args, no BPE) is unaffected.
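
A sketch of that degrade path. Only the BPETokenizer class name and the warning behavior come from this PR; the import path and constructor argument are assumptions:

```ts
import { BPETokenizer } from "@codecai/web";

function tryBuildTokenizer(map: any, mapId: string): BPETokenizer | undefined {
  try {
    return new BPETokenizer(map); // may throw SyntaxError on 'gv'-only maps
  } catch (err) {
    console.warn(`[Codec] Tokenizer construction failed for map ${mapId}:`, err);
    return undefined; // tokenizeContent becomes a no-op for this map
  }
}
```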

Fix lives upstream in @codecai/web — either try 'gv' first and fall back, or just use 'gv' on Node 22+. Will land in @codecai/web@0.3.1. Out of scope for this PR — when it ships and the metamcp image bumps the dep, the warning goes away and the response-side tokenization works on every map.

What this doesn't include (yet)

  • Streaming chunked tokenization for incremental tool results: MCP tools/call doesn't stream today; if/when it does, the tokenizer's word-boundary-buffered API plugs in here.
  • A per-namespace default vocab map: the X-Codec-Map header carries the map per request, so no DB schema change is needed.

Testing

Smoke-tested end-to-end on a live container at wdunn001/codec-metamcp:latest (image digest sha256:04e495b3..., built from this branch at fdbbab2 + 75e1a12). The test stands up postgres + the image, signs up a user, creates a namespace + endpoint via tRPC, then exercises the negotiation knobs against /metamcp/<endpoint_name>/mcp:

  • JSON-in / JSON-out (no negotiation): byte-for-byte identical to upstream — text/event-stream SSE response with the JSON-RPC envelope as plain text. No regressions on the existing path.
  • JSON-in / Codec-out + gzip (?stream_format=msgpack + Accept: application/x-codec-msgpack + Accept-Encoding: gzip): server returns 200 OK with Content-Type: application/x-codec-msgpack, Content-Encoding: gzip, Transfer-Encoding: chunked. Body decompresses + msgpack-decodes back to the exact same MCP initialize response shape (protocolVersion, capabilities, serverInfo). Wire size: 240 B JSON-SSE -> 176 B Codec+gzip on this tiny envelope (1.36x) — bigger reductions live on tool-call responses with substantial text content, where the same physics as the cross-stack matrix applies.
  • Bad codec body (Content-Type: application/x-codec-msgpack + JSON bytes in body): 400 with a JSON-RPC error envelope. Doesn't crash the route, doesn't leak into other sessions.

The smoke test surfaced four real bugs that this branch already fixes (see commit history):

  • the SDK uses writeHead+flushHeaders for SSE and overwrites pre-set headers;
  • compressor.pipe(res) infinite-loops with the patched res.write;
  • Accept: application/x-codec-msgpack triggers the SDK's own 406 unless we spoof it after capture;
  • request and response negotiation needed to decouple so JSON-in/Codec-out works.

Reproduce locally with the same compose pattern (postgres + the image), or grab wdunn001/codec-metamcp:0.1.6 for a pre-built image including all the smoke-test fixes:

docker network create codec-net
docker run -d --name codec-mcp-db --network codec-net \
  -e POSTGRES_USER=metamcp -e POSTGRES_PASSWORD=metamcp -e POSTGRES_DB=metamcp \
  postgres:16
docker run -d --name codec-mcp --network codec-net -p 12008:12008 \
  -e POSTGRES_HOST=codec-mcp-db -e POSTGRES_USER=metamcp -e POSTGRES_PASSWORD=metamcp -e POSTGRES_DB=metamcp \
  -e DATABASE_URL=postgresql://metamcp:metamcp@codec-mcp-db:5432/metamcp \
  -e BETTER_AUTH_SECRET=$(openssl rand -hex 32) \
  wdunn001/codec-metamcp:0.1.6

Related

This is one of four Codec patches landing in parallel: the sglang, vllm, and llama.cpp engine PRs from the benchmark table above, plus this gateway-side patch.

The protocol spec, six client-language reference implementations (Python / TS / .NET / Rust / Java / C), and the cross-stack benchmark harness all live at github.com/wdunn001/Codec.

Patent posture

Quasarke (the Codec author) is pursuing patent protection on certain Codec mechanisms. The wire format, handshake, and content-addressed map distribution described in spec/PROTOCOL.md are intended to be made available on royalty-free or FRAND terms to implementers of the Codec specification when patents issue. Adjacent improvements (ToolWatcher, Translator, dict-zstd, Codec-Zstd-Dict negotiation) may be commercially licensed separately — a Codec-compliant implementation does not require those modules. Defensive termination clause will apply to any future patent license. Full text: PATENTS.md.

This is informational; it does not itself grant a patent license. The full patent commitment will be published when patents issue or when the corresponding non-provisional applications are filed, whichever is sooner. For specific questions: licensing@quasarke.com.

Adds a Codec wire-format path to the MetaMCP namespace endpoint
without disturbing existing JSON-RPC traffic. Wire bytes for tool-
heavy MCP sessions drop dramatically: long tool-call results
(file reads, web fetches, RAG context) ship as length-prefixed
msgpack frames with optional gzip on top, instead of newline-
delimited JSON-RPC.

## Negotiation

Opt-in per request, three equivalent ways:

  - `?stream_format=msgpack` (or `protobuf`) on the URL
  - `Accept: application/x-codec-msgpack` (or `…-protobuf`) header
  - explicit `?stream_format=json` opts back out

When neither is set the route is byte-for-byte identical to
upstream MetaMCP — same routes, same SDK, same JSON-RPC envelopes.

`Accept-Encoding: gzip` adds streaming compression to the Codec
response. Brotli + dict-zstd land in a follow-up; gzip alone gets
us the bulk of the win on the small JSON-RPC envelopes that
dominate the request side.

## Implementation

Three new files under `apps/backend/src/lib/metamcp/codec/`:

  - `codec-frame.ts` — length-prefixed msgpack/protobuf framing,
    `negotiateStreamFormat()` from query+headers, decode helpers
    for inbound request bodies. Same wire shape the @codecai/web
    `decodeStream` and Python codecai.decode_msgpack_stream
    decoders already speak (used end-to-end on the cross-stack
    matrix at https://codecai.net).
  - `codec-compression.ts` — `Accept-Encoding` negotiation +
    streaming gzip Transform. Mirrors
    python/sglang/srt/entrypoints/codec_compression.py from the
    sglang Codec PR.
  - `codec-transcode.ts` — Express req/res wrappers that:
      - decode an inbound msgpack/protobuf body to a JS object so
        the SDK's existing JSON path sees a normal req.body
      - patch `res.write` and `res.end` so SDK JSON-RPC writes
        emit Codec frames through the negotiated compressor
        instead of newline-delimited JSON

One small change to `apps/backend/src/routers/mcp-proxy/metamcp.ts`:

  - mount `express.raw({ type: ["application/x-codec-msgpack",
    "application/x-codec-protobuf"] })` so the raw bytes survive
    long enough for codec-transcode to decode them
  - in the POST `/:uuid/mcp` handler, run negotiation first; if a
    Codec format is selected, decode the request body and wrap
    `res` BEFORE handing off to `transport.handleRequest`

The SDK's StreamableHTTPServerTransport is unmodified — it still
writes JSON-RPC into res, the wrapper just transcodes those writes
on the way out.

## Why this matters

Same physics as the cross-stack benchmark matrix:
  - sglang   485 KB → 354 B at 2K tokens   (1,404× with full stack)
  - vllm     479 KB → 3.9 KB                 (126× with gzip)
  - llama.cpp 529 KB → 16 KB                  (33× with gzip alone)

For MetaMCP specifically, the wire weight in any non-trivial
session lives in tool-call results — file reads, search results,
RAG snippets, model-generated content piped through tools. JSON-
RPC's per-character overhead compounds across the chain; binary
framing strips it. TTFB stays within 1ms of the JSON path on the
same server.

A follow-up PR will add token-ID transcoding for `text` content
blocks (the real per-token Codec wire reduction) plus a Translator
middleware for cross-vocab tool handoff. This first patch is the
foundation: opt-in negotiation + length-prefixed binary framing +
streaming compression.

Source for the benchmark numbers and the wider Codec ecosystem:
https://codecai.net · https://github.com/wdunn001/Codec
wdunn001 added a commit to wdunn001/codec-supervisor that referenced this pull request May 8, 2026
Mirrors the codec-vllm / codec-llamacpp pattern but skips the
Python supervisor sidecar — MetaMCP already ships an admin UI for
namespace/server management as its frontend, so there's nothing for
codec-supervisor to add. The image is just MetaMCP, built from the
wdunn001 fork at feat/codec-binary-transport (PR metatool-ai/metamcp#287),
with the Codec opt-in path (?stream_format=msgpack | Accept:
application/x-codec-msgpack) live on /:uuid/mcp.

Build args:
  - CODEC_METAMCP_REPO, CODEC_METAMCP_REF, CODEC_METAMCP_COMMIT —
    same shape as the existing CODEC_VLLM_*/CODEC_LLAMACPP_* args
  - NODE_VERSION (default 20) and PNPM_VERSION (default 10.12.0)
    matching upstream MetaMCP's own Dockerfile

pnpm install runs --no-frozen-lockfile because the patch adds
@msgpack/msgpack which isn't pinned in upstream's lockfile yet.
Flip back to --frozen-lockfile once the PR merges and the lockfile
gets updated upstream.
wdunn001 added a commit to wdunn001/codec-website that referenced this pull request May 8, 2026
Companion to /docs/codec-sglang/, /docs/codec-vllm/, /docs/codec-
llamacpp/ — same shape (Quick start / Negotiation / What you get /
Compose snippet / Client example / When to use / License / Source &
links / See also) but framed for the gateway side of the stack
rather than the inference engine side.

Calls out that codec-metamcp doesn't bundle a Python supervisor
(MetaMCP already ships its own admin UI as the frontend), so this
image is just MetaMCP-from-the-fork. Links the open PR
(metatool-ai/metamcp#287), the Dockerfile, the codec-supervisor
build recipe, and the cross-stack matrix that motivates the wire
reduction story.

Slots into Sidebar Server section at order: 4 (after sglang/vllm/
llamacpp at 1/2/3) and is picked up automatically by DocsSidebar's
frontmatter-driven listing.
wdunn001 added 9 commits May 8, 2026 04:08
…e/mcp

The first patch only hit routers/mcp-proxy/metamcp.ts — that's the
INTERNAL admin route at /mcp-proxy/metamcp/:uuid/mcp used by the
Next.js frontend for testing namespaces from the admin UI. Real MCP
clients connecting via the namespace URL hit the PUBLIC route at
/metamcp/:endpoint_name/mcp (mounted from
routers/public-metamcp/streamable-http.ts), so they were untouched.

Smoke-tested the v0.1.1 image on the lab box: the codec-metamcp
container boots clean, frontend serves, JSON-RPC works — but a
POST to /metamcp/<uuid>/mcp with Accept: application/x-codec-msgpack
returned a regular JSON 401 because the public route wasn't
patched. This commit fixes that gap by mirroring the same
negotiation block + raw-body parser into streamable-http.ts.

Same shape:
  1. mount express.raw({ type: 'application/x-codec-(msgpack|protobuf)' })
     on the public router so Codec request bodies survive long
     enough to decode
  2. negotiate stream format from ?stream_format / Accept
  3. decode request body, wrap response, hand off to
     transport.handleRequest as before

The internal /mcp-proxy/metamcp route still has the same patch —
that path is what the admin UI uses for in-house namespace tests
and benefits identically.
…eam_format

Smoke test surfaced a design bug: the first version pinned request
decode and response wrap to the same negotiation result, so a JSON
client that just wanted Codec on the response (JSON-in / Codec-out)
got its body double-decoded as msgpack and rejected with 400.

Split the negotiation (request side sketched after this list):
  - Request body: keys off Content-Type. application/x-codec-msgpack
    or application/x-codec-protobuf triggers decodeCodecRequestBody;
    anything else (including application/json) leaves the body alone
    so the SDK's JSON middleware sees what it expects.
  - Response body: keys off ?stream_format= and Accept. When set,
    wraps res so SDK JSON-RPC writes get re-framed as Codec on the
    way out, regardless of what came in.
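
A sketch of the request-side half; negotiateRequestDecode is a hypothetical helper, and only the Content-Type values come from this commit:

```ts
function negotiateRequestDecode(
  contentType: string | undefined,
): "msgpack" | "protobuf" | null {
  if (contentType?.startsWith("application/x-codec-msgpack")) return "msgpack";
  if (contentType?.startsWith("application/x-codec-protobuf")) return "protobuf";
  return null; // application/json and everything else: leave the body to the SDK
}
```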

This lets clients adopt Codec on either end independently:
  - JSON-in, Codec-out (smallest migration: just set Accept)
  - Codec-in, Codec-out (full)
  - Codec-in, JSON-out (?stream_format=json explicit opt-out)

Same fix mirrored on both routes — the public route at
/metamcp/:endpoint_name/mcp (real MCP clients) and the internal
admin route at /mcp-proxy/metamcp/:uuid/mcp (admin UI tests).
…ders for SSE

Smoke test against the live image surfaced that the response wrap was
half-effective: my Codec setHeader calls before transport.handleRequest
got blown away when the SDK called res.writeHead(200, {Content-Type:
text/event-stream, ...}).flushHeaders() to commit SSE headers. The client
saw text/event-stream in the headers and Codec-msgpack bytes in the body;
the mismatch made the proxy ECONNRESET.

The fix patches three more methods on res (sketched after this list):

  - writeHead — when SDK calls it, we substitute our Codec headers
    (Content-Type: application/x-codec-msgpack, optional
    Content-Encoding, Transfer-Encoding: chunked) but keep the SDK's
    status code and any non-content-* headers it set
    (mcp-session-id, cache-control, access-control-*).

  - flushHeaders — swallow it. writeHead already commits headers
    in our path; forwarding flushHeaders would double-send.

  - end — already patched, but document the new error-path case
    (SDK does writeHead(4xx).end(JSON.stringify(...)) for protocol
    errors; that JSON now goes through the same forwarder so the
    client gets a Codec-encoded error envelope, not mixed JSON).
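
A sketch of the writeHead/flushHeaders substitution; the helper name is hypothetical, the header names follow the commit text, and the merge logic is an assumed implementation:

```ts
import type { OutgoingHttpHeaders, ServerResponse } from "node:http";

function patchSseHeaderCommit(
  res: ServerResponse,
  codecContentType: string, // e.g. "application/x-codec-msgpack"
  gzip: boolean,
): void {
  const originalWriteHead = res.writeHead.bind(res);

  res.writeHead = ((status: number, headers?: OutgoingHttpHeaders) => {
    const kept: OutgoingHttpHeaders = {};
    for (const [k, v] of Object.entries(headers ?? {})) {
      // Keep the SDK's non-content-* headers (mcp-session-id, cache-control, ...).
      if (!k.toLowerCase().startsWith("content-")) kept[k] = v;
    }
    kept["content-type"] = codecContentType; // substitute the Codec headers
    if (gzip) kept["content-encoding"] = "gzip";
    kept["transfer-encoding"] = "chunked";
    return originalWriteHead(status, kept);  // keep the SDK's status code
  }) as typeof res.writeHead;

  // writeHead already commits headers on this path; forwarding flushHeaders
  // would double-send.
  res.flushHeaders = () => {};
}
```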

This is the third smoke-test bug surfaced this session — the pattern's
been: write the patch, build, run, see what breaks. Each one's been
straightforward once visible. Next iteration verifies end-to-end
JSON-in / Codec-out round-trip works.
…ixes infinite loop

The smoke test surfaced that the response wrap was hanging because
compressor.pipe(res) routed compressor output INTO our patched
res.write, which forwarded it back to the compressor. Each chunk
ping-ponged forever; the response socket eventually got reset by
the client.

Fix: don't pipe the compressor to res. Capture originalWrite +
originalEnd before patching res.write, then attach 'data' and
'end' listeners on the compressor that call originalWrite/
originalEnd directly. Compressed bytes go straight to the socket;
our patched res.write only sees inbound SDK writes (which is what
we wanted to intercept).

Same shape: SDK -> patched res.write -> forwardChunkToCodec ->
compressor.write -> [compressor 'data' event] -> originalWrite ->
socket. No loop, no buffering above the compressor.
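
A sketch of that wiring; attachCompressor is a hypothetical name, and the listener pattern is the shape this commit describes:

```ts
import type { ServerResponse } from "node:http";
import { createGzip } from "node:zlib";

function attachCompressor(res: ServerResponse) {
  const originalWrite = res.write.bind(res);
  const originalEnd = res.end.bind(res);
  const compressor = createGzip();

  // Compressed bytes go straight to the captured originals, so the patched
  // res.write only ever sees inbound SDK writes.
  compressor.on("data", (chunk: Buffer) => originalWrite(chunk));
  compressor.on("end", () => originalEnd());

  // Deliberately NOT compressor.pipe(res): that routes output back into the
  // patched res.write and ping-pongs forever.
  return compressor;
}
```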

This was the response-side missing piece. With it the JSON-in /
Codec-out path round-trips cleanly: client sends JSON, server
emits length-prefixed msgpack frames over the wire, and standard
@codecai/web decodeStream parses them as plain JSON-RPC.

The SDK's StreamableHTTPServerTransport runs its own Accept check
against {application/json, text/event-stream} and short-circuits
406 Not Acceptable for anything else. Smoke test surfaced the
JSON-in/Codec-out path returning 406 because Accept:
application/x-codec-msgpack didn't pass the SDK's filter.

After we capture the codec format from the original Accept value
and wrap the response, rewrite req.headers.accept to the SDK-
friendly value. The SDK then proceeds happily down its SSE path,
emits text/event-stream chunks, and our response wrapper re-frames
those into Codec on the wire.

Mirrored on both the public and admin routes.

Builds on v0.1.6's wire framing (msgpack/gzip on the JSON-RPC envelope)
with the actual headline value: tokens stay tokens through the chain;
detokenization runs ONCE at the only boundary that requires text — the
JSON-RPC hop to the underlying MCP server.

Three properties fall out:

  1. The inference engine never detokenizes — emits Codec frames
     straight on the wire.
  2. ToolWatcher anywhere in the chain runs on raw uint32s
     (~100x faster than detokenize+regex).
  3. The consumer (agent runtime, UI, next agent) decides when text
     is actually needed. Most chains never do.

## What's added

- `codec-vocab.ts` — sha256-cached tokenizer dialect map handles
  via @codecai/web's loadMap/Detokenizer/Tokenizer. LRU bounded
  to 32 maps, lookup is O(1) by sha256 hash.

- `codec-content.ts` — two transforms operating on a single vocab
  map per request:
    * `detokenizeCodecArgs(request)` — when a tools/call request
      carries `arguments._codec_meta = {ids, map_id}`, render the
      ids back to text + JSON.parse, replace args inline so the
      MCP server sees a normal JSON envelope.
    * `tokenizeContent(result, mapHash)` — walk
      CallToolResult.content[], for each `{type:"text", text}`
      block append a sibling `{type:"_codec_meta", ids, map_id}`.
      Original text preserved — non-Codec clients still see it,
      Codec-aware clients prefer the tokenized sibling.
  Plus `loadVocabFromHeader()` parsing
  `X-Codec-Map: <url>;sha256=<hash>` once on first reference.

- `codec-transcode.ts` extended: `wrapResponseForCodec()` now takes
  an optional vocabHash. When set, every JSON-RPC response is
  walked for CallToolResult shape and the content[] gets tokenized
  before msgpack-encoding. Other RPC types (initialize,
  prompts/get, resources/read, errors) pass through unchanged —
  the wire reduction comes from msgpack on the envelope, not from
  rewriting their bodies.

## Wiring

Both routes (public `/metamcp/:endpoint_name/mcp` and admin
`/mcp-proxy/metamcp/:uuid/mcp`) now:

  1. Read `X-Codec-Map` header, await `loadVocabFromHeader()` to
     populate the cache. 400 if the header is malformed.
  2. Inspect `req.body.method`. If `tools/call` and arguments
     carry `_codec_meta`, detokenize inline before the SDK sees
     the request. The SDK's transport.handleRequest gets a normal
     JSON envelope — no special routing.
  3. Pass `vocabHash` to `wrapResponseForCodec()` so the response
     wrap also runs `tokenizeContent` on each CallToolResult.

## What's NOT in this patch (deferred)

- Streaming chunked tokenization for incremental tool results.
  MCP `tools/call` doesn't stream today; if/when it does, the
  tokenizer's word-boundary-buffered API (already in
  @codecai/web's Translator) will plug in here.
- Per-namespace default vocab map. Headers carry the map per-
  request; first request loads, subsequent ones hit the cache.
  No DB schema change needed.

## Compatibility

All-JSON traffic on these routes continues to work byte-for-byte
identically — none of the new code paths fire unless the client
opts in via Codec content type, ?stream_format query param,
Accept header, OR the new X-Codec-Map header. The defaults are
all "do nothing".

## Wire impact (local smoke test, more coming)

The headline path is long tool results — file reads, web fetches,
RAG context, model-generated text piped through tools. On a
2K-token tool result the tokenize+msgpack+gzip path drops the
wire bytes by the same physics as the cross-stack matrix
(sglang dict-zstd hits 1,404x at this size; here we get gzip-only
because dict-zstd needs a dict-loader hook in the next patch).
Empty-string text-block suppression (further wire savings at the
cost of non-Codec-client compatibility) deferred to a follow-up.

@codecai/web@0.3.0 constructs the BPE pre-tokenizer regex with
the 'gu' flag, but maps whose pre_tokenizer_pattern uses ES2025
inline-flag groups like (?i:'s|'t|...) need the 'gv' flag — V8
throws SyntaxError at construction. The qwen2 map is one such
case; smoke test on v0.2.1 (Node 22) hit the failure.

Two paths into the vocab cache:
  - Detokenizer (pure ID -> bytes lookup): never fails.
  - Tokenizer (BPE, requires the regex): may fail on some maps
    until @codecai/web ships the 'gv' fix.

Treat Tokenizer as optional. When construction throws:
  - Log a warning naming the map.
  - Cache entry stores tok: undefined.
  - codec-content.tokenizeContent() short-circuits and returns
    the result unchanged — wire still gets re-framed as msgpack,
    just without the per-content tokenization layered on top.
  - codec-content.detokenizeCodecArgs() works either way because
    it only uses Detokenizer.

The end-state behavior:
  - Request-side codec args path: full functionality on every map.
  - Response-side text tokenization: full functionality on maps
    whose pre-tokenizer regex is V8-compatible under 'gu' (which
    is most of them — qwen2 is the conspicuous outlier today).

Long-term fix is in @codecai/web (try 'gv' before 'gu', or just
use 'gv' on Node 22+). Out of scope for this PR.

0.4.0 ships the pre-tokenizer regex fallback (try 'gv' first, fall
back to 'gu') so maps with ES2025 inline-flag groups in their
pre_tokenizer_pattern (qwen2's contraction handler is the canonical
case) construct cleanly. The graceful-degrade in codec-vocab.ts
stays — it covers the still-rare cases where neither flag works —
but it should never fire on common maps anymore.

Once this image lands, the [Codec] Tokenizer construction failed
warning we saw on v0.2.2 with qwen2 goes away and response-side
text tokenization activates: CallToolResult.content[].text blocks
get a sibling _codec_meta block with the encoded token IDs.
@wdunn001 wdunn001 closed this May 8, 2026