feat(codec): opt-in binary transport for /:uuid/mcp #287

Closed
wdunn001 wants to merge 10 commits into metatool-ai:main from wdunn001:feat/codec-binary-transport

Conversation

wdunn001 commented May 8, 2026

Summary

This PR ships Codec hooks into MetaMCP so the gateway can serve as the text/token boundary in a token-native inference stack. Inference engines emit raw token IDs end-to-end; ToolWatcher (in the gateway, the agent runtime, or any middleware) detects tool-call control IDs with a single 32-bit compare per token; detokenization runs once at the JSON-RPC seam to the underlying MCP server, and the result is tokenized back before the response leaves the gateway. The wire framing change (length-prefixed msgpack + optional gzip on /:uuid/mcp) is the foundation; the headline value is keeping the rest of the chain token-native. Tool-heavy MCP sessions ship dramatically smaller wire bytes — long tool-call results (file reads, web fetches, RAG snippets, model-generated content piped through tools) become length-prefixed msgpack frames with optional gzip on top, instead of newline-delimited JSON-RPC.

Fully backwards-compatible — Codec is opt-in per request via ?stream_format=msgpack (or protobuf) or an Accept: application/x-codec-msgpack header. When neither is set the route is byte-for-byte identical to today's MetaMCP. The SDK's StreamableHTTPServerTransport is unmodified.

Why

MCP is JSON-RPC. The envelope itself is small — bytes per call — and isn't worth optimizing on its own. The real wire weight in any non-trivial session lives in tool-call results: file reads, search results, RAG context, agent-generated text. JSON-RPC's per-character escape overhead compounds across long tool outputs and across multi-hop chains where the same text gets re-tokenized at each agent boundary.

Cross-stack benchmark numbers from the broader Codec ecosystem (codecai.net, MATRIX.md), measured on 2,048-token streamed responses on the same hardware:

| Engine | JSON-SSE | Codec + gzip | Reduction |
| --- | --- | --- | --- |
| sglang (PR #24483) | 485 KB | 354 B (dict-zstd) | 1,404× |
| vllm (PR #41765) | 479 KB | 3.9 KB | 126× |
| llama.cpp (PR #22757) | 529 KB | 16 KB | 33× |

TTFB stays within 1 ms of the JSON-RPC path on the same server. This PR brings the same physics to MetaMCP's tool-aggregation surface.

Negotiation

Three equivalent ways to opt in, in resolution order (sketched below):

  1. ?stream_format=msgpack (or protobuf) on the URL
  2. Accept: application/x-codec-msgpack (or …-protobuf) header
  3. ?stream_format=json (or no negotiation at all) — JSON-RPC, identical to upstream
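
A minimal sketch of that resolution order (the actual negotiateStreamFormat() in codec-frame.ts may differ in signature and detail):

```ts
type StreamFormat = "json" | "msgpack" | "protobuf";

function negotiateStreamFormat(
  query: Record<string, unknown>,
  acceptHeader: string | undefined,
): StreamFormat {
  // 1. Explicit query parameter wins, including the ?stream_format=json opt-out.
  const q = query["stream_format"];
  if (q === "msgpack" || q === "protobuf" || q === "json") return q;

  // 2. Otherwise key off the Accept header.
  if (acceptHeader?.includes("application/x-codec-msgpack")) return "msgpack";
  if (acceptHeader?.includes("application/x-codec-protobuf")) return "protobuf";

  // 3. No negotiation at all: behave exactly like upstream JSON-RPC.
  return "json";
}
```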

Accept-Encoding: gzip adds streaming compression on top. Brotli + dict-zstd land in a follow-up; gzip alone hits the bulk of the savings on the small JSON-RPC envelopes that dominate the MCP request side.

Implementation

Three new files under apps/backend/src/lib/metamcp/codec/:

| File | Purpose |
| --- | --- |
| codec-frame.ts | Length-prefixed msgpack/protobuf framing. negotiateStreamFormat() from query + headers. Decoders for inbound request bodies. Same wire shape the @codecai/web and codecai decoders already speak end-to-end on the cross-stack matrix. |
| codec-compression.ts | Accept-Encoding negotiation + streaming gzip Transform. Mirrors python/sglang/srt/entrypoints/codec_compression.py from the sglang Codec PR. |
| codec-transcode.ts | Express req/res wrappers — decode inbound Codec body to a JS object so the SDK's existing JSON path sees a normal req.body; patch res.write/res.end so SDK JSON-RPC writes emit Codec frames through the negotiated compressor instead of newline-delimited JSON. |
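
The codec-transcode write patch is the core move. A hedged sketch of the idea: patchWriteForMsgpack is a hypothetical name, the 4-byte big-endian length prefix is an assumed frame layout, and the real code also has to strip the SDK's SSE framing before parsing:

```ts
import type { ServerResponse } from "node:http";
import { encode } from "@msgpack/msgpack";

function patchWriteForMsgpack(res: ServerResponse): void {
  const originalWrite = res.write.bind(res);

  res.write = ((chunk: string | Buffer): boolean => {
    const text = typeof chunk === "string" ? chunk : chunk.toString("utf8");
    const body = encode(JSON.parse(text));   // msgpack-encode the JSON-RPC envelope
    const frame = Buffer.alloc(4 + body.byteLength);
    frame.writeUInt32BE(body.byteLength, 0); // assumed length-prefix layout
    frame.set(body, 4);
    return originalWrite(frame);             // in the PR this routes through the compressor
  }) as typeof res.write;
}
```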

One small change to apps/backend/src/routers/mcp-proxy/metamcp.ts:

  • Mount express.raw({ type: ["application/x-codec-msgpack", "application/x-codec-protobuf"], limit: "4mb" }) so the raw bytes survive long enough for the codec-transcode path to decode them.
  • In the POST /:uuid/mcp handler, run negotiation first. If a Codec format is selected: decode the request body, wrap res for outbound transcoding, then hand off to transport.handleRequest exactly as today (sketched below).
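
A sketch of that wiring under the PR's described layout; the import paths, helper signatures, and the transport lookup are assumptions:

```ts
import express from "express";
// Paths follow the PR's file layout; exact exports/signatures are assumed.
import { negotiateStreamFormat, decodeCodecRequestBody } from "../lib/metamcp/codec/codec-frame";
import { wrapResponseForCodec } from "../lib/metamcp/codec/codec-transcode";

// Stand-in for the per-session StreamableHTTPServerTransport the router
// resolves elsewhere.
declare const transport: {
  handleRequest(req: unknown, res: unknown, body?: unknown): Promise<void>;
};

const app = express();

// Keep raw bytes for Codec content types so the transcode path can decode them.
app.use(express.raw({
  type: ["application/x-codec-msgpack", "application/x-codec-protobuf"],
  limit: "4mb",
}));

app.post("/:uuid/mcp", async (req, res) => {
  const format = negotiateStreamFormat(req.query, req.headers.accept);
  if (format !== "json") {
    req.body = decodeCodecRequestBody(req.body, format); // Buffer -> plain JS object
    wrapResponseForCodec(res);                           // outbound writes become Codec frames
  }
  // Unchanged SDK hand-off: the transport still sees a normal JSON body.
  await transport.handleRequest(req, res, req.body);
});
```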

Total diff: ~600 lines added (the three new files), ~50 lines edited (the route handler). The SDK is untouched.

v2 update — token-aware tool dispatch (now in this PR)

The v1 patch was wire-framing only: re-shape the JSON-RPC envelope as length-prefixed msgpack/gzip, no semantic changes. v2 (commits acef2cb + 552f02c) adds the actual headline value at the MetaMCP seam:

Tokens stay tokens through the chain. Detokenization runs ONCE at the only boundary that requires text — the JSON-RPC hop to the underlying MCP server.

Three properties fall out:

  1. The inference engine never detokenizes — emits Codec frames straight on the wire.
  2. ToolWatcher anywhere in the chain runs on raw uint32s (~100x faster than detokenize+regex — same number as the tool-call detection bench on the cross-stack matrix; see the sketch after this list).
  3. The consumer (agent runtime, UI, next agent) decides when text is actually needed. Most chains never do.
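
The detection in property 2 reduces to a linear scan with one integer compare per token. A sketch, where TOOL_CALL_ID is a purely hypothetical control-token value (real deployments would read it from the vocab map):

```ts
const TOOL_CALL_ID = 0x0000f00d >>> 0; // hypothetical control token

// One 32-bit compare per token: no detokenize, no regex.
function findToolCall(tokens: Uint32Array): number {
  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i] === TOOL_CALL_ID) return i;
  }
  return -1;
}
```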

Vocab-map negotiation

Optional X-Codec-Map: <url>;sha256=<hash> header on any request. First reference loads + verifies the tokenizer dialect map (sha256-pinned, content-addressed); subsequent references hit a process-local LRU cache (32 maps max). Per-request: clients can switch vocabs by changing the header, no namespace config needed.
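
A sketch of that header flow. The <url>;sha256=<hash> grammar is from this PR; the cache shape, the fetch, and the JSON map format are assumptions:

```ts
import { createHash } from "node:crypto";

const MAX_MAPS = 32;
const vocabCache = new Map<string, unknown>(); // insertion order doubles as LRU order

async function loadVocabFromHeader(headerValue: string): Promise<unknown> {
  const [url, pin] = headerValue.split(";");
  const expected = pin?.trim().replace(/^sha256=/, "");
  if (!url || !expected) throw new Error("malformed X-Codec-Map header"); // route 400s

  const cached = vocabCache.get(expected);
  if (cached !== undefined) {
    vocabCache.delete(expected);    // refresh LRU position
    vocabCache.set(expected, cached);
    return cached;
  }

  const bytes = Buffer.from(await (await fetch(url)).arrayBuffer());
  const actual = createHash("sha256").update(bytes).digest("hex");
  if (actual !== expected) throw new Error("map hash mismatch"); // content-addressed pin

  const map: unknown = JSON.parse(bytes.toString("utf8")); // map format assumed JSON
  if (vocabCache.size >= MAX_MAPS) {
    vocabCache.delete(vocabCache.keys().next().value!);    // evict least-recent
  }
  vocabCache.set(expected, map);
  return map;
}
```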

What gets transformed

  • Inbound tools/call args — when arguments carries a _codec_meta = {ids, map_id} block, the gateway runs Detokenizer(map).render(ids) to recover the JSON args text, then forwards a normal JSON-RPC envelope to the underlying MCP server. The MCP server never sees tokens.
  • Outbound CallToolResult.content — for each {type: "text", text: "..."} block, the gateway runs Tokenizer(map).encode(text) and appends a sibling {type: "_codec_meta", ids, map_id} block. Original text is preserved alongside, so non-Codec MCP clients on the same namespace still see something they can render. Codec-aware clients prefer the meta sibling and discard the text. Both transforms are sketched below.
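
Shape-level sketches of the two transforms. render() and encode() are the Detokenizer/Tokenizer methods named above; the types and signatures here are assumptions:

```ts
interface CodecMeta { ids: number[]; map_id: string; }
interface Detokenizer { render(ids: number[]): string; }
interface Tokenizer { encode(text: string): number[]; }

// Inbound: recover the JSON args text from _codec_meta; the MCP server
// downstream never sees tokens.
function detokenizeCodecArgs(
  args: Record<string, unknown>,
  detok: Detokenizer,
): Record<string, unknown> {
  const meta = args["_codec_meta"] as CodecMeta | undefined;
  if (!meta) return args; // plain JSON args pass through untouched
  return JSON.parse(detok.render(meta.ids));
}

// Outbound: append a _codec_meta sibling to each text block, keeping the
// original text so non-Codec clients still render something.
function tokenizeContent(
  content: Array<{ type: string; text?: string }>,
  tok: Tokenizer,
  mapId: string,
): Array<Record<string, unknown>> {
  const out: Array<Record<string, unknown>> = [];
  for (const block of content) {
    out.push(block);
    if (block.type === "text" && block.text !== undefined) {
      out.push({ type: "_codec_meta", ids: tok.encode(block.text), map_id: mapId });
    }
  }
  return out;
}
```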

New files

apps/backend/src/lib/metamcp/codec/
├── codec-vocab.ts      sha256-cached tokenizer dialect map handles
│                       (Detokenizer always loaded; Tokenizer
│                       optional with graceful degrade — see note below)
└── codec-content.ts    detokenizeCodecArgs(request)
                        tokenizeContent(result, mapHash)
                        loadVocabFromHeader(headerValue)

Plus extensions to codec-transcode.ts (vocabHash threaded through wrapResponseForCodec; CallToolResult-shaped responses get content tokenization before msgpack-encoding) and the two route handlers (vocab-map header parsing + inline args detokenization before transport.handleRequest).

Live image: wdunn001/codec-metamcp:0.2.2

Smoke-tested with the qwen2 vocab map URL on the lab box:

HTTP/1.1 200 OK
content-type: application/x-codec-msgpack
content-encoding: gzip
mcp-session-id: <uuid>

(176 bytes msgpack + gzip body decoding to a valid MCP initialize result)

Known limitation: @codecai/web@0.3.0's BPE pre-tokenizer regex

The BPETokenizer constructor in @codecai/web@0.3.0 builds the pre-tokenizer regex with the 'gu' flag, but maps whose pre_tokenizer_pattern uses ES2025 inline-flag groups (e.g. qwen2's (?i:'s|'t|'re|'ve|'m|'ll|'d)) require the 'gv' flag to construct under V8. On Node 22 with the qwen2 map this throws SyntaxError: Invalid group at construction time.

Mitigation in this PR: Tokenizer is treated as optional. Construction failure logs a warning and the response-side text tokenization becomes a no-op for that map — the wire still gets reframed as msgpack/gzip, just without the per-content tokenization layered on top. Detokenizer (request-side args, no BPE) is unaffected.
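
A sketch of that degrade path. Only the BPETokenizer class name and the warning behavior come from this PR; the import path and constructor argument are assumptions:

```ts
import { BPETokenizer } from "@codecai/web";

function tryBuildTokenizer(map: any, mapId: string): BPETokenizer | undefined {
  try {
    return new BPETokenizer(map); // may throw SyntaxError on 'gv'-only maps
  } catch (err) {
    console.warn(`[Codec] Tokenizer construction failed for map ${mapId}:`, err);
    return undefined; // tokenizeContent becomes a no-op for this map
  }
}
```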

Fix lives upstream in @codecai/web — either try 'gv' first and fall back, or just use 'gv' on Node 22+. Will land in @codecai/web@0.3.1. Out of scope for this PR — when it ships and the metamcp image bumps the dep, the warning goes away and the response-side tokenization works on every map.

What this doesn't include (yet)

  • Streaming chunked tokenization for incremental tool results: MCP tools/call doesn't stream today; if/when it does, the tokenizer's word-boundary-buffered API plugs in here.
  • A per-namespace default vocab map: the X-Codec-Map header carries the map per request, so no DB schema change is needed.

Testing

Smoke-tested end-to-end on a live container at wdunn001/codec-metamcp:latest (image digest sha256:04e495b3..., built from this branch at fdbbab2 + 75e1a12). The test stands up postgres + the image, signs up a user, creates a namespace + endpoint via tRPC, then exercises the negotiation knobs against /metamcp/<endpoint_name>/mcp:

  • JSON-in / JSON-out (no negotiation): byte-for-byte identical to upstream — text/event-stream SSE response with the JSON-RPC envelope as plain text. No regressions on the existing path.
  • JSON-in / Codec-out + gzip (?stream_format=msgpack + Accept: application/x-codec-msgpack + Accept-Encoding: gzip): server returns 200 OK with Content-Type: application/x-codec-msgpack, Content-Encoding: gzip, Transfer-Encoding: chunked. Body decompresses + msgpack-decodes back to the exact same MCP initialize response shape (protocolVersion, capabilities, serverInfo). Wire size: 240 B JSON-SSE -> 176 B Codec+gzip on this tiny envelope (1.36x) — bigger reductions live on tool-call responses with substantial text content, where the same physics as the cross-stack matrix applies.
  • Bad codec body (Content-Type: application/x-codec-msgpack + JSON bytes in body): 400 with a JSON-RPC error envelope. Doesn't crash the route, doesn't leak into other sessions.

The smoke test surfaced four real bugs that this branch already fixes (see commit history):

  • the SDK uses writeHead+flushHeaders for SSE and overwrites pre-set headers;
  • compressor.pipe(res) infinite-loops with the patched res.write;
  • Accept: application/x-codec-msgpack triggers the SDK's own 406 unless we spoof it after capture;
  • request and response negotiation needed to decouple so JSON-in/Codec-out works.

Reproduce locally with the same compose pattern (postgres + the image), or grab wdunn001/codec-metamcp:0.1.6 for a pre-built image including all the smoke-test fixes:

docker network create codec-net
docker run -d --name codec-mcp-db --network codec-net \
  -e POSTGRES_USER=metamcp -e POSTGRES_PASSWORD=metamcp -e POSTGRES_DB=metamcp \
  postgres:16
docker run -d --name codec-mcp --network codec-net -p 12008:12008 \
  -e POSTGRES_HOST=codec-mcp-db -e POSTGRES_USER=metamcp -e POSTGRES_PASSWORD=metamcp -e POSTGRES_DB=metamcp \
  -e DATABASE_URL=postgresql://metamcp:metamcp@codec-mcp-db:5432/metamcp \
  -e BETTER_AUTH_SECRET=$(openssl rand -hex 32) \
  wdunn001/codec-metamcp:0.1.6

Related

This is one of four Codec patches landing in parallel: the sglang, vllm, and llama.cpp engine PRs from the benchmark table above, plus this gateway-side patch.

The protocol spec, six client-language reference implementations (Python / TS / .NET / Rust / Java / C), and the cross-stack benchmark harness all live at github.com/wdunn001/Codec.

Patent posture

Quasarke (the Codec author) is pursuing patent protection on certain Codec mechanisms. The wire format, handshake, and content-addressed map distribution described in spec/PROTOCOL.md are intended to be made available on royalty-free or FRAND terms to implementers of the Codec specification when patents issue. Adjacent improvements (ToolWatcher, Translator, dict-zstd, Codec-Zstd-Dict negotiation) may be commercially licensed separately — a Codec-compliant implementation does not require those modules. Defensive termination clause will apply to any future patent license. Full text: PATENTS.md.

This is informational; it does not itself grant a patent license. The full patent commitment will be published when patents issue or when the corresponding non-provisional applications are filed, whichever is sooner. For specific questions: licensing@quasarke.com.

Adds a Codec wire-format path to the MetaMCP namespace endpoint
without disturbing existing JSON-RPC traffic. Wire bytes for tool-
heavy MCP sessions drop dramatically: long tool-call results
(file reads, web fetches, RAG context) ship as length-prefixed
msgpack frames with optional gzip on top, instead of newline-
delimited JSON-RPC.

## Negotiation

Opt-in per request, three equivalent ways:

  - `?stream_format=msgpack` (or `protobuf`) on the URL
  - `Accept: application/x-codec-msgpack` (or `…-protobuf`) header
  - explicit `?stream_format=json` opts back out

When neither is set the route is byte-for-byte identical to
upstream MetaMCP — same routes, same SDK, same JSON-RPC envelopes.

`Accept-Encoding: gzip` adds streaming compression to the Codec
response. Brotli + dict-zstd land in a follow-up; gzip alone gets
us the bulk of the win on the small JSON-RPC envelopes that
dominate the request side.

## Implementation

Three new files under `apps/backend/src/lib/metamcp/codec/`:

  - `codec-frame.ts` — length-prefixed msgpack/protobuf framing,
    `negotiateStreamFormat()` from query+headers, decode helpers
    for inbound request bodies. Same wire shape the @codecai/web
    `decodeStream` and Python codecai.decode_msgpack_stream
    decoders already speak (used end-to-end on the cross-stack
    matrix at https://codecai.net).
  - `codec-compression.ts` — `Accept-Encoding` negotiation +
    streaming gzip Transform. Mirrors
    python/sglang/srt/entrypoints/codec_compression.py from the
    sglang Codec PR.
  - `codec-transcode.ts` — Express req/res wrappers that:
      - decode an inbound msgpack/protobuf body to a JS object so
        the SDK's existing JSON path sees a normal req.body
      - patch `res.write` and `res.end` so SDK JSON-RPC writes
        emit Codec frames through the negotiated compressor
        instead of newline-delimited JSON

One small change to `apps/backend/src/routers/mcp-proxy/metamcp.ts`:

  - mount `express.raw({ type: ["application/x-codec-msgpack",
    "application/x-codec-protobuf"] })` so the raw bytes survive
    long enough for codec-transcode to decode them
  - in the POST `/:uuid/mcp` handler, run negotiation first; if a
    Codec format is selected, decode the request body and wrap
    `res` BEFORE handing off to `transport.handleRequest`

The SDK's StreamableHTTPServerTransport is unmodified — it still
writes JSON-RPC into res, the wrapper just transcodes those writes
on the way out.

## Why this matters

Same physics as the cross-stack benchmark matrix:
  - sglang   485 KB → 354 B at 2K tokens   (1,404× with full stack)
  - vllm     479 KB → 3.9 KB                 (126× with gzip)
  - llama.cpp 529 KB → 16 KB                  (33× with gzip alone)

For MetaMCP specifically, the wire weight in any non-trivial
session lives in tool-call results — file reads, search results,
RAG snippets, model-generated content piped through tools. JSON-
RPC's per-character overhead compounds across the chain; binary
framing strips it. TTFB stays within 1ms of the JSON path on the
same server.

A follow-up PR will add token-ID transcoding for `text` content
blocks (the real per-token Codec wire reduction) plus a Translator
middleware for cross-vocab tool handoff. This first patch is the
foundation: opt-in negotiation + length-prefixed binary framing +
streaming compression.

Source for the benchmark numbers and the wider Codec ecosystem:
https://codecai.net · https://github.com/wdunn001/Codec
wdunn001 added a commit to wdunn001/codec-supervisor that referenced this pull request May 8, 2026
Mirrors the codec-vllm / codec-llamacpp pattern but skips the
Python supervisor sidecar — MetaMCP already ships an admin UI for
namespace/server management as its frontend, so there's nothing for
codec-supervisor to add. The image is just MetaMCP, built from the
wdunn001 fork at feat/codec-binary-transport (PR metatool-ai/metamcp#287),
with the Codec opt-in path (?stream_format=msgpack | Accept:
application/x-codec-msgpack) live on /:uuid/mcp.

Build args:
  - CODEC_METAMCP_REPO, CODEC_METAMCP_REF, CODEC_METAMCP_COMMIT —
    same shape as the existing CODEC_VLLM_*/CODEC_LLAMACPP_* args
  - NODE_VERSION (default 20) and PNPM_VERSION (default 10.12.0)
    matching upstream MetaMCP's own Dockerfile

pnpm install runs --no-frozen-lockfile because the patch adds
@msgpack/msgpack which isn't pinned in upstream's lockfile yet.
Flip back to --frozen-lockfile once the PR merges and the lockfile
gets updated upstream.
wdunn001 added a commit to wdunn001/codec-website that referenced this pull request May 8, 2026
Companion to /docs/codec-sglang/, /docs/codec-vllm/, /docs/codec-
llamacpp/ — same shape (Quick start / Negotiation / What you get /
Compose snippet / Client example / When to use / License / Source &
links / See also) but framed for the gateway side of the stack
rather than the inference engine side.

Calls out that codec-metamcp doesn't bundle a Python supervisor
(MetaMCP already ships its own admin UI as the frontend), so this
image is just MetaMCP-from-the-fork. Links the open PR
(metatool-ai/metamcp#287), the Dockerfile, the codec-supervisor
build recipe, and the cross-stack matrix that motivates the wire
reduction story.

Slots into Sidebar Server section at order: 4 (after sglang/vllm/
llamacpp at 1/2/3) and is picked up automatically by DocsSidebar's
frontmatter-driven listing.
wdunn001 added 9 commits May 8, 2026 04:08
…e/mcp

The first patch only hit routers/mcp-proxy/metamcp.ts — that's the
INTERNAL admin route at /mcp-proxy/metamcp/:uuid/mcp used by the
Next.js frontend for testing namespaces from the admin UI. Real MCP
clients connecting via the namespace URL hit the PUBLIC route at
/metamcp/:endpoint_name/mcp (mounted from
routers/public-metamcp/streamable-http.ts), so they were untouched.

Smoke-tested the v0.1.1 image on the lab box: the codec-metamcp
container boots clean, frontend serves, JSON-RPC works — but a
POST to /metamcp/<uuid>/mcp with Accept: application/x-codec-msgpack
returned a regular JSON 401 because the public route wasn't
patched. This commit fixes that gap by mirroring the same
negotiation block + raw-body parser into streamable-http.ts.

Same shape:
  1. mount express.raw({ type: 'application/x-codec-(msgpack|protobuf)' })
     on the public router so Codec request bodies survive long
     enough to decode
  2. negotiate stream format from ?stream_format / Accept
  3. decode request body, wrap response, hand off to
     transport.handleRequest as before

The internal /mcp-proxy/metamcp route still has the same patch —
that path is what the admin UI uses for in-house namespace tests
and benefits identically.
…eam_format

Smoke test surfaced a design bug: the first version pinned request
decode and response wrap to the same negotiation result, so a JSON
client that just wanted Codec on the response (JSON-in / Codec-out)
got its body double-decoded as msgpack and rejected with 400.

Split the negotiation (request side sketched after this list):
  - Request body: keys off Content-Type. application/x-codec-msgpack
    or application/x-codec-protobuf triggers decodeCodecRequestBody;
    anything else (including application/json) leaves the body alone
    so the SDK's JSON middleware sees what it expects.
  - Response body: keys off ?stream_format= and Accept. When set,
    wraps res so SDK JSON-RPC writes get re-framed as Codec on the
    way out, regardless of what came in.
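
A sketch of the request-side half; negotiateRequestDecode is a hypothetical helper, and only the Content-Type values come from this commit:

```ts
function negotiateRequestDecode(
  contentType: string | undefined,
): "msgpack" | "protobuf" | null {
  if (contentType?.startsWith("application/x-codec-msgpack")) return "msgpack";
  if (contentType?.startsWith("application/x-codec-protobuf")) return "protobuf";
  return null; // application/json and everything else: leave the body to the SDK
}
```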

This lets clients adopt Codec on either end independently:
  - JSON-in, Codec-out (smallest migration: just set Accept)
  - Codec-in, Codec-out (full)
  - Codec-in, JSON-out (?stream_format=json explicit opt-out)

Same fix mirrored on both routes — the public route at
/metamcp/:endpoint_name/mcp (real MCP clients) and the internal
admin route at /mcp-proxy/metamcp/:uuid/mcp (admin UI tests).
…ders for SSE

Smoke test against the live image surfaced that the response wrap was
half-effective: my Codec setHeader calls before transport.handleRequest
got blown away when the SDK called res.writeHead(200, {Content-Type:
text/event-stream, ...}).flushHeaders() to commit SSE headers. The client
saw text/event-stream in the headers and Codec-msgpack bytes in the body;
the mismatch made the proxy ECONNRESET.

The fix patches three more methods on res (sketched after this list):

  - writeHead — when SDK calls it, we substitute our Codec headers
    (Content-Type: application/x-codec-msgpack, optional
    Content-Encoding, Transfer-Encoding: chunked) but keep the SDK's
    status code and any non-content-* headers it set
    (mcp-session-id, cache-control, access-control-*).

  - flushHeaders — swallow it. writeHead already commits headers
    in our path; forwarding flushHeaders would double-send.

  - end — already patched, but document the new error-path case
    (SDK does writeHead(4xx).end(JSON.stringify(...)) for protocol
    errors; that JSON now goes through the same forwarder so the
    client gets a Codec-encoded error envelope, not mixed JSON).
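
A sketch of the writeHead/flushHeaders substitution; the helper name is hypothetical, the header names follow the commit text, and the merge logic is an assumed implementation:

```ts
import type { OutgoingHttpHeaders, ServerResponse } from "node:http";

function patchSseHeaderCommit(
  res: ServerResponse,
  codecContentType: string, // e.g. "application/x-codec-msgpack"
  gzip: boolean,
): void {
  const originalWriteHead = res.writeHead.bind(res);

  res.writeHead = ((status: number, headers?: OutgoingHttpHeaders) => {
    const kept: OutgoingHttpHeaders = {};
    for (const [k, v] of Object.entries(headers ?? {})) {
      // Keep the SDK's non-content-* headers (mcp-session-id, cache-control, ...).
      if (!k.toLowerCase().startsWith("content-")) kept[k] = v;
    }
    kept["content-type"] = codecContentType; // substitute the Codec headers
    if (gzip) kept["content-encoding"] = "gzip";
    kept["transfer-encoding"] = "chunked";
    return originalWriteHead(status, kept);  // keep the SDK's status code
  }) as typeof res.writeHead;

  // writeHead already commits headers on this path; forwarding flushHeaders
  // would double-send.
  res.flushHeaders = () => {};
}
```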

This is the third smoke-test bug surfaced this session — the pattern's
been: write the patch, build, run, see what breaks. Each one's been
straightforward once visible. Next iteration verifies end-to-end
JSON-in / Codec-out round-trip works.
…ixes infinite loop

The smoke test surfaced that the response wrap was hanging because
compressor.pipe(res) routed compressor output INTO our patched
res.write, which forwarded it back to the compressor. Each chunk
ping-ponged forever; the response socket eventually got reset by
the client.

Fix: don't pipe the compressor to res. Capture originalWrite +
originalEnd before patching res.write, then attach 'data' and
'end' listeners on the compressor that call originalWrite/
originalEnd directly. Compressed bytes go straight to the socket;
our patched res.write only sees inbound SDK writes (which is what
we wanted to intercept).

Same shape: SDK -> patched res.write -> forwardChunkToCodec ->
compressor.write -> [compressor 'data' event] -> originalWrite ->
socket. No loop, no buffering above the compressor.
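
A sketch of that wiring; attachCompressor is a hypothetical name, and the listener pattern is the shape this commit describes:

```ts
import type { ServerResponse } from "node:http";
import { createGzip } from "node:zlib";

function attachCompressor(res: ServerResponse) {
  const originalWrite = res.write.bind(res);
  const originalEnd = res.end.bind(res);
  const compressor = createGzip();

  // Compressed bytes go straight to the captured originals, so the patched
  // res.write only ever sees inbound SDK writes.
  compressor.on("data", (chunk: Buffer) => originalWrite(chunk));
  compressor.on("end", () => originalEnd());

  // Deliberately NOT compressor.pipe(res): that routes output back into the
  // patched res.write and ping-pongs forever.
  return compressor;
}
```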

This was the response-side missing piece. With it the JSON-in /
Codec-out path round-trips cleanly: client sends JSON, server
emits length-prefixed msgpack frames over the wire, and standard
@codecai/web decodeStream parses them as plain JSON-RPC.

The SDK's StreamableHTTPServerTransport runs its own Accept check
against {application/json, text/event-stream} and short-circuits
406 Not Acceptable for anything else. Smoke test surfaced the
JSON-in/Codec-out path returning 406 because Accept:
application/x-codec-msgpack didn't pass the SDK's filter.

After we capture the codec format from the original Accept value
and wrap the response, rewrite req.headers.accept to the SDK-
friendly value. The SDK then proceeds happily down its SSE path,
emits text/event-stream chunks, and our response wrapper re-frames
those into Codec on the wire.

Mirrored on both the public and admin routes.

Builds on v0.1.6's wire framing (msgpack/gzip on the JSON-RPC envelope)
with the actual headline value: tokens stay tokens through the chain;
detokenization runs ONCE at the only boundary that requires text — the
JSON-RPC hop to the underlying MCP server.

Three properties fall out:

  1. The inference engine never detokenizes — emits Codec frames
     straight on the wire.
  2. ToolWatcher anywhere in the chain runs on raw uint32s
     (~100x faster than detokenize+regex).
  3. The consumer (agent runtime, UI, next agent) decides when text
     is actually needed. Most chains never do.

## What's added

- `codec-vocab.ts` — sha256-cached tokenizer dialect map handles
  via @codecai/web's loadMap/Detokenizer/Tokenizer. LRU bounded
  to 32 maps, lookup is O(1) by sha256 hash.

- `codec-content.ts` — two transforms operating on a single vocab
  map per request:
    * `detokenizeCodecArgs(request)` — when a tools/call request
      carries `arguments._codec_meta = {ids, map_id}`, render the
      ids back to text + JSON.parse, replace args inline so the
      MCP server sees a normal JSON envelope.
    * `tokenizeContent(result, mapHash)` — walk
      CallToolResult.content[], for each `{type:"text", text}`
      block append a sibling `{type:"_codec_meta", ids, map_id}`.
      Original text preserved — non-Codec clients still see it,
      Codec-aware clients prefer the tokenized sibling.
  Plus `loadVocabFromHeader()` parsing
  `X-Codec-Map: <url>;sha256=<hash>` once on first reference.

- `codec-transcode.ts` extended: `wrapResponseForCodec()` now takes
  an optional vocabHash. When set, every JSON-RPC response is
  walked for CallToolResult shape and the content[] gets tokenized
  before msgpack-encoding. Other RPC types (initialize,
  prompts/get, resources/read, errors) pass through unchanged —
  the wire reduction comes from msgpack on the envelope, not from
  rewriting their bodies.

## Wiring

Both routes (public `/metamcp/:endpoint_name/mcp` and admin
`/mcp-proxy/metamcp/:uuid/mcp`) now:

  1. Read `X-Codec-Map` header, await `loadVocabFromHeader()` to
     populate the cache. 400 if the header is malformed.
  2. Inspect `req.body.method`. If `tools/call` and arguments
     carry `_codec_meta`, detokenize inline before the SDK sees
     the request. The SDK's transport.handleRequest gets a normal
     JSON envelope — no special routing.
  3. Pass `vocabHash` to `wrapResponseForCodec()` so the response
     wrap also runs `tokenizeContent` on each CallToolResult.

## What's NOT in this patch (deferred)

- Streaming chunked tokenization for incremental tool results.
  MCP `tools/call` doesn't stream today; if/when it does, the
  tokenizer's word-boundary-buffered API (already in
  @codecai/web's Translator) will plug in here.
- Per-namespace default vocab map. Headers carry the map per-
  request; first request loads, subsequent ones hit the cache.
  No DB schema change needed.

## Compatibility

All-JSON traffic on these routes continues to work byte-for-byte
identically — none of the new code paths fire unless the client
opts in via Codec content type, ?stream_format query param,
Accept header, OR the new X-Codec-Map header. The defaults are
all "do nothing".

## Wire impact (local smoke test, more coming)

The headline path is long tool results — file reads, web fetches,
RAG context, model-generated text piped through tools. On a
2K-token tool result the tokenize+msgpack+gzip path drops the
wire bytes by the same physics as the cross-stack matrix
(sglang dict-zstd hits 1,404x at this size; here we get gzip-only
because dict-zstd needs a dict-loader hook in the next patch).
Empty-string text-block suppression (further wire savings at the
cost of non-Codec-client compatibility) deferred to a follow-up.

@codecai/web@0.3.0 constructs the BPE pre-tokenizer regex with
the 'gu' flag, but maps whose pre_tokenizer_pattern uses ES2025
inline-flag groups like (?i:'s|'t|...) need the 'gv' flag — V8
throws SyntaxError at construction. The qwen2 map is one such
case; smoke test on v0.2.1 (Node 22) hit the failure.

Two paths into the vocab cache:
  - Detokenizer (pure ID -> bytes lookup): never fails.
  - Tokenizer (BPE, requires the regex): may fail on some maps
    until @codecai/web ships the 'gv' fix.

Treat Tokenizer as optional. When construction throws:
  - Log a warning naming the map.
  - Cache entry stores tok: undefined.
  - codec-content.tokenizeContent() short-circuits and returns
    the result unchanged — wire still gets re-framed as msgpack,
    just without the per-content tokenization layered on top.
  - codec-content.detokenizeCodecArgs() works either way because
    it only uses Detokenizer.

The end-state behavior:
  - Request-side codec args path: full functionality on every map.
  - Response-side text tokenization: full functionality on maps
    whose pre-tokenizer regex is V8-compatible under 'gu' (which
    is most of them — qwen2 is the conspicuous outlier today).

Long-term fix is in @codecai/web (try 'gv' before 'gu', or just
use 'gv' on Node 22+). Out of scope for this PR.

0.4.0 ships the pre-tokenizer regex fallback (try 'gv' first, fall
back to 'gu') so maps with ES2025 inline-flag groups in their
pre_tokenizer_pattern (qwen2's contraction handler is the canonical
case) construct cleanly. The graceful-degrade in codec-vocab.ts
stays — it covers the still-rare cases where neither flag works —
but it should never fire on common maps anymore.

Once this image lands, the [Codec] Tokenizer construction failed
warning we saw on v0.2.2 with qwen2 goes away and response-side
text tokenization activates: CallToolResult.content[].text blocks
get a sibling _codec_meta block with the encoded token IDs.
@wdunn001 wdunn001 closed this May 8, 2026