
fix(telemetry): emit OTel-standard gen_ai.usage.cache_read.input_tokens across providers #1666

Merged

gold-silver-copper merged 3 commits into 0xPlaygrounds:main from alwayys-afk:fix/openai-otel-cache-token-attrs on Apr 28, 2026

fix(telemetry): emit OTel-standard gen_ai.usage.cache_read.input_tokens across providers#1666
gold-silver-copper merged 3 commits into0xPlaygrounds:mainfrom
alwayys-afk:fix/openai-otel-cache-token-attrs

Conversation

@alwayys-afk
Contributor

Summary

  • Rename the tracing/OTel attribute gen_ai.usage.cached_tokens to gen_ai.usage.cache_read.input_tokens across every provider that emits it.
  • gen_ai.usage.cache_read.input_tokens is the canonical attribute name listed in the OpenTelemetry GenAI semantic-conventions registry. The previous name was non-standard and would not be picked up by OTel-aware backends.
  • Every instance (span declarations and span.record(...) call sites, on both streaming and non-streaming paths) has been updated so that a single provider never emits two different attribute names; see the sketch below.
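
For concreteness, here is a minimal sketch of what the rename looks like at a span declaration and its record site. The span name and surrounding code are illustrative; only the attribute names come from this PR.

```rust
use tracing::field::Empty;

fn main() {
    // Hypothetical provider span; only the attribute names are real.
    let span = tracing::info_span!(
        "chat_completion",
        gen_ai.usage.input_tokens = Empty,
        gen_ai.usage.output_tokens = Empty,
        // before: gen_ai.usage.cached_tokens = Empty,
        gen_ai.usage.cache_read.input_tokens = Empty,
    );

    // ...once the provider response has been parsed:
    span.record("gen_ai.usage.input_tokens", 1_000u64);
    span.record("gen_ai.usage.output_tokens", 200u64);
    // before: span.record("gen_ai.usage.cached_tokens", 600u64);
    span.record("gen_ai.usage.cache_read.input_tokens", 600u64);
}
```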

Providers touched

openai (chat + responses_api, streaming + non-streaming), openai chat-completions-compatible shared helper, anthropic (via the existing cached_input_tokens plumbing), azure, chatgpt, cohere, copilot, deepseek, galadriel, gemini (completion, streaming, interactions_api), groq, huggingface, hyperbolic, llamafile, mira, mistral, moonshot, ollama, openrouter, perplexity, together, xai.

Notes

  • No provider wire-format changes. Provider response structs that happen to deserialize a JSON field named cached_tokens (e.g. DeepSeek's prompt_tokens_details.cached_tokens, OpenAI's input_tokens_details.cached_tokens) keep those names — those are the provider's API, not our telemetry.
  • Per OTel guidance, cache_read.input_tokens SHOULD be included in gen_ai.usage.input_tokens (see the sketch after these notes). Existing providers already follow that convention; this PR does not change any token-accounting logic.
  • This is a breaking change for any downstream dashboards/alerts that queried the old attribute name.
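
A rough sketch of how the wire-format and token-accounting notes above fit together; the struct shapes and helper are hypothetical stand-ins rather than rig's actual types, and only the provider JSON field name and the telemetry attribute names are taken from this PR:

```rust
use serde::Deserialize;
use tracing::Span;

// Hypothetical response shapes: the wire-format field keeps the provider's
// own name (`cached_tokens`), while the span attribute uses the OTel name.
#[derive(Deserialize)]
struct PromptTokensDetails {
    cached_tokens: u64,
}

#[derive(Deserialize)]
struct Usage {
    prompt_tokens: u64,
    prompt_tokens_details: Option<PromptTokensDetails>,
}

fn record_usage(span: &Span, usage: &Usage) {
    // cache_read.input_tokens is a subset of input_tokens (per the OTel
    // guidance above), so it is recorded alongside it, not added on top.
    span.record("gen_ai.usage.input_tokens", usage.prompt_tokens);
    if let Some(details) = &usage.prompt_tokens_details {
        span.record(
            "gen_ai.usage.cache_read.input_tokens",
            details.cached_tokens,
        );
    }
}
```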

Test plan

  • cargo fmt -- --check
  • cargo clippy -p rig-core --all-targets --all-features — clean
  • cargo test -p rig-core --all-features --lib — 555 passed, 0 failed, 8 ignored
  • grep -rn "gen_ai.usage.cached_tokens" rig/ returns no matches

Commits

Provider spans declared `gen_ai.usage.cached_tokens` (non-OTel) but the
shared telemetry helpers (`SpanCombinator::record_token_usage`,
`openai_chat_completions_compatible::record_usage`) write the OTel-
standard `gen_ai.usage.cache_read.input_tokens`. Because `tracing`
silently drops `.record()` for fields not declared on the span, cached-
token values were being computed and thrown away on every provider whose
span did not declare the OTel name.
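
A minimal sketch of that failure mode (nothing here is rig's actual code; it only demonstrates the `tracing` behavior described above):

```rust
use tracing::field::Empty;

fn main() {
    // Span that declares only the old, non-standard field name.
    let span = tracing::info_span!("chat", gen_ai.usage.cached_tokens = Empty);

    // A shared helper writes the OTel-standard name; the span never declared
    // that field, so `tracing` drops the value silently: no error, and no
    // attribute on the exported span.
    span.record("gen_ai.usage.cache_read.input_tokens", 512u64);

    // Only records that target a declared field are kept.
    span.record("gen_ai.usage.cached_tokens", 512u64);
}
```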

Declare `gen_ai.usage.cache_read.input_tokens` on every affected span
across OpenAI, Azure, Cohere, Gemini (incl. interactions_api), Groq,
DeepSeek, HuggingFace, Mistral, OpenRouter, Together, xAI, Copilot,
and llamafile. Emit `cache_read.input_tokens` from the OpenAI chat-
completions compatible helper and the OpenAI Responses API record
sites. Do not emit `cache_creation.input_tokens` on OpenAI-family spans
— those APIs have no cache-creation concept and a hardcoded 0 would be
misleading rather than informative. Anthropic, which does report cache
creation, is unchanged.

Remove the non-OTel `gen_ai.usage.cached_tokens` attribute from every
path this change touches: drop the span declaration and, on paths whose
only recording sites are modified here, drop the record calls too. Only
the OTel-standard attribute is emitted from these paths.

Spans that record cache tokens entirely through their own inline
`span.record("gen_ai.usage.cached_tokens", ...)` calls (e.g. non-
streaming paths of Groq, DeepSeek, Copilot; standalone files like
Galadriel, Hyperbolic, Mira, Moonshot, Ollama, Perplexity, ChatGPT;
non-streaming Together and xAI) are out of scope for this change and
continue to emit only `cached_tokens`.
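
To make that scoping concrete, here is a hedged sketch of the two recording styles; the span names and the helper signature are stand-ins for illustration, not the actual rig code:

```rust
use tracing::{field::Empty, Span};

// Stand-in for the shared record_usage helper: it writes the OTel-standard
// attribute, so any span handed to it must declare that exact field name.
fn record_usage(span: &Span, cached_read: u64) {
    span.record("gen_ai.usage.cache_read.input_tokens", cached_read);
}

fn main() {
    // Style A: inline recording with the old name. These are the paths left
    // out of scope above; they keep emitting `cached_tokens`.
    let inline_span =
        tracing::info_span!("completion", gen_ai.usage.cached_tokens = Empty);
    inline_span.record("gen_ai.usage.cached_tokens", 256u64);

    // Style B: recording delegated to the shared helper, which is why the
    // span declaration must be renamed for the value to land.
    let helper_span = tracing::info_span!(
        "completion",
        gen_ai.usage.cache_read.input_tokens = Empty,
    );
    record_usage(&helper_span, 256u64);
}
```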
Galadriel, Hyperbolic, Mira, Moonshot, and Perplexity all route their
streaming paths through `send_compatible_streaming_request`, whose
shared `record_usage` helper was updated in the prior commit to write
the OTel-standard `gen_ai.usage.cache_read.input_tokens`. These five
provider spans still declared `gen_ai.usage.cached_tokens`, so the
newly-recorded value was silently dropped by `tracing` — and since
none of these files inline-records `cached_tokens` either, their
streaming paths were emitting no cache-read metric at all after the
prior commit landed.

Rename the declaration to `gen_ai.usage.cache_read.input_tokens` on
both the non-streaming and streaming span in each file, matching the
pattern the prior commit already applied across the other providers.
The non-streaming rename is a no-op (no recorder targets the field on
that path) but keeps both spans in each file consistent.

The prior two commits partially renamed gen_ai.usage.cached_tokens to the
canonical OTel GenAI attribute gen_ai.usage.cache_read.input_tokens but
left several providers (and some streaming-vs-non-streaming paths within
a provider) emitting the old name. Finish the rename in chatgpt,
copilot, deepseek (non-streaming), groq (non-streaming), ollama,
together/completion, and xai/completion so every span consistently
emits cache_read.input_tokens.

@anish-kristipati left a comment


Good catch on the caching telemetry issues

@gold-silver-copper added this pull request to the merge queue on Apr 28, 2026
Merged via the queue into 0xPlaygrounds:main with commit 3a07cb4 on Apr 28, 2026
6 checks passed
