fix(telemetry): emit OTel-standard gen_ai.usage.cache_read.input_tokens across providers#1666
Merged
gold-silver-copper merged 3 commits into 0xPlaygrounds:main · Apr 28, 2026
Conversation
Provider spans declared `gen_ai.usage.cached_tokens` (non-OTel) but the
shared telemetry helpers (`SpanCombinator::record_token_usage`,
`openai_chat_completions_compatible::record_usage`) write the OTel-
standard `gen_ai.usage.cache_read.input_tokens`. Because `tracing`
silently drops `.record()` for fields not declared on the span, cached-
token values were being computed and thrown away on every provider whose
span did not declare the OTel name.
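The drop is easy to reproduce with `tracing` directly. This minimal sketch (span name and token values hypothetical; field names from the PR) shows that `Span::record` is a no-op for any field the span did not declare at creation time:

```rust
use tracing::field::Empty;
use tracing::info_span;

fn main() {
    // Assumes a subscriber (e.g. tracing-subscriber's fmt layer) is
    // installed in a real program; without one the span is disabled anyway.
    let span = info_span!(
        "chat_completion", // hypothetical span name, not rig-core's
        "gen_ai.usage.input_tokens" = Empty,
        "gen_ai.usage.cached_tokens" = Empty // non-OTel legacy name
    );

    // Recorded: the field was declared on the span above.
    span.record("gen_ai.usage.input_tokens", 128_u64);

    // Silently dropped: `record` ignores fields the span never declared,
    // so this value is computed and then never reaches any exporter.
    span.record("gen_ai.usage.cache_read.input_tokens", 64_u64);
}
```

This silent-drop contract is why the bug produced no error anywhere: the helper's write succeeded from the caller's perspective while the value vanished.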
Declare `gen_ai.usage.cache_read.input_tokens` on every affected span
across OpenAI, Azure, Cohere, Gemini (incl. interactions_api), Groq,
DeepSeek, HuggingFace, Mistral, OpenRouter, Together, xAI, Copilot,
and llamafile. Emit `cache_read.input_tokens` from the OpenAI chat-
completions compatible helper and the OpenAI Responses API record
sites. Do not emit `cache_creation.input_tokens` on OpenAI-family spans
— those APIs have no cache-creation concept and a hardcoded 0 would be
misleading rather than informative. Anthropic, which does report cache
creation, is unchanged.
Remove the non-OTel `gen_ai.usage.cached_tokens` attribute from every
path this change touches: drop the span declaration and, on paths whose
only recording sites are modified here, drop the record calls too. Only
the OTel-standard attribute is emitted from these paths.
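Concretely, the per-provider change is a one-line edit to each span declaration. This before/after fragment is illustrative only (the field sets are abridged, not the crate's actual spans):

```rust
use tracing::{field::Empty, info_span, Span};

// Before: the span declares only the non-OTel name, so the shared
// helpers' record of the OTel-standard name was silently dropped.
fn old_span() -> Span {
    info_span!(
        "completion", // abridged, illustrative field set
        "gen_ai.usage.cached_tokens" = Empty
    )
}

// After: the span declares the OTel-standard name; the legacy
// declaration is removed on paths whose only recorders changed here.
fn new_span() -> Span {
    info_span!(
        "completion",
        "gen_ai.usage.cache_read.input_tokens" = Empty
    )
}
```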
Spans that record cache tokens entirely through their own inline
`span.record("gen_ai.usage.cached_tokens", ...)` calls (e.g. non-
streaming paths of Groq, DeepSeek, Copilot; standalone files like
Galadriel, Hyperbolic, Mira, Moonshot, Ollama, Perplexity, ChatGPT;
non-streaming Together and xAI) are out of scope for this change and
continue to emit only `cached_tokens`.
Galadriel, Hyperbolic, Mira, Moonshot, and Perplexity all route their streaming paths through `send_compatible_streaming_request`, whose shared `record_usage` helper was updated in the prior commit to write the OTel-standard `gen_ai.usage.cache_read.input_tokens`. These five provider spans still declared `gen_ai.usage.cached_tokens`, so the newly-recorded value was silently dropped by `tracing` — and since none of these files inline-records `cached_tokens` either, their streaming paths were emitting no cache-read metric at all after the prior commit landed. Rename the declaration to `gen_ai.usage.cache_read.input_tokens` on both the non-streaming and streaming span in each file, matching the pattern the prior commit already applied across the other providers. The non-streaming rename is a no-op (no recorder targets the field on that path) but keeps both spans in each file consistent.
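For orientation, here is a hypothetical reduction of what a shared `record_usage` helper does after the prior commit (the `Usage` struct and its field names are illustrative, not rig-core's actual types):

```rust
use tracing::Span;

// Illustrative usage payload; rig-core's real type differs.
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    cache_read_input_tokens: Option<u64>,
}

// The helper writes only OTel-standard attribute names. A provider span
// that still declares only `gen_ai.usage.cached_tokens` silently drops
// the cache-read record below, which is exactly the gap these five
// providers fell into after the prior commit.
fn record_usage(span: &Span, usage: &Usage) {
    span.record("gen_ai.usage.input_tokens", usage.input_tokens);
    span.record("gen_ai.usage.output_tokens", usage.output_tokens);
    if let Some(cached) = usage.cache_read_input_tokens {
        span.record("gen_ai.usage.cache_read.input_tokens", cached);
    }
}
```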
Prior two commits partially renamed `gen_ai.usage.cached_tokens` to the canonical OTel GenAI attribute `gen_ai.usage.cache_read.input_tokens` but left several providers (and some streaming-vs-non-streaming paths within a provider) emitting the old name. Finish the rename in chatgpt, copilot, deepseek (non-streaming), groq (non-streaming), ollama, together/completion, and xai/completion so every span consistently emits `cache_read.input_tokens`.
anish-kristipati approved these changes · Apr 24, 2026
anish-kristipati commented:
Good catch on the caching telemetry issues
Summary
Renames `gen_ai.usage.cached_tokens` to `gen_ai.usage.cache_read.input_tokens` across every provider that emits it. `gen_ai.usage.cache_read.input_tokens` is the canonical attribute name listed in the OpenTelemetry GenAI semantic-conventions registry; the previous name was non-standard and would not be picked up by OTel-aware backends. Every emission path — span declarations, `span.record(...)` call sites, streaming and non-streaming paths — has been updated so a single provider never emits two different attribute names.
Providers touched
openai (chat + responses_api, streaming + non-streaming), the openai chat-completions-compatible shared helper, anthropic (via the existing `cached_input_tokens` plumbing), azure, chatgpt, cohere, copilot, deepseek, galadriel, gemini (completion, streaming, interactions_api), groq, huggingface, hyperbolic, llamafile, mira, mistral, moonshot, ollama, openrouter, perplexity, together, xai.
Notes
- Provider API response fields named `cached_tokens` (e.g. DeepSeek's `prompt_tokens_details.cached_tokens`, OpenAI's `input_tokens_details.cached_tokens`) keep those names — those are the provider's API, not our telemetry.
- Per the OTel GenAI conventions, `cache_read.input_tokens` SHOULD be included in `gen_ai.usage.input_tokens`. Existing providers already follow that convention; this PR does not change any token-accounting logic.
Test plan
- `cargo fmt -- --check`
- `cargo clippy -p rig-core --all-targets --all-features` — clean
- `cargo test -p rig-core --all-features --lib` — 555 passed, 0 failed, 8 ignored
- `grep -rn "gen_ai.usage.cached_tokens" rig/` — returns no matches