Add TTL cache + cache exclusion handling #299
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69f7224b26
    prompt_cache_creation_5m_tokens: usage
        .get("cache_creation")
        .and_then(|c| c.get("ephemeral_5m_input_tokens"))
        .and_then(Value::as_i64),
    prompt_cache_creation_1h_tokens: usage
        .get("cache_creation")
        .and_then(|c| c.get("ephemeral_1h_input_tokens"))
        .and_then(Value::as_i64),
Keep TTL usage parsing behind typed boundaries
The project guide in AGENTS.md says provider-boundary code must not inspect serde_json::Value directly for provider semantics and must deserialize into typed structs/enums first; these new TTL fields pluck cache_creation and its children directly from the raw map. Since Anthropic already has generated Usage/CacheCreation types for this shape, parsing the TTL split through a typed view avoids bypassing the typed-boundary check and drifting from the canonical serde renames.
    pub fn inclusive_prompt_tokens(&self) -> Option<i64> {
        if !self.prompt_tokens_exclude_cache {
            return self.prompt_tokens;
        }
        if self.prompt_tokens.is_none()
            && self.prompt_cached_tokens.is_none()
            && self.prompt_cache_creation_tokens.is_none()
        {
            return None;
        }
        Some(
            self.prompt_tokens.unwrap_or(0)
                + self.prompt_cached_tokens.unwrap_or(0)
                + self.prompt_cache_creation_tokens.unwrap_or(0),
        )
Use inclusive prompt totals for OpenAI-style usage
This helper is added to compute OpenAI-convention prompt totals, but the provider serializers still read self.prompt_tokens directly for OpenAI/Responses/Google. For an Anthropic or Converse response with input_tokens: 10 and cache read/write tokens, transforming to OpenAI-style output will still emit prompt_tokens: 10 and an understated total_tokens, even though the new flag says those prompt tokens exclude cache buckets. Wire this helper into the OpenAI-style to_provider_value paths and Google UsageMetadata conversion so cached tokens are counted consistently.
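A rough sketch of the wiring being suggested, under stated assumptions: `UsageSketch` is a hypothetical stand-in for the real usage struct, `completion_tokens` and `openai_style_totals` are names invented here, and the real `to_provider_value` paths will look different. The helper body is copied from the diff; the only point is that the serializer calls it instead of reading `prompt_tokens` directly.

```rust
/// Hypothetical stand-in for the universal usage struct in this PR.
#[derive(Debug, Default, Clone)]
struct UsageSketch {
    prompt_tokens: Option<i64>,
    completion_tokens: Option<i64>, // assumed field name
    prompt_cached_tokens: Option<i64>,
    prompt_cache_creation_tokens: Option<i64>,
    prompt_tokens_exclude_cache: bool,
}

impl UsageSketch {
    /// Same logic as the helper added in the diff.
    fn inclusive_prompt_tokens(&self) -> Option<i64> {
        if !self.prompt_tokens_exclude_cache {
            return self.prompt_tokens;
        }
        if self.prompt_tokens.is_none()
            && self.prompt_cached_tokens.is_none()
            && self.prompt_cache_creation_tokens.is_none()
        {
            return None;
        }
        Some(
            self.prompt_tokens.unwrap_or(0)
                + self.prompt_cached_tokens.unwrap_or(0)
                + self.prompt_cache_creation_tokens.unwrap_or(0),
        )
    }

    /// Hypothetical OpenAI-style projection: returns
    /// (prompt_tokens, total_tokens) using the inclusive prompt count.
    fn openai_style_totals(&self) -> (Option<i64>, Option<i64>) {
        let prompt = self.inclusive_prompt_tokens();
        let total = match (prompt, self.completion_tokens) {
            (None, None) => None,
            (p, c) => Some(p.unwrap_or(0) + c.unwrap_or(0)),
        };
        (prompt, total)
    }
}
```

For an Anthropic-style usage with `prompt_tokens: 10`, 100 cache-read tokens, 50 cache-write tokens, and the exclusion flag set, this projection reports a prompt count of 160 rather than 10, and the total grows by the same amount.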
Add support in lingua for ingesting the TTL cache usage that Anthropic reports, and convert it to the OpenAI convention so that logging is consistent.