fix(langchain): stop double-counting anthropic cache tokens in prompt totals #510
Merged
Abhijeet Prasad (AbhiPrasad) merged 3 commits on Jun 11, 2026
… totals

langchain-anthropic has folded cache read/creation tokens into usage_metadata input_tokens since 0.2.3 (versions before that don't emit input_token_details at all), and langchain-aws does the same. Per the langchain-core UsageMetadata contract, input_token_details is a breakdown of input_tokens, not an addition to it.

The cache normalization from #411/#445 detected "separate cache token accounting" by the presence of cache_creation/ephemeral_* detail keys, which langchain-anthropic always emits, so every cached Anthropic call had cache tokens added to prompt_tokens a second time. With a warm cache this roughly doubles reported prompt tokens (e.g. a real trace reported 75,387 prompt tokens for a 37,694-token request with 37,324 cache reads and 369 cache writes).

Detect separate accounting arithmetically instead: only fold cache tokens into prompt/total when they exceed the reported prompt total, which is impossible under the UsageMetadata contract but is exactly the inconsistency the original normalization (BT-5150) was added to repair.

Strengthen the VCR prompt-caching test to assert span prompt/total tokens equal the usage_metadata the model reported, and add unit coverage for the folded (Anthropic), subset (OpenAI), and separate (legacy) conventions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
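For readers skimming the description, the arithmetic check could look roughly like the sketch below. It is not the code from this PR: the function name `normalize_usage`, its parameters, and the output metric keys are hypothetical, and the completion count of 10 in the demo is made up for illustration. Only the prompt figures (37,694 / 37,324 / 369) come from the trace quoted above.

```python
from typing import Dict


def normalize_usage(
    input_tokens: int,
    output_tokens: int,
    cache_read: int = 0,
    cache_creation: int = 0,
) -> Dict[str, int]:
    """Hypothetical sketch of arithmetic detection of separate cache accounting."""
    cache_tokens = cache_read + cache_creation

    # Under the UsageMetadata contract, cache tokens are a breakdown of
    # input_tokens, so cache_tokens <= input_tokens always holds.
    # Cache tokens exceeding the reported prompt total can only mean the
    # provider reported them separately, so fold them in only in that case.
    if cache_tokens > input_tokens:
        prompt_tokens = input_tokens + cache_tokens
    else:
        prompt_tokens = input_tokens

    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": output_tokens,
        "tokens": prompt_tokens + output_tokens,
        "prompt_cached_tokens": cache_read,
        "prompt_cache_creation_tokens": cache_creation,
    }


# Folded convention (Anthropic): cache tokens already inside input_tokens.
folded = normalize_usage(37_694, 10, cache_read=37_324, cache_creation=369)
print(folded["prompt_tokens"])  # 37694 (key-presence detection gave 75387)

# Subset convention (OpenAI-style): cached tokens are part of the prompt.
subset = normalize_usage(100, 10, cache_read=80)
print(subset["prompt_tokens"])  # 100

# Separate convention (legacy): input_tokens excludes cache tokens.
separate = normalize_usage(1, 10, cache_read=37_324, cache_creation=369)
print(separate["prompt_tokens"])  # 37694
```

One property of this sketch worth noting: a separately accounted payload whose cache tokens happen to be no larger than its uncached input would pass through unfolded, since the comparison cannot distinguish it from a contract-conforming one.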
Stephen Belanger (Qard)
approved these changes
Jun 11, 2026
supersedes #504
see https://github.com/braintrustdata/braintrust-spec/blob/main/docs/features/prompt-cache.md