fix(langchain): stop double-counting anthropic cache tokens in prompt totals by AbhiPrasad · Pull Request #510 · braintrustdata/braintrust-sdk-python

Abhijeet Prasad (AbhiPrasad) · 2026-06-11T17:55:59Z

supersedes #504

see https://github.com/braintrustdata/braintrust-spec/blob/main/docs/features/prompt-cache.md

… totals

langchain-anthropic has folded cache read/creation tokens into usage_metadata input_tokens since 0.2.3 (versions before that don't emit input_token_details at all), and langchain-aws does the same. Per the langchain-core UsageMetadata contract, input_token_details is a breakdown of input_tokens, not an addition to it.

The cache normalization from #411/#445 detected "separate cache token accounting" by the presence of cache_creation/ephemeral_* detail keys, which langchain-anthropic always emits, so every cached Anthropic call had cache tokens added to prompt_tokens a second time. With a warm cache this roughly doubles reported prompt tokens (e.g. a real trace reported 75,387 prompt tokens for a 37,694-token request with 37,324 cache reads and 369 cache writes).

Detect separate accounting arithmetically instead: only fold cache tokens into prompt/total when they exceed the reported prompt total, which is impossible under the UsageMetadata contract but is exactly the inconsistency the original normalization (BT-5150) was added to repair.

Strengthen the VCR prompt-caching test to assert span prompt/total tokens equal the usage_metadata the model reported, and add unit coverage for the folded (Anthropic), subset (OpenAI), and separate (legacy) conventions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
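The arithmetic check described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `normalize_prompt_tokens`, its parameters, and the returned metric keys are hypothetical, and computing the total as prompt plus output tokens is an assumption.

```python
def normalize_prompt_tokens(
    input_tokens: int,
    output_tokens: int,
    cache_read: int = 0,
    cache_creation: int = 0,
) -> dict:
    """Hypothetical helper: fold cache tokens into the prompt total only
    when the reported numbers cannot already include them."""
    cache_tokens = cache_read + cache_creation

    if cache_tokens > input_tokens:
        # Impossible if cache tokens were a breakdown of input_tokens, so
        # the provider must be reporting them separately (legacy convention).
        prompt_tokens = input_tokens + cache_tokens
    else:
        # Folded (Anthropic) or subset (OpenAI) convention: input_tokens
        # already covers the cache tokens, so adding them would double count.
        prompt_tokens = input_tokens

    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": output_tokens,
        # Assumption: total is recomputed from the normalized prompt count.
        "total_tokens": prompt_tokens + output_tokens,
        "cache_read_tokens": cache_read,
        "cache_creation_tokens": cache_creation,
    }


# The trace quoted above: 37,694 input tokens, 37,324 cache reads and
# 369 cache writes. Output tokens are not given in the source, so 0 is
# used here as a placeholder.
folded = normalize_prompt_tokens(37_694, 0, cache_read=37_324, cache_creation=369)
print(folded["prompt_tokens"])  # 37694, not 37694 + 37324 + 369 = 75387

# A legacy-style report where cache tokens exceed the prompt count
# (illustrative numbers, not taken from the PR).
separate = normalize_prompt_tokens(10, 0, cache_read=100)
print(separate["prompt_tokens"])  # 110
```

With the presence-based detection, the first call would have produced 75,387 prompt tokens, which matches the inflated figure reported in the trace.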

bhaveshklaviyo and others added 3 commits June 9, 2026 18:16

Merge branch 'main' into fix/langchain-anthropic-cache-double-count

c7dc3fc

remove dup mock test

ae30959

Abhijeet Prasad (AbhiPrasad) requested review from Stephen Belanger (Qard) and Luca Forstner (lforst) June 11, 2026 17:55

Abhijeet Prasad (AbhiPrasad) self-assigned this Jun 11, 2026

Abhijeet Prasad (AbhiPrasad) mentioned this pull request Jun 11, 2026

fix(langchain): stop double-counting anthropic cache tokens in prompt totals #504

Closed

Stephen Belanger (Qard) approved these changes Jun 11, 2026

View reviewed changes

Abhijeet Prasad (AbhiPrasad) merged commit bf9dfaf into main Jun 11, 2026
82 checks passed

Abhijeet Prasad (AbhiPrasad) deleted the fix/langchain-anthropic-cache-double-count branch June 11, 2026 18:23

Abhijeet Prasad (AbhiPrasad) merged 3 commits into main from fix/langchain-anthropic-cache-double-count

3 participants
