## Bug

In multi-step tool-call assistant messages, `processor.ts` overwrites `assistantMessage.tokens` on each `finish-step` event instead of accumulating the additive fields. Only the last step's token counts survive.

Root cause (`processor.ts`, line ~362):

```ts
ctx.assistantMessage.tokens = usage.tokens // overwrite!
```

While `ctx.assistantMessage.cost += usage.cost` correctly accumulates, `tokens` is replaced wholesale.
## Impact

For an assistant message with N tool-call steps:

| Field | Current | Correct | Why |
|---|---|---|---|
| `input` | Last step's value | Last step's value | Each step's `inputTokens` includes the full conversation prompt → last step is already correct |
| `cache.read` | Last step's value | Last step's value | Cache read reflects current cache state → snapshot, not cumulative |
| `output` | Last step only | Sum across all steps | Each step produces new output tokens |
| `reasoning` | Last step only | Sum across all steps | Each step produces new reasoning tokens |
| `cache.write` | Last step only | Sum across all steps | Each step may write new entries to cache |
| `total` | API `totalTokens` from last step | Derived from components | `totalTokens = inputTokens + outputTokens`, but our `input` is adjusted (cache subtracted) |
| `cost` | Correctly accumulated | No change | Already uses `+=` |
Also fixes:
- Context % display and compaction used `total` (which double-counted cached tokens) instead of deriving from components
- Custom provider models without `limit.context` defaulted to `0`, breaking context % display and disabling auto-compaction entirely
## Changes

### 1. Token accumulation in `processor.ts`

Replace the overwrite with field-wise accumulation:

```ts
const prev = ctx.assistantMessage.tokens
ctx.assistantMessage.tokens = {
  total: usage.total,
  input: usage.tokens.input,
  output: (prev?.output ?? 0) + usage.tokens.output,
  reasoning: (prev?.reasoning ?? 0) + usage.tokens.reasoning,
  cache: {
    read: usage.tokens.cache.read,
    write: (prev?.cache?.write ?? 0) + usage.tokens.cache.write,
  },
}
```
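To see how the snapshot and additive fields behave differently, here is a standalone sketch of the same accumulation over two simulated `finish-step` events. The `Tokens` type and step numbers are illustrative, not the real `processor.ts` shapes:

```typescript
// Illustrative types; the real message shapes in the codebase may differ.
type Tokens = {
  total: number
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

// Field-wise accumulation: input/cache.read are snapshots (take the
// latest value), output/reasoning/cache.write are additive (sum up).
function accumulate(prev: Tokens | undefined, usage: { total: number; tokens: Tokens }): Tokens {
  return {
    total: usage.total,
    input: usage.tokens.input,
    output: (prev?.output ?? 0) + usage.tokens.output,
    reasoning: (prev?.reasoning ?? 0) + usage.tokens.reasoning,
    cache: {
      read: usage.tokens.cache.read,
      write: (prev?.cache?.write ?? 0) + usage.tokens.cache.write,
    },
  }
}

// Two finish-step events with made-up numbers.
const step1 = accumulate(undefined, {
  total: 130,
  tokens: { total: 130, input: 100, output: 20, reasoning: 5, cache: { read: 40, write: 10 } },
})
const step2 = accumulate(step1, {
  total: 160,
  tokens: { total: 160, input: 120, output: 30, reasoning: 5, cache: { read: 50, write: 0 } },
})

console.log(step2)
// input: 120 (last step), output: 50 (20 + 30), reasoning: 10 (5 + 5),
// cache.read: 50 (last step), cache.write: 10 (10 + 0)
```

With the old overwrite, `step2.output` would have been 30 and `step2.reasoning` 5, silently dropping the earlier steps.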
### 2. Derive `total` from components in `getUsage()`

`total` is now computed as `input + output + reasoning + cache.read + cache.write` instead of using the API's `totalTokens` (which double-counts cached tokens, since AI SDK v6 includes them in `inputTokens`).
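As a minimal sketch of that derivation (numbers and the local `tokens` shape are made up for illustration):

```typescript
// Adjusted token components: input already has cached tokens subtracted,
// so summing every component counts each token exactly once.
const tokens = { input: 120, output: 50, reasoning: 10, cache: { read: 50, write: 10 } }

const total =
  tokens.input + tokens.output + tokens.reasoning + tokens.cache.read + tokens.cache.write

console.log(total) // 240
```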
### 3. Add `MessageV2.promptSize()` and `MessageV2.totalSize()` helpers

- `totalSize = input + output + reasoning + cache.read + cache.write` — full conversation size after this turn (used for the context % display and the compaction threshold)
- `promptSize = input + cache.read + cache.write` — current prompt footprint (input tokens sent to the LLM)

`overflow.ts` uses `totalSize` for compaction because output/reasoning tokens become part of the context on the next turn.
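A sketch of the two helpers under an assumed token shape (the real `MessageV2` namespace may expose them differently):

```typescript
// Assumed token shape for illustration.
type Tokens = {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

// Full conversation size after this turn: output/reasoning tokens join
// the context on the next turn, so compaction must count them too.
function totalSize(t: Tokens): number {
  return t.input + t.output + t.reasoning + t.cache.read + t.cache.write
}

// Current prompt footprint: everything that was actually sent to the LLM.
function promptSize(t: Tokens): number {
  return t.input + t.cache.read + t.cache.write
}

const t = { input: 120, output: 50, reasoning: 10, cache: { read: 50, write: 10 } }
console.log(totalSize(t), promptSize(t)) // 240 180
```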
### 4. Fix `limit.context` default from 0 to 128,000

Custom provider models not in models.dev no longer get `context: 0`, which broke the context % display and disabled auto-compaction entirely. `limit.output` now defaults to `4,096` instead of `0`.
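The defaulting can be sketched as follows; the `Limit` type and the exact fallback expression are assumptions, only the fallback values (128,000 and 4,096) come from the change above:

```typescript
// Assumed shape of a model's limit config; unknown custom models
// may omit both fields entirely.
type Limit = { context?: number; output?: number }

// Fall back to sane defaults instead of 0, so the context % display
// works and auto-compaction stays enabled for custom provider models.
function withDefaults(limit: Limit): Required<Limit> {
  return {
    context: limit.context || 128_000,
    output: limit.output || 4_096,
  }
}

console.log(withDefaults({})) // { context: 128000, output: 4096 }
```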
### 5. ACP usage fix

`acp/agent.ts` now includes `cache.write` in the `used` token count for usage reporting (`input + cache.read + cache.write`).
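A one-line sketch of that computation with made-up numbers (the local `tokens` shape is assumed, not the real `acp/agent.ts` code):

```typescript
// Cache-write tokens occupy context too, so they count toward "used".
const tokens = { input: 120, cache: { read: 50, write: 10 } }
const used = tokens.input + tokens.cache.read + tokens.cache.write
console.log(used) // 180
```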
## Related issues
- `@ai-sdk/openai` Responses API