
Multi-step tool calls overwrite token counts instead of accumulating (output, reasoning, cache.write lost) #21913

@KonstantinMirin

Description


Bug

In multi-step tool-call assistant messages, processor.ts overwrites assistantMessage.tokens on each finish-step event instead of accumulating additive fields. Only the last step's token counts survive.

Root cause: processor.ts line ~362:

```ts
ctx.assistantMessage.tokens = usage.tokens  // overwrite!
```

While `ctx.assistantMessage.cost += usage.cost` correctly accumulates, `tokens` is replaced wholesale.

Impact

For an assistant message with N tool-call steps:

| Field | Current | Correct | Why |
|---|---|---|---|
| `input` | Last step's value | Last step's value | Each step's `inputTokens` includes the full conversation prompt → last step is already correct |
| `cache.read` | Last step's value | Last step's value | Cache read reflects current cache state → snapshot, not cumulative |
| `output` | Last step only | Sum across all steps | Each step produces new output tokens |
| `reasoning` | Last step only | Sum across all steps | Each step produces new reasoning tokens |
| `cache.write` | Last step only | Sum across all steps | Each step may write new entries to cache |
| `total` | API `totalTokens` from last step | Derived from components | `totalTokens = inputTokens + outputTokens`, but our `input` is adjusted (cache subtracted) |
| `cost` | Correctly accumulated | No change | Already uses `+=` |
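The snapshot-vs-sum semantics in the table can be sketched as a small fold over per-step usage (the types and values here are illustrative, not opencode's actual definitions):

```typescript
// Illustrative token shape; the real record also carries a total field.
interface TokenUsage {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

// Fold one finish-step usage into the running message record:
// input and cache.read are snapshots (keep the latest value), while
// output, reasoning, and cache.write are additive across steps.
function accumulate(prev: TokenUsage | undefined, step: TokenUsage): TokenUsage {
  return {
    input: step.input,
    output: (prev?.output ?? 0) + step.output,
    reasoning: (prev?.reasoning ?? 0) + step.reasoning,
    cache: {
      read: step.cache.read,
      write: (prev?.cache?.write ?? 0) + step.cache.write,
    },
  }
}

// Two hypothetical tool-call steps:
const step1: TokenUsage = { input: 100, output: 20, reasoning: 5, cache: { read: 0, write: 50 } }
const step2: TokenUsage = { input: 180, output: 30, reasoning: 10, cache: { read: 50, write: 10 } }
const merged = accumulate(accumulate(undefined, step1), step2)
// merged: input 180 (snapshot), output 50, reasoning 15, cache.read 50, cache.write 60
```

With the current overwrite behavior, `merged.output` would be 30 (step 2 only) instead of 50.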

Also fixes:

  • Context % display and compaction used total (which double-counted cached tokens) instead of deriving from components
  • Custom provider models without limit.context defaulted to 0, breaking context % display and disabling auto-compaction entirely

Changes

1. Token accumulation in processor.ts

Replace the overwrite with field-wise accumulation:

```ts
const prev = ctx.assistantMessage.tokens
ctx.assistantMessage.tokens = {
  total: usage.total,
  input: usage.tokens.input,
  output: (prev?.output ?? 0) + usage.tokens.output,
  reasoning: (prev?.reasoning ?? 0) + usage.tokens.reasoning,
  cache: {
    read: usage.tokens.cache.read,
    write: (prev?.cache?.write ?? 0) + usage.tokens.cache.write,
  },
}
```

2. Derive total from components in getUsage()

total is now computed as input + output + reasoning + cache.read + cache.write instead of using the API's totalTokens (which double-counts cached tokens since AI SDK v6 includes them in inputTokens).
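A minimal sketch of the double-count this avoids, assuming (as the AI SDK v6 note above states) that the reported `inputTokens` already includes cached tokens:

```typescript
interface Tokens {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

// Derive total from components so every token is counted exactly once.
function derivedTotal(t: Tokens): number {
  return t.input + t.output + t.reasoning + t.cache.read + t.cache.write
}

// Hypothetical step: the API reports inputTokens = 1000, of which 800
// were cache reads. After subtracting the cached portion from input,
// the component record is:
const t: Tokens = { input: 200, output: 50, reasoning: 10, cache: { read: 800, write: 0 } }
derivedTotal(t)  // 1060 — the 800 cached tokens appear exactly once
// The API's totalTokens would be 1000 + 50 = 1050, but combining it with
// the adjusted input would misstate the breakdown.
```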

3. Add MessageV2.promptSize() and MessageV2.totalSize() helpers

  • totalSize = input + output + reasoning + cache.read + cache.write — full conversation size after this turn (used for context % display and compaction threshold)
  • promptSize = input + cache.read + cache.write — current prompt footprint (input tokens sent to the LLM)

overflow.ts uses totalSize for compaction because output/reasoning tokens become part of the context on the next turn.
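The two helpers reduce to simple sums over the token record (shapes here are assumed, not the actual `MessageV2` signatures):

```typescript
interface Tokens {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

// Full conversation size after this turn: output/reasoning join the
// context on the next turn, so compaction must count them.
const totalSize = (t: Tokens) =>
  t.input + t.output + t.reasoning + t.cache.read + t.cache.write

// Current prompt footprint: what was actually sent to the LLM this turn.
const promptSize = (t: Tokens) => t.input + t.cache.read + t.cache.write

const t: Tokens = { input: 200, output: 50, reasoning: 10, cache: { read: 800, write: 40 } }
promptSize(t)  // 1040
totalSize(t)   // 1100
```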

4. Fix limit.context default from 0 to 128,000

Custom provider models not in models.dev no longer get `context: 0`, which broke context % display and disabled auto-compaction. `limit.output` likewise now defaults to 4,096 instead of 0.

5. ACP usage fix

acp/agent.ts now includes cache.write in the used token count for usage reporting (input + cache.read + cache.write).
