📊 cli-proxy token usage impact: preliminary observations #1885

@lpcox

Summary

The cli-proxy feature (replacing the GitHub MCP server with gh CLI commands routed through a local proxy) shows a ~22% reduction in token usage and a ~24% reduction in cost in the initial Smoke Copilot data. This is a single data point; more runs are needed to confirm the trend.

What cli-proxy changes

With cli-proxy enabled (features: cli-proxy: true):

  • Before: Agent → GitHub MCP server (tool schemas injected into context) → GitHub API
  • After: Agent → gh bash commands → cli-proxy → DIFC proxy → GitHub API
  • LLM calls unchanged: Agent → api-proxy (token tracker) → Copilot API

The key savings mechanism: MCP tool schemas are removed from the context window. GitHub MCP tools inject ~22 tool definitions (~500-700 tokens each, ~10-15K total) into every LLM turn. With cli-proxy, the agent uses gh CLI commands via bash instead, which requires no schema injection.
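As a rough check on the schema overhead described above, the per-turn and per-run cost can be computed from the quoted figures. This is a minimal sketch: the tool count and per-tool token range come from the paragraph above, while using 4 turns (the ~4 requests/run reported for the pre-cli-proxy run) is an assumption, and caching effects are ignored.

```python
# Figures quoted in the paragraph above.
TOOL_COUNT = 22
TOKENS_PER_TOOL = (500, 700)  # low and high estimate per tool definition


def schema_overhead(turns: int) -> tuple[int, int]:
    """Return (low, high) schema tokens sent across `turns` LLM requests."""
    low, high = (TOOL_COUNT * per_tool * turns for per_tool in TOKENS_PER_TOOL)
    return low, high


print(schema_overhead(1))  # (11000, 15400) tokens per turn
print(schema_overhead(4))  # (44000, 61600) tokens over an assumed 4-turn run
```

The per-turn result lands at roughly 11-15K tokens, in line with the estimate quoted above.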

Observed data

Smoke Copilot (only workflow with before/after data)

| Metric | Pre-cli-proxy (Apr 7) | Post-cli-proxy (Apr 10) | Change |
|---|---|---|---|
| Tokens/run | ~334K | ~262K | -21.6% |
| Cost/run | $0.68 | $0.52 | -23.5% |
| I/O ratio | 156:1 | 235:1 | Higher (less output per input) |
| Cache hit rate | 42.2% | n/a | n/a |
| Requests/run | ~4 | 5 | Similar |

Source data:

  • Pre-cli-proxy: Report #1768 (Apr 7, 4 runs, avg $0.68/run)
  • Post-cli-proxy: Manual artifact analysis of run §24222066741 (Apr 10, 5 requests, 262K tokens, $0.52)

cli-proxy enabled: PR #1820, merged Apr 8
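The percentages in the Change column of the Smoke Copilot table can be reproduced from the rounded per-run figures. A minimal sketch, using only the numbers reported above:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


print(f"{pct_change(334_000, 262_000):.1f}%")  # -21.6% (tokens/run)
print(f"{pct_change(0.68, 0.52):.1f}%")        # -23.5% (cost/run)
```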

Other cli-proxy workflows (no before/after comparison yet)

Caveats

  1. Limited data: Only 1 post-cli-proxy Smoke Copilot run was analyzed (the analyzer workflow had a bug that prevented it from finding the data; see below).
  2. Branch difference: The post-cli-proxy run was on a PR branch (chore/upgrade-ghaw-v0.68.0), not main. The gh-aw version upgrade may contribute to the token difference.
  3. I/O ratio increased: The higher I/O ratio (235:1 vs 156:1) means the agent produced less output per input token. Longer bash command outputs compared with structured MCP tool responses could inflate input on individual requests, but because total tokens per run fell, the shift may be driven more by lower output than by more context per request.
  4. No cache data: The post-cli-proxy run did not expose cache read/write breakdown in the raw JSONL (all via copilot provider), making cache comparison impossible.
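To make the I/O ratio caveat concrete, the reported totals can be split into implied input and output counts. This sketch rests on two assumptions that the data above does not confirm: that Tokens/run is simply input plus output, and that the I/O ratio is input tokens divided by output tokens.

```python
def split_tokens(total: float, io_ratio: float) -> tuple[float, float]:
    """Split `total` into (input, output), assuming total = input + output
    and input / output = io_ratio."""
    output = total / (io_ratio + 1)
    return total - output, output


pre_in, pre_out = split_tokens(334_000, 156)
post_in, post_out = split_tokens(262_000, 235)

print(round(pre_in), round(pre_out))    # 331873 2127
print(round(post_in), round(post_out))  # 260890 1110
```

Under those assumptions, both input and output fell between the two runs, with output falling proportionally more.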

Token analyzer data gap (resolved)

The daily token usage reports (#1878, #1818) stated that "Smoke Copilot produces no token-usage.jsonl"; this was incorrect. The data existed in the firewall-audit-logs artifact, but the analyzer workflow was downloading the wrong artifact name (agent-artifacts).

Fixes:

  • PR #1883: corrects the artifact name
  • PR #1884: refactors all 4 token workflows to use gh aw logs --json (which handles artifact naming internally)

Once merged, future reports will correctly include Smoke Copilot data, enabling proper trend tracking.
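One way to guard against this class of bug is to search the downloaded artifacts for the log file rather than hard-coding an artifact name. The sketch below is illustrative only and is not what PR #1883 or PR #1884 implement; the directory layout (one folder per artifact under a download root) and the helper name `find_token_logs` are assumptions.

```python
import tempfile
from pathlib import Path

LOG_NAME = "token-usage.jsonl"


def find_token_logs(download_root: str) -> list[Path]:
    """Return every token-usage.jsonl under download_root,
    regardless of which artifact folder contains it."""
    return sorted(Path(download_root).rglob(LOG_NAME))


# Demo on a throwaway layout that mimics two downloaded artifacts (assumed layout).
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "agent-artifacts").mkdir()
    audit_dir = Path(root) / "firewall-audit-logs"
    audit_dir.mkdir()
    (audit_dir / LOG_NAME).write_text("{}\n")

    found = [path.parent.name for path in find_token_logs(root)]
    print(found)  # ['firewall-audit-logs']
```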

Expected savings at scale

If the ~24% reduction holds across workflows:

| Workflow | Current avg cost/run | Projected with cli-proxy | Daily runs | Daily savings |
|---|---|---|---|---|
| Smoke Copilot | $0.68 | $0.52 | ~4 | ~$0.64 |
| Build Test Suite | $4.54 | ~$3.45 | ~3 | ~$3.27 |
| Secret Digger | $0.40 (success) | ~$0.30 | ~15 | ~$1.50 |

Note: Build Test Suite and Secret Digger do not currently use cli-proxy. These are projections based on the Smoke Copilot observation.
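The projections in the table can be reproduced with a short calculation. This sketch assumes a flat 24% cost reduction and rounds the projected cost to cents before computing daily savings, which appears to be how the table's figures were derived; the function name `project` is hypothetical.

```python
REDUCTION = 0.24  # assumed flat reduction, taken from the Smoke Copilot observation


def project(cost_per_run: float, daily_runs: int,
            reduction: float = REDUCTION) -> tuple[float, float]:
    """Return (projected cost/run, daily savings), both rounded to cents."""
    projected = round(cost_per_run * (1 - reduction), 2)
    daily_savings = round((cost_per_run - projected) * daily_runs, 2)
    return projected, daily_savings


print(project(0.68, 4))   # (0.52, 0.64)  Smoke Copilot
print(project(4.54, 3))   # (3.45, 3.27)  Build Test Suite
print(project(0.40, 15))  # (0.3, 1.5)    Secret Digger
```

Without the intermediate rounding, the Secret Digger savings would come out at about $1.44 per day rather than ~$1.50.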

Next steps
