📊 cli-proxy token usage impact: preliminary observations #1885

## Summary

The cli-proxy feature (replacing the GitHub MCP server with `gh` CLI commands routed through a local proxy) shows a **~22% reduction in token usage and a ~24% reduction in cost** in the initial Smoke Copilot data. This is a single data point; more runs are needed to confirm the trend.

## What cli-proxy changes

With cli-proxy enabled (`features: cli-proxy: true`):
- **Before**: Agent → GitHub MCP server (tool schemas injected into context) → GitHub API
- **After**: Agent → `gh` bash commands → cli-proxy → DIFC proxy → GitHub API
- **LLM calls unchanged**: Agent → api-proxy (token tracker) → Copilot API

The key savings mechanism: **MCP tool schemas are removed from the context window**. GitHub MCP tools inject ~22 tool definitions (~500-700 tokens each, ~10-15K total) into every LLM turn. With cli-proxy, the agent uses `gh` CLI commands via bash instead, which requires no schema injection.
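The size of that overhead can be sanity-checked with quick arithmetic. The sketch below only multiplies the rough figures quoted above (~22 tools, ~500-700 tokens each) and assumes the schemas are resent on every turn; `schema_overhead` is a hypothetical helper, and the request count of 4 is taken from the pre-cli-proxy Smoke Copilot row, not from a measurement of schema size.

```python
def schema_overhead(tool_count, tokens_per_tool, requests_per_run=1):
    """Estimate tokens spent on tool schemas (an estimate, not a measurement)."""
    return tool_count * tokens_per_tool * requests_per_run


TOOLS = 22            # approximate number of GitHub MCP tool definitions
LOW, HIGH = 500, 700  # approximate tokens per tool definition

per_turn_low = schema_overhead(TOOLS, LOW)
per_turn_high = schema_overhead(TOOLS, HIGH)

# Assumption: ~4 LLM requests per run, each carrying the full schema set.
per_run_low = schema_overhead(TOOLS, LOW, requests_per_run=4)
per_run_high = schema_overhead(TOOLS, HIGH, requests_per_run=4)

print(f"per turn: {per_turn_low:,} to {per_turn_high:,} tokens")
print(f"per run:  {per_run_low:,} to {per_run_high:,} tokens")
```

The per-turn range works out to roughly 11K-15.4K tokens, in line with the ~10-15K figure above.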

## Observed data

### Smoke Copilot (only workflow with before/after data)

| Metric | Pre-cli-proxy (Apr 7) | Post-cli-proxy (Apr 10) | Change |
|--------|----------------------|------------------------|--------|
| Tokens/run | ~334K | ~262K | **-21.6%** |
| Cost/run | $0.68 | $0.52 | **-23.5%** |
| I/O ratio | 156:1 | 235:1 | Higher (less output per input) |
| Cache hit rate | 42.2% | — | — |
| Requests/run | ~4 | 5 | Similar |

**Source data:**
- Pre-cli-proxy: Report [#1768](https://github.com/github/gh-aw-firewall/issues/1768) (Apr 7, 4 runs, avg $0.68/run)
- Post-cli-proxy: Manual artifact analysis of run [§24222066741](https://github.com/github/gh-aw-firewall/actions/runs/24222066741) (Apr 10, 5 requests, 262K tokens, $0.52)

**cli-proxy enabled**: [PR #1820](https://github.com/github/gh-aw-firewall/pull/1820), merged Apr 8
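
The Change column in the table above can be reproduced from the rounded per-run figures. This is a minimal sketch; `pct_change` is a hypothetical helper, and the inputs are the approximate values from the table rather than raw log data.

```python
def pct_change(before, after):
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100


tokens_change = pct_change(334_000, 262_000)  # tokens/run, Apr 7 vs Apr 10
cost_change = pct_change(0.68, 0.52)          # cost/run in USD

print(f"tokens: {tokens_change:.1f}%")  # -21.6%
print(f"cost:   {cost_change:.1f}%")    # -23.5%
```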

### Other cli-proxy workflows (no before/after comparison yet)

- **Smoke Services** (`cli-proxy: true` in [PR #1862](https://github.com/github/gh-aw-firewall/pull/1862), not yet merged)
- **Firewall Issue Dispatcher** (`cli-proxy: true` in PR #1862, not yet merged)

## Caveats

1. **Limited data**: Only one post-cli-proxy Smoke Copilot run was analyzed, because a bug in the analyzer workflow prevented it from finding the data (see "Token analyzer data gap" below).
2. **Branch difference**: The post-cli-proxy run was on a PR branch (`chore/upgrade-ghaw-v0.68.0`), not main. The gh-aw version upgrade may contribute to the token difference.
3. **I/O ratio increased**: The higher I/O ratio (235:1 vs 156:1) means the agent consumed more input per output token. One possible cause is longer bash command output compared with structured MCP tool responses. However, total tokens per run fell, so the shift may instead reflect fewer output tokens. The cause is unconfirmed.
4. **No cache data**: The raw JSONL for the post-cli-proxy run did not expose a cache read/write breakdown (all requests went through the `copilot` provider), so the cache hit rate cannot be compared.
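
The I/O ratio caveat can be probed with a back-of-the-envelope split of each run's tokens. The sketch below rests on two assumptions that the reports do not confirm: that "Tokens/run" is input plus output, and that the I/O ratio is input tokens to output tokens. `split_tokens` is a hypothetical helper.

```python
def split_tokens(total_tokens, io_ratio):
    """Split a total into (input, output), assuming input:output == io_ratio:1."""
    output = total_tokens / (io_ratio + 1)
    return total_tokens - output, output


# Rounded figures from the Smoke Copilot table.
pre_in, pre_out = split_tokens(334_000, 156)
post_in, post_out = split_tokens(262_000, 235)

# Average tokens per request, using ~4 and 5 requests per run.
pre_per_request = 334_000 / 4
post_per_request = 262_000 / 5

print(f"pre:  ~{pre_out:,.0f} output tokens, ~{pre_per_request:,.0f} tokens/request")
print(f"post: ~{post_out:,.0f} output tokens, ~{post_per_request:,.0f} tokens/request")
```

Under these assumptions, estimated output falls from about 2.1K to about 1.1K tokens per run while average tokens per request also fall. That would point to reduced output rather than larger requests, but a single run is not enough to conclude either way.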

## Token analyzer data gap (resolved)

The daily token usage reports (#1878, #1818) stated that "Smoke Copilot produces no token-usage.jsonl". This was **incorrect**: the data existed in the `firewall-audit-logs` artifact, but the analyzer workflow was downloading the wrong artifact name (`agent-artifacts`).

**Fixes:**
- [PR #1883](https://github.com/github/gh-aw-firewall/pull/1883): corrects the artifact name
- [PR #1884](https://github.com/github/gh-aw-firewall/pull/1884): refactors all 4 token workflows to use `gh aw logs --json` (which handles artifact naming internally)

Once merged, future reports will correctly include Smoke Copilot data, enabling proper trend tracking.

## Expected savings at scale

If the ~24% cost reduction holds across workflows:

| Workflow | Current avg cost/run | Projected with cli-proxy | Daily runs | Daily savings |
|----------|---------------------|-------------------------|------------|--------------|
| Smoke Copilot | $0.68 | $0.52 | ~4 | ~$0.64 |
| Build Test Suite | $4.54 | ~$3.45 | ~3 | ~$3.27 |
| Secret Digger | $0.40 (success) | ~$0.30 | ~15 | ~$1.50 |

**Note**: Build Test Suite and Secret Digger do not currently use cli-proxy. These are projections based on the Smoke Copilot observation.
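
The projections above can be reproduced with a short calculation. This sketch assumes a flat 24% cost reduction and rounds the projected per-run cost to cents before computing daily savings, which matches the table's figures; `project` and `REDUCTION` are hypothetical names, and the run counts are the approximate values from the table.

```python
from decimal import Decimal, ROUND_HALF_UP

REDUCTION = Decimal("0.24")  # assumption: the ~24% cost reduction seen on Smoke Copilot
CENT = Decimal("0.01")


def project(cost_per_run, daily_runs, reduction=REDUCTION):
    """Return (projected cost per run, daily savings), with cost rounded to cents."""
    cost = Decimal(cost_per_run)
    projected = (cost * (1 - reduction)).quantize(CENT, rounding=ROUND_HALF_UP)
    daily_savings = (cost - projected) * daily_runs
    return projected, daily_savings


workflows = {
    "Smoke Copilot": ("0.68", 4),
    "Build Test Suite": ("4.54", 3),
    "Secret Digger": ("0.40", 15),
}

for name, (cost, runs) in workflows.items():
    projected, savings = project(cost, runs)
    print(f"{name}: ${projected}/run, ~${savings}/day")
```

Decimal arithmetic is used here so that the cent rounding behaves predictably; the totals are only as reliable as the single-run reduction they are based on.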

## Next steps

- [ ] Merge PRs #1883/#1884 to fix token analyzer data collection
- [ ] Monitor next 5-7 days of Smoke Copilot reports for trend confirmation
- [ ] Merge PR #1862 to enable cli-proxy on Smoke Services and Firewall Issue Dispatcher
- [ ] Consider enabling cli-proxy on Build Test Suite ($4.54/run — highest potential savings)
- [ ] Investigate whether the I/O ratio increase is a concern or expected behavior
