fix: force gc + clear_cache after KV prefix cache eviction by adurham · Pull Request #1832 · exo-explore/exo

adurham · 2026-04-02T02:41:56Z

Summary

After KVPrefixCache evicts LRU entries, the underlying MLX Metal buffers stay allocated until Python's GC runs.
This leaves ~3-4 GB of unreclaimed memory between long-context requests, lowering the effective context ceiling for back-to-back requests.
Calling gc.collect() and then mx.clear_cache() after eviction frees the Metal buffers promptly.
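
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual exo code: `TinyPrefixCache`, `max_entries`, and the injectable `clear_cache` hook are hypothetical names, and the real `KVPrefixCache` evicts under memory pressure rather than by entry count. Only `gc.collect()` and `mx.clear_cache()` come from the PR itself.

```python
import gc
from typing import Any, Callable, List

try:
    import mlx.core as mx  # present only where MLX is installed
    _default_clear_cache: Callable[[], None] = mx.clear_cache
except (ImportError, AttributeError):
    def _default_clear_cache() -> None:
        """No-op stand-in so the sketch also runs without MLX."""


class TinyPrefixCache:
    """Hypothetical stand-in for KVPrefixCache (illustration only)."""

    def __init__(
        self,
        max_entries: int,
        clear_cache: Callable[[], None] = _default_clear_cache,
    ) -> None:
        self.max_entries = max_entries
        self.entries: List[Any] = []  # least recently used entry first
        self._clear_cache = clear_cache

    def add(self, kv: Any) -> int:
        """Store an entry and return how many old entries were evicted."""
        self.entries.append(kv)
        return self.evict_if_needed()

    def evict_if_needed(self) -> int:
        evicted = 0
        while len(self.entries) > self.max_entries:
            self.entries.pop(0)  # drops the Python reference only
            evicted += 1
        if evicted:
            # Once per eviction cycle, not per token: collect unreachable
            # objects first, then ask the allocator to release cached buffers.
            gc.collect()
            self._clear_cache()
        return evicted
```

The ordering matters in this sketch: the collection runs first so that any buffers still held by unreachable Python objects are dropped before the allocator cache is cleared.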

Test plan

Measured on 2-node PP cluster with Qwen3.5-397B-A17B-4bit at 63K context
Before: 108.88 GB retained after eviction (3.78 GB above baseline)
After: 105.48 GB retained after eviction (0.38 GB above baseline — draft model KV + minor overhead)
gc.collect() adds ~2-3ms latency, runs once per eviction cycle (not per token)
Verify with uv run pytest
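
As a quick consistency check on the figures above, the two measurements imply the same baseline, and the difference between them is the memory the fix reclaims. The snippet only reuses the numbers reported in the test plan; the variable names are illustrative.

```python
# Figures reported in the test plan, in GB.
before_retained, before_above_baseline = 108.88, 3.78
after_retained, after_above_baseline = 105.48, 0.38

# Both runs should imply the same baseline if the numbers are consistent.
baseline_before = round(before_retained - before_above_baseline, 2)
baseline_after = round(after_retained - after_above_baseline, 2)

# Memory reclaimed by the fix, and its share of the original excess.
freed = round(before_retained - after_retained, 2)
share_of_excess = round(freed / before_above_baseline * 100, 1)

print(baseline_before, baseline_after, freed, share_of_excess)
```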

🤖 Generated with Claude Code

When `KVPrefixCache` evicts LRU entries under memory pressure, the Python list `pop()` removes references but the underlying MLX Metal buffers stay allocated until Python's garbage collector runs. This causes ~3-4 GB of leaked memory between long-context requests, reducing the effective context ceiling for back-to-back requests.

Adding `gc.collect()` + `mx.clear_cache()` after eviction ensures Metal buffers are freed promptly. Measured: 3.78 GB leak reduced to 0.38 GB (draft model KV + minor Python heap overhead).

The `gc.collect()` call adds a few milliseconds of latency but only runs once per eviction cycle (not per token), so the impact on generation throughput is negligible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adurham wants to merge 1 commit into exo-explore:main from adurham:fix/gc-after-kv-eviction
