Summary
Voyage 4's shared embedding space allows mixing models within the same vector collection: vectors from voyage-4-lite and voyage-4-large are directly comparable via cosine similarity. This opens up a cost optimization in which we embed with a cheaper, faster model on store and a more accurate model on recall.
Concept
| Operation | Model | Why |
|---|---|---|
| Store (memory creation) | voyage-4-lite | Cheapest, fastest, lowest latency. Memories are written constantly. |
| Recall (query time) | voyage-4-large | Most accurate query understanding. Recalls happen less frequently. |
This works because all Voyage 4 family models (voyage-4-large, voyage-4, voyage-4-lite, voyage-4-nano) produce vectors in the same geometric space. According to Voyage's claims, a query vector from voyage-4-large can match against document vectors from voyage-4-lite with minimal quality loss.
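To make "directly comparable" concrete, here is a minimal sketch of cosine-similarity ranking. The vectors are made-up placeholders, not real Voyage output, and `cosine_similarity` and `rank` are illustrative names rather than AutoMem functions. Ranking uses only the vectors themselves, so it does not depend on which model produced each side, provided both live in the same space with the same dimension.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank(query_vec, stored):
    """Return stored memory ids ordered from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), mem_id) for mem_id, vec in stored.items()]
    return [mem_id for _, mem_id in sorted(scored, reverse=True)]


# Placeholder vectors standing in for store-model output (hypothetical values).
stored_vectors = {
    "mem-a": [0.9, 0.1, 0.0],
    "mem-b": [0.0, 1.0, 0.0],
}
# Placeholder vector standing in for recall-model output (hypothetical values).
query_vector = [1.0, 0.2, 0.0]

print(rank(query_vector, stored_vectors))
```

Whether real voyage-4-lite and voyage-4-large vectors behave this well together is exactly what the validation below is meant to establish.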
Why this matters for AutoMem
- Cost optimization: Store is high-volume (every memory); recall is lower-volume (user queries). Using lite for the hot path saves money.
- Latency: Faster embedding on the store path means lower write latency.
- Upgradable without re-embedding: Users can start with voyage-4-lite everywhere, then upgrade recall to voyage-4-large without touching stored vectors.
- Product differentiator: As far as we know, no other memory service offers this, so it could be a unique selling point.
Validation needed
Before making this the default, we need to benchmark on our test suite:
1. Baseline: Voyage 4 single-model
- Run LoCoMo benchmark with voyage-4 on both store and recall
- Compare against current text-embedding-3-large baseline (90.53%)
2. Mixed-model test
- Store all memories with voyage-4-lite
- Recall with voyage-4-large
- Compare accuracy vs single-model baseline
- Measure quality degradation (if any)
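The comparison above could be structured roughly as follows. This is a toy sketch, not the LoCoMo harness: hit rate at k stands in for the real benchmark score, `toy_embed` is a word-count stand-in for calls to the Voyage models, and all names and sample data are hypothetical. It only shows the shape of the experiment: index with one embedder, query with another, and report the difference against the single-model run.

```python
import math

# Tiny fixed vocabulary so the toy embedders share one vector space.
VOCAB = ["coffee", "meeting", "paris", "deadline"]


def toy_embed(text, weight=1.0):
    """Word-count vector over VOCAB; a placeholder for a real embedding call."""
    words = text.lower().split()
    return [weight * words.count(term) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hit_rate_at_k(store_embed, recall_embed, memories, queries, k=1):
    """Fraction of queries whose expected memory id appears in the top-k results."""
    if not queries:
        raise ValueError("at least one query is required")
    index = {mem_id: store_embed(text) for mem_id, text in memories.items()}
    hits = 0
    for query_text, expected_id in queries:
        q = recall_embed(query_text)
        ranked = sorted(index, key=lambda mem_id: cosine(q, index[mem_id]), reverse=True)
        if expected_id in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Stand-ins for the store-side and recall-side models (same space, different scale).
lite = lambda text: toy_embed(text, weight=1.0)
large = lambda text: toy_embed(text, weight=2.0)

memories = {"m1": "coffee meeting on monday", "m2": "paris trip deadline"}
queries = [("when is the coffee meeting", "m1"), ("what is the paris deadline", "m2")]

single = hit_rate_at_k(lite, lite, memories, queries)
mixed = hit_rate_at_k(lite, large, memories, queries)
print(f"single={single:.2f} mixed={mixed:.2f} degradation={single - mixed:.2f}")
```

In a real run the two embedders would wrap the actual models and the metric would be the LoCoMo score, but the store/recall split would stay the same.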
3. Cost/performance matrix
| Config | Store model | Recall model | LoCoMo score | Cost/1M tokens (store) | Cost/1M tokens (recall) |
|---|---|---|---|---|---|
| Current default | text-embedding-3-large | text-embedding-3-large | 90.53% | $0.13 | $0.13 |
| Voyage single | voyage-4 | voyage-4 | ? | $0.06 | $0.06 |
| Voyage mixed | voyage-4-lite | voyage-4-large | ? | ? | ? |
| Voyage lite-only | voyage-4-lite | voyage-4-lite | ? | ? | ? |
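For the two rows with known prices, total spend can already be compared for an assumed workload. The sketch below uses the $0.13 and $0.06 figures from the matrix; the token volumes are invented for illustration, and `embedding_cost` is a hypothetical helper. The voyage-4-lite and voyage-4-large rows are left out because their prices are not filled in yet.

```python
def embedding_cost(store_tokens, recall_tokens, store_price_per_m, recall_price_per_m):
    """Total embedding spend in dollars, given token volume and price per 1M tokens."""
    return (store_tokens * store_price_per_m + recall_tokens * recall_price_per_m) / 1_000_000


# Hypothetical monthly workload in which writes dominate reads.
STORE_TOKENS = 50_000_000
RECALL_TOKENS = 5_000_000

current = embedding_cost(STORE_TOKENS, RECALL_TOKENS, 0.13, 0.13)
voyage_single = embedding_cost(STORE_TOKENS, RECALL_TOKENS, 0.06, 0.06)
store_share = STORE_TOKENS / (STORE_TOKENS + RECALL_TOKENS)

print(f"current default: ${current:.2f}")
print(f"voyage single:   ${voyage_single:.2f}")
print(f"store share of tokens: {store_share:.0%}")
```

When the store path accounts for most tokens, the store-side price dominates the total, which is the argument for putting the cheaper model there.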
Implementation
Phase 1: Config support
Add new env vars:
VOYAGE_STORE_MODEL=voyage-4-lite
VOYAGE_RECALL_MODEL=voyage-4-large
When both are set, the Voyage provider uses the store model for generate_embedding() calls from the store path, and the recall model for embeddings generated during recall/search.
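One possible way to resolve these variables is sketched below. `resolve_voyage_models` is a hypothetical helper, not existing AutoMem code. It also assumes that when only one of the two new variables is set, the other falls back to VOYAGE_MODEL; the proposal does not specify that case.

```python
import os


def resolve_voyage_models(env=None):
    """Return (store_model, recall_model) from environment-style settings."""
    env = os.environ if env is None else env
    fallback = env.get("VOYAGE_MODEL")  # existing single-model setting
    # Assumption: each path falls back to VOYAGE_MODEL independently.
    store_model = env.get("VOYAGE_STORE_MODEL") or fallback
    recall_model = env.get("VOYAGE_RECALL_MODEL") or fallback
    if not store_model or not recall_model:
        raise ValueError(
            "Set VOYAGE_MODEL, or both VOYAGE_STORE_MODEL and VOYAGE_RECALL_MODEL"
        )
    return store_model, recall_model


# Mixed-model configuration.
print(resolve_voyage_models({
    "VOYAGE_STORE_MODEL": "voyage-4-lite",
    "VOYAGE_RECALL_MODEL": "voyage-4-large",
}))

# Legacy configuration: one model for both paths.
print(resolve_voyage_models({"VOYAGE_MODEL": "voyage-4"}))
```

Passing the environment as a plain mapping keeps the resolution logic testable without mutating the process environment.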
Phase 2: Code changes
- VoyageEmbeddingProvider needs to support two model handles
- Store path (/store endpoint) calls with store model
- Recall path (/recall endpoint) calls with recall model
- Fallback: if only VOYAGE_MODEL is set (current behavior), use same model for both
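A rough sketch of what the two-handle provider could look like follows. The class body, the `purpose` parameter, and the injected `embed_fn` are assumptions made for illustration; the real VoyageEmbeddingProvider interface may differ. The embedding call is injected so the sketch runs without network access or assumptions about the Voyage client API.

```python
from typing import Callable, List, Optional

# Hypothetical signature: (text, model_name) -> embedding vector.
EmbedFn = Callable[[str, str], List[float]]


class VoyageEmbeddingProvider:
    """Illustrative provider that picks a model per call path."""

    def __init__(self, embed_fn: EmbedFn, store_model: str, recall_model: Optional[str] = None):
        self._embed_fn = embed_fn
        # Fallback: with no recall model given, use one model for both paths.
        self._models = {"store": store_model, "recall": recall_model or store_model}

    def generate_embedding(self, text: str, purpose: str = "store") -> List[float]:
        if purpose not in self._models:
            raise ValueError(f"unknown purpose: {purpose!r}")
        return self._embed_fn(text, self._models[purpose])


# Fake embedding function that records which model each call used.
calls = []


def fake_embed(text, model):
    calls.append(model)
    return [float(len(text)), float(len(model))]


provider = VoyageEmbeddingProvider(
    fake_embed, store_model="voyage-4-lite", recall_model="voyage-4-large"
)
provider.generate_embedding("user likes tea", purpose="store")              # /store path
provider.generate_embedding("what does the user drink?", purpose="recall")  # /recall path
print(calls)
```

Defaulting `purpose` to "store" is a choice made here so that existing single-argument callers would keep working; the endpoints would need to pass the purpose explicitly on the recall path.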
Phase 3: If benchmarks validate
- Make Voyage the default EMBEDDING_PROVIDER
- Default to mixed-model config
- Update docs and migration guide
Related