feat: Voyage 4 mixed-model embedding (lite on store, large on recall)

## Summary

Voyage 4's shared embedding space allows mixing models within the same vector collection: according to Voyage, vectors from `voyage-4-lite` and `voyage-4-large` are directly comparable via cosine similarity. This opens up a cost optimization where we embed with a cheaper, faster model on store and a more accurate model on recall.

## Concept

| Operation | Model | Why |
|-----------|-------|-----|
| **Store** (memory creation) | `voyage-4-lite` | Cheapest, fastest, lowest latency. Memories are written constantly. |
| **Recall** (query time) | `voyage-4-large` | Most accurate query understanding. Recalls happen less frequently. |

This works because all Voyage 4 family models (`voyage-4-large`, `voyage-4`, `voyage-4-lite`, `voyage-4-nano`) produce vectors in the same geometric space. Per Voyage's claims, a query vector from `voyage-4-large` can be matched against document vectors from `voyage-4-lite` with minimal quality loss.
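As a rough illustration of what "directly comparable" means in practice, the sketch below scores a query vector against a stored vector with plain cosine similarity. The two vectors are made-up placeholders, not real Voyage output, and the sketch assumes both models return vectors of the same dimension.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder vectors (assumption): stand-ins for a document embedded with
# voyage-4-lite at store time and a query embedded with voyage-4-large.
stored_vec = [0.6, 0.8, 0.0]
query_vec = [0.8, 0.6, 0.0]

score = cosine_similarity(query_vec, stored_vec)
```

If the shared-space claim holds, nothing in the similarity computation needs to know which model produced which vector.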

## Why this matters for AutoMem

- **Cost optimization**: Store is high-volume (every memory), recall is lower-volume (user queries). Using lite for the hot path saves money.
- **Latency**: Faster embedding on the store path means lower write latency.
- **Upgradable without re-embedding**: Users can start with voyage-4-lite everywhere, then upgrade recall to voyage-4-large without touching stored vectors.
- **Product differentiator**: We are not aware of another memory service that offers this, so it could be a unique selling point.

## Validation needed

Before making this the default, we need to benchmark on our test suite:

### 1. Baseline: Voyage 4 single-model
- Run LoCoMo benchmark with `voyage-4` on both store and recall
- Compare against current `text-embedding-3-large` baseline (90.53%)

### 2. Mixed-model test
- Store all memories with `voyage-4-lite`
- Recall with `voyage-4-large`
- Compare accuracy vs single-model baseline
- Measure quality degradation (if any)
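The comparison above could be prototyped with a small harness like the following. This is a simplified top-1 retrieval check, not the LoCoMo scoring itself, and `top1_accuracy` plus the two embedder callables are hypothetical names introduced for illustration. Passing the same embedder twice gives the single-model baseline; passing different ones gives the mixed-model number.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: any function that maps text to a vector.
Embedder = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top1_accuracy(
    memories: List[str],
    queries: List[Tuple[str, int]],
    store_embed: Embedder,
    recall_embed: Embedder,
) -> float:
    """Fraction of queries whose nearest stored memory is the expected one.

    `queries` holds (query_text, index_of_expected_memory) pairs.
    """
    if not memories or not queries:
        raise ValueError("need at least one memory and one query")
    # Store path: embed every memory once with the store-side embedder.
    stored = [store_embed(m) for m in memories]
    hits = 0
    for text, expected_index in queries:
        # Recall path: embed the query with the recall-side embedder.
        q = recall_embed(text)
        best = max(range(len(stored)), key=lambda i: cosine(q, stored[i]))
        hits += int(best == expected_index)
    return hits / len(queries)
```

Quality degradation is then the baseline score minus the mixed score, computed over the same memories and queries.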

### 3. Cost/performance matrix
| Config | Store model | Recall model | LoCoMo score | Cost/1M tokens (store) | Cost/1M tokens (recall) |
|--------|------------|-------------|-------------|----------------------|----------------------|
| Current default | text-embedding-3-large | text-embedding-3-large | 90.53% | $0.13 | $0.13 |
| Voyage single | voyage-4 | voyage-4 | ? | $0.06 | $0.06 |
| Voyage mixed | voyage-4-lite | voyage-4-large | ? | ? | ? |
| Voyage lite-only | voyage-4-lite | voyage-4-lite | ? | ? | ? |
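To turn the per-token prices in the matrix into a comparable number, a minimal sketch of the arithmetic is shown below. The token volumes are invented for illustration, and only the two prices already listed in the table are used; the `?` cells stay unknown until real pricing is filled in.

```python
def embedding_cost(
    store_tokens: int,
    recall_tokens: int,
    store_price_per_million: float,
    recall_price_per_million: float,
) -> float:
    """Total embedding cost in dollars for a given token volume."""
    total = (
        store_tokens * store_price_per_million
        + recall_tokens * recall_price_per_million
    )
    return total / 1_000_000


# Hypothetical monthly volumes (assumption): store dominates recall.
STORE_TOKENS = 50_000_000
RECALL_TOKENS = 5_000_000

# Prices taken from the matrix rows that already have values.
current_default = embedding_cost(STORE_TOKENS, RECALL_TOKENS, 0.13, 0.13)
voyage_single = embedding_cost(STORE_TOKENS, RECALL_TOKENS, 0.06, 0.06)
```

Because store volume dominates in this scenario, the store-side price drives most of the total, which is the motivation for putting the cheapest model on that path.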

## Implementation

### Phase 1: Config support
Add new env vars:
```
VOYAGE_STORE_MODEL=voyage-4-lite
VOYAGE_RECALL_MODEL=voyage-4-large
```

When both are set, the Voyage provider uses the store model for `generate_embedding()` calls from the store path, and the recall model for embeddings generated during recall/search.
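One way the env var resolution could look is sketched below. `resolve_voyage_models` is a hypothetical helper, not existing AutoMem code. Falling back to `VOYAGE_MODEL` when only one of the two new variables is set is an assumption; the proposal only specifies the both-set and neither-set cases.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_voyage_models(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (store_model, recall_model) from environment variables."""
    env = os.environ if env is None else env
    base = env.get("VOYAGE_MODEL")
    # Assumption: a missing store or recall override falls back to VOYAGE_MODEL.
    store_model = env.get("VOYAGE_STORE_MODEL") or base
    recall_model = env.get("VOYAGE_RECALL_MODEL") or base
    if not store_model or not recall_model:
        raise ValueError(
            "set VOYAGE_MODEL, or both VOYAGE_STORE_MODEL and VOYAGE_RECALL_MODEL"
        )
    return store_model, recall_model
```

Accepting an explicit mapping keeps the helper easy to test without touching the real process environment.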

### Phase 2: Code changes
- `VoyageEmbeddingProvider` needs to support two model handles
- Store path (`/store` endpoint) calls with store model
- Recall path (`/recall` endpoint) calls with recall model
- Fallback: if only `VOYAGE_MODEL` is set (current behavior), use same model for both
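A minimal sketch of a provider with two model handles follows. The class name, the `purpose` parameter, and the injected `embed_fn` are all assumptions made for illustration; the real `VoyageEmbeddingProvider` may be structured differently, and `embed_fn` stands in for whatever actually calls the Voyage API.

```python
from typing import Callable, List, Optional

# Hypothetical signature: (text, model_name) -> embedding vector.
EmbedFn = Callable[[str, str], List[float]]


class DualModelProvider:
    """Routes embedding calls to a store model or a recall model."""

    def __init__(
        self,
        embed_fn: EmbedFn,
        store_model: str,
        recall_model: Optional[str] = None,
    ) -> None:
        self._embed_fn = embed_fn
        self.store_model = store_model
        # Fallback: with no recall model, use the same model for both paths.
        self.recall_model = recall_model or store_model

    def model_for(self, purpose: str) -> str:
        if purpose == "store":
            return self.store_model
        if purpose == "recall":
            return self.recall_model
        raise ValueError(f"unknown purpose: {purpose!r}")

    def generate_embedding(self, text: str, purpose: str = "store") -> List[float]:
        return self._embed_fn(text, self.model_for(purpose))
```

Under this design the `/store` handler would call `generate_embedding(text, purpose="store")` and the `/recall` handler would pass `purpose="recall"`, so the routing decision lives in one place.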

### Phase 3: If benchmarks validate
- Make Voyage the default `EMBEDDING_PROVIDER`
- Default to mixed-model config
- Update docs and migration guide

## Related

- #67 — Voyage AI provider (merged in v0.11.0)
- #84 — Discussion on dropping to text-embedding-3-small as default
- Voyage 4 announcement: https://blog.voyageai.com/2026/01/15/voyage-4/

