
feat: Voyage 4 mixed-model embedding (lite on store, large on recall) #93

@jack-arturo

Summary

Voyage 4 models share one embedding space, so vectors from different models can live in the same vector collection: per Voyage, vectors from voyage-4-lite and voyage-4-large are directly comparable via cosine similarity. This opens up a cost optimization where we embed with a cheaper, faster model on store and a more accurate model on recall.

Concept

| Operation | Model | Why |
| --- | --- | --- |
| Store (memory creation) | voyage-4-lite | Cheapest, fastest, lowest latency. Memories are written constantly. |
| Recall (query time) | voyage-4-large | Most accurate query understanding. Recalls happen less frequently. |

This works because all Voyage 4 family models (voyage-4-large, voyage-4, voyage-4-lite, voyage-4-nano) produce vectors in the same geometric space. According to Voyage, a query vector from voyage-4-large can match against document vectors from voyage-4-lite with minimal quality loss. That is a vendor claim we have not yet verified ourselves, hence the benchmarks under "Validation needed".
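To make "directly comparable" concrete, here is a minimal sketch of the comparison the vector store performs at recall time. The vectors are toy stand-ins, not real Voyage output, and `cosine_similarity` is a hypothetical helper rather than existing AutoMem code. The only requirement it illustrates is that both vectors have the same dimension and live in the same space.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors (assumption: NOT real model output).
stored_vector = [0.12, 0.80, 0.35, 0.05]     # pretend: embedded with voyage-4-lite on store
query_vector = [0.10, 0.78, 0.40, 0.02]      # pretend: embedded with voyage-4-large on recall
unrelated_vector = [0.90, 0.05, 0.02, 0.40]  # pretend: an unrelated stored memory

score_match = cosine_similarity(query_vector, stored_vector)
score_other = cosine_similarity(query_vector, unrelated_vector)
print(f"related: {score_match:.3f}, unrelated: {score_other:.3f}")
```

If the shared-space claim holds, a related memory should score higher than an unrelated one regardless of which Voyage 4 model produced each side.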

Why this matters for AutoMem

  • Cost optimization: Store is high-volume (every memory), recall is lower-volume (user queries). Using lite for the hot path saves money.
  • Latency: Faster embedding on the store path means lower write latency.
  • Upgradable without re-embedding: Users can start with voyage-4-lite everywhere, then upgrade recall to voyage-4-large without touching stored vectors.
  • Product differentiator: We are not aware of another memory service that offers this, so it could be a unique selling point.

Validation needed

Before making this the default, we need to benchmark on our test suite:

1. Baseline: Voyage 4 single-model

  • Run LoCoMo benchmark with voyage-4 on both store and recall
  • Compare against current text-embedding-3-large baseline (90.53%)

2. Mixed-model test

  • Store all memories with voyage-4-lite
  • Recall with voyage-4-large
  • Compare accuracy vs single-model baseline
  • Measure quality degradation (if any)
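One way to quantify "quality degradation" is the drop in accuracy, in percentage points, between the single-model run and the mixed-model run on the same questions. The sketch below assumes each benchmark run yields a list of per-question pass/fail results. That result format and the helper names are hypothetical, and the sample data is made up for illustration only.

```python
def accuracy(results):
    """Percentage of passed questions in a list of booleans."""
    if not results:
        raise ValueError("no results to score")
    return 100.0 * sum(results) / len(results)

def degradation_points(baseline_results, mixed_results):
    """Accuracy drop (percentage points) going from baseline to mixed-model.

    A positive value means the mixed-model config scored lower.
    """
    return accuracy(baseline_results) - accuracy(mixed_results)

# Made-up pass/fail data for 10 questions (assumption: not real LoCoMo results).
baseline_run = [True] * 9 + [False] * 1   # voyage-4 on store and recall
mixed_run = [True] * 8 + [False] * 2      # voyage-4-lite store, voyage-4-large recall

drop = degradation_points(baseline_run, mixed_run)
print(f"baseline {accuracy(baseline_run):.2f}%, mixed {accuracy(mixed_run):.2f}%, drop {drop:.2f} pts")
```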

3. Cost/performance matrix

| Config | Store model | Recall model | LoCoMo score | Cost/1M tokens (store) | Cost/1M tokens (recall) |
| --- | --- | --- | --- | --- | --- |
| Current default | text-embedding-3-large | text-embedding-3-large | 90.53% | $0.13 | $0.13 |
| Voyage single | voyage-4 | voyage-4 | ? | $0.06 | $0.06 |
| Voyage mixed | voyage-4-lite | voyage-4-large | ? | ? | ? |
| Voyage lite-only | voyage-4-lite | voyage-4-lite | ? | ? | ? |
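To turn the per-token prices into a comparable number per config, a small estimator like the one below may help. The token volumes are hypothetical placeholders, not measured AutoMem traffic, and only the two prices already filled in above are used. The rows still marked "?" can be plugged in once their prices are confirmed.

```python
def embedding_cost_usd(store_tokens, recall_tokens,
                       store_price_per_1m, recall_price_per_1m):
    """Total embedding cost in USD for a given token volume on each path."""
    store_cost = (store_tokens / 1_000_000) * store_price_per_1m
    recall_cost = (recall_tokens / 1_000_000) * recall_price_per_1m
    return store_cost + recall_cost

# Hypothetical monthly volumes (assumption): store is the high-volume path.
STORE_TOKENS = 500_000_000
RECALL_TOKENS = 50_000_000

# Prices taken from the filled-in rows of the matrix.
current_default = embedding_cost_usd(STORE_TOKENS, RECALL_TOKENS, 0.13, 0.13)
voyage_single = embedding_cost_usd(STORE_TOKENS, RECALL_TOKENS, 0.06, 0.06)

print(f"Current default: ${current_default:.2f}")
print(f"Voyage single:   ${voyage_single:.2f}")
```

Because store volume dominates in this scenario, the store-side price drives most of the total, which is the reason the mixed config targets the store path with the cheaper model.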

Implementation

Phase 1: Config support

Add new env vars:

VOYAGE_STORE_MODEL=voyage-4-lite
VOYAGE_RECALL_MODEL=voyage-4-large

When both are set, the Voyage provider uses the store model for generate_embedding() calls from the store path, and the recall model for embeddings generated during recall/search.
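A possible way to resolve the two settings is sketched below. It assumes that an unset VOYAGE_STORE_MODEL or VOYAGE_RECALL_MODEL falls back to the existing VOYAGE_MODEL, and that having no model configured at all is an error. Both behaviors are assumptions to be decided in review, and `resolve_voyage_models` is a hypothetical helper name.

```python
import os

def resolve_voyage_models(env=None):
    """Return (store_model, recall_model) from environment-style settings.

    Assumption: each path-specific variable falls back to VOYAGE_MODEL
    when it is unset or empty.
    """
    env = os.environ if env is None else env
    base = env.get("VOYAGE_MODEL")
    store_model = env.get("VOYAGE_STORE_MODEL") or base
    recall_model = env.get("VOYAGE_RECALL_MODEL") or base
    if not store_model or not recall_model:
        raise ValueError(
            "Set VOYAGE_MODEL, or both VOYAGE_STORE_MODEL and VOYAGE_RECALL_MODEL"
        )
    return store_model, recall_model

# Mixed-model config: both new variables set.
mixed = resolve_voyage_models({
    "VOYAGE_STORE_MODEL": "voyage-4-lite",
    "VOYAGE_RECALL_MODEL": "voyage-4-large",
})
# Current behavior: only VOYAGE_MODEL set, same model on both paths.
single = resolve_voyage_models({"VOYAGE_MODEL": "voyage-4"})
print(mixed, single)
```

Passing a plain dict instead of reading os.environ directly keeps the resolution logic easy to unit test.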

Phase 2: Code changes

  • VoyageEmbeddingProvider needs to support two model handles
  • Store path (/store endpoint) calls with store model
  • Recall path (/recall endpoint) calls with recall model
  • Fallback: if only VOYAGE_MODEL is set (current behavior), use same model for both
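The two-handle idea could look roughly like the sketch below. It is not the existing VoyageEmbeddingProvider: the class name, the `purpose` argument, and the injected `embed_fn` (standing in for the real Voyage client call) are all hypothetical, chosen so the routing logic can be shown without depending on a network call.

```python
from typing import Callable, List

# (text, model_name) -> embedding vector; stands in for the real Voyage API call.
EmbedFn = Callable[[str, str], List[float]]

class DualModelVoyageProvider:
    """Routes embedding requests to a store model or a recall model."""

    def __init__(self, embed_fn: EmbedFn, store_model: str, recall_model: str):
        self._embed_fn = embed_fn
        self.store_model = store_model
        self.recall_model = recall_model

    def generate_embedding(self, text: str, purpose: str = "store") -> List[float]:
        # The /store endpoint would pass purpose="store", /recall purpose="recall".
        if purpose == "store":
            model = self.store_model
        elif purpose == "recall":
            model = self.recall_model
        else:
            raise ValueError(f"unknown purpose: {purpose!r}")
        return self._embed_fn(text, model)

# Fake embedder that records which model was requested (for illustration only).
calls = []

def fake_embed(text: str, model: str) -> List[float]:
    calls.append(model)
    return [float(len(text)), 0.0]

provider = DualModelVoyageProvider(fake_embed, "voyage-4-lite", "voyage-4-large")
provider.generate_embedding("User prefers dark mode", purpose="store")
provider.generate_embedding("what theme does the user like?", purpose="recall")
print(calls)
```

With the fallback case, the same class works unchanged by passing the VOYAGE_MODEL value for both `store_model` and `recall_model`.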

Phase 3: If benchmarks validate

  • Make Voyage the default EMBEDDING_PROVIDER
  • Default to mixed-model config
  • Update docs and migration guide

Labels: enhancement
