feat: graph-enhanced retrieval with PPR and community detection #395

Open
2233admin wants to merge 9 commits into NevaMind-AI:main from 2233admin:feat/graph-enhanced-retrieval

## Summary

Adds graph-enhanced retrieval, which layers a knowledge graph on top of the existing vector search for more contextual memory recall. When enabled, the retrieve pipeline fuses vector similarity scores with graph-based Personalized PageRank (PPR) scores, surfacing memories that are both semantically relevant and structurally connected.

## Core Changes

### Phase 1: GraphStore Module
- Domain models: `GraphNode`, `GraphEdge`, `GraphCommunity` (in `database/models.py`)
- ORM models: SQLAlchemy models registered in `schema.py` via `get_sqlalchemy_models()`
- Alembic migration: `001_add_graph_tables.py` creates the `gm_nodes`, `gm_edges`, and `gm_communities` tables (idempotent via `IF NOT EXISTS`)
- GraphStore repository (`repositories/graph_store.py`): full CRUD, `load_graph()` for an in-memory NetworkX representation, PPR/LPA algorithms, and dual-path recall (precise entity match plus community expansion)

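For readers unfamiliar with PPR, the ranking idea can be sketched as a power iteration whose restart distribution is concentrated on the seed nodes. This is a minimal illustration, not the code in `graph_store.py`: the PR loads the graph into NetworkX, whereas this sketch uses plain dicts so it runs without dependencies. The function name, the undirected-edge assumption, and the `damping` and `iterations` defaults are all assumptions.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Hypothetical helper: power-iteration PPR on an undirected graph.

    `edges` is an iterable of (u, v) pairs; `seeds` is a collection of
    node ids that are assumed to appear in at least one edge.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)

    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        # Otherwise each node splits its rank evenly among its neighbors.
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank


# Path graph a-b-c-d with the walk restarting at "a".
scores = personalized_pagerank(
    [("a", "b"), ("b", "c"), ("c", "d")], seeds={"a"}
)
```

Nodes close to the seed end up with more rank than distant ones, which is the property that lets graph recall surface memories structurally connected to the query entities.
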
### Phase 2: Retrieve Pipeline Integration
- `RetrieveGraphConfig` in `settings.py`: `enabled` (default: `False`), `weight` (default: `0.3`), `max_nodes` (default: `5`)
- `recall_graph` WorkflowStep: injected before `build_context` in the retrieve pipeline; finds seed nodes from the query, expands via communities, and ranks by PPR
- Score fusion in `_rag_build_context`: `final_score = α * vector_score + β * graph_score`, where `α + β = 1.0` and β is the configured `weight`
- Graph nodes are included in the retrieve response for transparency

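The fusion formula above can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the body of `_rag_build_context`: the name `fuse_scores`, the dict-based inputs, and the choice to treat a missing score as `0.0` are hypothetical, and both score sets are assumed to be on a comparable 0 to 1 scale.

```python
def fuse_scores(vector_scores, graph_scores, graph_weight=0.3):
    """Hypothetical helper: blend vector and graph scores per memory id.

    final = alpha * vector + beta * graph, with beta = graph_weight and
    alpha = 1 - beta, so the two weights always sum to 1.0.
    """
    if not 0.0 <= graph_weight <= 1.0:
        raise ValueError("graph_weight must be between 0.0 and 1.0")
    alpha = 1.0 - graph_weight

    # Assumption: an id missing from one source contributes 0.0 there.
    ids = set(vector_scores) | set(graph_scores)
    fused = {
        i: alpha * vector_scores.get(i, 0.0)
        + graph_weight * graph_scores.get(i, 0.0)
        for i in ids
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


ranked = fuse_scores({"m1": 0.9, "m2": 0.5}, {"m2": 1.0}, graph_weight=0.3)
```

In this example `m1` scores 0.7 * 0.9 = 0.63 and `m2` scores 0.7 * 0.5 + 0.3 * 1.0 = 0.65, so a strongly connected memory can overtake one with a higher vector score alone.
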
### Phase 3: Tests
- 30 unit tests covering PPR (8), global PageRank (3), LPA (4), merge results (4), score fusion (3), config (4), domain models (3), and ORM models (1)

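Community detection by label propagation (LPA), one of the algorithms these tests cover, can be exercised on a tiny in-memory graph. The following is an illustrative sketch, not the PR's implementation: the function name, the sorted visiting order, and the smallest-label tie-break are assumptions chosen to make the result deterministic.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_rounds=20):
    """Hypothetical helper: asynchronous LPA on an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Every node starts in its own community, labeled by its own id.
    labels = {n: n for n in neighbors}

    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            counts = Counter(labels[m] for m in neighbors[node])
            top = max(counts.values())
            # Assumption: ties go to the smallest label for determinism.
            best = min(lbl for lbl, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Two disconnected triangles should end up as two communities.
communities = label_propagation([
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "y"), ("y", "z"), ("x", "z"),
])
```
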
## Configuration

```python
from memu import MemU

m = MemU.from_config(
    retrieve_graph={
        "enabled": True,
        "weight": 0.3,   # graph contribution to final score
        "max_nodes": 5   # max graph nodes per recall
    }
)
```

Default: disabled, so there is zero impact on existing users. No graph tables are queried unless `enabled=True`.

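As a rough picture of how these three settings and their defaults could be modeled, here is a sketch of a config dataclass. It is not the `RetrieveGraphConfig` from `app/settings.py`: the class name, the validation rules, and the `vector_weight` property are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrieveGraphConfigSketch:
    """Hypothetical stand-in mirroring the documented defaults."""

    enabled: bool = False   # graph retrieval is opt-in
    weight: float = 0.3     # beta: graph contribution to the final score
    max_nodes: int = 5      # cap on graph nodes returned per recall

    def __post_init__(self):
        # Assumed validation; the source does not describe any checks.
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must be between 0.0 and 1.0")
        if self.max_nodes < 1:
            raise ValueError("max_nodes must be at least 1")

    @property
    def vector_weight(self) -> float:
        """alpha: the remaining share given to vector similarity."""
        return 1.0 - self.weight


defaults = RetrieveGraphConfigSketch()
custom = RetrieveGraphConfigSketch(enabled=True, weight=0.5, max_nodes=10)
```
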
## Files Changed

| File | Change |
|------|--------|
| `database/models.py` | +3 domain dataclasses |
| `database/postgres/models.py` | +3 SQLAlchemy ORM models |
| `database/postgres/schema.py` | Register graph models |
| `database/postgres/postgres.py` | Wire GraphStore into Database |
| `database/postgres/repositories/graph_store.py` | New: 800 LOC repository |
| `database/postgres/migrations/versions/001_add_graph_tables.py` | New: Alembic migration |
| `app/settings.py` | `RetrieveGraphConfig` dataclass |
| `app/retrieve.py` | `recall_graph` step + score fusion |
| `README.md` | Graph-enhanced retrieval section |
| `tests/test_graph_store.py` | New: 30 tests |

## Known Limitations

These are pre-existing design constraints, not introduced by this PR:

1. `ddl_mode="validate"` still runs Alembic `upgrade()`. The migration is safe (it uses `IF NOT EXISTS`), but this behavior predates this PR.
2. The migration hard-codes `user_id` as the scope column; projects using dynamic scope models may need to adjust it.

## Breaking Changes

None. Graph retrieval is fully opt-in via configuration. Existing retrieve behavior is unchanged when `graph.enabled=False` (the default).

- GraphNode/GraphEdge/GraphCommunity domain models + SQLModel ORM
- GraphStore repository with CRUD + dual-path graph recall + PPR/LPA
- Alembic migration for gm_* tables with scope column support
- Wired into PostgresStore alongside existing repos
- 77 existing tests still passing

- RetrieveGraphConfig: enabled, weight (β), max_nodes
- recall_graph WorkflowStep in RAG workflow
- Score fusion in _rag_build_context: vector*α + graph*β
- graph_nodes[] in retrieve response
- 77 tests pass, E2E verified with live PG data

Tests cover: PPR algorithm (8), global PageRank (3), LPA community
detection (4), merge results (4), score fusion (3), config (4),
domain models (3), ORM registration (1). All pure-Python, no DB needed.

Migration file was lost from working tree (present in e844af4 but not in
subsequent working directory state). DB was already at 002_relation_category
revision with the schema applied; alembic failed on init because it could
not locate the revision file.

Restored via: git checkout e844af4 -- <migration_path>

- admission.py: AdmissionGate with noise pattern detection, min_length check, configurable threshold
- test_admission.py: 14 tests covering disabled pass-through, min_length, noise blocking, custom patterns
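
The behaviors named in the admission commit (disabled pass-through, a minimum length, noise patterns, custom patterns) can be sketched as follows. This is an assumption-laden illustration rather than the contents of `admission.py`: the class name, the default patterns, and the default `min_length` are hypothetical, and the configurable threshold is left out because the commit message does not say what it measures.

```python
import re
from dataclasses import dataclass

# Hypothetical defaults: short acknowledgements and punctuation-only text.
DEFAULT_NOISE_PATTERNS = (
    r"^\s*(ok|okay|thanks|thank you|lol)\W*$",
    r"^[\W_]+$",
)


@dataclass
class AdmissionGateSketch:
    """Hypothetical gate deciding whether a text is worth memorizing."""

    enabled: bool = True
    min_length: int = 10
    noise_patterns: tuple = DEFAULT_NOISE_PATTERNS

    def admit(self, text: str) -> bool:
        # A disabled gate passes everything through unchanged.
        if not self.enabled:
            return True
        stripped = text.strip()
        # Reject inputs too short to carry a useful memory.
        if len(stripped) < self.min_length:
            return False
        # Reject anything matching a configured noise pattern.
        for pattern in self.noise_patterns:
            if re.search(pattern, stripped, flags=re.IGNORECASE):
                return False
        return True


gate = AdmissionGateSketch()
keep = gate.admit("User prefers dark mode in the editor")
drop = gate.admit("!!!!!!!!!!!!")
```

Passing a different `noise_patterns` tuple replaces the defaults, which is one way the "custom patterns" case from the test list could be exercised.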