feat: graph-enhanced retrieval with PPR and community detection #395
Open
2233admin wants to merge 9 commits into NevaMind-AI:main
- GraphNode/GraphEdge/GraphCommunity domain models + SQLModel ORM
- GraphStore repository with CRUD + dual-path graph recall + PPR/LPA
- Alembic migration for gm_* tables with scope column support
- Wired into PostgresStore alongside existing repos
- 77 existing tests still passing

- RetrieveGraphConfig: enabled, weight (β), max_nodes
- recall_graph WorkflowStep in RAG workflow
- Score fusion in _rag_build_context: vector*α + graph*β
- graph_nodes[] in retrieve response
- 77 tests pass, E2E verified with live PG data

Tests cover: PPR algorithm (8), global PageRank (3), LPA community detection (4), merge results (4), score fusion (3), config (4), domain models (3), ORM registration (1). All pure-Python, no DB needed.
Migration file was lost from working tree (present in e844af4 but not in subsequent working directory state). DB was already at 002_relation_category revision with the schema applied; alembic failed on init because it could not locate the revision file. Restored via: git checkout e844af4 -- <migration_path>
- admission.py: AdmissionGate with noise pattern detection, min_length check, configurable threshold
- test_admission.py: 14 tests covering disabled pass-through, min_length, noise blocking, custom patterns

## Summary

Adds graph-enhanced retrieval that layers a knowledge graph on top of existing vector search for more contextual memory recall. When enabled, the retrieve pipeline fuses vector similarity scores with graph-based Personalized PageRank (PPR) scores, surfacing memories that are both semantically relevant and structurally connected.

## Core Changes

### Phase 1: GraphStore Module
- Domain models: `GraphNode`, `GraphEdge`, `GraphCommunity` (in `database/models.py`)
- ORM models: SQLAlchemy models registered in `schema.py` via `get_sqlalchemy_models()`
- Alembic migration: `001_add_graph_tables.py` creates `gm_nodes`, `gm_edges`, `gm_communities` tables (idempotent with `IF NOT EXISTS`)
- GraphStore repository (`repositories/graph_store.py`): Full CRUD, `load_graph()` for in-memory NetworkX representation, PPR/LPA algorithms, dual-path recall (precise entity match + community expansion)

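The body of `graph_store.py` is not reproduced in this description, so the following is only a minimal pure-Python sketch of Personalized PageRank by power iteration, not the PR's implementation. The function name, the edge-list input shape, the undirected treatment of edges, and the damping value of 0.85 are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Illustrative PPR via power iteration (hypothetical helper, not the PR's code).

    edges: iterable of (node_a, node_b) pairs, treated as undirected.
    seeds: nodes matched from the query; the random walk restarts at these.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seed_set = set(seeds)
    if not seed_set:
        return {}
    nodes = set(neighbors) | seed_set

    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = neighbors.get(n)
            if out:
                # Spread this node's mass evenly over its neighbors.
                share = damping * scores[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Isolated node: hand its mass back to the seeds.
                for s in seed_set:
                    nxt[s] += damping * scores[n] * restart[s]
        scores = nxt
    return scores


chain = [("a", "b"), ("b", "c"), ("c", "d")]
from_a = personalized_pagerank(chain, seeds=["a"])
from_d = personalized_pagerank(chain, seeds=["d"])
```

On this four-node chain, scores decay with distance from the seed's neighborhood, and the same node scores differently depending on which seed is used. A well-connected neighbor can outrank the seed itself, which is expected behavior for PPR on undirected graphs.
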
### Phase 2: Retrieve Pipeline Integration
- `RetrieveGraphConfig` in `settings.py`: `enabled` (default: `False`), `weight` (default: `0.3`), `max_nodes` (default: `5`)
- `recall_graph` WorkflowStep: injected before `build_context` in the retrieve pipeline; finds seed nodes from the query, expands via communities, ranks by PPR
- Score fusion in `_rag_build_context`: `final_score = α * vector_score + β * graph_score` where α + β = 1.0
- Graph nodes included in retrieve response for transparency

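The fusion formula above can be illustrated with a small sketch. It assumes both score sets are already normalized to the same range and keyed by memory ID; `fuse_scores` and the sample IDs are hypothetical names, and the actual logic in `_rag_build_context` may differ.

```python
def fuse_scores(vector_scores, graph_scores, graph_weight=0.3):
    """Blend scores as final = alpha * vector + beta * graph, with alpha + beta = 1.

    Hypothetical helper: items missing from one source contribute 0.0 for it.
    """
    if not 0.0 <= graph_weight <= 1.0:
        raise ValueError("graph_weight must be within [0, 1]")
    beta = graph_weight
    alpha = 1.0 - beta

    fused = {}
    for item_id in set(vector_scores) | set(graph_scores):
        fused[item_id] = (
            alpha * vector_scores.get(item_id, 0.0)
            + beta * graph_scores.get(item_id, 0.0)
        )
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


vector_scores = {"m1": 0.9, "m2": 0.5}
graph_scores = {"m2": 1.0, "m3": 0.4}
ranked = fuse_scores(vector_scores, graph_scores, graph_weight=0.3)
```

With the default weight, `m2` (0.7 * 0.5 + 0.3 * 1.0 = 0.65) edges out `m1` (0.7 * 0.9 = 0.63), showing how a strong graph connection can lift a memory above one with higher vector similarity alone.
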
### Phase 3: Tests
- 30 unit tests covering: PPR (8), Global PageRank (3), LPA (4), merge results (4), score fusion (3), config (4), domain models (3), ORM models (1)

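For readers unfamiliar with LPA (label propagation), the community detection technique the tests above exercise, here is a minimal pure-Python sketch. It is not the PR's code: the function name, the asynchronous update order, and the smallest-label tie-break are assumptions chosen to keep the example deterministic.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, max_rounds=20, seed=0):
    """Illustrative asynchronous label propagation (hypothetical helper).

    Each node starts in its own community and repeatedly adopts the label
    that is most common among its neighbors, until nothing changes.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    labels = {n: n for n in neighbors}
    order = sorted(neighbors)
    for _ in range(max_rounds):
        rng.shuffle(order)
        changed = False
        for node in order:
            counts = Counter(labels[m] for m in neighbors[node])
            top = max(counts.values())
            # Tie-break: smallest label among the most frequent ones.
            best = min(lbl for lbl, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Two disconnected triangles should end up as two communities.
triangles = [("a", "b"), ("b", "c"), ("a", "c"),
             ("x", "y"), ("y", "z"), ("x", "z")]
communities = label_propagation(triangles)
```

Labels never cross between disconnected components, so each triangle settles on one label of its own.
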
## Configuration

```python
from memu import MemU

m = MemU.from_config(
    retrieve_graph={
        "enabled": True,
        "weight": 0.3,   # graph contribution to final score
        "max_nodes": 5   # max graph nodes per recall
    }
)
```

Default: disabled, with zero impact on existing users. No graph tables are queried unless `enabled=True`.

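As a rough picture of the options above, here is a hypothetical dataclass with the same three fields and defaults. The class name `RetrieveGraphConfigSketch`, the range checks, and the `vector_weight` property are assumptions for illustration; the real `RetrieveGraphConfig` in `app/settings.py` may validate differently or not at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrieveGraphConfigSketch:
    """Hypothetical stand-in mirroring the documented defaults."""

    enabled: bool = False   # graph recall is opt-in
    weight: float = 0.3     # beta: graph share of the fused score
    max_nodes: int = 5      # cap on graph nodes per recall

    def __post_init__(self):
        # Assumed validation: beta must leave a non-negative alpha.
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must be within [0, 1]")
        if self.max_nodes < 1:
            raise ValueError("max_nodes must be at least 1")

    @property
    def vector_weight(self):
        # alpha = 1 - beta, matching the fusion rule alpha + beta = 1.0
        return 1.0 - self.weight


default_cfg = RetrieveGraphConfigSketch()
enabled_cfg = RetrieveGraphConfigSketch(enabled=True, weight=0.3, max_nodes=5)
```
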
## Files Changed

| File | Change |
|------|--------|
| `database/models.py` | +3 domain dataclasses |
| `database/postgres/models.py` | +3 SQLAlchemy ORM models |
| `database/postgres/schema.py` | Register graph models |
| `database/postgres/postgres.py` | Wire `GraphStore` into `Database` |
| `database/postgres/repositories/graph_store.py` | New (800 LOC repository) |
| `database/postgres/migrations/versions/001_add_graph_tables.py` | New (Alembic migration) |
| `app/settings.py` | `RetrieveGraphConfig` dataclass |
| `app/retrieve.py` | `recall_graph` step + score fusion |
| `README.md` | Graph-enhanced retrieval section |
| `tests/test_graph_store.py` | New (30 tests) |

## Known Limitations

These are pre-existing design constraints, not introduced by this PR:

1. `ddl_mode="validate"` still runs Alembic `upgrade()`. The migration is safe (uses `IF NOT EXISTS`), but this behavior predates this PR
2. Migration hard-codes `user_id` as the scope column; projects using dynamic scope models may need to adjust

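To show why re-running an `IF NOT EXISTS` migration is harmless, the sketch below applies the same DDL twice. It swaps in the standard-library `sqlite3` module for PostgreSQL and Alembic so it runs anywhere, and the column list (apart from the `user_id` scope column named above) is invented for illustration, not taken from the real `gm_nodes` schema.

```python
import sqlite3

# Hypothetical, simplified table; only the table name and the user_id
# scope column come from the PR description.
DDL = """
CREATE TABLE IF NOT EXISTS gm_nodes (
    id TEXT PRIMARY KEY,
    user_id TEXT,
    name TEXT NOT NULL
)
"""


def apply_ddl(conn):
    """Run the idempotent DDL; safe to call on an existing schema."""
    conn.execute(DDL)
    conn.commit()


conn = sqlite3.connect(":memory:")
apply_ddl(conn)
conn.execute(
    "INSERT INTO gm_nodes (id, user_id, name) VALUES (?, ?, ?)",
    ("n1", "u1", "example"),
)
apply_ddl(conn)  # second run is a no-op; the existing row survives
row_count = conn.execute("SELECT COUNT(*) FROM gm_nodes").fetchone()[0]
```
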
## Breaking Changes

None. Graph retrieval is fully opt-in via configuration. Existing retrieve behavior is unchanged when `graph.enabled=False` (the default).