fix: escape Lucene boolean keywords as whole words, not individual characters (fixes #1302) by voidborne-d · Pull Request #1371 · getzep/graphiti

voidborne-d · 2026-04-02T21:02:06Z

Problem

lucene_sanitize() in graphiti_core/helpers.py used str.maketrans to escape individual uppercase letters O, R, N, T, A, D — intending to block Lucene boolean operators (AND, OR, NOT). Because str.maketrans maps every occurrence of each character, this corrupted virtually every real-world query:

| Query | Before fix | After fix |
|---|---|---|
| `Donald Trump` | `\Donald \Trump` | `Donald Trump` |
| `ORACLE` | `\O\R\ACLE` | `ORACLE` |
| `NASA` | `\N\AS\A` | `NASA` |
| `Toronto` | `\Toronto` | `Toronto` |
| `cats AND dogs` | `cats \A\N\D dogs` | `cats \AND dogs` |

This broke BM25 fulltext search for entity names, facts, and episodes whenever they contained any of those six letters — which is nearly all real-world text.
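You can reproduce the failure with `str.maketrans` alone. The sketch below rebuilds only the six letter entries of the old escape map and leaves out the special-character entries. `OLD_LETTER_MAP` and `old_letter_escape` are illustrative names, not code from `helpers.py`.

```python
# Hypothetical reconstruction of the pre-fix behaviour: only the six uppercase
# letter entries of the old escape map are reproduced here.
OLD_LETTER_MAP = str.maketrans({ch: '\\' + ch for ch in 'ORNTAD'})


def old_letter_escape(query: str) -> str:
    """Escape every uppercase O, R, N, T, A, D wherever it occurs."""
    # translate() works per character, so it cannot tell 'AND' from 'ANDROID'.
    return query.translate(OLD_LETTER_MAP)


if __name__ == '__main__':
    for q in ['Donald Trump', 'ORACLE', 'NASA', 'Toronto', 'cats AND dogs']:
        print(f'{q:15} -> {old_letter_escape(q)}')
```

A translation table has no notion of word context. It is the wrong tool for multi-character tokens such as `AND`.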

Fixes #1302.

Fix

- Remove O, R, N, T, A, D from the character-level `str.maketrans` escape map
- Add `_LUCENE_KEYWORD_RE = re.compile(r'\b(AND|OR|NOT)\b')`, which matches the keywords only as standalone uppercase whole words
- Apply keyword escaping as a second pass, after character escaping: `lambda m: '\\' + m.group(1)`
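Here is a minimal sketch of the two-pass approach. The regex and the lambda come from this PR. The contents of the character map are an assumption: Lucene's documented reserved characters, which may not match `helpers.py` exactly.

```python
import re

# Assumed character-level escape map: Lucene's documented reserved characters
# (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /). No letters any more.
_SPECIAL_CHARS = '+-&|!(){}[]^"~*?:\\/'
_CHAR_ESCAPE_MAP = str.maketrans({ch: '\\' + ch for ch in _SPECIAL_CHARS})

# Matches AND, OR, NOT only as standalone, uppercase whole words.
_LUCENE_KEYWORD_RE = re.compile(r'\b(AND|OR|NOT)\b')


def lucene_sanitize(query: str) -> str:
    # Pass 1: escape reserved punctuation one character at a time.
    escaped = query.translate(_CHAR_ESCAPE_MAP)
    # Pass 2: prefix whole-word boolean keywords with a backslash. A function
    # replacement inserts the backslash literally, so re.sub does not apply
    # template escape processing to it.
    return _LUCENE_KEYWORD_RE.sub(lambda m: '\\' + m.group(1), escaped)


if __name__ == '__main__':
    for q in ['ANDROID', 'cats AND dogs', '(NOT) a+b']:
        print(q, '->', lucene_sanitize(q))
```

The order of the passes matters. If the keyword pass ran first, pass 1 would escape the backslash it had just inserted, turning `\AND` into `\\AND`.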

Result:

- ANDROID, TORNADO, ORPHAN, NORMANDY: pass through unmodified
- Standalone AND, OR, NOT: correctly escaped to \AND, \OR, \NOT
- All Lucene special characters: still escaped as before
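The `\b` anchors are what protect words like ANDROID and NORMANDY. The sketch below compares plain substring replacement with the PR's word-boundary regex. The substring version is a hypothetical alternative the PR does not use, and `naive_escape` and `boundary_escape` are illustrative names.

```python
import re

# Matches AND, OR, NOT only as standalone, uppercase whole words.
KEYWORD_RE = re.compile(r'\b(AND|OR|NOT)\b')


def naive_escape(text: str) -> str:
    # Plain substring replacement: also hits keywords embedded in longer words.
    for kw in ('AND', 'OR', 'NOT'):
        text = text.replace(kw, '\\' + kw)
    return text


def boundary_escape(text: str) -> str:
    # \b requires a word/non-word transition on both sides of the keyword,
    # so 'AND' inside 'ANDROID' (followed by 'R') is not matched.
    return KEYWORD_RE.sub(lambda m: '\\' + m.group(1), text)


if __name__ == '__main__':
    for text in ['ANDROID', 'TORNADO', 'NORMANDY', 'cats AND dogs']:
        print(f'{text:14} naive: {naive_escape(text):16} boundary: {boundary_escape(text)}')
```

Both functions agree on standalone keywords. Only the boundary version leaves embedded ones alone.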

Changes

- graphiti_core/helpers.py: replace character-level letter escaping with a regex + lambda keyword pass
- tests/test_lucene_sanitize.py: 58 new parametrized tests covering all cases
- tests/helpers_test.py: update the existing assertion that expected the T in "This" to be escaped

Test results

58 passed, 0 failed

…aracters (fixes getzep#1302)

58 new tests covering:

- Uppercase letters preserved in normal words (17 cases)
- Boolean keywords properly escaped (7 cases)
- Keywords inside words NOT escaped (10 cases)
- Special characters still escaped (18 cases)
- Combined scenarios (6 cases)

danielchalef · 2026-04-02T21:02:18Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

voidborne-d · 2026-04-02T22:11:24Z

I have read the CLA Document and I hereby sign the CLA

danielchalef added a commit that referenced this pull request Apr 2, 2026

@voidborne-d has signed the CLA in #1371

1956ce3

voidborne-d temporarily deployed to development on April 3, 2026 00:20 with GitHub Actions

voidborne-d wants to merge 1 commit into getzep:main from voidborne-d:fix/lucene-sanitize-uppercase-corruption
