fix(rdf): scope glossary graph by membership and surface glossary labels #28199
harshach wants to merge 4 commits into
The /rdf/glossary/graph endpoint silently ignored its `glossaryId` filter
and returned every term in every glossary, because the SPARQL query bound
`?glossary` via `OPTIONAL { ?term1 om:belongsTo ?glossary }` while the
JSON-LD context (governance.jsonld) maps GlossaryTerm.glossary to
`om:belongsToGlossary`. The downstream `FILTER(?glossary = <…>)` was a
no-op, and the UI hierarchy view rendered every glossary's terms grouped
by their raw UUID — most visibly for cross-glossary scenarios.
Backend changes (openmetadata-service):
- RdfRepository: use the correct `om:belongsToGlossary` predicate and
make membership required (not OPTIONAL + FILTER) when scoped, so the
join drops non-matching terms instead of relying on filter semantics.
- Emit `group` (glossary name via COALESCE on skos:prefLabel /
rdfs:label) and `glossaryId` on every term node; fall back to a DB
lookup so a scoped response always carries the glossary label.
- Read term names from `rdfs:label` (the actual predicate base.jsonld
uses for `name`) instead of `om:name`, which is never written.
- Backfill membership when a node is upgraded from a term2 (edge target)
on one row to a term1 (primary) on a later row.
- removeGlossaryTermRelation deletes both directions so the reverse
triple written by EntityRepository.addRelationship doesn't linger.
- addRelationship (generic) uses additive INSERT DATA instead of
storeEntity, which destructively swept rdf:type, rdfs:label and
om:belongsToGlossary off the source entity when called with a
relationship-only model.
- Database fallback in getGlossaryTermGraphFromDatabase now scopes
in-memory; ListFilter has no first-class `glossary` predicate so
addQueryParam was a silent no-op that leaked every term.
- RdfUpdater short-circuits glossaryTerm⇔glossaryTerm RELATED_TO; the
typed addGlossaryTermRelation / removeGlossaryTermRelation path owns
those edges with the precise predicate.
- RdfPropertyMapper skips blank `xsd:string` literals so empty
displayNames no longer materialize as `skos:prefLabel ""` and win
over `rdfs:label` at read time.
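
The first two backend items can be illustrated with a minimal sketch of the query shape. This is not the actual `RdfRepository` code (which is Java); the function name `buildGlossaryGraphQuery`, the `om:` namespace URI, and the `om:GlossaryTerm` class name are assumptions made for illustration. It shows membership as a required triple when a scope is supplied, an `OPTIONAL` binding otherwise, and a `COALESCE` over the two label predicates.

```typescript
// Hypothetical sketch: names and the om: namespace URI are placeholders,
// not taken from the OpenMetadata code base.
function buildGlossaryGraphQuery(glossaryUri?: string): string {
  // Scoped: a required triple, so terms outside the glossary never join.
  // Unscoped: OPTIONAL, so terms without a glossary are still returned.
  // glossaryUri is assumed to be a validated IRI built by the caller.
  const membership = glossaryUri
    ? `?term1 om:belongsToGlossary <${glossaryUri}> .\n  BIND(<${glossaryUri}> AS ?glossary)`
    : "OPTIONAL { ?term1 om:belongsToGlossary ?glossary }";

  return [
    "PREFIX om: <https://example.org/om#>",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>",
    "SELECT ?term1 ?glossary ?glossaryName WHERE {",
    "  ?term1 a om:GlossaryTerm .",
    `  ${membership}`,
    "  OPTIONAL { ?glossary skos:prefLabel ?prefLabel }",
    "  OPTIONAL { ?glossary rdfs:label ?rdfsLabel }",
    // Prefer skos:prefLabel, fall back to rdfs:label.
    "  BIND(COALESCE(?prefLabel, ?rdfsLabel) AS ?glossaryName)",
    "}",
  ].join("\n");
}
```

With a required triple the scoping is enforced by the join itself, so a mismatched predicate yields an empty result rather than silently returning every term.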
UI changes (openmetadata-ui):
- convertRdfGraphToOntologyGraph prefers the explicit `node.glossaryId`
from the response over the FQN-prefix heuristic.
- hierarchyGraphBuilder falls back to a term node's `group` when the
glossary id can't be resolved against the caller's glossary list, so
the combo container label is never a raw UUID.
- rdfAPI.interface declares the new `glossaryId` field.
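
As a rough illustration of the label fallback described above, the sketch below resolves a combo label in the order the PR states elsewhere: `glossaryNames[id]`, then a non-blank `node.group`, then the raw glossary id. The `TermNode` shape and the `resolveComboLabel` name are hypothetical; the real `hierarchyGraphBuilder` may be structured differently.

```typescript
// Hypothetical shapes for illustration only.
interface TermNode {
  id: string;
  glossaryId?: string;
  group?: string;
}

function resolveComboLabel(
  glossaryId: string,
  glossaryNames: Record<string, string>,
  node?: TermNode
): string {
  // 1. Name from the caller's visible glossary list.
  const known = glossaryNames[glossaryId];
  if (known && known.trim() !== "") {
    return known;
  }
  // 2. Glossary name carried on the term node by the RDF response.
  const group = node?.group?.trim();
  if (group) {
    return group;
  }
  // 3. Last resort: the raw id.
  return glossaryId;
}
```

For example, a term whose glossary is missing from `glossaryNames` but whose node carries `group: "Finance"` would be labeled "Finance" instead of its UUID.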
Tests (openmetadata-integration-tests):
- RdfGlossaryGraphIT covers nine scenarios end-to-end, including
membership/labels through add+delete relation cycles, displayName
set/clear lifecycle, two relation types between the same term pair,
relation-type change (delete + add new), cross-glossary relations,
and the DB-fallback path for an empty glossary.
…scan Address PR #28199 review: getGlossaryTermGraphFromDatabase previously loaded every glossary term in the deployment with listAll() and then filtered by glossary.id in a loop. That defeats the point of scoping and creates memory / GC pressure on every fallback call. Now drive the scoping at the DB level by calling glossaryTermDAO.getNestedTerms(glossaryFqn), which executes `SELECT json FROM glossary_term_entity WHERE fqnHash LIKE :prefix` — an indexed prefix scan that only returns the requested glossary's terms. Parse the result JSON for ids, then bulk-hydrate relatedTerms / parent / children via EntityRepository.get(uri, ids, fields, include) in one batched call. The unscoped branch (glossaryId == null) still uses listAll because the endpoint is documented to support that case; the regression was strictly about silently doing a full scan when the caller DID supply a scope. All 9 RdfGlossaryGraphIT tests remain green.
Addressed in e5b8b51. The DB fallback now drives scoping at the database level via `SELECT json FROM glossary_term_entity WHERE fqnHash LIKE :prefix`, which is an indexed prefix scan. The unscoped branch (`glossaryId == null`) still uses `listAll`. All 9 `RdfGlossaryGraphIT` tests remain green.
…orts Per review on PR #28199: 1. Drive the DB fallback through the SAME code path as `GET /v1/glossaryTerms?glossary=<id>` instead of a parallel getNestedTerms + manual hydration sequence. That endpoint sets `filter.addQueryParam("parent", glossaryFqn)`, and `ListFilter.getParentCondition` turns it into a fqnHash LIKE prefix predicate via `getFqnPrefixCondition` — the same indexed prefix scan, but routed through `listAll` so cursor pagination, field hydration, and bulk relationship fetching all come for free. Net result: one DB-scoped read path used by both the public listing endpoint and the RDF graph DB fallback. 2. Replace fully-qualified type names introduced in the earlier commit (`org.openmetadata.schema.entity.data.GlossaryTerm`, `…Glossary`, `…Include.NON_DELETED`, `org.openmetadata.service.jdbi3.GlossaryTermRepository`, `…ListFilter`) with proper imports. Also collapsed the pre-existing `Relationship` FQN at line 348 since the new import made it redundant. All 9 RdfGlossaryGraphIT tests remain green.
Good catch: they weren't sharing the path before. e181f94 makes the RDF DB fallback drive through the same code as `GET /v1/glossaryTerms?glossary=<id>`:

```java
var listFilter = new ListFilter(null);
if (glossaryId != null) {
  var glossary = (Glossary) glossaryRepo.get(null, glossaryId, …);
  listFilter.addQueryParam("parent", glossary.getFullyQualifiedName());
}
var fetched =
    glossaryTermRepository.listAll(
        glossaryTermRepository.getFields("relatedTerms,parent,children"),
        listFilter);
```

That's exactly what that endpoint does. The same commit also collapses the freshly introduced FQN class names (`GlossaryTerm`, `Glossary`, `Include.NON_DELETED`, `GlossaryTermRepository`, `ListFilter`) into proper imports. All 9 `RdfGlossaryGraphIT` tests remain green.
- RdfPropertyMapperTest.AddTypedPropertyBlankString: direct unit test for the blank xsd:string skip in addTypedProperty. Covers blank, all-whitespace, populated, and the negative case (the skip is intentionally scoped to xsd:string, so "0" / xsd:integer literals still get emitted).
- RdfUpdaterTest (new): verifies the glossaryTerm⇔glossaryTerm RELATED_TO short-circuit on both addRelationship and removeRelationship. The short-circuit shape never reaches the repository, while cross-entity RELATED_TO (table→glossaryTerm) and other relationship types (CONTAINS between glossary terms) still flow through. Uses reflection to inject a Mockito mock and Awaitility to bridge the async submit path.
- graphBuilders.test.ts (new): convertRdfGraphToOntologyGraph prefers the explicit node.glossaryId from the RDF response over the FQN-prefix heuristic; falls back to node.group, then FQN; preserves group passthrough so the combo can fall back to it; replaces a UUID-shaped label with the last FQN segment.
- hierarchyGraphBuilder.test.ts (new): combo label resolution order is glossaryNames[id] > non-blank node.group > raw glossaryId. Covers the missing-from-glossaryNames case that previously rendered as a raw UUID in the UI hierarchy view.

All 41 new + existing tests across both modules pass.
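
The blank-literal rule that the first test exercises can be sketched as a small predicate. This is an illustrative TypeScript rendering, not the Java `addTypedProperty` itself; the function name `shouldEmitLiteral` is an assumption. Only the datatype IRIs are standard XSD identifiers.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

// Hypothetical helper: decides whether a literal should be written at all.
function shouldEmitLiteral(value: string, datatype: string): boolean {
  // Skip only blank or all-whitespace xsd:string values, so an empty
  // displayName never becomes an empty skos:prefLabel.
  if (datatype === XSD_STRING && value.trim() === "") {
    return false;
  }
  // Every other datatype is left alone, so "0" as xsd:integer is still emitted.
  return true;
}
```

Keeping the skip on the write side means readers that prefer `skos:prefLabel` over `rdfs:label` never see an empty preferred label to begin with.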
Code Review ✅ Approved · 1 resolved / 1 findings
Scopes glossary graph filtering to specific memberships and corrects label extraction to prevent raw UUIDs in the UI. Nine new integration tests confirm full functional coverage and resolved DB fallback inefficiencies.
✅ 1 resolved
✅ Performance: DB fallback loads ALL glossary terms into memory
🔴 Playwright Results: 1 failure(s), 8 flaky
✅ 4112 passed · ❌ 1 failed · 🟡 8 flaky · ⏭️ 86 skipped
Genuine Failures (failed on all attempts)❌



Summary
- `/rdf/glossary/graph?glossaryId=<id>` silently ignored its filter and returned every term across every glossary, because the SPARQL bound `?glossary` via `OPTIONAL { ?term1 om:belongsTo ?glossary }` but the JSON-LD context maps `GlossaryTerm.glossary` to `om:belongsToGlossary`. The downstream `FILTER(?glossary = <…>)` was a no-op.
- The UI hierarchy view grouped terms under raw UUIDs (no `glossaryId` / `group` on the response, so the UI fell through `glossaryNames[id] ?? id`).
- Related defects: the generic `addRelationship` swept `rdf:type` / `rdfs:label` / `om:belongsToGlossary` off the source entity; empty `skos:prefLabel = ""` won over `rdfs:label` and rendered as blank node labels; one-sided RDF deletion left a reverse triple behind; the DB fallback was scoped via a `ListFilter` param the table doesn't recognize.

This PR fixes all of the above and adds nine integration tests that lock in the user-visible behaviors end-to-end through both the RDF path and the DB-backed term API.
Backend (`openmetadata-service`)
- `RdfRepository.buildGlossaryTermGraphQuery`: use `om:belongsToGlossary`, make membership a required triple when scoped, COALESCE glossary label from `skos:prefLabel` / `rdfs:label`.
- `RdfRepository.parseGlossaryTermGraphResults`: emit `group` + `glossaryId` per node, backfill membership when a node is upgraded from term2→term1, blank-resilient label extraction, DB-resolved glossary-name fallback.
- `RdfRepository.addRelationship`: additive `INSERT DATA` instead of `storeEntity` (which swept identity predicates off the source entity).
- `RdfRepository.removeGlossaryTermRelation`: delete both directions so the reverse triple written by `EntityRepository.addRelationship` doesn't linger.
- `RdfRepository.getGlossaryTermGraphFromDatabase`: scope by `glossary.id` in-memory; `ListFilter` has no `glossary` predicate for the table, so the previous `addQueryParam` was a silent no-op.
- `RdfUpdater.addRelationship` / `removeRelationship`: short-circuit `glossaryTerm ⇔ glossaryTerm RELATED_TO`; the typed `addGlossaryTermRelation` path owns those edges with the precise predicate (`skos:exactMatch`, `skos:broader`, …) so type changes don't leave stale `om:relatedTo` triples.
- `RdfPropertyMapper.addTypedProperty`: skip blank `xsd:string` literals (one canonical source of truth on the write side).

UI (`openmetadata-ui`)
- `convertRdfGraphToOntologyGraph` prefers the explicit `node.glossaryId` from the response over the FQN-prefix heuristic.
- `hierarchyGraphBuilder` combo label falls back to a node's `group` when the glossary id isn't in the caller's visible glossary list, so the container never renders as a raw UUID.
- `GraphNode` typed with the new optional `glossaryId`.

Tests (`openmetadata-integration-tests`)
- `RdfGlossaryGraphIT` (new, 9 tests):
  - `glossaryIdFilterScopesGraphToRequestedGlossary`: only the requested glossary's terms are returned.
  - `scopedResponseCarriesGlossaryNameAndIdPerNode`: `group` + `glossaryId` populated per term.
  - `termNodeLabelFallsBackToNameWhenDisplayNameIsAbsent`: label = term name when there is no displayName, never a UUID.
  - `labelTracksDisplayNameLifecycle`: displayName goes unset → set → renamed → cleared and still resolves correctly through both the RDF graph and the DB term API.
  - `glossaryMembershipSurvivesAddAndDeleteRelations`: `belongsToGlossary` / label preserved through add+delete cycles; cross-checked via DB.
  - `sameTermPairCanHoldMultipleRelationTypesIndependently`: two relation types between the same pair coexist; deleting one leaves the other intact.
  - `changingRelationTypeReplacesOldEdgeWithNewType`: delete-then-add to a new type leaves no orphan of the old type.
  - `crossGlossaryRelationDoesNotLeakIntoOtherGlossaryScope`: a cross-glossary relation does NOT make the source term a member of the target glossary.
  - `glossaryIdFilterReturnsEmptyForGlossaryWithNoTerms`: the DB-fallback path also scopes correctly.

Test plan
- `mvn -pl openmetadata-integration-tests test -Dtest=RdfGlossaryGraphIT`: 9/9 green
- `mvn spotless:apply`: clean
- `yarn organize-imports` + `eslint --fix` + `prettier` on changed UI files: clean

🤖 Generated with Claude Code