fix(rdf): scope glossary graph by membership and surface glossary labels by harshach · Pull Request #28199 · open-metadata/OpenMetadata

harshach · 2026-05-17T03:09:25Z

Summary

/rdf/glossary/graph?glossaryId=<id> silently ignored its filter and returned every term across every glossary, because the SPARQL bound ?glossary via OPTIONAL { ?term1 om:belongsTo ?glossary } but the JSON-LD context maps GlossaryTerm.glossary to om:belongsToGlossary. The downstream FILTER(?glossary = <…>) was a no-op.
The UI hierarchy view rendered each leaked glossary's terms inside a group container whose label was the raw glossary UUID (no glossaryId/group on the response, so the UI fell through glossaryNames[id] ?? id).
A handful of cascading projection / read-side bugs surfaced once the predicate was fixed: term mutations dropped rdf:type/rdfs:label/om:belongsToGlossary; empty skos:prefLabel = "" won over rdfs:label and rendered as blank node labels; one-sided RDF deletion left a reverse triple behind; the DB-fallback was scoped via a ListFilter param the table doesn't recognize.

This PR fixes all of the above and adds nine integration tests that lock in the user-visible behaviors end-to-end through both the RDF path and the DB-backed term API.

Backend (`openmetadata-service`)

RdfRepository.buildGlossaryTermGraphQuery — use om:belongsToGlossary, make membership a required triple when scoped, COALESCE glossary label from skos:prefLabel/rdfs:label.
RdfRepository.parseGlossaryTermGraphResults — emit group + glossaryId per node, backfill membership when a node is upgraded from term2→term1, blank-resilient label extraction, DB-resolved glossary-name fallback.
RdfRepository.addRelationship — additive INSERT DATA instead of storeEntity (which swept identity predicates off the source entity).
RdfRepository.removeGlossaryTermRelation — delete both directions so the reverse triple written by EntityRepository.addRelationship doesn't linger.
RdfRepository.getGlossaryTermGraphFromDatabase — scope by glossary.id in-memory; ListFilter has no glossary predicate for the table, so the previous addQueryParam was a silent no-op.
RdfUpdater.addRelationship / removeRelationship — short-circuit glossaryTerm ⇔ glossaryTerm RELATED_TO; the typed addGlossaryTermRelation path owns those edges with the precise predicate (skos:exactMatch, skos:broader, …) so type changes don't leave stale om:relatedTo triples.
RdfPropertyMapper.addTypedProperty — skip blank xsd:string literals (one canonical source of truth on the write side).

UI (`openmetadata-ui`)

convertRdfGraphToOntologyGraph prefers the explicit node.glossaryId from the response over the FQN-prefix heuristic.
hierarchyGraphBuilder combo label falls back to a node's group when the glossary id isn't in the caller's visible glossary list — so the container never renders as a raw UUID.
GraphNode typed with the new optional glossaryId.

Tests (`openmetadata-integration-tests`)

RdfGlossaryGraphIT (new, 9 tests):

glossaryIdFilterScopesGraphToRequestedGlossary — only requested glossary's terms returned.
scopedResponseCarriesGlossaryNameAndIdPerNode — group + glossaryId populated per term.
termNodeLabelFallsBackToNameWhenDisplayNameIsAbsent — label = term name when no displayName, never UUID.
labelTracksDisplayNameLifecycle — displayName: unset → set → renamed → cleared → still resolves correctly through both the RDF graph and the DB term API.
glossaryMembershipSurvivesAddAndDeleteRelations — belongsToGlossary / label preserved through add+delete cycles; cross-checked via DB.
sameTermPairCanHoldMultipleRelationTypesIndependently — two relation types between the same pair coexist; deleting one leaves the other intact.
changingRelationTypeReplacesOldEdgeWithNewType — delete-then-add to a new type leaves no orphan of the old type.
crossGlossaryRelationDoesNotLeakIntoOtherGlossaryScope — cross-glossary relation does NOT make the source term a member of the target glossary.
glossaryIdFilterReturnsEmptyForGlossaryWithNoTerms — DB-fallback path also scopes correctly.

Tests run: 9, Failures: 0, Errors: 0, Skipped: 0

Test plan

mvn -pl openmetadata-integration-tests test -Dtest=RdfGlossaryGraphIT — 9/9 green
mvn spotless:apply — clean
yarn organize-imports + eslint --fix + prettier on changed UI files — clean
Manual UI verification on a local Docker stack with Fuseki enabled (Finance + Marketing glossaries with hierarchy, intra- and cross-glossary relations): scoped graphs render only the requested glossary's terms with proper group labels.

🤖 Generated with Claude Code

The /rdf/glossary/graph endpoint silently ignored its `glossaryId` filter and returned every term in every glossary, because the SPARQL query bound `?glossary` via `OPTIONAL { ?term1 om:belongsTo ?glossary }` while the JSON-LD context (governance.jsonld) maps GlossaryTerm.glossary to `om:belongsToGlossary`. The downstream `FILTER(?glossary = <…>)` was a no-op, and the UI hierarchy view rendered every glossary's terms grouped by their raw UUID — most visibly for cross-glossary scenarios. Backend changes (openmetadata-service): - RdfRepository: use the correct `om:belongsToGlossary` predicate and make membership required (not OPTIONAL + FILTER) when scoped, so the join drops non-matching terms instead of relying on filter semantics. - Emit `group` (glossary name via COALESCE on skos:prefLabel / rdfs:label) and `glossaryId` on every term node; fall back to a DB lookup so a scoped response always carries the glossary label. - Read term names from `rdfs:label` (the actual predicate base.jsonld uses for `name`) instead of `om:name`, which is never written. - Backfill membership when a node is upgraded from a term2 (edge target) on one row to a term1 (primary) on a later row. - removeGlossaryTermRelation deletes both directions so the reverse triple written by EntityRepository.addRelationship doesn't linger. - addRelationship (generic) uses additive INSERT DATA instead of storeEntity, which destructively swept rdf:type, rdfs:label and om:belongsToGlossary off the source entity when called with a relationship-only model. - Database fallback in getGlossaryTermGraphFromDatabase now scopes in-memory; ListFilter has no first-class `glossary` predicate so addQueryParam was a silent no-op that leaked every term. - RdfUpdater short-circuits glossaryTerm⇔glossaryTerm RELATED_TO; the typed addGlossaryTermRelation / removeGlossaryTermRelation path owns those edges with the precise predicate. - RdfPropertyMapper skips blank `xsd:string` literals so empty displayNames no longer materialize as `skos:prefLabel ""` and win over `rdfs:label` at read time. UI changes (openmetadata-ui): - convertRdfGraphToOntologyGraph prefers the explicit `node.glossaryId` from the response over the FQN-prefix heuristic. - hierarchyGraphBuilder falls back to a term node's `group` when the glossary id can't be resolved against the caller's glossary list, so the combo container label is never a raw UUID. - rdfAPI.interface declares the new `glossaryId` field. Tests (openmetadata-integration-tests): - RdfGlossaryGraphIT covers nine scenarios end-to-end, including membership/labels through add+delete relation cycles, displayName set/clear lifecycle, two relation types between the same term pair, relation-type change (delete + add new), cross-glossary relations, and the DB-fallback path for an empty glossary.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…scan Address PR #28199 review: getGlossaryTermGraphFromDatabase previously loaded every glossary term in the deployment with listAll() and then filtered by glossary.id in a loop. That defeats the point of scoping and creates memory / GC pressure on every fallback call. Now drive the scoping at the DB level by calling glossaryTermDAO.getNestedTerms(glossaryFqn), which executes `SELECT json FROM glossary_term_entity WHERE fqnHash LIKE :prefix` — an indexed prefix scan that only returns the requested glossary's terms. Parse the result JSON for ids, then bulk-hydrate relatedTerms / parent / children via EntityRepository.get(uri, ids, fields, include) in one batched call. The unscoped branch (glossaryId == null) still uses listAll because the endpoint is documented to support that case; the regression was strictly about silently doing a full scan when the caller DID supply a scope. All 9 RdfGlossaryGraphIT tests remain green.

harshach · 2026-05-17T03:22:24Z

Addressed in e5b8b51.

The DB fallback now drives scoping at the database level via glossaryTermDAO.getNestedTerms(glossaryFqn):

SELECT json FROM glossary_term_entity WHERE fqnHash LIKE :prefix

which is an indexed fqnHash-prefix scan that only returns the requested glossary's terms. We then parse the result JSON for ids and bulk-hydrate relatedTerms / parent / children via EntityRepository.get(uri, ids, fields, include) in a single batched call — no more listAll + in-memory filter.

The unscoped branch (glossaryId == null) is unchanged because that's the documented "all glossaries" mode; the regression was strictly about silently doing a full scan when the caller DID supply a scope.

All 9 RdfGlossaryGraphIT tests still green:

Tests run: 9, Failures: 0, Errors: 0, Skipped: 0

…orts Per review on PR #28199: 1. Drive the DB fallback through the SAME code path as `GET /v1/glossaryTerms?glossary=<id>` instead of a parallel getNestedTerms + manual hydration sequence. That endpoint sets `filter.addQueryParam("parent", glossaryFqn)`, and `ListFilter.getParentCondition` turns it into a fqnHash LIKE prefix predicate via `getFqnPrefixCondition` — the same indexed prefix scan, but routed through `listAll` so cursor pagination, field hydration, and bulk relationship fetching all come for free. Net result: one DB-scoped read path used by both the public listing endpoint and the RDF graph DB fallback. 2. Replace fully-qualified type names introduced in the earlier commit (`org.openmetadata.schema.entity.data.GlossaryTerm`, `…Glossary`, `…Include.NON_DELETED`, `org.openmetadata.service.jdbi3.GlossaryTermRepository`, `…ListFilter`) with proper imports. Also collapsed the pre-existing `Relationship` FQN at line 348 since the new import made it redundant. All 9 RdfGlossaryGraphIT tests remain green.

harshach · 2026-05-17T03:40:15Z

Good catch — they weren't sharing the path before. e181f94 makes the RDF DB fallback drive through the same code as GET /v1/glossaryTerms?glossary=<id>:

var listFilter = new ListFilter(null);
if (glossaryId != null) {
  var glossary = (Glossary) glossaryRepo.get(null, glossaryId, …);
  listFilter.addQueryParam("parent", glossary.getFullyQualifiedName());
}
var fetched =
    glossaryTermRepository.listAll(
        glossaryTermRepository.getFields("relatedTerms,parent,children"),
        listFilter);

That's exactly what GlossaryTermResource.list(...) does — parent flows through ListFilter.getParentCondition → getFqnPrefixCondition → an indexed fqnHash LIKE '<prefix>.%' predicate on glossary_term_entity. Single read path now serves both the public listing and the RDF graph DB fallback; no parallel getNestedTerms + manual id hydration.

Same commit also collapses the freshly-introduced FQN class names (org.openmetadata.schema.entity.data.GlossaryTerm, etc.) into proper imports per the convention.

All 9 RdfGlossaryGraphIT tests still green.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

github-actions · 2026-05-17T04:06:23Z

Jest test Coverage

UI tests summary

Lines	Statements	Branches	Functions
	62.69% (65385/104296)	43.52% (35792/82227)	46.27% (10528/22753)

- RdfPropertyMapperTest.AddTypedPropertyBlankString — direct unit test for the blank xsd:string skip in addTypedProperty. Covers blank, all- whitespace, populated, and the negative case (skip is intentionally scoped to xsd:string so "0" / xsd:integer literals still get emitted). - RdfUpdaterTest (new) — verifies the glossaryTerm⇔glossaryTerm RELATED_TO short-circuit on both addRelationship and removeRelationship: short-circuit shape never reaches the repository, while cross-entity RELATED_TO (table→glossaryTerm) and other relationship types (CONTAINS between glossary terms) still flow through. Uses reflection to inject a Mockito mock + Awaitility to bridge the async submit path. - graphBuilders.test.ts (new) — convertRdfGraphToOntologyGraph prefers the explicit node.glossaryId from the RDF response over the FQN-prefix heuristic; falls back to node.group then FQN; preserves group passthrough so the combo can fall back to it; replaces a UUID-shaped label with the last FQN segment. - hierarchyGraphBuilder.test.ts (new) — combo label resolution order: glossaryNames[id] > non-blank node.group > raw glossaryId. Covers the missing-from-glossaryNames case that previously rendered as a raw UUID in the UI hierarchy view. All 41 new + existing tests across both modules pass.

gitar-bot · 2026-05-17T04:56:46Z

Code Review ✅ Approved 1 resolved / 1 findings

Scopes glossary graph filtering to specific memberships and corrects label extraction to prevent raw UUIDs in the UI. Nine new integration tests confirm full functional coverage and resolved DB fallback inefficiencies.

✅ 1 resolved

✅ Performance: DB fallback loads ALL glossary terms into memory

📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfRepository.java:1622-1636
The getGlossaryTermGraphFromDatabase method calls glossaryTermRepository.listAll(...) which internally uses dao.listAfter(filter, Integer.MAX_VALUE, "", ""). This fetches every glossary term in the system into memory, then filters by glossaryId in a loop. For deployments with thousands of terms this creates unnecessary memory pressure and GC load on every invocation of the DB-fallback path.

The original code attempted to use listFilter.addQueryParam("glossary", ...) which (as the PR correctly notes) is a silent no-op. However, a better fix would be to use a JDBI query that supports glossary filtering at the DB level rather than loading everything and filtering in-memory.

If a DB-level filter is not feasible short-term, at minimum add a comment documenting the scalability limitation and consider streaming/pagination.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-05-17T05:20:40Z

Quality Gate passed for 'open-metadata-ui'

Issues
1351 New issues
0 Accepted issues

Measures
17 Security Hotspots
0.0% Coverage on New Code
4.0% Duplication on New Code

See analysis details on SonarQube Cloud

sonarqubecloud · 2026-05-17T06:05:20Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-05-17T07:12:53Z

🔴 Playwright Results — 1 failure(s), 8 flaky

✅ 4112 passed · ❌ 1 failed · 🟡 8 flaky · ⏭️ 86 skipped

Shard	Passed	Failed	Flaky	Skipped
✅ Shard 1	299	0	0	4
🔴 Shard 2	771	1	2	8
🟡 Shard 3	780	0	4	7
✅ Shard 4	816	0	0	18
🟡 Shard 5	708	0	1	41
🟡 Shard 6	738	0	1	8

Genuine Failures (failed on all attempts)

❌ Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2)

Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoHaveText�[2m(�[22m�[32mexpected�[39m�[2m)�[22m failed

Locator:  locator('[data-row-key*="StatusBadgeTerm1779000283633"]').locator('.status-badge')
Expected: �[32m"Draft"�[39m
Received: �[31m"In Review"�[39m
Timeout:  15000ms

Call log:
�[2m  - Expect "toHaveText" with timeout 15000ms�[22m
�[2m  - waiting for locator('[data-row-key*="StatusBadgeTerm1779000283633"]').locator('.status-badge')�[22m
�[2m    18 × locator resolved to <div class="status-badge inReview" data-testid=""PW%'a6881030.Calme8bca324".StatusBadgeTerm1779000283633-status">…</div>�[22m
�[2m       - unexpected value "In Review"�[22m

🟡 8 flaky test(s) (passed on retry)

Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/KnowledgeCenter.spec.ts › Article mentions in description should working for Knowledge Center (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
Flow/ObservabilityAlerts.spec.ts › Table alert (shard 3, 1 retry)
Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 2 retries)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings May 17, 2026 03:09

harshach requested a review from a team as a code owner May 17, 2026 03:09

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 17, 2026

Copilot started reviewing on behalf of harshach May 17, 2026 03:09 View session

gitar-bot Bot reviewed May 17, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfRepository.java Outdated

Copilot AI reviewed May 17, 2026

View reviewed changes

harshach had a problem deploying to test May 17, 2026 03:19 — with GitHub Actions Error

harshach had a problem deploying to test May 17, 2026 03:20 — with GitHub Actions Error

harshach had a problem deploying to test May 17, 2026 03:32 — with GitHub Actions Error

harshach had a problem deploying to test May 17, 2026 03:33 — with GitHub Actions Error

Copilot AI review requested due to automatic review settings May 17, 2026 03:40

Copilot started reviewing on behalf of harshach May 17, 2026 03:40 View session

Copilot AI reviewed May 17, 2026

View reviewed changes

harshach had a problem deploying to test May 17, 2026 03:49 — with GitHub Actions Error

harshach temporarily deployed to test May 17, 2026 03:49 — with GitHub Actions Inactive

harshach had a problem deploying to test May 17, 2026 03:49 — with GitHub Actions Error

harshach temporarily deployed to test May 17, 2026 03:49 — with GitHub Actions Inactive

harshach temporarily deployed to test May 17, 2026 05:05 — with GitHub Actions Inactive

harshach temporarily deployed to test May 17, 2026 05:07 — with GitHub Actions Inactive

harshach had a problem deploying to test May 17, 2026 05:07 — with GitHub Actions Failure

harshach temporarily deployed to test May 17, 2026 05:07 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rdf): scope glossary graph by membership and surface glossary labels#28199

fix(rdf): scope glossary graph by membership and surface glossary labels#28199
harshach wants to merge 4 commits into
mainfrom
fix/rdf-glossary-graph-scope-and-labels

harshach commented May 17, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

harshach commented May 17, 2026

Uh oh!

harshach commented May 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented May 17, 2026 •

edited

Loading

Uh oh!

gitar-bot Bot commented May 17, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented May 17, 2026

Uh oh!

sonarqubecloud Bot commented May 17, 2026

Uh oh!

github-actions Bot commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

harshach commented May 17, 2026

Summary

Backend (openmetadata-service)

UI (openmetadata-ui)

Tests (openmetadata-integration-tests)

Test plan

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

harshach commented May 17, 2026

Uh oh!

harshach commented May 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Jest test Coverage

UI tests summary

Uh oh!

gitar-bot Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 17, 2026

Quality Gate passed for 'open-metadata-ui'

Uh oh!

sonarqubecloud Bot commented May 17, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

github-actions Bot commented May 17, 2026

🔴 Playwright Results — 1 failure(s), 8 flaky

Genuine Failures (failed on all attempts)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Backend (`openmetadata-service`)

UI (`openmetadata-ui`)

Tests (`openmetadata-integration-tests`)

github-actions Bot commented May 17, 2026 •

edited

Loading

gitar-bot Bot commented May 17, 2026 •

edited

Loading