[#10739] fix(core): Avoid stale cache refill during concurrent metadata updates #10740

Draft
diqiu50 wants to merge 3 commits into apache:main from diqiu50:upstream/cache-invalidation-race-fixes

diqiu50 commented Apr 10, 2026

What changes were proposed in this pull request?

Reorder cache invalidation in the catalog and relational entity store write paths so that concurrent reads do not refill the cache with stale metadata.

Why are the changes needed?

Several write paths invalidate the cache before the backend mutation completes.

Under concurrent access, another thread can miss the cache, read the old data from the backend, and write that stale result back into the cache. This can leave outdated catalog or relation metadata visible after rename, drop, or relation updates.
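
The interleaving described above can be reproduced in a few lines. This is a minimal sketch, not Gravitino code: `StaleRefillSketch`, its two maps, and the method names are hypothetical stand-ins for the cache and backend, and the "concurrent" reader is interleaved by hand so the outcome is deterministic. It does not model every possible schedule.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins for the cache and the backend; not Gravitino classes. */
class StaleRefillSketch {
  final Map<String, String> backend = new ConcurrentHashMap<>();
  final Map<String, String> cache = new ConcurrentHashMap<>();

  StaleRefillSketch() {
    backend.put("catalog", "v1");
    cache.put("catalog", "v1");
  }

  /** Read-through load: on a cache miss, read the backend and refill the cache. */
  String read(String key) {
    return cache.computeIfAbsent(key, backend::get);
  }

  /** Old ordering: invalidate first; a reader runs in the gap before the write lands. */
  String invalidateThenWrite(String key, String newValue) {
    cache.remove(key);
    read(key); // simulated concurrent reader: misses and refills the cache with the old value
    backend.put(key, newValue);
    return read(key); // later reads hit the stale cache entry
  }

  /** New ordering: mutate the backend first, then invalidate. */
  String writeThenInvalidate(String key, String newValue) {
    read(key); // a reader before the write sees the old value, which is still correct
    backend.put(key, newValue);
    cache.remove(key);
    return read(key); // miss, so the value is reloaded from the backend
  }
}
```

With the old ordering the final read returns the pre-update value; with the new ordering it returns the updated one.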

Fix: #10739

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • added unit coverage in TestCatalogManager
  • added unit coverage in TestRelationalEntityStore

Copilot AI review requested due to automatic review settings April 10, 2026 11:05

Copilot AI left a comment

Pull request overview

This PR addresses a concurrency race where cache invalidation happening before backend mutations allows concurrent readers to repopulate caches with stale metadata, leaving outdated catalog or relation metadata visible after writes.

Changes:

  • Reorders cache invalidation in CatalogManager write paths to occur after store mutations, and forces updated wrappers into cache after rename.
  • Reorders cache invalidation in RelationalEntityStore write paths (update/delete/relations) to occur after backend mutations.
  • Adds new unit tests intended to validate the new invalidation ordering and stale-refill prevention.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java | Moves invalidation after store mutations for alter/drop; uses cache put to prevent stale overwrite during rename. |
| core/src/main/java/org/apache/gravitino/storage/relational/RelationalEntityStore.java | Reorders invalidation after backend writes for entity and relation operations. |
| core/src/test/java/org/apache/gravitino/catalog/TestCatalogManager.java | Adds/updates tests for cache race scenarios around alter/drop. |
| core/src/test/java/org/apache/gravitino/storage/relational/TestRelationalEntityStore.java | Adds tests to verify invalidation happens after backend writes for entity and relation operations. |

Comments suppressed due to low confidence (1)

core/src/main/java/org/apache/gravitino/storage/relational/RelationalEntityStore.java:349

  • Destination-side relation cache invalidation uses srcEntityType for destEntitiesToAdd/Remove. For relations where destination entity types differ (e.g., TAG/POLICY vs metadata object), this misses destination cache keys and can leave stale relation results. Derive the destination type from relType or extend the API to accept destination entity type(s).
    for (NameIdentifier destToAdd : destEntitiesToAdd) {
      cache.invalidate(destToAdd, srcEntityType, relType);
    }

    for (NameIdentifier destToRemove : destEntitiesToRemove) {

    Answer<CatalogManager.CatalogWrapper> insertStaleWrapper =
        invocation -> {
          if (staleInserted.compareAndSet(false, true)) {
            catalogManager.getCatalogCache().put(NameIdentifier.of("metalake", "cache_race_test_renamed"), staleWrapper);

Copilot AI Apr 10, 2026

The long put(...) call exceeds typical Google Java Style line length and also reduces readability in tests. Please wrap the arguments onto multiple lines consistent with the rest of this file’s formatting.

Suggested change
-            catalogManager.getCatalogCache().put(NameIdentifier.of("metalake", "cache_race_test_renamed"), staleWrapper);
+            catalogManager
+                .getCatalogCache()
+                .put(
+                    NameIdentifier.of("metalake", "cache_race_test_renamed"), staleWrapper);

import java.util.List;
import java.util.function.Function;
import org.apache.commons.lang3.reflect.FieldUtils;
import org.apache.commons.lang3.tuple.Pair;

Copilot AI Apr 10, 2026

Pair is imported but not used in this test class. This will fail compilation/style checks (Spotless/removeUnusedImports). Remove the unused import.

Suggested change
- import org.apache.commons.lang3.tuple.Pair;

Comment on lines +72 to +75
        .update(eq(ident), eq(Entity.EntityType.CATALOG), any(Function.class));

    store.update(ident, null, Entity.EntityType.CATALOG, entity -> entity);


Copilot AI Apr 10, 2026

This call passes null for the Class<E> type parameter, which can prevent generic type inference and fail compilation. Pass the concrete entity class used in this test (e.g., CatalogEntity.class / relevant type) instead of null.

Comment on lines +164 to +168
        .invalidate(
            destToAdd, Entity.EntityType.TABLE, SupportsRelationOperations.Type.TAG_REL);
    Mockito.verify(cache, Mockito.never())
        .invalidate(
            destToRemove,

Copilot AI Apr 10, 2026

The destination identifiers here look like tags, but the test expectations use EntityType.TABLE for destination-side invalidation. For tag relations (e.g., TAG_METADATA_OBJECT_REL), destination cache keys should be invalidated with EntityType.TAG (consistent with how insertRelation invalidates using dstType), otherwise destination relation caches can remain stale.
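
To illustrate the point about per-side entity types, here is a minimal sketch under stated assumptions. `RelationInvalidationSketch`, the string-based cache key, and the `invalidateRelation` signature are all hypothetical; only the idea (invalidate the destination with the destination's own type) comes from the comment above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a relation cache keyed by (identifier, entity type, relation type). */
class RelationInvalidationSketch {
  final Set<String> cacheKeys = new HashSet<>();

  /** Builds a cache key; the "ident|type|rel" layout is an assumption for illustration. */
  static String key(String ident, String entityType, String relType) {
    return ident + "|" + entityType + "|" + relType;
  }

  /** Invalidates the source with its own type and each destination with the destination type. */
  void invalidateRelation(
      String src, String srcType, List<String> dests, String destType, String relType) {
    cacheKeys.remove(key(src, srcType, relType));
    for (String dest : dests) {
      // Using srcType here would compute a key that was never cached and leave this entry stale.
      cacheKeys.remove(key(dest, destType, relType));
    }
  }
}
```

If the destination loop used the source type instead, the tag-side key in this model would survive the invalidation.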

Comment on lines +118 to +120
        .insertRelation(
            SupportsRelationOperations.Type.TAG_REL,
            src,

Copilot AI Apr 10, 2026

SupportsRelationOperations.Type does not define TAG_REL (the enum includes TAG_METADATA_OBJECT_REL, etc.), so this test will not compile. Replace TAG_REL with the correct relation type constant used by the implementation.

Comment on lines 186 to 190
      try {
+       boolean deleted = backend.delete(ident, entityType, cascade);
        cache.invalidate(ident, entityType);
-       return backend.delete(ident, entityType, cascade);
+       return deleted;
      } catch (NoSuchEntityException e) {

Copilot AI Apr 10, 2026

If backend.delete(...) throws NoSuchEntityException, this method returns false without invalidating the cache entry. That can leave a stale cached entity even though the backend reports it missing. Invalidate the cache in the exception path as well (or use a finally).
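
A minimal sketch of the `finally` variant suggested here, under stated assumptions: `DeleteInvalidationSketch` and its maps are hypothetical, and `NoSuchElementException` stands in for Gravitino's `NoSuchEntityException` so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical store; NoSuchElementException stands in for NoSuchEntityException. */
class DeleteInvalidationSketch {
  final Map<String, String> backend = new HashMap<>();
  final Map<String, String> cache = new HashMap<>();

  /** Stand-in for backend.delete(...): throws when the entity is already gone. */
  private boolean backendDelete(String ident) {
    if (backend.remove(ident) == null) {
      throw new NoSuchElementException(ident);
    }
    return true;
  }

  /** Deletes from the backend first, then invalidates the cache on every exit path. */
  boolean delete(String ident) {
    try {
      return backendDelete(ident);
    } catch (NoSuchElementException e) {
      return false;
    } finally {
      // Runs after a successful delete and after the missing-entity path alike.
      cache.remove(ident);
    }
  }
}
```

The invalidation still happens after the backend call, so the ordering this PR introduces is preserved while the missing-entity path no longer leaves a cached entry behind.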

Comment on lines +593 to +597
    Mockito.doAnswer(insertStaleWrapper)
        .when(catalogManager)
        .createCatalogWrapper(any(CatalogEntity.class), eq(null));
    Mockito.doReturn(freshWrapper)
        .when(catalogManager)

Copilot AI Apr 10, 2026

This test stubs CatalogManager#createCatalogWrapper(...), but the method is private in CatalogManager, so this code will not compile (and cannot be stubbed with plain Mockito). Introduce a non-private seam for wrapper creation (e.g., injectable factory) or adjust the test to stub public/protected methods instead.
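
One way to read the "injectable factory" option is sketched below. Everything here is hypothetical (`WrapperFactorySeamSketch`, the `Function<String, String>` factory, and the string "wrappers"); it only shows the shape of a seam that a test can replace without stubbing a private method.

```java
import java.util.function.Function;

/** Hypothetical manager whose wrapper creation is an injected function rather than a private method. */
class WrapperFactorySeamSketch {
  private final Function<String, String> wrapperFactory;

  /** Production code would pass the real factory; a test can pass a fake. */
  WrapperFactorySeamSketch(Function<String, String> wrapperFactory) {
    this.wrapperFactory = wrapperFactory;
  }

  /** Loads a wrapper through the seam instead of calling a private helper directly. */
  String loadWrapper(String catalogName) {
    return wrapperFactory.apply(catalogName);
  }
}
```

A test can then construct the manager with a lambda such as `name -> "fake:" + name` and observe or control every wrapper creation.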

Comment on lines +596 to +598
    Mockito.doReturn(freshWrapper)
        .when(catalogManager)
        .createCatalogWrapper(any(CatalogEntity.class), eq(null));

Copilot AI Apr 10, 2026

The doReturn(freshWrapper) stubbing here uses the same matchers as the preceding doAnswer(...), which overrides the earlier stub. As a result, the intended race simulation never executes. Use sequential stubbing (doAnswer(...).doReturn(...)) or consolidate into a single Answer.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +590 to +592
          }
          return null;
        };

Copilot AI Apr 10, 2026

This Answer returns null, which would cause an NPE if the stub were actually used as the return value of createCatalogWrapper(...). Ensure the stub returns a valid CatalogWrapper (and trigger the stale-cache insertion as a side effect if needed).

Copilot generated this review using guidance from repository custom instructions.
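
The last two comments both point toward a single stub that performs the stale insertion once and always returns a usable value. The sketch below shows that shape in plain Java, with a `Function` playing the role of the Mockito `Answer`; the class name, the string "wrappers", and the map-based cache are assumptions for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

/** Hypothetical test helper: one answer that injects a stale entry once and never returns null. */
class OnceSideEffectAnswerSketch {
  final Map<String, String> cache = new ConcurrentHashMap<>();
  final AtomicBoolean staleInserted = new AtomicBoolean(false);

  /** Plays the role of the Answer: side effect on the first call only, fresh value on every call. */
  Function<String, String> answer() {
    return name -> {
      if (staleInserted.compareAndSet(false, true)) {
        cache.put(name, "stale"); // simulated concurrent reader refilling the cache
      }
      return "fresh"; // always a valid return value, so callers cannot hit an NPE
    };
  }
}
```

Because the side effect and the return value live in one answer, there is no second stub that could override the first.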
@diqiu50 diqiu50 self-assigned this Apr 10, 2026
@diqiu50 diqiu50 marked this pull request as draft April 10, 2026 11:58
…on PR

- Replace TAG_REL with TAG_METADATA_OBJECT_REL in TestRelationalEntityStore
- Make createCatalogWrapper package-private to allow Mockito spy stubbing
- Use catalog() method instead of direct field access in alterCatalog
- Fix TestCatalogManager: use BaseCatalog mock, fix double-stub override,
  restore real method after test to prevent stub leaking across tests
- Fix delete() to invalidate cache in finally block on NoSuchEntityException
@github-actions

Code Coverage Report

Overall Project 65.16% -0.02% 🟢
Files changed 62.68% 🟢

Module Coverage
aliyun 1.73% 🔴
api 47.09% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 10.2% 🔴
catalog-fileset 80.02% 🟢
catalog-glue 75.36% 🟢
catalog-hive 81.83% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 42.89% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.27% 🟢
catalog-lakehouse-paimon 77.71% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.63% 🟢
common 48.97% 🟢
core 81.41% -0.59% 🟢
filesystem-hadoop3 76.97% 🟢
flink 40.55% 🟢
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 46.14% 🟢
iceberg-common 50.73% 🟢
iceberg-rest-server 66.03% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.89% 🟢
server-common 69.52% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 33.83% 🔴
Files

| Module | File | Coverage |
| --- | --- | --- |
| core | CatalogManager.java | 64.08% 🟢 |
| core | RelationalEntityStore.java | 57.69% 🔴 |

Development

Successfully merging this pull request may close these issues.

[Bug report] Cache invalidation can expose stale metadata under concurrent access
