[#10739] fix(core): Avoid stale cache refill during concurrent metadata updates#10740
[#10739] fix(core): Avoid stale cache refill during concurrent metadata updates#10740diqiu50 wants to merge 3 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a concurrency race where cache invalidation happening before backend mutations allows concurrent readers to repopulate caches with stale metadata, leaving outdated catalog or relation metadata visible after writes.
Changes:
- Reorders cache invalidation in
CatalogManagerwrite paths to occur after store mutations, and forces updated wrappers into cache after rename. - Reorders cache invalidation in
RelationalEntityStorewrite paths (update/delete/relations) to occur after backend mutations. - Adds new unit tests intended to validate the new invalidation ordering and stale-refill prevention.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java | Moves invalidation after store mutations for alter/drop; uses cache put to prevent stale overwrite during rename. |
| core/src/main/java/org/apache/gravitino/storage/relational/RelationalEntityStore.java | Reorders invalidation after backend writes for entity and relation operations. |
| core/src/test/java/org/apache/gravitino/catalog/TestCatalogManager.java | Adds/updates tests for cache race scenarios around alter/drop. |
| core/src/test/java/org/apache/gravitino/storage/relational/TestRelationalEntityStore.java | Adds tests to verify invalidation happens after backend writes for entity and relation operations. |
Comments suppressed due to low confidence (1)
core/src/main/java/org/apache/gravitino/storage/relational/RelationalEntityStore.java:349
- Destination-side relation cache invalidation uses
srcEntityTypefordestEntitiesToAdd/Remove. For relations where destination entity types differ (e.g., TAG/POLICY vs metadata object), this misses destination cache keys and can leave stale relation results. Derive the destination type fromrelTypeor extend the API to accept destination entity type(s).
for (NameIdentifier destToAdd : destEntitiesToAdd) {
cache.invalidate(destToAdd, srcEntityType, relType);
}
for (NameIdentifier destToRemove : destEntitiesToRemove) {
| Answer<CatalogManager.CatalogWrapper> insertStaleWrapper = | ||
| invocation -> { | ||
| if (staleInserted.compareAndSet(false, true)) { | ||
| catalogManager.getCatalogCache().put(NameIdentifier.of("metalake", "cache_race_test_renamed"), staleWrapper); |
There was a problem hiding this comment.
The long put(...) call exceeds typical Google Java Style line length and also reduces readability in tests. Please wrap the arguments onto multiple lines consistent with the rest of this file’s formatting.
| catalogManager.getCatalogCache().put(NameIdentifier.of("metalake", "cache_race_test_renamed"), staleWrapper); | |
| catalogManager | |
| .getCatalogCache() | |
| .put( | |
| NameIdentifier.of("metalake", "cache_race_test_renamed"), staleWrapper); |
| import java.util.List; | ||
| import java.util.function.Function; | ||
| import org.apache.commons.lang3.reflect.FieldUtils; | ||
| import org.apache.commons.lang3.tuple.Pair; |
There was a problem hiding this comment.
Pair is imported but not used in this test class. This will fail compilation/style checks (Spotless/removeUnusedImports). Remove the unused import.
| import org.apache.commons.lang3.tuple.Pair; |
| .update(eq(ident), eq(Entity.EntityType.CATALOG), any(Function.class)); | ||
|
|
||
| store.update(ident, null, Entity.EntityType.CATALOG, entity -> entity); | ||
|
|
There was a problem hiding this comment.
This call passes null for the Class<E> type parameter, which can prevent generic type inference and fail compilation. Pass the concrete entity class used in this test (e.g., CatalogEntity.class / relevant type) instead of null.
| .invalidate( | ||
| destToAdd, Entity.EntityType.TABLE, SupportsRelationOperations.Type.TAG_REL); | ||
| Mockito.verify(cache, Mockito.never()) | ||
| .invalidate( | ||
| destToRemove, |
There was a problem hiding this comment.
The destination identifiers here look like tags, but the test expectations use EntityType.TABLE for destination-side invalidation. For tag relations (e.g., TAG_METADATA_OBJECT_REL), destination cache keys should be invalidated with EntityType.TAG (consistent with how insertRelation invalidates using dstType), otherwise destination relation caches can remain stale.
| .insertRelation( | ||
| SupportsRelationOperations.Type.TAG_REL, | ||
| src, |
There was a problem hiding this comment.
SupportsRelationOperations.Type does not define TAG_REL (the enum includes TAG_METADATA_OBJECT_REL, etc.), so this test will not compile. Replace TAG_REL with the correct relation type constant used by the implementation.
| try { | ||
| boolean deleted = backend.delete(ident, entityType, cascade); | ||
| cache.invalidate(ident, entityType); | ||
| return backend.delete(ident, entityType, cascade); | ||
| return deleted; | ||
| } catch (NoSuchEntityException e) { |
There was a problem hiding this comment.
If backend.delete(...) throws NoSuchEntityException, this method returns false without invalidating the cache entry. That can leave a stale cached entity even though the backend reports it missing. Invalidate the cache in the exception path as well (or use a finally).
| Mockito.doAnswer(insertStaleWrapper) | ||
| .when(catalogManager) | ||
| .createCatalogWrapper(any(CatalogEntity.class), eq(null)); | ||
| Mockito.doReturn(freshWrapper) | ||
| .when(catalogManager) |
There was a problem hiding this comment.
This test stubs CatalogManager#createCatalogWrapper(...), but the method is private in CatalogManager, so this code will not compile (and cannot be stubbed with plain Mockito). Introduce a non-private seam for wrapper creation (e.g., injectable factory) or adjust the test to stub public/protected methods instead.
| Mockito.doReturn(freshWrapper) | ||
| .when(catalogManager) | ||
| .createCatalogWrapper(any(CatalogEntity.class), eq(null)); |
There was a problem hiding this comment.
The doReturn(freshWrapper) stubbing here uses the same matchers as the preceding doAnswer(...), which overrides the earlier stub. As a result, the intended race simulation never executes. Use sequential stubbing (doAnswer(...).doReturn(...)) or consolidate into a single Answer.
| } | ||
| return null; | ||
| }; |
There was a problem hiding this comment.
This Answer returns null, which would cause an NPE if the stub were actually used as the return value of createCatalogWrapper(...). Ensure the stub returns a valid CatalogWrapper (and trigger the stale-cache insertion as a side effect if needed).
…on PR - Replace TAG_REL with TAG_METADATA_OBJECT_REL in TestRelationalEntityStore - Make createCatalogWrapper package-private to allow Mockito spy stubbing - Use catalog() method instead of direct field access in alterCatalog - Fix TestCatalogManager: use BaseCatalog mock, fix double-stub override, restore real method after test to prevent stub leaking across tests - Fix delete() to invalidate cache in finally block on NoSuchEntityException
Code Coverage Report
Files
|
What changes were proposed in this pull request?
Reorder cache invalidation in catalog and relational entity store write paths so concurrent reads do not refill cache with stale metadata.
Why are the changes needed?
Several write paths invalidate cache before the backend mutation completes.
Under concurrent access, another thread can miss the cache, read old data from the backend, and write that stale result back into cache. This can leave outdated catalog or relation metadata visible after rename, drop, or relation updates.
Fix: #10739
Does this PR introduce any user-facing change?
No.
How was this patch tested?
TestCatalogManagerTestRelationalEntityStore