[#10737] fix(core): Avoid blocking dropCatalog on imported schemas#10738
[#10737] fix(core): Avoid blocking dropCatalog on imported schemas#10738diqiu50 wants to merge 7 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes CatalogManager.dropCatalog(force=false) schema classification so catalogs containing only imported schemas no longer incorrectly fail with NonEmptyCatalogException.
Changes:
- Add an entity-store marker property for schemas created via Gravitino (
gravitino.created=true). - Update
CatalogManager.containsUserCreatedSchemas(...)to treat schemas as user-created only when they are marked in the entity store or have aStringIdentifierembedded in external schema metadata. - Extend
TestCatalogManager.testDropCatalogwith coverage for imported schemas, missing schemas, and classification failures.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| core/src/main/java/org/apache/gravitino/catalog/SchemaOperationDispatcher.java | Marks newly created (unmanaged) schemas in the entity store to support later classification. |
| core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java | Changes the “non-empty catalog” check to ignore imported schemas and fail conservatively on unexpected classification errors. |
| core/src/test/java/org/apache/gravitino/catalog/TestCatalogManager.java | Adds unit coverage for the updated drop-catalog behavior across several scenarios. |
| LOG.warn( | ||
| "Schema {} no longer exists while checking whether it is user-created", | ||
| e.nameIdentifier(), | ||
| ex); |
There was a problem hiding this comment.
This logs at WARN (with stack trace) when loadSchema throws NoSuchSchemaException, but this can happen due to a normal race between listSchemas and loadSchema (and you already filtered by availableSchemaNames). Consider lowering to DEBUG or logging without the exception to avoid noisy logs during catalog drop.
| LOG.warn( | |
| "Schema {} no longer exists while checking whether it is user-created", | |
| e.nameIdentifier(), | |
| ex); | |
| LOG.debug( | |
| "Schema {} no longer exists while checking whether it is user-created", | |
| e.nameIdentifier()); |
Code Coverage Report
Files
|
…assification - Replace stream/lambda with explicit for loop for clarity - Change log level from WARN to DEBUG for missing schema during check - Split testDropCatalog into 4 focused test methods for better isolation
Replace magic string "true" with a named constant SCHEMA_CREATED_BY_GRAVITINO_VALUE for better readability and maintainability.
| if (entityProps != null | ||
| && SchemaOperationDispatcher.SCHEMA_CREATED_BY_GRAVITINO_VALUE.equals( | ||
| entityProps.get(SchemaOperationDispatcher.SCHEMA_CREATED_BY_GRAVITINO))) { | ||
| return true; |
There was a problem hiding this comment.
Since SCHEMA_CREATED_BY_GRAVITINO_VALUE is introduced by this version only, can this apply to older version?
There was a problem hiding this comment.
This is not a problem here, since schemas are not deleted.
| .withNamespace(ident.namespace()) | ||
| .withProperties( | ||
| ImmutableMap.of( | ||
| SCHEMA_CREATED_BY_GRAVITINO, SCHEMA_CREATED_BY_GRAVITINO_VALUE)) |
There was a problem hiding this comment.
By now, the Gravitino entity store will not store properties,so we may need to think twice about this point.
There was a problem hiding this comment.
Good catch. We've reworked the approach to avoid storing anything in the entity store. Instead, containsUserCreatedSchemas now calls loadSchema for each schema and checks StringIdentifier.fromProperties(schema.properties()) — schemas created via Gravitino carry the identifier, imported ones do not.
One edge case: backends that cannot store a StringIdentifier (e.g., MySQL) return null properties, making it impossible to distinguish user-created from imported schemas. In that case, dropCatalog(force=false) has no meaningful semantic, so we suggest logging a warning instead of throwing NonEmptyCatalogException. WDYT?
| public void testDropCatalogIgnoresMissingSchema() throws Exception { | ||
| NameIdentifier ident = NameIdentifier.of("metalake", "test41"); | ||
| Map<String, String> props = | ||
| ImmutableMap.of( | ||
| "provider", | ||
| "test", | ||
| PROPERTY_KEY1, | ||
| "value1", | ||
| PROPERTY_KEY2, | ||
| "value2", | ||
| PROPERTY_KEY5_PREFIX + "1", | ||
| "value3"); | ||
| String comment = "comment"; | ||
|
|
||
| Catalog catalog = | ||
| catalogManager.createCatalog(ident, Catalog.Type.RELATIONAL, provider, comment, props); | ||
| Mockito.doCallRealMethod().when(catalogManager).loadCatalogAndWrap(ident); | ||
| Assertions.assertDoesNotThrow(() -> catalogManager.disableCatalog(ident)); | ||
| CatalogEntity catalogEntity = entityStore.get(ident, EntityType.CATALOG, CatalogEntity.class); | ||
| FieldUtils.writeField(catalog, "entity", catalogEntity, true); | ||
|
|
||
| CatalogManager.CatalogWrapper wrapper = Mockito.mock(CatalogManager.CatalogWrapper.class); | ||
| Capability capability = Mockito.mock(Capability.class); | ||
| CapabilityResult unsupportedResult = CapabilityResult.unsupported("Not managed"); | ||
| Mockito.doReturn(wrapper).when(catalogManager).loadCatalogAndWrap(ident); | ||
| Mockito.doReturn(catalog).when(wrapper).catalog(); | ||
| Mockito.doReturn(capability).when(wrapper).capabilities(); | ||
| Mockito.doReturn(unsupportedResult).when(capability).managedStorage(any()); | ||
| Mockito.doReturn(new NameIdentifier[] {NameIdentifier.of("metalake", "test41", "default")}) | ||
| .doThrow(new NoSuchSchemaException("Schema not found")) | ||
| .when(wrapper) | ||
| .doWithSchemaOps(any()); | ||
|
|
||
| // Schema disappearing between listSchemas and loadSchema should not block drop. | ||
| Assertions.assertTrue(catalogManager.dropCatalog(ident)); | ||
| } |
There was a problem hiding this comment.
testDropCatalogIgnoresMissingSchema doesn’t add any SchemaEntity to the entity store, so dropCatalog() will short-circuit on schemaEntities.isEmpty() and never exercise the intended listSchemas/loadSchema race handling. Add a matching SchemaEntity (e.g., for default) so containsUserCreatedSchemas actually calls loadSchema and hits the NoSuchSchemaException path.
| @Test | ||
| public void testDropCatalogSkipsImportedSchemas() throws Exception { | ||
| NameIdentifier ident = NameIdentifier.of("metalake", "test41"); | ||
| Map<String, String> props = | ||
| ImmutableMap.of( | ||
| "provider", | ||
| "test", | ||
| PROPERTY_KEY1, | ||
| "value1", | ||
| PROPERTY_KEY2, | ||
| "value2", | ||
| PROPERTY_KEY5_PREFIX + "1", | ||
| "value3"); | ||
| String comment = "comment"; | ||
|
|
||
| Catalog catalog = | ||
| catalogManager.createCatalog(ident, Catalog.Type.RELATIONAL, provider, comment, props); | ||
| Mockito.doCallRealMethod().when(catalogManager).loadCatalogAndWrap(ident); | ||
| Assertions.assertDoesNotThrow(() -> catalogManager.disableCatalog(ident)); | ||
| CatalogEntity catalogEntity = entityStore.get(ident, EntityType.CATALOG, CatalogEntity.class); | ||
| FieldUtils.writeField(catalog, "entity", catalogEntity, true); | ||
|
|
There was a problem hiding this comment.
These tests stub catalogManager.loadCatalogAndWrap(ident) on a static Mockito spy, but the @BeforeEach/@AfterEach reset only clears the entity store and doesn’t reset Mockito stubbings. Since multiple tests reuse the same catalog identifier (metalake.test41), stubs can leak across test methods and make ordering matter. Consider either using distinct catalog names per test or resetting the spy in reset() (e.g., Mockito.reset(catalogManager) and then re-spy/re-stub any common behavior).
|
Can you explain more about how you fixed this issue, I'm a little confused. |
|
The root cause is that The fix distinguishes user-created schemas from imported ones: a schema is considered user-created only if it carries a However, as @yuqi1129 pointed out, the entity store does not persist properties, so the marker approach needs to be rethought. |
…r to detect user-created schemas The previous approach stored a gravitino.created marker in SchemaEntity properties to distinguish user-created from imported schemas. This had two problems: entity store properties are not currently used for schema entities (backward-compat gap for existing schemas), and the marker could be lost when an import overwrites an existing entity. Schemas created by Gravitino already embed a StringIdentifier in the external catalog's properties. Use that as the sole signal in containsUserCreatedSchemas, making the check work correctly for both old and new schemas without any migration. Also add a SchemaEntity to testDropCatalogIgnoresMissingSchema so the NoSuchSchemaException race path is actually exercised.
…SQL) conservatively When a schema's properties are null after loadSchema, we cannot tell whether the backend failed to store the StringIdentifier (e.g., MySQL schema comment not supported) or the schema is truly imported. Treat null properties as user-created to avoid accidental data loss on such backends. Only skip a schema when properties are non-null and contain no StringIdentifier, which is the reliable signal of an imported schema on backends that do support identifier storage.
… regression MySQL's JdbcDatabaseOperations.load() returns ImmutableMap.of() (non-null empty map) since it cannot store schema comments or properties. The previous null-only check missed this case, causing user-created MySQL schemas to be misclassified as imported and allowing dropCatalog(force=false) to succeed unexpectedly. Only schemas with non-null, non-empty properties containing no StringIdentifier are reliably identified as imported (on backends like Hive/Iceberg that support property storage). All other cases are treated conservatively as user-created.
What changes were proposed in this pull request?
Fix schema classification in
dropCatalog(force = false)so imported schemas do not block catalog deletion.Why are the changes needed?
Imported schemas can be written into the entity store during metadata synchronization and later be misclassified as user-created schemas.
That makes
dropCatalog(force = false)fail withNonEmptyCatalogExceptioneven though the remaining schema was imported from the external catalog.Fix: #10737
Does this PR introduce any user-facing change?
dropCatalog(force = false)no longer incorrectly fails when only imported schemas remain.How was this patch tested?
TestCatalogManager