[#10737] fix(core): Avoid blocking dropCatalog on imported schemas#10738

Open
diqiu50 wants to merge 7 commits into apache:main from diqiu50:upstream/drop-catalog-imported-schema

@diqiu50 diqiu50 commented Apr 10, 2026

What changes were proposed in this pull request?

Fix schema classification in dropCatalog(force = false) so imported schemas do not block catalog deletion.

Why are the changes needed?

Imported schemas can be written into the entity store during metadata synchronization and later be misclassified as user-created schemas.

That makes dropCatalog(force = false) fail with NonEmptyCatalogException even though the remaining schema was imported from the external catalog.

Fix: #10737

Does this PR introduce any user-facing change?

dropCatalog(force = false) no longer incorrectly fails when only imported schemas remain.

How was this patch tested?

  • added unit coverage in TestCatalogManager

Copilot AI review requested due to automatic review settings April 10, 2026 10:05
@diqiu50 diqiu50 self-assigned this Apr 10, 2026
Copilot AI left a comment

Pull request overview

Fixes CatalogManager.dropCatalog(force=false) schema classification so catalogs containing only imported schemas no longer incorrectly fail with NonEmptyCatalogException.

Changes:

  • Add an entity-store marker property for schemas created via Gravitino (gravitino.created=true).
  • Update CatalogManager.containsUserCreatedSchemas(...) to treat schemas as user-created only when they are marked in the entity store or have a StringIdentifier embedded in external schema metadata.
  • Extend TestCatalogManager.testDropCatalog with coverage for imported schemas, missing schemas, and classification failures.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| core/src/main/java/org/apache/gravitino/catalog/SchemaOperationDispatcher.java | Marks newly created (unmanaged) schemas in the entity store to support later classification. |
| core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java | Changes the “non-empty catalog” check to ignore imported schemas and fail conservatively on unexpected classification errors. |
| core/src/test/java/org/apache/gravitino/catalog/TestCatalogManager.java | Adds unit coverage for the updated drop-catalog behavior across several scenarios. |

Comment thread core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java Outdated
Comment on lines +902 to +905
LOG.warn(
    "Schema {} no longer exists while checking whether it is user-created",
    e.nameIdentifier(),
    ex);
Copilot AI Apr 10, 2026

This logs at WARN (with stack trace) when loadSchema throws NoSuchSchemaException, but this can happen due to a normal race between listSchemas and loadSchema (and you already filtered by availableSchemaNames). Consider lowering to DEBUG or logging without the exception to avoid noisy logs during catalog drop.

Suggested change
- LOG.warn(
-     "Schema {} no longer exists while checking whether it is user-created",
-     e.nameIdentifier(),
-     ex);
+ LOG.debug(
+     "Schema {} no longer exists while checking whether it is user-created",
+     e.nameIdentifier());
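
As a rough illustration of the race described in this comment, the following sketch skips a schema that disappears between listing and loading and logs it at a low level without a stack trace. All names here (MissingSchemaRaceSketch, SchemaGoneException, anyUserCreated) are hypothetical stand-ins and not Gravitino's actual API; java.util.logging is used only to keep the example self-contained.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative only: every name in this class is hypothetical, not Gravitino code. */
final class MissingSchemaRaceSketch {
  private static final Logger LOG = Logger.getLogger(MissingSchemaRaceSketch.class.getName());

  /** Stand-in for NoSuchSchemaException. */
  static final class SchemaGoneException extends RuntimeException {
    SchemaGoneException(String message) {
      super(message);
    }
  }

  /**
   * Returns true if any listed schema is classified as user-created. The predicate plays the
   * role of "load the schema, then classify it" and may throw SchemaGoneException when the
   * schema was dropped after it was listed.
   */
  static boolean anyUserCreated(List<String> listedSchemas, Predicate<String> loadAndClassify) {
    for (String name : listedSchemas) {
      try {
        if (loadAndClassify.test(name)) {
          return true;
        }
      } catch (SchemaGoneException ex) {
        // Expected during a list/load race: log quietly, omit the stack trace, keep going.
        LOG.log(
            Level.FINE,
            "Schema {0} no longer exists while checking whether it is user-created",
            name);
      }
    }
    return false;
  }
}
```

With this shape, a vanished schema neither blocks the drop nor produces a noisy log entry, while any other exception still propagates to the caller.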

Comment thread core/src/test/java/org/apache/gravitino/catalog/TestCatalogManager.java Outdated

github-actions bot commented Apr 10, 2026

Code Coverage Report

Overall Project 65.17% +0.02% 🟢
Files changed 67.9% 🟢

Module Coverage
aliyun 1.73% 🔴
api 47.09% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 10.2% 🔴
catalog-fileset 80.02% 🟢
catalog-glue 75.36% 🟢
catalog-hive 81.83% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 42.89% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.27% 🟢
catalog-lakehouse-paimon 77.71% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.63% 🟢
common 48.97% 🟢
core 81.47% -0.34% 🟢
filesystem-hadoop3 76.97% 🟢
flink 40.55% 🟢
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 46.14% 🟢
iceberg-common 50.73% 🟢
iceberg-rest-server 65.93% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.75% 🟢
server-common 69.52% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 33.83% 🔴
Files
Module File Coverage
core CatalogManager.java 67.9% 🟢

diqiu50 added 2 commits April 13, 2026 15:44
…assification

- Replace stream/lambda with explicit for loop for clarity
- Change log level from WARN to DEBUG for missing schema during check
- Split testDropCatalog into 4 focused test methods for better isolation
Replace magic string "true" with a named constant SCHEMA_CREATED_BY_GRAVITINO_VALUE
for better readability and maintainability.
@diqiu50 diqiu50 requested a review from yuqi1129 April 13, 2026 09:12
@yuqi1129 yuqi1129 requested a review from roryqi April 13, 2026 09:49
if (entityProps != null
    && SchemaOperationDispatcher.SCHEMA_CREATED_BY_GRAVITINO_VALUE.equals(
        entityProps.get(SchemaOperationDispatcher.SCHEMA_CREATED_BY_GRAVITINO))) {
  return true;
Contributor

Since SCHEMA_CREATED_BY_GRAVITINO_VALUE is only introduced in this version, does this check also work for schemas created by older versions?

Contributor Author

This is not a problem here, since schemas are not deleted.

.withNamespace(ident.namespace())
.withProperties(
ImmutableMap.of(
SCHEMA_CREATED_BY_GRAVITINO, SCHEMA_CREATED_BY_GRAVITINO_VALUE))
Contributor

Currently, the Gravitino entity store does not store properties, so we may need to think twice about this approach.

Contributor Author

Good catch. We've reworked the approach to avoid storing anything in the entity store. Instead, containsUserCreatedSchemas now calls loadSchema for each schema and checks StringIdentifier.fromProperties(schema.properties()) — schemas created via Gravitino carry the identifier, imported ones do not.

One edge case: backends that cannot store a StringIdentifier (e.g., MySQL) return null properties, making it impossible to distinguish user-created from imported schemas. In that case, dropCatalog(force=false) has no meaningful semantics, so we suggest logging a warning instead of throwing NonEmptyCatalogException. WDYT?
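
To make the reworked check concrete, here is a minimal sketch under stated assumptions. The real code calls StringIdentifier.fromProperties(schema.properties()); the sketch replaces that with a plain key lookup, and both the class name and the key string "gravitino.identifier" are assumptions for illustration rather than confirmed details of the patch.

```java
import java.util.Map;

/** Illustrative only: a simplified stand-in for the StringIdentifier-based check. */
final class IdentifierCheckSketch {
  // Assumed property key; the real key is whatever Gravitino's StringIdentifier defines.
  static final String ASSUMED_ID_KEY = "gravitino.identifier";

  /** Returns true when the loaded schema properties carry an embedded identifier. */
  static boolean carriesIdentifier(Map<String, String> schemaProperties) {
    return schemaProperties != null && schemaProperties.containsKey(ASSUMED_ID_KEY);
  }
}
```

Note that a backend returning no properties makes carriesIdentifier return false for every schema, which is exactly the ambiguity raised in the edge case above: the result alone cannot tell an imported schema from one whose identifier could not be stored.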

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines +649 to +684
public void testDropCatalogIgnoresMissingSchema() throws Exception {
  NameIdentifier ident = NameIdentifier.of("metalake", "test41");
  Map<String, String> props =
      ImmutableMap.of(
          "provider",
          "test",
          PROPERTY_KEY1,
          "value1",
          PROPERTY_KEY2,
          "value2",
          PROPERTY_KEY5_PREFIX + "1",
          "value3");
  String comment = "comment";

  Catalog catalog =
      catalogManager.createCatalog(ident, Catalog.Type.RELATIONAL, provider, comment, props);
  Mockito.doCallRealMethod().when(catalogManager).loadCatalogAndWrap(ident);
  Assertions.assertDoesNotThrow(() -> catalogManager.disableCatalog(ident));
  CatalogEntity catalogEntity = entityStore.get(ident, EntityType.CATALOG, CatalogEntity.class);
  FieldUtils.writeField(catalog, "entity", catalogEntity, true);

  CatalogManager.CatalogWrapper wrapper = Mockito.mock(CatalogManager.CatalogWrapper.class);
  Capability capability = Mockito.mock(Capability.class);
  CapabilityResult unsupportedResult = CapabilityResult.unsupported("Not managed");
  Mockito.doReturn(wrapper).when(catalogManager).loadCatalogAndWrap(ident);
  Mockito.doReturn(catalog).when(wrapper).catalog();
  Mockito.doReturn(capability).when(wrapper).capabilities();
  Mockito.doReturn(unsupportedResult).when(capability).managedStorage(any());
  Mockito.doReturn(new NameIdentifier[] {NameIdentifier.of("metalake", "test41", "default")})
      .doThrow(new NoSuchSchemaException("Schema not found"))
      .when(wrapper)
      .doWithSchemaOps(any());

  // Schema disappearing between listSchemas and loadSchema should not block drop.
  Assertions.assertTrue(catalogManager.dropCatalog(ident));
}
Copilot AI Apr 16, 2026

testDropCatalogIgnoresMissingSchema doesn’t add any SchemaEntity to the entity store, so dropCatalog() will short-circuit on schemaEntities.isEmpty() and never exercise the intended listSchemas/loadSchema race handling. Add a matching SchemaEntity (e.g., for default) so containsUserCreatedSchemas actually calls loadSchema and hits the NoSuchSchemaException path.
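
The short-circuit this comment refers to can be pictured with the sketch below. It is a simplified model, not the actual dropCatalog implementation: canDropWithoutForce and its parameters are hypothetical names, with the predicate standing in for the loadSchema-based classification.

```java
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: a simplified model of the early return on an empty entity store. */
final class EmptyStoreShortCircuitSketch {
  /**
   * Returns true when a non-forced drop may proceed. The predicate stands in for the per-schema
   * classification and is never invoked when the entity store holds no schema entities.
   */
  static boolean canDropWithoutForce(List<String> schemaEntities, Predicate<String> isUserCreated) {
    if (schemaEntities.isEmpty()) {
      // Early return: the classification path below is not reached at all.
      return true;
    }
    for (String schema : schemaEntities) {
      if (isUserCreated.test(schema)) {
        return false;
      }
    }
    return true;
  }
}
```

A test that passes an empty entity list therefore succeeds without ever touching the classification path, which is why a matching entity has to exist in the store before the race handling can be exercised.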

Comment on lines +594 to +615
@Test
public void testDropCatalogSkipsImportedSchemas() throws Exception {
  NameIdentifier ident = NameIdentifier.of("metalake", "test41");
  Map<String, String> props =
      ImmutableMap.of(
          "provider",
          "test",
          PROPERTY_KEY1,
          "value1",
          PROPERTY_KEY2,
          "value2",
          PROPERTY_KEY5_PREFIX + "1",
          "value3");
  String comment = "comment";

  Catalog catalog =
      catalogManager.createCatalog(ident, Catalog.Type.RELATIONAL, provider, comment, props);
  Mockito.doCallRealMethod().when(catalogManager).loadCatalogAndWrap(ident);
  Assertions.assertDoesNotThrow(() -> catalogManager.disableCatalog(ident));
  CatalogEntity catalogEntity = entityStore.get(ident, EntityType.CATALOG, CatalogEntity.class);
  FieldUtils.writeField(catalog, "entity", catalogEntity, true);

Copilot AI Apr 16, 2026

These tests stub catalogManager.loadCatalogAndWrap(ident) on a static Mockito spy, but the @BeforeEach/@AfterEach reset only clears the entity store and doesn’t reset Mockito stubbings. Since multiple tests reuse the same catalog identifier (metalake.test41), stubs can leak across test methods and make ordering matter. Consider either using distinct catalog names per test or resetting the spy in reset() (e.g., Mockito.reset(catalogManager) and then re-spy/re-stub any common behavior).

@jerryshao
Contributor

Can you explain more about how you fixed this issue? I'm a little confused.

diqiu50 commented Apr 16, 2026

The root cause is that dropCatalog(force=false) checks whether any schemas exist in the entity store, but imported schemas (synced from external catalogs via search listener load) are also written to the entity store — so they were mistakenly treated as user-created schemas, causing a false NonEmptyCatalogException.

The fix distinguishes user-created schemas from imported ones: a schema is considered user-created only if it carries a StringIdentifier in its properties (written by Gravitino when the user creates it via API) or is marked with gravitino.created=true in the entity store.

However, as @yuqi1129 pointed out, the entity store does not persist properties, so the marker approach needs to be rethought.

diqiu50 added 3 commits April 16, 2026 22:51
…r to detect user-created schemas

The previous approach stored a gravitino.created marker in SchemaEntity
properties to distinguish user-created from imported schemas. This had two
problems: entity store properties are not currently used for schema entities
(backward-compat gap for existing schemas), and the marker could be lost
when an import overwrites an existing entity.

Schemas created by Gravitino already embed a StringIdentifier in the
external catalog's properties. Use that as the sole signal in
containsUserCreatedSchemas, making the check work correctly for both
old and new schemas without any migration.

Also add a SchemaEntity to testDropCatalogIgnoresMissingSchema so the
NoSuchSchemaException race path is actually exercised.
…SQL) conservatively

When a schema's properties are null after loadSchema, we cannot tell whether
the backend failed to store the StringIdentifier (e.g., MySQL schema comment
not supported) or the schema is truly imported. Treat null properties as
user-created to avoid accidental data loss on such backends.

Only skip a schema when properties are non-null and contain no StringIdentifier,
which is the reliable signal of an imported schema on backends that do support
identifier storage.
… regression

MySQL's JdbcDatabaseOperations.load() returns ImmutableMap.of() (non-null
empty map) since it cannot store schema comments or properties. The previous
null-only check missed this case, causing user-created MySQL schemas to be
misclassified as imported and allowing dropCatalog(force=false) to succeed
unexpectedly.

Only schemas with non-null, non-empty properties containing no StringIdentifier
are reliably identified as imported (on backends like Hive/Iceberg that support
property storage). All other cases are treated conservatively as user-created.
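
The rule stated in the last commit message can be sketched as a small three-case classification. This is an illustration under assumptions, not the code from the patch: the class, the enum, and the key string "gravitino.identifier" are hypothetical, and the real check goes through StringIdentifier rather than a raw key lookup.

```java
import java.util.Map;

/** Illustrative only: names are hypothetical and do not come from the patch. */
final class ConservativeClassificationSketch {
  // Assumed stand-in for the property key under which a StringIdentifier is stored.
  static final String ASSUMED_ID_KEY = "gravitino.identifier";

  enum Origin {
    USER_CREATED,
    IMPORTED
  }

  /** Classifies a schema from the properties returned when it is loaded. */
  static Origin classify(Map<String, String> loadedProperties) {
    if (loadedProperties == null || loadedProperties.isEmpty()) {
      // The backend may be unable to store an identifier, so nothing can be concluded.
      // Stay conservative and treat the schema as user-created.
      return Origin.USER_CREATED;
    }
    // Non-empty properties: a missing identifier is taken as the signal of an import.
    return loadedProperties.containsKey(ASSUMED_ID_KEY) ? Origin.USER_CREATED : Origin.IMPORTED;
  }
}
```

Only the IMPORTED result lets dropCatalog(force=false) ignore a schema; both ambiguous cases keep blocking the drop, which trades some false NonEmptyCatalogException results on property-less backends for not dropping a catalog that still holds user-created schemas.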

Development

Successfully merging this pull request may close these issues.

[Bug report] DropCatalog misclassifies imported schemas

4 participants