[FLINK-39421] Metadata filter push-down for table sources#27913
[FLINK-39421] Metadata filter push-down for table sources#27913twalthr merged 1 commit intoapache:masterfrom
Conversation
2b6f73d to
f96af04
Compare
| String.format( | ||
| "%s does not support SupportsReadingMetadata.", | ||
| tableSource.getClass().getName())); | ||
| } |
There was a problem hiding this comment.
we should also check whether the source still returns supportsMetadataFilterPushDown = true.
| Option.apply( | ||
| context.getTypeFactory().buildRelNodeRowType(metadataKeyRowType))); | ||
|
|
||
| List<Expression> filters = |
There was a problem hiding this comment.
to avoid code deduplication, feel free to create a package visible helper method in FilterPushDownSpec and reuse it here.
There was a problem hiding this comment.
Done. Added resolvePredicates.
| return new Tuple2<>(result, newTableSourceTable); | ||
| } | ||
|
|
||
| /** Replaces SQL alias names with metadata key names in the RowType. */ |
There was a problem hiding this comment.
Is there a change that fields can collide?
CREATE TABLE t (offset INT, msg_offset INT METADATA FROM 'offset')
There was a problem hiding this comment.
We should most likely only store metadata columns in this row, not combined with physical.
There was a problem hiding this comment.
Good catch. I've addressed this and added a test for it.
a4306b8 to
772158d
Compare
|
@twalthr thanks for the review. I used two separate force pushes to rebase on the latest master and then address comments in https://github.com/apache/flink/compare/a4306b81c155ef4793cacbfe843ea641e0b3b060..772158d6d27dfb0e09d402379ac6cab64d40d0fe. That diff should make reviewing the changes easier (if it helps). |
| PlannerMocks plannerMocks = PlannerMocks.create(); | ||
| SerdeContext serdeCtx = | ||
| configuredSerdeContext( | ||
| plannerMocks.getCatalogManager(), plannerMocks.getTableConfig()); | ||
|
|
||
| String json = toJson(serdeCtx, original); | ||
| MetadataFilterPushDownSpec deserialized = | ||
| toObject(serdeCtx, json, MetadataFilterPushDownSpec.class); |
There was a problem hiding this comment.
to avoid code deduplication, add your test case to the existing testDynamicTableSinkSpecSerde list of serde items
There was a problem hiding this comment.
Ok, doing that. In the process, I've extended the TestValues source to make that possible.
ec8b483 to
d5e344e
Compare
Add a dedicated metadata filter push-down path through SupportsReadingMetadata. Metadata predicates are classified separately from physical predicates and pushed via applyMetadataFilters() with a dedicated MetadataFilterResult type. MetadataFilterPushDownSpec stores a metadata-only predicateRowType whose field names are metadata keys (not SQL aliases). Physical columns are not included, which avoids name collisions with metadata keys (e.g. `offset INT, msg_offset INT METADATA FROM 'offset'`). Predicate RexInputRefs are remapped to index into the metadata-only row during rule application, so the serialized spec is self-contained and does not need a column-to-key map. A package-private FilterPushDownSpec.resolvePredicates() helper handles the RexNode -> ResolvedExpression conversion shared between the physical and metadata paths. On plan restoration, applyMetadataFilters() re-verifies both that the source is still a SupportsReadingMetadata AND that it still reports supportsMetadataFilterPushDown() = true. Tests assert on the exact ResolvedExpression list toString (matching the convention in DeletePushDownUtilsTest), including a testPhysicalAndMetadataNameCollision case that covers the physical-vs-metadata-key field name collision scenario. Generated-by: Claude Code (claude-opus-4-6) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d5e344e to
eb1b6fc
Compare
Add dedicated metadata filter push-down path through SupportsReadingMetadata. Metadata predicates are classified separately from physical predicates and pushed via applyMetadataFilters() with a dedicated MetadataFilterResult type.
MetadataFilterPushDownSpec stores a metadata-only predicateRowType whose field names are metadata keys (not SQL aliases). Physical columns are not included, which avoids name collisions with metadata keys (e.g.
offset INT, msg_offset INT METADATA FROM 'offset'). Predicate RexInputRefs are remapped to index into the metadata-only row during rule application, so the serialized spec is self-contained and does not need a column-to-key map.Generated-by: Claude Code
What is the purpose of the change
Predicates on metadata columns (e.g., Kafka
offset,timestamp,partition) cannot be pushed through the existingSupportsFilterPushDownpath becauseFilterPushDownSpec's serializedRexInputRefindices break during compiled plan restoration whenProjectPushDownSpecnarrows the row type. This PR adds a dedicated metadata filter push-down path that solves the serialization problem and enables metadata-aware source optimizations. See FLIP-574 for the full design.Brief change log
SupportsReadingMetadatawithsupportsMetadataFilterPushDown()andapplyMetadataFilters()default methods and aMetadataFilterResulttypeMetadataFilterPushDownSpecthat stores a metadata-only predicate row type (field names are metadata keys) and re-verifiessupportsMetadataFilterPushDown()on plan restorePushFilterIntoTableSourceScanRuleto classify predicates as physical, metadata, or mixed, with a two-path push-down flowPushFilterIntoSourceScanRuleBasebuilds a metadata-only row type andRexShuttle-remapsRexInputRefindices so the stored predicates reference only metadata columns — avoids field-name collisions with physical columns of the same nameFilterPushDownSpec.resolvePredicates()as a package-private helper shared between the physical and metadata pathsVerifying this change
This change added tests and can be verified as follows:
MetadataFilterInReadingMetadataTestwith 7 planner rule tests: basic push-down, opt-out when unsupported, aliased metadata keys, mixed physical+metadata separation, partial acceptance, interaction with projection push-down, and physical-vs-metadata field name collisionMetadataFilterPushDownSpecentry to the existing parameterizedtestDynamicTableSinkSpecSerdelist inDynamicTableSourceSpecSerdeTest(spec3) to verify serde round-trip and full source applyenable-metadata-filter-push-downoption toTestValuesTableFactory(mirrorsenable-watermark-push-down) to gate the new capability in testsDoes this pull request potentially affect one of the following parts:
@Public(Evolving): yes — new default methods on@PublicEvolvinginterfaceSupportsReadingMetadataDocumentation