You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
#4531 (closing #4526) fixed deep And / Or chains overflowing protobuf's default recursion limit (100) when the serialized plan is re-parsed, on the JVM via OperatorOuterClass.Operator.parseFrom (e.g. findShuffleScanIndices / explain) and in the Rust prost decoder. The fix flattens an associative chain (QueryPlanSerde.flattenAssociative) and rebuilds it as a balanced O(log n)-depth tree (QueryPlanSerde.createBalancedBinaryExpr), routing CometAnd / CometOr through it.
As @mbutrovich noted in review, that fix only covers the boolean case. The protobuf recursion limit applies to any deeply nested BinaryExpr, so a long left-deep chain of other associative binary operators hits the same overflow.
Proposal
Extend the rebalancing introduced in #4531 to other associative binary expressions, for example Add, Multiply, and the bitwise operators (BitwiseAnd, BitwiseOr, BitwiseXor). The flattenAssociative / createBalancedBinaryExpr helpers are already generic over the wrap function and the match/children predicates, so wiring up additional operators should be mechanical.
Considerations
Only rebalance operators that are genuinely associative in Comet's evaluation semantics. Integer Add / Multiply and the bitwise operators are associative. Floating-point Add / Multiply are not exactly associative, so reassociating them can change results; decide explicitly whether to rebalance those (and whether it matters given Spark itself reassociates) or restrict rebalancing to integral types.
Background
#4531 (closing #4526) fixed deep
And/Orchains overflowing protobuf's default recursion limit (100) when the serialized plan is re-parsed, on the JVM viaOperatorOuterClass.Operator.parseFrom(e.g.findShuffleScanIndices/ explain) and in the Rustprostdecoder. The fix flattens an associative chain (QueryPlanSerde.flattenAssociative) and rebuilds it as a balancedO(log n)-depth tree (QueryPlanSerde.createBalancedBinaryExpr), routingCometAnd/CometOrthrough it.As @mbutrovich noted in review, that fix only covers the boolean case. The protobuf recursion limit applies to any deeply nested
BinaryExpr, so a long left-deep chain of other associative binary operators hits the same overflow.Proposal
Extend the rebalancing introduced in #4531 to other associative binary expressions, for example
Add,Multiply, and the bitwise operators (BitwiseAnd,BitwiseOr,BitwiseXor). TheflattenAssociative/createBalancedBinaryExprhelpers are already generic over the wrap function and the match/children predicates, so wiring up additional operators should be mechanical.Considerations
Add/Multiplyand the bitwise operators are associative. Floating-pointAdd/Multiplyare not exactly associative, so reassociating them can change results; decide explicitly whether to rebalance those (and whether it matters given Spark itself reassociates) or restrict rebalancing to integral types.Relationship to #4531 / #4526
Follow-up to the boolean-only fix in #4531.