Skip to content

Rebalance deep associative binary expression chains (Add, Multiply, bitwise) to avoid protobuf recursion limit #4577

@andygrove

Description

@andygrove

Background

#4531 (closing #4526) fixed deep And / Or chains overflowing protobuf's default recursion limit (100) when the serialized plan is re-parsed, on the JVM via OperatorOuterClass.Operator.parseFrom (e.g. findShuffleScanIndices / explain) and in the Rust prost decoder. The fix flattens an associative chain (QueryPlanSerde.flattenAssociative) and rebuilds it as a balanced O(log n)-depth tree (QueryPlanSerde.createBalancedBinaryExpr), routing CometAnd / CometOr through it.

As @mbutrovich noted in review, that fix only covers the boolean case. The protobuf recursion limit applies to any deeply nested BinaryExpr, so a long left-deep chain of other associative binary operators hits the same overflow.

Proposal

Extend the rebalancing introduced in #4531 to other associative binary expressions, for example Add, Multiply, and the bitwise operators (BitwiseAnd, BitwiseOr, BitwiseXor). The flattenAssociative / createBalancedBinaryExpr helpers are already generic over the wrap function and the match/children predicates, so wiring up additional operators should be mechanical.

Considerations

  • Only rebalance operators that are genuinely associative in Comet's evaluation semantics. Integer Add / Multiply and the bitwise operators are associative. Floating-point Add / Multiply are not exactly associative, so reassociating them can change results; decide explicitly whether to rebalance those (and whether it matters given Spark itself reassociates) or restrict rebalancing to integral types.
  • Mirror the existing test approach from fix: rebalance deep AND/OR chains to avoid protobuf recursion limit #4531: project a chain deeper than the recursion limit (e.g. 200) and assert Comet executes it natively with correct results.

Relationship to #4531 / #4526

Follow-up to the boolean-only fix in #4531.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions