perf: optimize array_remove for scalar needle#22390
Conversation
neilconway
left a comment
There was a problem hiding this comment.
Nice performance win!
| ); | ||
| } | ||
| }; | ||
| let original_data = list_array.values().to_data(); |
There was a problem hiding this comment.
This will be inefficient for sliced arrays.
There was a problem hiding this comment.
I now slice the values to the range actually referenced by the offsets.
That said, I wanted to understand your concern better: when a GenericListArray is sliced, values() returns the full underlying array, and to_data() on it wraps the existing buffer references into ArrayData without copying. So the main downside I could identify is that Capacities::Array(original_data.len()) over-estimates the pre-allocation for sliced inputs. Were you thinking of a different inefficiency, or is the over-allocation what you had in mind?
There was a problem hiding this comment.
The over-allocation was one part, but the bigger concern is calling the distinct kernel on the entire values buffer (see other comment).
| let list_array = array.as_list::<i64>(); | ||
| general_remove_with_scalar::<i64>(list_array, needle, arr_n) | ||
| } | ||
| array_type => exec_err!("array_remove does not support type '{array_type}'."), |
There was a problem hiding this comment.
This is called by more than just array_remove; can we improve the error message?
| for (i, keep) in eq_array.iter().enumerate() { | ||
| if keep == Some(false) && removed < max_removals { | ||
| if let Some(bs) = pending_batch_to_retain { | ||
| mutable.extend(0, start + bs, start + i); | ||
| copied += i - bs; | ||
| pending_batch_to_retain = None; | ||
| } | ||
| removed += 1; | ||
| } else if pending_batch_to_retain.is_none() { | ||
| pending_batch_to_retain = Some(i); | ||
| } | ||
| } |
There was a problem hiding this comment.
I wonder if it would be possible to iterate only over the "false" bits, e.g., by negating the buffer and looking at BooleanBuffer::set_indices.
There was a problem hiding this comment.
Great suggestion. Benchmarks show a ~20–40% improvement with this optimization.
| let mut copied = 0usize; | ||
| let mut pending_batch_to_retain: Option<usize> = None; | ||
| for (i, keep) in eq_array.iter().enumerate() { | ||
| if keep == Some(false) && removed < max_removals { |
There was a problem hiding this comment.
Can we break from the loop once we hit max_removals?
There was a problem hiding this comment.
Good point. now break early once max_removals is reached.
| // Iterate only over the positions that need removal using set_indices, | ||
| // which is more efficient than scanning every bit. |
There was a problem hiding this comment.
Might be worth elaborating that the win here is mostly because we expect the # of values-to-remove is a lot smaller than the total array size, which it usually (but not always) will be.
| let keep_mask = | ||
| arrow_ord::cmp::distinct(list_array.values(), &Scalar::new(Arc::clone(needle)))?; |
There was a problem hiding this comment.
This will call the distinct kernel on all the elements in the value buffer, not just the ones that are visible in a sliced array.
| ); | ||
| } | ||
| }; | ||
| let original_data = list_array.values().to_data(); |
There was a problem hiding this comment.
The over-allocation was one part, but the bigger concern is calling the distinct kernel on the entire values buffer (see other comment).
Which issue does this PR close?
Rationale for this change
Similar to #22387 (array_replace scalar optimization)
array_remove/array_remove_n/array_remove_allperform element-wise comparison by invokingcompare_element_to_listagainst each row's sub-array individually. When the needle is a scalar, this can be optimized by performing a single vectorizeddistinctcomparison over the entire flattened values buffer.What changes are included in this PR?
general_remove_with_scalar) that usesarrow_ord::cmp::distinctwithScalarwrapper for a single bulk comparison pass over the flat values buffer.nvalues, and LargeList type coverage.Benchmarks
Are these changes tested?
Yes, existing and new SLT edge-case tests in
array_remove.slt.Are there any user-facing changes?
No.