perf: optimize array_replace for scalar needle#22387
Conversation
Jefffrey
left a comment
There was a problem hiding this comment.
One observation I have is this is a fast path for if from is a scalar, but it would be likely that to (and max too) might also be scalars in that case 🤔
Should we have the paths just be:
-- scalar fast path
select array_replace(array, scalar, scalar[, scalar]);
-- general fallback
select array_replace(array, scalar, array[, scalar]);
-- or any other combinationThoughts?
| /// uses a single bulk comparison for better performance. | ||
| fn general_replace_with_scalar<O: OffsetSizeTrait>( | ||
| list_array: &GenericListArray<O>, | ||
| needle: &ArrayRef, |
There was a problem hiding this comment.
I think needle should be a Scalar in the arguments here, to make it clear this is the scalar (without needing to read the docstring)
- This can be passed in all the way from
replace_with_scalar_needlefor example, since its still aScalarValueat that point
| capacity, | ||
| ); | ||
|
|
||
| let mut valid = NullBufferBuilder::new(list_array.len()); |
There was a problem hiding this comment.
I don't think we need a builder for nulls, we can copy the input array null buffer as is
| to_array: &ArrayRef, | ||
| arr_n: &[i64], | ||
| ) -> Result<ArrayRef> { | ||
| let mut offsets: Vec<O> = vec![O::usize_as(0)]; |
There was a problem hiding this comment.
Using OffsetBufferBuilder provides a nicer API for doing these operations (can just push length)
|
@Jefffrey Thanks for the review! I agree that to and max are also commonly scalar in practice and have already specialized them. but I think the performance gains here are relatively minor. The other suggestions have already been fixed. |
Which issue does this PR close?
Rationale for this change
Currently,
array_replace/array_replace_n/array_replace_allperform element-wise comparison by invokingcompare_element_to_listagainst each row's sub-array individually. When the needle is a scalar, this can be optimized by performing a single vectorizednot_distinctcomparison over the entire flattened values buffer.What changes are included in this PR?
arrow_ord::cmp::not_distinctwithScalarwrapper for a single bulk comparison pass over the flat values buffer.nvalues for LargeList/FixedSizeList types.Benchmarks
Are these changes tested?
Yes, existing and new slt edge-case tests in
array_replace.slt.Are there any user-facing changes?
No.