perf: optimize array_replace for scalar needle #22387

Open
lyne7-sc wants to merge 6 commits into apache:main from lyne7-sc:perf/replace

@lyne7-sc

Which issue does this PR close?

  • Closes #.

Rationale for this change

Currently, array_replace / array_replace_n / array_replace_all perform element-wise comparison by invoking compare_element_to_list against each row's sub-array individually. When the needle is a scalar, this can be optimized by performing a single vectorized not_distinct comparison over the entire flattened values buffer.

What changes are included in this PR?

  • Add a specialized replacement kernel that uses arrow_ord::cmp::not_distinct with a Scalar wrapper to make a single bulk comparison pass over the flat values buffer.
  • Extend SLT tests with multi-row scalar-argument coverage, empty-array edge cases, NULL needle replacement, and boundary n values for LargeList/FixedSizeList types.
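
The bulk comparison in the first bullet can be modeled in plain Rust. This is only a rough sketch under stated assumptions, not the PR's kernel: the list column is represented as a flat `Vec<Option<i64>>` plus row offsets instead of Arrow arrays, `None == None` counts as a match to mirror not_distinct semantics, and the names `not_distinct_mask` and `replace_with_scalar` are hypothetical.

```rust
/// Null-safe equality over the whole flat buffer in one pass:
/// `None` matches `None`, mirroring "is not distinct from" semantics.
fn not_distinct_mask(values: &[Option<i64>], needle: Option<i64>) -> Vec<bool> {
    values.iter().map(|v| *v == needle).collect()
}

/// Replace up to `max` matches per row (`usize::MAX` models array_replace_all).
/// `offsets[i]..offsets[i + 1]` delimits row `i` inside `values`; the offsets
/// are assumed to cover the whole buffer.
fn replace_with_scalar(
    values: &[Option<i64>],
    offsets: &[usize],
    needle: Option<i64>,
    to: Option<i64>,
    max: usize,
) -> Vec<Option<i64>> {
    // One comparison pass over all rows at once, instead of one per row.
    let mask = not_distinct_mask(values, needle);
    let mut out = Vec::with_capacity(values.len());
    for row in offsets.windows(2) {
        let mut replaced = 0;
        for i in row[0]..row[1] {
            if mask[i] && replaced < max {
                out.push(to);
                replaced += 1;
            } else {
                out.push(values[i]);
            }
        }
    }
    out
}
```

Because each match is swapped for exactly one value in this model, the row offsets of the output equal those of the input; only the per-row replacement counter needs the row boundaries.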

Benchmarks

group                                                                      baseline                                optimized
-----                                                                      --------                                ---------
array_replace_all_int64/replace/list size: 10, num_rows: 4000              5.04  1124.5±146.98µs        ? ?/sec    1.00    223.1±2.79µs        ? ?/sec
array_replace_all_int64/replace/list size: 100, num_rows: 10000            1.64      7.2±0.59ms        ? ?/sec     1.00      4.4±0.12ms        ? ?/sec
array_replace_all_int64/replace/list size: 500, num_rows: 10000            1.16     25.3±4.09ms        ? ?/sec     1.00     21.8±0.69ms        ? ?/sec
array_replace_all_int64_nested/replace/list size: 10, num_rows: 4000       1.00      7.5±0.30ms        ? ?/sec     1.01      7.5±0.24ms        ? ?/sec
array_replace_all_int64_nested/replace/list size: 100, num_rows: 3000      1.00     38.5±0.52ms        ? ?/sec     1.02     39.2±1.02ms        ? ?/sec
array_replace_all_int64_nested/replace/list size: 300, num_rows: 1500      1.00     55.4±1.73ms        ? ?/sec     1.02     56.5±2.13ms        ? ?/sec
array_replace_boolean/replace/list size: 10, num_rows: 4000                4.57  1072.4±82.05µs        ? ?/sec     1.00    234.6±7.55µs        ? ?/sec
array_replace_boolean/replace/list size: 100, num_rows: 10000              2.38      3.7±0.43ms        ? ?/sec     1.00  1536.5±47.67µs        ? ?/sec
array_replace_boolean/replace/list size: 500, num_rows: 10000              1.51      6.5±0.51ms        ? ?/sec     1.00      4.3±0.12ms        ? ?/sec
array_replace_fixed_size_binary/replace/list size: 10, num_rows: 4000      3.61  1174.3±90.82µs        ? ?/sec     1.00   325.2±26.75µs        ? ?/sec
array_replace_fixed_size_binary/replace/list size: 100, num_rows: 10000    1.45      7.2±0.88ms        ? ?/sec     1.00      4.9±0.11ms        ? ?/sec
array_replace_fixed_size_binary/replace/list size: 500, num_rows: 10000    1.05     25.9±2.34ms        ? ?/sec     1.00     24.6±0.71ms        ? ?/sec
array_replace_int64/replace/list size: 10, num_rows: 4000                  5.49  1025.4±24.08µs        ? ?/sec     1.00   186.7±18.10µs        ? ?/sec
array_replace_int64/replace/list size: 100, num_rows: 10000                2.46      3.6±0.13ms        ? ?/sec     1.00  1455.7±138.70µs        ? ?/sec
array_replace_int64/replace/list size: 500, num_rows: 10000                1.26      7.0±0.75ms        ? ?/sec     1.00      5.6±0.77ms        ? ?/sec
array_replace_int64_nested/replace/list size: 10, num_rows: 4000           1.03      7.3±0.14ms        ? ?/sec     1.00      7.2±0.21ms        ? ?/sec
array_replace_int64_nested/replace/list size: 100, num_rows: 3000          1.03     37.8±1.62ms        ? ?/sec     1.00     36.7±0.43ms        ? ?/sec
array_replace_int64_nested/replace/list size: 300, num_rows: 1500          1.03     53.2±1.16ms        ? ?/sec     1.00     51.7±1.87ms        ? ?/sec
array_replace_n_int64/replace/list size: 10, num_rows: 4000                5.02  1074.4±30.92µs        ? ?/sec     1.00    214.1±2.22µs        ? ?/sec
array_replace_n_int64/replace/list size: 100, num_rows: 10000              1.83      5.0±0.15ms        ? ?/sec     1.00      2.7±0.06ms        ? ?/sec
array_replace_n_int64/replace/list size: 500, num_rows: 10000              1.17     15.5±1.11ms        ? ?/sec     1.00     13.3±0.24ms        ? ?/sec
array_replace_n_int64_nested/replace/list size: 10, num_rows: 4000         1.05      7.5±0.45ms        ? ?/sec     1.00      7.1±0.07ms        ? ?/sec
array_replace_n_int64_nested/replace/list size: 100, num_rows: 3000        1.02     37.4±0.51ms        ? ?/sec     1.00     36.5±0.62ms        ? ?/sec
array_replace_n_int64_nested/replace/list size: 300, num_rows: 1500        1.02     54.9±4.97ms        ? ?/sec     1.00     53.8±3.15ms        ? ?/sec
array_replace_strings/replace/list size: 10, num_rows: 4000                2.78  1408.8±44.99µs        ? ?/sec     1.00   506.6±16.32µs        ? ?/sec
array_replace_strings/replace/list size: 100, num_rows: 10000              1.32     11.0±1.25ms        ? ?/sec     1.00      8.3±0.37ms        ? ?/sec
array_replace_strings/replace/list size: 500, num_rows: 10000              1.14     42.4±6.39ms        ? ?/sec     1.00     37.2±0.74ms        ? ?/sec

Are these changes tested?

Yes, by the existing tests and by new SLT edge-case tests in array_replace.slt.

Are there any user-facing changes?

No.

@github-actions (bot) added the labels sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) on May 20, 2026

@Jefffrey left a comment

One observation: this is a fast path for when from is a scalar, but in that case it is likely that to (and max too) are also scalars 🤔

Should we have the paths just be:

-- scalar fast path
select array_replace(array, scalar, scalar[, scalar]);
-- general fallback
select array_replace(array, scalar, array[, scalar]);
-- or any other combination

Thoughts?
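
The two-path split proposed above could look roughly like the following dispatch. It is a sketch, not code from the PR: `Arg`, `Path`, and `choose_path` are hypothetical names, and `i64` stands in for whatever value types the real function accepts.

```rust
/// Hypothetical model of a function argument: either one value for every
/// row (scalar) or one value per row (array).
#[derive(Debug, PartialEq)]
enum Arg {
    Scalar(i64),
    Array(Vec<i64>),
}

/// Which implementation the call would be routed to.
#[derive(Debug, PartialEq)]
enum Path {
    ScalarFast,
    General,
}

/// Pick the fast path only when `from`, `to`, and the optional `max`
/// are all scalars; any array argument falls back to the general path.
fn choose_path(from: &Arg, to: &Arg, max: Option<&Arg>) -> Path {
    let is_scalar = |a: &Arg| matches!(a, Arg::Scalar(_));
    if is_scalar(from) && is_scalar(to) && max.map_or(true, is_scalar) {
        Path::ScalarFast
    } else {
        Path::General
    }
}
```

With this shape there are only two code paths to maintain, at the cost of sending mixed calls such as a scalar from with an array to through the slower general path.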

/// uses a single bulk comparison for better performance.
fn general_replace_with_scalar<O: OffsetSizeTrait>(
    list_array: &GenericListArray<O>,
    needle: &ArrayRef,

I think needle should be a Scalar in the arguments here, to make it clear this is the scalar (without needing to read the docstring)

  • This can be passed in all the way from replace_with_scalar_needle, for example, since it's still a ScalarValue at that point
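
To illustrate the point about signatures, here is a small plain-Rust sketch in which the parameter type itself says "this is a scalar". The `ScalarNeedle` wrapper and `count_matches` are hypothetical stand-ins, not the Arrow or DataFusion types named in the comment.

```rust
/// Hypothetical wrapper marking "exactly one value, broadcast to every row".
/// It plays the role a dedicated scalar type would play in the signature.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ScalarNeedle(Option<i64>);

/// The parameter type alone tells the reader that `needle` is a scalar,
/// so no docstring is needed to rule out a per-row array.
fn count_matches(values: &[Option<i64>], needle: ScalarNeedle) -> usize {
    values.iter().filter(|v| **v == needle.0).count()
}
```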

capacity,
);

let mut valid = NullBufferBuilder::new(list_array.len());

I don't think we need a builder for nulls; we can copy the input array's null buffer as is
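
The reasoning is that replacing elements inside a list never changes whether the list row itself is NULL. A minimal plain-Rust model of that idea follows; `ListColumn` and `replace_all` are hypothetical, and a `Vec<bool>` stands in for the Arrow null buffer.

```rust
/// Hypothetical model of a list column: flat values, row offsets, and
/// one validity flag per row (`false` means the whole row is NULL).
#[derive(Debug, Clone, PartialEq)]
struct ListColumn {
    values: Vec<i64>,
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

/// Replace every `from` with `to`. Only element values change, so the
/// offsets and the row validity are cloned from the input untouched.
fn replace_all(input: &ListColumn, from: i64, to: i64) -> ListColumn {
    ListColumn {
        values: input
            .values
            .iter()
            .map(|&v| if v == from { to } else { v })
            .collect(),
        offsets: input.offsets.clone(),
        validity: input.validity.clone(),
    }
}
```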

    to_array: &ArrayRef,
    arr_n: &[i64],
) -> Result<ArrayRef> {
    let mut offsets: Vec<O> = vec![O::usize_as(0)];

Using OffsetBufferBuilder provides a nicer API for doing these operations (can just push length)
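
The "push length" style can be sketched with a tiny stand-in builder. The `OffsetsBuilder` type below is hypothetical and written in plain Rust; it only imitates the idea of the Arrow builder mentioned above and makes no claim about that type's exact signatures.

```rust
/// Hypothetical stand-in for an offsets builder: callers push row
/// lengths and the builder keeps the running total.
struct OffsetsBuilder {
    offsets: Vec<usize>,
}

impl OffsetsBuilder {
    /// Reserve room for `rows` rows; offsets always start at zero.
    fn new(rows: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0);
        Self { offsets }
    }

    /// Append a row of `len` elements.
    fn push_length(&mut self, len: usize) {
        let last = *self.offsets.last().expect("offsets is never empty");
        self.offsets.push(last + len);
    }

    /// Hand back the finished offsets.
    fn finish(self) -> Vec<usize> {
        self.offsets
    }
}

/// Build offsets for rows of the given lengths.
fn offsets_for(lengths: &[usize]) -> Vec<usize> {
    let mut builder = OffsetsBuilder::new(lengths.len());
    for &len in lengths {
        builder.push_length(len);
    }
    builder.finish()
}
```

Compared with pushing onto a raw offsets vector, the caller no longer tracks the previous offset by hand, which removes one place where an off-by-one can creep in.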

@lyne7-sc

@Jefffrey Thanks for the review! I agree that to and max are also commonly scalars in practice, and I have already specialized them, but I think the performance gains here are relatively minor.

The other suggestions have already been addressed.
