perf: optimize `array_replace` for scalar needle by lyne7-sc · Pull Request #22387 · apache/datafusion

lyne7-sc · 2026-05-20T07:18:31Z

Which issue does this PR close?

Closes #.

Rationale for this change

Currently, array_replace / array_replace_n / array_replace_all perform element-wise comparison by invoking compare_element_to_list against each row's sub-array individually. When the needle is a scalar, this can be optimized by performing a single vectorized not_distinct comparison over the entire flattened values buffer.

What changes are included in this PR?

Add a specialized replacement kernel that uses arrow_ord::cmp::not_distinct with a Scalar wrapper to do a single bulk comparison pass over the flat values buffer.
Extend the SLT tests with multi-row scalar-argument coverage, empty-array edge cases, NULL needle replacement, and boundary n values for LargeList/FixedSizeList types.
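
For illustration only, here is a std-only sketch of the idea behind the scalar path. It is not the code in this PR and does not use Arrow: the list column is modeled as plain `offsets` plus a flat `values` slice, and `not_distinct` and `replace_scalar` are hypothetical helpers written for this example. The point it tries to show is that the comparison runs once over the whole flat buffer, and row boundaries only matter for enforcing the per-row replacement limit.

```rust
/// Null-safe equality (hypothetical helper): NULL matches NULL,
/// and NULL never matches a non-NULL value.
fn not_distinct(a: Option<i64>, b: Option<i64>) -> bool {
    a == b
}

/// Replace up to `max` occurrences of `from` with `to` in every row of a
/// list column modeled as (offsets, flat values). Hypothetical helper.
fn replace_scalar(
    offsets: &[usize],
    values: &[Option<i64>],
    from: Option<i64>,
    to: Option<i64>,
    max: usize,
) -> Vec<Option<i64>> {
    // One bulk comparison pass over the flat buffer, ignoring row boundaries.
    let mask: Vec<bool> = values.iter().map(|v| not_distinct(*v, from)).collect();

    // In this model a replacement never changes a row's length,
    // so the input offsets still describe the output.
    let mut out = values.to_vec();
    for row in offsets.windows(2) {
        let mut replaced = 0;
        for i in row[0]..row[1] {
            if replaced == max {
                break;
            }
            if mask[i] {
                out[i] = to;
                replaced += 1;
            }
        }
    }
    out
}

fn main() {
    // Rows: [1, 2, 1], [], [NULL, 1]
    let offsets = [0, 3, 3, 5];
    let values = [Some(1), Some(2), Some(1), None, Some(1)];
    // Replace at most one `1` per row with `9`.
    let out = replace_scalar(&offsets, &values, Some(1), Some(9), 1);
    println!("{:?}", out);
}
```

Passing `usize::MAX` as `max` models the replace-all behavior, and passing `None` as `from` models a NULL needle.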

Benchmarks

group                                                                      baseline                                optimized
-----                                                                      --------                                ---------
array_replace_all_int64/replace/list size: 10, num_rows: 4000              5.04  1124.5±146.98µs        ? ?/sec    1.00    223.1±2.79µs        ? ?/sec
array_replace_all_int64/replace/list size: 100, num_rows: 10000            1.64      7.2±0.59ms        ? ?/sec     1.00      4.4±0.12ms        ? ?/sec
array_replace_all_int64/replace/list size: 500, num_rows: 10000            1.16     25.3±4.09ms        ? ?/sec     1.00     21.8±0.69ms        ? ?/sec
array_replace_all_int64_nested/replace/list size: 10, num_rows: 4000       1.00      7.5±0.30ms        ? ?/sec     1.01      7.5±0.24ms        ? ?/sec
array_replace_all_int64_nested/replace/list size: 100, num_rows: 3000      1.00     38.5±0.52ms        ? ?/sec     1.02     39.2±1.02ms        ? ?/sec
array_replace_all_int64_nested/replace/list size: 300, num_rows: 1500      1.00     55.4±1.73ms        ? ?/sec     1.02     56.5±2.13ms        ? ?/sec
array_replace_boolean/replace/list size: 10, num_rows: 4000                4.57  1072.4±82.05µs        ? ?/sec     1.00    234.6±7.55µs        ? ?/sec
array_replace_boolean/replace/list size: 100, num_rows: 10000              2.38      3.7±0.43ms        ? ?/sec     1.00  1536.5±47.67µs        ? ?/sec
array_replace_boolean/replace/list size: 500, num_rows: 10000              1.51      6.5±0.51ms        ? ?/sec     1.00      4.3±0.12ms        ? ?/sec
array_replace_fixed_size_binary/replace/list size: 10, num_rows: 4000      3.61  1174.3±90.82µs        ? ?/sec     1.00   325.2±26.75µs        ? ?/sec
array_replace_fixed_size_binary/replace/list size: 100, num_rows: 10000    1.45      7.2±0.88ms        ? ?/sec     1.00      4.9±0.11ms        ? ?/sec
array_replace_fixed_size_binary/replace/list size: 500, num_rows: 10000    1.05     25.9±2.34ms        ? ?/sec     1.00     24.6±0.71ms        ? ?/sec
array_replace_int64/replace/list size: 10, num_rows: 4000                  5.49  1025.4±24.08µs        ? ?/sec     1.00   186.7±18.10µs        ? ?/sec
array_replace_int64/replace/list size: 100, num_rows: 10000                2.46      3.6±0.13ms        ? ?/sec     1.00  1455.7±138.70µs        ? ?/sec
array_replace_int64/replace/list size: 500, num_rows: 10000                1.26      7.0±0.75ms        ? ?/sec     1.00      5.6±0.77ms        ? ?/sec
array_replace_int64_nested/replace/list size: 10, num_rows: 4000           1.03      7.3±0.14ms        ? ?/sec     1.00      7.2±0.21ms        ? ?/sec
array_replace_int64_nested/replace/list size: 100, num_rows: 3000          1.03     37.8±1.62ms        ? ?/sec     1.00     36.7±0.43ms        ? ?/sec
array_replace_int64_nested/replace/list size: 300, num_rows: 1500          1.03     53.2±1.16ms        ? ?/sec     1.00     51.7±1.87ms        ? ?/sec
array_replace_n_int64/replace/list size: 10, num_rows: 4000                5.02  1074.4±30.92µs        ? ?/sec     1.00    214.1±2.22µs        ? ?/sec
array_replace_n_int64/replace/list size: 100, num_rows: 10000              1.83      5.0±0.15ms        ? ?/sec     1.00      2.7±0.06ms        ? ?/sec
array_replace_n_int64/replace/list size: 500, num_rows: 10000              1.17     15.5±1.11ms        ? ?/sec     1.00     13.3±0.24ms        ? ?/sec
array_replace_n_int64_nested/replace/list size: 10, num_rows: 4000         1.05      7.5±0.45ms        ? ?/sec     1.00      7.1±0.07ms        ? ?/sec
array_replace_n_int64_nested/replace/list size: 100, num_rows: 3000        1.02     37.4±0.51ms        ? ?/sec     1.00     36.5±0.62ms        ? ?/sec
array_replace_n_int64_nested/replace/list size: 300, num_rows: 1500        1.02     54.9±4.97ms        ? ?/sec     1.00     53.8±3.15ms        ? ?/sec
array_replace_strings/replace/list size: 10, num_rows: 4000                2.78  1408.8±44.99µs        ? ?/sec     1.00   506.6±16.32µs        ? ?/sec
array_replace_strings/replace/list size: 100, num_rows: 10000              1.32     11.0±1.25ms        ? ?/sec     1.00      8.3±0.37ms        ? ?/sec
array_replace_strings/replace/list size: 500, num_rows: 10000              1.14     42.4±6.39ms        ? ?/sec     1.00     37.2±0.74ms        ? ?/sec

Are these changes tested?

Yes, existing and new slt edge-case tests in array_replace.slt.

Are there any user-facing changes?

No.

Jefffrey

One observation: this is a fast path for when from is a scalar, but in that case it's likely that to (and max too) are also scalars 🤔

Should we have the paths just be:

-- scalar fast path
select array_replace(array, scalar, scalar[, scalar]);
-- general fallback
select array_replace(array, scalar, array[, scalar]);
-- or any other combination

Thoughts?
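
A rough sketch of that split, assuming hypothetical stand-in types (`Arg`, `Path`, and `choose_path` are made up for this example and are not DataFusion APIs): the fast path is taken only when every non-array argument is a scalar, and anything else falls through to the general implementation.

```rust
/// Hypothetical stand-in for an argument that is either one value for all
/// rows or one value per row.
#[allow(dead_code)]
enum Arg {
    Scalar(Option<i64>),
    Array(Vec<Option<i64>>),
}

/// Which implementation a call would be routed to (hypothetical).
#[derive(Debug, PartialEq)]
enum Path {
    ScalarFast,
    General,
}

/// Take the fast path only when `from`, `to`, and the optional `max`
/// are all scalars; any other combination uses the general fallback.
fn choose_path(from: &Arg, to: &Arg, max: Option<&Arg>) -> Path {
    let is_scalar = |a: &Arg| matches!(a, Arg::Scalar(_));
    if is_scalar(from) && is_scalar(to) && max.map_or(true, is_scalar) {
        Path::ScalarFast
    } else {
        Path::General
    }
}

fn main() {
    let scalar = Arg::Scalar(Some(1));
    let array = Arg::Array(vec![Some(1), None]);

    // array_replace(array, scalar, scalar)
    println!("{:?}", choose_path(&scalar, &scalar, None));
    // array_replace(array, scalar, array, scalar)
    println!("{:?}", choose_path(&scalar, &array, Some(&scalar)));
}
```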

Jefffrey · 2026-05-20T13:47:45Z

+/// uses a single bulk comparison for better performance.
+fn general_replace_with_scalar<O: OffsetSizeTrait>(
+    list_array: &GenericListArray<O>,
+    needle: &ArrayRef,


I think needle should be a Scalar in the arguments here, to make it clear this is the scalar (without needing to read the docstring)

This can be passed in all the way from replace_with_scalar_needle, for example, since it's still a ScalarValue at that point

Jefffrey · 2026-05-20T13:48:51Z

+        capacity,
+    );
+
+    let mut valid = NullBufferBuilder::new(list_array.len());


I don't think we need a builder for nulls, we can copy the input array null buffer as is

Jefffrey · 2026-05-20T13:49:46Z

+    to_array: &ArrayRef,
+    arr_n: &[i64],
+) -> Result<ArrayRef> {
+    let mut offsets: Vec<O> = vec![O::usize_as(0)];


Using OffsetBufferBuilder provides a nicer API for doing these operations (can just push length)
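
To show what "just push length" buys, here is a std-only sketch. `OffsetsBuilder` below is a hypothetical stand-in written for this example; it imitates the push-a-length style of API and is not Arrow's `OffsetBufferBuilder` itself. The caller only reports each row's length, and the running total stays inside the builder instead of being tracked by hand next to a `Vec` of offsets.

```rust
/// Hypothetical stand-in for a push-length style offsets builder.
struct OffsetsBuilder {
    offsets: Vec<i32>,
}

impl OffsetsBuilder {
    /// `capacity` is the expected number of rows; offsets need one extra slot.
    fn new(capacity: usize) -> Self {
        let mut offsets = Vec::with_capacity(capacity + 1);
        offsets.push(0);
        Self { offsets }
    }

    /// Append a row of `len` elements; the running total is tracked here.
    fn push_length(&mut self, len: usize) {
        let last = *self.offsets.last().expect("offsets always starts with 0");
        self.offsets.push(last + len as i32);
    }

    fn finish(self) -> Vec<i32> {
        self.offsets
    }
}

/// Build list offsets from per-row lengths.
fn offsets_from_lengths(lengths: &[usize]) -> Vec<i32> {
    let mut builder = OffsetsBuilder::new(lengths.len());
    for len in lengths {
        builder.push_length(*len);
    }
    builder.finish()
}

fn main() {
    // Rows of length 3, 0, and 2.
    println!("{:?}", offsets_from_lengths(&[3, 0, 2]));
}
```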

lyne7-sc · 2026-05-21T11:22:10Z

@Jefffrey Thanks for the review! I agree that to and max are also commonly scalar in practice, and I have specialized them as well, but I think the performance gains there are relatively minor.

The other suggestions have also been addressed.

lyne7-sc and others added 3 commits May 20, 2026 11:16

Refactor array replace invocation

1468803

Refine array replace scalar path tests

3e79a38

Merge branch 'main' into perf/replace

47b811b

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels May 20, 2026

lyne7-sc mentioned this pull request May 20, 2026

perf: optimize array_remove for scalar needle #22390

Open

Jefffrey reviewed May 20, 2026


lyne7-sc added 3 commits May 21, 2026 15:53

better clarity

11dfa84

handle to and max arg scalar

3bae38b

Merge branch 'main' into perf/replace

e50617c

lyne7-sc wants to merge 6 commits into apache:main from lyne7-sc:perf/replace
