PERF: bypass factorize in axis=1 EA reduction fastpath (GH-56903) #65597

Open
jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:perf-56903

Conversation

@jbrockmendel
Member

Summary

  • The EA-backed axis=1 reduction fastpath in DataFrame._reduce flattens the frame into a 1D EA and groups by row indices to simulate a transpose+reduce. The row indices passed to Series.groupby were already in factorized form (np.tile(np.arange(nrows), ncols)), but the standard groupby pipeline re-factorized them.
  • Call arr._groupby_op directly with the already-known ngroups and ids, skipping both the redundant factorize and the surrounding groupby pipeline. idxmin/idxmax get an inline conversion from flat positions to column positions, plus the same skipna=False NA detection (and the error it raises) that GroupBy._idxmax_idxmin previously provided.
  • Closes #56903 (PERF: pd.BooleanDtype in row operations is still very slow).
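
For readers unfamiliar with the flatten-and-group trick, here is a minimal NumPy-only sketch of the idea. It does not call `_groupby_op` (a private pandas method); the helper names `rowwise_sum_via_groups`, `grouped_argmin` and `flat_to_column_positions` are hypothetical, and the column-by-column flattening order is an assumption chosen to match `np.tile(np.arange(nrows), ncols)`.

```python
import numpy as np


def rowwise_sum_via_groups(values: np.ndarray) -> np.ndarray:
    """Row sums computed as a grouped reduction over a flattened array."""
    nrows, ncols = values.shape
    # Assumed layout: column 0 first, then column 1, and so on.
    flat = values.ravel(order="F")
    # Group id of each flat element is its row number; already 0..nrows-1.
    ids = np.tile(np.arange(nrows), ncols)
    return np.bincount(ids, weights=flat, minlength=nrows)


def grouped_argmin(flat: np.ndarray, ids: np.ndarray, ngroups: int) -> np.ndarray:
    """Flat position of the first minimum within each group (plain loop)."""
    best = np.full(ngroups, -1, dtype=np.intp)
    for pos, (val, g) in enumerate(zip(flat, ids)):
        if best[g] == -1 or val < flat[best[g]]:
            best[g] = pos
    return best


def flat_to_column_positions(flat_pos: np.ndarray, nrows: int) -> np.ndarray:
    """With column-major flattening, flat position = col * nrows + row."""
    return flat_pos // nrows


values = np.array([[3, 1, 2], [0, 5, 4], [7, 7, 6], [9, 8, 1]])
nrows, ncols = values.shape
flat = values.ravel(order="F")
ids = np.tile(np.arange(nrows), ncols)

row_sums = rowwise_sum_via_groups(values)
col_of_min = flat_to_column_positions(grouped_argmin(flat, ids, nrows), nrows)
```

The grouped results line up with the direct axis=1 reductions (`values.sum(axis=1)` and `values.argmin(axis=1)`), which is the equivalence the fastpath relies on.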

Benchmark (Apple M1 Pro)

import pandas as pd, numpy as np

shape = 250_000, 100
pd_mask = pd.DataFrame(np.random.randint(0, 2, size=shape)).astype(pd.BooleanDtype())
%timeit pd_mask.any(axis=1)
%timeit pd_mask.all(axis=1)
%timeit pd_mask.sum(axis=1)

| op  | before | after  |
| --- | ------ | ------ |
| any | 256 ms | 117 ms |
| all | 277 ms | 112 ms |
| sum | 204 ms | 74 ms  |
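
For anyone reproducing this outside IPython, a rough equivalent using the standard-library `timeit` module might look like the following. The function name `time_axis1_reductions` and its defaults are hypothetical, the default shape is the small 10k x 10 case rather than 250k x 100, and the data is cast to `bool` before `BooleanDtype` as a conservative assumption. Absolute timings will differ by machine.

```python
import timeit

import numpy as np
import pandas as pd


def time_axis1_reductions(shape=(10_000, 10), ops=("any", "all", "sum"),
                          repeat=3, number=5):
    """Return best-of-`repeat` seconds per call for each axis=1 reduction."""
    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=shape).astype(bool)
    df = pd.DataFrame(data).astype(pd.BooleanDtype())
    results = {}
    for op in ops:
        # Bind `op` as a default argument so each lambda times its own method.
        timer = timeit.Timer(lambda op=op: getattr(df, op)(axis=1))
        results[op] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results


if __name__ == "__main__":
    for op, seconds in time_axis1_reductions().items():
        print(f"{op}: {seconds * 1e3:.3f} ms")
```

Passing `shape=(250_000, 100)` approximates the large case in the table above.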

Small-frame numbers (10k x 10, BooleanDtype) drop from ~1.1 ms to ~0.35 ms across any/all/sum/min/max/mean.

cProfile on the 250k x 100 case showed factorize at 43% of wall time pre-change; that line item is gone post-change.
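
A sketch of how such a profile could be collected, assuming only the standard-library `cProfile`/`pstats` modules; `profile_axis1_reduction` is a hypothetical helper, and the exact entries in the report depend on the pandas version being profiled.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd


def profile_axis1_reduction(df: pd.DataFrame, op: str = "any", top: int = 15) -> str:
    """Profile one axis=1 reduction and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    getattr(df, op)(axis=1)
    profiler.disable()

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so expensive call chains appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


rng = np.random.default_rng(0)
small = pd.DataFrame(rng.integers(0, 2, size=(1_000, 10)).astype(bool))
small = small.astype(pd.BooleanDtype())
report = profile_axis1_reduction(small, "any")
```

Replacing `print_stats(top)` with `print_stats("factorize")` restricts the report to entries whose name matches that pattern, which is a quick way to check whether the line item is still present.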

Test plan

  • pandas/tests/reductions
  • pandas/tests/frame/test_reductions.py
  • pandas/tests/extension
  • pandas/tests/arrays/masked
  • pandas/tests/groupby/aggregate
  • pre-commit run on changed files
  • mypy / pyright on pandas/core/frame.py

🤖 Generated with Claude Code

The EA-backed axis=1 reduction in DataFrame._reduce built a synthetic
ser.groupby(row_index).agg(...), which re-factorized codes that were
already in factorized form. Call the EA's _groupby_op directly with the
pre-known ngroups/codes instead, skipping factorize and the surrounding
groupby pipeline. ~2-3x faster on axis=1 reductions for nullable/Arrow
extension dtypes.
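
To illustrate why the re-factorization was redundant, here is a small sketch using the public `pd.factorize` function. The internal groupby path uses its own machinery, so this only demonstrates the property being exploited: codes of the form `np.tile(np.arange(nrows), ncols)` come back unchanged.

```python
import numpy as np
import pandas as pd

nrows, ncols = 4, 3

# Row ids for a frame flattened column by column: 0..nrows-1, once per column.
ids = np.tile(np.arange(nrows), ncols)

# Factorizing assigns codes in order of first appearance, so the input is
# returned as-is and the uniques are simply 0..nrows-1.
codes, uniques = pd.factorize(ids)

already_factorized = bool(
    np.array_equal(codes, ids) and np.array_equal(uniques, np.arange(nrows))
)

# The number of groups is therefore known up front, without any hashing.
ngroups = nrows
```

Since `ngroups` and the ids are both known before any hashing, the factorize pass adds cost without changing the result.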

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jbrockmendel added the Performance (memory or execution speed performance) label on May 11, 2026
@jbrockmendel marked this pull request as ready for review on May 11, 2026, 22:30
