PERF: Improve performance of groupby.size#65648

Open
rhshadrach wants to merge 3 commits into pandas-dev:main from rhshadrach:perf_groupby_size

@rhshadrach (Member) commented May 15, 2026

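import numpy as np
import pandas as pd

# 10 million rows; columns a, b, c draw keys from ranges of 10**4, 10**5, 10**6.
size = 10_000_000
df = pd.DataFrame(
    {
        "a": np.random.randint(0, 10000, size),
        "b": np.random.randint(0, 100000, size),
        "c": np.random.randint(0, 1000000, size),
    }
)
for column in list("abc"):
    gb = df.groupby(column)
    # Call once before timing so any one-time setup on gb is excluded.
    gb.size()
    # IPython magic; run in IPython or Jupyter.
    %timeit gb.size()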

# main
# 28.5 ms ± 1.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 33.3 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 48.4 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# PR
# 10.4 ms ± 498 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 13.1 ms ± 63 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 18.5 ms ± 44.6 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
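
For anyone reproducing these numbers outside IPython, a rough script version of the benchmark above might look like the sketch below. The function name `bench_groupby_size` and its parameters are hypothetical and not part of this PR. It uses the standard-library `timeit` module and reports the best time per call rather than the mean ± std. dev. that `%timeit` prints, so the figures are not directly comparable to those quoted above. The default row count is also smaller to keep the run short.

```python
import timeit

import numpy as np
import pandas as pd


def bench_groupby_size(n_rows=1_000_000, repeat=3, number=5, seed=0):
    """Time groupby(...).size() for three key cardinalities (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {
            "a": rng.integers(0, 10_000, n_rows),
            "b": rng.integers(0, 100_000, n_rows),
            "c": rng.integers(0, 1_000_000, n_rows),
        }
    )
    results = {}
    for column in "abc":
        gb = df.groupby(column)
        # Warm-up call, mirroring the snippet above; also a sanity check
        # that the group sizes add up to the number of rows.
        sizes = gb.size()
        assert sizes.sum() == n_rows
        # Best observed time per call, in seconds.
        timings = timeit.repeat(gb.size, repeat=repeat, number=number)
        results[column] = min(timings) / number
    return results


if __name__ == "__main__":
    for column, seconds in bench_groupby_size().items():
        print(f"{column}: {seconds * 1e3:.2f} ms per call (best of runs)")
```

Running the script once on `main` and once on this branch, with the same `seed`, should give a like-for-like comparison; pass `n_rows=10_000_000` to match the size used in the description.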

@rhshadrach added the Groupby and Performance (Memory or execution speed performance) labels May 15, 2026
@rhshadrach requested a review from jbrockmendel May 15, 2026 11:29
Development

Successfully merging this pull request may close these issues.

PERF: groupby.value_counts