Skip to content

feat: allow exponentiation post agg for log-fold-change in rank_genes_groups#4037

Open
ilan-gold wants to merge 17 commits intomainfrom
ig/exp_post_agg
Open

feat: allow exponentiation post agg for log-fold-change in rank_genes_groups#4037
ilan-gold wants to merge 17 commits intomainfrom
ig/exp_post_agg

Conversation

@ilan-gold
Copy link
Copy Markdown
Contributor

@ilan-gold ilan-gold commented Apr 7, 2026

My three questions here:

  1. What to do about t-tests? Do people expect fold change of the input to the t-test (as is currently done) or fold change of the exponent?
  2. How do we test this? Dummy groups? Wait for illico to compare? I added a very simple test.
  3. Does the name exp_post_agg make sense? exp_post_mean?

@ilan-gold ilan-gold changed the title feat: allow exponentiation post agg for log-fold-change feat: allow exponentiation post agg for log-fold-change in rank_genes_groups Apr 7, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 78.63%. Comparing base (2fa6ac0) to head (7fcdac7).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/scanpy/tools/_rank_genes_groups.py 92.85% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4037      +/-   ##
==========================================
+ Coverage   78.60%   78.63%   +0.02%     
==========================================
  Files         118      118              
  Lines       12756    12767      +11     
==========================================
+ Hits        10027    10039      +12     
+ Misses       2729     2728       -1     
Flag Coverage Δ
hatch-test.low-vers 77.93% <93.33%> (+0.02%) ⬆️
hatch-test.pre 78.51% <93.33%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/scanpy/_settings/presets.py 86.61% <100.00%> (+0.10%) ⬆️
src/scanpy/tools/_rank_genes_groups.py 93.31% <92.85%> (+0.47%) ⬆️

@ilan-gold ilan-gold added this to the 1.13.0 milestone Apr 14, 2026
@ilan-gold ilan-gold added the Area – Reproducibility Exact replication label Apr 14, 2026
@ilan-gold ilan-gold marked this pull request as ready for review April 14, 2026 14:29
@ilan-gold ilan-gold requested a review from eroell April 14, 2026 14:31
@ilan-gold ilan-gold requested a review from flying-sheep April 14, 2026 14:33
@ilan-gold ilan-gold self-assigned this Apr 16, 2026
Copy link
Copy Markdown
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding 1: I didn’t look at the math, but I assume to get both accurate logfoldchanges and correct t-test results, the t-test needs the means of the data, and the logfoldchanges need the means of e^data. So we’d have to calculate means twice I think. If @eroell doesn’t chime in I’ll re-learn the math …

Regarding 3: maybe lfc_estim_fast or so?

@ilan-gold
Copy link
Copy Markdown
Contributor Author

Regarding 1:

I think my question is less about math than expectations. Do people expect #864 to be applied to all tests that report LFC or just wilcoxon?

Regarding 3 lfc_estim_fast

I like this but I wonder if being more explicit about the math is nicer. @eroell thoughts?

@flying-sheep
Copy link
Copy Markdown
Member

Do people expect #864 to be applied to all tests that report LFC or just wilcoxon?

I think it would be confusing if the option only applied to one of them, but of course I don’t know what people expect.

@ilan-gold
Copy link
Copy Markdown
Contributor Author

That's a pretty good reason! Nothing in the issue is focused on wilcoxon, I guess!

@ilan-gold
Copy link
Copy Markdown
Contributor Author

Regarding 3: maybe lfc_estim_fast or so?

Looking back at things, I think this setting was created for a time when the code looked very different. It is only faster for t-test to do exp_post_agg.

@flying-sheep
Copy link
Copy Markdown
Member

I think you’re right, but expm1 is a more expensive operation than the others, so doing it on the full data instead of just the means might do something. Probably not a huge factor, but let’s see.

@scverse-benchmark
Copy link
Copy Markdown

@flying-sheep
Copy link
Copy Markdown
Member

huh, I didn’t expect it to have zero impact. OK then, let’s see what @eroell says

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area – Reproducibility Exact replication

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rank_genes_groups logfoldchange different than seurat

2 participants