Use filter doc ID runs in MaxScoreBulkScorer by iprithv · Pull Request #16002 · apache/lucene

iprithv · 2026-04-30T18:29:27Z

This changes MaxScoreBulkScorer to use filter doc ID runs when scoring filtered top-score disjunctions. When the filter is positioned on the current essential candidate and exposes a meaningful confirmed run via docIDRunEnd(), the scorer bulk-scores essential clauses up to the end of that run. Otherwise it keeps the existing leap-frog filter behavior.

The fast path is intentionally conservative: it only triggers when the confirmed filter run covers at least half an inner window, avoiding extra bulk-scoring overhead for tiny runs.
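The run-size guard described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: the class name, the `bulkScoreEnd` helper, and the choice to clip the run to the window before measuring it are all assumptions, and `INNER_WINDOW_SIZE` is set to 4096 here only for the example. In Lucene itself the run end would come from `DocIdSetIterator#docIDRunEnd()`; here it is passed in as a plain `int`.

```java
// Hypothetical sketch of the "at least half an inner window" guard.
final class FilterRunGuardSketch {
  // Assumed inner window size, used only for this illustration.
  static final int INNER_WINDOW_SIZE = 1 << 12;

  /**
   * Returns the exclusive end of the doc ID range that may be bulk-scored,
   * or -1 when the caller should keep the leap-frog path.
   *
   * @param candidateDoc current essential candidate
   * @param filterDoc doc ID the filter is positioned on
   * @param filterRunEnd one plus the last doc ID of the filter's confirmed run
   * @param windowMax exclusive upper bound of the current inner window
   */
  static int bulkScoreEnd(int candidateDoc, int filterDoc, int filterRunEnd, int windowMax) {
    if (filterDoc != candidateDoc) {
      return -1; // filter is not positioned on the candidate: no confirmed run to use
    }
    int end = Math.min(filterRunEnd, windowMax); // never score past the window
    if (end - candidateDoc < INNER_WINDOW_SIZE / 2) {
      return -1; // run too short to amortize the bulk-scoring overhead
    }
    return end;
  }

  public static void main(String[] args) {
    System.out.println(bulkScoreEnd(100, 100, 4196, 10000)); // long run
    System.out.println(bulkScoreEnd(100, 100, 150, 10000)); // tiny run
  }
}
```

With these assumptions, a run of 4096 confirmed docs starting at the candidate takes the fast path, while a 50-doc run or a filter positioned on a different doc falls back to leap-frogging.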

Benchmark

essentialInterval	filterInterval	baseline ops/s	current ops/s	delta
100	1	107.60	108.87	+1.2%
100	10	514.12	551.08	+7.2%
100	1000	19321.47	19763.20	+2.3%
1000	1	102.01	116.30	+14.0%
1000	10	586.99	588.72	+0.3%
1000	1000	26120.80	26904.54	+3.0%

benwtrent · 2026-04-30T19:17:24Z

+      if (shouldPostFilter(filter)) {
+        scoreInnerWindowWithPostFilter(collector, acceptDocs, max, filter);
+      } else {
+        scoreInnerWindowWithFilter(collector, acceptDocs, max, filter);
+      }


I would much prefer if we continued with one function.

Really, the optimization is just providing bulk scoring on contiguous blocks of IDs, right?

If that is the case, I think we can use the filter iterator's "docEndRunID" or whatever to get a block of matching ids, allowing for bulk scoring of the competitive scorers.

iprithv · 2026-04-30

Sure, updated. The fast path now only uses the filter's docIDRunEnd() when the matching run covers at least half an inner window, similar in spirit to DenseConjunctionBulkScorer’s run-size guard. Otherwise it keeps the existing leap-frog path. Thanks!
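The "one function" shape discussed in this exchange might look roughly like the sketch below. It is a self-contained simulation under stated assumptions, not Lucene code: the essential clause is a sorted `int[]`, the filter is a `java.util.BitSet` (with `nextClearBit` standing in for `docIDRunEnd()`), collection is just appending doc IDs to a list, and `MIN_RUN` is a deliberately tiny, hypothetical threshold so the example stays readable.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical simulation: one loop that bulk-collects inside long filter runs
// and leap-frogs everywhere else.
final class FilteredWindowSketch {
  static final int MIN_RUN = 4; // assumed threshold, small on purpose for the demo

  static List<Integer> collect(int[] essentialDocs, BitSet filter, int min, int max) {
    List<Integer> hits = new ArrayList<>();
    int i = 0;
    while (i < essentialDocs.length && essentialDocs[i] < min) {
      i++; // skip candidates before the window
    }
    while (i < essentialDocs.length && essentialDocs[i] < max) {
      int doc = essentialDocs[i];
      if (!filter.get(doc)) {
        // Leap-frog: move the filter to its next match, then catch the candidates up.
        int next = filter.nextSetBit(doc);
        if (next < 0 || next >= max) {
          break; // no more filter matches in this window
        }
        while (i < essentialDocs.length && essentialDocs[i] < next) {
          i++;
        }
        continue;
      }
      // Filter is on the candidate: find the end of its run, clipped to the window.
      int runEnd = Math.min(filter.nextClearBit(doc), max);
      if (runEnd - doc >= MIN_RUN) {
        // Bulk path: every candidate in [doc, runEnd) passes the filter, no per-doc checks.
        while (i < essentialDocs.length && essentialDocs[i] < runEnd) {
          hits.add(essentialDocs[i++]);
        }
      } else {
        hits.add(doc); // short run: accept this doc and keep leap-frogging
        i++;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    BitSet filter = new BitSet();
    filter.set(3, 10); // docs 3..9 form one run
    filter.set(20); // isolated match
    System.out.println(collect(new int[] {1, 3, 5, 6, 7, 12, 20, 21}, filter, 0, 100));
  }
}
```

Both paths return the same hits as a plain per-document filter check; the only difference is how many filter probes are needed inside a long run.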

benwtrent · 2026-04-30T21:05:04Z

-      } else {
-        scoreInnerWindowMultipleEssentialClauses(collector, acceptDocs, max);
-      }
+      scoreInnerWindowWithoutFilter(collector, acceptDocs, max);


Why all these unnecessary changes?

The idea is to improve scoreInnerWindowWithFilter, right? We are focused on bulk scoring the docs up to the end of the filter's run?

Please, we need way more than a JMH benchmark. We need to ensure correctness, and please benchmark with luceneutil to see the performance on a realistic dataset.

Post-filter broad filters in MaxScoreBulkScorer

63206a3

github-actions Bot added the module:core/search label Apr 30, 2026

Updated CHANGES.txt

e9d0784

github-actions Bot added this to the 11.0.0 milestone Apr 30, 2026

Merge branch 'main' into maxscore-post-filter

eb55b72

benwtrent reviewed Apr 30, 2026

Use filter runs in MaxScoreBulkScorer

ffb92ba

iprithv requested a review from benwtrent April 30, 2026 20:50

benwtrent reviewed Apr 30, 2026

Exercise MaxScore filter run path

cce7580

iprithv changed the title ~~Post-filter broad filters in MaxScoreBulkScorer~~ Use filter doc ID runs in MaxScoreBulkScorer May 1, 2026

iprithv closed this May 1, 2026

iprithv wants to merge 5 commits into apache:main from iprithv:maxscore-post-filter
