Use filter doc ID runs in MaxScoreBulkScorer #16002
iprithv wants to merge 5 commits into apache:main
    if (shouldPostFilter(filter)) {
      scoreInnerWindowWithPostFilter(collector, acceptDocs, max, filter);
    } else {
      scoreInnerWindowWithFilter(collector, acceptDocs, max, filter);
    }
I would much prefer that we continue with a single function.
Really, the optimization is just providing bulk scoring on contiguous blocks of doc IDs, right?
If that is the case, I think we can use the filter iterator's docIDRunEnd() to get a block of matching IDs, allowing for bulk scoring of the competitive scorers.
Sure, updated. The fast path now only uses filter docIDRunEnd() when the matching run covers at least half an inner window, similar in spirit to DenseConjunctionBulkScorer’s run-size guard. Otherwise it keeps the existing leap-frog path. Thanks!
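A minimal sketch of the run-size guard described in this reply, not the code from the PR. The class name, the method name `useRunFastPath`, the value of `INNER_WINDOW_SIZE`, and the clipping of the run to the window end are all assumptions made for illustration.

```java
public class RunGuardSketch {
  // Assumption: illustrative inner window size; the constant used by the scorer may differ.
  static final int INNER_WINDOW_SIZE = 1 << 12;

  /**
   * Returns true when the confirmed filter run [doc, runEnd), clipped to the
   * window end (exclusive), covers at least half an inner window.
   */
  static boolean useRunFastPath(int doc, int runEnd, int windowMax) {
    int clippedEnd = Math.min(runEnd, windowMax); // never score past the window
    return clippedEnd - doc >= INNER_WINDOW_SIZE / 2;
  }
}
```

With these assumed values, a run of 2048 or more consecutive filter matches inside the window takes the fast path, and anything shorter falls back to leap-frogging.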
    } else {
      scoreInnerWindowMultipleEssentialClauses(collector, acceptDocs, max);
    }
    scoreInnerWindowWithoutFilter(collector, acceptDocs, max);
Why all these unnecessary changes?
The idea is to improve scoreInnerWindowWithFilter, right? Aren't we focused on bulk scoring the docs up to the filter's doc ID run end?
Please, we need way more than a JMH benchmark. We need to ensure correctness, and please benchmark with luceneutil to see the performance on a realistic dataset.
Closes #15519
This changes MaxScoreBulkScorer to use filter doc ID runs when scoring filtered top-score disjunctions. When the filter is positioned on the current essential candidate and exposes a meaningful confirmed run via docIDRunEnd(), the scorer bulk-scores essential clauses up to the end of that run. Otherwise it keeps the existing leap-frog filter behavior.

The fast path is intentionally conservative: it only triggers when the confirmed filter run covers at least half an inner window, avoiding extra bulk-scoring overhead for tiny runs.
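The control flow described above can be illustrated with a simplified, self-contained model. This is a sketch under stated assumptions, not the PR's implementation: sorted `int[]` arrays stand in for the essential-clause and filter iterators, the hypothetical `runEnd` helper stands in for `docIDRunEnd()`, `WINDOW` is a toy window size, and "scoring" is reduced to collecting doc IDs.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterRunSketch {
  // Hypothetical toy window size, chosen small so the example is easy to trace.
  static final int WINDOW = 8;

  /** End (exclusive) of the run of consecutive doc IDs in sorted `filter` that starts at index i. */
  static int runEnd(int[] filter, int i) {
    int end = filter[i] + 1;
    while (i + 1 < filter.length && filter[i + 1] == end) {
      i++;
      end++;
    }
    return end;
  }

  /** Intersects sorted essential candidates with a sorted filter, using filter runs when long enough. */
  static List<Integer> score(int[] essential, int[] filter) {
    List<Integer> collected = new ArrayList<>();
    int e = 0;
    int f = 0;
    while (e < essential.length && f < filter.length) {
      if (filter[f] < essential[e]) {
        f++; // leap-frog: move the filter up to the candidate
      } else if (filter[f] > essential[e]) {
        e++; // leap-frog: move the candidate up to the filter
      } else {
        int end = runEnd(filter, f);
        if (end - essential[e] >= WINDOW / 2) {
          // Fast path: every candidate below `end` is known to match the filter,
          // so candidates are collected without a per-doc filter check.
          while (e < essential.length && essential[e] < end) {
            collected.add(essential[e++]);
          }
        } else {
          collected.add(essential[e++]); // short run: keep per-doc leap-frogging
        }
      }
    }
    return collected;
  }
}
```

For example, `FilterRunSketch.score(new int[] {1, 3, 4, 7, 10, 12}, new int[] {2, 3, 4, 5, 6, 7, 10, 20})` collects 3, 4 and 7 through the fast path, because the filter run starting at 3 spans five doc IDs, and collects 10 through the leap-frog path, because its run has length one.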
Benchmark