[client] Implement adaptive fetch rate control for LogScanner#3007
swuferhong wants to merge 1 commit into apache:main
Conversation
fresh-borzoni left a comment
@swuferhong Cool feature, really like it.
I left some comments, PTAL 🙏
Also, we might want an integration test for this, since it's a pretty end-to-end feature flow.
```java
private static final Logger LOG = LoggerFactory.getLogger(BucketFetchRateController.class);

/** Maximum exponent for the exponential backoff (2^5 = 32). */
private static final int MAX_BACKOFF_SHIFT = 5;
```
We also have max_skip_rounds, so with this we get a weird situation where the user expects that config to take effect, but it's silently capped at 32.
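To make the cap concrete, here's a minimal sketch of the interplay (the standalone class and `skipRounds` helper are hypothetical; the formula mirrors the diff): even with max_skip_rounds set to 100, the effective backoff never exceeds 2^5 = 32.

```java
public class BackoffCapDemo {
    // From the diff: maximum exponent for the exponential backoff (2^5 = 32).
    private static final int MAX_BACKOFF_SHIFT = 5;

    /** Skip rounds armed after the n-th consecutive empty fetch (n >= 1). */
    static int skipRounds(int consecutiveEmptyFetches, int maxSkipRounds) {
        int shift = Math.min(consecutiveEmptyFetches - 1, MAX_BACKOFF_SHIFT);
        return Math.min(1 << shift, maxSkipRounds);
    }

    public static void main(String[] args) {
        // The user configures max_skip_rounds = 100, but the shift cap wins.
        System.out.println(skipRounds(10, 100)); // prints 32, not 100
    }
}
```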
```java
 * @param tableBucket the bucket to check
 * @return {@code true} if the bucket should be fetched in this round
 */
boolean shouldFetch(TableBucket tableBucket) {
```
Is it intentional that a single empty fetch arms a skip?
A streaming scanner that has caught up to the HW alternates between "batch" and "empty at HW" every poll, so this throttles the empty half even though the bucket is active.
Worst-case new-data latency becomes ~(max-skip × poll interval).
If that's by design, it's worth a comment in the Javadoc.
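To illustrate the worst case, here's a small simulation under the semantics shown in the diff hunks (the harness class, method names, and idle-then-active scenario are hypothetical; the two state transitions mirror the diff): a bucket that has been idle long enough to saturate the backoff makes newly arrived data wait up to maxSkipRounds polls.

```java
public class SkipLatencyDemo {
    private static final int MAX_BACKOFF_SHIFT = 5;

    int consecutiveEmptyFetches;
    int remainingSkipRounds;

    /** Mirrors shouldFetch(): consume one pending skip round, if any. */
    boolean shouldFetch() {
        if (remainingSkipRounds > 0) {
            remainingSkipRounds--;
            return false;
        }
        return true;
    }

    /** Mirrors result handling: an empty fetch grows the backoff, data resets it. */
    void onFetchResult(boolean hasRecords, int maxSkipRounds) {
        if (hasRecords) {
            consecutiveEmptyFetches = 0;
            remainingSkipRounds = 0;
        } else {
            consecutiveEmptyFetches++;
            int shift = Math.min(consecutiveEmptyFetches - 1, MAX_BACKOFF_SHIFT);
            remainingSkipRounds = Math.min(1 << shift, maxSkipRounds);
        }
    }

    /** Simulates idle polls, then counts skipped polls once data is available. */
    static int latencyAfterIdle(int idlePolls, int maxSkipRounds) {
        SkipLatencyDemo s = new SkipLatencyDemo();
        for (int i = 0; i < idlePolls; i++) {
            // Whenever the idle bucket is actually fetched, it comes back empty.
            if (s.shouldFetch()) {
                s.onFetchResult(false, maxSkipRounds);
            }
        }
        // Data is now available at the broker; count polls until we fetch again.
        int skipped = 0;
        while (!s.shouldFetch()) {
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Data arriving right after an empty fetch at saturated backoff
        // waits the full maxSkipRounds polls.
        System.out.println(latencyAfterIdle(70, 32)); // prints 32
    }
}
```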
```java
}

/** Removes the tracking state for the given bucket. */
void removeBucket(TableBucket tableBucket) {
```
Do we actually call this anywhere? If it's never called, we have a state leak.
```java
}

/** Resets all tracking state. */
void reset() {
```
```java
LogRecords logRecords = fetchResultForBucket.recordsOrEmpty();
boolean hasRecords = !MemoryLogRecords.EMPTY.equals(logRecords);
if (hasRecords) {
    hasData = !MemoryLogRecords.EMPTY.equals(logRecords);
```
```java
if (readyForFetchCount == 0) {
    if (skippedByAdaptiveFetch > 0) {
        LOG.info(
```
It might be very noisy, maybe debug level?
```java
    return Collections.emptyMap();
} else {
    if (skippedByAdaptiveFetch > 0) {
        LOG.info(
```
```java
state.consecutiveEmptyFetches++;
int shift = Math.min(state.consecutiveEmptyFetches - 1, MAX_BACKOFF_SHIFT);
state.remainingSkipRounds = Math.min(1 << shift, maxSkipRounds);
LOG.info(
```
ditto about logging noise
```java
bucketStates.computeIfAbsent(tableBucket, k -> new BucketFetchState());
if (hasRecords) {
    if (state.consecutiveEmptyFetches > 0) {
        LOG.info(
```
Purpose
Linked issue: close #3006
For partitioned tables with many inactive partitions, the LogFetcher previously sent fetch requests to all subscribed buckets at the same frequency, wasting CPU and network resources on empty partitions.
This PR introduces BucketFetchRateController, which uses exponential backoff to reduce the fetch frequency of buckets that consistently return no data. Buckets that return data are always fetched at full frequency, and a single non-empty fetch immediately resets the backoff.
Backoff schedule: 1, 2, 4, 8, 16, 32 rounds (configurable max).
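The stated schedule can be reproduced with a minimal sketch (the standalone class and helper name are hypothetical; the formula follows the diff in this PR):

```java
public class BackoffScheduleDemo {
    private static final int MAX_BACKOFF_SHIFT = 5; // 2^5 = 32

    /** Skip rounds armed after the n-th consecutive empty fetch (n >= 1). */
    static int skipRoundsAfter(int n, int maxSkipRounds) {
        return Math.min(1 << Math.min(n - 1, MAX_BACKOFF_SHIFT), maxSkipRounds);
    }

    public static void main(String[] args) {
        StringBuilder schedule = new StringBuilder();
        for (int n = 1; n <= 7; n++) {
            schedule.append(skipRoundsAfter(n, 32)).append(n < 7 ? ", " : "");
        }
        // The backoff doubles per empty round, then stays at the cap.
        System.out.println(schedule); // prints 1, 2, 4, 8, 16, 32, 32
    }
}
```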
New config options:
Brief change log
Tests
API and Format
Documentation