[PLUGIN-1957] Autodetect PK Chunking for incremental loads#354
Open
harishhk107 wants to merge 1 commit into data-integrations:develop from
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
PLUGIN-1957 Autodetect PK Chunking for incremental loads
What
Adds a record count check before enabling PK chunking in SalesforceBatchSource.getSplits().
If the record count is below AUTO_PK_CHUNK_THRESHOLD (1,000,000), PK chunking is skipped
even when it is enabled in the config, avoiding unnecessary overhead on small datasets.
This change applies only to DTS.
Why
PK chunking is designed for very large datasets. Enabling it on small datasets adds the
overhead of empty chunks and increases pipeline execution time without any benefit. This
change ensures chunking is applied only when the actual record count justifies it.
Changes
SalesforceBatchSource.java — changed getSplits() to call shouldAutoDetectPKChunk() only when
config.getEnablePKChunk() && pkChunkCountCheck is true; added the pkChunkCountCheck parameter
SalesforceSplitUtil.java — added shouldAutoDetectPKChunk(), which runs a COUNT() query
to check the record count against the threshold before enabling chunking
SalesforceSourceConstants.java — added AUTO_PK_CHUNK_THRESHOLD = 1_000_000
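The decision logic described above can be sketched as follows. This is a minimal, self-contained illustration of the auto-detect rule, not the plugin code itself: the real shouldAutoDetectPKChunk() issues a SOQL "SELECT COUNT() FROM <object>" through the Salesforce connection, whereas here the record count is passed in directly, and the method and class names other than the ones listed in Changes are hypothetical.

```java
// Sketch of the PK chunking auto-detect decision (assumed behavior, not the
// actual SalesforceBatchSource/SalesforceSplitUtil code).
public class PkChunkAutoDetectSketch {

  // Mirrors AUTO_PK_CHUNK_THRESHOLD added to SalesforceSourceConstants.
  static final long AUTO_PK_CHUNK_THRESHOLD = 1_000_000L;

  // Stand-in for shouldAutoDetectPKChunk(): in the plugin this record count
  // would come from a SOQL COUNT() query against the source object.
  static boolean shouldAutoDetectPkChunk(long recordCount) {
    return recordCount >= AUTO_PK_CHUNK_THRESHOLD;
  }

  // Combines the config flag, the new pkChunkCountCheck parameter, and the
  // count-based check, as described in the Changes section.
  static boolean enablePkChunking(boolean enablePkChunkConfig,
                                  boolean pkChunkCountCheck,
                                  long recordCount) {
    if (!enablePkChunkConfig) {
      return false; // chunking disabled in config: no count query is issued
    }
    if (!pkChunkCountCheck) {
      return true;  // count check disabled: honor the config flag as before
    }
    return shouldAutoDetectPkChunk(recordCount);
  }

  public static void main(String[] args) {
    // Small dataset: chunking skipped even though it is enabled in config.
    System.out.println(enablePkChunking(true, true, 50_000));     // false
    // Large dataset: chunking applied.
    System.out.println(enablePkChunking(true, true, 2_000_000));  // true
    // Chunking disabled in config: always skipped.
    System.out.println(enablePkChunking(false, true, 2_000_000)); // false
  }
}
```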
Manual Testing
Verified pipeline runs correctly with enablePKChunk=true and small dataset → chunking skipped
Verified pipeline runs correctly with enablePKChunk=true and large dataset → chunking applied
Verified pipeline runs correctly with enablePKChunk=false → chunking skipped, no count query
Timing for 1M records:
Before: ~5m 48s
After: ~6m 22s