[PLUGIN-1957] Autodetect PK Chunking for incremental loads #354

Open

harishhk107 wants to merge 1 commit into data-integrations:develop from cloudsufi:feature/prevent-empty-chunks-pk-chunking

harishhk107 commented Apr 20, 2026

PLUGIN-1957 Autodetect PK Chunking for incremental loads

What

Adds a record count check before enabling PK chunking in SalesforceBatchSource.getSplits().
If the record count is below AUTO_PK_CHUNK_THRESHOLD (1,000,000), PK chunking is skipped
even if it is enabled in the config, to avoid unnecessary overhead on small datasets.
This check applies only to DTS.
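
A minimal sketch of the decision described above, for reviewers who want the rule in one place. It is not the code in this PR: the class PkChunkDecision, the method usePkChunking and the LongSupplier used to stand in for the COUNT() query are all hypothetical. It also assumes that when pkChunkCountCheck is false the configured setting is honored as-is, which the description implies but does not state.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of the plugin: it only illustrates the gating rule.
final class PkChunkDecision {
    // Same value as the AUTO_PK_CHUNK_THRESHOLD constant named in this PR.
    static final long AUTO_PK_CHUNK_THRESHOLD = 1_000_000L;

    /**
     * @param enablePKChunk     value of the plugin's enablePKChunk setting
     * @param pkChunkCountCheck whether the record count check is requested
     * @param recordCount       stand-in for the COUNT() query; evaluated lazily
     */
    static boolean usePkChunking(boolean enablePKChunk,
                                 boolean pkChunkCountCheck,
                                 LongSupplier recordCount) {
        if (!enablePKChunk) {
            return false; // chunking disabled in config: no count query is issued
        }
        if (!pkChunkCountCheck) {
            return true;  // assumption: without the check, the config is honored as-is
        }
        // Chunking is skipped only when the count is below the threshold.
        return recordCount.getAsLong() >= AUTO_PK_CHUNK_THRESHOLD;
    }
}
```

Passing the count as a supplier keeps the query lazy, so the enablePKChunk=false path never pays for it.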

Why

PK chunking is designed for very large datasets. Enabling it on small datasets causes empty
chunk overhead and increased pipeline execution time without any benefit. This change ensures
chunking is only applied when it is operationally justified by the actual record count.

Changes

SalesforceBatchSource.java: getSplits() now calls shouldAutoDetectPKChunk() only when
config.getEnablePKChunk() && pkChunkCountCheck is true; added the pkChunkCountCheck parameter.
SalesforceSplitUtil.java: added shouldAutoDetectPKChunk(), which runs a COUNT() query
to check the record count against the threshold before enabling chunking.
SalesforceSourceConstants.java: added AUTO_PK_CHUNK_THRESHOLD = 1_000_000.
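
One way the COUNT() query mentioned above could be derived from the source query is sketched below. This is an illustration only, not the implementation in SalesforceSplitUtil: the class CountQuerySketch and the method toCountQuery are hypothetical, and the regex assumes a flat SOQL query with no subquery in the SELECT list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the plugin.
final class CountQuerySketch {
    // Matches "SELECT <fields> FROM " up to the first FROM keyword, case-insensitively.
    // Assumption: the SELECT list contains no nested subquery with its own FROM.
    private static final Pattern SELECT_FROM =
        Pattern.compile("(?is)^\\s*SELECT\\s+.+?\\s+FROM\\s+");

    /** Replaces the field list of a flat SOQL query with COUNT(). */
    static String toCountQuery(String soql) {
        Matcher m = SELECT_FROM.matcher(soql);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a SELECT query: " + soql);
        }
        // Keep the object name and any WHERE clause unchanged.
        return "SELECT COUNT() FROM " + soql.substring(m.end());
    }
}
```

Because the WHERE clause is carried over unchanged, the count reflects the same incremental filter as the load itself, which is what makes comparing it against the threshold meaningful.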

Manual Testing

Verified pipeline runs correctly with enablePKChunk=true and small dataset → chunking skipped
Verified pipeline runs correctly with enablePKChunk=true and large dataset → chunking applied
Verified pipeline runs correctly with enablePKChunk=false → chunking skipped, no count query

For 1M records

Before ~5m 48s
After ~6m 22s

google-cla Bot commented Apr 20, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

harishhk107 force-pushed the feature/prevent-empty-chunks-pk-chunking branch from faac75b to 5b4cf9e Compare

April 20, 2026 05:53

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/util/SalesforceSplitUtil.java Outdated

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/SalesforceBatchSource.java Outdated

vikasrathee-cs reviewed

src/test/java/io/cdap/plugin/salesforce/etl/SalesforceBatchSourceETLTest.java Outdated

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/util/SalesforceSplitUtil.java Outdated

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/SalesforceBatchSource.java Outdated

vikasrathee-cs reviewed

src/test/java/io/cdap/plugin/salesforce/etl/SalesforceBatchSourceETLTest.java Outdated

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/util/SalesforceSplitUtil.java Outdated

vikasrathee-cs changed the title ~~feat: validate PK Chunking for large queries~~ [PLUGIN-1957] Autodetect PK Chunking for incremental loads

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/util/SalesforceSplitUtil.java Outdated

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/util/SalesforceSplitUtil.java Outdated

vikasrathee-cs reviewed

src/main/java/io/cdap/plugin/salesforce/plugin/source/batch/util/SalesforceSplitUtil.java Outdated

harishhk107 force-pushed the feature/prevent-empty-chunks-pk-chunking branch from db562e2 to 3f613ab Compare

April 22, 2026 11:19

vikasrathee-cs force-pushed the feature/prevent-empty-chunks-pk-chunking branch from 3f613ab to d6c2a04 Compare

April 22, 2026 11:31

vikasrathee-cs requested a review from Sunish-Dahiya

April 22, 2026 12:18

harishhk107 force-pushed the feature/prevent-empty-chunks-pk-chunking branch from d6c2a04 to 7ff4b89 Compare

April 22, 2026 12:45

feat: validate PK Chunking and prevent empty chunks

fba1f3f

harishhk107 force-pushed the feature/prevent-empty-chunks-pk-chunking branch from d88d48d to fba1f3f Compare

April 22, 2026 12:54

vikasrathee-cs added the build label
