fix(bulkdata): #3886 quiet select_utils optional-deps warning at every extract_dataset #3982

Draft
jstvz wants to merge 2 commits into dev from triage/fix-tranche1-3886

Conversation

@jstvz
Contributor

@jstvz jstvz commented May 14, 2026

Summary

Fixes #3886.

cumulusci/tasks/bulkdata/select_utils.py emitted a WARNING-level log at import time whenever the optional embeddings dependency stack (the [select] extra: numpy/pandas/annoy/scikit-learn) was missing, even when the code path that needs it was never exercised. Because extract_dataset imports this module transitively, its runs were unnecessarily noisy for users who did not opt into the select extra.

The fix removes the import-time warning and emits the same message on first actual use instead, via _warn_missing_optional_deps_once(). Users who never reach the embeddings code path never see the message; users who do reach it see it once per process.

Test plan

  • cumulusci/tasks/bulkdata/tests/test_select_utils.py::test_select_utils_import_emits_no_warning passes (new).
  • xfail marker on cumulusci/tests/triage/test_issue_3886.py removed in the GREEN commit; test now passes.
  • uv run pytest cumulusci/tasks/bulkdata/tests/test_select_utils.py -q runs clean.

Provenance

Reproduced and characterized in the v5 triage evidence pack (PR #3979). See docs/triage/v5/repro-results.md (### #3886) for the full narrative.

jstvz added 2 commits May 14, 2026 10:07
… no WARNING

Reproduces GH #3886: cumulusci/tasks/bulkdata/select_utils.py emits a
logger.warning() at module-import time inside its try/except ImportError
block for the optional [select] deps (numpy/pandas/annoy/scikit-learn).
Since extract.py transitively imports select_utils via mapping_parser
and step, every extract_dataset invocation surfaces the warning even
when no select strategy is configured.

The new test blocks numpy/pandas/annoy/sklearn via sys.modules sentinels,
re-imports select_utils, and asserts no WARNING-level record is emitted
by the module's own logger. Fails on dev source today; will pass once the
warning is deferred to the point of need.
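
The sentinel technique described above can be sketched roughly as follows. This is an illustrative stand-alone version, not the test added in this PR: `blocked_modules` and `warnings_emitted_on_import` are hypothetical helper names, and the real test may well use pytest fixtures instead. It relies on documented import behavior: a `None` entry in `sys.modules` makes `import name` raise `ModuleNotFoundError`, a subclass of `ImportError`.

```python
import importlib
import logging
import sys
from contextlib import contextmanager

_MISSING = object()  # marks "was not in sys.modules before blocking"


@contextmanager
def blocked_modules(*names):
    """Temporarily make `import <name>` fail via None sentinels in sys.modules."""
    saved = {name: sys.modules.get(name, _MISSING) for name in names}
    try:
        for name in names:
            sys.modules[name] = None
        yield
    finally:
        for name, module in saved.items():
            if module is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module


class _ListHandler(logging.Handler):
    """Collect log records in a list for later inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def warnings_emitted_on_import(module_name, logger_name, blocked):
    """Re-import `module_name` with `blocked` unavailable and return any
    WARNING-or-higher records its logger emitted during the import."""
    handler = _ListHandler()
    log = logging.getLogger(logger_name)
    log.addHandler(handler)
    original = sys.modules.pop(module_name, None)  # force a fresh import
    try:
        with blocked_modules(*blocked):
            importlib.import_module(module_name)
    finally:
        log.removeHandler(handler)
        if original is not None:
            sys.modules[module_name] = original  # restore the cached module
    return [r for r in handler.records if r.levelno >= logging.WARNING]
```

Under these assumptions, the regression check would assert that the returned list is empty for select_utils when numpy, pandas, annoy, and sklearn are blocked.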
…need

Previously, cumulusci/tasks/bulkdata/select_utils.py emitted a
logger.warning() at module import time inside its try/except ImportError
block for the optional [select] deps (numpy/pandas/annoy/scikit-learn).
Because extract.py transitively imports select_utils via mapping_parser
and step, the warning surfaced on every extract_dataset invocation even
when the user had not configured any select strategy.

Move the emission out of module import. The warning now fires from
similarity_post_process() only when the high-volume branch
(complexity_constant >= 1000) would have used the Annoy fast path but
optional deps are unavailable - the exact code path that pays the perf
penalty the warning describes. A module-level flag ensures the message
is emitted at most once per process.
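
A rough sketch of that gating logic, under stated assumptions: `choose_similarity_path`, the returned path labels, the logger name, and the message text are hypothetical stand-ins, and the real similarity_post_process() does considerably more. Only the threshold (complexity_constant >= 1000) and the once-per-process flag come from this commit message.

```python
import logging

logger = logging.getLogger("similarity_sketch")  # hypothetical logger name

HIGH_VOLUME_THRESHOLD = 1000  # complexity_constant >= 1000 per the commit message

# Module-level flag: the warning fires at most once per process.
_missing_deps_warned = False


def choose_similarity_path(complexity_constant, optional_deps_available):
    """Return which path would run, warning only where the penalty is paid.

    The labels "annoy" and "fallback" are illustrative, not names from
    the codebase.
    """
    global _missing_deps_warned
    if complexity_constant < HIGH_VOLUME_THRESHOLD:
        # Low-volume branch: the fast path would not be used anyway,
        # so missing optional deps cost nothing and nothing is logged.
        return "fallback"
    if optional_deps_available:
        return "annoy"
    # High-volume branch without optional deps: the slower path is taken,
    # which is the only situation the warning is about.
    if not _missing_deps_warned:
        _missing_deps_warned = True
        logger.warning("Optional [select] dependencies are not installed.")  # placeholder text
    return "fallback"
```

Placing the check after the threshold test is what keeps low-volume runs silent even when the optional stack is missing.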

Refs #3886.