fix(bulkdata): #3886 quiet select_utils optional-deps warning at every extract_dataset#3982
Draft
jstvz wants to merge 2 commits into
Draft
fix(bulkdata): #3886 quiet select_utils optional-deps warning at every extract_dataset#3982jstvz wants to merge 2 commits into
jstvz wants to merge 2 commits into
Conversation
… no WARNING Reproduces GH #3886: cumulusci/tasks/bulkdata/select_utils.py emits a logger.warning() at module-import time inside its try/except ImportError block for the optional [select] deps (numpy/pandas/annoy/scikit-learn). Since extract.py transitively imports select_utils via mapping_parser and step, every extract_dataset invocation surfaces the warning even when no select strategy is configured. The new test blocks numpy/pandas/annoy/sklearn via sys.modules sentinels, re-imports select_utils, and asserts no WARNING-level record is emitted by the module's own logger. Fails on dev source today; will pass once the warning is deferred to the point of need.
…need Previously, cumulusci/tasks/bulkdata/select_utils.py emitted a logger.warning() at module import time inside its try/except ImportError block for the optional [select] deps (numpy/pandas/annoy/scikit-learn). Because extract.py transitively imports select_utils via mapping_parser and step, the warning surfaced on every extract_dataset invocation even when the user had not configured any select strategy. Move the emission out of module import. The warning now fires from similarity_post_process() only when the high-volume branch (complexity_constant >= 1000) would have used the Annoy fast path but optional deps are unavailable - the exact code path that pays the perf penalty the warning describes. A module-level flag ensures the message is emitted at most once per process. Refs #3886.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #3886.
cumulusci/tasks/bulkdata/select_utils.pyemitted aWARNING-level log every time it was imported, even when the optional embeddings dependency stack was not actually exercised. This makesextract_datasetruns unnecessarily noisy for users who did not opt into theselectextra.The fix removes the import-time warning and gates the same message on first actual use (
_warn_missing_optional_deps_once()). Users who never use the embeddings code path never see the message; users who do see it exactly once.Test plan
cumulusci/tasks/bulkdata/tests/test_select_utils.py::test_select_utils_import_emits_no_warningpasses (new).cumulusci/tests/triage/test_issue_3886.pyremoved in the GREEN commit; test now passes.uv run pytest cumulusci/tasks/bulkdata/tests/test_select_utils.py -qclean.Provenance
Reproduced and characterized in the v5 triage evidence pack (PR #3979). See
docs/triage/v5/repro-results.md(### #3886) for the full narrative.