feat: add errors parameter to CategoricalImputer for multimodal variables
#908
Changes from 7 commits
fb230fe
81be348
4fb5b7a
835133f
81f31d8
657de1f
a0ea71d
0cdcf03
cf7670e
fb2f8db
97d6053
c454edd
5992d09
85b1974
09429f3
0b86cfa
45f4e2f
cda93e7
04be1a0
94643d8
ab6ba66
6ba7fce
1a3fde2
36eb1dc
aa37d19
3e58d8b
6746429
c77e8f1
a22f586
51f8276
7156d28
5d65fe8
a95f5e0
6f5b4da
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -86,6 +86,7 @@ celerybeat-schedule | |
| # Environments | ||
| .env | ||
| .venv | ||
| .venv_wsl | ||
| env/ | ||
| venv/ | ||
| ENV/ | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,33 +1,26 @@ | ||||||
| # Authors: Soledad Galli <solegalli@protonmail.com> | ||||||
| # License: BSD 3 clause | ||||||
|
|
||||||
| import warnings | ||||||
| from typing import List, Optional, Union | ||||||
|
|
||||||
| import pandas as pd | ||||||
|
|
||||||
| from feature_engine._check_init_parameters.check_variables import ( | ||||||
| _check_variables_input_value, | ||||||
| ) | ||||||
| from feature_engine._check_init_parameters.check_variables import \ | ||||||
| _check_variables_input_value | ||||||
| from feature_engine._docstrings.fit_attributes import ( | ||||||
| _feature_names_in_docstring, | ||||||
| _imputer_dict_docstring, | ||||||
| _n_features_in_docstring, | ||||||
| _variables_attribute_docstring, | ||||||
| ) | ||||||
| from feature_engine._docstrings.methods import ( | ||||||
| _fit_transform_docstring, | ||||||
| _transform_imputers_docstring, | ||||||
| ) | ||||||
| _feature_names_in_docstring, _imputer_dict_docstring, | ||||||
|
Collaborator
Please restore to previous format.
Author
Restored the previous format.
||||||
| _n_features_in_docstring, _variables_attribute_docstring) | ||||||
| from feature_engine._docstrings.methods import (_fit_transform_docstring, | ||||||
|
Collaborator
Please make the format match the other imports.
||||||
| _transform_imputers_docstring) | ||||||
| from feature_engine._docstrings.substitute import Substitution | ||||||
| from feature_engine.dataframe_checks import check_X | ||||||
| from feature_engine.imputation.base_imputer import BaseImputer | ||||||
| from feature_engine.tags import _return_tags | ||||||
| from feature_engine.variable_handling import ( | ||||||
| check_all_variables, | ||||||
| check_categorical_variables, | ||||||
| find_all_variables, | ||||||
| find_categorical_variables, | ||||||
| ) | ||||||
| from feature_engine.variable_handling import (check_all_variables, | ||||||
|
Collaborator
Please restore to previous format.
Author
Restored the previous format.
||||||
| check_categorical_variables, | ||||||
| find_all_variables, | ||||||
| find_categorical_variables) | ||||||
|
|
||||||
|
|
||||||
| @Substitution( | ||||||
|
|
@@ -88,6 +81,18 @@ class CategoricalImputer(BaseImputer): | |||||
| type object or categorical. If True, the imputer will select all variables or | ||||||
| accept all variables entered by the user, including those cast as numeric. | ||||||
|
|
||||||
| errors : str, default='raise' | ||||||
|
Collaborator
Instead of "errors", let's call this parameter "multimodal" so it is immediately obvious what it is about.
Author
made
||||||
| Indicates what to do when the selected imputation_method='frequent' | ||||||
|
Collaborator
Suggested change
Author
Applied the suggestion.
||||||
| and a variable has more than 1 mode. | ||||||
|
|
||||||
| If 'raise', raises a ValueError and stops the fit. | ||||||
|
|
||||||
| If 'warn', raises a UserWarning and continues, imputing using the | ||||||
|
direkkakkar319-ops marked this conversation as resolved.
Outdated
|
||||||
| first most frequent category found. | ||||||
|
|
||||||
| If 'ignore', continues without warnings, imputing using the first | ||||||
| most frequent category found. | ||||||
|
|
||||||
| Attributes | ||||||
| ---------- | ||||||
| {imputer_dict_} | ||||||
|
|
@@ -135,6 +140,7 @@ def __init__( | |||||
| variables: Union[None, int, str, List[Union[str, int]]] = None, | ||||||
| return_object: bool = False, | ||||||
| ignore_format: bool = False, | ||||||
| errors: str = "raise", | ||||||
|
Collaborator
Suggested change
|
||||||
| ) -> None: | ||||||
| if imputation_method not in ["missing", "frequent"]: | ||||||
| raise ValueError( | ||||||
|
|
@@ -144,11 +150,18 @@ def __init__( | |||||
| if not isinstance(ignore_format, bool): | ||||||
| raise ValueError("ignore_format takes only booleans True and False") | ||||||
|
|
||||||
| if errors not in ("raise", "warn", "ignore"): | ||||||
|
Collaborator
Suggested change
Author
Applied the suggestion.
||||||
| raise ValueError( | ||||||
| "errors takes only values 'raise', 'warn', or 'ignore'. " | ||||||
| f"Got {errors} instead." | ||||||
| ) | ||||||
|
|
||||||
| self.imputation_method = imputation_method | ||||||
| self.fill_value = fill_value | ||||||
| self.variables = _check_variables_input_value(variables) | ||||||
| self.return_object = return_object | ||||||
| self.ignore_format = ignore_format | ||||||
| self.errors = errors | ||||||
|
|
||||||
| def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None): | ||||||
| """ | ||||||
|
|
@@ -189,9 +202,20 @@ def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None): | |||||
|
|
||||||
| # Some variables may contain more than 1 mode: | ||||||
| if len(mode_vals) > 1: | ||||||
| raise ValueError( | ||||||
| f"The variable {var} contains multiple frequent categories." | ||||||
| ) | ||||||
| if self.errors == "raise": | ||||||
| raise ValueError( | ||||||
| f"The variable {var} contains multiple " | ||||||
| f"frequent categories. Set errors='warn' or " | ||||||
| f"errors='ignore' to allow imputation using " | ||||||
| f"the first most frequent category found." | ||||||
| ) | ||||||
| elif self.errors == "warn": | ||||||
| warnings.warn( | ||||||
| f"Variable {var} has multiple frequent " | ||||||
| f"categories. The first category found, " | ||||||
| f"{mode_vals[0]}, will be used for imputation.", | ||||||
| UserWarning, | ||||||
| ) | ||||||
|
|
||||||
| self.imputer_dict_ = {var: mode_vals[0]} | ||||||
|
|
||||||
|
|
@@ -208,10 +232,22 @@ def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None): | |||||
| varnames_str = ", ".join(varnames) | ||||||
| else: | ||||||
| varnames_str = varnames[0] | ||||||
| raise ValueError( | ||||||
| f"The variable(s) {varnames_str} contain(s) multiple frequent " | ||||||
| f"categories." | ||||||
| ) | ||||||
|
|
||||||
| if self.errors == "raise": | ||||||
| raise ValueError( | ||||||
| f"The variable(s) {varnames_str} contain(s) " | ||||||
| f"multiple frequent categories. Set " | ||||||
| f"errors='warn' or errors='ignore' to allow " | ||||||
| f"imputation using the first most frequent " | ||||||
| f"category found." | ||||||
| ) | ||||||
| elif self.errors == "warn": | ||||||
| warnings.warn( | ||||||
| f"Variable(s) {varnames_str} have multiple " | ||||||
| f"frequent categories. The first category " | ||||||
| f"found will be used for imputation.", | ||||||
| UserWarning, | ||||||
| ) | ||||||
|
|
||||||
| self.imputer_dict_ = mode_vals.iloc[0].to_dict() | ||||||
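Both `fit` hunks above apply the same three-way handling of ties. The following is a minimal standard-library sketch of that logic, not the PR's code: `first_mode` is a hypothetical helper, `collections.Counter` stands in for the pandas `mode()` call the PR uses, and tied values are sorted here only so that "first" is deterministic in this sketch.

```python
import warnings
from collections import Counter


def first_mode(values, errors="raise"):
    """Return the most frequent value, handling ties according to `errors`.

    Hypothetical helper for illustration only; not part of feature_engine.
    Assumes `values` holds at least one non-missing entry.
    """
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError(
            "errors takes only values 'raise', 'warn', or 'ignore'. "
            f"Got {errors} instead."
        )

    # Count non-missing values only (None marks a missing value here).
    counts = Counter(v for v in values if v is not None)
    top = max(counts.values())
    # Sort the tied values so the "first" mode is deterministic (assumption).
    modes = sorted(v for v, n in counts.items() if n == top)

    if len(modes) > 1:
        if errors == "raise":
            raise ValueError(
                "The variable contains multiple frequent categories."
            )
        if errors == "warn":
            warnings.warn(
                "Variable has multiple frequent categories. The first "
                f"category found, {modes[0]}, will be used for imputation.",
                UserWarning,
            )
        # errors == "ignore": fall through silently.
    return modes[0]
```

With `["London", "London", "Paris", "Paris", "Berlin", "Berlin", None]`, `first_mode(..., errors="ignore")` returns `"Berlin"` without a warning, `errors="warn"` returns the same value after a `UserWarning`, and the default raises a `ValueError`.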
|
|
||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,9 +1,23 @@ | ||||||
| import warnings | ||||||
|
|
||||||
| import numpy as np | ||||||
| import pandas as pd | ||||||
| import pytest | ||||||
|
|
||||||
| from feature_engine.imputation import CategoricalImputer | ||||||
|
|
||||||
|
|
||||||
| # --- Shared fixture: perfectly multimodal variable --- | ||||||
|
Collaborator
pls remove comment, the dataframe name is good enough :)
Author
Removed the comments.
||||||
| @pytest.fixture | ||||||
| def multimodal_df(): | ||||||
| return pd.DataFrame( | ||||||
| { | ||||||
| "city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"], | ||||||
|
Collaborator
Would be great to add 1 value to each variable that is not going to be a mode, just in case.
Author
added as said to
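A sketch of what the reviewer's request could look like, assuming one extra non-mode value per column. The values `"Madrid"` and `"ES"` and the function name `make_multimodal_df` are illustrative assumptions, not taken from the PR; in the test module this body would sit inside the `multimodal_df` fixture.

```python
import pandas as pd


def make_multimodal_df():
    # Three tied categories per column plus one value that is not a mode.
    return pd.DataFrame(
        {
            "city": [
                "London", "London", "Paris", "Paris",
                "Berlin", "Berlin", "Madrid",
            ],
            "country": ["UK", "UK", "FR", "FR", "DE", "DE", "ES"],
        }
    )


df = make_multimodal_df()
# Series.mode() returns every tied value, so each list has three entries.
city_modes = df["city"].mode().tolist()
country_modes = df["country"].mode().tolist()
```

The extra row keeps each column multimodal while making sure the tests would notice if a non-mode value were ever picked for imputation.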
||||||
| "country": ["UK", "UK", "FR", "FR", "DE", "DE"], | ||||||
| } | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def test_impute_with_string_missing_and_automatically_find_variables(df_na): | ||||||
| # set up transformer | ||||||
| imputer = CategoricalImputer(imputation_method="missing", variables=None) | ||||||
|
|
@@ -150,14 +164,22 @@ def test_error_when_imputation_method_not_frequent_or_missing(): | |||||
|
|
||||||
|
|
||||||
| def test_error_when_variable_contains_multiple_modes(df_na): | ||||||
| msg = "The variable Name contains multiple frequent categories." | ||||||
| msg = ( | ||||||
|
direkkakkar319-ops marked this conversation as resolved.
Outdated
|
||||||
| "The variable Name contains multiple frequent categories. " | ||||||
| "Set errors='warn' or errors='ignore' to allow imputation " | ||||||
| "using the first most frequent category found." | ||||||
| ) | ||||||
| imputer = CategoricalImputer(imputation_method="frequent", variables="Name") | ||||||
| with pytest.raises(ValueError) as record: | ||||||
| imputer.fit(df_na) | ||||||
| # check that error message matches | ||||||
| assert str(record.value) == msg | ||||||
|
|
||||||
| msg = "The variable(s) Name contain(s) multiple frequent categories." | ||||||
| msg = ( | ||||||
| "The variable(s) Name contain(s) multiple frequent categories. " | ||||||
|
direkkakkar319-ops marked this conversation as resolved.
Outdated
|
||||||
| "Set errors='warn' or errors='ignore' to allow imputation " | ||||||
| "using the first most frequent category found." | ||||||
| ) | ||||||
| imputer = CategoricalImputer(imputation_method="frequent") | ||||||
| with pytest.raises(ValueError) as record: | ||||||
| imputer.fit(df_na) | ||||||
|
|
@@ -166,7 +188,11 @@ def test_error_when_variable_contains_multiple_modes(df_na): | |||||
|
|
||||||
| df_ = df_na.copy() | ||||||
| df_["Name_dup"] = df_["Name"] | ||||||
| msg = "The variable(s) Name, Name_dup contain(s) multiple frequent categories." | ||||||
| msg = ( | ||||||
| "The variable(s) Name, Name_dup contain(s) multiple frequent categories. " | ||||||
| "Set errors='warn' or errors='ignore' to allow imputation " | ||||||
| "using the first most frequent category found." | ||||||
| ) | ||||||
| imputer = CategoricalImputer(imputation_method="frequent") | ||||||
| with pytest.raises(ValueError) as record: | ||||||
| imputer.fit(df_) | ||||||
|
|
@@ -305,3 +331,122 @@ def test_error_when_ignore_format_is_not_boolean(ignore_format): | |||||
|
|
||||||
| # check that error message matches | ||||||
| assert str(record.value) == msg | ||||||
|
|
||||||
|
|
||||||
| def test_errors_raise_on_multimodal_is_default(multimodal_df): | ||||||
|
Collaborator
Suggested change
Author
Suggestion applied.
||||||
| """Default behaviour: raise ValueError on multimodal variable.""" | ||||||
|
Collaborator
pls remove comment, test name should explain what the test is about
Author
Removed the comments.
||||||
| imputer = CategoricalImputer(imputation_method="frequent") | ||||||
| with pytest.raises(ValueError, match="multiple frequent categories"): | ||||||
| imputer.fit(multimodal_df) | ||||||
|
|
||||||
|
|
||||||
| def test_errors_warn_emits_userwarning(multimodal_df): | ||||||
|
Collaborator
Suggested change
Author
Suggestion applied.
||||||
| """errors='warn': UserWarning must be emitted.""" | ||||||
|
Collaborator
pls remove comment
Author
Removed the comments.
||||||
| imputer = CategoricalImputer(imputation_method="frequent", errors="warn") | ||||||
| with pytest.warns(UserWarning, match="multiple frequent categories"): | ||||||
| imputer.fit(multimodal_df) | ||||||
|
|
||||||
|
|
||||||
| def test_errors_warn_uses_first_mode(multimodal_df): | ||||||
|
Collaborator
I am not sure we need this test. We test the correct mode of imputation in previous tests. If necessary, we could add a parametrize to a previous test to check that the result is the same when multimodal=warn or ignore.
Author
Replaced the separate tests test_multimodal_raises_warning, test_errors_warn_uses_first_mode, and test_errors_ignore_no_warning_raised with a single parametrized test: test_multimodal_imputation_result.
||||||
| """errors='warn': imputer_dict_ should contain the first mode.""" | ||||||
| imputer = CategoricalImputer(imputation_method="frequent", errors="warn") | ||||||
| with pytest.warns(UserWarning): | ||||||
| imputer.fit(multimodal_df) | ||||||
| expected = multimodal_df["city"].mode()[0] | ||||||
| assert imputer.imputer_dict_["city"] == expected | ||||||
|
|
||||||
|
|
||||||
| def test_errors_ignore_no_warning_raised(multimodal_df): | ||||||
| """errors='ignore': no warnings should be emitted.""" | ||||||
|
Collaborator
pls remove comment. Test name should be clear enough to indicate what's being tested.
Author
Removed the comments; I will make sure the test names are clear enough to explain themselves.
||||||
| imputer = CategoricalImputer(imputation_method="frequent", errors="ignore") | ||||||
| with warnings.catch_warnings(): | ||||||
|
Collaborator
Is there not a better way to test the absence of a warning? Something like this: or this: Would you mind trying any of those?
Author
I have refactored the multimodal tests in test_categorical_imputer.py to use a more robust approach for verifying the absence of warnings, as requested. The test cases are passing.
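The two snippets the reviewer pointed to were not preserved on this page. As one possible approach (not necessarily either of the suggested ones), warnings can be recorded with the standard library and the record asserted empty. `fit_quietly` and `fit_noisily` below are hypothetical stand-ins for `imputer.fit(...)` with `errors="ignore"` and `errors="warn"`.

```python
import warnings


def fit_quietly():
    """Hypothetical stand-in for a fit call that must stay silent."""
    return "Berlin"


def fit_noisily():
    """Hypothetical stand-in for a fit call that warns about ties."""
    warnings.warn("multiple frequent categories", UserWarning)
    return "Berlin"


def emitted_warnings(func):
    """Run `func` and return the list of warnings it emitted."""
    with warnings.catch_warnings(record=True) as record:
        # "always" prevents existing filters from hiding a warning.
        warnings.simplefilter("always")
        func()
    return record
```

A test can then assert `len(emitted_warnings(fit_quietly)) == 0`. Unlike promoting warnings to errors, this makes the absence of warnings an explicit assertion, so a reader sees what is being checked.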
||||||
| warnings.simplefilter("error") # Promote all warnings to errors | ||||||
| imputer.fit(multimodal_df) # Should NOT raise | ||||||
| assert imputer.imputer_dict_["city"] == multimodal_df["city"].mode()[0] | ||||||
|
Collaborator
This was tested before, and it's not specific to this test. Pls remove.
Author
Removed the code block.
||||||
|
|
||||||
|
|
||||||
| def test_errors_invalid_value_raises(): | ||||||
|
Collaborator
Here we normally use parametrize to pass more than 1 not accepted value, not just a string, but a number or a boolean, and it should fail for all. Could we reformat?
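A sketch of the parametrized form the reviewer describes. `check_errors_param` is a hypothetical stand-in for the validation in `CategoricalImputer.__init__` shown earlier in the diff, and the list of invalid values is an illustrative choice; in the PR the test body would construct the imputer instead.

```python
import pytest


def check_errors_param(errors):
    """Hypothetical stand-in for the `errors` check in __init__."""
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError(
            "errors takes only values 'raise', 'warn', or 'ignore'. "
            f"Got {errors} instead."
        )
    return errors


# Each invalid value runs as its own test case and must raise.
@pytest.mark.parametrize("errors", ["bad_value", 1, True, None, ["raise"]])
def test_error_when_errors_param_is_invalid(errors):
    with pytest.raises(ValueError, match="errors takes only values"):
        check_errors_param(errors)
```

The membership check uses a tuple, as in the PR, so unhashable inputs such as a list are compared by equality and still end in the intended `ValueError`.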
||||||
| """Passing an unsupported value for errors should raise ValueError at init.""" | ||||||
|
Collaborator
pls remove comments from all tests :)
||||||
| with pytest.raises(ValueError, match="errors takes only values"): | ||||||
| CategoricalImputer(imputation_method="frequent", errors="bad_value") | ||||||
|
|
||||||
|
|
||||||
| def test_errors_param_ignored_when_imputation_method_is_missing(): | ||||||
| """errors param has no effect for imputation_method='missing'.""" | ||||||
| df = pd.DataFrame({"city": ["London", np.nan, "Paris"]}) | ||||||
| imputer = CategoricalImputer(imputation_method="missing", errors="warn") | ||||||
| # Should fit without warnings since there's no mode computation | ||||||
| with warnings.catch_warnings(): | ||||||
| warnings.simplefilter("error") | ||||||
|
Collaborator
I am not sure we are testing that there was no warning. Could you pls check?
||||||
| imputer.fit(df) | ||||||
|
|
||||||
|
|
||||||
| def test_errors_ignore_single_variable(): | ||||||
| """errors='ignore' on single multimodal variable — silent, uses first mode.""" | ||||||
| X = pd.DataFrame( | ||||||
|
Collaborator
Do we need this test? The logic of the transformer is tested in previous tests.
||||||
| {"city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"]} | ||||||
| ) | ||||||
| imputer = CategoricalImputer(imputation_method="frequent", errors="ignore") | ||||||
| imputer.fit(X) | ||||||
| assert imputer.imputer_dict_["city"] == X["city"].mode()[0] | ||||||
|
|
||||||
|
|
||||||
| def test_errors_ignore_multiple_variables(): | ||||||
| """errors='ignore' on multiple multimodal variables — silent, uses first mode.""" | ||||||
|
Collaborator
Do we need this test? The logic of the imputation is tested in previous tests.
||||||
| X = pd.DataFrame( | ||||||
|
Collaborator
We should remove df from here and use fixture instead.
||||||
| { | ||||||
| "city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"], | ||||||
| "country": ["UK", "UK", "FR", "FR", "DE", "DE"], | ||||||
| } | ||||||
| ) | ||||||
| imputer = CategoricalImputer(imputation_method="frequent", errors="ignore") | ||||||
| imputer.fit(X) | ||||||
| assert imputer.imputer_dict_["city"] == X["city"].mode()[0] | ||||||
| assert imputer.imputer_dict_["country"] == X["country"].mode()[0] | ||||||
|
|
||||||
|
|
||||||
| # ============================================================================= | ||||||
|
Collaborator
Thanks for highlighting this, but pls remove this comment block.
Author
Removed the comment block.
||||||
| # NEW TESTS — added to fix codecov patch coverage (1 missing + 1 partial line) | ||||||
| # ============================================================================= | ||||||
|
|
||||||
| def test_errors_warn_single_variable_emits_userwarning(): | ||||||
| """ | ||||||
| Covers the warnings.warn() inside the SINGLE-VARIABLE block of fit(). | ||||||
|
|
||||||
| The existing test_errors_warn_emits_userwarning uses multimodal_df (2 columns), | ||||||
|
Collaborator
Thanks for picking this up. Instead of lengthy comments, we should try and capture the essence of the test in the test name.
Author
Yes, will take care of it.
||||||
| which goes through the multi-variable code path. This test uses variables='city' | ||||||
| (a single variable) to hit the separate single-variable warn branch. | ||||||
| """ | ||||||
| X = pd.DataFrame( | ||||||
|
Collaborator
pls use the fixture instead of redefining. You can pass 1 column to fit instead.
||||||
| {"city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"]} | ||||||
| ) | ||||||
| imputer = CategoricalImputer( | ||||||
| imputation_method="frequent", variables="city", errors="warn" | ||||||
| ) | ||||||
| with pytest.warns(UserWarning, match="multiple frequent categories"): | ||||||
| imputer.fit(X) | ||||||
| # First mode is used | ||||||
| assert imputer.imputer_dict_["city"] == X["city"].mode()[0] | ||||||
|
|
||||||
|
|
||||||
| def test_errors_raise_one_multimodal_among_multiple_variables(): | ||||||
| """ | ||||||
| Covers the `varnames_str = varnames[0]` else-branch in the MULTI-VARIABLE block. | ||||||
|
|
||||||
| This branch is reached when multiple variables are selected but only ONE of them | ||||||
| turns out to have multiple modes. The existing tests either raise on all-multimodal | ||||||
| datasets (len(varnames) > 1) or use errors='ignore'/'warn' (skipping the raise). | ||||||
| Here we select two variables where only 'city' is multimodal, triggering the | ||||||
| singular else-branch before the ValueError is raised. | ||||||
| """ | ||||||
| X = pd.DataFrame( | ||||||
| { | ||||||
| # 'city': 3 equally frequent values → multimodal | ||||||
| "city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"], | ||||||
| # 'country': clear single mode (UK appears 3×, others once) | ||||||
| "country": ["UK", "UK", "UK", "FR", "DE", "SE"], | ||||||
| } | ||||||
| ) | ||||||
| imputer = CategoricalImputer(imputation_method="frequent", errors="raise") | ||||||
| with pytest.raises(ValueError, match="city"): | ||||||
| imputer.fit(X) | ||||||




