Skip to content
Open
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
fb230fe
feat(CategoricalImputer): add errors param to handle multimodal varia…
direkkakkar319-ops Mar 8, 2026
81be348
style: fix flake8 line length in CategoricalImputer
direkkakkar319-ops Mar 8, 2026
4fb5b7a
style: fix import order and duplicate pandas import
direkkakkar319-ops Mar 8, 2026
835133f
test: add coverage for errors='ignore' branches
direkkakkar319-ops Mar 8, 2026
81f31d8
style: add missing newline at end of test file
direkkakkar319-ops Mar 8, 2026
657de1f
Changes for codedev tests
direkkakkar319-ops Mar 9, 2026
a0ea71d
added space at last of test_categorical_imputer.py
direkkakkar319-ops Mar 16, 2026
0cdcf03
Revert docs/whats_new/v_190.rst to upstream version
direkkakkar319-ops Mar 26, 2026
cf7670e
changes done to `feature_engine/imputation/categorical.py`
direkkakkar319-ops Mar 26, 2026
fb2f8db
changes made to `tests/test_imputation/test_categorical_imputer.py`
direkkakkar319-ops Mar 26, 2026
97d6053
resolved comment done on R15
direkkakkar319-ops Mar 26, 2026
c454edd
reformated the error tests to match the error from within pytest
direkkakkar319-ops Mar 26, 2026
5992d09
made three tests in on test
direkkakkar319-ops Mar 26, 2026
85b1974
left change
direkkakkar319-ops Mar 26, 2026
09429f3
refaactored the multimodal tests
direkkakkar319-ops Mar 26, 2026
0b86cfa
refactored test_errors_invalid_value_raises
direkkakkar319-ops Mar 26, 2026
45f4e2f
changed the function `test_errors_param_ignored_when_imputation_metho…
direkkakkar319-ops Mar 26, 2026
cda93e7
removed `test_errors_ignore_single_variable` `test_errors_ignore_mult…
direkkakkar319-ops Mar 26, 2026
04be1a0
emove the commented block
direkkakkar319-ops Mar 26, 2026
94643d8
last few changes made
direkkakkar319-ops Mar 26, 2026
ab6ba66
test case style updated
direkkakkar319-ops Mar 26, 2026
6ba7fce
Renamed `errors` to `multimodal` in CategoricalImputer and add missin…
direkkakkar319-ops Mar 27, 2026
1a3fde2
Apply suggestion from @solegalli
direkkakkar319-ops Mar 27, 2026
36eb1dc
Apply suggestion from @solegalli
direkkakkar319-ops Mar 27, 2026
aa37d19
Update categorical.py
direkkakkar319-ops Mar 27, 2026
3e58d8b
removed comments and added tests
direkkakkar319-ops Mar 27, 2026
6746429
Merge branch 'issue-904-categorical-imputer-multimodal' of https://gi…
direkkakkar319-ops Mar 27, 2026
c77e8f1
Update .gitignore
direkkakkar319-ops Mar 27, 2026
a22f586
removed the spaces
direkkakkar319-ops Mar 27, 2026
51f8276
Merge branch 'issue-904-categorical-imputer-multimodal' of https://gi…
direkkakkar319-ops Mar 27, 2026
7156d28
removed the spaces
direkkakkar319-ops Mar 27, 2026
5d65fe8
simplified the test case as asked
direkkakkar319-ops Mar 27, 2026
a95f5e0
simplified the test case as asked
direkkakkar319-ops Mar 27, 2026
6f5b4da
simplified the test case as asked
direkkakkar319-ops Mar 27, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ celerybeat-schedule
# Environments
.env
.venv
.venv_wsl
env/
venv/
ENV/
Expand Down
1 change: 1 addition & 0 deletions docs/whats_new/v_190.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ New transformers
Enhancements
~~~~~~~~~~~~

- Added `errors` parameter to `CategoricalImputer` to handle categorical variables with multiple frequent categories instead of automatically raising a `ValueError`. (`DirekKakkar <https://github.com/DirekKakkar>`_)
Comment thread
direkkakkar319-ops marked this conversation as resolved.
Outdated
- Our variable handling functions now return empty lists when no variables of the desired type are found. (`Soledad Galli <https://github.com/solegalli>`_)

BUG
Expand Down
86 changes: 61 additions & 25 deletions feature_engine/imputation/categorical.py
Original file line number Diff line number Diff line change
@@ -1,33 +1,26 @@
# Authors: Soledad Galli <solegalli@protonmail.com>
# License: BSD 3 clause

import warnings
from typing import List, Optional, Union

import pandas as pd

from feature_engine._check_init_parameters.check_variables import (
_check_variables_input_value,
)
from feature_engine._check_init_parameters.check_variables import \
_check_variables_input_value
from feature_engine._docstrings.fit_attributes import (
_feature_names_in_docstring,
_imputer_dict_docstring,
_n_features_in_docstring,
_variables_attribute_docstring,
)
from feature_engine._docstrings.methods import (
_fit_transform_docstring,
_transform_imputers_docstring,
)
_feature_names_in_docstring, _imputer_dict_docstring,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore to previous format.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restored the previous format

_n_features_in_docstring, _variables_attribute_docstring)
from feature_engine._docstrings.methods import (_fit_transform_docstring,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make format match other imports

_transform_imputers_docstring)
from feature_engine._docstrings.substitute import Substitution
from feature_engine.dataframe_checks import check_X
from feature_engine.imputation.base_imputer import BaseImputer
from feature_engine.tags import _return_tags
from feature_engine.variable_handling import (
check_all_variables,
check_categorical_variables,
find_all_variables,
find_categorical_variables,
)
from feature_engine.variable_handling import (check_all_variables,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please restore to previous format

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restored the previous format

check_categorical_variables,
find_all_variables,
find_categorical_variables)


@Substitution(
Expand Down Expand Up @@ -88,6 +81,18 @@ class CategoricalImputer(BaseImputer):
type object or categorical. If True, the imputer will select all variables or
accept all variables entered by the user, including those cast as numeric.

errors : str, default='raise'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "errors", let's call this parameter "multimodal" so it is immediately obvious what it is about.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made errors to multimodal

Indicates what to do when the selected imputation_method='frequent'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Indicates what to do when the selected imputation_method='frequent'
Indicates what to do when `imputation_method='frequent'`

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied the sugesstion

and a variable has more than 1 mode.

If 'raise', raises a ValueError and stops the fit.

If 'warn', raises a UserWarning and continues, imputing using the
Comment thread
direkkakkar319-ops marked this conversation as resolved.
Outdated
first most frequent category found.

If 'ignore', continues without warnings, imputing using the first
most frequent category found.

Attributes
----------
{imputer_dict_}
Expand Down Expand Up @@ -135,6 +140,7 @@ def __init__(
variables: Union[None, int, str, List[Union[str, int]]] = None,
return_object: bool = False,
ignore_format: bool = False,
errors: str = "raise",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
errors: str = "raise",
multimodal: str = "raise",

) -> None:
if imputation_method not in ["missing", "frequent"]:
raise ValueError(
Expand All @@ -144,11 +150,18 @@ def __init__(
if not isinstance(ignore_format, bool):
raise ValueError("ignore_format takes only booleans True and False")

if errors not in ("raise", "warn", "ignore"):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if errors not in ("raise", "warn", "ignore"):
if not isinstance(errors, str) or errors not in ("raise", "warn", "ignore"):

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied the sugesstion

raise ValueError(
"errors takes only values 'raise', 'warn', or 'ignore'. "
f"Got {errors} instead."
)

self.imputation_method = imputation_method
self.fill_value = fill_value
self.variables = _check_variables_input_value(variables)
self.return_object = return_object
self.ignore_format = ignore_format
self.errors = errors

def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None):
"""
Expand Down Expand Up @@ -189,9 +202,20 @@ def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None):

# Some variables may contain more than 1 mode:
if len(mode_vals) > 1:
raise ValueError(
f"The variable {var} contains multiple frequent categories."
)
if self.errors == "raise":
raise ValueError(
f"The variable {var} contains multiple "
f"frequent categories. Set errors='warn' or "
f"errors='ignore' to allow imputation using "
f"the first most frequent category found."
)
elif self.errors == "warn":
warnings.warn(
f"Variable {var} has multiple frequent "
f"categories. The first category found, "
f"{mode_vals[0]}, will be used for imputation.",
UserWarning,
)

self.imputer_dict_ = {var: mode_vals[0]}

Expand All @@ -208,10 +232,22 @@ def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None):
varnames_str = ", ".join(varnames)
else:
varnames_str = varnames[0]
raise ValueError(
f"The variable(s) {varnames_str} contain(s) multiple frequent "
f"categories."
)

if self.errors == "raise":
raise ValueError(
f"The variable(s) {varnames_str} contain(s) "
f"multiple frequent categories. Set "
f"errors='warn' or errors='ignore' to allow "
f"imputation using the first most frequent "
f"category found."
)
elif self.errors == "warn":
warnings.warn(
f"Variable(s) {varnames_str} have multiple "
f"frequent categories. The first category "
f"found will be used for imputation.",
UserWarning,
)

self.imputer_dict_ = mode_vals.iloc[0].to_dict()

Expand Down
151 changes: 148 additions & 3 deletions tests/test_imputation/test_categorical_imputer.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,23 @@
import warnings

import numpy as np
import pandas as pd
import pytest

from feature_engine.imputation import CategoricalImputer


# --- Shared fixture: perfectly multimodal variable ---
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove comment, the dataframe name is good enough :)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comments

@pytest.fixture
def multimodal_df():
return pd.DataFrame(
{
"city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great to add 1 value to each variable that is not going to be a mode, just in case.
It would also be useful to add 1 variable that contains only 1 mode.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added as said to

            "city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin", "Madrid"],
            "country": ["UK", "UK", "FR", "FR", "DE", "DE", "ES"],
            "one_mode": ["London", "London", "London", "Paris", "Paris", "Berlin", "Berlin"],

"country": ["UK", "UK", "FR", "FR", "DE", "DE"],
}
)


def test_impute_with_string_missing_and_automatically_find_variables(df_na):
# set up transformer
imputer = CategoricalImputer(imputation_method="missing", variables=None)
Expand Down Expand Up @@ -150,14 +164,22 @@ def test_error_when_imputation_method_not_frequent_or_missing():


def test_error_when_variable_contains_multiple_modes(df_na):
msg = "The variable Name contains multiple frequent categories."
msg = (
Comment thread
direkkakkar319-ops marked this conversation as resolved.
Outdated
"The variable Name contains multiple frequent categories. "
"Set errors='warn' or errors='ignore' to allow imputation "
"using the first most frequent category found."
)
imputer = CategoricalImputer(imputation_method="frequent", variables="Name")
with pytest.raises(ValueError) as record:
imputer.fit(df_na)
# check that error message matches
assert str(record.value) == msg

msg = "The variable(s) Name contain(s) multiple frequent categories."
msg = (
"The variable(s) Name contain(s) multiple frequent categories. "
Comment thread
direkkakkar319-ops marked this conversation as resolved.
Outdated
"Set errors='warn' or errors='ignore' to allow imputation "
"using the first most frequent category found."
)
imputer = CategoricalImputer(imputation_method="frequent")
with pytest.raises(ValueError) as record:
imputer.fit(df_na)
Expand All @@ -166,7 +188,11 @@ def test_error_when_variable_contains_multiple_modes(df_na):

df_ = df_na.copy()
df_["Name_dup"] = df_["Name"]
msg = "The variable(s) Name, Name_dup contain(s) multiple frequent categories."
msg = (
"The variable(s) Name, Name_dup contain(s) multiple frequent categories. "
"Set errors='warn' or errors='ignore' to allow imputation "
"using the first most frequent category found."
)
imputer = CategoricalImputer(imputation_method="frequent")
with pytest.raises(ValueError) as record:
imputer.fit(df_)
Expand Down Expand Up @@ -305,3 +331,122 @@ def test_error_when_ignore_format_is_not_boolean(ignore_format):

# check that error message matches
assert str(record.value) == msg


def test_errors_raise_on_multimodal_is_default(multimodal_df):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_errors_raise_on_multimodal_is_default(multimodal_df):
def test_multimodal_raises_errors(multimodal_df):

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion applied

"""Default behaviour: raise ValueError on multimodal variable."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove comment, test name should explain what the test is about

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comments

imputer = CategoricalImputer(imputation_method="frequent")
with pytest.raises(ValueError, match="multiple frequent categories"):
imputer.fit(multimodal_df)


def test_errors_warn_emits_userwarning(multimodal_df):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_errors_warn_emits_userwarning(multimodal_df):
def test_multimodal_raises_warning(multimodal_df):

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion applied

"""errors='warn': UserWarning must be emitted."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comments

imputer = CategoricalImputer(imputation_method="frequent", errors="warn")
with pytest.warns(UserWarning, match="multiple frequent categories"):
imputer.fit(multimodal_df)


def test_errors_warn_uses_first_mode(multimodal_df):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we need this test. We test the correct mode of imputation in previous tests. If necessary, we could add a parametrize to a previous test to check that the result is the same when multimodel=warn or ignore.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the separate tests test_multimodal_raises_warning , test_errors_warn_uses_first_mode , and test_errors_ignore_no_warning_raised with a single, parameterized test: test_multimodal_imputation_result .
This new test uses @pytest.mark.parametrize("errors", ["warn", "ignore"]) to check both behaviors:

test cases are passing
Image

"""errors='warn': imputer_dict_ should contain the first mode."""
imputer = CategoricalImputer(imputation_method="frequent", errors="warn")
with pytest.warns(UserWarning):
imputer.fit(multimodal_df)
expected = multimodal_df["city"].mode()[0]
assert imputer.imputer_dict_["city"] == expected


def test_errors_ignore_no_warning_raised(multimodal_df):
"""errors='ignore': no warnings should be emitted."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove comment. Test name should be clear enough to indicate what's being tested.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comments and will take care of the test names to be good enough to explain themselves

imputer = CategoricalImputer(imputation_method="frequent", errors="ignore")
with warnings.catch_warnings():
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there not a better way to test the absence of a warning?

Something like this:

    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")

        # Call your function
        result = my_function()

        # Assert no warnings were raised
        assert len(w) == 0

or this:

    with pytest.warns(None) as record:
        my_function()

    assert len(record) == 0

Would you mind trying any of those?

Copy link
Copy Markdown
Author

@direkkakkar319-ops direkkakkar319-ops Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have refaactored the multimodal tests in test_categorical_imputer.py to use a more robust approach for verifying the absence of warnings, as requested

tests cases are passing

warnings.simplefilter("error") # Promote all warnings to errors
imputer.fit(multimodal_df) # Should NOT raise
assert imputer.imputer_dict_["city"] == multimodal_df["city"].mode()[0]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was tested before, and it's not specific for this test. Pls remove

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the code block



def test_errors_invalid_value_raises():
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we normally use parametrize to pass more than 1 not accepted value, not just a string, but a number or a boolean, and it should fail for all. Could we reformat?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored test_errors_invalid_value_raises in test_categorical_imputer.py to use @pytest.mark.parametrize as suggested.

test cases are passing
Image

"""Passing an unsupported value for errors should raise ValueError at init."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove comments from all tests :)

with pytest.raises(ValueError, match="errors takes only values"):
CategoricalImputer(imputation_method="frequent", errors="bad_value")


def test_errors_param_ignored_when_imputation_method_is_missing():
"""errors param has no effect for imputation_method='missing'."""
df = pd.DataFrame({"city": ["London", np.nan, "Paris"]})
imputer = CategoricalImputer(imputation_method="missing", errors="warn")
# Should fit without warnings since there's no mode computation
with warnings.catch_warnings():
warnings.simplefilter("error")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure we are testing that there were no warning. Could you pls check?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new approach specifically checks that no warnings containing the message "multiple frequent categories" are raised when imputation_method='missing' , even if errors='warn' is set.

tests cases are passing
Image

imputer.fit(df)


def test_errors_ignore_single_variable():
"""errors='ignore' on single multimodal variable — silent, uses first mode."""
X = pd.DataFrame(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this test? the logic of the transformer is tested in previous tests

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed test_errors_ignore_single_variable and test_errors_ignore_multiple_variables .

test cases are passing
Image

{"city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"]}
)
imputer = CategoricalImputer(imputation_method="frequent", errors="ignore")
imputer.fit(X)
assert imputer.imputer_dict_["city"] == X["city"].mode()[0]


def test_errors_ignore_multiple_variables():
"""errors='ignore' on multiple multimodal variables — silent, uses first mode."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this test? the logic of the imputation is tested in previous tests

X = pd.DataFrame(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should remove df from here and use fixture instead.

{
"city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"],
"country": ["UK", "UK", "FR", "FR", "DE", "DE"],
}
)
imputer = CategoricalImputer(imputation_method="frequent", errors="ignore")
imputer.fit(X)
assert imputer.imputer_dict_["city"] == X["city"].mode()[0]
assert imputer.imputer_dict_["country"] == X["country"].mode()[0]


# =============================================================================
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for highlighting this but pls remove these commented block.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the commented block

# NEW TESTS — added to fix codecov patch coverage (1 missing + 1 partial line)
# =============================================================================

def test_errors_warn_single_variable_emits_userwarning():
"""
Covers the warnings.warn() inside the SINGLE-VARIABLE block of fit().

The existing test_errors_warn_emits_userwarning uses multimodal_df (2 columns),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for picking this up. Instead of lengthy comments, we should try and capture the essence of the test in the test name.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes will take care of it

which goes through the multi-variable code path. This test uses variables='city'
(a single variable) to hit the separate single-variable warn branch.
"""
X = pd.DataFrame(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use fixture instead of re defining. You can pass 1 column to the fit param instead.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved Test Naming
Updated both test_warning_when_single_variable_is_multimodal and
test_errors_raise_when_only_one_variable_is_multimodal
Simplified test_errors_raise_when_only_one_variable_is_multimodal

test cases are passing
Image

{"city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"]}
)
imputer = CategoricalImputer(
imputation_method="frequent", variables="city", errors="warn"
)
with pytest.warns(UserWarning, match="multiple frequent categories"):
imputer.fit(X)
# First mode is used
assert imputer.imputer_dict_["city"] == X["city"].mode()[0]


def test_errors_raise_one_multimodal_among_multiple_variables():
"""
Covers the `varnames_str = varnames[0]` else-branch in the MULTI-VARIABLE block.

This branch is reached when multiple variables are selected but only ONE of them
turns out to have multiple modes. The existing tests either raise on all-multimodal
datasets (len(varnames) > 1) or use errors='ignore'/'warn' (skipping the raise).
Here we select two variables where only 'city' is multimodal, triggering the
singular else-branch before the ValueError is raised.
"""
X = pd.DataFrame(
{
# 'city': 3 equally frequent values → multimodal
"city": ["London", "London", "Paris", "Paris", "Berlin", "Berlin"],
# 'country': clear single mode (UK appears 3×, others once)
"country": ["UK", "UK", "UK", "FR", "DE", "SE"],
}
)
imputer = CategoricalImputer(imputation_method="frequent", errors="raise")
with pytest.raises(ValueError, match="city"):
imputer.fit(X)