-
Notifications
You must be signed in to change notification settings - Fork 260
feat: take pipeline's estimator into account to decide tabular vectorizer options in TabularPipeline #2152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: take pipeline's estimator into account to decide tabular vectorizer options in TabularPipeline #2152
Changes from 1 commit
de8bd3e
2961a8a
e5406db
37fc386
26ad5e6
30a9019
0a78b9f
3f24e2b
5dc4fa5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,7 +1,8 @@ | ||||||
| import pytest | ||||||
| from sklearn import ensemble | ||||||
| from sklearn.impute import SimpleImputer | ||||||
| from sklearn.linear_model import Ridge | ||||||
| from sklearn.linear_model import LogisticRegression, Ridge | ||||||
| from sklearn.pipeline import Pipeline as skpipeline | ||||||
|
MarieSacksick marked this conversation as resolved.
Outdated
|
||||||
| from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder | ||||||
|
|
||||||
| from skrub import ( | ||||||
|
|
@@ -14,7 +15,13 @@ | |||||
|
|
||||||
|
|
||||||
| @pytest.mark.parametrize( | ||||||
| "learner_kind", ["regressor", "regression", "classifier", "classification"] | ||||||
| "learner_kind", | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not a big deal at all but in general please try to avoid changes that are unrelated to your Pull Request to make it easier to review and avoid cluttering the git history |
||||||
| [ | ||||||
| "regressor", | ||||||
| "regression", | ||||||
| "classifier", | ||||||
| "classification", | ||||||
| ], | ||||||
| ) | ||||||
| def test_default_pipeline(learner_kind): | ||||||
| p = tabular_pipeline(learner_kind) | ||||||
|
|
@@ -74,3 +81,16 @@ def test_from_dtype(): | |||||
| ensemble.HistGradientBoostingRegressor(categorical_features="from_dtype") | ||||||
| ) | ||||||
| assert isinstance(p.named_steps["tablevectorizer"].low_cardinality, ToCategorical) | ||||||
|
|
||||||
|
|
||||||
| def test_skpipeline_learner(): | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| original_learner = LogisticRegression() | ||||||
| sk_pipeline = skpipeline([("imputer", SimpleImputer()), ("clf", original_learner)]) | ||||||
| p = tabular_pipeline(sk_pipeline) | ||||||
|
khaoulariad marked this conversation as resolved.
Outdated
|
||||||
| tv, imputer, scaler, learner = (e for _, e in p.steps) | ||||||
| assert learner is original_learner | ||||||
| assert isinstance(tv.high_cardinality, StringEncoder) | ||||||
| assert isinstance(tv.low_cardinality, OneHotEncoder) | ||||||
| assert isinstance(imputer, SimpleImputer) | ||||||
| assert isinstance(scaler, SquashingScaler) | ||||||
|
khaoulariad marked this conversation as resolved.
Outdated
|
||||||
| assert tv.datetime.periodic_encoding == "spline" | ||||||
|
khaoulariad marked this conversation as resolved.
Outdated
|
||||||
Uh oh!
There was an error while loading. Please reload this page.