Dataops quick tour by jeromedockes · Pull Request #2169 · skrub-data/skrub

jeromedockes · 2026-06-15T16:58:49Z

introductory notebook

comes after #2162

moujanrastgoo · 2026-06-17T14:46:50Z

+encoder = skrub.choose_from(
+    {"lse": skrub.StringEncoder(), "minhash": skrub.MinHashEncoder()}, name="encoder"
+)
+pred = employee_data.skb.apply(
+    skrub.TableVectorizer(high_cardinality=encoder)
+).skb.apply(
+    HistGradientBoostingRegressor(
+        learning_rate=skrub.choose_float(0.01, 0.7, log=True, name="learning_rate")
+    ),
+    y=salary,
+)
+print(pred.skb.describe_param_grid())


Hey Jerome!
First of all, this as a whole is much clearer and more demonstrative than what we had before. For this specific part, what do you think about also adding a choose_from with 2 estimators (like HGB and Ridge, or any others you like) to show that this is possible as well? I know it might be too soon to show that option, but it really impressed me when I was learning about it, so it would have been nice to see it from the start.

jeromedockes · Jun 18, 2026

Thanks @moujanrastgoo! You are right that this is important to show. I tried to show it by having a choice of encoder between StringEncoder and MinHashEncoder; do you think it is enough to add a sentence to highlight that? I would like to keep the example simple. Also, while useful, tuning the choice of the final estimator could seem a bit odd, as one might consider those to be 2 different pipelines, evaluate them separately, and make a decision manually 🤔

moujanrastgoo · Jun 18, 2026

I see what you mean! Yes, I agree: I think adding a sentence to highlight it, along with a link to the user guide section explaining nested choices, is enough. Thank you very much!

jeromedockes added 10 commits June 11, 2026 18:34

start improving dataops user guide

926840e

_

1bad33e

_

37aa3e7

add quicktour example

725cbc6

update pixi lock

99dc54f

Merge remote-tracking branch 'upstream/main' into dataops-user-guide

f8ea9b8

_

a06ec12

pixi version

857dbb4

Merge remote-tracking branch 'upstream/main' into dataops-quick-tour

c76ca7f

iter example

85ef979

jeromedockes added the documentation (Add or improve the documentation) and data_ops (Something related to the skrub DataOps) labels Jun 15, 2026

jeromedockes added 4 commits June 15, 2026 19:05

Merge branch 'dataops-user-guide' into dataops-quick-tour

4a07545

doc build

5206864

[doc build]

c49859e

[doc build]

32ba64e

moujanrastgoo reviewed Jun 17, 2026

jeromedockes added 4 commits June 18, 2026 12:03

_

fac6ef1

Merge remote-tracking branch 'upstream/main' into dataops-quick-tour

451096b

update links

f901447

_

a8b338d

jeromedockes marked this pull request as ready for review June 23, 2026 13:50

Dataops quick tour #2169
jeromedockes wants to merge 18 commits into skrub-data:main from jeromedockes:dataops-quick-tour

jeromedockes commented Jun 15, 2026

moujanrastgoo Jun 17, 2026

jeromedockes Jun 18, 2026

moujanrastgoo Jun 18, 2026

2 participants
