Dataops quick tour #2169
encoder = skrub.choose_from(
    {"lse": skrub.StringEncoder(), "minhash": skrub.MinHashEncoder()}, name="encoder"
)
pred = employee_data.skb.apply(
    skrub.TableVectorizer(high_cardinality=encoder)
).skb.apply(
    HistGradientBoostingRegressor(
        learning_rate=skrub.choose_float(0.01, 0.7, log=True, name="learning_rate")
    ),
    y=salary,
)
print(pred.skb.describe_param_grid())
Hey Jerome!
First of all, this as a whole is way clearer and more demonstrative than what we had before. For this specific part, what do you think about also adding a choose_from with 2 estimators (like HGB and Ridge, or any other one that you like) to show that this is possible as well? I know it might be too soon to show that option, but when I was learning about it, it really impressed me, so it would have been nice to see it from the start.
Thanks @moujanrastgoo! You are right that this is important to show. I tried to show it by having a choice of the encoder between StringEncoder and MinHashEncoder; do you think it is enough to add a sentence to highlight that? I would like to keep the example simple. Also, while useful, tuning the choice of the final estimator could seem a bit odd, as one might consider those to be 2 different pipelines, evaluate them separately, and make the decision manually 🤔
I see what you mean! Yes, I agree, I think adding a sentence to highlight it, along with the link to the section explaining nested choices in the user guide is enough. Thank you very much!
introductory notebook
comes after #2162