Add example: working with databases using SQLite and Ibis [Doc] by PredictiveManish · Pull Request #1866 · skrub-data/skrub

PredictiveManish · 2026-01-27T19:32:09Z

This PR adds a new example demonstrating how to use skrub DataOps with database-backed data.
Fixes #1322
The example uses SQLite and Ibis to:

express joins and filtering at the database level
materialize the result once into pandas
build a skrub DataOps plan on top of the resulting DataFrame
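
As a rough illustration of the first two steps listed above (this is not the code from the PR), the sketch below pushes a join and a filter into SQLite and materializes the result once. It uses the standard library's sqlite3 with raw SQL in place of Ibis so that it runs without extra dependencies, and it stops before the pandas and skrub steps. The table names, columns, and values are hypothetical.

```python
import sqlite3

# Hypothetical toy schema; the tables used in the PR's example may differ.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'FR'), (2, 'IN'), (3, 'FR');
    INSERT INTO orders VALUES
        (10, 1, 25.0), (11, 2, 40.0), (12, 3, 5.0), (13, 1, 60.0);
    """
)

# The join and the filter both run inside SQLite; only matching rows leave it.
query = """
    SELECT o.order_id, c.country, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE c.country = ? AND o.amount >= ?
    ORDER BY o.order_id
"""

# Materialize the result once; later steps work on this in-memory copy.
rows = con.execute(query, ("FR", 20.0)).fetchall()
con.close()

print(rows)
```

With Ibis, the SQL string would be replaced by a table expression that is compiled to SQL and executed by the backend, but the shape of the workflow is the same: the database does the relational work and the Python side receives one result.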

The goal is to illustrate a realistic database-first workflow aligned with skrub’s vision, without focusing on model performance or deployment.

I’m happy to iterate on the scope, structure, or level of detail if you’d prefer a narrower or simpler example.

PredictiveManish · 2026-01-27T21:04:22Z

I can't tell why the checks were cancelled. Please let me know if there is anything I can do to fix this. @rcap107

rcap107 · 2026-01-29T09:36:11Z

Checks were failing because of issues not related to your PR @PredictiveManish

For the time being, you can ignore the tests failing. #1855 will fix the problem once it gets merged.

PredictiveManish · 2026-01-29T09:46:07Z

Thanks for the clarification!
I’ll wait for #1855 to merge.

PredictiveManish · 2026-02-10T16:14:34Z

@rcap107 The pending PR is merged now! I hope we can move forward with this PR.

rcap107 · 2026-02-13T15:49:07Z

Hi @PredictiveManish, yes, now there shouldn't be issues with the CI. I'll try to give a preliminary review at the start of next week. Thanks!

PredictiveManish · 2026-02-13T15:52:52Z

Yes, two checks are failing. I'll recheck and update if the problem is on my side.

rcap107 · 2026-02-13T16:26:31Z

Checks are failing because you need to specify in pyproject.toml that ibis should be an extra dependency.

The fact that you can run the example locally makes me think that you're using a local virtual environment with packages that are different from what is used on the CI.

The easiest way to test this is by installing pixi (https://pixi.prefix.dev/latest/installation/) and running the command pixi run -e doc build-doc. This is the same command that is run on the CI, so you can make sure that it runs on your machine before pushing to the repo. You can look at #1880 as an example, or check out the documentation: https://skrub-data.org/stable/tutorial_example.html#tutorial-write-example


rcap107 · 2026-02-16T15:30:33Z

Hi @PredictiveManish, I addressed the problem with the missing dependency, so the CI should now run.

Overall, I quite like the example: I think the idea of focusing on the database operations is sound given the subject. However, the example has a problem: it misses the point of using DataOps throughout the pipeline. Indeed, it's possible to wrap all the database operations in DataOps by defining variables and deferred functions from the start. This should be reflected in the example: the database operations should be wrapped in deferred functions.

Another improvement that may be made after merging this PR would be replacing the boilerplate needed to create the database with an actual database. That will be for a separate PR, however.

Do you think you can update this example so that DataOps are used to wrap all the database operations done with Ibis? You'll find information about deferred functions in the user guide and in some of the other examples, and if you have more questions you can also ask here.

Otherwise, we can refine this version a bit, and then the example will be updated in a separate PR.

PredictiveManish · 2026-02-16T15:35:03Z

@rcap107 Sure, I'll work on this and improve it. This week is fully packed because I'm attending the India AI Impact Summit, which takes up the whole day, so I'll get to this after this week. Sorry for the delay; I'll cover all the points you raised.

Yes, I can work on these. I'll spend more time on it and align the example with what the issue actually asks for. Thanks for the explanation.

rcap107 · 2026-02-16T15:41:06Z

There's no hurry to merge this, take your time and work on the PR when you can. Enjoy the summit 👍

PredictiveManish · 2026-02-16T15:46:41Z

Thanks for the words!

rcap107 · 2026-03-06T12:14:51Z

Hi @PredictiveManish, are you still working on this PR?

PredictiveManish added 2 commits January 28, 2026 00:55

Added database example using SQLite and Ibis for DataOps

05586e0

Slight docs lang alignment

0fa0d4d

PredictiveManish mentioned this pull request Jan 27, 2026

Add an example with a database #1322

Open

PredictiveManish marked this pull request as ready for review January 27, 2026 21:01

commiting for restart testing

6c284bd

rcap107 added 2 commits February 16, 2026 15:45

Merge remote-tracking branch 'upstream/HEAD' into pr/PredictiveManish…

5d2f0ba

…/1866

adding missing dependency

e9f2d31

rcap107 added the stalled label (This PR hasn't seen activity in some time and may be closed or handled by maintainers) Mar 30, 2026

Add example: working with databases using SQLite and Ibis [Doc] #1866
PredictiveManish wants to merge 5 commits into skrub-data:main from PredictiveManish:db-example
