Add example: working with databases using SQLite and Ibis [Doc] #1866

Open
PredictiveManish wants to merge 5 commits into skrub-data:main from PredictiveManish:db-example


@PredictiveManish

Contributor

This PR adds a new example demonstrating how to use skrub DataOps with database-backed data.
Fixes #1322
The example uses SQLite and Ibis to:

  • express joins and filtering at the database level

  • materialize the result once into pandas

  • build a skrub DataOps plan on top of the resulting DataFrame
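
The first two steps above can be sketched as follows. This is a minimal illustration, not the code from the PR: it uses the standard-library sqlite3 module and plain SQL in place of Ibis expressions, and the table names, column names, and data are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database; tables, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'FR'), (2, 'IN'), (3, 'FR');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 5.0), (12, 3, 40.0), (13, 1, 8.0);
    """
)

# The join and the filter are executed by SQLite, not by pandas.
query = """
    SELECT o.order_id, o.amount, c.country
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE o.amount >= 10
    ORDER BY o.order_id
"""

# Materialize the result once; later steps work on this DataFrame.
orders_df = pd.read_sql_query(query, conn)
conn.close()

print(orders_df)
```

The resulting DataFrame could then be handed to skrub, for instance as the value of a `skrub.var`, which corresponds to the third step.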

The goal is to illustrate a realistic database-first workflow aligned with skrub’s vision, without focusing on model performance or deployment.

I’m happy to iterate on the scope, structure, or level of detail if you’d prefer a narrower or simpler example.

PredictiveManish marked this pull request as ready for review on January 27, 2026 21:01
@PredictiveManish

Contributor Author

I can't tell why the checks were cancelled. Please let me know if there is anything I can do to fix this. @rcap107

@rcap107

rcap107 commented Jan 29, 2026

Member

Checks were failing because of issues not related to your PR @PredictiveManish

For the time being, you can ignore the failing tests. #1855 will fix the problem once it gets merged.

@PredictiveManish

Contributor Author

Thanks for the clarification!
I’ll wait for #1855 to merge.

@PredictiveManish

Contributor Author

@rcap107 The pending PR is merged now! I hope we can move forward with this PR.

@rcap107

rcap107 commented Feb 13, 2026

Member

Hi @PredictiveManish, yes, now there shouldn't be issues with the CI. I'll try to give a preliminary review at the start of next week. Thanks!

@PredictiveManish

Contributor Author

Yes, two checks are failing. I will recheck and update the PR if the problem is on my side.

@rcap107

rcap107 commented Feb 13, 2026

Member

Checks are failing because you need to specify in pyproject.toml that ibis should be an extra dependency.

The fact that you can run the example locally makes me think that you're using a local virtual environment with packages that are different from what is used on the CI.

The easiest way to test this is by installing pixi (https://pixi.prefix.dev/latest/installation/) and running the command pixi run -e doc build-doc. This is the same command that is run on the CI, so you can make sure that it runs on your machine before pushing to the repo. You can look at #1880 as an example, or check out the documentation at https://skrub-data.org/stable/tutorial_example.html#tutorial-write-example

@rcap107

rcap107 commented Feb 16, 2026

Member

Hi @PredictiveManish, I addressed the problem with the missing dependency, so now the CI should run.

Overall, I quite like the example: I think the idea of focusing on the database operations is sound given the subject. However, the example has a problem: it is missing the point of using the data ops throughout the pipeline. Indeed, it's possible to wrap all the database operations in DataOps by defining variables and deferred functions from the start. This should be reflected in the example: the database operations should be wrapped in deferred functions.

Another improvement that may be made after merging this PR would be replacing the boilerplate needed to create the database with an actual database. That will be for a separate PR, however.

Do you think you can update this example so that DataOps are used to wrap all the database operations done with Ibis? You'll find information about the deferred functions in the user guide and in some of the other examples, and if you have more questions you can also ask here.

Otherwise, we can refine this version a bit, and then the example will be updated in a separate PR.

@PredictiveManish

Contributor Author

@rcap107 Sure, I will work on this and improve it. This week is fully packed because I'm attending the India AI Impact Summit, which takes up the whole day, so I will pick this up after this week. Sorry for the delay; I will cover all the points you raised.

Yes, I can work on these. I'll spend more time on it and make the example fully aligned with what the issue actually asked for. Thanks for the explanation.

@rcap107

rcap107 commented Feb 16, 2026

Member

There's no hurry to merge this, take your time and work on the PR when you can. Enjoy the summit 👍

@PredictiveManish

Contributor Author

Thanks for the kind words!

@rcap107

rcap107 commented Mar 6, 2026

Member

Hi @PredictiveManish, are you still working on this PR?

rcap107 added the "stalled" label (This PR hasn't seen activity in some time and may be closed or handled by maintainers) on Mar 30, 2026
Labels

stalled This PR hasn't seen activity in some time and may be closed or handled by maintainers

Development

Successfully merging this pull request may close these issues.

Add an example with a database

2 participants