Add example: working with databases using SQLite and Ibis [Doc]#1866
PredictiveManish wants to merge 5 commits into
I can't understand why the checks were cancelled. Please let me know if there is anything I can do to fix this. @rcap107
Checks were failing because of issues not related to your PR, @PredictiveManish. For the time being, you can ignore the failing tests. #1855 will fix the problem once it gets merged.
Thanks for the clarification!
@rcap107 The pending PR is merged now! I hope we can move forward with this PR.
Hi @PredictiveManish, yes, now there shouldn't be issues with the CI. I'll try to give a preliminary review at the start of next week. Thanks!
Yes, two checks are failing. I will recheck and update the PR if there's any problem on my side.
Checks are failing because you need to specify in The fact that you can run the example locally makes me think that you're using a local virtual environment with packages that are different from what is used on the CI. The easiest way to test this is by installing pixi (https://pixi.prefix.dev/latest/installation/) and running the command
Hi @PredictiveManish, I addressed the problem with the missing dependency, so now the CI should run. Overall, I quite like the example: I think the idea of focusing on the database operations is sound given the subject. However, the example has a problem: it misses the point of using DataOps throughout the pipeline. Indeed, it's possible to wrap all the database operations in DataOps by defining variables and deferred functions from the start. This should be reflected in the example: the database operations should be wrapped in deferred functions. Another improvement that could be made after merging this PR would be replacing the boilerplate needed to create the database with an actual database. That will be for a separate PR, however. Do you think you can update this example so that DataOps are used to wrap all the database operations done with Ibis? You'll find information about deferred functions in the user guide and in some of the other examples, and if you have more questions you can also ask here. Otherwise, we can refine this version a bit, and then the example will be updated in a separate PR.
@rcap107 Sure, I will work on this and improve it. This week is fully packed because I'm attending the India AI Impact Summit, which takes up the whole day, so I'll address this after this week. Sorry for the delay. I'll cover all the points you raised and spend more time making the example fully aligned with what the issue actually asked for. Thanks for the explanation.
There's no hurry to merge this; take your time and work on the PR when you can. Enjoy the summit 👍
Thanks for the kind words!
Hi @PredictiveManish, are you still working on this PR?
This PR adds a new example demonstrating how to use skrub DataOps with database-backed data.
Fixes #1322
The example uses SQLite and Ibis to:
- express joins and filtering at the database level
- materialize the result once into pandas
- build a skrub DataOps plan on top of the resulting DataFrame
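The first two steps above can be sketched with the standard library alone. This is a minimal sketch, not the code from the PR: raw SQL through `sqlite3` is swapped in for Ibis, a list of tuples stands in for the pandas DataFrame, and the `customers` and `orders` tables are hypothetical.

```python
import sqlite3

# Hypothetical toy schema, invented for this sketch.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL
    );
    INSERT INTO customers VALUES (1, 'FR'), (2, 'IN'), (3, 'FR');
    INSERT INTO orders VALUES
        (10, 1, 25.0), (11, 2, 5.0), (12, 3, 40.0), (13, 1, 8.0);
    """
)

# The join and the filter are expressed in SQL, so SQLite does the work
# and only the matching rows leave the database.
query = """
    SELECT o.order_id, c.country, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.amount >= ?
    ORDER BY o.order_id
"""

# Materialize the result once; later steps reuse `rows` without
# querying the database again.
rows = con.execute(query, (10.0,)).fetchall()
con.close()

print(rows)  # [(10, 'FR', 25.0), (12, 'FR', 40.0)]
```

Only two of the four orders pass the filter, so only those two rows are transferred out of SQLite. The same principle applies when the query is built with Ibis expressions instead of a SQL string.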
The goal is to illustrate a realistic database-first workflow aligned with skrub’s vision, without focusing on model performance or deployment.
I’m happy to iterate on the scope, structure, or level of detail if you’d prefer a narrower or simpler example.