32 commits
18ba895
Adding extensions for generating md version of docs
rcap107 Jun 17, 2026
7067138
Merge remote-tracking branch 'upstream/HEAD' into doc-improve-agentic…
rcap107 Jun 18, 2026
d022a01
changing llms library
rcap107 Jun 18, 2026
209a68c
updating build to ship documentation
rcap107 Jun 18, 2026
b435e39
changelog
rcap107 Jun 18, 2026
0172334
avoiding duplication, renaming folder
rcap107 Jun 18, 2026
b24ce6a
[doc-build]
rcap107 Jun 18, 2026
1b90ce7
pyproject
rcap107 Jun 18, 2026
62dfcbb
Merge remote-tracking branch 'upstream/HEAD' into doc-improve-agentic…
rcap107 Jun 18, 2026
15dcc08
cleanup of doc install
rcap107 Jun 18, 2026
8d15e9d
adding manifest file to clean up wheel
rcap107 Jun 18, 2026
6ab36e8
improvements
rcap107 Jun 18, 2026
76dfc61
more references to applytocols/selectors
rcap107 Jun 18, 2026
8f4e62d
clean up build
rcap107 Jun 19, 2026
4e577f9
updating build process
rcap107 Jun 22, 2026
95e922f
removing setup
rcap107 Jun 22, 2026
1f5d2a5
changelog
rcap107 Jun 22, 2026
4f732fc
removing manifest
rcap107 Jun 22, 2026
f78dac8
avoiding exec
rcap107 Jun 22, 2026
3c748c2
improving tv/applytocol docs
rcap107 Jun 22, 2026
d05cb0d
commenting out a test
rcap107 Jun 22, 2026
b8ec156
cleanup
rcap107 Jun 23, 2026
b8549dc
Merge remote-tracking branch 'upstream/HEAD' into doc-improve-agentic…
rcap107 Jun 23, 2026
aa749a9
lock file
rcap107 Jun 23, 2026
918906d
testing build
rcap107 Jun 23, 2026
ef17076
excluding py files from test collection
rcap107 Jun 23, 2026
6519659
moving doc files to _docs
rcap107 Jun 23, 2026
6cf2470
Merge remote-tracking branch 'upstream/HEAD' into doc-improve-agentic…
rcap107 Jun 25, 2026
e9f4ff7
updating examples
rcap107 Jun 25, 2026
6a19cf7
updating changelog
rcap107 Jun 25, 2026
79b2690
restoring old files
rcap107 Jun 25, 2026
6f96af6
some comments
rcap107 Jun 25, 2026
19 changes: 19 additions & 0 deletions .gitignore

Member Author

Adding the files copied from the skrub/_docs folder to .gitignore so they don't get counted twice (like the changelog).

@@ -67,6 +67,25 @@ doc/CHANGES.rst
 doc/RELEASE_PROCESS.rst
 doc/CONTRIBUTING.rst
 doc/sg_execution_times.rst
+# RST content files synced from skrub/_docs at build time (conf.py)
+doc/about.rst
+doc/column_level_featurizing.rst
+doc/data_ops.rst
+doc/default_wrangling.rst
+doc/development.rst
+doc/documentation.rst
+doc/exploring_a_dataframe.rst
+doc/howto.rst
+doc/index.rst
+doc/install.rst
+doc/joining_dataframes.rst
+doc/learning_materials.rst
+doc/multi_column_operations.rst
+doc/tutorial_example.rst
+doc/vision.rst
+doc/guides/
+doc/modules/
+doc/tutorials/
 .DS_Store
 doc/_templates/demo_table_report_generated.html
 doc/reference/*.rst
5 changes: 5 additions & 0 deletions CHANGES.rst
@@ -65,6 +65,11 @@ Changes
   :pr:`2048` by :user:`Riccardo Cappuzzo <rcap107>`.
 - The minimum required version of matplotlib has been increased from 3.4.3 to 3.6.1.
   :pr:`2159` by :user:`Riccardo Cappuzzo <rcap107>`.
+- The package build has been updated to include the user guide and examples with
+  the package, so that it is now possible to access it directly from the wheel
+  rather than having to rely on the online docs. Docs and examples are now stored
+  in ``skrub/_docs``, rather than in the root of the repository.
+  :pr:`2173` by :user:`Riccardo Cappuzzo <rcap107>`.
 
 Bugfixes
 --------
11 changes: 9 additions & 2 deletions README.rst
@@ -17,8 +17,7 @@ skrub
 .. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
 
 
-**skrub** (formerly *dirty_cat*) is a Python
-library that facilitates machine learning with dataframes.
+**skrub** is a Python library that facilitates machine learning with dataframes.
 
 If you like the package, spread the word and ⭐ this repository!
 You can also join the `Discord server <https://discord.gg/ABaPnm7fDC>`_.
@@ -28,6 +27,14 @@ Website: https://skrub-data.org/
 See our `examples <https://skrub-data.org/stable/auto_examples>`_, or check out
 the `learning materials <https://skrub-data.org/skrub-materials/index.html>`_.
 
+Documentation and examples are bundled with the package itself, in
+``skrub/_docs``. After installing, you can find it at:
+
+.. code-block:: python
+
+    import skrub
+    print(skrub.__docs_dir__)
+
 Installation
 ------------
 
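A minimal sketch of how the bundled files could be browsed once the docs directory is known. The helper name `list_bundled_docs` is hypothetical and not part of this PR. It takes the directory as an argument so it runs without skrub installed. With a build that includes this change, the README hunk above suggests `skrub.__docs_dir__` is the value to pass.

```python
from pathlib import Path


def list_bundled_docs(docs_dir, suffix=".rst"):
    """Return files under ``docs_dir`` with the given suffix, as sorted relative paths.

    Hypothetical helper for illustration; it is not provided by skrub.
    """
    root = Path(docs_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob(f"*{suffix}")
        if path.is_file()
    )


# Assumed usage, only valid if the installed skrub exposes ``__docs_dir__``
# as described in the README change:
#
#     import skrub
#     for name in list_bundled_docs(skrub.__docs_dir__):
#         print(name)
```

Taking the path as a parameter keeps the helper independent of how the attribute ends up being named or exposed.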
22 changes: 19 additions & 3 deletions doc/Makefile
@@ -29,24 +29,40 @@ html:
 	rm -rf $(BUILDDIR)/html/_images
 	#rm -rf _build/doctrees/
 	SKB_TABLE_REPORT_VERBOSITY=0 $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+	# Build markdown sources so llms.txt links point to .md files
+	SKB_TABLE_REPORT_VERBOSITY=0 $(SPHINXBUILD) -b markdown $(ALLSPHINXOPTS) $(BUILDDIR)/markdown
+	cp -r $(BUILDDIR)/markdown/. $(BUILDDIR)/html/_sources/
 	@echo
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 
 html-noplot:
-	SKB_TABLE_REPORT_VERBOSITY=0 $(SPHINXBUILD) -D plot_gallery=0 -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+	SKB_TABLE_REPORT_VERBOSITY=0 SKIP_JUPYTERLITE=1 $(SPHINXBUILD) -D markdown_uri_doc_suffix="html.md" -D plot_gallery=0 -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+	# Build markdown sources so llms.txt links point to .md files
+	SKB_TABLE_REPORT_VERBOSITY=0 SKIP_JUPYTERLITE=1 $(SPHINXBUILD) -D plot_gallery=0 -b markdown $(ALLSPHINXOPTS) $(BUILDDIR)/markdown
+	cp -r $(BUILDDIR)/markdown/. $(BUILDDIR)/html/_sources/
 	@echo
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 
 linkcheck:
-	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
+	SKB_TABLE_REPORT_VERBOSITY=0 $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
 	@echo
 	@echo "Linkcheck finished. Results are in $(BUILDDIR)/linkcheck."
 
 linkcheck-noplot:
-	$(SPHINXBUILD) -D plot_gallery=0 -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck-noplot
+	SKB_TABLE_REPORT_VERBOSITY=0 SKIP_JUPYTERLITE=1 $(SPHINXBUILD) -D plot_gallery=0 -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck-noplot
 	@echo
 	@echo "Linkcheck (no plot) finished. Results are in $(BUILDDIR)/linkcheck-noplot."
 
+markdown:
+	SKB_TABLE_REPORT_VERBOSITY=0 $(SPHINXBUILD) -b markdown $(ALLSPHINXOPTS) $(BUILDDIR)/markdown
+	@echo
+	@echo "Markdown build finished. The markdown files are in $(BUILDDIR)/markdown."
+
+markdown-noplot:
+	SKB_TABLE_REPORT_VERBOSITY=0 SKIP_JUPYTERLITE=1 $(SPHINXBUILD) -D plot_gallery=0 -b markdown $(ALLSPHINXOPTS) $(BUILDDIR)/markdown
+	@echo
+	@echo "Markdown build (no plot) finished. The markdown files are in $(BUILDDIR)/markdown."
+
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
67 changes: 51 additions & 16 deletions doc/conf.py
@@ -24,6 +24,13 @@
 
 import jinja2
 
+# Allow skipping jupyterlite to speed up builds (e.g. html-noplot)
+_SKIP_JUPYTERLITE = os.environ.get("SKIP_JUPYTERLITE", "").strip() in (
+    "1",
+    "true",
+    "yes",
+)
+
 # Generate the table report html file for the homepage
 sys.path.append(os.path.relpath("."))
 from data_ops_report import create_data_ops_report
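A note on the flag parsing in the hunk above: the comparison is case-sensitive, so `SKIP_JUPYTERLITE=True` or `SKIP_JUPYTERLITE=YES` would not enable the skip. Below is a minimal sketch of a case-insensitive variant. The helper name `env_flag` and its `environ` parameter are hypothetical, added here only to make the behaviour easy to check. This is not the code in the PR.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean switch.

    Hypothetical helper: ``environ`` defaults to ``os.environ`` and can be
    replaced by a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    # Unlike the hunk above, the value is lower-cased before the comparison,
    # so "True", "YES" and "1" are all treated as enabling the flag.
    return raw.strip().lower() in {"1", "true", "yes"}


# Assumed usage in conf.py:
#     _SKIP_JUPYTERLITE = env_flag("SKIP_JUPYTERLITE")
```

Whether the extra leniency is wanted is a design choice. The Makefile targets in this PR only ever pass `1`, which both versions accept.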
@@ -43,14 +50,33 @@
 from github_link import make_linkcode_resolve
 from sphinx_gallery.notebook import add_code_cell, add_markdown_cell
 
-# -- Copy files for docs --------------------------------------------------
+# -- Sync documentation source files from skrub/_docs --------------------
 #
-# We avoid duplicating the information, but we do not use symlinks to be
-# able to build the docs on Windows
+# skrub/_docs is the single source of truth for all guide/content RST files
+# so they are packaged with the wheel. We copy them into doc/ at build time
+# rather than using symlinks (to support Windows builds).
+#
+# CHANGES.rst, CONTRIBUTING.rst and RELEASE_PROCESS.rst are canonical in the
+# project root and are NOT stored in skrub/_docs.
 shutil.copyfile("../RELEASE_PROCESS.rst", "RELEASE_PROCESS.rst")
 shutil.copyfile("../CHANGES.rst", "CHANGES.rst")
 shutil.copyfile("../CONTRIBUTING.rst", "CONTRIBUTING.rst")
 
+_docs_src = Path("../skrub/_docs")
+
+# Copy top-level RST content files
+_skip_toplevel = {"CHANGES.rst", "CONTRIBUTING.rst", "RELEASE_PROCESS.rst"}
+for _rst_file in _docs_src.glob("*.rst"):
+    if _rst_file.name not in _skip_toplevel:
+        shutil.copyfile(_rst_file, _rst_file.name)
+
+# Copy content subdirectories (guides, modules)
+for _subdir in ["guides", "modules"]:
+    shutil.copytree(_docs_src / _subdir, _subdir, dirs_exist_ok=True)
+
+# Copy tutorials source files for sphinx-gallery
+shutil.copytree(_docs_src / "tutorials", "tutorials", dirs_exist_ok=True)
+
 # -- General configuration ------------------------------------------------
 
 # If your documentation needs a minimal Sphinx version, state it here.
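The sync step in the hunk above can be sketched as a small standalone function, which makes it possible to exercise against temporary directories. The name `sync_docs` and its parameters are hypothetical. The PR performs the copies inline at module level in `conf.py`.

```python
import shutil
from pathlib import Path


def sync_docs(src, dst, skip=(), subdirs=()):
    """Copy top-level ``*.rst`` files and selected subdirectories from src to dst.

    Hypothetical helper mirroring the inline logic in conf.py. Returns the
    names of the copied entries, with a trailing "/" for directories.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for rst_file in sorted(src.glob("*.rst")):
        if rst_file.name in skip:
            continue
        shutil.copyfile(rst_file, dst / rst_file.name)
        copied.append(rst_file.name)
    for name in subdirs:
        # dirs_exist_ok (Python 3.8+) lets a repeated build overwrite in place
        # instead of failing because the destination already exists.
        shutil.copytree(src / name, dst / name, dirs_exist_ok=True)
        copied.append(name + "/")
    return copied
```

One property worth keeping in mind with this approach: `copytree(..., dirs_exist_ok=True)` overwrites existing files but does not delete files that were removed from the source, so a file deleted from `skrub/_docs` would linger in `doc/` until the copy is cleaned manually.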
@@ -76,27 +102,36 @@
     "sphinx_copybutton",
     "sphinx_gallery.gen_gallery",
     "autoshortsummary",
+    "sphinx_llms_txt",
+    "sphinx_markdown_builder",
 ]
 
+# -- sphinx-llms-txt configuration -------------------------------------------
+# Link to Markdown sources in _sources/ (generated by the markdown builder).
+llms_txt_uri_template = "{base_url}_sources/{docname}.md"
+
 try:
     import sphinxext.opengraph  # noqa
 
     extensions.append("sphinxext.opengraph")
 except ImportError:
     print("ERROR: sphinxext.opengraph import failed")
 
-try:
-    import jupyterlite_sphinx  # noqa: F401
-
-    extensions.append("jupyterlite_sphinx")
-    with_jupyterlite = True
-except ImportError:
-    # In some cases we don't want to require jupyterlite_sphinx to be installed,
-    # e.g. the doc-min-dependencies build
-    warnings.warn(
-        "jupyterlite_sphinx is not installed, you need to install it "
-        "if you want JupyterLite links to appear in each example"
-    )
+if not _SKIP_JUPYTERLITE:
+    try:
+        import jupyterlite_sphinx  # noqa: F401
+
+        extensions.append("jupyterlite_sphinx")
+        with_jupyterlite = True
+    except ImportError:
+        # In some cases we don't want to require jupyterlite_sphinx to be installed,
+        # e.g. the doc-min-dependencies build
+        warnings.warn(
+            "jupyterlite_sphinx is not installed, you need to install it "
+            "if you want JupyterLite links to appear in each example"
+        )
+        with_jupyterlite = False
+else:
     with_jupyterlite = False
 
 import sphinx_autosummary_accessors
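To illustrate what the `llms_txt_uri_template` value in the hunk above is meant to produce, here is a small sketch that fills the two placeholders with `str.format`. How sphinx-llms-txt expands the template internally has not been checked here, so plain `str.format` is an assumption. The helper name `render_uri`, the base URL and the docname are all hypothetical.

```python
def render_uri(template, base_url, docname):
    """Fill a ``{base_url}``/``{docname}`` template using ``str.format``."""
    # The template concatenates base_url directly with "_sources/", so make
    # sure the base URL ends with a slash before substituting.
    if not base_url.endswith("/"):
        base_url += "/"
    return template.format(base_url=base_url, docname=docname)


template = "{base_url}_sources/{docname}.md"
# Hypothetical base URL and document name, for illustration only.
print(render_uri(template, "https://example.org/docs", "guides/index"))
```

Under these assumptions each entry points at a file under `_sources/`, which is where the Makefile targets in this PR copy the output of the markdown builder.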
@@ -480,7 +515,7 @@ def call_garbage_collector(gallery_conf, fname):
         # See https://sphinx-gallery.github.io/stable/configuration.html#link-to-documentation # noqa
     },
     "filename_pattern": ".*",
-    "examples_dirs": ["../examples", "tutorials"],
+    "examples_dirs": ["../skrub/_docs/examples", "tutorials"],
     "gallery_dirs": ["auto_examples", "auto_tutorials"],
     "within_subsection_order": FileNameSortKey,
     "download_all_examples": False,