skrub-data · rcap107 · Jun 25, 2026
diff --git a/doc/guides/table_report/02_exporting.rst b/doc/guides/table_report/02_exporting.rst
@@ -3,8 +3,8 @@
 .. |column_associations| replace:: :func:`~skrub.column_associations`
 
 .. _user_guide_table_report_sharing:
-How to export and share the |TableReport|
------------------------------------------
+How to export and share the |TableReport| for use by other tools
+----------------------------------------------------------------
 
 The |TableReport| is generated as a standalone HTML file that includes the report
 data, the plots, and the Javascript necessary to provide interactivity.
@@ -31,7 +31,8 @@ respectively.
 
 The report can be exported in JSON format, which allows structured
 access to the data and statistics used to build the report with
-:func:`~skrub.TableReport.json`.
+:func:`~skrub.TableReport.json`. The schema of the JSON data is described in
+:ref:`table_report_json_schema`.
 
 .. code-block::
 

diff --git a/doc/modules/data_ops/basics/control_flow.rst b/doc/modules/data_ops/basics/control_flow.rst
@@ -168,9 +168,9 @@ Finally, there are other situations where using :func:`deferred` can be helpful:
 
 .. rubric:: Examples
 
-- See :ref:`sphx_glr_auto_examples_data_ops_1110_data_ops_intro.py` for an introductory
+- See :ref:`sphx_glr_auto_tutorials_1110_data_ops_intro.py` for an introductory
   example on how to use skrub DataOps on a single dataframe.
-- See :ref:`sphx_glr_auto_examples_data_ops_1120_multiple_tables.py` for an example
+- See :ref:`sphx_glr_auto_examples_02_data_ops_1120_multiple_tables.py` for an example
   of how skrub DataOps can be used to process multiple tables using dataframe APIs.
-- See :ref:`sphx_glr_auto_examples_data_ops_1130_choices.py` for an example of
+- See :ref:`sphx_glr_auto_examples_02_data_ops_1130_choices.py` for an example of
   hyper-parameter tuning using skrub DataOps.
diff --git a/doc/modules/data_ops/ml_pipeline/applying_different_transformers.rst b/doc/modules/data_ops/ml_pipeline/applying_different_transformers.rst
@@ -150,4 +150,4 @@ to obtain the final result:
 
 More info on advanced column selection and manipulation be found in
 :ref:`user_guide_selectors` and example
-:ref:`sphx_glr_auto_examples_0090_apply_to_cols.py`.
+:ref:`sphx_glr_auto_examples_0010_apply_to_cols.py`.
diff --git a/doc/modules/data_ops/ml_pipeline/subsampling_data.rst b/doc/modules/data_ops/ml_pipeline/subsampling_data.rst
@@ -26,4 +26,4 @@ set to ``True`` to force using the subsampling when we call them. Note that
 even if we set ``keep_subsampling=True``, subsampling is not applied when using
 ``predict``.
 
-See more details in a :ref:`full example <sphx_glr_auto_examples_data_ops_1140_subsampling.py>`.
+See more details in a :ref:`full example <sphx_glr_auto_examples_02_data_ops_1140_subsampling.py>`.
diff --git a/doc/modules/data_ops/validation/exporting_data_ops.rst b/doc/modules/data_ops/validation/exporting_data_ops.rst
@@ -62,5 +62,5 @@ or in a different environment:
 >>> loaded_learner.fit({"orders": new_orders_df})
 SkrubLearner(data_op=<Apply TableVectorizer>)
 
-See :ref:`sphx_glr_auto_examples_data_ops_1150_use_case.py` for an example of how
+See :ref:`sphx_glr_auto_examples_02_data_ops_1150_use_case.py` for an example of how
 to use the learner in a microservice.
diff --git a/doc/modules/data_ops/validation/hyperparameter_tuning.rst b/doc/modules/data_ops/validation/hyperparameter_tuning.rst
@@ -162,7 +162,7 @@ search respectively), or with the ``choose`` parameter of
 :meth:`.skb.make_learner() <DataOp.skb.make_learner>`.
 
 A full example of how to use hyperparameter search is available in
-:ref:`sphx_glr_auto_examples_data_ops_1130_choices.py`, and a full example using
+:ref:`sphx_glr_auto_examples_02_data_ops_1130_choices.py`, and a full example using
 Optuna is in :ref:`example_optuna_choices`.
 
 |

diff --git a/doc/modules/default_wrangling/apply_to_cols.rst b/doc/modules/default_wrangling/apply_to_cols.rst
@@ -25,18 +25,18 @@ to apply the proper transformers to different datatypes, using it may not be an
 option in all cases. In scikit-learn pipelines, the column selection operation can
 be done with the :class:`~sklearn.compose.ColumnTransformer`.
 
-Skrub provides the |ApplyToCols| transformer to achieve the same results with
-a larger degree of control over which columns are being transformed.
+Skrub provides the |ApplyToCols| transformer and the
+:ref:`selectors<user_guide_selectors>` to achieve the same results with a larger
+degree of control over which columns are being transformed.
 |ApplyToCols| maps a transformer to columns in a dataframe, so that all
-columns that satisfy a certain condition are transformed, while the others are
-left untouched.
+columns that satisfy the condition given by the user are transformed, while the
+others are left untouched.
 
 .. tip::
 
     If a skrub transformer has a ``cols`` parameter to specify a column list,
     that can be a selector as well. Selectors give more control over which columns
-    are being transformed: they are discussed at length in the
-    :ref:`selectors user guide<user_guide_selectors>`.
+    are being transformed.
 
 
 |ApplyToCols| can be used to transform a subset of columns in a dataframe, while

diff --git a/doc/modules/joining_tables/assembling.rst b/doc/modules/joining_tables/assembling.rst
@@ -9,7 +9,22 @@ requires including as much information as possible, often from different sources
 Skrub allows you to join tables on keys of different types (string, numerical,
 datetime) with imprecise correspondence.
 
+.. warning::
 
+    Keep the following in mind when using one of the joiners:
+
+    **Joiners are designed for small-to-medium datasets.**
+
+    - **Memory**: The auxiliary table is stored in the transformer state.
+      For tables with more than 1 million rows, consider using :ref:`skrub Data Ops
+      <user_guide_data_ops_index>` with pandas/polars joins instead.
+
+    - **Computational cost**: Fuzzy joining requires vectorizing columns and
+      running a nearest-neighbor search. For large datasets, test on a sample first.
+
+    - **Dynamic Data**: If your auxiliary table changes after fitting,
+      you must refit the transformer. Joiners are not suitable for continuously
+      updated tables.
 
 Joining external tables for machine learning
 --------------------------------------------
@@ -58,4 +73,4 @@ in the right table (the table to be added). This is done by estimating the value
 that the missing rows would have by training a machine learning model on the data
 we have access to.
 
-This transformer is explored in more detail in :ref:`this example <sphx_glr_auto_examples_0080_interpolation_join.py>`.
+This transformer is explored in more detail in :ref:`this example <sphx_glr_auto_examples_03_joining_0080_interpolation_join.py>`.
diff --git a/doc/reference/index.rst.template b/doc/reference/index.rst.template
@@ -17,6 +17,7 @@ classes and functions may not be enough to give full guidelines on their use.
 {% for module, _ in API_REFERENCE %}
   {{ module }} <{{ module }}>
 {%- endfor %}
+  TableReport JSON schema <table_report_json_schema>
 {%- if DEPRECATED_API_REFERENCE %}
   deprecated
 {%- endif %}