diff --git a/doc/guides/table_report/02_exporting.rst b/doc/guides/table_report/02_exporting.rst
index 4805d2cf5..2c39b274a 100644
--- a/doc/guides/table_report/02_exporting.rst
+++ b/doc/guides/table_report/02_exporting.rst
@@ -3,8 +3,8 @@
 .. |column_associations| replace:: :func:`~skrub.column_associations`
 
 .. _user_guide_table_report_sharing:
-How to export and share the |TableReport|
------------------------------------------
+How to export and share the |TableReport| for use by other tools
+----------------------------------------------------------------
 
 The |TableReport| is generated as a standalone HTML file that includes the report
 data, the plots, and the Javascript necessary to provide interactivity.
@@ -31,7 +31,8 @@ respectively.
 
 The report can be exported in JSON format, which allows structured
 access to the data and statistics used to build the report with
-:func:`~skrub.TableReport.json`.
+:func:`~skrub.TableReport.json`. The schema of the JSON data is described in
+:ref:`table_report_json_schema`.
 
 .. code-block::
 
diff --git a/doc/modules/data_ops/basics/control_flow.rst b/doc/modules/data_ops/basics/control_flow.rst
index 7cd1fc31a..6c3ecd63e 100644
--- a/doc/modules/data_ops/basics/control_flow.rst
+++ b/doc/modules/data_ops/basics/control_flow.rst
@@ -168,9 +168,9 @@ Finally, there are other situations where using :func:`deferred` can be helpful:
 
 .. rubric:: Examples
 
-- See :ref:`sphx_glr_auto_examples_data_ops_1110_data_ops_intro.py` for an introductory
+- See :ref:`sphx_glr_auto_tutorials_1110_data_ops_intro.py` for an introductory
   example on how to use skrub DataOps on a single dataframe.
-- See :ref:`sphx_glr_auto_examples_data_ops_1120_multiple_tables.py` for an example
+- See :ref:`sphx_glr_auto_examples_02_data_ops_1120_multiple_tables.py` for an example
   of how skrub DataOps can be used to process multiple tables using dataframe APIs.
-- See :ref:`sphx_glr_auto_examples_data_ops_1130_choices.py` for an example of
+- See :ref:`sphx_glr_auto_examples_02_data_ops_1130_choices.py` for an example of
   hyper-parameter tuning using skrub DataOps.
diff --git a/doc/modules/data_ops/ml_pipeline/applying_different_transformers.rst b/doc/modules/data_ops/ml_pipeline/applying_different_transformers.rst
index 54901bee1..9e44d3317 100644
--- a/doc/modules/data_ops/ml_pipeline/applying_different_transformers.rst
+++ b/doc/modules/data_ops/ml_pipeline/applying_different_transformers.rst
@@ -150,4 +150,4 @@ to obtain the final result:
 
 More info on advanced column selection and manipulation be found in
 :ref:`user_guide_selectors` and example
-:ref:`sphx_glr_auto_examples_0090_apply_to_cols.py`.
+:ref:`sphx_glr_auto_examples_0010_apply_to_cols.py`.
diff --git a/doc/modules/data_ops/ml_pipeline/subsampling_data.rst b/doc/modules/data_ops/ml_pipeline/subsampling_data.rst
index 51a045feb..b6509f6c5 100644
--- a/doc/modules/data_ops/ml_pipeline/subsampling_data.rst
+++ b/doc/modules/data_ops/ml_pipeline/subsampling_data.rst
@@ -26,4 +26,4 @@ set to ``True`` to force using the subsampling when we call them. Note that
 even if we set ``keep_subsampling=True``, subsampling is not applied when using
 ``predict``.
 
-See more details in a :ref:`full example <sphx_glr_auto_examples_data_ops_1140_subsampling.py>`.
+See more details in a :ref:`full example <sphx_glr_auto_examples_02_data_ops_1140_subsampling.py>`.
diff --git a/doc/modules/data_ops/validation/exporting_data_ops.rst b/doc/modules/data_ops/validation/exporting_data_ops.rst
index 2e462c4e5..b70ba1c23 100644
--- a/doc/modules/data_ops/validation/exporting_data_ops.rst
+++ b/doc/modules/data_ops/validation/exporting_data_ops.rst
@@ -62,5 +62,5 @@ or in a different environment:
 >>> loaded_learner.fit({"orders": new_orders_df})
 SkrubLearner(data_op=<Apply TableVectorizer>)
 
-See :ref:`sphx_glr_auto_examples_data_ops_1150_use_case.py` for an example of how
+See :ref:`sphx_glr_auto_examples_02_data_ops_1150_use_case.py` for an example of how
 to use the learner in a microservice.
diff --git a/doc/modules/data_ops/validation/hyperparameter_tuning.rst b/doc/modules/data_ops/validation/hyperparameter_tuning.rst
index 0e2110c81..57df292c4 100644
--- a/doc/modules/data_ops/validation/hyperparameter_tuning.rst
+++ b/doc/modules/data_ops/validation/hyperparameter_tuning.rst
@@ -162,7 +162,7 @@ search respectively), or with the ``choose`` parameter of
 :meth:`.skb.make_learner() <DataOp.skb.make_learner>`.
 
 A full example of how to use hyperparameter search is available in
-:ref:`sphx_glr_auto_examples_data_ops_1130_choices.py`, and a full example using
+:ref:`sphx_glr_auto_examples_02_data_ops_1130_choices.py`, and a full example using
 Optuna is in :ref:`example_optuna_choices`.
 
 |
diff --git a/doc/modules/default_wrangling/apply_to_cols.rst b/doc/modules/default_wrangling/apply_to_cols.rst
index c25eeb418..839329824 100644
--- a/doc/modules/default_wrangling/apply_to_cols.rst
+++ b/doc/modules/default_wrangling/apply_to_cols.rst
@@ -25,18 +25,18 @@ to apply the proper transformers to different datatypes, using it may not be an
 option in all cases. In scikit-learn pipelines, the column selection operation can
 be done with the :class:`~sklearn.compose.ColumnTransformer`.
 
-Skrub provides the |ApplyToCols| transformer to achieve the same results with
-a larger degree of control over which columns are being transformed.
+Skrub provides the |ApplyToCols| transformer and the
+:ref:`selectors<user_guide_selectors>` to achieve the same results with a larger
+degree of control over which columns are being transformed.
 |ApplyToCols| maps a transformer to columns in a dataframe, so that all
-columns that satisfy a certain condition are transformed, while the others are
-left untouched.
+columns that satisfy the condition given by the user are transformed, while the
+others are left untouched.
 
 .. tip::
 
     If a skrub transformer has a ``cols`` parameter to specify a column list,
     that can be a selector as well. Selectors give more control over which columns
-    are being transformed: they are discussed at length in the
-    :ref:`selectors user guide<user_guide_selectors>`.
+    are being transformed.
 
 
 |ApplyToCols| can be used to transform a subset of columns in a dataframe, while
diff --git a/doc/modules/joining_tables/assembling.rst b/doc/modules/joining_tables/assembling.rst
index 51601324c..f2b46859f 100644
--- a/doc/modules/joining_tables/assembling.rst
+++ b/doc/modules/joining_tables/assembling.rst
@@ -9,7 +9,22 @@ requires including as much information as possible, often from different sources
 Skrub allows you to join tables on keys of different types (string, numerical,
 datetime) with imprecise correspondence.
 
+.. warning::
 
+    Keep the following in mind when using one of the joiners:
+
+    **Joiners are designed for small-to-medium datasets.**
+
+    - **Memory**: The auxiliary table is stored in the transformer state.
+      For tables > 1 million rows, consider using :ref:`skrub Data Ops
+      <user_guide_data_ops_index>` with pandas/polars joins instead.
+
+    - **Computational Cost**: Fuzzy joining requires vectorizing columns
+      and nearest-neighbor search. Test on samples first for large datasets.
+
+    - **Dynamic Data**: If your auxiliary table changes after fitting,
+      you must refit the transformer. Joiners are not suitable for continuously
+      updated tables.
 
 Joining external tables for machine learning
 --------------------------------------------
@@ -58,4 +73,4 @@ in the right table (the table to be added). This is done by estimating the value
 that the missing rows would have by training a machine learning model on the data
 we have access to.
 
-This transformer is explored in more detail in :ref:`this example <sphx_glr_auto_examples_0080_interpolation_join.py>`.
+This transformer is explored in more detail in :ref:`this example <sphx_glr_auto_examples_03_joining_0080_interpolation_join.py>`.
diff --git a/doc/reference/index.rst.template b/doc/reference/index.rst.template
index 04af24f2d..989c30a7d 100644
--- a/doc/reference/index.rst.template
+++ b/doc/reference/index.rst.template
@@ -17,6 +17,7 @@ classes and functions may not be enough to give full guidelines on their use.
 {% for module, _ in API_REFERENCE %}
   {{ module }} <{{ module }}>
 {%- endfor %}
+  TableReport JSON schema <table_report_json_schema>
 {%- if DEPRECATED_API_REFERENCE %}
   deprecated
 {%- endif %}
diff --git a/doc/reference/table_report_json_schema.rst b/doc/reference/table_report_json_schema.rst
new file mode 100644
index 000000000..3d14fcd01
--- /dev/null
+++ b/doc/reference/table_report_json_schema.rst
@@ -0,0 +1,354 @@
+.. _table_report_json_schema:
+
+TableReport JSON schema
+=======================
+
+:meth:`TableReport.json() <skrub.TableReport.json>` returns a JSON string whose
+top-level object contains the keys described below.
+
+.. note::
+
+   The ``dataframe`` and ``sample_table`` keys, which are present in the
+   internal summary object, are **not** included in the JSON output.
+
+Top-level object
+----------------
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Key
+     - Type
+     - Description
+   * - ``dataframe_module``
+     - string
+     - Name of the dataframe library used. Either ``"pandas"`` or
+       ``"polars"``.
+   * - ``n_rows``
+     - integer
+     - Number of rows in the dataframe.
+   * - ``n_columns``
+     - integer
+     - Number of columns in the dataframe.
+   * - ``dataframe_is_empty``
+     - boolean
+     - ``true`` when the dataframe has no rows or no columns.
+   * - ``plots_skipped``
+     - boolean
+     - ``true`` when ``plot_distributions=False`` was passed to
+       :class:`~skrub.TableReport`. When ``true``, plot keys are absent from
+       column objects, but ``histogram_data`` is still present for numeric and
+       datetime columns.
+   * - ``associations_skipped``
+     - boolean
+     - ``true`` when association computation was skipped (either because
+       ``with_associations=False`` was passed, or because polars was used
+       without pyarrow installed).
+   * - ``cardinality_threshold``
+     - integer
+     - The threshold from :func:`~skrub.get_config` above which a column is
+       considered high-cardinality. Default: 40.
+   * - ``n_constant_columns``
+     - integer
+     - Number of columns whose values are all identical.
+   * - ``columns``
+     - array of :ref:`column objects <table_report_json_schema_column>`
+     - One entry per column, in the original column order.
+   * - ``top_associations``
+     - array of :ref:`association objects <table_report_json_schema_assoc>`
+     - Column-pair association scores (up to 1 000 pairs, sorted by strength).
+       Present only when ``associations_skipped`` is ``false``.
+   * - ``title``
+     - string
+     - *Optional.* Present only when a ``title`` argument was passed to
+       :class:`~skrub.TableReport`.
+   * - ``order_by``
+     - string
+     - *Optional.* Name of the column used for sorting when ``order_by`` was
+       passed to :class:`~skrub.TableReport`.
+
+
+.. _table_report_json_schema_column:
+
+Column object
+-------------
+
+Every entry in ``columns`` contains the following keys. Additional keys are
+present depending on the column's dtype; they are documented in the subsections
+below.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Key
+     - Type
+     - Description
+   * - ``position``
+     - integer
+     - Zero-based index of the column in the dataframe (same as ``idx``).
+   * - ``idx``
+     - integer
+     - Zero-based index of the column in the dataframe (same as
+       ``position``).
+   * - ``name``
+     - string
+     - Column name.
+   * - ``dtype``
+     - string
+     - Column dtype as a string (e.g. ``"float64"``, ``"object"``,
+       ``"datetime64[ns]"``).
+   * - ``null_count``
+     - integer
+     - Number of null / NaN values.
+   * - ``null_proportion``
+     - number
+     - Fraction of null values in ``[0, 1]``.
+   * - ``nulls_level``
+     - string
+     - Summary severity level: ``"ok"`` (no nulls), ``"warning"`` (some
+       nulls), or ``"critical"`` (all values are null).
+   * - ``value_is_constant``
+     - boolean
+     - ``true`` when every non-null value in the column is identical.
+   * - ``is_ordered``
+     - boolean
+     - ``true`` when the column values are sorted in ascending or descending
+       order.
+   * - ``plot_names``
+     - array of strings
+     - Names of the plot keys that are present on this column object (e.g.
+       ``["histogram_plot"]``). Empty when ``plots_skipped`` is ``true`` or
+       when the column contains only nulls.
+
+
+Columns containing only nulls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When ``null_count`` equals ``n_rows`` (all values are null), only the keys of
+the base column object above are present; no statistical keys are added.
+
+
+Categorical / string columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+These keys are present for non-numeric, non-datetime, non-duration columns
+that are not entirely null.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Key
+     - Type
+     - Description
+   * - ``n_unique``
+     - integer
+     - Number of distinct values (including null).
+   * - ``unique_proportion``
+     - number
+     - Fraction of distinct values relative to ``n_rows``.
+   * - ``is_high_cardinality``
+     - boolean
+     - ``true`` when ``n_unique`` exceeds ``cardinality_threshold``.
+   * - ``value_counts``
+     - array of ``[value, count]`` pairs
+     - The up to 10 most frequent values and their counts, sorted by
+       frequency descending. Each element is a two-element array
+       ``[value, count]`` where ``value`` is a string and ``count`` is an
+       integer.
+   * - ``most_frequent_values``
+     - array of strings
+     - The up to 10 most frequent values (same order as ``value_counts``).
+   * - ``constant_value``
+     - any
+     - *Present only when* ``value_is_constant`` *is* ``true``. The single
+       value shared by all non-null rows.
+   * - ``value_counts_plot``
+     - string (SVG)
+     - *Present only when* ``plots_skipped`` *is* ``false`` *and*
+       ``value_is_constant`` *is* ``false``. Bar chart of the top value
+       counts as an inline SVG string.
+
+
+Numeric columns
+~~~~~~~~~~~~~~~
+
+These keys are present for columns whose dtype is numeric (integer or float) or
+duration (timedelta). Boolean columns have a subset of these keys (only ``mean``).
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Key
+     - Type
+     - Description
+   * - ``n_unique``
+     - integer
+     - Number of distinct values.
+   * - ``unique_proportion``
+     - number
+     - Fraction of distinct values relative to ``n_rows``.
+   * - ``is_high_cardinality``
+     - boolean
+     - ``true`` when ``n_unique`` exceeds ``cardinality_threshold``.
+   * - ``mean``
+     - number
+     - Arithmetic mean of non-null values.
+   * - ``standard_deviation``
+     - number
+     - Standard deviation of non-null values. ``null`` when it cannot be
+       computed (e.g. single non-null value).
+   * - ``inter_quartile_range``
+     - number
+     - Difference between the 75th and 25th percentiles.
+   * - ``quantiles``
+     - object
+     - Map of quantile level (as a numeric key) to value, for levels
+       ``0.0``, ``0.25``, ``0.5``, ``0.75``, and ``1.0``. Absent when
+       ``value_is_constant`` is ``true``.
+   * - ``constant_value``
+     - number
+     - *Present only when* ``value_is_constant`` *is* ``true``. The single
+       value shared by all non-null rows.
+   * - ``is_duration``
+     - boolean
+     - ``true`` for timedelta / duration columns (values were converted to a
+       numeric unit before statistics were computed).
+   * - ``duration_unit``
+     - string or null
+     - Unit used when ``is_duration`` is ``true``: one of
+       ``"microsecond"``, ``"millisecond"``, ``"second"``, ``"hour"``,
+       ``"day"``, or ``"year"``. ``null`` for non-duration columns.
+   * - ``histogram_data``
+     - :ref:`histogram data object <table_report_json_schema_hist_data>`
+     - Bin counts and edges for the distribution histogram. Always present
+       for non-constant numeric columns (even when ``plots_skipped`` is
+       ``true``).
+   * - ``histogram_plot``
+     - string (SVG)
+     - *Present only when* ``plots_skipped`` *is* ``false`` *and*
+       ``order_by`` *is not set*. Distribution histogram as an inline SVG
+       string.
+   * - ``line_plot``
+     - string (SVG)
+     - *Present only when* ``plots_skipped`` *is* ``false`` *and* ``order_by``
+       *is set*. Line chart of the column values against the sort column as
+       an inline SVG string.
+
+
+Datetime columns
+~~~~~~~~~~~~~~~~
+
+These keys are present for columns with a date or datetime dtype.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Key
+     - Type
+     - Description
+   * - ``n_unique``
+     - integer
+     - Number of distinct values.
+   * - ``unique_proportion``
+     - number
+     - Fraction of distinct values relative to ``n_rows``.
+   * - ``is_high_cardinality``
+     - boolean
+     - ``true`` when ``n_unique`` exceeds ``cardinality_threshold``.
+   * - ``min``
+     - string (ISO 8601)
+     - Earliest datetime value. Absent when ``value_is_constant`` is
+       ``true``.
+   * - ``max``
+     - string (ISO 8601)
+     - Latest datetime value. Absent when ``value_is_constant`` is
+       ``true``.
+   * - ``constant_value``
+     - string (ISO 8601)
+     - *Present only when* ``value_is_constant`` *is* ``true``. The single
+       datetime value shared by all non-null rows.
+   * - ``histogram_data``
+     - :ref:`histogram data object <table_report_json_schema_hist_data>`
+     - Bin counts and edges. Always present for non-constant datetime
+       columns (even when ``plots_skipped`` is ``true``).
+   * - ``histogram_plot``
+     - string (SVG)
+     - *Present only when* ``plots_skipped`` *is* ``false`` *and*
+       ``value_is_constant`` *is* ``false``. Distribution histogram as an
+       inline SVG string.
+
+
+.. _table_report_json_schema_hist_data:
+
+Histogram data object
+---------------------
+
+The ``histogram_data`` key on numeric and datetime column objects contains an
+object with the following keys.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Key
+     - Type
+     - Description
+   * - ``bin_counts``
+     - array of integers
+     - Count of values in each bin (length *n*).
+   * - ``bin_edges``
+     - array of numbers
+     - Left and right edges of each bin (length *n + 1*).
+   * - ``n_low_outliers``
+     - integer
+     - Number of values below the plotted range that were excluded from the
+       histogram bins.
+   * - ``n_high_outliers``
+     - integer
+     - Number of values above the plotted range that were excluded from the
+       histogram bins.
+
+
+.. _table_report_json_schema_assoc:
+
+Association object
+------------------
+
+Each entry in the top-level ``top_associations`` array describes the
+association between a pair of columns.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Key
+     - Type
+     - Description
+   * - ``left_column_name``
+     - string
+     - Name of the first column in the pair.
+   * - ``left_column_idx``
+     - integer
+     - Zero-based index of the first column in the pair.
+   * - ``right_column_name``
+     - string
+     - Name of the second column in the pair.
+   * - ``right_column_idx``
+     - integer
+     - Zero-based index of the second column in the pair.
+   * - ``cramer_v``
+     - number or null
+     - `Cramér's V <https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V>`_
+       statistic (``[0, 1]``) for the pair. ``null`` when it could not be
+       computed.
+   * - ``pearson_corr``
+     - number or null
+     - `Pearson correlation
+       <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
+       coefficient (``[-1, 1]``) for the pair. ``null`` when it could not be
+       computed (e.g. one column is not numeric).
diff --git a/doc/tutorials/0000_getting_started.py b/doc/tutorials/0000_getting_started.py
index 4e7cca8f8..d659a77b8 100644
--- a/doc/tutorials/0000_getting_started.py
+++ b/doc/tutorials/0000_getting_started.py
@@ -112,7 +112,8 @@
 # To handle rich tabular data and feed it to a machine learning model, the
 # pipeline returned by |tabular_pipeline| preprocesses and encodes
 # strings, categories and dates using the |TableVectorizer|.
-# See its documentation or :ref:`sphx_glr_auto_examples_0010_encodings.py` for
+# See its documentation or
+# :ref:`sphx_glr_auto_examples_01_encoding_0010_encodings.py` for
 # more details. An overview of the chosen defaults is available in
 # :ref:`user_guide_tabular_pipeline`.
 
@@ -190,7 +191,8 @@
 # which uses pre-trained language models retrieved from the HuggingFace hub to
 # create meaningful text embeddings.
 # See :ref:`user_guide_encoders_index` for more details on all the categorical encoders
-# provided by skrub, and :ref:`sphx_glr_auto_examples_0010_encodings.py` for a
+# provided by skrub, and
+# :ref:`sphx_glr_auto_examples_01_encoding_0010_encodings.py` for a
 # comparison between the different methods.
 #
 
diff --git a/examples/03_joining/0070_join_aggregation.py b/examples/03_joining/0070_join_aggregation.py
index 27426f2b7..f4b779b5b 100644
--- a/examples/03_joining/0070_join_aggregation.py
+++ b/examples/03_joining/0070_join_aggregation.py
@@ -162,7 +162,7 @@
 #
 # We bring this logic into a |TableVectorizer| to vectorize these columns in a
 # single step.
-# See `this example <https://skrub-data.org/stable/auto_examples/01_encodings.html#specializing-the-tablevectorizer-for-histgradientboosting>`_
+# See :ref:`this example <sphx_glr_auto_examples_01_encoding_0010_encodings.py>`
 # for more details about these encoding choices.
 from sklearn.preprocessing import OrdinalEncoder
 
diff --git a/skrub/_apply_to_cols.py b/skrub/_apply_to_cols.py
index 7168cf70b..64d34e9cf 100644
--- a/skrub/_apply_to_cols.py
+++ b/skrub/_apply_to_cols.py
@@ -216,6 +216,25 @@ class ApplyToCols(TransformerMixin, SkrubBaseEstimator):
     skrub.core.RejectColumn: Column 'A' does not have Date or Datetime dtype.
     Transformer DatetimeEncoder.fit_transform failed on column 'A'. See above for the full traceback.
 
+    It is also possible to wrap a :class:`TableVectorizer` or :class:`Cleaner` in
+    ``ApplyToCols`` to select or exclude columns based on patterns. For example,
+    to apply a :class:`TableVectorizer` to all columns except those ending with "_id",
+    we can do:
+
+    >>> import skrub.selectors as s
+    >>> from skrub import ApplyToCols, TableVectorizer
+
+    >>> df = pd.DataFrame(dict(
+    ...     user_id=["A001", "A002"],
+    ...     age=[25, 30],
+    ...     department=["Engineering", "Sales"],
+    ... ))
+    >>> tv = ApplyToCols(TableVectorizer(), cols=~s.glob("*_id"))
+    >>> tv.fit_transform(df)
+        user_id   age   department_Sales
+    0    A001  25.0               0.0
+    1    A002  30.0               1.0
+
     **Accessing fitted transformers**
 
     Depending on the transformer, the fitted transformers
diff --git a/skrub/_reporting/_table_report.py b/skrub/_reporting/_table_report.py
index db491170f..9269e0e58 100644
--- a/skrub/_reporting/_table_report.py
+++ b/skrub/_reporting/_table_report.py
@@ -100,7 +100,10 @@ class TableReport:
 
     This class summarizes a dataframe or numpy array, providing information such as
     the type and summary statistics (mean, number of missing values, etc.) for each
-    column. Numpy arrays are converted to pandas DataFrame or Series.
+    column. Numpy arrays are converted to pandas DataFrame or Series. The computed
+    statistics can be accessed interactively in a Jupyter notebook or web browser.
+    Alternatively, the report can be saved or exported in JSON, Markdown, or HTML
+    format for programmatic access or for inclusion in documents.
 
     Parameters
     ----------
@@ -232,7 +235,8 @@ class TableReport:
     # DataFrame Report...
 
     The report can also be obtained in JSON format with :meth:`json`, which can
-    be useful for programmatic access to the report data.
+    be useful for programmatic access to the report data. The schema of the
+    JSON data is described in :ref:`table_report_json_schema`.
 
     Note that the resulting JSON includes the plots in SVG format, which can be
     quite verbose: plots can be disabled by setting ``plot_distributions=False``
@@ -244,17 +248,20 @@ class TableReport:
 
 
     Advanced configuration: you can add custom column filters that will appear
-    in the report's dropdown menu.
+    in the report's dropdown menu, allowing you to select a subset of columns to
+    display in the report.
 
     >>> filters = {
-    ...         "display_name": ["a", "b"],
+    ...         "my_filter": ["a", "b"],
     ... }
     >>> report = TableReport(df, column_filters=filters)
 
     With the code above, in addition to the default filters such as "All
-    columns", "Numeric columns", etc., the added "Columns with at least 2
-    unique values" will be available in the report, selecting columns "a" and
-    "b".
+    columns", "Numeric columns", etc., the added "my_filter" will be available
+    in the report, selecting both columns "a" and "b".
+    Filters may be specified as a list of column names, a list of column indices,
+    or a :ref:`skrub selector <user_guide_selectors>`.
+
     """
 
     def __init__(
@@ -431,6 +438,13 @@ def html_snippet(self):
     def json(self):
         """Get the report data in JSON format.
 
+        By default, the JSON output includes the plots in SVG format, which can
+        be quite verbose. Plots can be disabled by setting
+        ``plot_distributions=False`` when generating the report.
+
+        The schema of the JSON data is described in :ref:`table_report_json_schema`.
+
+
         Returns
         -------
         str :
diff --git a/skrub/_table_vectorizer.py b/skrub/_table_vectorizer.py
index 3a096a385..20baefff7 100644
--- a/skrub/_table_vectorizer.py
+++ b/skrub/_table_vectorizer.py
@@ -590,9 +590,11 @@ class TableVectorizer(TransformerMixin, SkrubBaseEstimator):
         specified transformer. This disables any preprocessing usually done by
         the TableVectorizer; the columns are passed to the transformer without
         any modification. A column is not allowed to appear twice in
-        ``specific_transformers``. Using ``specific_transformers`` provides
-        similar functionality to what is offered by scikit-learn's
-        :class:`~sklearn.compose.ColumnTransformer`.
+        ``specific_transformers``.
+        Consider wrapping the ``TableVectorizer`` in :class:`~skrub.ApplyToCols`
+        to select or exclude specific columns from the processing. Alternatively,
+        :ref:`skrub Data Ops <user_guide_data_ops_index>` allow for more complex
+        pre-processing.
 
     drop_null_fraction : float or None, default=1.0
         Fraction of null above which the column is dropped. If `drop_null_fraction` is