delftdata · kPsarakis · Apr 8, 2026 · Apr 8, 2026 · Apr 8, 2026 · Apr 12, 2026
diff --git a/.github/workflows/bench.yml b/.github/workflows/bench.yml
@@ -0,0 +1,42 @@
+name: bench
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  bench:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+
+    steps:
+      - uses: actions/checkout@v6.0.2
+
+      - name: Set up Python
+        uses: actions/setup-python@v6.2.0
+        with:
+          python-version: '3.14'
+          cache: 'pip'
+
+      - name: Install package
+        run: |
+          python -m pip install --upgrade pip
+          pip install .
+
+      - name: Run accuracy regression check
+        run: |
+          python experiments/bench.py \
+            --quick \
+            --output bench_results.json \
+            --baseline experiments/bench_baseline.json \
+            --accuracy-only
+
+      - name: Upload bench results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: bench-results
+          path: bench_results.json
+          if-no-files-found: warn
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -28,6 +28,7 @@ jobs:
         run: |
           python -m pip install --upgrade pip
           pip install .
+          pip install ".[polars]" || true
           pip install pytest==9.0.2 coverage==7.13.5 ruff==0.15.9
 
       - name: Ruff lint (must pass)

diff --git a/.github/workflows/ci-build-test-publish.yml b/.github/workflows/ci-build-test-publish.yml
@@ -60,6 +60,7 @@ jobs:
         run: |
           python -m pip install --upgrade pip
           pip install .
+          pip install ".[polars]" || true
 
       - name: Install test deps
         run: pip install pytest==9.0.2
@@ -104,6 +105,7 @@ jobs:
           else
             pip install dist/*.tar.gz
           fi
+          pip install polars || true
 
       - name: Install test deps
         run: pip install pytest==9.0.2

diff --git a/.gitignore b/.gitignore
@@ -34,8 +34,6 @@ htmlcov/
 .env
 .env.local
 
-experiments/
-
 # Zensical build output & cache
 site/
 .zensical/
diff --git a/README.md b/README.md
@@ -36,7 +36,7 @@
 
 ---
 
-A Python package for capturing potential relationships among columns of different tabular datasets, given as pandas DataFrames.
+A Python package for capturing potential relationships among columns of different tabular datasets, given as pandas or Polars DataFrames.
 Valentine is based on the paper [**Valentine: Evaluating Matching Techniques for Dataset Discovery**](https://ieeexplore.ieee.org/abstract/document/9458921).
 
 📚 **Full documentation:** <https://delftdata.github.io/valentine/> — getting started, matcher guide, API reference, and migration notes.
@@ -57,9 +57,15 @@ To install Valentine simply run:
 pip install valentine
 ```
 
+To enable **Polars** support, install the optional extra:
+
+```shell
+pip install "valentine[polars]"
+```
+
 
 ## Usage
-Valentine can be used to find matches among columns of a given pair of pandas DataFrames. 
+Valentine can be used to find matches among columns of a given pair of pandas or Polars DataFrames. You can even mix pandas and Polars frames in the same call — Valentine auto-detects the frame type.
 
 ### Matching methods
 In order to do so, the user can choose one of the following matching methods:
@@ -103,10 +109,10 @@ In order to do so, the user can choose one of the following matching methods:
 
 ### Matching DataFrames
 
-Pass two or more DataFrames as a list (or any iterable) along with a matcher. Valentine will match columns across all unique pairs:
+Pass two or more DataFrames as a list (or any iterable) along with a matcher. Valentine will match columns across all unique pairs. Pandas and Polars frames can be freely mixed:
 
 ```python
-# Match a pair of DataFrames
+# Match a pair of DataFrames (pandas, Polars, or mixed)
 matches = valentine_match([df1, df2], matcher)
 
 # Match multiple DataFrames (computes all N×(N-1)/2 pairs)
@@ -171,7 +177,7 @@ metrics_predefined_set = matches.get_metrics(ground_truth, metrics=METRICS_PRECI
 
 
 ### Example
-The following block of code shows: 1) how to run a matcher from Valentine on two DataFrames storing information about job candidates, and then 2) how to assess its effectiveness based on a given ground truth (a more extensive example is shown in [`valentine_example.py`](https://github.com/delftdata/valentine/blob/master/examples/valentine_example.py)):
+The following block of code shows: 1) how to run a matcher from Valentine on two DataFrames storing information about job candidates, and then 2) how to assess its effectiveness based on a given ground truth. More examples are available in the [`examples/`](https://github.com/delftdata/valentine/tree/master/examples) directory, including a [pandas example](https://github.com/delftdata/valentine/blob/master/examples/valentine_example_pandas.py), a [Polars example](https://github.com/delftdata/valentine/blob/master/examples/valentine_example_polars.py), and a [mixed pandas+Polars example](https://github.com/delftdata/valentine/blob/master/examples/valentine_example_mixed.py).
 
 ```python
 import pandas as pd

diff --git a/docs/api.md b/docs/api.md
@@ -36,20 +36,21 @@ from valentine import (
 
 ```python
 valentine_match(
-    dfs: Iterable[pd.DataFrame],
+    dfs: Iterable[pd.DataFrame | pl.DataFrame],
     matcher: BaseMatcher,
     df_names: list[str] | None = None,
     instance_sample_size: int | None = 1000,
 ) -> MatcherResults
 ```
 
-Match columns across every unique pair of DataFrames.
+Match columns across every unique pair of DataFrames. Accepts both pandas
+and Polars DataFrames, which can be freely mixed within the same call.
 
 **Parameters**
 
 | Name                   | Type                         | Default | Description                                                                                                                                                                                                     |
 |------------------------|------------------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `dfs`                  | `Iterable[pd.DataFrame]`     | —       | Two or more DataFrames to match against each other. Any iterable works (list, tuple, generator).                                                                                                               |
+| `dfs`                  | `Iterable[pd.DataFrame \| pl.DataFrame]` | —       | Two or more DataFrames to match against each other. Any iterable works (list, tuple, generator). Pandas and Polars frames may be mixed freely.                                                             |
 | `matcher`              | `BaseMatcher`                | —       | Matcher instance (e.g. `Coma()`, `Cupid()`).                                                                                                                                                                     |
 | `df_names`             | `list[str] \| None`          | `None`  | Optional names for each DataFrame. When `None`, defaults to `"aaa"`, `"bbb"`, `"ccc"`, … (chosen for minimum string similarity so defaults don't influence schema-based matchers). Limited to 26 unnamed tables. |
 | `instance_sample_size` | `int \| None`                | `1000`  | Cap on the number of non-empty rows sampled per column for instance-based matchers (Coma with `use_instances=True`, `DistributionBased`, `JaccardDistanceMatcher`). Pass `None` to use every row. Pass `0` to skip instance data entirely — schema-only matchers are unaffected, but instance-based matchers will see empty columns. |
@@ -666,10 +667,11 @@ are gold matches. One-to-one filtering is **off** by default here.
 ## Data sources (`valentine.data_sources`)
 
 Valentine wraps each DataFrame in a [`DataframeTable`](#dataframetable)
-before handing it to a matcher. Most users never touch this layer —
-[`valentine_match`](#valentine_match) builds the tables for you — but
-the classes are public so that custom matchers and custom data sources
-can be written against the abstractions.
+(pandas) or [`PolarsTable`](#polarstable) (Polars) before handing it to
+a matcher. Most users never touch this layer —
+[`valentine_match`](#valentine_match) auto-detects the frame type and
+builds the tables for you — but the classes are public so that custom
+matchers and custom data sources can be written against the abstractions.
 
 ```python
 from valentine.data_sources import (
@@ -678,6 +680,9 @@ from valentine.data_sources import (
     DataframeTable,
     DataframeColumn,
 )
+
+# With the polars extra installed:
+from valentine.data_sources import PolarsTable, PolarsColumn
 ```
 
 ### `BaseTable`
@@ -756,6 +761,33 @@ Constructed internally by [`DataframeTable`](#dataframetable); exposes
 the column name, detected data type, unique identifier, and sampled
 instance values via the standard [`BaseColumn`](#basecolumn) interface.
 
+### `PolarsTable`
+
+```python
+PolarsTable(
+    df: pl.DataFrame,
+    name: str,
+    instance_sample_size: int | None = 1000,
+)
+```
+
+[`BaseTable`](#basetable) adapter for a Polars DataFrame. Requires the
+`polars` extra (`pip install "valentine[polars]"`). Has the same interface
+as [`DataframeTable`](#dataframetable).
+
+| Parameter              | Type            | Default | Description                                                                                     |
+|------------------------|-----------------|---------|-------------------------------------------------------------------------------------------------|
+| `df`                   | `pl.DataFrame`  | —       | The Polars DataFrame to wrap.                                                                   |
+| `name`                 | `str`           | —       | Name of the table.                                                                              |
+| `instance_sample_size` | `int \| None`   | `1000`  | Cap on the number of non-empty rows sampled per column. Pass `None` to use the full DataFrame; pass `0` to expose no instance data at all. |
+
+### `PolarsColumn`
+
+[`BaseColumn`](#basecolumn) adapter for a single Polars `Series`.
+Constructed internally by [`PolarsTable`](#polarstable); exposes
+the column name, detected data type, unique identifier, and sampled
+instance values via the standard [`BaseColumn`](#basecolumn) interface.
+
 ### Writing a custom data source
 
 If your data doesn't live in a pandas DataFrame, implement

diff --git a/docs/example.md b/docs/example.md
@@ -13,11 +13,16 @@ ground truth. Every API touched here is documented in the
 !!! note
 
     The same script lives in the repo at
-    [`examples/valentine_example.py`][source].
+    [`examples/valentine_example_pandas.py`][source]. Additional examples:
 
-  [source]: https://github.com/delftdata/valentine/blob/master/examples/valentine_example.py
+    - [`valentine_example_polars.py`][polars] — Polars DataFrames
+    - [`valentine_example_mixed.py`][mixed] — mixing pandas and Polars in the same call
 
-```python title="valentine_example.py"
+  [source]: https://github.com/delftdata/valentine/blob/master/examples/valentine_example_pandas.py
+  [polars]: https://github.com/delftdata/valentine/blob/master/examples/valentine_example_polars.py
+  [mixed]: https://github.com/delftdata/valentine/blob/master/examples/valentine_example_mixed.py
+
+```python title="valentine_example_pandas.py"
 import pprint
 from pathlib import Path
 

diff --git a/docs/faq.md b/docs/faq.md
@@ -7,6 +7,26 @@ icon: lucide/help-circle
 Common questions and gotchas. If yours isn't here, open an issue on
 [GitHub](https://github.com/delftdata/valentine/issues).
 
+## Can I use Polars instead of pandas?
+
+Yes. Install the optional extra with `pip install "valentine[polars]"` and pass
+Polars DataFrames directly to [`valentine_match`](api.md#valentine_match).
+You can even mix pandas and Polars frames in the same call — Valentine
+auto-detects the frame type and wraps each one in the appropriate data
+source (`DataframeTable` or `PolarsTable`).
+
+```python
+import pandas as pd
+import polars as pl
+from valentine import valentine_match
+from valentine.algorithms import Coma
+
+df_pandas = pd.read_csv("source.csv")
+df_polars = pl.read_csv("target.csv")
+
+matches = valentine_match([df_pandas, df_polars], Coma())
+```
+
 ## Which matcher should I use?
 
 Start with [`Coma`](api.md#coma). It is the strongest default, handles

diff --git a/docs/getting-started.md b/docs/getting-started.md
@@ -29,6 +29,28 @@ command. It requires **Python 3.10 or newer** (and is tested up to 3.14).
     poetry add valentine
     ```
 
+### Polars support
+
+To use Polars DataFrames, install the optional `polars` extra:
+
+=== "pip"
+
+    ```shell
+    pip install "valentine[polars]"
+    ```
+
+=== "uv"
+
+    ```shell
+    uv add "valentine[polars]"
+    ```
+
+=== "poetry"
+
+    ```shell
+    poetry add valentine -E polars
+    ```
+
 For local development, clone the repo and install in editable mode:
 
 ```shell
@@ -41,24 +63,61 @@ pip install -e ".[dev]"
 
 The single entry point for matching is
 [`valentine_match`](api.md#valentine_match). It takes an iterable of
-DataFrames and a matcher instance, and returns a
+DataFrames (pandas or Polars) and a matcher instance, and returns a
 [`MatcherResults`](api.md#matcherresults) mapping — see the
 [Matcher results](results.md) guide for everything you can do with it.
 
-```python
-import pandas as pd
-from valentine import valentine_match
-from valentine.algorithms import Coma
+=== "pandas"
 
-df1 = pd.read_csv("source_candidates.csv")
-df2 = pd.read_csv("target_candidates.csv")
+    ```python
+    import pandas as pd
+    from valentine import valentine_match
+    from valentine.algorithms import Coma
 
-matcher = Coma(use_instances=True)
-matches = valentine_match([df1, df2], matcher)
+    df1 = pd.read_csv("source_candidates.csv")
+    df2 = pd.read_csv("target_candidates.csv")
 
-for pair, score in matches.items():
-    print(f"{pair.source_column} <-> {pair.target_column}: {score:.3f}")
-```
+    matcher = Coma(use_instances=True)
+    matches = valentine_match([df1, df2], matcher)
+
+    for pair, score in matches.items():
+        print(f"{pair.source_column} <-> {pair.target_column}: {score:.3f}")
+    ```
+
+=== "Polars"
+
+    ```python
+    import polars as pl
+    from valentine import valentine_match
+    from valentine.algorithms import Coma
+
+    df1 = pl.read_csv("source_candidates.csv")
+    df2 = pl.read_csv("target_candidates.csv")
+
+    matcher = Coma(use_instances=True)
+    matches = valentine_match([df1, df2], matcher)
+
+    for pair, score in matches.items():
+        print(f"{pair.source_column} <-> {pair.target_column}: {score:.3f}")
+    ```
+
+=== "Mixed (pandas + Polars)"
+
+    ```python
+    import pandas as pd
+    import polars as pl
+    from valentine import valentine_match
+    from valentine.algorithms import Coma
+
+    df_pandas = pd.read_csv("source_candidates.csv")
+    df_polars = pl.read_csv("target_candidates.csv")
+
+    matcher = Coma(use_instances=True)
+    matches = valentine_match([df_pandas, df_polars], matcher)
+
+    for pair, score in matches.items():
+        print(f"{pair.source_column} <-> {pair.target_column}: {score:.3f}")
+    ```
 
 !!! note "Table names"
 
@@ -70,8 +129,8 @@ for pair, score in matches.items():
 
 ## Matching many DataFrames
 
-Pass any iterable of DataFrames — list, tuple, generator — and Valentine
-computes all unique pairs:
+Pass any iterable of DataFrames (pandas, Polars, or mixed) — list, tuple,
+generator — and Valentine computes all unique pairs:
 
 ```python
 matches = valentine_match(

diff --git a/docs/index.md b/docs/index.md
@@ -35,15 +35,17 @@ hide:
 </div>
 
 Valentine is a Python package for capturing potential relationships among
-columns of different tabular datasets, given as pandas DataFrames. It
-implements several schema- and instance-based matching algorithms behind a
+columns of different tabular datasets, given as pandas or Polars DataFrames.
+It implements several schema- and instance-based matching algorithms behind a
 single, uniform API, and ships with evaluation metrics so you can measure
-match quality against a ground truth.
+match quality against a ground truth. Pandas and Polars frames can be freely
+mixed in the same call.
 
 ## Installation
 
 ```shell
-pip install valentine
+pip install valentine            # pandas only
+pip install "valentine[polars]"  # pandas + Polars support
 ```
 
 Requires Python **>=3.10, <3.15**.