DSL Kernels as First-Class CTable Columns #659
Merged
@blosc2.dsl_kernel-decorated functions can now back both virtual computed
columns (add_computed_column) and stored generated columns
(add_generated_column), survive save/open round-trips, and be referenced
inside where() predicates.
API:
- add_computed_column(name, kernel, inputs=[...], dtype=None) binds one
stored scalar column per kernel parameter; the callable form returning a
blosc2.lazyudf(...) is also accepted.
- add_generated_column(..., values=kernel, inputs=[...]) adds a stored DSL
column (new transformer_kind="dsl") with append/extend auto-fill,
refresh_generated_column, and indexing.
- dtype is inferred by NumPy type promotion of the input column dtypes when
omitted; pass dtype explicitly for type-changing kernels (comparisons/casts).
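The dtype rule in the last bullet can be illustrated with plain NumPy. This is a minimal sketch, assuming the inference behaves like np.result_type over the input column dtypes; the helper name infer_dtype is hypothetical and not part of blosc2.

```python
import numpy as np


def infer_dtype(*input_dtypes):
    """Hypothetical helper: promote the input dtypes the way NumPy would."""
    return np.result_type(*input_dtypes)


price = np.array([1.5, 2.0], dtype=np.float32)
qty = np.array([3, 4], dtype=np.int64)

# Promotion of the two input dtypes.
inferred = infer_dtype(price.dtype, qty.dtype)

# An arithmetic kernel body produces the promoted dtype...
arithmetic = price * qty

# ...but a comparison produces bool, which promotion cannot predict.
comparison = price > qty
```

Here the arithmetic result matches the promoted dtype, while the comparison result does not, which is the case where dtype has to be passed explicitly.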
Internals:
- Factor the safe-exec DSL reconstruction into dsl_kernel.kernel_from_source()
and route b2objects.decode_structured_lazyudf through it.
- Persist computed entries as kind:"dsl" + dsl_source (no expression); rebuild
the kernel and LazyUDF on open.
- DSL computed columns materialize via LazyUDF.compute() on access: the
miniexpr DSL path only supports full-array getitem, so reads/where() cannot
slice lazily. Materializing also lets a DSL column join where() as a plain
NDArray operand (chunked staged co-evaluation, not single-kernel fusion).
- Guard the LazyExpr-only sites (compact, info display, materialize, per-row
access, sort, index reuse) and the three materialized eval paths
(row/batch autofill, refresh). Pad length-1 DSL batches to 2 and slice back,
since miniexpr rejects shape-(1,) inputs.
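The length-1 workaround in the last bullet can be sketched in plain NumPy. This is an illustration only: eval_with_padding and strict_kernel are hypothetical names, and repeating the single element as padding is an assumption, since the description does not say what value the real code pads with.

```python
import numpy as np


def eval_with_padding(kernel, *inputs):
    """Evaluate kernel on equal-length 1-D inputs, padding length-1 batches."""
    if len(inputs[0]) != 1:
        return kernel(*inputs)
    # Assumption: pad by repeating the single element so the backend
    # sees shape (2,), then slice the result back to shape (1,).
    padded = [np.concatenate([a, a]) for a in inputs]
    return kernel(*padded)[:1]


def strict_kernel(a, b):
    # Stand-in for a backend that rejects shape-(1,) inputs.
    if a.shape == (1,):
        raise ValueError("shape-(1,) inputs are not supported")
    return a * 2 + b


single = eval_with_padding(strict_kernel, np.array([3.0]), np.array([1.0]))
batch = eval_with_padding(strict_kernel, np.array([1.0, 2.0]), np.array([0.5, 0.5]))
```

Padding is safe here only because the kernel is elementwise: the extra element cannot influence the first result.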
Add tests/ctable/test_ctable_dsl_columns.py (20 tests) covering values,
dtype inference, partial slicing, where() (incl. multi-chunk streaming),
persistence, compact, materialize, and generated-column autofill/refresh/index.
…into dsl-kernels-as-cols
…d= params This lets users switch JIT on/off and change backends entirely from the command line without touching code, which is the natural experimentation workflow.
Pull request overview
This PR makes DSL kernels (functions decorated with @blosc2.dsl_kernel) first-class inputs for CTable derived columns, enabling both virtual computed columns and stored generated/materialized columns to be backed by DSL kernels and to survive save/open round-trips.
Changes:
- Extend CTable.add_computed_column(), add_generated_column(), materialize_computed_column(), and where() to accept DSL kernels / LazyUDF predicates, with persistence of dsl_source (+ optional jit_backend).
- Add dsl_kernel.kernel_from_source() and refactor structured LazyUDF decoding to reuse it.
- Improve DSL ergonomics: lazyudf() can infer dtype for DSL kernels; convert_inputs() unwraps CTable.Column; BLOSC_ME_JIT now overrides both jit and jit_backend.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/ndarray/test_ndarray.py | Adds coverage for blosc2.array() copy semantics vs asarray(). |
| tests/ctable/test_ctable_dsl_columns.py | New end-to-end tests for DSL-backed computed/generated columns and persistence. |
| tests/ctable/test_ctable_computed_cols.py | Expands existing computed-column suite with DSL/LazyUDF behaviors and jit_backend persistence tests. |
| src/blosc2/lazyexpr.py | Adds Column unwrapping in convert_inputs, optional dtype inference for DSL lazyudf, and env var precedence for JIT. |
| src/blosc2/dsl_kernel.py | Introduces kernel_from_source() utility for reconstructing persisted DSL kernels. |
| src/blosc2/ctable.py | Implements DSL transformer normalization, persistence, evaluation paths, and where() support for LazyUDF. |
| src/blosc2/b2objects.py | Refactors DSL LazyUDF deserialization to use kernel_from_source(). |
| examples/ctable/udf-computed-col.py | New example demonstrating DSL computed/generated columns and persistence. |
| doc/reference/ctable.rst | Updates CTable docs to distinguish computed vs generated columns and documents DSL support. |
| doc/getting_started/overview.rst | Adds introductory CTable overview section and JIT performance tip. |
| bench/ctable/query-backends.py | Adds a benchmark for CTable.where() across interpreted/tcc/cc backends. |
- lazyexpr.py: avoid double _raw_col property lookup (hasattr + access)
- ctable.py: only materialise referenced DSL computed columns in
_where_expression_operands, not all of them eagerly
- ctable.py: preserve jit_backend in DSL computed column metadata
during _empty_copy (was silently dropped)
- dsl_kernel.py: kernel_from_source() now validates source contains
only a function definition (rejects side-effectful top-level nodes)
- dsl_kernel.py: raise ValueError with clear message when source does
not define the requested function name (was a cryptic KeyError)
- ctable.rst: add security warning about opening .b2d files from
untrusted sources when DSL columns are present
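
The two dsl_kernel.py checks above can be sketched with the standard ast module. This is not the actual kernel_from_source() implementation; load_function_from_source is a hypothetical name, and the sketch assumes the check amounts to "every top-level node is a function definition".

```python
import ast


def load_function_from_source(source, func_name):
    """Hypothetical sketch: accept only top-level function definitions."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            raise ValueError(
                f"unexpected top-level {type(node).__name__}; "
                "only function definitions are allowed"
            )
    namespace = {}
    exec(compile(tree, "<dsl_source>", "exec"), namespace)
    if func_name not in namespace:
        # Clear error instead of a bare KeyError on lookup.
        raise ValueError(f"source does not define a function named {func_name!r}")
    return namespace[func_name]


double = load_function_from_source("def double(x):\n    return x * 2\n", "double")
```

A check like this rejects top-level imports and statements, but decorators and default-argument expressions inside a function definition still run at definition time, so it does not make untrusted sources safe to open.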
DSL kernels (functions decorated with @blosc2.dsl_kernel) can now be registered as computed columns in CTable, alongside the existing expression-string computed columns.
Key changes:
CTable — computed column extensions (ctable.py)
dsl_kernel.py
lazyexpr.py
Tests & docs