[ENH] Add UnconditionalDistfitRegressor and DeterministicReductionRegressor #994

Open
arnavk23 wants to merge 27 commits into sktime:main from arnavk23:feature/baseline-unconditional-densities

arnavk23 commented Mar 25, 2026

Reference Issues/PRs

Towards #7

What does this implement/fix? Explain your changes.

This PR adds two baselines for probabilistic regression to improve benchmarking: one that is fully unconditional, and one whose uncertainty is unconditional (constant across samples):

UnconditionalDistfitRegressor

  • Fits a univariate distribution to y using distfit.
  • Ignores X entirely by design (feature-agnostic baseline).
  • Supports stable distribution choices (norm, laplace) and histogram mode.
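
To make the "ignores X by design" behaviour concrete, here is a minimal sketch of a feature-agnostic baseline. It is not the PR's implementation: the class name `UnconditionalNormalBaseline` is hypothetical, and the standard library's `statistics.NormalDist` is swapped in for distfit so the example stays dependency-free.

```python
from statistics import NormalDist


class UnconditionalNormalBaseline:
    """Hypothetical stand-in for a feature-agnostic baseline (not the PR's class)."""

    def fit(self, X, y):
        # X is accepted for API symmetry but deliberately unused.
        # NormalDist.from_samples uses the sample mean and sample standard deviation.
        self.dist_ = NormalDist.from_samples(y)
        return self

    def predict_proba(self, X):
        # Every row receives the same fitted distribution, whatever its features.
        return [self.dist_ for _ in X]


X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 2.0, 3.0, 4.0]

model = UnconditionalNormalBaseline().fit(X_train, y_train)
preds = model.predict_proba([[10.0], [-5.0]])
```

Because the prediction never depends on the inputs, any model that genuinely uses X should outperform this on a proper scoring rule, which is what makes it useful as a benchmark floor.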

DeterministicReductionRegressor

  • Wraps a deterministic regressor.
  • Uses deterministic predictions as location (mu) and constant uncertainty from training target variance:
      • Gaussian: sigma = sqrt(var(y_train))
      • Laplace: scale = sqrt(var(y_train) / 2)
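
The two scale formulas follow from the variance of each family: a Gaussian has variance sigma², and a Laplace distribution with scale b has variance 2b². A minimal sketch of that computation, assuming the population variance (the ddof used in the PR is not stated here) and a hypothetical helper name `constant_scale`:

```python
import math
from statistics import pvariance


def constant_scale(y_train, distr_type="gaussian"):
    """Return the constant scale parameter implied by the training target variance.

    Hypothetical helper for illustration; not part of the PR's API.
    """
    var = pvariance(y_train)  # population variance; the ddof choice is an assumption
    if distr_type == "gaussian":
        return math.sqrt(var)  # Var = sigma^2
    if distr_type == "laplace":
        return math.sqrt(var / 2.0)  # Var = 2 * scale^2
    raise ValueError(f"unsupported distr_type: {distr_type!r}")


y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = constant_scale(y_train, "gaussian")
laplace_scale = constant_scale(y_train, "laplace")
```

With either choice the returned distribution has the same variance as the training targets; only the tail shape differs.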

Additional updates:

  • Aligns behavior with current distfit compatibility constraints.
  • Fixes probabilistic output contract behavior for quantiles/intervals and estimator checks.
  • Cleans up deprecated KDE path handling (not treated as actively supported in this PR).

Limitations

UnconditionalDistfitRegressor

  • Ignores features completely.
  • Cannot model feature-conditional uncertainty or heteroscedasticity.
  • Produces global uncertainty not adapted per sample.

DeterministicReductionRegressor

  • Uses constant scale across all samples.
  • Does not model input-dependent uncertainty.
  • Both estimators currently target univariate y only.

Does your contribution introduce a new dependency? If yes, which one?

distfit is used as a soft dependency.

What should a reviewer concentrate their feedback on?

  • Correctness of Gaussian/Laplace parameterization (mu, sigma, scale)
  • Quantile/interval behavior and distribution output contracts
  • Mean consistency between wrapped deterministic predictions and returned distributions
  • Baseline semantics and documented limitations
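
As a reference point for the quantile/interval and mean-consistency checks listed above, the following sketch writes out the Laplace inverse CDF by hand and builds a central prediction interval from it. The function names are hypothetical and the code is independent of skpro; it only illustrates the property a reviewer might verify, namely that the median and the interval midpoint both coincide with the deterministic prediction mu.

```python
import math


def laplace_quantile(p, mu, scale):
    """Inverse CDF of Laplace(mu, scale) for 0 < p < 1."""
    if p < 0.5:
        return mu + scale * math.log(2.0 * p)
    return mu - scale * math.log(2.0 * (1.0 - p))


def central_interval(coverage, mu, scale):
    """Symmetric interval with the given nominal coverage."""
    lower = laplace_quantile((1.0 - coverage) / 2.0, mu, scale)
    upper = laplace_quantile((1.0 + coverage) / 2.0, mu, scale)
    return lower, upper


mu, scale = 3.0, 1.0  # mu plays the role of the wrapped regressor's prediction
median = laplace_quantile(0.5, mu, scale)
lower, upper = central_interval(0.9, mu, scale)
```

Since the Laplace distribution is symmetric, its mean, median, and interval midpoint are all equal to mu, so a mismatch in any of them points to a parameterization error.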

Did you add any tests for the change?

Yes. Tests cover:

  • Invalid distribution type handling

  • Univariate vs multioutput behavior

  • Distribution parameter correctness (loc/scale, sigma/scale)

  • Mean consistency with deterministic predictions

  • Estimator-wide contract checks for probabilistic outputs

Any other comments?

KDE-related behavior was cleaned up to reflect upstream instability/deprecation in the distfit/SciPy stack. This PR does not present KDE as an actively supported option.

  • Ensures compatibility with distfit 2.0.1: default distr_type is now 'norm' (was 'best').

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

- Change default distr_type to 'norm' (was 'best'), matching valid distfit options
- Update test to use distr_type='norm' explicitly
- Mark KDE test as xfail due to upstream scipy/distfit incompatibility
- Fix Laplace attribute checks in tests to use 'scale' (not 'b')
- Ensure all baseline regressor tests pass or are correctly handled
arnavk23 added 25 commits March 25, 2026 06:59
…t, and adding get_params to DeterministicReductionRegressor for better sklearn compatibility.
…parenthesis/ellipsis for multi-line import in docstring; ensure all style and test checks pass.
… for the unconditional distfit regressor. This is in preparation for the upcoming release, and to ensure that the baseline regressors are working correctly.
@arnavk23
Contributor Author

Updated the UnconditionalDistfitRegressor docstring doctest expectation to match current return contract and stable scalar formatting in unconditional_distfit.py.
Hardened plotting test backend selection to avoid tkinter backend issues by forcing Agg in test_proba_basic.py.
