Release v0.6.0: MMC scoring fix and chart-based W&B diagnostics #37

Merged
Palamabron merged 10 commits into main from feat/mmc-wandb-diagnostics on Jun 21, 2026
Palamabron commented on Jun 16, 2026

Owner

Summary

  • MMC scoring fixed: load_mmc_validation_frame aligns validation.parquet with meta_model.parquet so W&B metric/mmc is no longer null; HPO merges mmc, mmc_sharpe, and payout_score after train-era holdout evaluation.
  • Bayesian HPO (Optuna TPE): Local search (--local) now uses Optuna's TPESampler (multivariate) instead of random sampling; the study is persisted in optuna.db alongside trials.db; a --sampler random flag falls back to random search.
  • payout_score as default objective: HPO now optimises 0.75 * corr_sharpe + 2.25 * mmc_sharpe by default instead of corr_sharpe.
  • Forced feature routing with RAM guard: Every HPO trial explores a sampled subset of feature groups (size + stat groups); MAX_ROUTED_FEATURES = 1000 hard cap prevents OOM; active_groups, active_groups_count, and routed_feature_count are logged per trial in W&B and the summary table.
  • W&B diagnostics as charts: Raw diagnostics/ tables replaced with matplotlib bar charts, correlation heatmaps, and line charts; NaN metrics are skipped in trial logging.
  • Live W&B training logs: wandb_logging.py bridges loguru to the W&B Logs panel and logs per-round XGBoost metrics during training.
  • MultiTarget diagnostics: pipeline/model_access.py provides iter_trained_models, model_prediction_map, and multitarget_blend_weights for SHAP and ensemble diagnostics.
  • Backward-compat cleanup: Removed metric aliases, legacy meta-model column fallbacks, model type name aliases, and duplicate W&B HPO param keys.
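
As a rough illustration of the default objective described in the summary, the weighted sum can be written as a small function. The function name and the example inputs are hypothetical; only the weights 0.75 and 2.25 come from the summary.

```python
def payout_score(corr_sharpe: float, mmc_sharpe: float) -> float:
    """Weighted objective from the summary: 0.75 * corr_sharpe + 2.25 * mmc_sharpe."""
    return 0.75 * corr_sharpe + 2.25 * mmc_sharpe


# Illustrative inputs only, not results from this PR.
score = payout_score(corr_sharpe=1.0, mmc_sharpe=0.5)
print(score)
```

With these weights, a unit of mmc_sharpe moves the objective three times as much as a unit of corr_sharpe, which is why a null MMC metric would have made the objective unusable before the scoring fix.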

Test plan

  • make fmt — passes
  • make types — passes (283 source files, no errors)
  • make test — 283 passed, 3 skipped
  • Smoke HPO run confirms Bayesian sampler starts (Optuna sampler: tpe) and feature groups appear in log (groups=..., features=...)

Palamabron and others added 2 commits June 16, 2026 11:05
Score MMC on the validation split where meta_model.parquet has ids, merge
those metrics into HPO trials, and replace raw diagnostics tables with
matplotlib bar/heatmap charts. Add MultiTarget model access helpers,
live W&B training logs, and regression tests for MMC and W&B logging.

Co-authored-by: Cursor <cursoragent@cursor.com>
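
The id alignment described in the commit above could look roughly like the sketch below. It is not the repository's load_mmc_validation_frame: the helper name, the "id" index, and the "numerai_meta_model" column name are assumptions, and small in-memory frames stand in for validation.parquet and meta_model.parquet.

```python
import pandas as pd


def align_validation_with_meta_model(
    validation: pd.DataFrame, meta_model: pd.DataFrame
) -> pd.DataFrame:
    """Keep only validation rows whose id also appears in the meta-model frame."""
    # Inner join on the shared id index: rows without a meta-model prediction
    # are dropped, so MMC is computed on complete pairs rather than on NaNs.
    return validation.join(meta_model[["numerai_meta_model"]], how="inner")


# Hypothetical stand-ins for the two parquet files.
validation = pd.DataFrame(
    {"era": ["0001", "0001", "0002"], "target": [0.25, 0.5, 0.75]},
    index=pd.Index(["a", "b", "c"], name="id"),
)
meta_model = pd.DataFrame(
    {"numerai_meta_model": [0.4, 0.6]},
    index=pd.Index(["b", "c"], name="id"),
)

aligned = align_validation_with_meta_model(validation, meta_model)
print(aligned)
```

An inner join is one plausible choice here; a left join would keep every validation row but reintroduce missing meta-model values, which is the situation that left metric/mmc null.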
Bump version, document validation MMC scoring and matplotlib diagnostics
in README/CHANGELOG, and stabilize matplotlib state between tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
Palamabron changed the title from "Fix MMC scoring and improve W&B diagnostics" to "Release v0.6.0: MMC scoring fix and chart-based W&B diagnostics" on Jun 16, 2026
Palamabron and others added 8 commits June 16, 2026 12:05
Drop metric aliases, legacy meta-model column fallbacks, model type
aliases, and W&B param fallbacks. Delete flaky or duplicate tests and
consolidate W&B diagnostics coverage into multitarget tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use Optuna-backed local HPO with persisted study state, keep payout as default objective, and improve routing diagnostics with feature-group metadata in WandB logs.

Co-authored-by: Cursor <cursoragent@cursor.com>
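
The feature-group metadata mentioned in the commit above (active_groups, active_groups_count, routed_feature_count) might be produced along these lines. This is a minimal sketch: the function name and group names are hypothetical, and enforcing the MAX_ROUTED_FEATURES cap by truncation is an assumption, since the PR does not say how the cap is applied.

```python
import random

MAX_ROUTED_FEATURES = 1000  # hard cap named in the PR summary


def route_features(feature_groups: dict[str, list[str]], rng: random.Random) -> dict:
    """Sample a non-empty subset of groups, then cap the routed feature count."""
    names = sorted(feature_groups)
    k = rng.randint(1, len(names))  # at least one group is always active
    active = sorted(rng.sample(names, k))
    # Flatten the chosen groups, dropping duplicates while keeping order.
    routed = list(dict.fromkeys(f for g in active for f in feature_groups[g]))
    # Assumption: the cap is enforced by truncation.
    routed = routed[:MAX_ROUTED_FEATURES]
    return {
        "active_groups": active,
        "active_groups_count": len(active),
        "routed_feature_count": len(routed),
        "features": routed,
    }


# Hypothetical groups; together they exceed the cap.
groups = {
    "medium": [f"feature_m{i}" for i in range(700)],
    "stats": [f"feature_s{i}" for i in range(600)],
}
info = route_features(groups, random.Random(0))
print(info["active_groups"], info["routed_feature_count"])
```

Returning the metadata next to the routed feature list keeps the per-trial log entry and the actual training input derived from the same sample.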
Multi-target/multi-head pipelines stripped era from X without passing
era_train to nested models, breaking Packboost in multi_blend trials.

Co-authored-by: Cursor <cursoragent@cursor.com>
HPO already optimizes payout_score but the console/JSON leaderboard still
sorted and displayed corr_sharpe; align ranking, columns, and EDA view.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add meta-neutralization, bounded ensemble optimization, MMC validation loading,
payout_score objective with fast holdout, and supporting tests for the MMC plan.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ls to 25

Reduce dead Optuna dimensions by suggesting model-specific and routing params only when active, and extend TPE startup exploration before Bayesian optimization.

Co-authored-by: Cursor <cursoragent@cursor.com>
Penalize validation-only overfit in leaderboard/best_config selection, expose
val vs holdout metrics in trial logs, allow --max-models 3 in fast mode, and
warn when resume switches eval mode.

Co-authored-by: Cursor <cursoragent@cursor.com>
Save the full post-worker flat config to TrialDB (minus runtime keys), wire
meta neutralization through to_numerai_predict via numerai_meta_model, and
apply single-model lane preprocessors during feature routing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Palamabron merged commit 62d5f3b into main on Jun 21, 2026
0 of 2 checks passed