Merged
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,22 @@ All notable changes to AlphaPulse are documented here.

---

## [Unreleased] — WandB XAI & Plot Quality Overhaul

- **Universal feature importance:** `compute_universal_feature_importance` extracts and normalizes importance from any supported model type (XGBoost pred_contribs, LightGBM gain, CatBoost PredictionValuesChange, sklearn `feature_importances_`), averages across all models present, and logs a ranked bar chart to WandB.
- **Era-stratified importance:** `_log_era_stratified_importance` slices validation data by era, computes importance per slice, and logs a `line_series` chart showing each feature's importance trajectory over eras — directly reveals temporal stability.
- **Per-era stability report wired:** `compute_feature_report` (LightGBM proxy) now surfaces in WandB via `_log_feature_report`; logs top features by mean importance, top by era stability, and worst by era stability — each with bar charts.
- **Best-trial diagnostics run:** After HPO, the best config is retrained on an 80/20 era split and all expensive diagnostics (`log_era_importance=True`, top-50 importance artifact) are logged to a dedicated `best-trial-diagnostics` WandB run.
- **Prediction histogram fixed:** `_log_prediction_diagnostics` now uses `np.histogram(bins=50)` (50 rows) instead of logging every prediction row (50k+ rows).
- **Per-era line charts fixed:** `era_index` (0, 1, 2…) used as x-axis — fixes alphabetical string sort that scrambled chronological order.
- **Drawdown curve added:** Per-era drawdown from peak cumulative correlation logged alongside the cumulative correlation line chart.
- **Correlation distribution histogram:** Distribution of per-era Spearman correlations logged as a bar chart — directly answers "how many negative eras does this model have?"
- **Missing bar charts added:** Feature exposure top-15, ensemble model-pair correlation (A→B format), worst stability by era — all now have companion bar charts.
- **HPO summary table expanded:** 18 → 30 columns; adds `model_1/2/3_type` (split, for WandB parallel coordinates), XGBoost/LightGBM hyperparams, feature selection, noise injection, augmentation flags.
- **Convergence chart:** `log_hpo_convergence` logs all trial scores and running-best `corr_sharpe` in a single WandB run after the HPO search completes, rendering as a proper convergence curve.
- **String metric bug fixed:** `feature_importance_model_type` moved from `wandb.log()` (coerced to NaN) to `wandb.run.summary`.
- **Duplicate metric removed:** `metric/corr_sharpe` deduplicated in `log_hpo_trial_metrics`.
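
The histogram, x-axis, and drawdown fixes above can be sketched with plain NumPy. This is a minimal illustration, not the code in `wandb_diagnostics.py`: the function names are hypothetical, era labels are assumed to be numeric strings, and drawdown is assumed to be reported as a non-negative distance below the running peak.

```python
import numpy as np


def prediction_histogram(preds, bins=50):
    """Summarize predictions as `bins` (bin_center, count) rows
    instead of emitting one row per prediction."""
    counts, edges = np.histogram(preds, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return list(zip(centers.tolist(), counts.tolist()))


def era_curves(era_labels, era_corrs):
    """Return (era_index, cumulative_corr, drawdown) in chronological order.

    Assumption: era labels are numeric strings, so they are ordered by
    integer value. A plain string sort would place "10" before "2".
    """
    order = sorted(range(len(era_labels)), key=lambda i: int(era_labels[i]))
    corrs = np.asarray(era_corrs, dtype=float)[order]
    era_index = np.arange(len(corrs))            # 0, 1, 2, ... used as x-axis
    cumulative = np.cumsum(corrs)                # cumulative correlation
    running_peak = np.maximum.accumulate(cumulative)
    drawdown = running_peak - cumulative         # 0 at a new peak, > 0 below it
    return era_index, cumulative, drawdown


def count_negative_eras(era_corrs):
    """Number of eras with a negative correlation."""
    return int((np.asarray(era_corrs, dtype=float) < 0).sum())
```

Each returned sequence can then be handed to whatever charting call the logger uses; the integer `era_index` keeps the x-axis chronological regardless of how the era labels sort as strings.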

## [0.5.0] — Production Hardening

- **HPO fault tolerance:** Each local trial runs in an isolated subprocess; a crash marks that trial failed and the sweep continues. A SQLite-backed `TrialDB` (`src/alphapulse/hpo/trial_db.py`) persists trial state across crashes. `--resume` flag skips already-completed trial numbers. `--trial-timeout` caps each subprocess.
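
The fault-tolerance pattern described here (isolated subprocess per trial, persisted trial state, resume, per-trial timeout) can be sketched with the standard library. This is an illustrative sketch only: the table schema and all function names are assumptions and do not reflect the actual `TrialDB` API in `src/alphapulse/hpo/trial_db.py`.

```python
import sqlite3
import subprocess
import sys


def open_trial_db(path=":memory:"):
    """Open (or create) a hypothetical trial-state table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials ("
        "number INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def completed_trials(conn):
    """Trial numbers that already finished successfully."""
    rows = conn.execute(
        "SELECT number FROM trials WHERE status = 'completed'"
    ).fetchall()
    return {row[0] for row in rows}


def run_trial(conn, number, command, timeout=None):
    """Run one trial in its own process; a crash or timeout marks it failed."""
    try:
        result = subprocess.run(command, timeout=timeout)
        status = "completed" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "failed"
    conn.execute(
        "INSERT OR REPLACE INTO trials (number, status) VALUES (?, ?)",
        (number, status),
    )
    conn.commit()
    return status


def run_sweep(conn, commands, resume=False, timeout=None):
    """Run every trial; with resume=True, skip trials already completed."""
    done = completed_trials(conn) if resume else set()
    statuses = {}
    for number, command in enumerate(commands):
        if number in done:
            statuses[number] = "skipped"
            continue
        statuses[number] = run_trial(conn, number, command, timeout)
    return statuses
```

Because each trial's status is committed as soon as the subprocess exits, a crash of the parent process loses at most the trial that was in flight, and a later run with `resume=True` only repeats failed or missing trials.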
70 changes: 47 additions & 23 deletions README.md
@@ -2,6 +2,22 @@

AlphaPulse is a config-driven framework for building, training, and deploying ML pipelines for the [Numerai](https://numer.ai) stock-market prediction tournament. It covers the full workflow: dataset download, experiment definition, backtesting, hyperparameter optimization (HPO), and automated weekly submission.

## Architecture

![AlphaPulse Architecture](docs/assets/architecture.drawio.png)

The framework is organized into five layers:

| Layer | Components | Purpose |
|---|---|---|
| **Data** | `NumeraiDataLoader`, parquet files, `features.json` | Downloads and loads Numerai dataset splits (train/validation/live) |
| **Configuration** | `ExperimentV1` YAML schema, HPO search space, `TrialDB`, AutoResearch agent | Defines what to train — via static YAML, automated HPO, or Claude-agent-driven research |
| **Core Pipeline** | Preprocessors, Models, `Pipeline` / `MultiHeadPipeline`, Ensemble, `FeatureNeutralizer` | Fits and combines models; handles feature routing, ensembling, and prediction neutralization |
| **Evaluation** | `Backtester`, `PurgedEraCV`, SHAP report, W&B diagnostics | Computes era-aware metrics (CORR, Sharpe, MMC) and XAI reports |
| **Export & Submission** | `predict.pkl`, live inference, submission validation, Numerai upload | Produces tournament-ready predictions and submits them |

> The diagram is editable — open `docs/assets/architecture.drawio` in [draw.io](https://app.diagrams.net) to modify it.

-----

## Table of Contents
@@ -347,7 +363,7 @@ evaluation:
### Advanced Features

* **Feature Groups:** Define `features.groups` as a mapping of `group_name -> [columns]`. You can then assign specific models to specific groups using `models[].input_group: group_name`.
* **Available Preprocessors:** `StandardScaler`, `RobustScaler`, `PCA`, `TruncatedSVD`, `GaussianNoise`, `VarianceSelector`, `LGBMImportanceSelector`, `Packboost`, and `GroupedPreprocessor`.
* **Available Preprocessors:** `StandardScaler`, `RobustScaler`, `PCA`, `TruncatedSVD`, `AutoencoderPreprocessor`, `CompressionPreprocessor`, `GaussianNoise`, `VarianceSelector`, `LGBMImportanceSelector`, `EraStableFeatureSelector`, `Packboost`, and `GroupedPreprocessor`.
* **Available Models:**
- **Gradient Boosting:** `XGBoost`, `LightGBM`, `CatBoost`, `Packboost`
- **Tree Ensembles:** `RandomForest`, `ExtraTrees`
@@ -482,31 +498,35 @@ make eda-lint
├── data/ # Downloaded Numerai parquet files
├── experiments/ # YAML configuration files
├── scripts/ # Executable workflow scripts
│ ├── download_dataset.py
│ ├── run_experiment.py
│ ├── hpo_pipeline.py
│ ├── run_test_pipeline.py
│ ├── export_numerai_pickle.py
│ ├── export_from_yaml.py
│ ├── live_inference.py
│ ├── submit_predictions.py
│ ├── make_feature_groups.py
│ └── autoresearch.py
│ ├── download_dataset.py # Download Numerai dataset
│ ├── run_experiment.py # Run a YAML-defined experiment (+ W&B logging)
│ ├── hpo_pipeline.py # Automated hyperparameter search (Ray Tune)
│ ├── run_test_pipeline.py # Lightweight smoke test
│ ├── export_numerai_pickle.py # Export predict.pkl from HPO result
│ ├── export_from_yaml.py # Export predict.pkl from YAML experiment
│ ├── live_inference.py # Run trained model on live tournament data
│ ├── submit_predictions.py # Upload predictions to Numerai
│ ├── make_feature_groups.py # Generate feature group definitions
│ ├── gpu_smoke_test.py # Verify GPU availability for deep models
│ ├── autoresearch.py # Claude-agent-driven research loop
│ └── wandb_sweep_config.yaml # W&B sweep configuration
├── eda/ # Standalone Streamlit EDA dashboard
│ ├── app.py # Main entry point (streamlit run eda/app.py)
│ ├── pages/ # Multi-page analysis modules
│ └── utils/ # Config & data loading (uses NumeraiDataLoader)
│ ├── pages/ # Multi-page analysis modules (8 pages)
│ └── utils/ # Config, data loading, translations (EN/PL)
├── docs/assets/ # Diagrams and documentation assets
│ └── architecture.drawio.png # Architecture diagram (editable in draw.io)
├── src/alphapulse/ # Core framework source code
│ ├── autoresearch/ # Agent-driven research loop
│ ├── evaluation/ # Backtesting, metrics, diagnostics, export validation
│ ├── experiments/ # YAML schema, runner, data loading
│ ├── hpo/ # HPO objective, search space, builder, registry
│ ├── autoresearch/ # Agent-driven research loop (loop, agent, mutations, state)
│ ├── evaluation/ # Backtesting, metrics, SHAP report, W&B diagnostics, submission validation
│ ├── experiments/ # YAML schema (ExperimentV1), runner
│ ├── hpo/ # HPO objective, search space, builder, registry, TrialDB (SQLite)
│ ├── logging_/ # Leaderboard and W&B helpers
│ ├── models/ # All model implementations + factory
│ ├── pipeline/ # Pipeline, ensemble, neutralizer, stacker
│ ├── preprocessors/ # All preprocessor implementations + factory
│ ├── utils/ # Seed utility (set_global_seed)
│ └── validation/ # Purged era cross-validation
│ ├── pipeline/ # Pipeline, MultiHeadPipeline, MultiTargetPipeline, ensemble, neutralizer, stacker
│ ├── preprocessors/ # All preprocessor implementations + factory (incl. autoencoder, compression, era-stable selector)
│ ├── utils/ # Global seed utility
│ └── validation/ # PurgedEraCV
└── tests/ # Unit tests
```

@@ -529,13 +549,17 @@ Commit messages: prefer conventional commits (e.g. `feat: ...`, `fix: ...`, `doc

See [CHANGELOG.md](CHANGELOG.md) for completed releases.

**Completed — v0.5.0 (Production Hardening):**
**Completed — v0.5.0 (Production Hardening + XAI):**
- **HPO fault tolerance:** Each local trial runs in an isolated subprocess; crashes mark the trial failed and the sweep continues. A SQLite-backed `TrialDB` persists trial state. `--resume` skips already-completed trials.
- **Provenance artifact:** On every export, a hermetically sealed bundle is written: resolved config, `uv export` dependency snapshot, and git commit hash.
- **Canonical artifact naming:** Exported models follow `<TIMESTAMP>_<ARCH>_<TARGET>_<CONFIG_HASH>.pkl` with a `latest_predict.pkl` symlink.
- **Masked loss for auxiliary targets:** `MultiTargetPipeline` drops NaN rows per-target; targets with fewer than 10 valid rows are skipped entirely.
- **Feature neutralization in eval loop:** `Backtester` and `EraSplitEvaluator` accept an optional `FeatureNeutralizer`; predictions are neutralized before metric computation.
- **Feature neutralization in eval loop:** `Backtester` accepts an optional `FeatureNeutralizer`; predictions are neutralized before metric computation.
- **W&B experiment runner integration:** `scripts/run_experiment.py` logs configs, per-era metrics, and artifact paths to W&B via `--wandb-project`.
- **XAI / SHAP reporting:** `shap_report.py` computes per-era feature importance; `wandb_diagnostics.py` pushes rich HPO and XAI plots to W&B.
- **GPU HPO + foundation models:** `TabPFN3`, `TabICL`, and `TabularDL` (ft_transformer / mlp) with GPU-accelerated HPO via Ray Tune.
- **Universal feature importance + era stability:** `EraStableFeatureSelector` ranks features by blended mean importance / cross-era stability; `feature_report.py` surfaces per-era diagnostics.
- **Autoencoder + compression preprocessors:** `AutoencoderPreprocessor` and `CompressionPreprocessor` for learned low-dimensional representations.
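
As an illustration of the "predictions are neutralized before metric computation" step listed above, here is a minimal sketch of one common form of feature neutralization: subtracting the least-squares projection of the predictions onto the feature matrix. The source does not describe the internals of `FeatureNeutralizer`, so the function name, the `proportion` parameter, and the choice of a linear projection are all assumptions.

```python
import numpy as np


def neutralize(preds, features, proportion=1.0):
    """Remove the part of `preds` that is linearly explained by `features`.

    Hypothetical helper, not the FeatureNeutralizer API.
    proportion=1.0 removes the full linear exposure; 0.0 leaves preds unchanged.
    """
    preds = np.asarray(preds, dtype=float)
    features = np.asarray(features, dtype=float)
    # Least-squares coefficients of preds regressed on the feature columns.
    coef, *_ = np.linalg.lstsq(features, preds, rcond=None)
    return preds - proportion * (features @ coef)
```

With `proportion=1.0` the result is the regression residual, so it is orthogonal to every feature column. A metric computed on it therefore reflects only the signal that the features do not explain linearly.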

-----
