AlphaPulse is a config-driven framework for building, training, and deploying ML pipelines for the Numerai stock-market prediction tournament. It covers the full workflow: dataset download, experiment definition, backtesting, hyperparameter optimization (HPO), and automated weekly submission.
The framework is organized into five layers:
| Layer | Components | Purpose |
|---|---|---|
| Data | NumeraiDataLoader, parquet files, features.json | Downloads and loads Numerai dataset splits (train/validation/live) |
| Configuration | ExperimentV1 YAML schema, HPO search space, TrialDB, AutoResearch agent | Defines what to train: static YAML, automated HPO, or Claude-agent-driven research |
| Core Pipeline | Preprocessors, Models, Pipeline / MultiHeadPipeline, Ensemble, FeatureNeutralizer | Fits and combines models; handles feature routing, ensembling, and prediction neutralization |
| Evaluation | Backtester, PurgedEraCV, SHAP report, W&B diagnostics (charts) | Era-aware metrics (CORR, Sharpe, MMC on validation split) and matplotlib XAI plots |
| Export & Submission | predict.pkl, live inference, submission validation, Numerai upload | Produces tournament-ready predictions and submits them |
The diagram is editable: open docs/assets/architecture.drawio in draw.io to modify it.
- Installation & Setup
- Numerai Competition Pipeline
- Configuring Experiments (Experiment v1 YAML)
- Directory Structure
- Contributing
- Roadmap
- License
Numerai is a crowd-sourced hedge fund and the world's largest stock-market ML tournament. Data scientists submit predictions that Numerai combines into a meta-model for real hedge fund trading.
Key concepts:
| Term | Meaning |
|---|---|
| Era | One week of data; rows within an era are correlated stocks |
| Target | 20-day forward stock-specific alpha, neutral to market/sector |
| CORR | Spearman correlation of your predictions to the target per era |
| MMC | Meta Model Contribution — how much your model improves beyond others |
| Sharpe | Mean per-era correlation ÷ its standard deviation |
| NMR | Numerai's token; stake on your model to earn rewards or get burned |
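The CORR and Sharpe definitions in the table can be illustrated with a small standard-library sketch. It follows the table literally (plain Spearman correlation per era, then mean divided by standard deviation) and is not Numerai's official scoring code. The function names are hypothetical, and using the sample standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def per_era_corr(eras, preds, targets):
    """Spearman correlation (Pearson on ranks) of predictions vs. target, per era."""
    grouped = defaultdict(lambda: ([], []))
    for era, p, t in zip(eras, preds, targets):
        grouped[era][0].append(p)
        grouped[era][1].append(t)
    return {
        era: pearson(average_ranks(p), average_ranks(t))
        for era, (p, t) in grouped.items()
    }


def sharpe(era_scores):
    """Mean per-era correlation divided by its sample standard deviation (assumption)."""
    scores = list(era_scores.values())
    return mean(scores) / stdev(scores)
```

Scoring per era rather than over all rows at once matters because rows within an era are correlated; pooling them would overstate how much independent evidence the score rests on.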
Requirements: Python 3.12+, Git, uv.
# Core dependencies (required)
uv sync --extra dev
# Optional: Add specific feature sets
uv sync --extra hpo # HPO with Ray Tune
uv sync --extra deep # Deep learning models (pytorch_tabular)
uv sync --extra foundation # TabPFN, TabICL foundation models
uv sync --extra eda # EDA dashboard dependencies
# Install all extras at once
uv sync --all-extras
# Install Git hooks
pre-commit install
pre-commit run --all-files

Dependency Groups:
- dev (development tools): ruff, mypy, pytest, pre-commit
- hpo (hyperparameter optimization): ray[tune], optuna (advanced)
- deep (deep learning): pytorch_tabular, torch
- foundation (foundation models): tabpfn, tabicl
- eda (dashboard): streamlit, plotly, scikit-learn
# Linting and Formatting
uv run ruff check src tests
uv run ruff format .
# Type Checking
uv run mypy src/alphapulse tests
# Tests
uv run pytest tests/ -v --tb=short
# Add/Remove Packages
uv add tenacity loguru
uv remove tenacity

AlphaPulse provides Make targets for common development tasks:
# Code Quality
make fmt # Format code with ruff (imports + format)
make lint # Lint code with ruff (check + format --check)
make types # Run mypy type checking
make test # Run pytest with coverage
make check # Run all checks: lint + types + test + deadcode
make deadcode # Find unused code with vulture
# EDA Module
make eda-lint # Lint EDA dashboard code (standalone check)
# Notebooks
make nb-format # Format notebooks with nbqa
make nb-lint # Lint notebooks with nbqa
# Git Hooks
make precommit # Run pre-commit hooks on all files

Note: The EDA module (eda/) is excluded from make check, make types, and make test. Use make eda-lint for EDA quality checks.
When dependencies change in pyproject.toml:
# Fast sync
uv sync --extra dev
# Clean rebuild
rm -rf .venv
uv sync --extra dev

AlphaPulse supports an end-to-end flow: preparing data, defining experiments via YAML, running HPO, and exporting a Numerai-ready predict.pkl.
The downloader expects Numerai API keys in environment variables: NUMERAI_PUBLIC_API_KEY and NUMERAI_PRIVATE_API_KEY.
This repo supports a local .env file (loaded by python-dotenv), or you can export variables in your shell:
export NUMERAI_PUBLIC_API_KEY="..."
export NUMERAI_PRIVATE_API_KEY="..."

uv run python scripts/download_dataset.py \
--dataset-version v5.2 \
--output-dir data

Expected files in data/v5.2/: train.parquet, validation.parquet, and features.json.
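A quick pre-flight check can catch a missing key or an incomplete download before a long run starts. The sketch below only uses the variable and file names listed above; the helper names are hypothetical and it is not part of the AlphaPulse scripts. It reads the process environment only, so values stored in .env must already be loaded.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("NUMERAI_PUBLIC_API_KEY", "NUMERAI_PRIVATE_API_KEY")
EXPECTED_FILES = ("train.parquet", "validation.parquet", "features.json")


def missing_api_keys(env=None):
    """Return the required API-key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def missing_dataset_files(data_dir):
    """Return the expected dataset files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    print("missing keys:", missing_api_keys())
    print("missing files:", missing_dataset_files("data/v5.2"))
```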
Use this for manual iteration. Define your architecture in a YAML file and run:
uv run python scripts/run_experiment.py \
--config experiments/example_v1.yaml \
--artifact-dir artifacts/experiments

This script builds the pipeline, trains on the training set, and outputs backtest metrics for the validation split.
Use scripts/hpo_pipeline.py to search over preprocessing steps, models, and ensemble methods.
uv run python scripts/hpo_pipeline.py \
--data-dir data/v5.2 \
--train-subsample 0.125 \
--num-trials 30 \
--output-dir artifacts/hpo_x8 \
--local \
--wandb-project alphapulse-hpo

Useful flags:
- --wandb-project <name>: log every trial to Weights & Biases (project name is timestamped and saved for --resume)
- --resume: skip trials already recorded in trials.db
- --trial-timeout <sec>: kill stuck trials (default: 1800)
- --max-hours <h>: stop after a time budget
- --objective corr_sharpe|mmc_sharpe|payout_score: optimization target
W&B trial runs include scalar metrics (corr_sharpe, mmc_sharpe, metric/mmc) and per-trial
diagnostics/ charts (per-era correlation, feature exposure, SHAP importance). After the search,
a best-trial-diagnostics run and search-convergence / hpo-summary runs are logged to the
same W&B group.
The best resulting configuration will be saved to artifacts/hpo_x8/best_config.json.
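To make the --resume behaviour concrete, here is a minimal sketch of how a SQLite-backed trial store can decide which trials still need to run. The table layout, column names, and function names are hypothetical and do not describe the actual trials.db schema. Re-running failed trials while skipping completed ones is also an assumption.

```python
import sqlite3


def open_trial_db(path=":memory:"):
    """Open (or create) a trial store with a hypothetical single-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials ("
        "trial_id INTEGER PRIMARY KEY, status TEXT NOT NULL, score REAL)"
    )
    return conn


def record_trial(conn, trial_id, status, score=None):
    """Insert or overwrite the state of one trial and persist it immediately."""
    conn.execute(
        "INSERT OR REPLACE INTO trials (trial_id, status, score) VALUES (?, ?, ?)",
        (trial_id, status, score),
    )
    conn.commit()


def trials_to_run(conn, num_trials):
    """Return the trial IDs in [0, num_trials) that are not yet completed."""
    rows = conn.execute("SELECT trial_id FROM trials WHERE status = 'completed'")
    done = {row[0] for row in rows}
    return [trial_id for trial_id in range(num_trials) if trial_id not in done]
```

Committing after every trial is what makes a resume safe: if the process dies mid-sweep, the store still reflects every trial that finished.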
Use scripts/autoresearch.py to let a Claude agent drive the research process, iteratively proposing and testing pipeline improvements (adding models, tuning hyperparameters, changing ensembles, etc.).
uv run python scripts/autoresearch.py \
--data-dir data/v5.2 \
--train-subsample 0.125 \
--trials 50 \
--hours 2 \
--output-dir artifacts/autoresearch \
--agent-model claude-sonnet-4-6

The agent will:
- Analyze trial results and decide what to try next
- Propose mutations: add/remove models, tune hyperparameters, change ensemble methods, add preprocessors, set neutralization
- Track progress and reasoning in research_state.json
- Output the best configuration to best_config.json and a summary to trials_summary.csv
Optional: Start from an existing config with --seed-config path/to/config.json, or resume with --resume.
Use the "test pipeline" for a lightweight run that trains on a small subsample, backtests, and exports a pickle in one go.
uv run python scripts/run_test_pipeline.py \
--data-dir data/v5.2 \
--train-subsample 0.05 \
--output-dir artifacts/test_run

No W&B account? Prefix the command with WANDB_MODE=disabled to skip W&B logging. You can also add WANDB_MODE=disabled to your .env file to make it permanent.
To generate the predict.pkl required for Numerai:
From an HPO result:
uv run python scripts/export_numerai_pickle.py \
--data-dir data/v5.2 \
--best-config-path artifacts/hpo_x8/best_config.json \
--train-subsample 0.125 \
--output-dir artifacts/competition_pickle

From a YAML Experiment:
Prefer using scripts/run_test_pipeline.py or scripts/export_from_yaml.py for YAML-driven exports.
Run your trained model on live tournament data to generate predictions:
uv run python scripts/live_inference.py \
--model-path artifacts/competition_pickle/predict.pkl \
--data-dir data/v5.2 \
--output-path artifacts/live/predictions.csv

Parameters:
- --model-path: Path to the exported predict.pkl (from step 6)
- --data-dir: Directory containing live.parquet (download fresh data before each round)
- --output-path: Where to save the submission CSV
- --benchmark-col: Benchmark column name (default: v2_equivalent_return)
- --validate: Run submission format validation (default: true)
Output: CSV file with id and prediction columns ready for submission.
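The format of that CSV can be sanity-checked with a few lines of standard-library Python. This is a sketch of the kind of check --validate might perform, not the script's actual logic. The function name is hypothetical, and the rules it applies (exact column order, unique ids, predictions within [0, 1]) are assumptions.

```python
import csv
import io
import math


def validate_submission(csv_text):
    """Return a list of problems found in a submission CSV (empty list means OK)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["id", "prediction"]:
        return [f"expected columns ['id', 'prediction'], got {reader.fieldnames}"]
    problems, seen = [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["id"] in seen:
            problems.append(f"line {line_no}: duplicate id {row['id']!r}")
        seen.add(row["id"])
        try:
            value = float(row["prediction"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: prediction is not a number")
            continue
        if math.isnan(value) or not 0.0 <= value <= 1.0:
            problems.append(f"line {line_no}: prediction {value} outside [0, 1]")
    if not seen:
        problems.append("no rows found")
    return problems
```

In practice the text would come from the file written by live_inference.py, for example `validate_submission(open("artifacts/live/predictions.csv").read())`.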
Upload your predictions to the tournament:
uv run python scripts/submit_predictions.py \
--predictions-path artifacts/live/predictions.csv \
--model-name my_model_name

Prerequisites:
- Set NUMERAI_PUBLIC_API_KEY and NUMERAI_PRIVATE_API_KEY in the environment or .env
- Install numerapi: pip install numerapi (or add it to pyproject.toml)
Parameters:
- --predictions-path: Path to the CSV from step 7
- --model-name: Your model name as it appears in the Numerai dashboard
- --tournament: Tournament identifier (default: numerai)
- --validate: Run format validation before upload (default: true)
The script will:
- Validate submission format
- Look up your model ID
- Upload predictions for the current round
- Return submission ID for tracking
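The lookup-and-upload steps above can be sketched against numerapi's NumerAPI client. This is an illustration, not the code in submit_predictions.py. The helper name is hypothetical, and the client is passed in as a parameter so the function can be exercised without network access.

```python
def submit_predictions(api, predictions_path, model_name):
    """Look up the model ID by name, upload the CSV, and return the submission ID."""
    models = api.get_models()  # numerapi returns a {model_name: model_id} mapping
    if model_name not in models:
        raise KeyError(f"model {model_name!r} not found; available: {sorted(models)}")
    return api.upload_predictions(
        file_path=predictions_path,
        model_id=models[model_name],
    )


# Example wiring (needs `numerapi` installed and network access, so it is commented out):
# import os
# from numerapi import NumerAPI
# api = NumerAPI(
#     public_id=os.environ["NUMERAI_PUBLIC_API_KEY"],
#     secret_key=os.environ["NUMERAI_PRIVATE_API_KEY"],
# )
# submission_id = submit_predictions(api, "artifacts/live/predictions.csv", "my_model_name")
```

The keys are passed explicitly because the variable names used in this README differ from the ones numerapi reads by default.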
End-to-end flow for weekly tournament submissions:
# 1. Download latest data (do this each week before the deadline)
uv run python scripts/download_dataset.py \
--dataset-version v5.2 \
--output-dir data
# 2. Run live inference with your trained model
uv run python scripts/live_inference.py \
--model-path artifacts/competition_pickle/predict.pkl \
--data-dir data/v5.2 \
--output-path artifacts/live/predictions.csv
# 3. Submit to Numerai
uv run python scripts/submit_predictions.py \
--predictions-path artifacts/live/predictions.csv \
--model-name my_model_name

The YAML format is defined by src/alphapulse/experiments/schema.py.
version: "1"
data:
data_dir: data/v5.2
train_subsample: 0.05
target_col: target
seed: 42
features:
columns: null # null defaults to all features
groups: {}
preprocessing: []
models:
- type: XGBoost
params:
max_depth: 3
learning_rate: 0.05
tree_method: hist
objective: reg:squarederror
ensemble_method: single
ensemble_params: {}
train:
n_rounds: 40
early_stopping_rounds: 5
evaluation:
primary_metric: mean_per_era_correlation

- Feature Groups: Define features.groups as a mapping of group_name -> [columns]. You can then assign specific models to specific groups using models[].input_group: group_name.
- Available Preprocessors: StandardScaler, RobustScaler, PCA, TruncatedSVD, AutoencoderPreprocessor, CompressionPreprocessor, GaussianNoise, VarianceSelector, LGBMImportanceSelector, EraStableFeatureSelector, Packboost, and GroupedPreprocessor.
- Available Models:
  - Gradient Boosting: XGBoost, LightGBM, CatBoost, Packboost
  - Tree Ensembles: RandomForest, ExtraTrees
  - Linear: Ridge
  - Foundation Models:
    - TabPFN: TabPFN v2 in-context learning (n_estimators=8)
    - TabPFN3: TabPFN v3 local OSS (model_path="auto", n_estimators=8)
    - TabPFN3Reasoning: TabPFN v3 API with reasoning (thinking_mode=true, thinking_effort="medium")
    - TabICL: TabICL v2 in-context learning (n_estimators=8, kv_cache=false)
  - Deep Learning: TabularDL, a pytorch_tabular wrapper (architecture: "ft_transformer"|"mlp")
  - Meta Models:
    - EraEnsemble: V3X-style era partitioning with Ridge meta-learner (n_subs=10)
    - SyntheticDataAugmenter: Diffusion-based data augmentation from elite rows (top_fraction=0.10, n_synthetic=500)
- Ensembling: single, weighted, or stacking (Meta-learners: ridge or xgboost).
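The feature-group routing described above can be pictured as a lookup from each model's input_group to a column list. The sketch below mirrors the relevant YAML keys as a Python dict; the group names, column names, and the resolve_model_columns helper are hypothetical, and the real resolution logic in AlphaPulse may differ.

```python
# Hypothetical config fragment: only the keys needed for routing are shown.
EXAMPLE_CONFIG = {
    "features": {
        "columns": None,  # null defaults to all features
        "groups": {
            "group_a": ["feature_a", "feature_b"],
            "group_b": ["feature_c"],
        },
    },
    "models": [
        {"type": "XGBoost", "input_group": "group_a"},
        {"type": "Ridge"},  # no input_group: receives the default columns
    ],
}


def resolve_model_columns(config, all_columns):
    """Return one column list per model, in the order the models are declared."""
    features = config.get("features", {})
    groups = features.get("groups") or {}
    default_columns = features.get("columns") or list(all_columns)
    resolved = []
    for model in config["models"]:
        group = model.get("input_group")
        if group is None:
            resolved.append(list(default_columns))
        elif group in groups:
            resolved.append(list(groups[group]))
        else:
            raise KeyError(f"model {model['type']} references unknown group {group!r}")
    return resolved
```

Failing loudly on an unknown group name is a deliberate choice in this sketch: a typo in input_group would otherwise silently train a model on the wrong columns.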
For programmatic access to Numerai data outside of the CLI scripts, use NumeraiDataLoader:
from alphapulse import NumeraiDataLoader
loader = NumeraiDataLoader("data/v5.2", feature_set="medium")
train = loader.load_split("train", subsample=0.1)
train.X # feature DataFrame
train.y # target Series
train.era # era Series (or None)
train.n_rows, train.n_features

AlphaPulse includes a standalone Streamlit dashboard for exploratory data analysis of Numerai datasets. The dashboard provides interactive visualizations and statistical analysis with English/Polish bilingual support.
Install EDA dependencies:
uv sync --extra eda

Run the dashboard:
streamlit run eda/app.py

The dashboard will start at http://localhost:8501 by default.
Configure via environment variables:
export ALPHAPULSE_DATA_DIR=data/v5.2 # Path to data directory
export ALPHAPULSE_DATASET_VERSION=v5.2 # Dataset version string

Or create a .env file in the project root.
The dashboard provides 8 specialized analysis modules:
- Target Analysis — Target variable distribution, era-wise statistics, stability metrics, ridgeplots
- Feature Analysis — Single/multi-feature exploration, correlation tracking, temporal behavior, batch analysis
- Correlations — Feature-target correlations, inter-feature correlation matrices, network graphs
- Era Analysis — Temporal patterns, era statistics, rolling metrics, feature stability over time
- Feature Distributions — Distribution statistics, chi-square uniformity tests, entropy calculations
- Feature Importance — Pearson correlation and LightGBM importance rankings with visualization
- Clustering — Hierarchical clustering, dendrograms, redundancy detection, correlation heatmaps
- HPO Analysis: Hyperparameter optimization results visualization (loads all_trials.json from artifacts)
- Multi-target support — Select any target column from the dataset
- Feature set selection — Choose between small (~20%), medium (~50%), or all (100%) features
- Bilingual interface — Toggle between English and Polish
- Data validation — Sanity checks for discrete vs continuous features
- Interactive visualizations — Plotly-based charts with zoom, pan, and export
- Data export — Download analysis results as CSV
- Caching — Efficient data loading and computation caching for large datasets
eda/
├── app.py # Main entry point
├── pages/ # Analysis modules
│ ├── target_analysis.py
│ ├── feature_analysis.py
│ ├── correlations.py
│ ├── era_analysis.py
│ ├── feature_distributions.py
│ ├── feature_importance.py
│ ├── clustering.py
│ └── hpo_analysis.py
└── utils/ # Utilities
├── config.py # Path resolution and constants
├── data_loader.py # NumeraiDataLoader wrapper
├── translations.py # YAML-based translation system
├── i18n.py # Translation helper (session state access)
└── common.py # Shared analysis utilities
The EDA module is self-contained and excluded from the main quality checks:
# Lint EDA code
make eda-lint
# The EDA module is excluded from:
# - mypy type checking
# - pytest coverage
# - Main make check pipeline

├── artifacts/ # Model outputs, pickles, and HPO logs
├── data/ # Downloaded Numerai parquet files
├── experiments/ # YAML configuration files
├── scripts/ # Executable workflow scripts
│ ├── download_dataset.py # Download Numerai dataset
│ ├── run_experiment.py # Run a YAML-defined experiment (+ W&B logging)
│ ├── hpo_pipeline.py # Automated hyperparameter search (Ray Tune)
│ ├── run_test_pipeline.py # Lightweight smoke test
│ ├── export_numerai_pickle.py # Export predict.pkl from HPO result
│ ├── export_from_yaml.py # Export predict.pkl from YAML experiment
│ ├── live_inference.py # Run trained model on live tournament data
│ ├── submit_predictions.py # Upload predictions to Numerai
│ ├── make_feature_groups.py # Generate feature group definitions
│ ├── gpu_smoke_test.py # Verify GPU availability for deep models
│ ├── autoresearch.py # Claude-agent-driven research loop
│ └── wandb_sweep_config.yaml # W&B sweep configuration
├── eda/ # Standalone Streamlit EDA dashboard
│ ├── app.py # Main entry point (streamlit run eda/app.py)
│ ├── pages/ # Multi-page analysis modules (8 pages)
│ └── utils/ # Config, data loading, translations (EN/PL)
├── docs/assets/ # Diagrams and documentation assets
│ └── architecture.drawio.png # Architecture diagram (editable in draw.io)
├── src/alphapulse/ # Core framework source code
│ ├── autoresearch/ # Agent-driven research loop (loop, agent, mutations, state)
│ ├── evaluation/ # Backtesting, metrics, SHAP report, W&B diagnostics, submission validation
│ ├── experiments/ # YAML schema (ExperimentV1), runner, data loaders (incl. MMC validation frame)
│ ├── features/ # Feature/target catalog loaded from features.json
│ ├── hpo/ # HPO objective, search space, builder, registry, TrialDB (SQLite), export
│ ├── logging_/ # Leaderboard, W&B helpers, live loguru → W&B Logs bridge
│ ├── models/ # All model implementations + factory
│ ├── pipeline/ # Pipeline, MultiHeadPipeline, MultiTargetPipeline, model_access, ensemble, neutralizer
│ ├── preprocessors/ # All preprocessor implementations + factory (incl. autoencoder, compression, era-stable selector)
│ ├── utils/ # Global seed utility
│ └── validation/ # PurgedEraCV
└── tests/ # Unit tests
PRs are welcome. Please keep changes focused and ensure the pre-commit hooks pass:
pre-commit install
pre-commit run --all-files

Commit messages: prefer conventional commits (e.g. feat: ..., fix: ..., docs: ...).
See CHANGELOG.md for completed releases.
Completed — v0.6.0 (MMC + W&B Diagnostics):
- MMC on validation split: HPO scores mmc, mmc_sharpe, and payout_score on validation.parquet rows aligned with meta_model.parquet (train holdout ids do not overlap meta-model ids).
- W&B diagnostics as charts: diagnostics/ logs matplotlib bar/heatmap/line charts instead of raw tables; horizontal bar charts for feature importance and exposure; ensemble correlation heatmap.
- Live W&B training logs: loguru lines and per-round XGBoost metrics stream to W&B during HPO trials.
- MultiTarget diagnostics: pipeline/model_access.py unifies model iteration and prediction collection for SHAP and ensemble diagnostics across Pipeline and MultiTargetPipeline.
- Feature catalog & routing: features/catalog.py and HPO feature routing resolve features.json sets and YAML groups into per-model column lists.
- HPO summary charts: scatter plots for trial corr Sharpe, MMC Sharpe, and runtime in the hpo-summary W&B run.
Completed — v0.5.0 (Production Hardening + XAI):
- HPO fault tolerance: Each local trial runs in an isolated subprocess; crashes mark the trial failed and the sweep continues. A SQLite-backed TrialDB persists trial state. --resume skips already-completed trials.
- Provenance artifact: On every export, a hermetically sealed bundle is written: resolved config, uv export dependency snapshot, and git commit hash.
- Canonical artifact naming: Exported models follow <TIMESTAMP>_<ARCH>_<TARGET>_<CONFIG_HASH>.pkl with a latest_predict.pkl symlink.
- Masked loss for auxiliary targets: MultiTargetPipeline drops NaN rows per-target; targets with fewer than 10 valid rows are skipped entirely.
- Feature neutralization in eval loop: Backtester accepts an optional FeatureNeutralizer; predictions are neutralized before metric computation.
- W&B experiment runner integration: scripts/run_experiment.py logs configs, per-era metrics, and artifact paths to W&B via --wandb-project.
- XAI / SHAP reporting: shap_report.py computes per-era feature importance; wandb_diagnostics.py pushes rich HPO and XAI plots to W&B.
- GPU HPO + foundation models: TabPFN3, TabICL, and TabularDL (ft_transformer / mlp) with GPU-accelerated HPO via Ray Tune.
- Universal feature importance + era stability: EraStableFeatureSelector ranks features by blended mean importance / cross-era stability; feature_report.py surfaces per-era diagnostics.
- Autoencoder + compression preprocessors: AutoencoderPreprocessor and CompressionPreprocessor for learned low-dimensional representations.
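As an illustration of the masked-loss item above, the following sketch drops NaN rows separately for each target and skips targets with fewer than 10 valid rows. It is a plain-Python approximation of the described behaviour, not the MultiTargetPipeline implementation, and the function names are hypothetical.

```python
import math

MIN_VALID_ROWS = 10  # threshold stated in the release notes


def mask_target(features, target, min_valid_rows=MIN_VALID_ROWS):
    """Drop rows whose target is None/NaN; return None if too few rows remain."""
    kept = [
        (x, y)
        for x, y in zip(features, target)
        if y is not None and not math.isnan(y)
    ]
    if len(kept) < min_valid_rows:
        return None
    xs, ys = zip(*kept)
    return list(xs), list(ys)


def usable_targets(features, targets):
    """Mask every target independently and keep only those with enough valid rows."""
    usable = {}
    for name, values in targets.items():
        masked = mask_target(features, values)
        if masked is not None:
            usable[name] = masked
    return usable
```

Masking per target, rather than dropping any row with a NaN in any target, keeps the main target's training set intact even when auxiliary targets are sparse.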
alphapulse is distributed under the terms of the MIT license.
