Source code for:
Trong Quy Bui, Tuyet Hue Tran, Khac Toan Nguyen. A context-aware AI early-warning framework for detecting zombie cryptocurrencies. (2026)
A context-aware AI framework that produces a daily top-k risk watchlist for zombie cryptocurrency detection. The system combines:
- K = 9 CatBoost experts - each specialised for a (market-regime, liquidity-tier) context
- Attentive Top-2 router - selects the two most relevant experts per observation
- Two-stage calibration - per-expert (Val-A) then global (Val-B_cal)
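The Top-2 routing step can be illustrated with a minimal sketch. It assumes the router emits one raw score per expert and that the final risk is a weighted average of the two selected experts' calibrated probabilities, with weights from a softmax over the two selected scores. The attention mechanism of the paper (Eq. 6–10) is not reproduced here, and `top2_mixture` is a hypothetical name, not a function from this repository.

```python
import math

def top2_mixture(gate_scores, expert_probs):
    """Combine the two highest-scoring experts into one risk estimate.

    gate_scores  : raw router scores, one per expert (assumed input).
    expert_probs : calibrated zombie probabilities, one per expert.
    Returns (risk, {expert_index: weight}).
    """
    if len(gate_scores) != len(expert_probs):
        raise ValueError("need one gate score per expert")
    if len(gate_scores) < 2:
        raise ValueError("Top-2 routing needs at least two experts")
    # Indices of the two largest gate scores (ties keep input order).
    order = sorted(range(len(gate_scores)),
                   key=lambda i: gate_scores[i], reverse=True)
    top2 = order[:2]
    # Softmax restricted to the selected experts, shifted for stability.
    shift = max(gate_scores[i] for i in top2)
    exps = {i: math.exp(gate_scores[i] - shift) for i in top2}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    # Weighted average of the two selected experts' probabilities.
    risk = sum(weights[i] * expert_probs[i] for i in top2)
    return risk, weights

# Illustrative values for K = 9 experts (not taken from the paper).
scores = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 0.2, 0.4]
probs = [0.05, 0.80, 0.10, 0.20, 0.40, 0.15, 0.30, 0.25, 0.10]
risk, weights = top2_mixture(scores, probs)
```

Because the other seven experts receive zero weight, only two experts need to be evaluated per observation at prediction time.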
zombie-risk/
├── config.yaml # All hyperparameters (aligned with paper Table 2)
├── train.py # Train the full MoE pipeline
├── evaluate.py # Evaluate MoE + baselines, reproduce Tables 3 & 6
├── predict.py # Generate daily top-k watchlist
├── requirements.txt
└── src/
    ├── labels.py # Zombie label creation (Eq. 1)
    ├── features.py # Leakage-safe feature engineering
    ├── regimes.py # Market-regime + liquidity-tier assignment (K=9)
    ├── experts.py # CatBoost experts with soft partitioning (Eq. 4)
    ├── router.py # Attentive Top-2 router (Eq. 6–10)
    ├── calibration.py # Per-expert + global calibration (Eq. 5)
    ├── baselines.py # Logistic hazard, XGBoost, Random forest, CatBoost single
    ├── metrics.py # Recall@k, NetVal@k, F_β@30 (Eq. 11)
    ├── tuning.py # Optuna TPE - 100 trials, Median pruning (Table 2)
    └── pipeline.py # End-to-end ZombieRiskPipeline
pip install -r requirements.txt

Required packages: catboost, xgboost, scikit-learn, torch, optuna, pandas, numpy, scipy, pyarrow, pyyaml
Input data columns: coin_id, date, volume, price, marketcap
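A quick check of the input schema before training can catch a misnamed column early. The sketch below is an illustration only: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the repository may validate its input differently.

```python
# Column names as listed above.
REQUIRED_COLUMNS = ("coin_id", "date", "volume", "price", "marketcap")

def missing_columns(columns):
    """Return the required input columns that are absent, in listed order."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS if c not in present]

# Example: a header that lacks `marketcap`.
header = ["coin_id", "date", "price", "volume"]
missing = missing_columns(header)
if missing:
    print("missing columns:", ", ".join(missing))
```

With pandas, the same check could be run as `missing_columns(df.columns)` on the loaded frame.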
# Standard training with config defaults
python train.py \
--config config.yaml \
--data data/raw/coingecko_daily.parquet \
--output outputs/models/run_01 \
--horizon 7 # H=7 (primary) or H=28 (robustness)
# With Optuna hyperparameter tuning (100 trials per expert)
python train.py \
--config config.yaml \
--data data/raw/coingecko_daily.parquet \
--output outputs/models/tuned \
--tune
# Skip label + feature engineering if already preprocessed
python train.py --skip-prep --data data/processed/features.parquet ...

# Evaluate MoE + all baselines on the test set (reproduces Tables 3 & 6)
python evaluate.py \
--config config.yaml \
--data data/processed/features.parquet \
--model-dir outputs/models/run_01 \
--output outputs/results/ \
--horizon 7

Results saved to outputs/results/:
- table3_H7.csv - out-of-sample comparison (Recall@k, NetVal@k)
- table6_ablation_H7.csv - ablation study
- moe_metrics_H7.json - MoE summary metrics
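For readers unfamiliar with the ranking metrics named above, the following sketch shows one common reading of Recall@k for a single day: the share of that day's true zombies that land among the k highest-scored coins. It also includes the standard F_β formula. Both are assumptions about how src/metrics.py behaves, all function names are hypothetical, and NetVal@k is omitted because its definition (Eq. 11) is not given here.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores (ties keep input order)."""
    if k <= 0:
        raise ValueError("k must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def recall_at_k(scores, labels, k):
    """Share of positives (label 1) found in the top-k watchlist."""
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")  # undefined on a day with no positives
    return sum(labels[i] for i in top_k_indices(scores, k)) / n_pos

def precision_at_k(scores, labels, k):
    """Share of the top-k watchlist that is truly positive."""
    top = top_k_indices(scores, k)
    if not top:
        return float("nan")
    return sum(labels[i] for i in top) / len(top)

def f_beta(precision, recall, beta):
    """Standard F_beta: (1 + b^2) * P * R / (b^2 * P + R)."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom

# Toy day with five coins, three of which become zombies.
scores = [0.9, 0.1, 0.8, 0.4, 0.7]
labels = [1, 0, 0, 1, 1]
r = recall_at_k(scores, labels, k=2)
p = precision_at_k(scores, labels, k=2)
f = f_beta(p, r, beta=2.14)
```

A β above 1 weights recall more heavily than precision, which suits a watchlist where a missed zombie costs more than a false alarm.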
python predict.py \
--config config.yaml \
--data data/raw/latest.parquet \
--model-dir outputs/models/run_01 \
--k 30 \
--date 2025-06-03
# Include expert routing weights in output
python predict.py ... --with-routing

All hyperparameters are in config.yaml and aligned with Table 2 of the paper:
| Key | Value | Paper ref |
|---|---|---|
| regimes.n_experts | 9 | K = 3 regimes × 3 tiers |
| regimes.soft_weight | 0.1 | ρ (Eq. 4) |
| router.attention_dim | 16 | d (Table 2) |
| router.top_k | 2 | Sparse Top-2 routing |
| router.entropy_weight | 1e-3 | λ_ent (Eq. 10) |
| router.l2_weight | 1e-4 | λ_2 (Eq. 10) |
| metrics.netval_B / C | 0.3036 / 0.0665 | Eq. 11 |
| metrics.selection_beta | 2.14 | F_β@30 (Table 2) |
| tuning.n_trials | 100 | Optuna TPE (Table 2) |
| tuning.pruner | median | Median pruning (Table 2) |
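To make the first two rows concrete, the sketch below maps a (regime, tier) pair to one of the K = 9 experts and builds soft-partition sample weights with ρ = 0.1. It rests on two assumptions that are not confirmed by this README: regimes and tiers are coded 0..2 and indexed row-major, and soft partitioning means full weight for rows in the expert's own context and weight ρ for all other rows. Eq. 4 in the paper may differ, and the function names are hypothetical.

```python
N_REGIMES, N_TIERS = 3, 3   # K = 3 regimes x 3 tiers = 9 experts
RHO = 0.1                   # regimes.soft_weight

def expert_index(regime, tier):
    """Map a (regime, tier) pair, each coded 0..2, to an expert id 0..8."""
    if not (0 <= regime < N_REGIMES and 0 <= tier < N_TIERS):
        raise ValueError("regime and tier must be in 0..2")
    return regime * N_TIERS + tier

def soft_weights(contexts, expert, rho=RHO):
    """Assumed weighting: 1.0 in the expert's own context, rho elsewhere."""
    return [1.0 if expert_index(r, t) == expert else rho
            for r, t in contexts]

# Four training rows, two of which fall in context (regime=1, tier=2).
contexts = [(0, 0), (1, 2), (2, 2), (1, 2)]
w = soft_weights(contexts, expert_index(1, 2))
```

Under these assumptions, the resulting list could be passed as the `sample_weight` argument when fitting each CatBoost expert, so that every expert sees all rows but concentrates on its own context.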
Train (before 2024) → Val-A (Jan–Apr 2024) per-expert calibration
→ Val-B_gate (May–Aug) router training
→ Val-B_cal (Sep–Dec) global calibration
→ Test (Jan–Oct 2025) final evaluation
Each stage uses a non-overlapping validation split to prevent double-dipping.
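The split schedule can be expressed as a simple date lookup. This is a sketch under stated assumptions: the Val-B windows are taken to fall in 2024 (the schedule above gives no year for them), each window is assumed to cover whole calendar months, and `assign_split` is a hypothetical helper rather than part of the repository.

```python
from datetime import date

TRAIN_END = date(2024, 1, 1)  # training uses dates strictly before this day

# (name, first day, last day), both inclusive. The year 2024 for the two
# Val-B windows is an assumption; they sit between Val-A and Test.
SPLITS = [
    ("val_a", date(2024, 1, 1), date(2024, 4, 30)),
    ("val_b_gate", date(2024, 5, 1), date(2024, 8, 31)),
    ("val_b_cal", date(2024, 9, 1), date(2024, 12, 31)),
    ("test", date(2025, 1, 1), date(2025, 10, 31)),
]

def assign_split(day):
    """Return the split name for a date, or None after the test window."""
    if day < TRAIN_END:
        return "train"
    for name, start, end in SPLITS:
        if start <= day <= end:
            return name
    return None

print(assign_split(date(2024, 6, 15)))
```

Because the windows are disjoint and ordered in time, no observation used to fit a later stage can leak into an earlier one.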
The data were provided under license. Data and replication code are available from the corresponding author upon reasonable request, subject to permission from the data provider. Contact: copyright@vlab.io.vn.
@article{bui2026zombie,
title = {A context-aware AI early-warning framework for detecting
zombie cryptocurrencies},
author = {Bui, Trong Quy and Tran, Tuyet Hue and Nguyen, Khac Toan},
year = {2026},
}