vlabvn/zombie-risk

Zombie Cryptocurrency Early-Warning - Context-Aware MoE Framework

Source code for:

Trong Quy Bui, Tuyet Hue Tran, Khac Toan Nguyen. A context-aware AI early-warning framework for detecting zombie cryptocurrencies. (2026)


Overview

A context-aware AI framework that produces a daily top-k risk watchlist for zombie cryptocurrency detection. The system combines:

  • K = 9 CatBoost experts - each specialised for a (market-regime, liquidity-tier) context
  • Attentive Top-2 router - selects the two most relevant experts per observation
  • Two-stage calibration - per-expert (Val-A) then global (Val-B_cal)
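As a rough illustration of how sparse Top-2 routing can combine expert outputs, the sketch below keeps the two experts with the largest router scores, renormalises their scores with a softmax, and averages their probabilities. It is a minimal sketch under assumptions: the function name `top2_mixture` and its inputs are hypothetical, and the attention mechanism that produces the router scores (Eq. 6–10 of the paper) is not reproduced here.

```python
import math


def top2_mixture(gate_logits, expert_probs):
    """Combine expert probabilities using sparse Top-2 gating.

    gate_logits  : list of K router scores for one observation (hypothetical input)
    expert_probs : list of K expert probabilities for the same observation
    Returns the mixed risk score and a dict {expert_index: weight}.
    """
    if len(gate_logits) != len(expert_probs) or len(gate_logits) < 2:
        raise ValueError("need matching lists with at least two experts")

    # Indices of the two largest router scores.
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top2 = order[:2]

    # Softmax restricted to the two selected experts (shifted for stability).
    peak = max(gate_logits[i] for i in top2)
    exp_scores = [math.exp(gate_logits[i] - peak) for i in top2]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]

    # Weighted average of the two selected experts' probabilities.
    risk = sum(w * expert_probs[i] for w, i in zip(weights, top2))
    return risk, dict(zip(top2, weights))
```

With K = 9 experts, only two weights are non-zero per observation, which is also what makes the routing weights easy to inspect.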

Project Structure

zombie-risk/
├── config.yaml          # All hyperparameters (aligned with paper Table 2)
├── train.py             # Train the full MoE pipeline
├── evaluate.py          # Evaluate MoE + baselines, reproduce Tables 3 & 6
├── predict.py           # Generate daily top-k watchlist
├── requirements.txt
└── src/
    ├── labels.py        # Zombie label creation (Eq. 1)
    ├── features.py      # Leakage-safe feature engineering
    ├── regimes.py       # Market-regime + liquidity-tier assignment (K=9)
    ├── experts.py       # CatBoost experts with soft partitioning (Eq. 4)
    ├── router.py        # Attentive Top-2 router (Eq. 6–10)
    ├── calibration.py   # Per-expert + global calibration (Eq. 5)
    ├── baselines.py     # Logistic hazard, XGBoost, Random forest, CatBoost single
    ├── metrics.py       # Recall@k, NetVal@k, F_β@30 (Eq. 11)
    ├── tuning.py        # Optuna TPE - 100 trials, Median pruning (Table 2)
    └── pipeline.py      # End-to-end ZombieRiskPipeline

Requirements

pip install -r requirements.txt

Required packages: catboost, xgboost, scikit-learn, torch, optuna, pandas, numpy, scipy, pyarrow, pyyaml

Input data columns: coin_id, date, volume, price, marketcap
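Before training, it can be useful to confirm that an input file carries the five columns listed above. The helper below is a hypothetical convenience, not part of the repository; it only checks column names and says nothing about dtypes or units, which the README does not specify.

```python
# Column names taken from the "Input data columns" line above.
REQUIRED_COLUMNS = ("coin_id", "date", "volume", "price", "marketcap")


def missing_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

With pandas, this could be called as `missing_columns(df.columns)` after reading the parquet file; an empty list means all required columns are present.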


Usage

1. Train

# Standard training with config defaults
python train.py \
  --config  config.yaml \
  --data    data/raw/coingecko_daily.parquet \
  --output  outputs/models/run_01 \
  --horizon 7          # H=7 (primary) or H=28 (robustness)

# With Optuna hyperparameter tuning (100 trials per expert)
python train.py \
  --config  config.yaml \
  --data    data/raw/coingecko_daily.parquet \
  --output  outputs/models/tuned \
  --tune

# Skip label + feature engineering if already preprocessed
python train.py --skip-prep --data data/processed/features.parquet ...

2. Evaluate

# Evaluate MoE + all baselines on the test set (reproduces Tables 3 & 6)
python evaluate.py \
  --config    config.yaml \
  --data      data/processed/features.parquet \
  --model-dir outputs/models/run_01 \
  --output    outputs/results/ \
  --horizon   7

Results are saved to outputs/results/:

  • table3_H7.csv - out-of-sample comparison (Recall@k, NetVal@k)
  • table6_ablation_H7.csv - ablation study
  • moe_metrics_H7.json - MoE summary metrics
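For readers unfamiliar with Recall@k, the sketch below shows the usual definition: the share of all true positives that appear among the k highest-scored items. This is an assumption about the metric's general form; the exact definition used in `src/metrics.py` (for example, how days are aggregated) is not shown in this README, and the function name is hypothetical.

```python
def recall_at_k(scores, labels, k):
    """Fraction of positive labels captured among the k highest scores.

    scores : list of risk scores, higher means riskier
    labels : list of 0/1 ground-truth labels aligned with `scores`
    Returns None when there are no positives, since recall is undefined.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    n_positive = sum(labels)
    if n_positive == 0:
        return None

    # Indices of the k largest scores.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(labels[i] for i in ranked)
    return hits / n_positive
```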

3. Predict (daily watchlist)

python predict.py \
  --config    config.yaml \
  --data      data/raw/latest.parquet \
  --model-dir outputs/models/run_01 \
  --k         30 \
  --date      2025-06-03

# Include expert routing weights in output
python predict.py ... --with-routing
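Conceptually, a top-k watchlist is the k coins with the highest predicted risk on a given date. The following minimal sketch illustrates that selection step only; the function name and the dictionary input are hypothetical and do not reflect the output format of `predict.py`.

```python
import heapq


def top_k_watchlist(risk_by_coin, k=30):
    """Return the k (coin_id, risk) pairs with the highest risk, riskiest first.

    risk_by_coin : dict mapping coin_id to a risk score for a single date
    """
    return heapq.nlargest(k, risk_by_coin.items(), key=lambda item: item[1])
```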

Configuration

All hyperparameters are in config.yaml and aligned with Table 2 of the paper:

Key                      Value             Paper ref
regimes.n_experts        9                 K = 3 regimes × 3 tiers
regimes.soft_weight      0.1               ρ (Eq. 4)
router.attention_dim     16                d (Table 2)
router.top_k             2                 Sparse Top-2 routing
router.entropy_weight    1e-3              λ_ent (Eq. 10)
router.l2_weight         1e-4              λ_2 (Eq. 10)
metrics.netval_B / C     0.3036 / 0.0665   Eq. 11
metrics.selection_beta   2.14              F_β@30 (Table 2)
tuning.n_trials          100               Optuna TPE (Table 2)
tuning.pruner            median            Median pruning (Table 2)
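To make the role of `metrics.selection_beta` concrete, the sketch below uses the standard F_β definition, in which β > 1 weights recall more heavily than precision. It assumes the paper's F_β@30 follows this standard form with precision and recall measured on the top 30 items; that detail is not confirmed by the README, and the function name is hypothetical.

```python
def f_beta(precision, recall, beta=2.14):
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both precision and recall are zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator
```

With β = 2.14, a model with low precision but high recall scores higher than one with the reverse profile, which suits a watchlist where missing a true zombie coin is the costlier error.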

Sequential Training Protocol

Train (before 2024)  →  Val-A (Jan–Apr 2024)        per-expert calibration
                     →  Val-B_gate (May–Aug 2024)   router training
                     →  Val-B_cal (Sep–Dec 2024)    global calibration
                     →  Test (Jan–Oct 2025)         final evaluation

Each stage is fitted on its own non-overlapping time window, so no data used to fit one component is reused to fit or calibrate a later one (no double-dipping).
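The chronological protocol above can be expressed as a simple date-to-split lookup. The sketch below is illustrative only: the split names and the `assign_split` helper are hypothetical, the two Val-B windows are assumed to fall in 2024, and month boundaries are assumed to be inclusive calendar months.

```python
from datetime import date

# (split name, first day, last day), both ends inclusive. Boundaries are assumptions
# derived from the protocol diagram, not values read from config.yaml.
SPLITS = [
    ("train",      date.min,         date(2023, 12, 31)),
    ("val_a",      date(2024, 1, 1), date(2024, 4, 30)),
    ("val_b_gate", date(2024, 5, 1), date(2024, 8, 31)),
    ("val_b_cal",  date(2024, 9, 1), date(2024, 12, 31)),
    ("test",       date(2025, 1, 1), date(2025, 10, 31)),
]


def assign_split(day):
    """Return the split name for a calendar day, or None if it falls outside all windows."""
    for name, start, end in SPLITS:
        if start <= day <= end:
            return name
    return None
```

Because the windows do not overlap, every observation belongs to at most one split.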


Data Availability

The data were provided under license. Data and replication code are available from the corresponding author upon reasonable request, subject to permission from the data provider. Contact: copyright@vlab.io.vn.


Citation

@article{bui2026zombie,
  title  = {A context-aware AI early-warning framework for detecting
            zombie cryptocurrencies},
  author = {Bui, Trong Quy and Tran, Tuyet Hue and Nguyen, Khac Toan},
  year   = {2026},
}
