RADAR is a generative adversarial framework for marker-free fine-grained discovery of anomalous cells (ACs) in multi-sample and multimodal single-cell omics data.
RADAR is designed for Cross-sample Fine-grained Anomalous-cell Discovery (CFAD), where anomalous cells are defined as cell populations or cell states that are present in affected tissues but absent from healthy reference tissues. Given a healthy reference dataset and one or more target datasets, RADAR performs three coupled tasks in a unified workflow:
- AC detection: identify anomalous cells in target datasets relative to a healthy reference;
- Multi-sample alignment: reduce cross-sample or cross-modal technical variation while preserving AC-associated biological heterogeneity;
- AC resolution: resolve detected ACs into biologically coherent subtypes.
RADAR contains three collaborative phases:
RADAR first trains a reconstruction-GAN on the healthy reference dataset only. Since the model learns to reconstruct normal cellular states, target cells with larger-than-expected reconstruction deviations are identified as putative anomalous cells.
RADAR excludes Phase-I-detected ACs from alignment training to reduce the risk of treating AC-associated biological variation as batch effects. For predicted normal target cells, RADAR uses an Integrated-Gradients-guided pairing module to identify biologically relevant "kin" reference cells. These kin-cell pairs are then used to train a transferring-GAN that maps target datasets into a common reference space.
Detected ACs are aligned into the reference space and then resolved into fine-grained AC subtypes. RADAR combines post-alignment cellular embeddings with reconstruction-deviation signals, enabling more accurate resolution of biologically distinct AC populations across samples and modalities.
RADAR is implemented in Python and has been tested with Python 3.9.
Main dependencies include:
anndata>=0.10.7
numpy>=1.22.4
pandas>=1.5.1
scanpy>=1.10.1
scikit-learn>=1.2.0
scipy>=1.11.4
torch>=2.0.0
tqdm>=4.64.1
captum
matplotlib
RADAR is developed as a Python package. You will need to install Python, and the recommended version is Python 3.9.
You can download the package from GitHub and install it locally:
git clone https://github.com/Catchxu/RADAR.git
cd RADAR/
pip install .After installation, RADAR can be run through three consecutive phases: anomalous-cell detection, multi-sample alignment, and anomalous-cell resolution.
Train the reference-guided reconstruction model on the healthy reference dataset and predict anomalous cells in the target dataset:
bash scripts/01_phase1.shThis step generates anomalous-cell prediction results, including cell-level anomaly labels and reconstruction-deviation scores.
Use predicted normal target cells to perform reference-guided multi-sample alignment:
bash scripts/02_phase2.shThis step identifies kin reference cells for predicted normal target cells and maps target datasets into the reference-aligned space.
Resolve detected anomalous cells into fine-grained AC subtypes:
bash scripts/03_phase3.shThis step generates AC subtype labels and downstream results for anomalous-cell resolution.
All experimental datasets used in this project are available from their respective original sources. Due to file size and data-usage considerations, raw datasets are not included in this repository.
- Healthy human intestinal epithelial tissue dataset (10xG-hInt-N) is available at GEO: GSE185224.
- Human colorectal cancer tissue datasets (10xG-hCRC-T-A and 10xG-hCRC-T-B) are available at GEO: GSE178341.
- Mouse embryo dataset (10xG-mEmb) is available at GEO: GSE186069.
- Healthy human peripheral blood mononuclear cell dataset (10xG-hHPBMC), used as a cross-modality reference for AC detection, is available from 10x Genomics.
- PBMC datasets for trajectory inference (10xG-hPBMC-A, 10xG-hPBMC-B, and 10xG-hPBMC-C) are available at GEO: GSE146974.
- Systemic lupus erythematosus PBMC datasets for AC alignment (10xG-hPBMCSLE-A, 10xG-hPBMCSLE-B, and 10xG-hPBMCSLE-C) are available at GEO: GSE96583.
- Human skin tissue datasets (10xG-hSCC-N1 to 10xG-hSCC-N5 and 10xG-hSCC-P1 to 10xG-hSCC-P5) are available at GEO: GSE144240.
- Healthy and basal cell carcinoma human peripheral blood mononuclear cell scATAC-seq datasets (10xC-hHPBMC and 10xC-hPBMCBCC) are available from 10x Genomics.
- Mouse brain scATAC-seq datasets (10xC-mBrain-0, 10xC-mBrain-1, and 10xC-mBrain-2) are available from their original sources.
- Slide-seqV2 mouse embryo dataset (ssq-mEmb-33) is available at GEO: GSE197353.
