MetaMorphCells is a bioinformatics research project focused on studying a small population of cancer cells that undergo dedifferentiation into cancer stem cells (CSCs). This repository contains scripts and notebooks for preprocessing, analyzing, and visualizing single-cell sequencing data to explore the molecular mechanisms underlying dedifferentiation, therapy resistance, and stemness in ovarian cancer.
Cancer cells can sometimes "rewind" their development, reverting to a more primitive, stem-like state. This transformation gives them survival advantages such as therapy resistance, higher proliferative capacity, and metastatic potential.
In this project, we combine single-cell RNA-seq, spatial transcriptomics, gene regulatory network inference, and perturbation modeling to map and predict dedifferentiation trajectories in ovarian cancer. By integrating classical methods and foundation models, we aim to identify molecular drivers and therapeutic vulnerabilities.
- Cancer stem cells & dedifferentiation
- Single-cell transcriptomics
- Gene regulatory network (GRN) inference
- Perturbation modeling & foundation models
1. Data Preprocessing
   - `GSE222557_data_preprocess.ipynb`
   - `h5ad_file_process.py`
2. Analysis Pipeline
   - `scanpy_OC.ipynb`
   - `scPopcorn.ipynb`
   - `scTour_infer.py`, `scTour_model_training.py`
3. Gene Regulatory Network Inference
   - `GRN_in_house_human.py`
   - `SCimilarity_Gene_Attribution.ipynb`
4. Perturbation & Foundation Models
   - `geneformer_perturbation.py`
   - `Tutorial_Perturbation.ipynb`
   - `ESM-2.ipynb`
   - `Cell_Type_Classification_Fine_Tuning.ipynb`
5. Visualization
   - `Attention_Visualization.ipynb`
   - `pseudotime_vector_field.png`
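
For readers new to scRNA-seq preprocessing, the following is a minimal sketch of one common first step: library-size normalization followed by a log transform. It is an illustration only and is not taken from the preprocessing notebook or script listed above; the function name `normalize_log1p`, the `target_sum` value, and the toy counts are all assumptions. In practice, Scanpy's `sc.pp.normalize_total` (with `target_sum=1e4`) followed by `sc.pp.log1p` performs a comparable transformation on an AnnData object.

```python
import math


def normalize_log1p(counts, target_sum=10_000.0):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    `counts` is a list of rows, one row per cell and one column per gene.
    Hypothetical helper for illustration; not part of this repository.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts stay all-zero to avoid dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


# Toy matrix: 3 cells x 4 genes (made-up counts, not project data).
toy_counts = [
    [10, 0, 5, 5],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
normalized = normalize_log1p(toy_counts)
```

After this step every non-empty cell has the same total before the log transform, so differences in sequencing depth no longer dominate comparisons between cells.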
| Tool/Method | Description |
|---|---|
| ALRA | Low-rank imputation of scRNA-seq data. |
| CytoTRACE2 | Differentiation potential inference. |
| scGPT | Foundation model for scRNA-seq (expression prediction, GRN inference, perturbation). |
| Geneformer | Transformer-based biological foundation model for perturbation and fine-tuning. |
| SCimilarity | Foundation model for cross-dataset similarity & gene attribution. |
| ESM-2 | Protein language model for structural biology and ligand–receptor inference. |
| scPopcorn | Rare/unique cluster identification. |
| scTour | Lineage trajectory inference with deep generative models. |
| Scanpy | Comprehensive scRNA-seq analysis toolkit. |
| scVelo / Velocyto | RNA velocity analysis for dynamic inference. |
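
Several of the tools above are used for gene regulatory network inference. To illustrate the underlying idea of scoring gene-gene relationships, here is a deliberately simple sketch that builds a co-expression network by thresholding Pearson correlation. This is a stand-in technique, not the approach used by scGPT or by `GRN_in_house_human.py`; the helper names, the `threshold` value, and the gene labels and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold.

    `expression` maps a gene name to its expression values across cells.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Toy expression values across 4 cells (placeholder genes, not project data).
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 1.0, 3.0],
}
edges = coexpression_edges(toy_expression)
```

Correlation alone cannot distinguish regulators from targets or direct from indirect effects, which is one reason dedicated GRN methods and foundation models are used in this project instead.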
- Identification of dedifferentiation markers in ovarian cancer cells.
- RNA velocity and pseudotime maps showing dedifferentiation trajectories.
- Gene regulatory networks highlighting candidate therapeutic targets.
- Perturbation analysis predicting vulnerabilities of WNT5A-CAF crosstalk.
- Early integration with spatial transcriptomics confirming regional CSC enrichment.
This project is still evolving. Future updates may include:
- Large-scale perturbation analysis with foundation models (Geneformer, scGPT).
- Integration of multi-omics (ATAC, CUT&RUN, proteomics) for regulatory inference.
- Structural modeling of ligand–receptor pairs with ESM-2 + AlphaFold2.
- Clinical dataset integration for biomarker discovery and patient stratification.
