PROFET (Particle-based Reconstruction Of generative Force-matched Expression Trajectories) is a computational framework for reconstructing continuous gene expression dynamics from static, time-stamped single-cell RNA sequencing (scRNA-seq) data. Unlike conventional methods that rely on discrete timepoints or assume linear transitions, PROFET models cell state evolution as a two-step generative process built on gradient-flow and force-matching formulations:
- **Step 1: Particle-based transport generation.** PROFET constructs transport plans between empirical distributions at different timepoints using a Lipschitz-regularized gradient flow formulation (implemented by the Generative Particle Algorithm, GPA), yielding temporally smooth, distribution-consistent particle trajectories.
- **Step 2: Force-matched velocity field learning.** A time-dependent velocity field is estimated by force matching, giving a global vector field that best explains the sampled particle flows across time.
This two-stage approach enables nonlinear, continuous trajectory reconstruction from sparsely sampled gene expression snapshots.
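The particle transport of Step 1 can be illustrated with a minimal sketch. It swaps in a simpler gradient flow: particles follow the negative gradient of a Gaussian-kernel MMD witness function instead of GPA's learned, Lipschitz-regularized discriminator. It therefore shows only the mechanics of moving an empirical source snapshot toward a target snapshot, not PROFET's actual objective. All function names and parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel_grad(x, y, bandwidth):
    """Gradient of the Gaussian kernel k(x, y) w.r.t. x for every pair.

    x: (n, d), y: (m, d); returns an (n, m, d) array.
    """
    diff = x[:, None, :] - y[None, :, :]
    k = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * bandwidth**2))
    return -diff * k[..., None] / bandwidth**2


def mmd_particle_flow(source, target, n_steps=400, step_size=0.5,
                      bandwidth=1.0, record_every=100):
    """Move `source` particles toward `target` by descending the MMD witness.

    Stand-in for GPA (hypothetical sketch): the witness
    f(x) = mean_i k(x, p_i) - mean_j k(x, q_j) replaces a learned
    Lipschitz-regularized discriminator.
    """
    particles = source.astype(float).copy()
    path = [particles.copy()]
    for step in range(1, n_steps + 1):
        grad_f = (rbf_kernel_grad(particles, particles, bandwidth).mean(axis=1)
                  - rbf_kernel_grad(particles, target, bandwidth).mean(axis=1))
        particles -= step_size * grad_f  # explicit Euler step of dx/dt = -grad f(x)
        if step % record_every == 0:
            path.append(particles.copy())  # intermediate snapshots = particle trajectories
    return particles, path


# Toy example: two 2-D "snapshots" whose centers are 2 units apart.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 0.5, size=(100, 2))
target = rng.normal(0.0, 0.5, size=(100, 2)) + np.array([2.0, 0.0])
moved, path = mmd_particle_flow(source, target)
gap_before = np.linalg.norm(source.mean(axis=0) - target.mean(axis=0))
gap_after = np.linalg.norm(moved.mean(axis=0) - target.mean(axis=0))
```

Because each particle keeps its identity across steps, `path` already contains per-particle trajectories between the two snapshots, which is the kind of input a velocity-field fit can consume.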
PROFET has been validated on both synthetic and experimental datasets and applied to uncover treatment-induced heterogeneity in breast cancer. By recovering dynamic expression trajectories from static scRNA-seq data, PROFET provides a scalable and principled tool for modeling cell state transitions in development, disease, and therapeutic response.
To run PROFET and the downstream analyses, follow the instructions below. The two key notebooks, `Preprocess_datasets.ipynb` and `PROFET_full_pipeline.ipynb`, run independently of each other and read the same raw input data.
All datasets are stored in the `data/` folder. Each dataset is organized into its own subfolder and should include:
- A gene expression matrix (e.g., `.txt`)
- A corresponding cell time annotation file

These files serve as inputs for both notebooks described below.
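A minimal loader for one dataset subfolder might look like the following. The file names `expression.txt` and `cell_times.txt`, the tab-delimited cells-by-genes layout with a gene header row, and the one-value-per-line time file are all hypothetical; check each subfolder for its actual layout. The helper `group_by_time` splits cells into per-timepoint snapshots and lists consecutive timepoint pairs, assuming that transport is built between consecutive empirical distributions.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_dataset(folder, expr_name="expression.txt", time_name="cell_times.txt"):
    """Load one dataset subfolder (hypothetical file names and layout)."""
    folder = Path(folder)
    expr_path = folder / expr_name
    with open(expr_path) as fh:
        genes = fh.readline().rstrip("\n").split("\t")  # header row: gene names
    expr = np.loadtxt(expr_path, delimiter="\t", skiprows=1, ndmin=2)  # cells x genes
    times = np.loadtxt(folder / time_name, ndmin=1)  # one time label per cell
    if expr.shape[0] != times.shape[0]:
        raise ValueError(f"{expr.shape[0]} cells in matrix but {times.shape[0]} time labels")
    if expr.shape[1] != len(genes):
        raise ValueError("header length does not match the number of gene columns")
    return expr, times, genes


def group_by_time(X, times):
    """Split cells into per-timepoint snapshots and pair consecutive timepoints."""
    order = np.unique(times)  # sorted unique time labels
    snapshots = {t: X[times == t] for t in order.tolist()}
    pairs = list(zip(order[:-1].tolist(), order[1:].tolist()))
    return snapshots, pairs


with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "toy_dataset"
    sub.mkdir()
    (sub / "expression.txt").write_text("g1\tg2\tg3\n1\t0\t2\n0\t3\t1\n5\t5\t5\n")
    (sub / "cell_times.txt").write_text("0\n1\n0\n")
    expr, times, genes = load_dataset(sub)

snapshots, pairs = group_by_time(expr, times)
```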
`Preprocess_datasets.ipynb` performs PCA and saves a preprocessed version of each dataset.
- Input: Raw gene expression matrix and time annotations from the `data/` subfolders.
- Output: Preprocessed dataset saved in `.pkl` format in the `data/` folder.
- These `.pkl` files are required for visualization and trajectory validation.
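The preprocessing step can be sketched with a plain SVD-based PCA. The dictionary keys, the pickle layout, and the absence of any normalization before PCA are assumptions, not the notebook's actual format. Keeping the mean and principal axes makes it possible to map PCA-space trajectories back to gene space, the kind of step needed when visualizing gene-level dynamics.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def pca_preprocess(expr, times, n_components=10):
    """Project a cells-by-genes matrix onto its top principal components."""
    mean = expr.mean(axis=0)
    centered = expr - mean
    # Rows of vt are principal axes, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return {  # hypothetical key names
        "X_pca": centered @ components.T,
        "times": np.asarray(times),
        "mean": mean,
        "components": components,
        "explained_variance_ratio": (s**2 / np.sum(s**2))[:n_components],
    }


def to_gene_space(z, prep):
    """Map PCA coordinates back to gene space (rank-k approximation if k < n_genes)."""
    return z @ prep["components"] + prep["mean"]


rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 20))  # toy data: 50 cells x 20 genes
times = np.repeat([0.0, 1.0], 25)
prep = pca_preprocess(expr, times, n_components=5)
full = pca_preprocess(expr, times, n_components=20)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "toy_dataset.pkl"  # hypothetical file name
    with open(out_path, "wb") as fh:
        pickle.dump(prep, fh)
    with open(out_path, "rb") as fh:
        reloaded = pickle.load(fh)
```

With all components retained, `to_gene_space` reproduces the input exactly; with fewer components it returns the best rank-k approximation.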
`PROFET_full_pipeline.ipynb` runs the full PROFET pipeline:
- Step 1: Load the raw gene expression matrix and time annotations (from the `data/` subfolders).
- Step 2: Run GPA to generate particle-based trajectories.
- Step 3: Apply force matching to estimate the time-dependent vector field.
- Output: A `.pickle` file containing the reconstructed single-cell trajectories, saved in the `assets/` folder.
Note: `PROFET_full_pipeline.ipynb` reads the raw input directly from the `data/` subfolders (not the `.pkl` files produced by preprocessing). It runs independently of `Preprocess_datasets.ipynb`, but both notebooks must be executed for the full analysis.
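The force-matching step can be sketched as a regression problem: finite-difference velocities of the particle trajectories serve as targets, a time-dependent field v(x, t) is fitted to them, and integrating that field yields continuous trajectories. This sketch assumes random Fourier features with ridge regression as the function class and forward Euler as the integrator; the model in `train_time_dep_vectorfields.py` is not reproduced here, and every name below is hypothetical.

```python
import numpy as np


class RFFVelocityField:
    """Hypothetical time-dependent field v(x, t) = phi(x, t) @ W (random Fourier features)."""

    def __init__(self, dim, n_features=256, length_scale=1.0, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Random frequencies act on the concatenated input [x, t].
        self.omega = rng.normal(0.0, 1.0 / length_scale, size=(dim + 1, n_features))
        self.phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.ridge = ridge
        self.W = None

    def features(self, x, t):
        xt = np.hstack([x, np.full((x.shape[0], 1), float(t))])
        return np.sqrt(2.0 / self.omega.shape[1]) * np.cos(xt @ self.omega + self.phase)

    def fit(self, snapshots, times):
        """Ridge regression on finite-difference velocities.

        snapshots[k] holds the same particles (same row order) at times[k].
        """
        feats, vels = [], []
        for k in range(len(times) - 1):
            dt = times[k + 1] - times[k]
            feats.append(self.features(snapshots[k], times[k]))
            vels.append((snapshots[k + 1] - snapshots[k]) / dt)
        Phi, V = np.vstack(feats), np.vstack(vels)
        gram = Phi.T @ Phi + self.ridge * np.eye(Phi.shape[1])
        self.W = np.linalg.solve(gram, Phi.T @ V)
        return self

    def __call__(self, x, t):
        return self.features(x, t) @ self.W


def integrate(field, x0, t0, t1, n_steps=100):
    """Forward-Euler integration of dx/dt = v(x, t); returns (n_steps + 1, n, d)."""
    x, dt = x0.astype(float).copy(), (t1 - t0) / n_steps
    states = [x.copy()]
    for i in range(n_steps):
        x = x + dt * field(x, t0 + i * dt)
        states.append(x.copy())
    return np.stack(states)


def rotate(x, angle):
    """Counter-clockwise rotation of row vectors by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return x @ np.array([[c, s], [-s, c]])


# Toy check: particles rotating at unit angular speed, observed at 11 times.
rng = np.random.default_rng(1)
x0 = rng.normal(size=(200, 2))
times = np.linspace(0.0, 1.0, 11)
snapshots = [rotate(x0, t) for t in times]
field = RFFVelocityField(dim=2).fit(snapshots, times)
traj = integrate(field, x0, 0.0, 1.0)
err = np.linalg.norm(traj[-1] - snapshots[-1], axis=1).mean()
baseline = np.linalg.norm(x0 - snapshots[-1], axis=1).mean()
```

Here `err` measures how far the integrated particles end from the observed final snapshot, and `baseline` is the error of assuming no motion at all; the learned field should do much better than the baseline.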
You should now have:
- A preprocessed `.pkl` file (from `Preprocess_datasets.ipynb`) in the `data/` folder
- A reconstructed `.pickle` file (from `PROFET_full_pipeline.ipynb`) in the `assets/` folder

These two files are required for all downstream analysis and visualization tasks.
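Since every downstream notebook needs both files, a small guard that fails early when either is missing could look like this. The paths in the example are hypothetical, and the contents of each pickle are treated as opaque objects because their internal structure is not documented here. Only unpickle files you generated yourself, because `pickle.load` can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def load_profet_outputs(preprocessed_pkl, trajectory_pickle):
    """Load the preprocessing and reconstruction outputs, failing early if either is missing."""
    paths = [Path(preprocessed_pkl), Path(trajectory_pickle)]
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError(
            "Run Preprocess_datasets.ipynb and PROFET_full_pipeline.ipynb first; missing: "
            + ", ".join(missing)
        )
    loaded = []
    for p in paths:
        with open(p, "rb") as fh:  # only unpickle files you created yourself
            loaded.append(pickle.load(fh))
    return tuple(loaded)


with tempfile.TemporaryDirectory() as tmp:
    data_dir, assets_dir = Path(tmp) / "data", Path(tmp) / "assets"
    data_dir.mkdir()
    assets_dir.mkdir()
    pre_path = data_dir / "toy_dataset.pkl"  # hypothetical file names
    traj_path = assets_dir / "toy_trajectory.pickle"
    for path, obj in [(pre_path, {"X_pca": [[0.0, 1.0]]}), (traj_path, {"steps": [0, 1, 2]})]:
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)
    preprocessed, reconstructed = load_profet_outputs(pre_path, traj_path)
    try:
        load_profet_outputs(pre_path, assets_dir / "missing.pickle")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```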
You can now explore the following notebooks:
- `Downstream analysis - Trajectory visualization and subtrajectory classification.ipynb`
  → Visualize inferred trajectories in PCA space and classify subtrajectories based on cell fates or ancestors.
- `Downstream analysis - Reconstruction of gene expression dynamics in mESC and EMT data.ipynb`
  → Visualize gene expression dynamics for genes of interest in the mESC and EMT datasets.
- `Downstream analysis - Reconstruction of heterogeneity of phenotypic shift in MCF7 cell line and patients data.ipynb`
  → Analyze the diversity of phenotypic shifts, both in PCA space and in gene dynamics, in the MCF7 and patient datasets.
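Subtrajectory classification by cell fate can be sketched as nearest-centroid assignment of each trajectory's final position in PCA space. The `(n_steps, n_cells, n_dims)` array layout and the use of precomputed centroids (for example, cluster means of last-timepoint cells) are assumptions, not necessarily how the notebook defines fates. Passing `step=0` with centroids of early cell states groups trajectories by their starting region instead, a rough analogue of ancestor-based classification.

```python
import numpy as np


def classify_trajectories(trajectories, centroids, step=-1):
    """Assign each trajectory to the nearest centroid at time index `step`.

    trajectories: (n_steps, n_cells, n_dims) positions in PCA space (assumed layout)
    centroids:    (n_groups, n_dims) fate (step=-1) or ancestor (step=0) centers
    """
    points = trajectories[step]
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def split_by_label(trajectories, labels):
    """Return {group label: sub-array of the trajectories in that group}."""
    return {int(g): trajectories[:, labels == g, :] for g in np.unique(labels)}


trajectories = np.array([
    [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0]],    # t = 0: three cells
    [[-1.1, 0.0], [0.9, 0.2], [1.2, -0.1]],  # t = 1: the same cells later
])
fate_centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])
fate_labels = classify_trajectories(trajectories, fate_centroids)
subtrajectories = split_by_label(trajectories, fate_labels)
```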
Install PROFET and its dependencies:

```bash
git clone https://github.com/yourusername/PROFET.git
cd PROFET
pip install -r requirements.txt
```

Repository layout:

```text
PROFET/
├── scripts/                              # Main training and utility scripts
│   ├── train_time_dep_vectorfields.py    # Step 2: train time-dependent vector fields (force matching)
│   ├── GPA_NN/                           # Step 1: Generative Particle Algorithm (GPA) implementation
│   │   └── GPA_NN.py                     # Main GPA training script
│   └── util/                             # Helper functions: data loading, plotting, etc.
│
├── configs/                              # YAML configuration files
│   └── gpa_config.yaml                   # Example configuration for GPA
│
├── notebooks/                            # Jupyter notebooks for complete workflows
│   └── example_pipeline.ipynb            # End-to-end example pipeline
│
├── data/                                 # Input datasets and preprocessed files
│
├── assets/                               # Output directory for results and plots
│
└── README.md                             # Project overview and usage instructions
```
Datasets:
- Epithelial-Mesenchymal Transition (EMT)
- Synthetic Trajectories
- Stem Cell Differentiation
If you use PROFET in your research, please cite:
```bibtex
@article{cheng2025profet,
  title={PROFET Predicts Continuous Gene Expression Dynamics from scRNA-seq Data to Elucidate Resistance to Cancer Therapy},
  author={},
  journal={Preprint},
  year={2025}
}
```