This pipeline transforms raw mono-channel microscopy timelapses into multi-channel single-cell datasets. By stacking temporal frames into the channel dimension, it produces images that also carry temporal information and can be used for any kind of feature extraction. The output is also well suited for use with ML/DL approaches and was validated with DINO-trained ViTs.
- Temporal Channel Encoding: Converts time-series data into 3D volumes where the channel axis represents time ($t_0, t_1, \dots, t_n$); see the sketch after this list.
- Cell-Centered Cropping: Automatic centroid alignment across frames to reduce movement noise.
- Robust Tracking: Built-in sum-of-absolute-differences (SAD) outlier detection to handle field-of-view (FoV) jumps.
- Optional Expert Annotation Matching: High-precision (>99%) matching of manual labels to tracks to facilitate supervised learning or downstream validation.
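To make the temporal channel encoding concrete, below is a minimal NumPy sketch of the stacking, cropping, and SAD ideas. Function names, signatures, and the boundary handling are illustrative assumptions, not the pipeline's actual implementation.

```python
# Minimal sketch of temporal channel encoding, cell-centered cropping,
# and a SAD check -- illustrative only, not the pipeline's actual code.
import numpy as np

def encode_temporal_channels(movie, t0, step, n_channels):
    """Stack frames t0, t0+step, ... along a new channel axis -> (C, H, W)."""
    return np.stack([movie[t0 + i * step] for i in range(n_channels)], axis=0)

def centered_crop(frame, cy, cx, window=32):
    """Crop a window around a cell centroid (boundary handling omitted)."""
    h = window // 2
    return frame[cy - h:cy + h, cx - h:cx + h]

def sad(a, b):
    """Sum of absolute differences; a sudden spike between consecutive
    frames can flag a field-of-view jump for outlier handling."""
    return np.abs(a.astype(np.float32) - b.astype(np.float32)).sum()

# Example: a 5-channel, 32x32 crop for one tracked cell
movie = np.random.rand(100, 512, 512)  # placeholder 2D+T stack
frames = encode_temporal_channels(movie, t0=10, step=1, n_channels=5)
crop = np.stack([centered_crop(f, cy=256, cx=256) for f in frames], axis=0)
assert crop.shape == (5, 32, 32)
```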
This diagram illustrates the flow from raw data acquisition through the core preprocessing steps—Segmentation, Tracking, Matching, and Cropping—culminating in the final multi-channel, cell-centered dataset.
To run the pipeline, you need the following directories somewhere accessible:
```
input_data/
├── img_data/
│   ├── exp01_001.tif          # 2D+T TIFF stacks
│   └── exp01_002.tif
├── annotations/
│   ├── exp01_001.csv          # Optional: (x, y, t, filename) expert labels
│   └── exp01_002.csv
└── experiment_info/
    └── experiment_info.csv    # (Experiment, magnification, Acquisition_frequency(min), Apo_annotation_frequency(min))
```
The directories can be targeted separately in the config and do not need to be moved into one input_data directory as shown above.
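If you provide expert annotations, each CSV holds one row per labeled event with the four columns listed above. As a hedged sketch (column names and dtypes are inferred from the directory listing, not a documented schema), such a file could be written like this:

```python
# Hedged sketch: write a minimal expert-annotation CSV with the
# (x, y, t, filename) columns from the listing above. Exact column
# names and dtypes are assumptions, not a documented schema.
import pandas as pd

annotations = pd.DataFrame(
    {
        "x": [128.5, 301.0],    # centroid x (pixels)
        "y": [244.0, 87.5],     # centroid y (pixels)
        "t": [12, 45],          # frame index / time point
        "filename": ["exp01_001.tif", "exp01_001.tif"],
    }
)
# The annotations/ directory must already exist.
annotations.to_csv("annotations/exp01_001.csv", index=False)
```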
Clone the repository and create the environment using the provided environment.yml:
```bash
git clone https://github.com/pertzlab/apoDet.git
cd apoDet
conda env create -f environment.yml
conda activate preprocessing
```

The final step before you can produce your own dataset is to customize config.yml.
Minimum Check:
- RUN_NAME: Defines the name of the output directory
- EXTERNAL_PATHS: Paths to source_images_dir/, experiment_info.csv & manual_annotations/ (optional)
- TARGET_CHANNEL: Choose which channel you want to use

Custom Runs:
- MIN_NUC_SIZE: Threshold used for small-object filtering after segmentation
- MAX_TRACKING_DURATION: How much time the crops span (minutes)
- FRAME_INTERVAL: Time between "frames" in the produced crops (e.g. 20 minutes with an interval of 5 leads to 5 channels: t0, t5, t10, t15, t20)
- CROPS_PER_TRACK: How many crops are extracted per detected track (-1 extracts all available positions)
- WINDOW_SIZE: Spatial size of the crops
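For orientation, a minimal config.yml using these keys might look like the snippet below. All values, and the exact nesting under EXTERNAL_PATHS, are hypothetical placeholders; check the config.yml shipped with the repository for the authoritative structure.

```yaml
# Illustrative values only -- adapt paths and parameters to your data.
RUN_NAME: my_first_run                 # output directory name
EXTERNAL_PATHS:                        # nesting assumed; see the shipped config.yml
  source_images_dir: /data/input_data/img_data/
  experiment_info: /data/input_data/experiment_info/experiment_info.csv
  manual_annotations: /data/input_data/annotations/   # optional
TARGET_CHANNEL: 0                      # channel to process
MIN_NUC_SIZE: 50                       # small-object filter threshold after segmentation
MAX_TRACKING_DURATION: 20              # crop time span (minutes)
FRAME_INTERVAL: 5                      # minutes between channels -> t0, t5, t10, t15, t20
CROPS_PER_TRACK: 3                     # -1 extracts all available positions
WINDOW_SIZE: 32                        # spatial crop size (pixels)
```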
Once you are ready to run the pipeline, you can do so either locally or by submitting it to a cluster:
```bash
python pipeline.py       # Local execution
sbatch run_pipeline.sh   # Cluster submission (Slurm)
```

| Stage | Method/Tool | Performance | Description |
|---|---|---|---|
| 1. Segmentation | StarDist | 11min/file | Segmentation of cell nuclei |
| 2. Tracking | btrack | 7min/file | Generate trajectories |
| 3. Matching | Majority Vote | 0.1s/annot | Success Rate > 99% |
| 4. Cropping | Custom | 70s/file | Generate centered 32x32x5 crops |
Overall, the pipeline takes less than 10 minutes per GB of input.
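For orientation, stage 1 (segmentation) on a single frame might look like the generic StarDist call below. This is an illustrative sketch using the public StarDist and scikit-image APIs, not the pipeline's actual code; the pretrained model name and the MIN_NUC_SIZE value are placeholders.

```python
# Generic sketch of per-frame nuclear segmentation with StarDist,
# followed by MIN_NUC_SIZE-style small-object filtering.
import tifffile
from csbdeep.utils import normalize
from skimage.morphology import remove_small_objects
from stardist.models import StarDist2D

stack = tifffile.imread("img_data/exp01_001.tif")       # shape (T, H, W)
model = StarDist2D.from_pretrained("2D_versatile_fluo") # generic pretrained model

labels, _ = model.predict_instances(normalize(stack[0]))
labels = remove_small_objects(labels, min_size=50)      # e.g. MIN_NUC_SIZE = 50
```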
The generated output is a set of multi-channel, cell-centered crops (e.g. 32x32 pixels across 5 temporal channels, depending on WINDOW_SIZE and FRAME_INTERVAL).
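As a quick sanity check, the sketch below loads one crop and displays its temporal channels. It assumes crops are stored as NumPy .npy arrays of shape (channels, height, width) under the run's output directory; the file path and storage format are assumptions, not documented guarantees.

```python
# Sanity-check sketch: ASSUMES crops are stored as .npy arrays of shape
# (channels, height, width); verify against the actual pipeline output.
import numpy as np
import matplotlib.pyplot as plt

crop = np.load("output/my_first_run/crop_0001.npy")  # hypothetical path
print(crop.shape)  # e.g. (5, 32, 32): 5 temporal channels of 32x32 pixels

# Show each temporal channel side by side
fig, axes = plt.subplots(1, crop.shape[0], figsize=(2 * crop.shape[0], 2))
for i, ax in enumerate(axes):
    ax.imshow(crop[i], cmap="gray")
    ax.set_title(f"t{i}")
    ax.axis("off")
plt.show()
```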
---
The output from this preprocessing pipeline served as the foundation for self-supervised training of a model with the scDINO framework, which extends DINO to more than three channels.
- Unsupervised Feature Extraction: The temporal channel-encoded crops served as input to the scDINO Vision Transformer (ViT). The model learns representations that capture both morphology and temporal dynamics without any labels.
- Latent Space Exploration (UMAP):
  - Biological Clusters (A-D): Specific regions for apoptotic cells (B), and detailed clusters for various phases of mitosis: metaphase to anaphase (A), telophase to cytokinesis (C), and cells captured during/after cytokinesis (D).
  - Technical/Artifact Clusters (E-H): Technical failures, underlining the robustness of the temporal encoding. These clusters included crops with tracking errors consistently appearing in specific frames (E and F), cells out of the focal plane (G, epithelial extrusion), and crops displaying a characteristic grainy texture, likely due to imaging artifacts (H).
This pipeline relies on several fundamental open-source tools and published methodologies:
- Segmentation was performed using StarDist (Schmidt et al., 2018).
- Tracking utilized btrack (Ulicna et al., 2021), which was adapted for robustness against FoV jumps.
- Downstream application: scDINO (Pfaendler et al., 2023), an adaptation of the original self-supervised learning model DINO (Caron et al., 2021).
This project was developed as a Master's thesis in the PertzLab at the University of Bern, Switzerland. If you like this project, are into automated microscopy, or are interested in dynamic signalling, you might want to have a look at some of our other projects:
- ARCOS automatically detects collective events like waves of protein activity propagating through a tissue. It is also available as a plug-in for napari. Also check out the newest member of the ARCOS ecosystem, ARCOS.px.
- rtm-pymmcore lets you communicate with your microscope in real time from Python.
- fabscope turns your microscope into a 3D printer.


