Fine-tuning predicts only one cell type label regardless of input data #345

@Echozhao225

Description

I'm having an issue where after fine-tuning scGPT for cell type annotation, the model predicts the same label for every single cell during inference.
What's happening
Training seems to go fine: the loss decreases normally and validation metrics look reasonable. But when I actually run inference, every cell is assigned the same cell type.
I tested with the official MS dataset demo and that works perfectly. But when I swap in my own pancreatic dataset (while using the exact same preprocessing code), I get this single-label prediction problem.
Potential cause
The main difference I've noticed is that my raw counts are stored as floats close to integers (e.g. 1.0019583, 3.0177166) instead of exact integers. This might have happened during some upstream preprocessing step. Everything else - gene names, batch structure, cell type labels - seems to match the expected format.
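One quick sanity check is whether the stored values are raw counts that merely got cast to float, or genuinely normalized/log-transformed data. A minimal sketch of such a check (the function name and tolerance are illustrative, not part of scGPT; `X` stands in for your AnnData counts matrix):

```python
import numpy as np

def check_and_round_counts(X, tol=0.05):
    """If every value is within `tol` of an integer, return rounded integer
    counts; otherwise raise, since the data likely isn't raw counts."""
    X = np.asarray(X, dtype=np.float64)
    deviation = np.abs(X - np.round(X))
    if deviation.max() > tol:
        raise ValueError(
            f"Max deviation from integers is {deviation.max():.3f}; "
            "these look like normalized/log values, not raw counts."
        )
    return np.round(X).astype(np.int64)

# Values like the ones in this issue pass the check and round cleanly:
counts = np.array([[1.0019583, 3.0177166], [0.0, 2.0021]])
rounded = check_and_round_counts(counts)
```

If the check raises, the matrix probably went through normalization upstream, and feeding it to a pipeline that expects raw counts could distort the preprocessing.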

Questions

  • Does scGPT require strictly integer counts? Will float-valued raw counts cause problems?
  • Could non-integer values mess up the tokenization/binning in a way that makes all cells look identical to the model?
  • Are there other dataset requirements I might be missing that could cause this?
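On the binning question: scGPT's preprocessing discretizes expression values into a fixed number of bins per cell. Assuming that style of quantile-based value binning, here is a toy sketch (not the library's actual code) suggesting that floats sitting very close to integers should land in the same bins as their rounded counterparts, so tiny float offsets alone may not explain the collapse:

```python
import numpy as np

def quantile_bin(values, n_bins=5):
    """Toy per-cell value binning: nonzero values are split into quantile
    bins, zeros keep a dedicated bin 0. Loosely modeled on value binning;
    illustrative only."""
    values = np.asarray(values, dtype=np.float64)
    nonzero = values[values > 0]
    edges = np.quantile(nonzero, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(values, edges[1:-1], right=True) + 1
    bins[values == 0] = 0  # zeros get their own bin
    return bins

floats = np.array([0.0, 1.0019583, 3.0177166, 7.002, 12.001])
ints = np.round(floats)
# Near-integer floats and their rounded versions bin identically here,
# because the quantile edges shift by the same tiny amounts.
same = (quantile_bin(floats) == quantile_bin(ints)).all()
```

If binning isn't the culprit, it may be worth checking for class imbalance, label-encoding mismatches between training and inference, or a learning rate that collapsed the classifier head.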

I'm trying to figure out what's making the model treat all my cells as the same class. Any pointers would be really helpful.
Thanks in advance for any guidance!
