Fine-tuning predicts only one cell type label regardless of input data #345

@Echozhao225

Description

I'm having an issue where after fine-tuning scGPT for cell type annotation, the model predicts the same label for every single cell during inference.
What's happening
Training seems to go fine: the loss decreases normally and validation metrics look reasonable. But when I actually run inference, every cell is assigned the same cell type.
I tested with the official MS dataset demo and that works perfectly. But when I swap in my own pancreatic dataset (while using the exact same preprocessing code), I get this single-label prediction problem.
Potential cause
The main difference I've noticed is that my raw counts are stored as floats close to integers (e.g. 1.0019583, 3.0177166) instead of exact integers. This might have happened during some upstream preprocessing step. Everything else - gene names, batch structure, cell type labels - seems to match the expected format.
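One quick sanity check is whether the stored values are raw counts that merely got cast to float, or genuinely normalized/log-transformed data. A minimal sketch of such a check (the function name and tolerance are illustrative, not part of scGPT; `X` stands in for your AnnData counts matrix):

```python
import numpy as np

def check_and_round_counts(X, tol=0.05):
    """If every value is within `tol` of an integer, return rounded integer
    counts; otherwise raise, since the data likely isn't raw counts."""
    X = np.asarray(X, dtype=np.float64)
    deviation = np.abs(X - np.round(X))
    if deviation.max() > tol:
        raise ValueError(
            f"Max deviation from integers is {deviation.max():.3f}; "
            "these look like normalized/log values, not raw counts."
        )
    return np.round(X).astype(np.int64)

# Values like the ones in this issue pass the check and round cleanly:
counts = np.array([[1.0019583, 3.0177166], [0.0, 2.0021]])
rounded = check_and_round_counts(counts)
```

If the check raises, the matrix probably went through normalization upstream, and feeding it to a pipeline that expects raw counts could distort the preprocessing.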

Questions

  • Does scGPT require strictly integer counts? Will float-valued raw counts cause problems?
  • Could non-integer values mess up the tokenization/binning in a way that makes all cells look identical to the model?
  • Are there other dataset requirements I might be missing that could cause this?
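On the binning question: scGPT's preprocessing discretizes expression values into a fixed number of bins per cell. Assuming that style of quantile-based value binning, here is a toy sketch (not the library's actual code) suggesting that floats sitting very close to integers should land in the same bins as their rounded counterparts, so tiny float offsets alone may not explain the collapse:

```python
import numpy as np

def quantile_bin(values, n_bins=5):
    """Toy per-cell value binning: nonzero values are split into quantile
    bins, zeros keep a dedicated bin 0. Loosely modeled on value binning;
    illustrative only."""
    values = np.asarray(values, dtype=np.float64)
    nonzero = values[values > 0]
    edges = np.quantile(nonzero, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(values, edges[1:-1], right=True) + 1
    bins[values == 0] = 0  # zeros get their own bin
    return bins

floats = np.array([0.0, 1.0019583, 3.0177166, 7.002, 12.001])
ints = np.round(floats)
# Near-integer floats and their rounded versions bin identically here,
# because the quantile edges shift by the same tiny amounts.
same = (quantile_bin(floats) == quantile_bin(ints)).all()
```

If binning isn't the culprit, it may be worth checking for class imbalance, label-encoding mismatches between training and inference, or a learning rate that collapsed the classifier head.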

I'm trying to figure out what's making the model treat all my cells as the same class. Any pointers would be really helpful.
Thanks in advance for any guidance!
