
Output anndata files #1

@ilan-gold

I'd like to tackle this once I'm back from vacation.

I need to read the STARsolo manual closely to understand what can and can't be handled cleanly, but I think a good medium-term goal is to output all cell- and feature-alignable information in anndata. Long-term, it would be great to have some sort of high-level "aligner schema" that records how a count matrix was generated, but that is a stretch.

anndata now has readers in most bioinformatics-relevant languages I can think of, and the format is well documented. Readers include https://github.com/kaizhang/anndata-rs (Rust), https://github.com/ilan-gold/anndata.js (JavaScript), https://github.com/scverse/anndataR (R), and maybe others I'm missing (Julia?).

Ideally we would output zarr because it is both the fastest option (ideally with sharding + v3) and the best supported across languages (the main outlier being R, where there is ongoing work: scverse/anndataR#190). Relatedly, hdf5 is not cloud friendly (i.e., browser or local, and with no prospect of support in JavaScript, unlike zarr in R) and is generally slower from what I have observed (in Rust and Python at least, multithreading in zarr is the default and works well). However, providing options for both is probably realistic.

The other thing about zarr is that SpatialData only supports zarr.
