Output `anndata` files

I'd like to tackle this once I'm back from vacation.

I need to read [the STARSolo manual](https://github.com/6c512b3b-fa9d-4cd7-b46f-d4b9b4b11aa6) closely to understand what can and can't be handled cleanly, but I think outputting all cell/feature-alignable information in `anndata` is a good medium-term goal.  Long-term, it would be great to have some sort of high-level "aligner schema" where we output information identifying how a count matrix was generated, but that is a stretch.

For `anndata`, [the format is well-documented](https://anndata.readthedocs.io/en/latest/fileformat-prose.html) and there are now readers in most bioinformatics-relevant languages I can think of: https://github.com/kaizhang/anndata-rs, https://github.com/ilan-gold/anndata.js, https://github.com/scverse/anndataR, and maybe others I'm missing (Julia?).

Ideally we would output zarr because it is both [the fastest (ideally with sharding + v3)](https://github.com/zarrs/zarr_benchmarks) and the best supported across languages (the main outlier being R, where there is ongoing work: https://github.com/scverse/anndataR/pull/190).  Relatedly, hdf5 is not cloud-friendly (i.e., in the browser or locally, and with no prospect of JavaScript support, unlike zarr in R where work is ongoing) and, from what I have observed, is generally slower (in Rust + Python at least, where multithreading in zarr is the default and works well).  However, providing options for both is probably realistic.

The other thing about zarr is that [`SpatialData`](https://github.com/scverse/spatialdata) only supports zarr.
