Description
I'd like to tackle this once I'm back from vacation.
I need to read the STARsolo manual closely to understand what can be handled cleanly and what can't, but I think a good medium-term goal is to output all cell/feature alignable information in anndata. Long-term, it would be great to have some sort of high-level "aligner schema" in which we output the information needed to identify how a count matrix was generated, but that is a stretch.
For anndata, there are now readers in most bioinformatics-relevant languages I can think of and the format is well-documented: https://github.com/kaizhang/anndata-rs, https://github.com/ilan-gold/anndata.js, https://github.com/scverse/anndataR, and maybe others I'm missing (Julia?).
Ideally we would output zarr because it is both the fastest (ideally with sharding + v3) and has the best cross-language support (the main outlier being R, where there is ongoing work: scverse/anndataR#190). Relatedly, hdf5 is not cloud-friendly (i.e., browser or local, and there is no prospect of hdf5 support in JavaScript, unlike zarr support in R) and is generally slower from what I have observed (in Rust and Python at least; multithreading in zarr is the default and works well). However, providing options for both is probably realistic.
The other thing about zarr is that SpatialData only supports zarr.