I'm writing to share my GPU-accelerated implementation of PhenoGraph. Instead of the CPU-based libraries numpy, scipy.sparse, and sklearn used in the legacy implementation, it uses the GPU libraries cupy, cupyx.sparse, and cudf/cuml from NVIDIA's RAPIDS suite, reducing execution time by orders of magnitude for large datasets. For especially large datasets or dataset compilations (~3 million cells x 50 features), the kNN search can be distributed across multiple GPUs when they are available. On a synthetic dataset of 1 million cells x 30 features, the CPU implementation executes in ~6 hours, whereas the GPU implementation on a single V100 GPU executes in ~40 seconds, a roughly 500-fold speed-up.
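To illustrate the pipeline stages being accelerated, here is a minimal CPU-side sketch of the PhenoGraph graph-construction step (kNN search followed by Jaccard reweighting of neighbor sets, ahead of Louvain community detection). This is not code from the grapheno repo; it is a simplified illustration using numpy only. Because cupy mirrors the numpy API, the same logic ports to GPU largely by swapping the array module, and the kNN step would use cuml's `NearestNeighbors` in practice:

```python
import numpy as np

def knn_indices(X, k):
    # Brute-force kNN via pairwise squared distances.
    # On GPU, `np` can be swapped for `cp` (cupy) since the APIs mirror
    # each other; a RAPIDS implementation would use cuml.neighbors instead.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point as its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def jaccard_weights(nbrs):
    # PhenoGraph reweights each kNN edge (i, j) by the Jaccard similarity
    # of the two points' neighbor sets before community detection.
    n, k = nbrs.shape
    sets = [set(row) for row in nbrs]
    w = {}
    for i in range(n):
        for j in nbrs[i]:
            inter = len(sets[i] & sets[int(j)])
            w[(i, int(j))] = inter / (2 * k - inter)  # |A∩B| / |A∪B|
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # small synthetic stand-in dataset
nbrs = knn_indices(X, k=10)
weights = jaccard_weights(nbrs)
```

The Jaccard step is what makes the resulting graph robust to spurious kNN edges; on GPU the same reweighting is expressed as sparse-matrix operations (cupyx) rather than Python loops.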
Modularity of the resulting clusterings is comparable between the GPU and CPU implementations.
Please feel free to link to the repo if interested: https://gitlab.com/eburling/grapheno
Thanks and sorry for the spam! I hope the community finds it useful.

