Leverage Ray from Berkeley for Distributed Training

*DEMOCRATIZING PRODUCTION-SCALE DISTRIBUTED DEEP LEARNING*

https://arxiv.org/pdf/1811.00143.pdf

To address the above challenges, we discuss a system we built at Apple known as Alchemist. Alchemist adopts a cloud-native architecture and is portable among private and public clouds. It supports multiple training frameworks like Tensorflow or PyTorch and multiple distributed training paradigms. The compute cluster is managed by, but not limited to, Kubernetes. We chose a containerized workflow to ensure uniformity and repeatability of the software environment. In the following sections, we refer to engineers, researchers, and data scientists using Alchemist as users.
