DEMOCRATIZING PRODUCTION-SCALE DISTRIBUTED DEEP LEARNING
https://arxiv.org/pdf/1811.00143.pdf
To address the above challenges, we discuss a system we built at Apple known as Alchemist. Alchemist adopts a cloud-native architecture and is portable among private and public clouds. It supports multiple training frameworks like Tensorflow or PyTorch and multiple distributed training paradigms. The compute cluster is managed by, but not limited to, Kubernetes. We chose a containerized workflow to ensure uniformity and repeatability of the software environment. In the following sections, we refer to engineers, researchers, and data scientists using Alchemist as users.
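
To make the containerized, Kubernetes-managed workflow in the excerpt concrete, here is a minimal sketch of a training run described as a standard Kubernetes `batch/v1` Job. This is not Alchemist's API, which the excerpt does not show. The function name `training_job_manifest`, the image name, and the command are all hypothetical. The sketch only illustrates how pinning a container image fixes the software environment for every run.

```python
import json


def training_job_manifest(name, image, command, gpus=1):
    """Build a Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration only; not part of Alchemist.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry a failed training run automatically.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "trainer",
                            # A pinned image tag keeps the environment repeatable.
                            "image": image,
                            "command": list(command),
                            # Assumes the NVIDIA device plugin exposes this resource.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image and command; substitute real values.
    manifest = training_job_manifest(
        name="demo-training",
        image="registry.example.com/team/trainer:1.0.0",
        command=["python", "train.py"],
        gpus=2,
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`. Because the framework lives inside the image, the same manifest shape would work for either a Tensorflow or a PyTorch container.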