metaflow-modal

Work in progress. The API is functional but not yet stable. Breaking changes may occur before 1.0.

Run your GPU-hungry Metaflow steps on Modal without managing any infrastructure.

The problem

Running ML training or GPU inference inside a Metaflow pipeline usually means either paying for always-on GPU instances, which are expensive when idle, or wiring up a custom remote execution backend, which is time-consuming and fragile. The built-in @batch and @kubernetes decorators solve this for AWS and Kubernetes, but not for Modal's serverless GPU cloud. That leaves a gap for teams who want on-demand GPU execution without managing infrastructure.

Quick start

pip install metaflow-modal

from metaflow import FlowSpec, step, modal

class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.train)

    @modal(gpu="A10G", memory=8192)
    @step
    def train(self):
        import torch
        self.device = str(torch.cuda.get_device_name(0))
        self.next(self.end)

    @step
    def end(self):
        print(f"Trained on: {self.device}")

if __name__ == "__main__":
    TrainFlow()

METAFLOW_DEFAULT_DATASTORE=s3 python train_flow.py run

Install

pip install metaflow-modal

Requires Modal credentials — run modal token new once to authenticate, or set:

export MODAL_TOKEN_ID=...
export MODAL_TOKEN_SECRET=...

A remote datastore (S3, GCS, or Azure Blob) is required — @modal sandboxes can't reach the local filesystem.

Usage

CPU-only step with custom image

@modal(cpu=4, memory=16384, image="python:3.11-slim", timeout=1800)
@step
def preprocess(self):
    ...

GPU step with Modal secrets

@modal(gpu="H100", secrets=["my-hf-token"])
@step
def finetune(self):
    import os
    token = os.environ["HF_TOKEN"]  # injected from Modal secret
    ...

Multi-GPU

@modal(gpu="A100:4")   # 4x A100 40GB
@step
def distributed_train(self):
    ...

Supported GPUs

| Value | Hardware |
| --- | --- |
| `"T4"` | NVIDIA T4 |
| `"A10G"` | NVIDIA A10G |
| `"A100"` | NVIDIA A100 40GB |
| `"A100-80GB"` | NVIDIA A100 80GB |
| `"H100"` | NVIDIA H100 |
| `"L4"` | NVIDIA L4 |
| `"TYPE:N"` | N of TYPE (e.g. `"A10G:2"`) |
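
To make the `"TYPE:N"` convention concrete, here is a minimal sketch of how such a string could be split into a GPU type and a count. The helper name `parse_gpu_spec` and the `KNOWN_GPUS` set are hypothetical and written for this example only; this is not necessarily how metaflow-modal validates the value internally.

```python
from typing import Tuple

# GPU types listed in the table above (assumed to be the full supported set).
KNOWN_GPUS = {"T4", "A10G", "A100", "A100-80GB", "H100", "L4"}


def parse_gpu_spec(spec: str) -> Tuple[str, int]:
    """Split a spec such as "A100:4" into ("A100", 4).

    A bare type such as "T4" is treated as a single GPU.
    """
    gpu_type, sep, count_text = spec.partition(":")
    if gpu_type not in KNOWN_GPUS:
        raise ValueError(f"unsupported GPU type: {gpu_type!r}")
    if not sep:
        return gpu_type, 1
    # Reject empty, zero, negative, or non-numeric counts.
    if not count_text.isdecimal() or int(count_text) < 1:
        raise ValueError(f"GPU count must be a positive integer: {spec!r}")
    return gpu_type, int(count_text)
```

Checking the string locally before launching a step gives a fast failure for typos such as `"A10G:"` instead of an error from the remote side.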

How it works

@modal intercepts the Metaflow step execution and redirects it through Modal's sandbox API. Your flow's code package is uploaded to your remote datastore (S3/GCS/Azure), then downloaded and executed inside an ephemeral Modal container. Artifacts and metadata are written back to the same datastore, so the rest of your flow continues normally on your local machine.

@modal step  ->  upload code package to the remote datastore
             ->  modal.Sandbox.create("bash", "-c", step_cmd)
             ->  stream stdout/stderr live
             ->  sandbox exits  ->  sync artifacts + metadata from the remote datastore

No Modal Functions to deploy, no apps to register. Each step spins up a fresh container and exits.
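
The sandbox lifecycle described above can be sketched with Modal's Python client. This is an illustration under stated assumptions, not the plugin's actual implementation: the function names `sandbox_options` and `run_step` and the app name `metaflow-modal-sketch` are invented here, and the app lookup reflects what recent Modal clients appear to need when a sandbox is created from a local machine, not something this README states.

```python
from typing import Any, Optional


def sandbox_options(cpu: float = 1, memory: int = 1024,
                    gpu: Optional[str] = None, timeout: int = 600) -> dict:
    """Turn decorator-style settings into keyword arguments for a sandbox.

    Defaults mirror the Configuration table; `gpu` is omitted for CPU-only steps.
    """
    opts: dict = {"cpu": cpu, "memory": memory, "timeout": timeout}
    if gpu is not None:
        opts["gpu"] = gpu  # e.g. "A10G" or "A100:4"
    return opts


def run_step(step_cmd: str, **settings: Any) -> int:
    """Run one step command in an ephemeral sandbox and return its exit code."""
    import modal  # lazy import: sandbox_options() works without Modal installed

    # Hypothetical app name, used only to obtain a handle for the sandbox.
    app = modal.App.lookup("metaflow-modal-sketch", create_if_missing=True)
    sandbox = modal.Sandbox.create(
        "bash", "-c", step_cmd,
        app=app,
        image=modal.Image.debian_slim(),
        **sandbox_options(**settings),
    )
    # Stream stdout as the step produces it (stderr is left out for brevity).
    for line in sandbox.stdout:
        print(line, end="")
    sandbox.wait()
    return sandbox.returncode
```

A non-zero return value from `run_step` would be the signal to fail the Metaflow task; uploading the code package and syncing artifacts are deliberately left out of this sketch.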

Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| `cpu` | `1` | Number of CPUs |
| `memory` | `1024` | Memory in MB |
| `gpu` | `None` | GPU type (see table above) |
| `image` | Modal debian_slim | Docker image reference (e.g. `python:3.11-slim`) |
| `timeout` | `600` | Sandbox timeout in seconds |
| `env` | `{}` | Extra environment variables |
| `secrets` | `[]` | Modal secret names to mount |

| Environment variable | Purpose |
| --- | --- |
| `MODAL_TOKEN_ID` | Modal API token ID |
| `MODAL_TOKEN_SECRET` | Modal API token secret |
| `MODAL_ENVIRONMENT` | Modal environment name (optional) |
| `METAFLOW_DEFAULT_DATASTORE` | Must be `s3`, `gs`, or `azure` |
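
A small preflight check can catch a misconfigured environment before a flow is launched. The sketch below is a hypothetical helper, not part of metaflow-modal. It only inspects environment variables, so it will report a problem if the datastore is set solely in a Metaflow config file, and it treats the token variables as optional because `modal token new` stores credentials elsewhere.

```python
import os
from typing import List, Mapping, Optional

# Datastore values the table above lists as acceptable.
REMOTE_DATASTORES = {"s3", "gs", "azure"}


def preflight_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable configuration problems (empty if none)."""
    env = os.environ if env is None else env
    problems: List[str] = []

    datastore = env.get("METAFLOW_DEFAULT_DATASTORE")
    if datastore not in REMOTE_DATASTORES:
        problems.append(
            f"METAFLOW_DEFAULT_DATASTORE is {datastore!r}; "
            "expected one of s3, gs, azure"
        )

    # The two token variables only make sense as a pair.
    has_id = bool(env.get("MODAL_TOKEN_ID"))
    has_secret = bool(env.get("MODAL_TOKEN_SECRET"))
    if has_id != has_secret:
        problems.append("set both MODAL_TOKEN_ID and MODAL_TOKEN_SECRET, or neither")

    return problems
```

Calling `preflight_problems()` with no arguments reads the current process environment; passing a dictionary makes the check easy to unit test.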

Development

git clone https://github.com/npow/metaflow-modal
cd metaflow-modal
pip install -e ".[dev]"

# Unit tests (no credentials needed)
pytest tests/unit/

# Architecture enforcement tests
pytest tests/structural/ -m structural

# Lint + type check
ruff check src/ tests/
mypy src/

License

Apache 2.0
