A CLI tool for launching Kubernetes jobs with environment variable and secret management.
- Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Alternatively, you can install uv using pip:

```bash
pip install uv
```

- Use `uvx` to run the CLI (the `uvx` command invokes a tool without installing it into the local `.venv`):

```bash
uvx kblaunch --help
```

Always prepend `kblaunch` commands with `uvx`.
Run the setup command to configure the tool (email and Slack webhook):

```bash
uvx kblaunch setup
```

This will go through the following steps:
- Set the user (optional): identifies the user and is required by the cluster. Defaults to `$USER`.
- Set the email (required): identifies the user and is required by the cluster.
- Set up Slack notifications (optional): sends a test message to the webhook and stores the webhook in the config. When your job starts, you will receive a message at the webhook. Note that a Slack webhook is also required for automatic VS Code tunnelling.
- Set up a PVC (optional): creates a Persistent Volume Claim (PVC) for the user to use in their jobs.
- Set the default PVC to use (optional): the default PVC is passed to the job and is always mounted at `/pvc`. Note that only one pod can use a PVC at a time.
- Set up git credentials (optional): if you have set up a git/RSA key on the head node, kblaunch can export it as a secret, load it automatically, and configure git credentials in your launched pods. This requires having set up git/RSA credentials beforehand.
The outcome of `kblaunch setup` is a JSON file stored in `.cache/.kblaunch/config.json`. It should look something like this:

```json
{
    "email": "XXX@ed.ac.uk",
    "user": "sXXX-infk8s",
    "slack_webhook": "https://hooks.slack.com/services/XXX/XXX/XXX",
    "default_pvc": "sXXX-infk8s-pvc",
    "git_secret": "sXXX-infk8s-git-ssh"
}
```

When you later use kblaunch to launch a job, it will use the values stored in that `config.json`.
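For illustration only (this is not kblaunch's actual implementation), a minimal Python sketch of reading such a config file, assuming the path above is relative to the home directory:

```python
import json
from pathlib import Path

# Location written by `kblaunch setup` (assumed relative to the home directory)
CONFIG_PATH = Path.home() / ".cache" / ".kblaunch" / "config.json"

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the saved kblaunch settings, or an empty dict if the file is missing."""
    if not path.exists():
        return {}
    with path.open() as f:
        return json.load(f)

config = load_config()
# e.g. config.get("default_pvc") is the PVC name passed to launched jobs
```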
Launch a simple job:

```bash
uvx kblaunch launch \
    --job-name myjob \
    --command "python script.py"
```

- From local environment:

```bash
export PATH=...
export OPENAI_API_KEY=...
# pass the environment variables to the job
uvx kblaunch launch \
    --job-name myjob \
    --command "python script.py" \
    --local-env-vars PATH,OPENAI_API_KEY
```
- From Kubernetes secrets:

```bash
uvx kblaunch launch \
    --job-name myjob \
    --command "python script.py" \
    --secrets-env-vars mysecret1,mysecret2
```
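The names passed to `--secrets-env-vars` refer to Kubernetes secrets in your namespace. As a hypothetical example (the secret name `mysecret1` and key `OPENAI_API_KEY` are illustrative), such a secret could be defined with a standard Kubernetes manifest:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysecret1
type: Opaque
stringData:
  OPENAI_API_KEY: sk-XXX
```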
- From .env file (default behavior):

```bash
uvx kblaunch launch \
    --job-name myjob \
    --command "python script.py" \
    --load-dotenv
```

If a `.env` file exists in the current directory, it will be loaded and its variables passed as environment variables to the job.
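A `.env` file uses the usual `KEY=VALUE` format, one variable per line. For example (the variable names below are illustrative):

```
# .env — picked up by default when launching a job
OPENAI_API_KEY=sk-XXX
WANDB_API_KEY=XXX
```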
Specify GPU requirements:

```bash
uvx kblaunch launch \
    --job-name gpu-job \
    --command "python train.py" \
    --gpu-limit 2 \
    --gpu-product "NVIDIA-A100-SXM4-80GB"
```

Launch an interactive job:
```bash
uvx kblaunch launch \
    --job-name interactive \
    --interactive
```

Launch command options:
- `--email`: User email (overrides config)
- `--job-name`: Name of the Kubernetes job [required]
- `--docker-image`: Docker image (default: `nvcr.io/nvidia/cuda:12.0.0-devel-ubuntu22.04`)
- `--namespace`: Kubernetes namespace (default: `$KUBE_NAMESPACE`)
- `--queue-name`: Kueue queue name (default: `$KUBE_QUEUE_NAME`)
- `--interactive`: Run in interactive mode (default: False)
- `--command`: Command to run in the container [required if not interactive]
- `--cpu-request`: CPU request (default: "1")
- `--ram-request`: RAM request (default: "8Gi")
- `--gpu-limit`: GPU limit (default: 1)
- `--gpu-product`: GPU product type (default: "NVIDIA-A100-SXM4-40GB"). Available options:
  - NVIDIA-A100-SXM4-80GB
  - NVIDIA-A100-SXM4-40GB
  - NVIDIA-H100-80GB-HBM3
- `--secrets-env-vars`: List of secret environment variables (default: [])
- `--local-env-vars`: List of local environment variables (default: [])
- `--load-dotenv`: Load environment variables from a .env file (default: True)
- `--nfs-server`: NFS server address (default: `$INFK8S_NFS_SERVER_IP`)
- `--pvc-name`: Persistent Volume Claim name (default: `default_pvc` if present in `config.json`)
- `--dry-run`: Print the job YAML without creating it (default: False)
- `--priority`: Priority class name (default: "default"). Available options: `default`, `batch`, `short`
- `--vscode`: Install the VS Code CLI in the container (default: False)
- `--tunnel`: Start a VS Code SSH tunnel on startup (requires `$SLACK_WEBHOOK` and the `--vscode` flag)
- `--startup-script`: Path to a startup script to run in the container
Monitor command options:

- `--namespace`: Kubernetes namespace (default: `$KUBE_NAMESPACE`)
The `kblaunch monitor` command provides several subcommands to monitor cluster resources.

Display aggregate GPU statistics for the cluster:

```bash
uvx kblaunch monitor gpus
```

Display queued jobs (jobs which are waiting for GPUs):

```bash
uvx kblaunch monitor queue
```

Display per-user statistics:

```bash
uvx kblaunch monitor users
```

Display per-job statistics:

```bash
uvx kblaunch monitor jobs
```

Note that the `users` and `jobs` subcommands run `nvidia-smi` on pods to obtain GPU usage, so they are not recommended for frequent use.
- Kubernetes job management
- Environment variable handling from multiple sources
- Kubernetes secrets integration
- GPU job support
- Interactive mode
- Automatic job cleanup
- Slack notifications (when configured)
- Persistent Volume Claim (PVC) management
- VS Code integration (with Code tunnelling support)
- Monitoring commands