diff --git a/AI/README.md b/AI/README.md
index b4a1f6e22..e938a5fd8 100644
--- a/AI/README.md
+++ b/AI/README.md
@@ -28,6 +28,10 @@ We are particularly interested in examples that are:
 * Modular and showcase best practices.
 * Cover a diverse range of tools and MLOps stages.

-## Current Status
+## Available Examples

-_This section is currently being populated. Check back soon for our first set of AI/ML examples!_
+| Example | Description | GPU Required |
+|---|---|---|
+| [Model Inference with Scikit-Learn](model-inference-sklearn/) | Minimal inference API (FastAPI + scikit-learn) with Kubernetes best practices (probes, resource limits, security context) | No |
+| [TensorFlow Model Serving](model-serving-tensorflow/) | Deploy TensorFlow Serving with PersistentVolumes and Ingress | No (CPU mode) |
+| [vLLM Inference Server](vllm-deployment/) | Serve large language models (Gemma) with vLLM and optional HPA | Yes |
diff --git a/AI/model-inference-sklearn/README.md b/AI/model-inference-sklearn/README.md
new file mode 100644
index 000000000..b8908493c
--- /dev/null
+++ b/AI/model-inference-sklearn/README.md
@@ -0,0 +1,290 @@
+# Minimal AI Model Inference on Kubernetes (Scikit-Learn + FastAPI)
+
+## Purpose / What You'll Learn
+
+This example demonstrates how to deploy a lightweight AI/ML model for
+real-time inference on Kubernetes -- without GPUs, specialized hardware, or
+heavy ML platforms. You'll learn how to:
+
+- Train and package a [scikit-learn](https://scikit-learn.org/) model inside a
+  container image.
+- Serve predictions through a [FastAPI](https://fastapi.tiangolo.com/)
+  REST API.
+- Deploy the inference server to Kubernetes using a
+  [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
+  with **resource requests/limits**, **liveness and readiness probes**, and
+  **security hardening** (non-root user, read-only filesystem).
+- Expose the server with a
+  [Service](https://kubernetes.io/docs/concepts/services-networking/service/).
+- Send test predictions and verify the setup.
+
+This is the simplest possible "AI on Kubernetes" pattern -- ideal for learning
+the fundamentals before moving to GPU-accelerated serving solutions such as
+[TensorFlow Serving](../model-serving-tensorflow/) or
+[vLLM](../vllm-deployment/).
+
+---
+
+## Table of Contents
+
+- [Prerequisites](#prerequisites)
+- [Quick Start / TL;DR](#quick-start--tldr)
+- [Detailed Steps & Explanation](#detailed-steps--explanation)
+  - [1. Build the Container Image](#1-build-the-container-image)
+  - [2. Deploy to Kubernetes](#2-deploy-to-kubernetes)
+  - [3. Expose the Service](#3-expose-the-service)
+- [Verification / Seeing it Work](#verification--seeing-it-work)
+- [Configuration Customization](#configuration-customization)
+- [Cleanup](#cleanup)
+- [Troubleshooting](#troubleshooting)
+- [Further Reading / Next Steps](#further-reading--next-steps)
+
+---
+
+## Prerequisites
+
+| Requirement | Details |
+|---|---|
+| Kubernetes cluster | v1.27 or later (tested with v1.31) |
+| `kubectl` | Configured and in your `PATH` |
+| Container runtime | Docker or a compatible builder (Podman, etc.) |
+| Container registry | Any registry your cluster can pull from |
+| `curl` | For sending test requests |
+
+> **Note:** This example does **not** require GPUs. It runs on any standard
+> CPU node, making it easy to try on Minikube, kind, or a managed cluster.
+
+---
+
+## Quick Start / TL;DR
+
+```shell
+# 1. Clone the repo and build the image (replace <your-registry>)
+git clone --depth 1 https://github.com/kubernetes/examples.git
+cd examples/AI/model-inference-sklearn
+docker build -t <your-registry>/sklearn-inference:v1.0.0 image/
+docker push <your-registry>/sklearn-inference:v1.0.0
+
+# 2. Update the image reference in deployment.yaml, then apply all manifests
+# Replace <your-registry> with your actual registry, e.g. docker.io/myuser
+kubectl apply -f deployment.yaml -f service.yaml -f pdb.yaml
+
+# 3. Wait for rollout and test
+kubectl wait --for=condition=Available deployment/sklearn-inference --timeout=120s
+kubectl port-forward service/sklearn-inference 8080:80 &
+curl -s -X POST http://localhost:8080/predict \
+  -H "Content-Type: application/json" \
+  -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
+```
+
+---
+
+## Detailed Steps & Explanation
+
+### 1. Build the Container Image
+
+The `image/` directory contains everything needed to build the inference
+server:
+
+```
+image/
+- Dockerfile          # Multi-stage build: train -> serve
+- app.py              # FastAPI inference server
+- train_model.py      # Trains & saves the scikit-learn model
+- requirements.txt    # Pinned Python dependencies
+```
+
+**Multi-stage Dockerfile explained:**
+
+- **Stage 1 (builder):** installs Python dependencies, runs
+  `train_model.py` to train a Random Forest classifier on the
+  [Iris dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html),
+  and saves the model as `iris_model.joblib`.
+- **Stage 2 (runtime):** copies only the application code and the trained
+  model into a slim image, creates a non-root user, and starts the
+  FastAPI server.
+
+Clone the repository and build from the `image/` directory:
+
+```shell
+git clone --depth 1 https://github.com/kubernetes/examples.git
+cd examples/AI/model-inference-sklearn
+docker build -t <your-registry>/sklearn-inference:v1.0.0 image/
+docker push <your-registry>/sklearn-inference:v1.0.0
+```
+
+### 2. Deploy to Kubernetes
+
+Before applying, update the `image` field in `deployment.yaml` to point to
+your registry:
+
+```yaml
+# In deployment.yaml -> spec.template.spec.containers[0]
+image: <your-registry>/sklearn-inference:v1.0.0
+```
+
+Then apply the manifests:
+
+```shell
+kubectl apply -f deployment.yaml -f service.yaml -f pdb.yaml
+```
+
+**What the manifests provide:**
+
+| Feature | How |
+|---|---|
+| Replicas | `replicas: 2` for basic availability |
+| Resource governance | CPU/memory requests **and** limits |
+| Readiness probe | `GET /readyz` -- traffic is routed only after the model loads |
+| Liveness probe | `GET /healthz` -- container restarts if the process hangs |
+| Security | `runAsNonRoot`, `readOnlyRootFilesystem`, all capabilities dropped |
+| Writable temp storage | `emptyDir` mounted at `/tmp` for Python runtime |
+| Disruption budget | [PodDisruptionBudget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) keeps at least 1 replica during node drains |
+| Pod spreading | `topologySpreadConstraints` distributes replicas across nodes |
+
+Wait for the rollout:
+
+```shell
+kubectl wait --for=condition=Available deployment/sklearn-inference --timeout=120s
+```
+
+Check pod status:
+
+```shell
+kubectl get pods -l app=sklearn-inference
+```
+
+Expected output:
+
+```
+NAME                                 READY   STATUS    RESTARTS   AGE
+sklearn-inference-6f9b8d7c5f-abc12   1/1     Running   0          30s
+sklearn-inference-6f9b8d7c5f-def34   1/1     Running   0          30s
+```
+
+### 3. Expose the Service
+
+The included `service.yaml` creates a `ClusterIP` service on port 80.
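+
+If you first want to check the service from inside the cluster (no
+port-forward required), a throwaway test pod works. This is a quick sketch
+that assumes the `default` namespace and the public `busybox:1.36` image:
+
+```shell
+kubectl run tmp-client --rm -it --restart=Never --image=busybox:1.36 -- \
+  wget -qO- http://sklearn-inference/healthz
+```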
+Access it from your workstation via port-forward:
+
+```shell
+kubectl port-forward service/sklearn-inference 8080:80
+```
+
+---
+
+## Verification / Seeing it Work
+
+With the port-forward running, send a prediction request:
+
+```shell
+curl -s -X POST http://localhost:8080/predict \
+  -H "Content-Type: application/json" \
+  -d '{"instances": [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]}'
+```
+
+Expected output:
+
+```json
+{
+  "predictions": [
+    {
+      "label": "setosa",
+      "probability": 1.0
+    },
+    {
+      "label": "virginica",
+      "probability": 0.96
+    }
+  ]
+}
+```
+
+You can also verify the health endpoints:
+
+```shell
+# Liveness
+curl -s http://localhost:8080/healthz
+```
+
+```json
+{"status": "alive"}
+```
+
+```shell
+# Readiness
+curl -s http://localhost:8080/readyz
+```
+
+```json
+{"status": "ready"}
+```
+
+Check the container logs:
+
+```shell
+kubectl logs -l app=sklearn-inference --tail=20
+```
+
+Expected output:
+
+```
+INFO:inference-server:Loading model from /model/iris_model.joblib ...
+INFO:inference-server:Model loaded successfully.
+INFO:     Started server process [1]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
+```
+
+---
+
+## Configuration Customization
+
+| Parameter | How to Change |
+|---|---|
+| **Model** | Replace `train_model.py` with your own training script. Update `app.py` to match the new model's input/output schema. Rebuild the image. |
+| **Replicas** | Edit `spec.replicas` in `deployment.yaml`. |
+| **Resource limits** | Adjust `resources.requests` and `resources.limits` in `deployment.yaml` to match your model's footprint. |
+| **Port** | Set the `PORT` environment variable in `deployment.yaml` and update the `containerPort` accordingly. |
+| **External access** | Change the Service `type` to `LoadBalancer` or add an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) resource. |
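+
+For example, a minimal Ingress for this Service (not shipped with this
+example) might look like the sketch below. It assumes an NGINX ingress
+controller is installed and uses a hypothetical hostname:
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: sklearn-inference
+spec:
+  ingressClassName: nginx          # assumption: an "nginx" IngressClass exists
+  rules:
+  - host: sklearn.example.com      # hypothetical hostname
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: sklearn-inference
+            port:
+              number: 80
+```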
+
+---
+
+## Cleanup
+
+Remove all resources created by this example:
+
+```shell
+kubectl delete -f pdb.yaml -f service.yaml -f deployment.yaml
+```
+
+---
+
+## Troubleshooting
+
+| Symptom | Likely Cause | Fix |
+|---|---|---|
+| Pod stays in `CrashLoopBackOff` | Model file missing or corrupt | Rebuild the image and verify `iris_model.joblib` exists at `/model/` |
+| Readiness probe fails | Application hasn't started yet | Increase `initialDelaySeconds` in the readiness probe |
+| `ImagePullBackOff` | Wrong image reference or registry auth | Verify the `image` field and ensure your cluster has pull access |
+| `curl` returns connection refused | Port-forward not active | Re-run `kubectl port-forward service/sklearn-inference 8080:80` |
+
+---
+
+## Further Reading / Next Steps
+
+- [Kubernetes Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
+- [Configure Liveness, Readiness and Startup Probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
+- [Resource Management for Pods and Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
+- [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
+- [PodDisruptionBudgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
+- [scikit-learn Documentation](https://scikit-learn.org/stable/)
+- [FastAPI Documentation](https://fastapi.tiangolo.com/)
+- More AI examples in this repo:
+  [TensorFlow Serving](../model-serving-tensorflow/) |
+  [vLLM Inference](../vllm-deployment/)
+
+---
+
+**Last Validated Kubernetes Version:** v1.31
diff --git a/AI/model-inference-sklearn/deployment.yaml b/AI/model-inference-sklearn/deployment.yaml
new file mode 100644
index 000000000..96544f148
--- /dev/null
+++ b/AI/model-inference-sklearn/deployment.yaml
@@ -0,0 +1,106 @@
+# Deployment for the scikit-learn Iris inference API.
+#
+# Key best practices demonstrated:
+# - Explicit resource requests and limits
+# - Liveness and readiness probes
+# - Non-root security context
+# - Read-only root filesystem
+# - Specific image tag (no ":latest")
+#
+# For more on Deployments see:
+# https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: sklearn-inference
+  labels:
+    app: sklearn-inference
+spec:
+  replicas: 2
+  selector:
+    matchLabels:
+      app: sklearn-inference
+  template:
+    metadata:
+      labels:
+        app: sklearn-inference
+    spec:
+      # Security best practice: do not run containers as root.
+      # https://kubernetes.io/docs/concepts/security/pod-security-standards/
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 10001
+        runAsGroup: 10001
+        seccompProfile:
+          type: RuntimeDefault
+      containers:
+      - name: inference-server
+        # ---------------------------------------------------------------
+        # IMPORTANT: Replace the image reference below with your own
+        # registry and tag after building the container image.
+        #
+        # To build and push:
+        #   docker build -t <your-registry>/sklearn-inference:v1.0.0 image/
+        #   docker push <your-registry>/sklearn-inference:v1.0.0
+        # ---------------------------------------------------------------
+        image: <your-registry>/sklearn-inference:v1.0.0
+        ports:
+        - containerPort: 8080
+          name: http
+          protocol: TCP
+        # Environment variables consumed by the application.
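+        # MODEL_PATH and PORT mirror the defaults baked into the image's ENV
+        # instructions (see image/Dockerfile); override them here only if you
+        # rebuild the image with a different model location or port.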
+        env:
+        - name: MODEL_PATH
+          value: "/model/iris_model.joblib"
+        - name: PORT
+          value: "8080"
+        # Resource requests and limits ensure predictable scheduling
+        # and protect against runaway resource consumption.
+        # https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
+        resources:
+          requests:
+            cpu: "250m"
+            memory: "256Mi"
+          limits:
+            cpu: "500m"
+            memory: "512Mi"
+        # Readiness probe: Kubernetes will not route traffic to the pod
+        # until the model is loaded and the /readyz endpoint returns 200.
+        # https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
+        readinessProbe:
+          httpGet:
+            path: /readyz
+            port: http
+          initialDelaySeconds: 5
+          periodSeconds: 10
+          failureThreshold: 3
+        # Liveness probe: Kubernetes will restart the container if
+        # /healthz stops responding.
+        livenessProbe:
+          httpGet:
+            path: /healthz
+            port: http
+          initialDelaySeconds: 10
+          periodSeconds: 15
+          failureThreshold: 3
+        # Container-level security context.
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: true
+          capabilities:
+            drop:
+            - ALL
+        volumeMounts:
+        - name: tmp
+          mountPath: /tmp
+      volumes:
+      - name: tmp
+        emptyDir: {}
+      # Spread pods across nodes to ensure high availability.
+      topologySpreadConstraints:
+      - maxSkew: 1
+        topologyKey: kubernetes.io/hostname
+        whenUnsatisfiable: ScheduleAnyway
+        labelSelector:
+          matchLabels:
+            app: sklearn-inference
diff --git a/AI/model-inference-sklearn/image/Dockerfile b/AI/model-inference-sklearn/image/Dockerfile
new file mode 100644
index 000000000..6419f1c50
--- /dev/null
+++ b/AI/model-inference-sklearn/image/Dockerfile
@@ -0,0 +1,57 @@
+# ---------------------------------------------------------
+# Multi-stage Dockerfile for the scikit-learn inference API.
+#
+# Stage 1 -- "builder": installs dependencies and trains
+#                       the model so the artifact is baked
+#                       into the image (no external storage
+#                       required for this demo).
+# Stage 2 -- "runtime": copies only what is needed to serve.
+# ---------------------------------------------------------
+
+# ---------- Stage 1: build & train ----------
+FROM python:3.12-slim AS builder
+
+WORKDIR /build
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY train_model.py .
+RUN python train_model.py
+
+# ---------- Stage 2: runtime ----------
+FROM python:3.12-slim
+
+ARG APP_UID=10001
+ARG APP_GID=10001
+
+LABEL org.opencontainers.image.description="Minimal scikit-learn inference server (Iris model)" \
+      org.opencontainers.image.source="https://github.com/kubernetes/examples/tree/master/AI/model-inference-sklearn"
+
+# Run as non-root for security best practices.
+RUN groupadd -g ${APP_GID} -r appuser && \
+    useradd -u ${APP_UID} -r -g appuser -d /app -s /sbin/nologin appuser
+
+WORKDIR /app
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application code and the pre-trained model from the builder stage.
+COPY app.py .
+COPY --from=builder /build/iris_model.joblib /model/iris_model.joblib
+
+# Environment variables with sensible defaults.
+ENV MODEL_PATH="/model/iris_model.joblib" \
+    PORT="8080" \
+    PYTHONDONTWRITEBYTECODE="1" \
+    PYTHONUNBUFFERED="1" \
+    TMPDIR="/tmp"
+
+# Switch to non-root user.
+USER appuser
+
+EXPOSE 8080
+
+# Start the FastAPI server via uvicorn.
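+# app.py invokes uvicorn.run() itself, so running it with the Python
+# interpreter is sufficient. An equivalent alternative (same module and app
+# object, just started through the uvicorn CLI) would be:
+#   CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]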
+CMD ["python", "app.py"]
diff --git a/AI/model-inference-sklearn/image/app.py b/AI/model-inference-sklearn/image/app.py
new file mode 100644
index 000000000..5359783a4
--- /dev/null
+++ b/AI/model-inference-sklearn/image/app.py
@@ -0,0 +1,156 @@
+"""
+FastAPI inference server for a scikit-learn Iris classification model.
+
+This lightweight API loads a pre-trained model at startup and exposes
+a /predict endpoint that accepts feature vectors and returns predicted
+class labels with probabilities.
+
+Endpoints:
+    GET  /healthz  - Liveness probe (always returns 200).
+    GET  /readyz   - Readiness probe (returns 200 once the model is loaded).
+    POST /predict  - Accepts feature vectors and returns predictions.
+"""
+
+import logging
+import os
+from contextlib import asynccontextmanager
+from typing import List
+
+import joblib
+import numpy as np
+from fastapi import FastAPI, HTTPException
+from pydantic import BaseModel, Field
+
+# ---------------------------------------------------------------------------
+# Configuration
+# ---------------------------------------------------------------------------
+MODEL_PATH = os.environ.get("MODEL_PATH", "/model/iris_model.joblib")
+PORT = int(os.environ.get("PORT", "8080"))
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger("inference-server")
+
+# ---------------------------------------------------------------------------
+# Application state
+# ---------------------------------------------------------------------------
+model = None
+class_names: List[str] = []
+
+
+@asynccontextmanager
+async def lifespan(application: FastAPI):
+    """Load the model once at startup and release on shutdown."""
+    global model, class_names
+    logger.info("Loading model from %s ...", MODEL_PATH)
+    try:
+        model = joblib.load(MODEL_PATH)
+        # Iris class names corresponding to label indices 0, 1, 2.
+        class_names = ["setosa", "versicolor", "virginica"]
+        logger.info("Model loaded successfully.")
+    except Exception as exc:
+        logger.error("Failed to load model: %s", exc)
+        raise
+    yield
+    logger.info("Shutting down inference server.")
+
+
+app = FastAPI(
+    title="Scikit-Learn Inference Server",
+    version="1.0.0",
+    lifespan=lifespan,
+)
+
+# ---------------------------------------------------------------------------
+# Request / response schemas
+# ---------------------------------------------------------------------------
+
+
+class PredictRequest(BaseModel):
+    """Input payload for the /predict endpoint.
+
+    Each entry in *instances* is a list of four numerical features
+    corresponding to sepal length, sepal width, petal length, and petal width.
+    """
+
+    instances: List[List[float]] = Field(
+        ...,
+        min_length=1,
+        json_schema_extra={
+            "example": [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
+        },
+    )
+
+
+class Prediction(BaseModel):
+    label: str
+    probability: float
+
+
+class PredictResponse(BaseModel):
+    predictions: List[Prediction]
+
+
+# ---------------------------------------------------------------------------
+# Health endpoints (used by Kubernetes probes)
+# ---------------------------------------------------------------------------
+
+
+@app.get("/healthz", status_code=200)
+def liveness():
+    """Liveness probe -- the process is alive."""
+    return {"status": "alive"}
+
+
+@app.get("/readyz", status_code=200)
+def readiness():
+    """Readiness probe -- the model is loaded and ready to serve."""
+    if model is None:
+        raise HTTPException(status_code=503, detail="Model not loaded yet")
+    return {"status": "ready"}
+
+
+# ---------------------------------------------------------------------------
+# Prediction endpoint
+# ---------------------------------------------------------------------------
+
+
+@app.post("/predict", response_model=PredictResponse)
+def predict(request: PredictRequest):
+    """Return class predictions and confidence for each input instance."""
+    if model is None:
+        raise HTTPException(status_code=503, detail="Model not loaded yet")
+
+    try:
+        data = np.array(request.instances)
+        if data.ndim != 2 or data.shape[1] != 4:
+            raise HTTPException(
+                status_code=422,
+                detail="Each instance must have exactly 4 features.",
+            )
+
+        predictions = model.predict(data)
+        probabilities = model.predict_proba(data)
+
+        results = []
+        for idx, label_idx in enumerate(predictions):
+            results.append(
+                Prediction(
+                    label=class_names[label_idx],
+                    probability=round(float(probabilities[idx][label_idx]), 4),
+                )
+            )
+        return PredictResponse(predictions=results)
+    except HTTPException:
+        raise
+    except Exception as exc:
+        logger.exception("Prediction failed")
+        raise HTTPException(status_code=500, detail=str(exc))
+
+
+# ---------------------------------------------------------------------------
+# Entrypoint (used by the container CMD)
+# ---------------------------------------------------------------------------
+if __name__ == "__main__":
+    import uvicorn
+
+    uvicorn.run(app, host="0.0.0.0", port=PORT)
diff --git a/AI/model-inference-sklearn/image/requirements.txt b/AI/model-inference-sklearn/image/requirements.txt
new file mode 100644
index 000000000..192b140d4
--- /dev/null
+++ b/AI/model-inference-sklearn/image/requirements.txt
@@ -0,0 +1,6 @@
+fastapi==0.115.12
+uvicorn==0.34.2
+scikit-learn==1.6.1
+joblib==1.4.2
+numpy==2.2.3
+pydantic==2.10.6
diff --git a/AI/model-inference-sklearn/image/train_model.py b/AI/model-inference-sklearn/image/train_model.py
new file mode 100644
index 000000000..0fdb67bdd
--- /dev/null
+++ b/AI/model-inference-sklearn/image/train_model.py
@@ -0,0 +1,45 @@
+"""
+Train a simple scikit-learn Random Forest classifier on the Iris dataset
+and persist it as a joblib file.
+
+The resulting model file (iris_model.joblib) is embedded in the container
+image so the inference server can load it at startup without any external
+storage dependency.
+
+Usage:
+    python train_model.py
+"""
+
+import joblib
+from sklearn.datasets import load_iris
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import train_test_split
+
+# ------------------------------------------------------------------
+# 1. Load the built-in Iris dataset
+# ------------------------------------------------------------------
+iris = load_iris()
+X, y = iris.data, iris.target  # 4 features, 3 classes
+
+# ------------------------------------------------------------------
+# 2. Split into training and test sets
+# ------------------------------------------------------------------
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y, test_size=0.2, random_state=42
+)
+
+# ------------------------------------------------------------------
+# 3. Train a lightweight Random Forest classifier
+# ------------------------------------------------------------------
+clf = RandomForestClassifier(n_estimators=50, random_state=42)
+clf.fit(X_train, y_train)
+
+accuracy = clf.score(X_test, y_test)
+print(f"Test accuracy: {accuracy:.4f}")
+
+# ------------------------------------------------------------------
+# 4. Persist the trained model
+# ------------------------------------------------------------------
+model_path = "iris_model.joblib"
+joblib.dump(clf, model_path)
+print(f"Model saved to {model_path}")
diff --git a/AI/model-inference-sklearn/pdb.yaml b/AI/model-inference-sklearn/pdb.yaml
new file mode 100644
index 000000000..a8a0ce4b5
--- /dev/null
+++ b/AI/model-inference-sklearn/pdb.yaml
@@ -0,0 +1,14 @@
+# PodDisruptionBudget ensures that at least one replica is always available
+# during voluntary disruptions (e.g., node drains, cluster upgrades).
+#
+# For more on PDBs see:
+# https://kubernetes.io/docs/tasks/run-application/configure-pdb/
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: sklearn-inference-pdb
+spec:
+  minAvailable: 1
+  selector:
+    matchLabels:
+      app: sklearn-inference
diff --git a/AI/model-inference-sklearn/service.yaml b/AI/model-inference-sklearn/service.yaml
new file mode 100644
index 000000000..96770a453
--- /dev/null
+++ b/AI/model-inference-sklearn/service.yaml
@@ -0,0 +1,23 @@
+# Service that exposes the scikit-learn inference Deployment inside the
+# cluster on port 80 (mapped to container port 8080).
+#
+# To access the service from outside the cluster you can use:
+#   kubectl port-forward service/sklearn-inference 8080:80
+#
+# For more on Services see:
+# https://kubernetes.io/docs/concepts/services-networking/service/
+apiVersion: v1
+kind: Service
+metadata:
+  name: sklearn-inference
+  labels:
+    app: sklearn-inference
+spec:
+  selector:
+    app: sklearn-inference
+  type: ClusterIP
+  ports:
+  - name: http
+    protocol: TCP
+    port: 80
+    targetPort: http  # refers to the named port on the container
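+# A quick way to confirm the Service has matched the Deployment's pods
+# (standard kubectl commands; no changes to this manifest are required):
+#   kubectl get endpoints sklearn-inference
+#   kubectl describe service sklearn-inference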