587: Add minimal AI model inference example (scikit-learn + FastAPI) #591

Open
NIRANJAN0125 wants to merge 1 commit into kubernetes:master from NIRANJAN0125:niranjan0125_587_ai_example

@NIRANJAN0125

Closes #587

What This PR Adds

A new example under AI/model-inference-sklearn/ demonstrating how to deploy
a lightweight AI/ML model for real-time inference on Kubernetes -- without GPUs,
specialized hardware, or heavy ML platforms.

This covers the core deployment pattern requested in #587:

  • Lightweight trained model: scikit-learn Random Forest classifier on the
    Iris dataset, trained during the container build and baked into the image.
  • Small inference API: FastAPI server with /predict, /healthz, and
    /readyz endpoints.
  • Kubernetes manifests: Deployment, Service, and PodDisruptionBudget.
  • Basic best practices: resource requests/limits, liveness and readiness
    probes, non-root user, read-only root filesystem, seccomp profile,
    capabilities dropped, topology spread constraints.
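
For readers who want a feel for the "trained during the build, baked into the image" step, here is a minimal sketch of what a training script of this kind might look like. It is not the `train_model.py` from this PR: the function names, the `joblib` file format, the 80/20 split, and the hyperparameters are all assumptions made for illustration.

```python
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_and_save(model_path: Path) -> float:
    """Train a Random Forest on Iris, save it, and return held-out accuracy."""
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
    )
    # Hyperparameters are illustrative assumptions, not taken from the PR.
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X_train, y_train)
    # Serialize the fitted model so a later build stage can copy the file.
    joblib.dump(model, model_path)
    return float(model.score(X_test, y_test))


def load_and_predict(model_path: Path, features: list[float]) -> dict:
    """Reload the saved model and classify one four-feature sample."""
    model = joblib.load(model_path)
    probabilities = model.predict_proba([features])[0]
    names = [str(name) for name in load_iris().target_names]
    best = int(probabilities.argmax())
    return {
        "label": names[best],
        "probabilities": {n: float(p) for n, p in zip(names, probabilities)},
    }
```

In a multi-stage build, the first stage would call something like `train_and_save(Path("model.joblib"))` and the second stage would copy only the resulting file, so the serving image does not need the training code.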

Files

AI/model-inference-sklearn/
├── README.md              # Full walkthrough with prerequisites, quick start,
│                          # detailed steps, verification, cleanup, and
│                          # troubleshooting
├── deployment.yaml        # Deployment with probes, resource limits, security
│                          # context, topology spreading, and /tmp emptyDir
├── service.yaml           # ClusterIP Service
├── pdb.yaml               # PodDisruptionBudget (minAvailable: 1)
└── image/
    ├── Dockerfile         # Multi-stage: train in stage 1, serve in stage 2
    ├── app.py             # FastAPI inference server
    ├── train_model.py     # Trains and saves the scikit-learn model
    └── requirements.txt   # Pinned Python dependencies

Also updates AI/README.md to add an "Available Examples" table listing
this example alongside the existing TensorFlow Serving and vLLM examples.
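
To make the manifest settings listed above concrete, the sketch below builds an equivalent Deployment and PodDisruptionBudget as plain Python dictionaries. It is an illustration only and may differ from the YAML in this PR: the names, labels, image, port 8000, UID 10001, replica count, and resource figures are hypothetical. The field names themselves are standard Kubernetes API fields.

```python
import json

APP_LABELS = {"app": "sklearn-inference"}  # hypothetical label


def build_deployment(image: str, port: int = 8000, uid: int = 10001) -> dict:
    """Return a Deployment with probes, limits, and a hardened security context."""
    container = {
        "name": "inference",
        "image": image,
        "ports": [{"containerPort": port}],
        "resources": {  # illustrative figures
            "requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"},
        },
        "livenessProbe": {"httpGet": {"path": "/healthz", "port": port}},
        "readinessProbe": {"httpGet": {"path": "/readyz", "port": port}},
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
        # A writable /tmp is still needed when the root filesystem is read-only.
        "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}],
    }
    pod_spec = {
        "securityContext": {
            "runAsNonRoot": True,
            "runAsUser": uid,
            "seccompProfile": {"type": "RuntimeDefault"},
        },
        "topologySpreadConstraints": [{
            "maxSkew": 1,
            "topologyKey": "kubernetes.io/hostname",
            "whenUnsatisfiable": "ScheduleAnyway",
            "labelSelector": {"matchLabels": APP_LABELS},
        }],
        "containers": [container],
        "volumes": [{"name": "tmp", "emptyDir": {}}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "sklearn-inference", "labels": APP_LABELS},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": APP_LABELS},
            "template": {"metadata": {"labels": APP_LABELS}, "spec": pod_spec},
        },
    }


def build_pdb() -> dict:
    """Return a PodDisruptionBudget that keeps at least one pod available."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": "sklearn-inference"},
        "spec": {"minAvailable": 1, "selector": {"matchLabels": APP_LABELS}},
    }
```

Because `kubectl apply -f -` accepts JSON as well as YAML, the output of `json.dumps(build_deployment("<your-image>"))` can be piped to it directly when experimenting.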

Verified Locally

  • train_model.py produces a valid model (test accuracy: 1.0).
  • FastAPI server starts, loads the model, and serves all endpoints correctly.
  • /healthz returns 200, /readyz returns 200 after model load.
  • /predict returns correct class labels and probabilities.
  • Error cases (wrong feature count, empty input, invalid JSON) all return
    appropriate 422 responses.
  • All YAML manifests parse as valid Kubernetes resources.
  • Cross-file consistency verified: labels, ports, UIDs, probe paths, and
    model paths all match across Deployment, Service, PDB, Dockerfile, and
    application code.

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Feb 8, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: NIRANJAN0125
Once this PR has been reviewed and has the lgtm label, please assign soltysh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @NIRANJAN0125!

It looks like this is your first PR to kubernetes/examples 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/examples has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) on Feb 8, 2026
@Champbreed

Hi @NIRANJAN0125, thanks for this PR.

One thought for the 'Further Reading' section: since baking the model into the image works so well here for the Iris dataset, it might be worth adding a small note about using init containers or PersistentVolumes for users who plan to scale this pattern to multi-gigabyte models. It would help them move from this example to production-scale AI.
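
As a rough illustration of the init container idea, a small script like the one below could run before the serving container starts and place the model on a shared volume (an emptyDir or a PersistentVolume mount). Everything here is hypothetical: the function name, the checksum check, and the assumption that the model is reachable at a plain URL.

```python
import hashlib
import os
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_model(url: str, dest: Path, expected_sha256: Optional[str] = None,
                chunk_size: int = 1 << 20) -> Path:
    """Stream a model file to dest, verifying an optional SHA-256 digest."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    # Write to a temporary file on the same volume so the final rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
                out.write(chunk)
        if expected_sha256 is not None and digest.hexdigest() != expected_sha256:
            raise ValueError("model checksum mismatch")
        # The serving container only ever sees a complete, verified file.
        os.replace(tmp_name, dest)
    finally:
        # Remove the partial download if anything above failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return dest
```

Streaming in chunks keeps memory use flat for multi-gigabyte files, and the rename at the end means a crashed download cannot leave a truncated model where the server expects a valid one.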

Development

Successfully merging this pull request may close these issues.

Feature Request: Minimal AI model inference example for Kubernetes