## Prerequisites

- WSL 2 (Windows Subsystem for Linux) → install from the Microsoft Store: https://apps.microsoft.com/detail/9p9tqf7mrm4r?hl=en-US&gl=TW
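If you prefer the command line, recent Windows builds can usually install WSL 2 from an elevated PowerShell prompt instead of the Store:

```powershell
wsl --install
```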
## CPU-optimized inference (ONNX Runtime)

- Model: MobileNetV3-Small (ImageNet)
- Hardware: Colab CPU
- Latency (avg): ~4.23 ms/img
- Throughput: ~236.27 FPS
- Export: PyTorch → ONNX (opset 13, dynamic batch), ORT graph optimizations ON (see the sketch below)
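The export step can be reproduced with a short script along these lines. This is a sketch only: the output path `model.onnx`, the tensor names, and the 224x224 dummy input are illustrative assumptions, not values taken from this repo.

```python
import torch
import torchvision
import onnxruntime as ort

# Pretrained MobileNetV3-Small in inference mode.
model = torchvision.models.mobilenet_v3_small(weights="IMAGENET1K_V1").eval()

# Export to ONNX with opset 13 and a dynamic batch dimension.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "model.onnx",  # illustrative output path
    opset_version=13,
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)

# CPU session with ONNX Runtime graph optimizations turned on.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session = ort.InferenceSession("model.onnx", sess_options=opts,
                               providers=["CPUExecutionProvider"])
```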
Minimal FastAPI service for ONNX inference on CPU (MobileNetV3-Small example).
Build and run:

```bash
docker build -t onnx-fastapi:cpu .
docker run --rm -p 8000:8000 onnx-fastapi:cpu
# open http://localhost:8000/docs
```

Health check:

```bash
curl http://localhost:8000/health
# {"status":"ok"}
```
Prediction request:

```bash
# Replace with your image path
curl -X POST "http://localhost:8000/predict" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@samples/cat.jpg"
```

Example response:

```json
{
  "top1": {"label": "tabby_cat", "prob": 0.87},
  "top5": [
    {"label": "tabby_cat", "prob": 0.87},
    {"label": "tiger_cat", "prob": 0.07},
    {"label": "Egyptian_cat", "prob": 0.03},
    {"label": "lynx", "prob": 0.02},
    {"label": "cougar", "prob": 0.01}
  ]
}
```
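The same request can also be sent from Python, e.g. with `requests` (a small client sketch reusing the `samples/cat.jpg` path from the cURL example above):

```python
import requests

# Post an image to the /predict endpoint and print the top-1 result.
with open("samples/cat.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/predict",
        files={"file": ("cat.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()
print(resp.json()["top1"])
```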
## Architecture

```mermaid
flowchart LR
    A[Client / cURL / App] -->|HTTP: /predict| B[FastAPI Service]
    B -->|NumPy tensors| C[ONNX Runtime]
    C -->|CPU| D[(model.onnx)]
    subgraph DockerContainer["Docker Container"]
        B
        C
        D
    end
    E[(Host FS)]
    D -->|"bind-mount (optional)"| E
```
- FastAPI exposes /health and /predict.
- ONNX Runtime (CPU) runs inference; no GPU required.
- The model can be baked into the image or bind-mounted at runtime for quick swaps (see the run example below).
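To swap models without rebuilding, the container can be started with a bind mount along these lines (the in-container path /app/model.onnx is an assumption about the image layout):

```bash
docker run --rm -p 8000:8000 \
  -v "$(pwd)/model.onnx:/app/model.onnx:ro" \
  onnx-fastapi:cpu
```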