KServe - Production ML Serving on Kubernetes, from sklearn to LLMs

Sun, 03 May 2026 00:00:00 +0000

In November 2025, KServe joined the CNCF as an incubating project. For a project that started life in 2019 as KFServing inside Kubeflow, this was formal recognition that ML model serving on Kubernetes had matured. KServe is now the closest thing the cloud-native ecosystem has to a standard for putting trained models behind an API: scikit-learn, XGBoost, PyTorch, TensorFlow, ONNX, Triton, and increasingly large language models are all served through the same InferenceService CRD, with the same scale-to-zero, autoscaling, traffic-splitting, and observability primitives.
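To make the "same InferenceService CRD" point concrete, here is a minimal sketch in Python that builds the manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The helper name `inference_service`, the service names, and the `gs://example-bucket/...` storage URIs are hypothetical and chosen only for illustration. The sketch assumes the `serving.kserve.io/v1beta1` API, and it assumes a deployment mode in which `minReplicas: 0` results in scale-to-zero.

```python
import json


def inference_service(name, model_format, storage_uri,
                      min_replicas=0, canary_percent=None):
    """Build an InferenceService manifest as a dict (hypothetical helper)."""
    predictor = {
        # Assumption: 0 lets the predictor scale to zero when idle.
        "minReplicas": min_replicas,
        "model": {
            # Only this value changes between frameworks.
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        },
    }
    if canary_percent is not None:
        # Share of traffic routed to the newest revision during a rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }


# Two different frameworks, one resource shape. The URIs are placeholders.
sklearn_svc = inference_service(
    "sklearn-demo", "sklearn", "gs://example-bucket/models/sklearn-demo")
xgboost_svc = inference_service(
    "xgboost-demo", "xgboost", "gs://example-bucket/models/xgboost-demo",
    canary_percent=10)

if __name__ == "__main__":
    print(json.dumps(sklearn_svc, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` would be the usual next step. The two services differ only in `modelFormat.name`, the storage location, and the optional canary percentage.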

Genai on SREKubeCraft | Nikos Nikolakakis
