# Knative - The Platform Engineer’s Guide to Serverless on Kubernetes

*A guide to Knative for SREs and Platform Engineers: how Knative Serving, Eventing, and Functions bring serverless capabilities to any Kubernetes cluster with autoscaling, scale-to-zero, and event-driven architectures.*
Every platform team eventually faces the same tension: developers want the simplicity of serverless, but the organization needs the control and portability of Kubernetes. Cloud-specific solutions like AWS Lambda or Google Cloud Run solve the developer experience problem, but they lock you into a single vendor’s ecosystem. When you’re running workloads across multiple clouds or on-premises, that’s a non-starter.
Knative bridges this gap. It’s an open-source, Kubernetes-native platform that brings serverless capabilities (automatic scaling, scale-to-zero, event-driven architectures) to any Kubernetes cluster. No vendor lock-in, no proprietary runtimes, just standard OCI containers running on infrastructure you already manage.
In this post, I’ll break down Knative’s architecture, walk through its autoscaling mechanics, cover the eventing system, and share practical guidance for platform engineers evaluating it for production use.
## Who Should Read This?
This post is for:
- Platform Engineers building internal developer platforms who want to offer serverless abstractions on Kubernetes
- SREs managing multi-cluster environments looking for efficient resource utilization through scale-to-zero
- DevOps Teams evaluating serverless solutions without cloud vendor lock-in
- Teams running AI/ML workloads that need to scale GPU-backed inference services efficiently
If you’re tired of managing Deployments, Services, Ingress, and HPA configs for every microservice, read on.
## What is Knative?
Knative is a CNCF Incubating project that extends Kubernetes with a set of middleware components for building, deploying, and managing modern serverless workloads. It provides higher-level abstractions that handle the operational complexity of scaling, networking, and event routing automatically.
Knative consists of three core components that can be used independently or together:
| Component | Purpose | Key Feature |
|---|---|---|
| Serving | HTTP-triggered container runtime | Autoscaling from zero to thousands |
| Eventing | Asynchronous event routing layer | CloudEvents over HTTP, broker/trigger model |
| Functions | Developer-focused function framework | func CLI for building without Dockerfiles |
### Knative vs Cloud Serverless
| Aspect | AWS Lambda | Google Cloud Run | Knative |
|---|---|---|---|
| Runs On | AWS only | GCP only | Any Kubernetes cluster |
| Container Support | Limited (custom runtimes) | Full OCI containers | Full OCI containers |
| GPU Support | No | Limited | Yes (K8s native GPU scheduling) |
| Vendor Lock-in | High | High | None |
| Event System | EventBridge (proprietary) | Eventarc (proprietary) | CloudEvents (open standard) |
| Networking Control | None | Minimal | Full (Istio, Kourier, Contour) |
| Scale-to-Zero | Yes | Yes (with caveats) | Yes (KPA) |
| Cold Start Control | Limited | Limited | Configurable (minScale, stable window) |
Bottom line: If you need serverless on your own infrastructure, across clouds, or with GPU workloads, Knative is the only option that gives you full control without lock-in.
### What Knative is NOT
- Not a CI/CD pipeline: It doesn’t build or test your code (use Tekton, GitHub Actions, or ArgoCD for that).
- Not a service mesh: It uses networking layers like Istio or Kourier but doesn’t replace them.
- Not for running stateful applications: Your Knative Services must be stateless and HTTP-triggered. Kafka and RabbitMQ run as separate infrastructure backing the Eventing layer, not as Knative Services themselves.
- Not a FaaS replacement for all use cases: If your team is fully on AWS and Lambda works, don’t switch just for the sake of it.
## Knative Serving Architecture
Serving is the heart of Knative. It manages the complete lifecycle of stateless HTTP services with automatic scaling, traffic splitting, and revision management.
### Core CRDs

Knative Serving introduces four Custom Resource Definitions that work together:

```mermaid
flowchart TB
    subgraph Serving["Knative Serving CRDs"]
        Service["Service<br/><i>serving.knative.dev</i>"] -->|manages| Configuration["Configuration"]
        Service -->|manages| Route["Route"]
        Configuration -->|creates on each update| Revision["Revision<br/><i>(immutable snapshot)</i>"]
        Route -->|routes traffic to| Revision
    end
```
- Service: The top-level orchestrator. Every update creates a new Configuration and Route automatically.
- Configuration: Maintains the desired state, separating code from config. Any change creates a new Revision.
- Revision: An immutable, point-in-time snapshot of code and configuration. Revisions are what actually get scaled up and down.
- Route: Maps network endpoints to one or more Revisions, enabling traffic splitting for canary and blue-green deployments.
### Request Flow

Understanding how a request reaches your application is critical for debugging latency and scaling issues:

```mermaid
flowchart LR
    Client["Client"] --> Ingress["Ingress Layer<br/>(Kourier/Istio/Contour)"]
    Ingress -->|scaled to zero| Activator["Activator"]
    Ingress -->|running at scale| Pod
    Activator -->|triggers| Autoscaler["Autoscaler"]
    Activator -->|forwards when ready| Pod
    subgraph Pod["Application Pod"]
        QP["Queue-Proxy<br/>(sidecar)"] --> App["App Container"]
    end
```
- Ingress Layer: The request enters through a pluggable networking layer (Kourier, Istio, or Contour)
- Routing Decision: If the service is scaled to zero or at low traffic, the request goes to the Activator. If the service is running with spare capacity, traffic routes directly to pods
- Activator: Buffers requests during cold starts, signals the Autoscaler to spin up pods, and forwards queued requests once pods are ready
- Queue-Proxy: A sidecar in every pod that measures concurrency, reports metrics to the Autoscaler, enforces hard concurrency limits, and handles graceful shutdown
- Application Container: Your code processes the request
Why This Matters: The Activator bypass at high scale is a key design choice. It means Knative adds near-zero latency overhead for warm services: the Queue-Proxy sidecar is the only additional hop.
### The Activator and Queue-Proxy
These two components are what make Knative’s scaling magic possible:
| Component | Role | Key Responsibilities |
|---|---|---|
| Activator | Data-plane buffer | Queues requests for scaled-to-zero services, triggers autoscaler, acts as burst buffer |
| Queue-Proxy | Per-pod sidecar | Measures concurrency/RPS, enforces hard limits, health probing, graceful shutdown |
## Autoscaling Deep Dive
Knative’s autoscaling is its most powerful and nuanced feature. It goes far beyond standard Kubernetes HPA.
### KPA vs HPA
Knative supports two autoscaler implementations:
| Feature | KPA (Knative Pod Autoscaler) | HPA (Horizontal Pod Autoscaler) |
|---|---|---|
| Default | Yes | No |
| Scale-to-Zero | Yes | No |
| Metrics | Concurrency, RPS | CPU, Memory |
| Panic Mode | Yes | No |
| Best For | Request-based workloads | CPU-intensive workloads |
### Stable vs Panic Mode

The KPA operates in two modes to balance stability with responsiveness:

**Stable Mode (default):** Scaling decisions are based on an average of metrics over a 60-second window. This prevents churn from brief traffic spikes.

**Panic Mode:** When traffic exceeds 200% of current capacity, the KPA switches to a 6-second window for rapid scale-up. While in panic mode, the service will not scale down, to avoid losing capacity during a burst.

```mermaid
flowchart LR
    subgraph Stable["Stable Mode"]
        S1["60s averaging window"]
        S2["Gradual scale up/down"]
    end
    subgraph Panic["Panic Mode"]
        P1["6s averaging window"]
        P2["Rapid scale up only"]
        P3["No scale down"]
    end
    Stable -->|"traffic > 200% capacity"| Panic
    Panic -->|"stable for 60s"| Stable
```
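The scaling decision itself boils down to simple arithmetic: run enough pods that the average concurrency per pod sits at the target. The sketch below is a deliberately simplified model of that logic (the real KPA also handles panic windows, burst capacity, activator buffering, and min/max bounds); the numbers are hypothetical.

```python
import math

def desired_pods(observed_concurrency: float, target: float,
                 current_pods: int, panic_threshold: float = 2.0) -> int:
    """Simplified sketch of the KPA's core scaling decision.

    observed_concurrency: average in-flight requests across all pods,
    averaged over the stable (60s) or panic (6s) window.
    target: the soft per-pod concurrency target (default 100).
    """
    # Enough pods that each one sits at roughly the target.
    desired = math.ceil(observed_concurrency / target)

    # Panic check: demand exceeds 200% of what current pods can absorb.
    panicking = observed_concurrency >= panic_threshold * current_pods * target

    if panicking:
        # In panic mode the KPA only scales up, never down.
        desired = max(desired, current_pods)
    return desired

# 10 pods at target=100 absorb 1000 concurrent requests; 2500 observed
# exceeds 200% of that capacity, so panic mode scales straight to 25.
print(desired_pods(2500, 100, 10))  # 25
```

Note how the same function also explains scale-to-zero: with no in-flight requests the desired count drops to zero, and the Activator takes over routing.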
### Concurrency Targets

Knative uses two types of concurrency limits:

- **Soft Limit** (`target`): The average number of in-flight requests per pod the autoscaler targets. Can be briefly exceeded during bursts. Default: 100.
- **Hard Limit** (`containerConcurrency`): A strict upper bound enforced by the Queue-Proxy. Requests beyond this limit are queued. Use sparingly: low hard limits can cause unnecessary cold starts.
### Key Autoscaling Annotations

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    metadata:
      annotations:
        # Autoscaler class: kpa or hpa
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        # Metric: concurrency (default) or rps
        autoscaling.knative.dev/metric: "concurrency"
        # Soft target: average concurrent requests per pod
        autoscaling.knative.dev/target: "100"
        # Scale bounds
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "50"
        # Stable window duration
        autoscaling.knative.dev/window: "60s"
        # Panic threshold percentage
        autoscaling.knative.dev/panic-threshold-percentage: "200"
    spec:
      # Hard concurrency limit enforced by queue-proxy (0 = unlimited)
      containerConcurrency: 0
      containers:
        - image: gcr.io/my-project/my-app:v1
```
Tip: Place autoscaling annotations in the revision template metadata, not the top-level Service metadata. Annotations at the Service level will not propagate to new Revisions.
### Configuration Reference
| Parameter | Per-Revision Annotation | Global ConfigMap Key | Default |
|---|---|---|---|
| Autoscaler Class | autoscaling.knative.dev/class | pod-autoscaler-class | kpa.autoscaling.knative.dev |
| Scaling Metric | autoscaling.knative.dev/metric | - | concurrency |
| Target Value | autoscaling.knative.dev/target | container-concurrency-target-default | 100 |
| Stable Window | autoscaling.knative.dev/window | stable-window | 60s |
| Panic Window % | autoscaling.knative.dev/panic-window-percentage | panic-window-percentage | 10.0 |
| Panic Threshold % | autoscaling.knative.dev/panic-threshold-percentage | panic-threshold-percentage | 200.0 |
| Min Scale | autoscaling.knative.dev/min-scale | min-scale | 0 |
| Max Scale | autoscaling.knative.dev/max-scale | max-scale | 0 (unlimited) |
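The global defaults in the table live in the `config-autoscaler` ConfigMap in the `knative-serving` namespace. As a sketch, overriding two of the keys listed above might look like this (values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # Cluster-wide soft concurrency target (per-revision annotations win)
  container-concurrency-target-default: "50"
  # Shorter stable window for faster reactions cluster-wide
  stable-window: "30s"
```

Per-revision annotations always take precedence over these global keys, so treat the ConfigMap as the cluster baseline.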
## Knative Eventing
Eventing provides the asynchronous, event-driven side of Knative. It enables loose coupling between event producers and consumers using the CloudEvents specification.
### Architecture

```mermaid
flowchart LR
    subgraph Sources["Event Sources"]
        API["ApiServerSource"]
        Ping["PingSource"]
        Kafka["KafkaSource"]
        Custom["Custom Source"]
    end
    subgraph Routing["Event Routing"]
        Broker["Broker"]
        T1["Trigger<br/>(filter: type=order.created)"]
        T2["Trigger<br/>(filter: type=user.signup)"]
    end
    subgraph Consumers["Sinks"]
        Svc1["Order Service<br/>(Knative Service)"]
        Svc2["Welcome Service<br/>(K8s Service)"]
    end
    Sources -->|CloudEvents HTTP POST| Broker
    Broker --> T1
    Broker --> T2
    T1 --> Svc1
    T2 --> Svc2
```
### Core Components
| Component | Role | Description |
|---|---|---|
| Source | Event producer | Detects changes in external systems and generates CloudEvents |
| Broker | Event router | Receives events and forwards them based on Trigger filters |
| Trigger | Event filter | Defines which events a Broker delivers to a specific Sink |
| Sink | Event consumer | Processes received events (Knative Service, K8s Service, or HTTP endpoint) |
| Channel | Point-to-point delivery | Lower-level primitive for direct producer-to-consumer delivery |
| Subscription | Channel binding | Connects a producer to a consumer through a Channel |
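The table maps to a small amount of YAML in practice. A minimal Broker plus a Trigger that filters on the `order.created` type from the diagram might look like this sketch (the namespace and subscriber Service name are hypothetical):

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: demo
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: order-created
  namespace: demo
spec:
  broker: default
  filter:
    attributes:
      type: order.created   # only events with this CloudEvent type
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: order-service   # hypothetical Knative Service sink
```

Producers never know about consumers here: they POST CloudEvents to the Broker, and Triggers decide delivery.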
### Broker Implementations
| Broker | Use Case | Features |
|---|---|---|
| InMemory | Development/testing only | Simple, no external dependencies, not production-safe |
| Kafka | Production workloads | Ordered delivery, binary content mode, reduced network hops |
| RabbitMQ | Production workloads | Familiar AMQP model, good for existing RabbitMQ users |
| MT-Channel-based | Basic production | Multi-tenant, uses Channels for routing |
Note: The InMemory broker loses events on restart. Always use Kafka or RabbitMQ for production event-driven architectures.
### Installing Kafka for Knative Eventing
The Kafka integration provides three components: Kafka Broker, KafkaChannel, and KafkaSource. All require Knative Eventing to be installed first.
Prerequisites:
- A running Apache Kafka cluster (use Strimzi for Kubernetes-native deployment)
- Knative Eventing core already installed
Install the Kafka Broker:

```shell
# Install the Kafka controller
kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/latest/download/eventing-kafka-controller.yaml

# Install the Kafka Broker data plane
kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/latest/download/eventing-kafka-broker.yaml

# Verify components are running
kubectl get pods -n knative-eventing -l app=kafka-controller
kubectl get pods -n knative-eventing -l app=kafka-broker-receiver
kubectl get pods -n knative-eventing -l app=kafka-broker-dispatcher
```
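With the data plane running, a Broker is switched to the Kafka implementation via the broker class annotation. This sketch follows the upstream convention, where `kafka-broker-config` is the conventional ConfigMap holding bootstrap servers and topic settings (the Broker name and namespace are hypothetical):

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: demo
  annotations:
    # Selects the Kafka data plane instead of the channel-based default
    eventing.knative.dev/broker.class: Kafka
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: kafka-broker-config
    namespace: knative-eventing
```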
Install the KafkaSource:

```shell
# Install the Apache Kafka Source
kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/latest/download/eventing-kafka-source.yaml
```
Alternatively, if you use the Knative Operator, add the source to your KnativeEventing CR:

```yaml
spec:
  source:
    kafka:
      enabled: true
```
Or via the kn CLI:

```shell
kn operator install --source kafka
```
Key ConfigMaps for Kafka:

| ConfigMap | Purpose |
|---|---|
| `config-kafka-broker-data-plane` | Producer/consumer settings like commit intervals |
| `config-kafka-features` | Broker interaction with Kafka clusters |
| `kafka-channel` | KafkaChannel instance configuration |
| `kafka-config-logging` | Log levels for Kafka data plane components |
Tip: The Kafka Broker uses binary content mode by default, mapping CloudEvent attributes directly to Kafka record headers. This makes it compatible with systems that do not natively understand CloudEvents.
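To make the binary content mode concrete, here is a small hypothetical sketch of the attribute-to-header mapping defined by the CloudEvents Kafka protocol binding: each context attribute becomes a record header with a `ce_` prefix, while the event payload travels in the record value untouched. The attribute values below are made up for illustration.

```python
def to_kafka_headers(event_attrs: dict) -> list:
    """Binary content mode: each CloudEvent context attribute maps to a
    Kafka record header named 'ce_<attribute>'; the data stays in the
    record value, so plain Kafka consumers can read it directly."""
    return [("ce_" + key, value.encode("utf-8"))
            for key, value in event_attrs.items()]

attrs = {
    "specversion": "1.0",
    "type": "order.created",      # hypothetical event type
    "source": "/orders/service",  # hypothetical source URI
    "id": "a1b2c3",
}
print(to_kafka_headers(attrs))
```

This is why the tip above holds: a consumer that knows nothing about CloudEvents still sees a normal Kafka record, with the event metadata tucked into headers.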
### Installing RabbitMQ for Knative Eventing
The RabbitMQ integration provides the RabbitMQ Broker and RabbitMQSource for event routing and consumption.
Prerequisites:
- Knative Eventing core already installed
- cert-manager v1.5.4 or higher
- RabbitMQ Messaging Topology Operator
- A running RabbitMQ instance (use the RabbitMQ Cluster Kubernetes Operator for Kubernetes-native deployment)
Install the RabbitMQ Broker:

```shell
# Install the RabbitMQ controller for Knative
kubectl apply -f https://github.com/knative-extensions/eventing-rabbitmq/releases/latest/download/rabbitmq-broker.yaml

# Verify components are running
kubectl get pods -n knative-eventing -l app=rabbitmq-broker-controller
kubectl get pods -n knative-eventing -l app=rabbitmq-broker-webhook
```
Install the RabbitMQ Source:

Using the Knative Operator, add the source to your KnativeEventing CR:

```yaml
spec:
  source:
    rabbitmq:
      enabled: true
```
Or via the kn CLI:

```shell
kn operator install --source rabbitmq
```
### Choosing Between Kafka and RabbitMQ
| Aspect | Kafka Broker | RabbitMQ Broker |
|---|---|---|
| Ordering | Ordered delivery via partitioning | No guaranteed ordering |
| Throughput | Higher throughput, log-based | Lower throughput, queue-based |
| Content Mode | Binary (CloudEvent attrs in headers) | Structured (full CloudEvent in body) |
| Replay | Supports event replay from offset | No replay after consumption |
| Dependencies | Kafka cluster (Strimzi) | RabbitMQ + Topology Operator + cert-manager |
| Best For | High-volume event streaming, audit trails | Request/reply patterns, existing RabbitMQ users |
Bottom line: Choose Kafka for high-throughput, ordered event streaming with replay capabilities. Choose RabbitMQ if your team already runs RabbitMQ or you need simpler request/reply patterns.
## Converting a Kubernetes Deployment to Knative
One of Knative’s strengths is how much boilerplate it eliminates. Here’s what the conversion looks like.
### Suitability Check

Before converting, verify your workload meets these criteria:

- **HTTP-triggered**: All work must be initiated by HTTP requests
- **Stateless**: No local state; use external storage
- **Volumes**: Only Secret and ConfigMap volumes are supported
### Before: Traditional Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: gcr.io/my-project/my-app:v1
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```
### After: Knative Service

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containers:
        - image: gcr.io/my-project/my-app:v1
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
```
What changed:
- **3 resources → 1**: Deployment, Service, and Ingress are replaced by a single Knative Service
- **No replicas**: The autoscaler manages pod count based on traffic
- **No selectors or labels**: Knative handles pod identity and routing
- **No port definitions**: Knative automatically configures networking
- **Automatic TLS and DNS**: Configured at the platform level, not per-service
### Traffic Splitting for Canary Deployments

Knative makes canary rollouts trivial with the `traffic` block:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - image: gcr.io/my-project/my-app:v2
  traffic:
    - revisionName: my-app-v1
      percent: 90
    - latestRevision: true
      percent: 10
      tag: canary
```
This sends 90% of traffic to the stable revision and 10% to the new canary. The `tag: canary` entry creates a dedicated URL for the tagged revision (with the default domain template, something like `canary-my-app.<namespace>.example.com`) for testing the new version directly.
Using the kn CLI makes this even simpler:

```shell
# Update traffic split
kn service update my-app --traffic my-app-v1=90 --traffic @latest=10

# Roll forward to 100%
kn service update my-app --traffic @latest=100
```
## Installation

### Prerequisites

- Kubernetes cluster (supported version)
- `kubectl` with `cluster-admin` permissions
- `kn` CLI (recommended)
- Single-node: 6 CPUs, 6 GB RAM, 30 GB disk minimum
- Multi-node: 2 CPUs, 4 GB RAM, 20 GB disk per node
### Installation Methods
| Method | Best For | Complexity | Updates |
|---|---|---|---|
| YAML Manifests | GitOps (Flux/ArgoCD) | Higher | Manual |
| Knative Operator | Automated management | Medium | Automated via CRs |
| kn CLI Plugin | CLI-driven workflows | Lowest | CLI commands |
### Example: YAML Install with Kourier

```shell
# Install Serving CRDs and core
kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-core.yaml

# Install Kourier networking layer
kubectl apply -f https://github.com/knative/net-kourier/releases/latest/download/kourier.yaml

# Configure Knative to use Kourier
kubectl patch configmap/config-network \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'

# Install Eventing CRDs and core
kubectl apply -f https://github.com/knative/eventing/releases/latest/download/eventing-crds.yaml
kubectl apply -f https://github.com/knative/eventing/releases/latest/download/eventing-core.yaml

# Verify installation
kubectl get pods -n knative-serving
kubectl get pods -n knative-eventing
```
### Networking Options
| Option | Best For | Features |
|---|---|---|
| Kourier | Lightweight default | Simple HTTP routing, low resource overhead |
| Istio | Service mesh requirements | mTLS, authorization policies, observability |
| Contour | Multi-team delegation | Envoy-based, HTTPProxy resources |
### DNS Configuration

- **Magic DNS (sslip.io)**: Automatic DNS using the `default-domain` Job. Good for development.
- **Real DNS**: Configure a wildcard A/CNAME record pointing to your ingress IP, then update the `config-domain` ConfigMap.
- **No DNS**: Use `curl` with `Host` headers for evaluation.
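For the real-DNS path, the change is a single key in the `config-domain` ConfigMap: the key is your domain and the value selects which services use it (an empty value applies it to all). A sketch, substituting your own domain for `example.com`:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  # Empty value: use this domain for all Knative Services
  example.com: ""
```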
## Best Practices and Common Pitfalls

### Pitfalls to Avoid
| Pitfall | Problem | Solution |
|---|---|---|
| Editing `_example` key in ConfigMaps | Changes are ignored; webhook monitors checksum | Create new keys at the root level of the ConfigMap data |
| Annotations on Service metadata | Won’t propagate to Revisions | Place annotations in the revision template metadata |
| Low hard concurrency limits | Causes request buffering and unnecessary cold starts | Use soft limits (target) for most cases; only set hard limits when strictly necessary |
| Skipping minor versions on upgrade | Operator only supports last 3 minor releases | Upgrade one minor version at a time (1.18 → 1.19 → 1.20) |
| Using InMemory broker in production | Events lost on restart | Use Kafka or RabbitMQ brokers for production workloads |
### Production Best Practices

- **Use GitOps**: Store YAML manifests in Git with Flux or ArgoCD for auditability and version control.
- **Monitor with OpenTelemetry**: Knative integrates with Prometheus, Grafana, and Jaeger out of the box.
- **Set appropriate scale bounds**: Use `min-scale: "1"` for latency-sensitive services to avoid cold starts.
- **Choose Kourier unless you need Istio**: Kourier is lighter weight and sufficient for most use cases.
- **Validate ConfigMaps**: Use `kubectl apply --dry-run=server` before applying changes.
- **Test autoscaling behavior**: Load test to verify panic mode thresholds and scale-up times match your SLOs.
### Monitoring the Queue-Proxy

The Queue-Proxy sidecar is your window into Knative’s autoscaling decisions. It exposes metrics on:

- **Request concurrency**: In-flight requests per pod
- **Request count**: RPS per pod
- **Response latency**: Broken down by application vs infrastructure time
- **Queue depth**: Requests waiting to be processed
These metrics are reported to the Autoscaler and can be scraped by Prometheus for dashboards and alerting.
## When to Use Knative

### Use Knative When:
- You need serverless on Kubernetes without cloud vendor lock-in
- Your workloads are HTTP-triggered and stateless
- You want scale-to-zero to optimize resource costs
- You’re building event-driven architectures with CloudEvents
- You need to serve AI/ML inference with GPU scaling
- Your team operates multi-cloud or on-premises Kubernetes clusters
### Don’t Use Knative When:
- Your workloads are stateful (databases, message queues)
- You’re fully invested in a single cloud and Lambda/Cloud Run works fine
- Your team is small and managing additional CRDs adds operational overhead
- You need CPU/memory-based autoscaling exclusively (standard HPA is simpler)
- Your applications are long-running background workers, not request-driven
## Conclusion
Knative fills a critical gap in the Kubernetes ecosystem: it brings serverless developer experience to platform teams that need to maintain control over their infrastructure. The combination of Serving’s autoscaling (including scale-to-zero), Eventing’s CloudEvents-based routing, and Functions’ simplified development workflow creates a comprehensive serverless platform that runs anywhere Kubernetes does.
For platform engineers, the key takeaways are:
- **Start with Serving**: It delivers the most immediate value through automatic scaling and simplified deployments.
- **Use KPA over HPA** unless you specifically need CPU-based scaling: KPA’s scale-to-zero and panic mode are purpose-built for request-driven workloads.
- **Adopt Eventing incrementally**: Start with a Kafka or RabbitMQ broker and add sources as your event-driven patterns mature.
- **Choose your networking layer carefully**: Kourier for simplicity, Istio only if you need mTLS and advanced policies.
- **Invest in observability**: The Queue-Proxy metrics are essential for understanding autoscaling behavior and debugging latency issues.
Knative won’t replace every workload on your cluster, but for the HTTP-triggered, stateless services that make up the majority of modern microservices, it dramatically reduces operational complexity while giving your developers the serverless experience they want.
## Hands-On Demo Repository
I’ve built a complete demo that implements everything discussed in this post:
### What’s Included

| Directory | Contents |
|---|---|
| `app/` | Go HTTP server with `/`, `/burn` (CPU load), `/healthz`, and a CloudEvents receiver |
| `kubernetes/cluster/` | Kind config: 3 nodes, Cilium CNI, no kube-proxy |
| `kubernetes/serving/` | Knative Service manifests for all three Serving demos |
| `kubernetes/eventing/` | Broker, PingSource, ApiServerSource, Triggers |
| `kubernetes/flux/` | Flux Kustomizations for GitOps-based Knative install |
### Four Demos

| Demo | Task | What You’ll See |
|---|---|---|
| Serving basics | `task demo:serving` | Deploy a service, test it, watch it scale to zero after 60s of idle |
| Autoscaling | `task demo:autoscale` | KPA with target=5, generate load with `hey`, watch pods scale up |
| Canary traffic split | `task demo:canary` | Deploy v1, update to v2 with a 90/10 split, send requests to see the distribution |
| Eventing | `task demo:eventing` | Broker + PingSource + ApiServerSource routing events to a sink |
### Quick Start

The demo uses Taskfile for automation with two setup paths:

```shell
git clone https://github.com/nicknikolakakis/srekubecraft-demo.git
cd srekubecraft-demo/knative

# Direct install: Kind + Cilium + Knative via kubectl
task setup

# Run any demo
task demo:serving
task demo:autoscale
task demo:canary
task demo:eventing

# Cleanup
task clean
```
If you found this useful, you might also enjoy my related posts on Kubernetes scaling and platform engineering:
- Kubernetes Autoscaling - Beyond HPA
- Understanding Kubernetes Gateway API
- Kratix - Building Self-Service Platform Capabilities for Kubernetes