Optimizing Kubernetes Autoscaling with Karpenter

Autoscaling is a fundamental feature of Kubernetes, ensuring that workloads dynamically receive the compute resources they need. Traditionally, Kubernetes provides the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to scale applications based on CPU and memory metrics. However, these scale pods, not nodes: as replica counts and resource requests grow, the cluster still needs a robust node-level autoscaler to provision the underlying capacity.

Cluster Autoscaler (CA) solutions are widely used across cloud providers like AWS, Azure, and GCP. However, they come with significant drawbacks, including slow provisioning times (up to 10 minutes) and rigid instance type management. Karpenter is an open-source autoscaler designed to overcome these limitations by dynamically provisioning nodes and optimizing cluster efficiency in real time.

Challenges of Traditional Cluster Autoscalers

When workloads scale dynamically, cluster autoscalers often struggle with the following challenges:

  • Fixed Node Pool Constraints: Managed node groups require predefined instance types, limiting flexibility.
  • Slow Scaling: Cluster Autoscaler follows a multi-step API process, significantly delaying new node provisioning.
  • Inefficient Resource Utilization: CA provisions nodes based on fixed scaling groups, often leading to underutilized resources.
  • Lack of Cost Optimization: Scaling is often inefficient, leading to unnecessary costs.

What is Karpenter?

Karpenter is a next-generation Kubernetes autoscaler that efficiently provisions and deprovisions nodes directly through the cloud provider’s API, eliminating the need for managed node groups. Key advantages of Karpenter include:

  • Faster Scaling: Bypasses the traditional node group API calls, reducing provisioning time.
  • Dynamic Instance Selection: Automatically provisions the optimal instance type based on workload requirements.
  • Improved Cost Efficiency: Uses consolidation strategies to optimize node utilization and reduce waste.
  • Multi-Cloud Support: Supports AWS, Azure, Alibaba Cloud, and Cluster API-based providers.

How Karpenter Works

Karpenter continuously monitors unschedulable pods and provisions nodes that meet their constraints. It operates in four key phases:

  1. Monitoring: Watches for pending pods that cannot be scheduled due to resource constraints.
  2. Evaluation: Matches pod requirements with available instance types, taking into account CPU, memory, affinity, and taints.
  3. Provisioning: Launches the most efficient compute resources directly through the cloud provider’s API.
  4. Deprovisioning: Detects underutilized nodes and consolidates workloads to reduce unnecessary compute usage.
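The evaluation phase is essentially a constrained selection problem: from the instance types that satisfy the pending pods' aggregate requirements, pick the cheapest. A simplified sketch of that logic follows (the instance specs and prices are illustrative, not live AWS pricing, and real Karpenter also weighs zones, taints, and spot pools):

```python
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    cpu: float            # vCPUs
    memory_gib: float     # GiB of memory
    hourly_price: float   # USD/hour, illustrative numbers only

def pick_instance(pending_cpu, pending_mem_gib, candidates):
    """Return the cheapest instance type that fits the aggregate pending requests."""
    fitting = [i for i in candidates
               if i.cpu >= pending_cpu and i.memory_gib >= pending_mem_gib]
    if not fitting:
        return None  # nothing satisfies the constraints -> pods stay Pending
    return min(fitting, key=lambda i: i.hourly_price)

# Catalog limited to the instance types used in the NodePool example below
catalog = [
    InstanceType("m5.4xlarge", 16, 64, 0.768),
    InstanceType("c5.4xlarge", 16, 32, 0.680),
    InstanceType("r5.2xlarge", 8, 64, 0.504),
]

# Three pending pods, each requesting 2 vCPU / 4 GiB
choice = pick_instance(6.0, 12.0, catalog)
print(choice.name)  # r5.2xlarge -- cheapest type that fits
```

The same greedy idea extends to deprovisioning: if a node's pods would fit on cheaper or fewer nodes, it becomes a consolidation candidate.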

Core Karpenter Components

Karpenter introduces new Custom Resource Definitions (CRDs) to manage autoscaling:

  • Node Class: Defines cloud-specific node configuration (e.g., EC2NodeClass on AWS, AKSNodeClass on Azure).
  • Node Pool: Defines constraints for provisioning and scaling nodes dynamically.
  • Node Claim: Created dynamically by Karpenter when new nodes are required, based on workload demand.
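On AWS, the Node Class is expressed as an EC2NodeClass. A minimal sketch follows; the IAM role name and the karpenter.sh/discovery tag values are placeholders you would replace with your own:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: ec2nodeclass-apps
spec:
  amiSelectorTerms:
    - alias: al2023@latest                    # latest Amazon Linux 2023 AMI
  role: "KarpenterNodeRole-my-cluster"        # placeholder IAM role for nodes
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"  # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
```

The Node Pool in the deployment section below references this class by name through its nodeClassRef.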

Deploying Karpenter

1. Install Karpenter

# Log out of the public ECR registry to avoid authentication errors when pulling the chart anonymously
helm registry logout public.ecr.aws

# Install Karpenter using Helm with the provided values.yaml file
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --create-namespace \
  --values values.yaml \
  --wait
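The chart is driven by the values.yaml referenced above, which is not shown in this post. A minimal illustrative version for EKS might look like the following; the cluster name and IAM role ARN are placeholders, and the annotation assumes IRSA is configured:

```yaml
# values.yaml -- minimal illustrative Helm values for Karpenter on EKS
settings:
  clusterName: my-cluster   # placeholder
serviceAccount:
  annotations:
    # placeholder controller role ARN (IRSA)
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/KarpenterControllerRole
controller:
  resources:
    requests:
      cpu: "1"
      memory: 1Gi
```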

2. Configure a Node Pool

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: apps
spec:
  template:
    metadata:
      labels:
        app-type: default
    spec:
      nodeClassRef:
        name: ec2nodeclass-apps
        kind: EC2NodeClass
        group: karpenter.k8s.aws
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values:
            - m5.4xlarge
            - c5.4xlarge
            - r5.2xlarge
        - key: "karpenter.sh/capacity-type"
          operator: In
          values:
            - on-demand
        - key: "kubernetes.io/arch"
          operator: In
          values:
            - amd64
      expireAfter: Never        # in karpenter.sh/v1, expireAfter lives under template.spec
  limits:
    cpu: "32"
    memory: "128Gi"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

3. Deploy Workloads

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      nodeSelector:
        app-type: default
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "node.kubernetes.io/instance-type"
                    operator: In
                    values:
                      - m5.4xlarge
                      - c5.4xlarge
                      - r5.2xlarge
                  - key: "karpenter.sh/capacity-type"
                    operator: In
                    values:
                      - on-demand
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: example-app
              topologyKey: "kubernetes.io/hostname"
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: example-app
                topologyKey: "topology.kubernetes.io/zone"
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "apps"
          effect: "NoSchedule"
      containers:
        - name: example-container
          image: nginx
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"

Observability in Karpenter

Karpenter provides observability through metrics endpoints, allowing you to monitor node provisioning efficiency. By integrating with Prometheus and Grafana, you can track key metrics such as:

  • Pending pods waiting for scheduling
  • Node provisioning times
  • Resource utilization across nodes
  • Consolidation efficiency
Pending pods waiting for scheduling:

sum(kube_pod_status_phase{phase="Pending"}) by (namespace)

Node provisioning time (p95 over the last 5 minutes):

histogram_quantile(0.95, rate(karpenter_provisioning_duration_seconds_bucket[5m]))

Per-node CPU utilization (%):

sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (node)
/
sum(kube_node_status_capacity{resource="cpu"}) by (node) * 100

Cluster-wide memory utilization (%):

(1 - sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes)) * 100

Nodes removed by consolidation (last 5 minutes):

sum(increase(karpenter_nodes_deleted_total[5m]))
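These expressions can also be evaluated programmatically through Prometheus's HTTP API (GET /api/v1/query). A small standard-library sketch; the localhost:9090 address is an assumption, so point it at your own Prometheus endpoint:

```python
import json
import urllib.parse
import urllib.request

def prom_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for Prometheus's HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})

def run_query(base_url: str, expr: str):
    """Execute the query; requires a reachable Prometheus server."""
    with urllib.request.urlopen(prom_query_url(base_url, expr)) as resp:
        return json.load(resp)["data"]["result"]

pending = 'sum(kube_pod_status_phase{phase="Pending"}) by (namespace)'
print(prom_query_url("http://localhost:9090", pending))
```

Wiring this into an alerting rule or a dashboard panel is usually preferable in production; the raw API is handy for ad-hoc checks and scripts.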

Karpenter vs Cluster Autoscaler

| Feature | Karpenter | Cluster Autoscaler |
| --- | --- | --- |
| Provisioning time | Seconds (direct API calls) | Minutes (multi-layer API) |
| Node type flexibility | Dynamic & adaptive | Predefined node pools |
| Cost optimization | Yes (Spot instances, consolidation) | Limited |
| Multi-cloud support | AWS, Azure, Alibaba Cloud, Cluster API | AWS, GCP, Azure |
| Node consolidation | Yes | No |

Advanced Use Cases for SREs

Karpenter provides additional capabilities that are useful for Site Reliability Engineers (SREs) managing large-scale infrastructure:

  • Custom Node Expiration Policies: Automatically recycle nodes based on security requirements.
  • Drift Detection and Automated Remediation: Detect configuration drifts and replace outdated nodes.
  • Spot Instance Optimization: Configure graceful fallback to on-demand instances when spot capacity is unavailable.
  • Multi-Region Failover Strategies: Define region-based affinity rules to optimize failover mechanisms.
  • Custom Scheduling Algorithms: Use Karpenter’s flexible configuration to optimize for latency, cost, or performance.
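As a concrete example of spot optimization, fallback can be expressed directly in a Node Pool's requirements: when both capacity types are listed, Karpenter prefers spot and falls back to on-demand when spot capacity is unavailable. A sketch, reusing the ec2nodeclass-apps class referenced earlier:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-first
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: ec2nodeclass-apps
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values:
            - spot        # preferred when spot capacity is available
            - on-demand   # automatic fallback
```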

Demo: Hands-on with Karpenter

To help you get started with Karpenter, I have prepared a demo repository that showcases a complete setup, including:

  • IaC (via OpenTofu) configuration to provision an EKS cluster.
  • Helm values for Karpenter installation with optimized settings.
  • Node Pool and EC2 Node Class configurations for efficient autoscaling.
  • Example workloads to test Karpenter’s dynamic provisioning.

GitHub Repository 🔗 Karpenter Demo Repository

Conclusion

Karpenter is a game-changer for Kubernetes autoscaling, addressing the shortcomings of traditional Cluster Autoscaler solutions. With its direct API calls, faster provisioning, dynamic instance selection, and intelligent workload consolidation, Karpenter enhances both performance and cost efficiency in cloud-native environments.

For SRE teams looking to improve cluster efficiency, reduce provisioning times, and optimize compute costs, Karpenter is a powerful alternative to traditional scaling solutions.


This post is licensed under CC BY 4.0 by the author.