Karpenter has emerged as a game-changer for Kubernetes cluster management, offering dynamic node provisioning and cost optimization. It was originally created as a high-performance, flexible alternative to the Kubernetes Cluster Autoscaler, and over the past several years it has evolved into a more comprehensive, Kubernetes-native node lifecycle manager. With the release of Karpenter 1.0, organizations can leverage stable APIs and enhanced features to automate infrastructure scaling with greater precision.
This guide explores the origin story of Karpenter, key features driving adoption, the basics of how to upgrade to Karpenter 1.x, and best practices for maximizing value in production environments.
AWS announced Karpenter as production ready in November 2021, built as an open source, high-performance Kubernetes cluster autoscaler. Initially, the Fairwinds team deployed it as a custom solution for a specific client, but as the project matured, we rolled it out to our other customers. Karpenter has since been donated to the Cloud Native Computing Foundation (CNCF) through the Kubernetes Special Interest Group (SIG) on autoscaling, and is now an open source tool fully supported in Amazon EKS and Azure AKS, with support for other cloud providers planned in the future.
Early adopters faced some challenges with Karpenter, including rapid API changes during the alpha and beta phases that required frequent adjustments. In addition, paid alternatives like Spotinst Ocean and CAST.AI offered comparable savings, though they lacked Karpenter's Kubernetes-native integration. With v1.0, Karpenter's APIs are much more stable, which makes it a valuable standard add-on for all of Fairwinds' clients: it can deliver 30-50% Amazon Elastic Compute Cloud (Amazon EC2) cost reductions through intelligent consolidation and Spot instance utilization.
For those more familiar with Cluster Autoscaler, both provide automated scaling for Kubernetes clusters, but they differ significantly in architecture, flexibility, and operational efficiency. Cluster Autoscaler is a mature, widely adopted tool that operates at the node group level, adjusting the size of predefined node groups based on pending pods and user-specified scaling policies. This offers granular control and broad compatibility across multiple cloud providers and self-hosted environments, making it a good choice for organizations that require multi-cloud support and detailed placement strategies. However, Cluster Autoscaler can be more complex and slower to scale, because it relies on cloud provider abstractions, such as Auto Scaling Groups, and requires manual configuration of instance types and group sizes.
In contrast, Karpenter interacts directly with cloud provider APIs, dynamically provisioning the optimal node types in real time based on actual workload requirements. This allows for much faster and more cost-efficient scaling, advanced workload consolidation, and first-class support for Spot instances, reducing both operational overhead and infrastructure costs. Karpenter is particularly well-suited for AWS environments and excels in scenarios where rapid scaling and resource optimization are priorities, though it is less mature in multi-cloud support compared to Cluster Autoscaler.
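To make the contrast concrete, here is a minimal sketch of a Karpenter v1 NodePool. The names and values are illustrative assumptions, not recommendations: rather than pre-defining node groups and instance types, you declare constraints and let Karpenter choose the best-fit instance for the pending pods.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                  # illustrative name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # references a separately defined EC2NodeClass
      requirements:
        # Let Karpenter choose between Spot and On-Demand capacity
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Constrain to compute-, general-, and memory-optimized instance families
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  limits:
    cpu: "1000"                  # cap the total CPU this NodePool may provision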
Granular disruption budgets allow you to target specific disruption reasons (Underutilized, Drifted, Empty), balancing cost savings with application stability; see the sketch after the next paragraph.
The consolidateAfter parameter delays consolidation of underutilized nodes, so a service can absorb sudden changes in traffic without scaling inefficiently during temporary workload spikes. Meanwhile, expireAfter paired with terminationGracePeriod ensures the cluster always has fresh nodes, which reduces the chance of outdated configuration or undocumented, long-running manually modified nodes lingering in the cluster.
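Here is a sketch of how these controls can be combined in a v1 NodePool; the budget sizes, schedule, and durations are illustrative assumptions that you should tune to your own workloads.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m          # wait before consolidating underutilized nodes
    budgets:
      - nodes: "20%"               # disrupt at most 20% of nodes at a time
      - nodes: "0"                 # block underutilization consolidation during business hours
        reasons: ["Underutilized"]
        schedule: "0 9 * * mon-fri"
        duration: 8h
  template:
    spec:
      # nodeClassRef and requirements omitted for brevity (see the NodePool example above)
      expireAfter: 720h            # replace nodes after 30 days to keep them fresh
      terminationGracePeriod: 48h  # give drained workloads up to 48 hours before forced removal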
Automated node replacement when infrastructure deviates from its desired state (for example, outdated Amazon Machine Images (AMIs)) is now a stable feature. This simplifies maintenance of cluster nodes and removes the risk of manual changes lingering on them.
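Drift detection keys off what you declare in the node class. As a hedged example, the EC2NodeClass below pins an AMI via amiSelectorTerms; the role name, discovery tags, and AMI version are hypothetical. When the pinned alias is updated, Karpenter marks existing nodes as drifted and replaces them.

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default                          # illustrative name
spec:
  role: "KarpenterNodeRole-my-cluster"   # hypothetical IAM role for provisioned nodes
  amiSelectorTerms:
    - alias: al2023@v20240807            # pin a specific Amazon Linux 2023 AMI version
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # hypothetical discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster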
While optimized for AWS, Karpenter 1.0 supports hybrid/multi-cloud environments, enabling consistent scaling across providers such as AWS, Alibaba Cloud, and Azure.
Before you begin, please be aware that Karpenter v1.0 is a major release and contains breaking changes. Please review the full changelog before proceeding with the upgrade.
Deploy a test workload to verify node provisioning:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: karpenter-test
  template:
    metadata:
      labels:
        app: karpenter-test
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot
      tolerations:
        - key: "karpenter"
          value: "true"
          effect: NoSchedule
      containers:
        - name: stress-test
          image: public.ecr.aws/nginx/nginx:latest
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
Monitor nodes spinning up with the following CLI command:
kubectl get nodeclaims -owide
Karpenter 1.x brings significant benefits to Kubernetes production environments, including dynamic, real-time node provisioning, cost optimization through intelligent workload consolidation and Spot Instance usage, and enhanced disruption controls for maintaining high availability and reliability.
These features allow teams to efficiently scale infrastructure, reduce operational overhead, and ensure workloads are always matched with the most appropriate resources. However, realizing the full value of Karpenter—and Kubernetes at large—requires ongoing attention to best practices, such as carefully managing AMI versions, setting accurate resource requests, and monitoring both infrastructure and workloads for efficiency and security.
For many organizations, the complexity of Kubernetes operations—including frequent upgrades, add-on management, and the need for 24/7 monitoring—can quickly outpace internal resources, especially as clusters scale or organizations adopt multi-cloud strategies.
If you need help, Fairwinds Managed Kubernetes-as-a-Service handles all Kubernetes upgrades, add-on updates, and security patching. It also includes expert SRE support, proactive monitoring, and rapid response for infrastructure issues, freeing your internal teams from the burden of day-to-day maintenance and allowing them to focus on innovation and business differentiation.