Horizontal Scaling in Kubernetes: From Pods to Nodes

Horizontal scaling adds more instances of your application rather than making existing instances bigger. In Kubernetes, this means adding pod replicas (Horizontal Pod Autoscaler) and cluster nodes (Cluster Autoscaler). Together, they let your infrastructure grow and shrink with demand automatically.

Pod-Level Horizontal Scaling

The Horizontal Pod Autoscaler (HPA) adjusts a workload's replica count based on observed metrics. Create one imperatively with `kubectl autoscale deployment my-app --min=2 --max=20 --cpu-percent=70`. When average CPU usage across the pods exceeds 70% of their requested CPU, the HPA adds replicas; when usage drops, it removes them. The scaling algorithm is proportional: desiredReplicas = ceil(currentReplicas × currentUtilization ÷ targetUtilization), so if current utilization is 140% of target, the HPA scales replicas up by a factor of 1.4. Set a stabilization window to prevent flapping: `behavior.scaleDown.stabilizationWindowSeconds: 300` makes the HPA wait 5 minutes before scaling down, ensuring the load drop is sustained.
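The same autoscaler can also be written declaratively with the `autoscaling/v2` API, which is where the `behavior` block lives. A minimal sketch (the deployment name `my-app` is a placeholder):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' requested CPU
  behavior:
    scaleDown:
      # Wait 5 minutes of sustained low load before removing replicas
      stabilizationWindowSeconds: 300
```

Apply it with `kubectl apply -f hpa.yaml`. Note that CPU-based scaling only works if the containers declare resource requests, since utilization is measured against the requested amount.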

Node-Level Scaling with Cluster Autoscaler

When the HPA raises the replica count but no node has enough free CPU or memory, the new pods sit in the Pending state. The Cluster Autoscaler detects pending pods and provisions new nodes from the cloud provider. Configure it with `--scale-down-delay-after-add=10m` so nodes that were just added are not immediately removed, and `--scale-down-unneeded-time=10m` so a node must stay underutilized for 10 minutes before it is drained. The autoscaler respects pod disruption budgets when draining nodes, keeping your application available during scale-down events.
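The disruption budget the autoscaler honors is an ordinary Kubernetes object. A minimal sketch, assuming the pods carry an `app: my-app` label (both names are placeholders):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # During a node drain, evictions are blocked if they would leave
  # fewer than 2 pods of this application running.
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

With this in place, a scale-down that would take the application below two replicas waits until replacement pods are scheduled elsewhere.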

Designing Applications for Horizontal Scale

Not all applications scale horizontally without changes. Your application must be stateless or use external state stores (databases, caches, object storage). Sessions should be stored in Redis or a database, not in-memory. Background jobs should use distributed job queues (BullMQ, Celery, Sidekiq) with proper locking to prevent duplicate processing. Database connections should go through a connection pooler like PgBouncer to avoid exhausting connection limits as replicas increase. Health checks must be fast and independent so the load balancer can route traffic correctly across replicas.
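The health-check requirement translates into probe configuration on the container. A sketch of the relevant fragment of a pod spec; the `/healthz` path and port 8080 are assumptions, and the endpoint should check only this replica, not shared dependencies:

```yaml
# Fragment of a container spec: fast, independent health checks so
# traffic is only routed to replicas that are actually ready.
readinessProbe:
  httpGet:
    path: /healthz   # hypothetical endpoint returning 200 when this replica can serve
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3   # restart the container after ~30s of consecutive failures
```

A failing readiness probe removes the pod from Service endpoints without restarting it, which is exactly what you want while replicas come and go under autoscaling.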

Why Anubiz Engineering

100% async — no calls, no meetings
Delivered in days, not weeks
Full documentation included
Production-grade from day one
Security-first approach
Post-delivery support included

Ready to get started?

Skip the research. Tell us what you need, and we'll scope it, implement it, and hand it back — fully documented and production-ready.