Vertical Scaling in Kubernetes: Right-Size Your Pods with VPA

Vertical scaling makes individual pods bigger by allocating more CPU and memory, rather than adding more pods. The Vertical Pod Autoscaler (VPA) analyzes actual resource usage and recommends or automatically adjusts pod resource requests. This is essential for workloads that cannot scale horizontally, and for optimizing resource efficiency across all workloads.

How the Vertical Pod Autoscaler Works

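VPA has three components. The Recommender reads usage data from the Kubernetes metrics API (served by metrics-server) and computes target resource requests for each container. The Updater evicts pods whose current requests have drifted too far from the recommendation, so that their controller recreates them. The Admission Controller is a mutating webhook that rewrites the requests of newly created pods to match the recommendation. The `updateMode` field selects how far VPA goes: `Off` (compute recommendations only, change nothing), `Initial` (set resources only when pods are created), `Recreate` (also evict running pods so they restart with new resources), and `Auto` (currently equivalent to `Recreate`). Because `Recreate` and `Auto` restart pods, start with `Off` to observe recommendations, then move to `Auto` for non-critical workloads.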
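As a minimal sketch of the "start with `Off`" advice, the snippet below builds a VerticalPodAutoscaler manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The helper `build_vpa` and the names `billing-db` and `billing-db-vpa` are hypothetical placeholders, not part of any library.

```python
import json


def build_vpa(name, target_kind, target_name, update_mode="Off"):
    """Return a VerticalPodAutoscaler manifest as a plain dict.

    update_mode defaults to "Off": VPA only publishes recommendations
    and never changes or evicts pods.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose pods VPA should observe.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# Hypothetical workload: a StatefulSet named "billing-db".
manifest = build_vpa("billing-db-vpa", "StatefulSet", "billing-db")
print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `vpa.py`, running `python vpa.py | kubectl apply -f -` would create the object, and `kubectl describe vpa billing-db-vpa` shows the recommendations once the Recommender has produced them. Switching to `Auto` later is a one-argument change: `build_vpa(..., update_mode="Auto")`.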

When to Use VPA vs HPA

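Use VPA for workloads that cannot scale horizontally: single-instance databases, legacy monoliths, stateful applications with leader election, or batch jobs where adding replicas causes duplicate processing. Use HPA for stateless web services and API servers that benefit from more instances. Do not let VPA and HPA act on the same resource metric for the same workload (for example, both driven by CPU): VPA changes the requests that HPA's utilization percentage is computed against, so the two controllers work against each other. You can combine them when they act on different signals, for example VPA right-sizing memory while HPA scales horizontally on CPU. In that setup, restrict VPA to memory with `controlledResources: ["memory"]` in the container's resource policy, because VPA manages both CPU and memory by default.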
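The conflict rule can be checked mechanically before applying manifests. The sketch below is an illustration, not an official tool: it takes a VPA and an HPA (`autoscaling/v2`) manifest as dicts and returns the resources both would act on. The function names are hypothetical, and the check is deliberately conservative: if no `containerName: "*"` policy exists, it assumes some container falls back to VPA's default of controlling both CPU and memory.

```python
DEFAULT_CONTROLLED = {"cpu", "memory"}  # VPA's default when controlledResources is unset


def vpa_controlled_resources(vpa):
    """Resources the VPA may change, unioned across container policies."""
    policies = (
        vpa.get("spec", {}).get("resourcePolicy", {}).get("containerPolicies", [])
    )
    has_wildcard = any(p.get("containerName") == "*" for p in policies)
    # Without a wildcard policy, assume an uncovered container uses the defaults.
    resources = set() if has_wildcard else set(DEFAULT_CONTROLLED)
    for policy in policies:
        if policy.get("mode") == "Off":
            continue  # VPA does not touch this container at all
        resources |= set(policy.get("controlledResources", DEFAULT_CONTROLLED))
    return resources


def hpa_resource_metrics(hpa):
    """Resource names (cpu, memory) that drive the HPA's scaling decisions."""
    names = set()
    for metric in hpa.get("spec", {}).get("metrics", []):
        if metric.get("type") == "Resource":
            names.add(metric["resource"]["name"])
        elif metric.get("type") == "ContainerResource":
            names.add(metric["containerResource"]["name"])
    return names


def conflicting_resources(vpa, hpa):
    """Resources that both autoscalers would act on; empty means no overlap."""
    return vpa_controlled_resources(vpa) & hpa_resource_metrics(hpa)


hpa_on_cpu = {"spec": {"metrics": [{"type": "Resource", "resource": {"name": "cpu"}}]}}
vpa_default = {"spec": {}}
vpa_memory_only = {
    "spec": {
        "resourcePolicy": {
            "containerPolicies": [
                {"containerName": "*", "controlledResources": ["memory"]}
            ]
        }
    }
}

print(conflicting_resources(vpa_default, hpa_on_cpu))      # overlap on cpu
print(conflicting_resources(vpa_memory_only, hpa_on_cpu))  # no overlap
```

The check only inspects the manifests it is given; it assumes you have already confirmed that both objects target the same workload.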

Resource Right-Sizing Best Practices

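VPA recommendations are based on historical usage data. Let VPA collect at least 7 days of data before trusting its recommendations, so that a full weekly traffic cycle is covered. Set `minAllowed` and `maxAllowed` bounds in the VPA spec to prevent extreme adjustments. For JVM applications, be careful with memory: the JVM sizes its heap from flags such as `-Xmx` and tends to keep memory it has claimed, so the resident set size (RSS) that VPA observes may not change much with load. As a rule of thumb, set the memory request to the JVM maximum heap size plus roughly 50% for non-heap memory (metaspace, thread stacks, direct buffers), then adjust after measuring. For Node.js and Python applications, VPA's memory recommendations tend to work well, since these runtimes allocate memory more dynamically. Monitor VPA-triggered evictions and make sure your PodDisruptionBudgets allow them: the Updater evicts through the eviction API, so a budget that permits no disruptions blocks resizing. By default the Updater also skips workloads with fewer than two replicas; for a single-instance workload, set `minReplicas: 1` under `updatePolicy` and plan for the restart.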
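To make the JVM rule of thumb concrete, here is a small sketch that computes "heap plus 50%" and turns it into `minAllowed`/`maxAllowed` bounds for a VPA container policy. Using the JVM figure as the `minAllowed` floor is an assumption of this example, as are the helper names and the 2048 MiB heap and 6144 MiB ceiling; measure your own workload before picking values.

```python
import math


def jvm_memory_request_mib(heap_mib, non_heap_ratio=0.5):
    """Memory request in MiB: max heap plus a non-heap allowance (50% by default)."""
    return math.ceil(heap_mib * (1 + non_heap_ratio))


def bounded_container_policy(min_memory_mib, max_memory_mib, container="*"):
    """VPA containerPolicies entry that keeps memory recommendations within bounds."""
    if min_memory_mib > max_memory_mib:
        raise ValueError("minAllowed must not exceed maxAllowed")
    return {
        "containerName": container,
        "minAllowed": {"memory": f"{min_memory_mib}Mi"},
        "maxAllowed": {"memory": f"{max_memory_mib}Mi"},
    }


# Assumed example: a JVM started with -Xmx2048m.
floor_mib = jvm_memory_request_mib(2048)
policy = bounded_container_policy(floor_mib, max_memory_mib=6144)
print(floor_mib)
print(policy)
```

The resulting dict goes under `spec.resourcePolicy.containerPolicies` in the VPA object. With the floor in place, VPA cannot recommend less memory than the heap is configured to use, which guards against a quiet week producing a recommendation the JVM cannot start in.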
