Kubernetes Resource Optimization — Stop Overpaying for Idle CPU and Memory
Your Kubernetes pods request 500m CPU and 512Mi memory because someone copied that from a blog post. Actual usage? 50m CPU and 128Mi memory. You are paying for 10x more compute than you need. We analyze actual resource consumption, right-size requests and limits, configure autoscaling, and restructure node pools to eliminate waste without risking stability.
Need this done for your project?
We implement, you ship. Async, documented, done in days.
The Kubernetes Cost Problem
Kubernetes resource management has a built-in incentive problem. Developers set resource requests high to ensure their pods get scheduled and do not get OOM-killed. Operations sets limits even higher as a safety margin. The result is clusters running at 15-25% actual utilization while paying for 100% of the provisioned capacity.
The core issue is that requests reserve schedulable capacity on the node. If a pod requests 500m CPU but uses 50m, the scheduler treats those 450m as taken — no other pod can be scheduled against them, even though burstable pods may consume the idle cycles at runtime. The node shows 90% allocated but 15% utilized. The cluster autoscaler sees 90% allocation and adds more nodes. Your bill grows while your pods idle.
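A minimal sketch of the gap, using a hypothetical pod (the name, image, and numbers are illustrative):

```yaml
# The scheduler reserves 500m CPU / 512Mi on the node for this pod
# at admission time, regardless of what the container actually uses.
apiVersion: v1
kind: Pod
metadata:
  name: api-server              # illustrative name
spec:
  containers:
    - name: app
      image: example/app:latest # placeholder image
      resources:
        requests:
          cpu: 500m             # reserved for scheduling, even if usage is ~50m
          memory: 512Mi         # reserved, even if usage is ~128Mi
```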
Right-sizing is not as simple as setting requests equal to observed usage. Applications have variable load — a service that uses 50m CPU at baseline might spike to 400m during traffic peaks. Requests should be set to handle baseline plus normal variance. Limits should handle peaks. Getting this wrong in either direction causes problems: too low and pods get throttled or killed, too high and you waste money.
Our Optimization Process
Usage Analysis: We install Prometheus with custom recording rules that track actual CPU and memory usage per pod over a 7-14 day window. We analyze p50, p95, and p99 usage patterns to understand baseline, normal peaks, and outlier spikes. The Kubernetes metrics-server provides real-time data, but Prometheus gives us the historical view needed for right-sizing decisions.
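As a sketch, recording rules like the following capture the per-pod history that metrics-server alone cannot provide (the rule names and label set are our convention, not a Prometheus standard):

```yaml
# Prometheus recording rules for per-pod usage history.
groups:
  - name: pod-usage
    rules:
      # 5m-averaged CPU usage per pod, recorded for later quantile analysis
      - record: pod:cpu_usage:rate5m
        expr: sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod)
      # Working-set memory per pod (what the OOM killer cares about)
      - record: pod:memory_working_set:bytes
        expr: sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)
```

After the 7-14 day window, a query such as `quantile_over_time(0.95, pod:cpu_usage:rate5m[14d])` yields the p95 baseline used for right-sizing.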
Right-Sizing: We set CPU requests to p95 usage (handles normal load without throttling), CPU limits to 2-4x requests (allows burst but prevents runaway processes), memory requests to p99 usage (memory is incompressible — OOM kills are worse than CPU throttling), and memory limits equal to requests for critical services (prevents memory overcommit that can destabilize the node). We use Goldilocks or the Vertical Pod Autoscaler in recommendation mode to generate initial suggestions, then validate them against actual usage patterns.
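Applied to a service whose observed usage is roughly 50m CPU at p95 and 160Mi memory at p99 (hypothetical numbers), the policy above produces a resources block like this:

```yaml
# Illustrative right-sized container resources.
resources:
  requests:
    cpu: 50m        # p95 CPU usage: covers normal load without throttling
    memory: 160Mi   # p99 memory usage: memory is incompressible
  limits:
    cpu: 200m       # 4x the request: allows bursts, caps runaway processes
    memory: 160Mi   # equal to request: no memory overcommit for a critical service
```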
VPA (Vertical Pod Autoscaler): For services with highly variable resource needs, we deploy VPA in "Auto" mode so it adjusts requests and limits based on observed usage. VPA restarts pods when it needs to adjust resources, so we configure it with appropriate update policies (only during maintenance windows or only when the change exceeds a threshold) to minimize disruption.
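A sketch of a VPA in Auto mode with guard rails on the range it may choose (the target name and bounds are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server            # hypothetical target
  updatePolicy:
    updateMode: "Auto"          # VPA evicts pods to apply new requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:             # floor: never shrink below a known-safe baseline
          cpu: 25m
          memory: 64Mi
        maxAllowed:             # ceiling: cap runaway recommendations
          cpu: "1"
          memory: 1Gi
```

Bounding the recommendation range is one way to limit disruption; restricting evictions to maintenance windows requires additional tooling outside the VPA object itself.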
HPA (Horizontal Pod Autoscaler): For services that should scale horizontally rather than vertically, we configure HPA with custom metrics from Prometheus. CPU-based HPA is a fallback — we prefer scaling on request rate, queue depth, or application-specific metrics that better represent load. We configure appropriate scale-up and scale-down stabilization windows to prevent flapping.
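A sketch of an `autoscaling/v2` HPA scaling on a request-rate metric with stabilization windows (the metric name assumes a Prometheus adapter exposes it; names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server                      # hypothetical target
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second  # assumed to be exposed via prometheus-adapter
        target:
          type: AverageValue
          averageValue: "100"             # scale out above ~100 req/s per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60      # wait before reacting to a spike
    scaleDown:
      stabilizationWindowSeconds: 300     # scale down slowly to prevent flapping
```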
Node Pool Optimization: We analyze the total resource footprint and choose node instance types that minimize waste. If your pods are memory-heavy, we use memory-optimized instances (AWS r-series). If CPU-heavy, compute-optimized (c-series). We configure multiple node pools for different workload types and use node affinity to schedule pods on appropriate nodes. Spot/preemptible nodes are used for fault-tolerant workloads to reduce costs further.
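In pod-spec terms, steering a memory-heavy workload onto a memory-optimized pool and allowing a fault-tolerant workload onto spot nodes looks roughly like this (the label and taint keys are assumptions; they vary by cloud provider and cluster setup):

```yaml
# Illustrative scheduling constraints in a pod template.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload-class       # hypothetical node-pool label
                operator: In
                values: ["memory-optimized"]
  tolerations:
    - key: "spot"                         # hypothetical taint applied to spot nodes
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```

Taints on the spot pool keep non-tolerant (stateful or latency-critical) pods off preemptible capacity by default.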
What You Get
A cost-optimized Kubernetes cluster with right-sized workloads:
- Resource audit — per-pod usage analysis with requests vs. actual consumption comparison
- Right-sized manifests — updated requests and limits based on observed usage data
- VPA deployment — vertical autoscaling for variable workloads with appropriate update policies
- HPA configuration — horizontal scaling on relevant metrics with stabilization windows
- Node pool restructuring — instance types matched to workload profiles with spot integration
- Cost dashboard — per-namespace and per-service cost tracking via Kubecost or custom Prometheus metrics
- Savings report — documented before/after resource allocation with projected annual savings
Why Anubiz Engineering
Ready to get started?
Skip the research. Tell us what you need, and we'll scope it, implement it, and hand it back — fully documented and production-ready.