Canary Deployments on Kubernetes: Ship Safely with Gradual Rollouts
A canary deployment releases a new version to a small percentage of users first, monitors key metrics, and gradually increases traffic if everything looks healthy. Unlike blue-green, which switches 100% of traffic at once, a canary gives you time to detect issues before they affect all users. This makes it the preferred deployment strategy for high-traffic production systems.
Canary with Native Kubernetes Resources
A basic canary uses two Deployments behind a single Service. Deploy `my-app-stable` with 9 replicas and `my-app-canary` with 1 replica, both matching the Service's label selector. Kubernetes distributes traffic roughly proportionally: the canary receives about 10% of requests. To increase canary traffic, scale up the canary and scale down stable. This approach is simple but coarse-grained: you can only achieve traffic percentages that correspond to replica ratios, and it requires extra compute for the additional replicas. It works for simple use cases without a service mesh.
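A minimal sketch of that setup, assuming a hypothetical app named `my-app` with placeholder image tags (`1.0.0` stable, `1.1.0` canary). Both Deployments carry the `app: my-app` label the Service selects on, while a `track` label keeps the two Deployments' own selectors from overlapping:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9                 # 9 of 10 pods -> ~90% of traffic
  selector:
    matchLabels:
      app: my-app
      track: stable
  template:
    metadata:
      labels:
        app: my-app
        track: stable
    spec:
      containers:
        - name: my-app
          image: my-app:1.0.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1                 # 1 of 10 pods -> ~10% of traffic
  selector:
    matchLabels:
      app: my-app
      track: canary
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
        - name: my-app
          image: my-app:1.1.0
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app               # matches both tracks, so traffic splits by replica ratio
  ports:
    - port: 80
      targetPort: 8080
```

Shifting traffic is then just `kubectl scale`: scale the canary to 3 and stable to 7 for a ~30% split, and so on.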
Canary with Istio Traffic Splitting
Istio VirtualService resources provide precise traffic splitting independent of replica count. Define two subsets in a DestinationRule (stable and canary) and set weights in the VirtualService: `weight: 95` for stable, `weight: 5` for canary. Adjust weights gradually: 5% to 10% to 25% to 50% to 100%. You can also pin specific users to the canary based on headers (e.g., internal users with a specific cookie), enabling targeted testing before broader rollout. This approach is more flexible than replica-based canary and does not waste compute on extra replicas.
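A sketch of the two Istio resources for the same hypothetical `my-app` service, reusing the `track` pod label to define subsets. The `x-canary` header match is an illustrative choice for pinning internal users; any header or cookie condition works:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  subsets:
    - name: stable
      labels:
        track: stable
    - name: canary
      labels:
        track: canary
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app
  http:
    # Rule 1: pin requests carrying x-canary: "true" to the canary,
    # regardless of weights (e.g. internal testers).
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app
            subset: canary
    # Rule 2: everyone else gets a 95/5 weighted split.
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 95
        - destination:
            host: my-app
            subset: canary
          weight: 5
```

Progressing the rollout means editing only the two `weight` fields; replica counts stay wherever your autoscaler puts them.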
Automated Canary Analysis with Argo Rollouts
Argo Rollouts automates the canary process end-to-end. Define steps in the Rollout spec: set weight to 5%, pause for 5 minutes, run an AnalysisTemplate that queries Prometheus for error rate and latency, then increase to 20%, pause, analyze again, and so on. If any analysis step fails (error rate > 1%, P99 latency > 500ms), the rollout automatically aborts and scales the canary to zero. This removes human judgment from the deployment process and catches regressions that would be invisible in manual monitoring.
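A sketch of the Rollout steps and an AnalysisTemplate, again assuming the hypothetical `my-app` and a Prometheus instance at `prometheus.monitoring:9090`; the metric name `http_requests_total` and the 1% error-rate threshold are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.1.0
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: {duration: 5m}
        - analysis:                     # abort + scale canary to zero on failure
            templates:
              - templateName: error-rate-check
        - setWeight: 20
        - pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 50
        - pause: {duration: 5m}
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
    - name: error-rate
      interval: 1m
      successCondition: result[0] < 0.01   # fail if 5xx rate exceeds 1%
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{app="my-app",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{app="my-app"}[5m]))
```

A second metric with a P99 latency query (e.g. over `histogram_quantile`) would slot into the same `metrics` list to cover the latency gate described above.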
Choosing Between Canary and Blue-Green
Canary is better for high-traffic services where you want gradual exposure and metric-based validation. It catches issues that only manifest under real user traffic at scale. Blue-green is better for low-traffic services where a percentage split would not generate enough traffic for meaningful metrics, or when you need instant atomic switchover with instant rollback. Some teams use canary for application services and blue-green for infrastructure components. The key factor is whether you have enough traffic to detect problems during a partial rollout.
Why Anubiz Engineering
100% async — no calls, no meetings
Delivered in days, not weeks
Full documentation included
Production-grade from day one
Security-first approach
Post-delivery support included
Ready to get started?
Skip the research. Tell us what you need, and we'll scope it, implement it, and hand it back — fully documented and production-ready.