Canary Deployments on Kubernetes: Ship Safely with Gradual Rollouts
A canary deployment releases a new version to a small percentage of users first, monitors key metrics, and gradually increases traffic if everything looks healthy. Unlike blue-green, which switches 100% of traffic at once, a canary gives you time to detect issues before they affect all users. This makes it the preferred deployment strategy for high-traffic production systems.
Canary with Native Kubernetes Resources
A basic canary uses two Deployments behind a single Service. Deploy `my-app-stable` with 9 replicas and `my-app-canary` with 1 replica, both matching the Service's label selector. Kubernetes distributes traffic roughly proportionally: the canary receives about 10% of requests. To increase canary traffic, scale up the canary and scale down stable. This approach is simple but coarse-grained: you can only achieve traffic percentages that correspond to replica ratios, and it requires extra compute for the additional replicas. It works for simple use cases without a service mesh.
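A minimal sketch of that setup, assuming a hypothetical app named `my-app` with placeholder image tags (`1.0.0` stable, `1.1.0` canary). Both Deployments carry the `app: my-app` label the Service selects on, while a `track` label keeps the two Deployments' own selectors from overlapping:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9                 # 9 of 10 pods -> ~90% of traffic
  selector:
    matchLabels:
      app: my-app
      track: stable
  template:
    metadata:
      labels:
        app: my-app
        track: stable
    spec:
      containers:
        - name: my-app
          image: my-app:1.0.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1                 # 1 of 10 pods -> ~10% of traffic
  selector:
    matchLabels:
      app: my-app
      track: canary
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
        - name: my-app
          image: my-app:1.1.0
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app               # matches both tracks, so traffic splits by replica ratio
  ports:
    - port: 80
      targetPort: 8080
```

Shifting traffic is then just `kubectl scale`: scale the canary to 3 and stable to 7 for a ~30% split, and so on.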
Canary with Istio Traffic Splitting
Istio VirtualService resources provide precise traffic splitting independent of replica count. Define two subsets in a DestinationRule (stable and canary) and set weights in the VirtualService: `weight: 95` for stable, `weight: 5` for canary. Adjust weights gradually: 5% to 10% to 25% to 50% to 100%. You can also pin specific users to the canary based on headers (e.g., internal users with a specific cookie), enabling targeted testing before broader rollout. This approach is more flexible than replica-based canary and does not waste compute on extra replicas.
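A sketch of the two Istio resources for the same hypothetical `my-app` service, reusing the `track` pod label to define subsets. The `x-canary` header match is an illustrative choice for pinning internal users; any header or cookie condition works:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  subsets:
    - name: stable
      labels:
        track: stable
    - name: canary
      labels:
        track: canary
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app
  http:
    # Rule 1: pin requests carrying x-canary: "true" to the canary,
    # regardless of weights (e.g. internal testers).
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app
            subset: canary
    # Rule 2: everyone else gets a 95/5 weighted split.
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 95
        - destination:
            host: my-app
            subset: canary
          weight: 5
```

Progressing the rollout means editing only the two `weight` fields; replica counts stay wherever your autoscaler puts them.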
Automated Canary Analysis with Argo Rollouts
Argo Rollouts automates the canary process end-to-end. Define steps in the Rollout spec: set weight to 5%, pause for 5 minutes, run an AnalysisTemplate that queries Prometheus for error rate and latency, then increase to 20%, pause, analyze again, and so on. If any analysis step fails (error rate > 1%, P99 latency > 500ms), the rollout automatically aborts and scales the canary to zero. This removes human judgment from the deployment process and catches regressions that would be invisible in manual monitoring.
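A sketch of the Rollout steps and an AnalysisTemplate, again assuming the hypothetical `my-app` and a Prometheus instance at `prometheus.monitoring:9090`; the metric name `http_requests_total` and the 1% error-rate threshold are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.1.0
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: {duration: 5m}
        - analysis:                     # abort + scale canary to zero on failure
            templates:
              - templateName: error-rate-check
        - setWeight: 20
        - pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 50
        - pause: {duration: 5m}
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
    - name: error-rate
      interval: 1m
      successCondition: result[0] < 0.01   # fail if 5xx rate exceeds 1%
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{app="my-app",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{app="my-app"}[5m]))
```

A second metric with a P99 latency query (e.g. over `histogram_quantile`) would slot into the same `metrics` list to cover the latency gate described above.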
Choosing Between Canary and Blue-Green
Canary is better for high-traffic services where you want gradual exposure and metric-based validation. It catches issues that only manifest under real user traffic at scale. Blue-green is better for low-traffic services where a percentage split would not generate enough traffic for meaningful metrics, or when you need instant atomic switchover with instant rollback. Some teams use canary for application services and blue-green for infrastructure components. The key factor is whether you have enough traffic to detect problems during a partial rollout.
Why Anubiz Engineering
100% async — no calls, no meetings
Delivered in days, not weeks
Full documentation included
Production-grade from day one
Security-first approach
Post-delivery support included
Ready to get started?
Skip the research. Tell us what you need, and we'll scope it, implement it, and hand it back — fully documented and production-ready.