Zero Downtime Deployment: The Engineering Reality Behind Never Going Offline
Why “zero downtime” is less about deployment strategy and more about handling state, traffic, and real-world system behavior
You push code to production and users don’t notice.
No maintenance windows.
No “we’ll be back shortly” pages.
No late deploys timed for when traffic is lowest.
Just continuous, seamless updates flowing into a live system while customers keep clicking, buying, streaming, whatever they came to do.
That’s the promise of zero downtime deployment.
The reality is pretty awful. I’ve watched teams celebrate their first blue-green deployment only to discover their database migrations still take the whole system down. I’ve seen canary releases that looked perfect in staging catastrophically fail because production traffic patterns are nothing like synthetic tests. One team I worked with had “zero downtime” deployments that technically never dropped a request, but response times spiked so badly during rollouts that users experienced it as an outage anyway.
Zero downtime isn’t a single technique. It’s a combination of infrastructure patterns, deployment strategies, and operational discipline that all have to work together. Miss one piece and you’ve got a system that mostly doesn’t go down, which in production terms means it definitely goes down.
So let’s look at what’s happening under the hood, so that when it breaks, and it will break, you know where to look.
Hi, I’m Maxine. I’m a cloud infrastructure engineer, and I spend my days scaling databases, debugging production incidents, and writing about what actually works in production.
Okay, let’s get into it.
The Deployment Strategies That Actually Work
There’s no single “zero downtime” architecture. There are patterns that enable it, each with different tradeoffs in complexity, cost, and failure modes.
Blue-Green Deployment
You maintain two identical production environments. Blue is live, serving all traffic. Green sits idle with the previous version. When you deploy, you push to Green, run your smoke tests, then flip the load balancer to point at Green. Blue becomes your rollback target.
The cutover looks like this:
Deploy to Green → Validate Green → Switch traffic → Green becomes Blue → Old Blue becomes new Green
Simple in concept. In practice, the “identical environments” part is where it gets expensive. You’re paying for double the compute capacity, even though half of it is sitting idle most of the time. For teams running lean, that’s a hard sell.
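To make the cutover concrete, here is a minimal sketch in Go. It assumes a hypothetical Switch type standing in for the load balancer, and two fake environments created with httptest. None of these names come from a real tool. The point is only that the flip is a single atomic pointer swap, and that rollback is the same operation in reverse.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// Switch is a hypothetical stand-in for the load balancer: it forwards every
// request to whichever environment is currently marked live.
type Switch struct {
	live atomic.Pointer[httputil.ReverseProxy]
}

// Promote atomically points all new requests at target. Requests already in
// flight keep using the proxy they started with.
func (s *Switch) Promote(target string) error {
	u, err := url.Parse(target)
	if err != nil {
		return err
	}
	s.live.Store(httputil.NewSingleHostReverseProxy(u))
	return nil
}

func (s *Switch) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	p := s.live.Load()
	if p == nil {
		http.Error(w, "no live environment", http.StatusServiceUnavailable)
		return
	}
	p.ServeHTTP(w, r)
}

// env starts a fake environment that answers every request with its name.
func env(name string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, name)
	}))
}

// fetch returns the response body for a GET against addr.
func fetch(addr string) string {
	resp, err := http.Get(addr)
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

// cutover runs one blue-to-green switch and reports which environment served
// traffic before the flip, after the flip, and after a rollback.
func cutover() (before, after, rolledBack string) {
	blue, green := env("blue"), env("green")
	defer blue.Close()
	defer green.Close()

	sw := &Switch{}
	front := httptest.NewServer(sw)
	defer front.Close()

	sw.Promote(blue.URL)
	before = fetch(front.URL)
	sw.Promote(green.URL) // the cutover
	after = fetch(front.URL)
	sw.Promote(blue.URL) // rollback is the same operation in reverse
	rolledBack = fetch(front.URL)
	return
}

func main() {
	fmt.Println(cutover())
}
```

A real load balancer also has to drain connections and check health before the flip. This sketch leaves both out on purpose.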
Canary Releases
Instead of all-or-nothing traffic switches, you route a small percentage of requests to the new version first. Maybe five percent. You watch error rates, latency percentiles, business metrics. If everything looks good, you gradually increase the percentage until the new version is serving all traffic.
5% → 25% → 50% → 100%
Each step includes a soak period. You’re not just checking that the new code doesn’t crash. You’re checking that it behaves correctly under real load over time.
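One way to implement the percentage split is to hash a stable request key into a bucket. The sketch below is an illustration, not how any particular load balancer does it. The function names and the choice of FNV hashing are my assumptions. What it buys you is stickiness: a user who lands on the canary at 5% stays on it at 25%, so nobody bounces between versions as you ramp up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps an identifier to a stable number in [0, 100).
func bucket(id string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(id))
	return h.Sum32() % 100
}

// toCanary reports whether a request keyed by id goes to the new version
// when the canary is set to the given percentage.
func toCanary(id string, percent uint32) bool {
	return bucket(id) < percent
}

// canaryShare counts what fraction of n synthetic users hit the canary.
func canaryShare(n int, percent uint32) float64 {
	hits := 0
	for i := 0; i < n; i++ {
		if toCanary(fmt.Sprintf("user-%d", i), percent) {
			hits++
		}
	}
	return float64(hits) / float64(n)
}

func main() {
	for _, pct := range []uint32{5, 25, 50, 100} {
		fmt.Printf("target %3d%% -> observed %.1f%%\n", pct, 100*canaryShare(100000, pct))
	}
}
```

Hashing on a user ID keeps each user's experience consistent. Hashing on a request ID spreads load more evenly but lets one user see both versions.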
Rolling Updates
This is what most Kubernetes deployments do by default. You have a pool of pods running v1. The deployment controller starts spinning up v2 pods while terminating v1 pods, maintaining a minimum number of healthy instances throughout. Traffic shifts gradually as the load balancer health checks detect the new pods.
Rolling updates are the cheapest option in terms of infrastructure overhead. You only need enough spare capacity to run a few extra pods during the transition, not a whole duplicate environment.
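The capacity math is easier to see in a toy simulation. The sketch below is not the Kubernetes controller. It is a simplified model I wrote for illustration, borrowing the maxSurge and maxUnavailable names from the Deployment spec and assuming every new pod becomes ready before the next pass, which real clusters do not guarantee.

```go
package main

import "fmt"

// Step is a snapshot of the pool after one reconciliation pass.
type Step struct{ Old, New int }

// rollout simulates replacing `replicas` v1 pods with v2 pods. It returns nil
// when both limits are zero, because no progress would be possible.
func rollout(replicas, maxSurge, maxUnavailable int) []Step {
	if replicas <= 0 || (maxSurge <= 0 && maxUnavailable <= 0) {
		return nil
	}
	floor := replicas - maxUnavailable // fewest pods allowed at any moment
	ceiling := replicas + maxSurge     // most pods allowed at any moment

	old, cur := replicas, 0
	steps := []Step{{old, cur}}
	for old > 0 {
		// Terminate v1 pods, but never drop below the availability floor.
		kill := old + cur - floor
		if kill > old {
			kill = old
		}
		if kill > 0 {
			old -= kill
		}
		// Start v2 pods, but never exceed the surge ceiling.
		add := ceiling - (old + cur)
		if need := replicas - cur; add > need {
			add = need
		}
		if add > 0 {
			cur += add
		}
		steps = append(steps, Step{old, cur})
	}
	return steps
}

func main() {
	for _, s := range rollout(4, 1, 0) {
		fmt.Printf("v1=%d v2=%d total=%d\n", s.Old, s.New, s.Old+s.New)
	}
}
```

With four replicas, a surge of one, and zero unavailable, the pool never holds fewer than four pods and never more than five. That one extra pod is the whole infrastructure overhead.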
Feature Flags
This one’s different. The deployment itself is boring. The code ships disabled. You turn it on later through a flag system, either all at once or gradually per user segment. The deployment and the release become separate events.
I’ve seen teams combine these. Canary for the deployment, feature flags for the actual functionality. It’s belt and suspenders, but when you’re shipping changes to payment processing, belt and suspenders feels appropriate.
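Here is what “the code ships disabled” can look like in its simplest form. This is a sketch with a hypothetical in-process Flag type and a made-up checkout function. A real setup would usually read the flag from a flag service or config store rather than a package variable.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Flag is a hypothetical in-process feature flag that is safe to read and
// flip from multiple goroutines.
type Flag struct{ on atomic.Bool }

func (f *Flag) Enable()       { f.on.Store(true) }
func (f *Flag) Disable()      { f.on.Store(false) }
func (f *Flag) Enabled() bool { return f.on.Load() }

// newCheckout ships disabled: the zero value of Flag is off.
var newCheckout Flag

// checkout picks a code path at request time, not at deploy time.
func checkout(totalCents int) string {
	if newCheckout.Enabled() {
		return fmt.Sprintf("v2 checkout: %d cents", totalCents)
	}
	return fmt.Sprintf("v1 checkout: %d cents", totalCents)
}

func main() {
	fmt.Println(checkout(1999)) // deployed, not yet released
	newCheckout.Enable()        // the release, no deploy involved
	fmt.Println(checkout(1999))
	newCheckout.Disable() // the rollback, also no deploy
	fmt.Println(checkout(1999))
}
```

Both code paths live in the same binary, so turning the feature off is a flag flip rather than a redeploy.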
What Production Actually Looks Like
Let me walk through what a mature zero downtime setup involves. This isn’t theoretical. This is what you’re actually maintaining.
The Load Balancer Layer
Your load balancer needs to understand health. Not “is this port open” health. Real health. Can this instance serve requests correctly right now?
Health checks need to hit an endpoint that actually exercises your application stack. Database connection working? Cache reachable? Downstream services responding? If any of those are broken, the instance should fail its health check and stop receiving traffic.
Here’s a typical health check configuration in Terraform for an AWS ALB:
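    resource "aws_lb_target_group" "app" {
      name     = "app-tg"
      port     = 8080
      protocol = "HTTP"
      vpc_id   = var.vpc_id

      health_check {
        enabled             = true
        healthy_threshold   = 2  # consecutive passes before an instance gets traffic
        unhealthy_threshold = 3  # consecutive failures before it is pulled
        timeout             = 5  # seconds to wait for each response
        interval            = 10 # seconds between checks
        path                = "/health/ready"
        matcher             = "200"
      }

      # Seconds the load balancer waits for in-flight requests after deregistration.
      deregistration_delay = 30
    }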
That deregistration_delay matters more than most people realize. When an instance is deregistered from the target group, the load balancer stops sending it new requests right away, then waits up to 30 seconds for in-flight requests to finish before closing the remaining connections. Too short and you drop active connections. Too long and your deployments feel slow.
The Application Layer
Your application needs to handle graceful shutdown. When it receives SIGTERM, it should stop accepting new requests, finish processing current ones, then exit cleanly.
In Go, that looks something like:
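    srv := &http.Server{Addr: ":8080", Handler: router}
    done := make(chan struct{})
    go func() {
    	defer close(done)
    	sigChan := make(chan os.Signal, 1)
    	signal.Notify(sigChan, syscall.SIGTERM, syscall.SIGINT)
    	<-sigChan // block until the orchestrator asks us to stop
    	// Stop accepting new requests and give in-flight ones up to 30 seconds.
    	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    	defer cancel()
    	if err := srv.Shutdown(ctx); err != nil {
    		log.Printf("graceful shutdown failed: %v", err)
    	}
    }()
    if err := srv.ListenAndServe(); err != http.ErrServerClosed {
    	log.Fatalf("server error: %v", err)
    }
    <-done // ListenAndServe returns as soon as Shutdown starts, so wait for the drain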
If your application just dies on SIGTERM, every deployment drops requests. You’ll have zero downtime in your metrics because the load balancer eventually routes around it, but actual users will see errors.
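If you want to see the difference for yourself, the sketch below is a small self-contained experiment. It starts a server with a deliberately slow handler, sends one request, and calls Shutdown while that request is still in flight. The drainDemo name, the 200 millisecond delay, and the 5 second timeout are arbitrary choices for the demo.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// drainDemo returns the status code the client saw and the error (if any)
// from Shutdown, which was called while the request was still being handled.
func drainDemo() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}

	started := make(chan struct{})
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		close(started)                     // tell the test the request has arrived
		time.Sleep(200 * time.Millisecond) // simulated slow work
		fmt.Fprint(w, "done")
	})}
	go srv.Serve(ln)

	status := make(chan int, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String())
		if err != nil {
			status <- 0 // 0 means the client saw a transport error
			return
		}
		resp.Body.Close()
		status <- resp.StatusCode
	}()

	<-started // the request is now in flight
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	err = srv.Shutdown(ctx) // waits for the slow request to finish
	return <-status, err
}

func main() {
	fmt.Println(drainDemo())
}
```

Swap Shutdown for srv.Close() and the in-flight request is cut off instead of completing. That is roughly what a process dying on SIGTERM looks like from the client side.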
The Database Layer
This is where zero downtime gets hard.
Schema migrations that lock tables will take your application down. Doesn’t matter how clever your deployment strategy is. If your migration runs ALTER TABLE users ADD COLUMN on a table with fifty million rows, and your database takes a table-level lock during that operation, you’re going down.
The pattern that works: additive, backward-compatible migrations only during deployment. Add columns, never remove them. Add tables, never drop them. Make new columns nullable or provide defaults. Then clean up later in a separate operation when you’ve confirmed the old code is gone.
Your migration and your deployment are different events, and they need to be sequenced carefully.
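A sketch of what “backward-compatible” means on the application side, assuming a hypothetical users table that gains a nullable display_name column next to an existing full_name. The SQL strings and column names are invented for illustration. The part that matters is that the reading code works before, during, and after the backfill.

```go
package main

import (
	"database/sql"
	"fmt"
)

// The three phases as hypothetical SQL. Only the first belongs anywhere near
// a deploy. The last one waits until no running code reads full_name.
const (
	expand   = `ALTER TABLE users ADD COLUMN display_name TEXT NULL`
	backfill = `UPDATE users SET display_name = full_name WHERE display_name IS NULL` // run in small batches in practice
	contract = `ALTER TABLE users DROP COLUMN full_name`
)

// User mirrors a row during the transition window.
type User struct {
	FullName    string         // existing column, still written by old code
	DisplayName sql.NullString // new nullable column, NULL until backfilled
}

// Name prefers the new column and falls back to the old one, so it gives a
// sensible answer for rows that have not been backfilled yet.
func (u User) Name() string {
	if u.DisplayName.Valid {
		return u.DisplayName.String
	}
	return u.FullName
}

// migratedUser builds a row as it looks after the backfill has reached it.
func migratedUser(fullName, displayName string) User {
	return User{
		FullName:    fullName,
		DisplayName: sql.NullString{String: displayName, Valid: true},
	}
}

func main() {
	fmt.Println("phase 1:", expand)
	fmt.Println("phase 2:", backfill)
	fmt.Println("phase 3:", contract)

	notYetBackfilled := User{FullName: "Ada Lovelace"}
	fmt.Println(notYetBackfilled.Name())
	fmt.Println(migratedUser("Ada Lovelace", "Ada").Name())
}
```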
The Failure Modes That Bite You
The Health Check That Lies
Your health endpoint returns 200, but your application can’t actually serve traffic correctly. Maybe it can reach the database but the connection pool is exhausted. Maybe the cache is responding but returning stale data that breaks your business logic.
What’s happening: Most health checks are too shallow. They verify the process is running and can respond to HTTP requests, but they don’t verify the application can do useful work. The load balancer sees healthy instances. Users see errors.
Fix this by making your readiness endpoint actually test what matters. If your app needs a database, the health check should run a simple query. If it needs cache, verify you can read a known key. This adds latency to your health checks, so you may need to adjust your intervals, but a slower honest health check beats a fast lying one.
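A minimal sketch of a readiness endpoint along those lines. The Check type, readyHandler, and the dependency names are my own invention for this example. In a real service the db check might wrap something like db.PingContext, and the cache check would read a known key.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Check probes one dependency and returns nil when it is usable.
type Check func(ctx context.Context) error

// readyHandler runs every check under a shared deadline. It answers 200 only
// if all of them pass, and 503 with per-check detail otherwise.
func readyHandler(timeout time.Duration, checks map[string]Check) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()

		result := make(map[string]string, len(checks))
		code := http.StatusOK
		for name, check := range checks {
			if err := check(ctx); err != nil {
				result[name] = err.Error()
				code = http.StatusServiceUnavailable
			} else {
				result[name] = "ok"
			}
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		json.NewEncoder(w).Encode(result)
	}
}

// probe calls a handler in-process and returns the status code it produced.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health/ready", nil))
	return rec.Code
}

// demo returns the status for an all-healthy instance and for one whose
// cache dependency is failing.
func demo() (healthy, degraded int) {
	dbUp := func(ctx context.Context) error { return nil } // stand-in for a real query
	cacheDown := func(ctx context.Context) error { return errors.New("cache: connection refused") }

	healthy = probe(readyHandler(time.Second, map[string]Check{"db": dbUp}))
	degraded = probe(readyHandler(time.Second, map[string]Check{"db": dbUp, "cache": cacheDown}))
	return
}

func main() {
	fmt.Println(demo())
}
```

Keep the check timeout shorter than the load balancer’s health check timeout. Otherwise a hung dependency shows up as a timed-out probe instead of a clean 503.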
The Deployment That’s Faster Than the Load Balancer
You terminate old pods before the load balancer has finished draining connections from them. Kubernetes says the pod is gone. The load balancer hasn’t caught up yet. Traffic still routes to an IP that no longer exists.
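What’s happening: There’s a race condition between your deployment controller and your ingress. When Kubernetes marks the pod as terminating, it removes the pod from the Service endpoints and sends SIGTERM to the container at roughly the same time. The endpoint change still has to propagate to the ingress controller and the load balancer, and the load balancer’s own health checks run on an interval. If the pod dies before that propagation or the next health check, the LB doesn’t know it’s gone.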
The fix involves preStop hooks that delay container termination:
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "15"]
That sleep gives your load balancer time to notice the pod is draining and stop sending it new traffic. Ugly? Yes. Necessary? Also yes.
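If you would rather not rely on sleep alone, the same idea can live inside the application: fail readiness first, keep serving for a grace period, then shut down. The sketch below assumes a hypothetical Drainer type. The half-second grace period is only for the demo. In production it needs to cover your load balancer’s check interval and propagation delay.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sync/atomic"
	"time"
)

// Drainer flips the readiness endpoint to failing before the server stops.
type Drainer struct{ draining atomic.Bool }

// Ready answers 200 until draining begins, then 503.
func (d *Drainer) Ready(w http.ResponseWriter, _ *http.Request) {
	if d.draining.Load() {
		http.Error(w, "draining", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// Stop fails readiness, keeps serving for grace so the load balancer can
// catch up, then shuts the server down gracefully.
func (d *Drainer) Stop(srv *http.Server, grace, timeout time.Duration) error {
	d.draining.Store(true)
	time.Sleep(grace)
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return srv.Shutdown(ctx)
}

// status returns the HTTP status for a GET, or 0 on a transport error.
func status(addr string) int {
	resp, err := http.Get(addr)
	if err != nil {
		return 0
	}
	resp.Body.Close()
	return resp.StatusCode
}

// demo reports readiness before draining, readiness during the grace window,
// and whether normal traffic is still served during that window.
func demo() (readyBefore, readyDuring, workDuring int, err error) {
	d := &Drainer{}
	mux := http.NewServeMux()
	mux.HandleFunc("/health/ready", d.Ready)
	mux.HandleFunc("/work", func(w http.ResponseWriter, _ *http.Request) { fmt.Fprint(w, "ok") })

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return
	}
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)
	base := "http://" + ln.Addr().String()

	readyBefore = status(base + "/health/ready")
	stopped := make(chan error, 1)
	go func() { stopped <- d.Stop(srv, 500*time.Millisecond, 5*time.Second) }()

	time.Sleep(100 * time.Millisecond) // now inside the grace window
	readyDuring = status(base + "/health/ready")
	workDuring = status(base + "/work")
	err = <-stopped
	return
}

func main() {
	fmt.Println(demo())
}
```

During the grace window the instance tells the load balancer “stop sending me traffic” while still answering whatever arrives. That closes the gap the race leaves open.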
The Canary That Looked Fine
Your canary ran for an hour at five percent traffic. Error rates were flat. Latency was normal. You rolled it out to 100%. Fifteen minutes later, alarms everywhere.
What’s happening: Your canary traffic wasn’t representative. Maybe that five percent skewed toward a particular user segment. Maybe the problematic code path only triggers for specific request patterns that didn’t appear in your sample. Maybe the bug only manifests under sustained load, and your canary never got enough traffic to trigger it.
You need to extend canary duration for changes touching critical paths. Watch business metrics, not just technical ones. And consider traffic shaping that ensures your canary gets a representative sample, not just random selection.
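One guard that catches the “never got enough traffic” case is to refuse promotion until the canary has a minimum sample, and only then compare it with the baseline. The sketch below is a deliberately simple gate. The Window type, the minRequests threshold, and the tolerance value are hypothetical knobs, not recommended settings. A real gate would also look at latency and business metrics.

```go
package main

import "fmt"

// Window holds what one version served during a soak period.
type Window struct {
	Requests int
	Errors   int
}

// errorRate returns errors per request, or 0 when there was no traffic.
func (w Window) errorRate() float64 {
	if w.Requests == 0 {
		return 0
	}
	return float64(w.Errors) / float64(w.Requests)
}

// promote reports whether the canary may advance to the next traffic step,
// along with a short reason.
func promote(baseline, canary Window, minRequests int, tolerance float64) (bool, string) {
	if canary.Requests < minRequests {
		return false, "not enough canary traffic to judge"
	}
	if canary.errorRate() > baseline.errorRate()+tolerance {
		return false, "canary error rate exceeds baseline"
	}
	return true, "within tolerance"
}

func main() {
	baseline := Window{Requests: 95000, Errors: 95} // 0.1% errors

	fmt.Println(promote(baseline, Window{Requests: 40, Errors: 0}, 1000, 0.001))
	fmt.Println(promote(baseline, Window{Requests: 5000, Errors: 50}, 1000, 0.001))
	fmt.Println(promote(baseline, Window{Requests: 5000, Errors: 6}, 1000, 0.001))
}
```

Note that the first canary has zero errors and is still held back. A flat error graph over forty requests tells you almost nothing.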
The Database Migration That Took Down Everything
You ran your migration as part of the deployment. It was supposed to be quick. It took eleven minutes. Your application couldn’t write to that table the entire time. Users experienced complete failure for any workflow touching that data.
What’s happening: Migrations that seem fast on your staging database with ten thousand rows behave very differently against production tables with tens of millions of rows. The operation that took 200 milliseconds in staging takes eleven minutes in production because it’s doing a full table scan or waiting for lock acquisition on a high-traffic table.
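Always test migrations against production-scale data. Run them as separate operations from deployments. If you’re on MySQL, consider online schema change tools like gh-ost or pt-online-schema-change, which apply the alteration to a shadow copy of the table and swap it in at the end instead of holding a long lock on the original.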
Annoying Tidbits
Infrastructure as Code is supposed to make deployments repeatable. It does. It also introduces its own failure modes.
State Drift During Deployment
You’re mid-deployment. Someone else runs a Terraform apply for an unrelated change. Their state file write conflicts with yours. Now your deployment is stuck in a partial state.
Use state locking. Always. S3 backend with DynamoDB locking is the standard for AWS:
    terraform {
      backend "s3" {
        bucket         = "mycompany-terraform-state"
        key            = "prod/app/terraform.tfstate"
        region         = "us-east-1"
        dynamodb_table = "terraform-locks"
        encrypt        = true
      }
    }
The Auto Scaling Group Replacement Dance
You update your launch template. Terraform wants to replace the ASG. But by default, replacement means destroying the old ASG before creating the new one.
Set create_before_destroy in your lifecycle block:
    resource "aws_autoscaling_group" "app" {
      # ... configuration ...

      lifecycle {
        create_before_destroy = true
      }
    }
This tells Terraform to spin up the new ASG, let it become healthy, then tear down the old one. Without it, you have a window where no ASG exists.
Target Group Association Ordering
Your new instances launch, but they’re not in the target group yet. Health checks haven’t passed. Traffic still routes to old instances. Then Terraform destroys the old instances before the new ones are healthy.
You need explicit dependencies or aws_autoscaling_attachment resources that ensure the new instances are registered and healthy before old ones get terminated. Terraform’s implicit dependency resolution doesn’t always get the ordering right here.
When It Actually Works
The quiet win that never showed up on any dashboard: I worked with a team that went from deploying weekly to deploying fifty times a week. Their bug rate didn’t change much. But their time-to-fix dropped dramatically. They could ship a fix in minutes instead of waiting for the next deployment window. Customers noticed. Not the deploys themselves. The responsiveness.
What’s the deployment strategy that’s worked best for your team, and what did you have to learn the hard way to make it reliable?
I’d love to hear about it in the comments.
With Love and DevOps,
Maxine
Last Updated: May 2026