If you’ve ever deployed to Kubernetes and seen users briefly hit a blank page or a 502 Bad Gateway error, you’re not alone. I ran into this exact issue on a production GKE cluster running a Next.js frontend with multiple replicas behind a Google Cloud Load Balancer.
The symptom: during a rolling update, for a few seconds, some users would see a black screen with a “failed upstream” message. Then it would resolve on its own. Annoying, intermittent, and hard to reproduce locally.
The root cause is a race condition in how Kubernetes terminates pods.
## How pod termination works
When Kubernetes decides to terminate a pod (during a rolling update, scale-down, or node drain), it does two things in parallel:
- Sends `SIGTERM` to the container's main process
- Removes the pod from the Endpoints object (which tells Services and load balancers to stop sending traffic)
The key word is in parallel. These two operations don’t wait for each other. Here’s what the timeline looks like:
```
t=0  Kubernetes marks pod as Terminating
     ├── SIGTERM sent to container process
     └── Pod removal from Endpoints begins (async)
         ├── kube-proxy updates iptables rules
         └── Cloud load balancer health check notices (SLOW)
```
The problem is that step 1 can complete before step 2 propagates. Your application receives SIGTERM, starts shutting down (closing listeners, refusing new connections), but the load balancer doesn’t know yet. It keeps sending traffic to a pod that’s already shutting down. Those requests get a connection refused or a timeout — which the load balancer surfaces as a 502.
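On the application side, "starts shutting down" typically looks like the following sketch of a Node.js SIGTERM handler. The function name `installGracefulShutdown` and the `drainMs` timeout are illustrative, not part of any real API or of my actual setup:

```javascript
// Sketch: graceful shutdown for a Node.js HTTP server.
// installGracefulShutdown and drainMs are hypothetical names for illustration.
function installGracefulShutdown(server, drainMs = 30000) {
  process.on('SIGTERM', () => {
    // Stop accepting new connections; in-flight requests are allowed to finish.
    server.close(() => process.exit(0));
    // Safety net: force-exit if draining outlives the grace budget.
    setTimeout(() => process.exit(1), drainMs).unref();
  });
}

module.exports = installGracefulShutdown;
```

Note that a handler like this refuses new connections the moment SIGTERM arrives, which is precisely why the race matters: the load balancer may still be routing requests to a process that has already called `server.close()`.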
## The race condition in detail
Let’s trace a concrete scenario on GKE with a Google Cloud Load Balancer:
1. You push a new image, triggering a rolling update
2. Kubernetes creates a new pod, waits for it to be Ready
3. Kubernetes marks an old pod as Terminating
4. SIGTERM is sent to the old pod's process immediately
5. The Endpoints controller removes the pod from the Service's endpoint list
6. kube-proxy updates iptables on all nodes
7. The GKE load balancer's health check runs on its configured interval (every 10 seconds in our case)
8. Only after the health check fails does the load balancer stop routing to the old pod's node
Between steps 4 and 8, there’s a window where:
- The application has received SIGTERM and may have already stopped listening
- The load balancer is still sending requests to that pod
- Those requests fail with 502
The GKE load balancer is especially susceptible because it operates at the infrastructure level, outside the Kubernetes networking stack. It doesn’t watch the Endpoints API — it relies on its own HTTP health checks, which only run periodically.
## The fix: preStop hooks
The solution is deceptively simple. Add a preStop hook that sleeps for a few seconds before the container receives SIGTERM:
```yaml
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]
```
This changes the termination timeline to:
```
t=0   Kubernetes marks pod as Terminating
      ├── preStop hook starts (sleep 10)
      └── Pod removal from Endpoints begins (async)
          ├── kube-proxy updates iptables
          └── Load balancer health check detects pod is gone
t=10  preStop hook completes
      └── SIGTERM sent to container process
          └── Application begins graceful shutdown
```
The sleep gives the entire networking stack time to converge. By the time SIGTERM actually reaches your application, the load balancer has already stopped sending traffic to it. No more 502s.
## The full picture
The preStop hook alone isn’t enough if your other settings don’t support it. Here’s the complete configuration I ended up with:
```yaml
spec:
  terminationGracePeriodSeconds: 45
  containers:
    - name: frontend
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]
      startupProbe:
        httpGet:
          path: /api/healthz
          port: http
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 12
      readinessProbe:
        httpGet:
          path: /api/healthz
          port: http
        periodSeconds: 10
        failureThreshold: 3
      livenessProbe:
        httpGet:
          path: /api/healthz
          port: http
        periodSeconds: 30
        failureThreshold: 3
```
Let me explain each piece:
terminationGracePeriodSeconds: 45 — This is the total budget Kubernetes gives the pod to shut down. It must be longer than your preStop sleep + the time your application needs to drain connections. With a 10s sleep, you still have 35 seconds for graceful shutdown.
startupProbe — This is the counterpart to preStop, but for pod startup. Without it, the readiness probe starts checking immediately, and a slow-starting application (like Next.js which needs to compile routes) might fail health checks and get killed before it’s ready. The startup probe gives it up to 60 seconds (12 failures × 5s period) to initialize before liveness/readiness probes kick in.
readinessProbe on /api/healthz — Using a dedicated health endpoint instead of the root / path is important. The root path might render a full page, hit the database, or return a large response. A health endpoint should be a lightweight check that returns 200 when the app is ready to serve traffic. When the readiness probe fails, Kubernetes removes the pod from the Service endpoints — this is the in-cluster equivalent of what the load balancer does externally.
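To make "lightweight" concrete, a minimal health endpoint can be as small as the sketch below. This is a hypothetical handler in the style of a Next.js API route; a real one might additionally verify critical dependencies, but it should never render pages or hit the database on every probe:

```javascript
// Sketch: a lightweight /api/healthz handler (hypothetical, for illustration).
// Deliberately cheap: no page rendering, no database round-trips.
function healthz(req, res) {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ status: 'ok' }));
}

module.exports = healthz;
```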
livenessProbe — This restarts the pod if the application becomes unresponsive. Keep the interval longer and the threshold higher than the readiness probe — you don’t want to restart pods aggressively, just catch actual deadlocks.
## Rolling update strategy
The deployment strategy matters too:
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
```
maxUnavailable: 0 means Kubernetes must bring up a new pod and wait for it to pass readiness checks before terminating an old one. This guarantees that during a rollout, the total number of ready pods never drops below the desired count. Combined with the preStop hook, it means:
- New pod starts, passes startup probe, then readiness probe
- Old pod marked for termination
- preStop sleep gives the network time to deregister the old pod
- SIGTERM sent, application drains gracefully
- No moment where traffic has nowhere to go
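Putting the strategy together with the termination settings, the relevant skeleton of the Deployment looks roughly like this (abridged; names and replica count mirror my setup, adjust to yours):

```yaml
# Sketch: how the pieces fit together in one Deployment (abridged).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 10
  selector:
    matchLabels:
      app: frontend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: frontend
    spec:
      terminationGracePeriodSeconds: 45
      containers:
        - name: frontend
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
```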
## GKE-specific: HealthCheckPolicy
If you’re on GKE using the Gateway API, the cloud load balancer runs its own health checks independently of Kubernetes probes. By default, it might check the root path / or use settings that don’t match your readiness probe. You can align them with a HealthCheckPolicy:
```yaml
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: frontend-healthcheck
spec:
  default:
    checkIntervalSec: 10
    timeoutSec: 3
    healthyThreshold: 1
    unhealthyThreshold: 3
    config:
      type: HTTP
      httpHealthCheck:
        portSpecification: USE_SERVING_PORT
        requestPath: /api/healthz
  targetRef:
    group: ""
    kind: Service
    name: frontend
```
This ensures the GKE load balancer checks the same /api/healthz endpoint your Kubernetes probes use, with matching intervals. Without this, you can have a situation where Kubernetes thinks the pod is ready but the load balancer hasn’t marked it healthy yet (or vice versa).
## Why 10 seconds?
The sleep duration needs to cover the time it takes for the GKE load balancer to stop routing traffic to the pod. The load balancer relies on its own HTTP health checks, which run on a configured interval. With `checkIntervalSec: 10` and `unhealthyThreshold: 3`, the theoretical worst case is 30 seconds before the load balancer marks the backend unhealthy.
In practice, 10 seconds reliably covered the propagation delay on our GKE setup: after applying this change, the intermittent 502s during rolling deployments disappeared completely. If you still see 502s with a 10-second sleep, increase it toward that worst-case figure.
Don’t go too high — every second of preStop sleep adds to your deployment time. With 10 replicas and a 10-second sleep, a rolling update takes at least 100 seconds longer than it needs to.
## Summary
The 502-during-deployment problem is a race condition between two parallel processes: your application shutting down, and the network stack deregistering the pod. The fix is straightforward:
- `preStop: sleep 10` — delays SIGTERM so the network has time to catch up
- `terminationGracePeriodSeconds` — must be > preStop sleep + drain time
- `maxUnavailable: 0` — ensures new pods are ready before old ones terminate
- Dedicated health endpoint — consistent checks across probes and load balancer
- `HealthCheckPolicy` (GKE) — aligns cloud LB health checks with Kubernetes probes
A one-line preStop hook eliminates an entire class of deployment-related downtime.