HTTP 502 Bad Gateway — Why Your Proxy Got an Invalid Upstream Response

Quick answer

💡A 502 Bad Gateway means the proxy or gateway received an invalid response, or no response at all, from the upstream server. In most cases nginx or the load balancer itself is fine and the application server behind it is not. Check the nginx error log and your upstream process logs first, not the nginx access log. Use the HTTP Request Builder to confirm whether the issue is at the proxy layer or the application layer.

Error symptoms

✕502 Bad Gateway in browser with nginx default error page
✕upstream prematurely closed connection in nginx error_log
✕connect() failed (111: Connection refused) while connecting to upstream
✕AWS ALB target group shows Unhealthy targets
✕kubectl get pods shows CrashLoopBackOff or 0/1 READY
✕Docker Compose service returns 502 immediately after stack starts

Common causes

•Application process crashed or was OOM-killed — no one listening on the upstream port
•proxy_pass directive points to wrong host or port in nginx.conf
•Kubernetes pod has no readinessProbe (or one that passes too early), so traffic is routed to a pod that is not ready
•Docker Compose startup order: nginx starts before the app container is ready
•Database or Redis connection pool exhausted, app hangs, nginx times out
•Zero-downtime deployment: old pods terminated before new pods pass health check

When it happens

•Immediately after a deployment when the new app version crashes on startup
•Under high load when the upstream process runs out of memory and is killed
•During rolling Kubernetes restarts between pod termination and readiness
•In local Docker Compose when you run docker compose up without depends_on health checks
•After a misconfigured nginx reverse proxy change that points to a wrong upstream address

Examples and fixes

nginx proxy_pass points to wrong upstream port

A typo in the upstream port means nginx connects to a port with nothing listening.

❌ Wrong

# nginx.conf — wrong port
server {
  listen 80;
  location / {
    proxy_pass http://127.0.0.1:3001;  # app runs on 3000
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_connect_timeout 10s;
    proxy_read_timeout 30s;
  }
}

✅ Fixed

# nginx.conf — correct port
server {
  listen 80;
  location / {
    proxy_pass http://127.0.0.1:3000;  # matches app listen port
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_connect_timeout 10s;
    proxy_read_timeout 30s;
  }
}

nginx will log 'connect() failed (111: Connection refused)' when no process is listening on the target port. Always cross-reference proxy_pass with the PORT your application actually binds to. Run 'ss -tlnp | grep 3000' on the server to confirm what is listening before reloading nginx. The connection refused error in nginx error_log is the fastest path to diagnosing this specific 502 variant.
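
The 'ss -tlnp' check above can also be scripted, which is handy in deploy hooks. The following is a minimal Python sketch, not part of nginx or any official tooling; the function name port_is_listening is a hypothetical helper. It attempts the same TCP connect that nginx performs against the upstream.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # Same TCP connect() that nginx attempts against the upstream.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (errno 111 on Linux) and timeouts.
        return False
```

Calling port_is_listening("127.0.0.1", 3000) before reloading nginx gives a quick yes or no on whether anything accepts connections on the port that proxy_pass targets.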

Docker Compose: nginx starts before app is ready

Without a health-check-based depends_on, nginx comes up and immediately gets 502 on the first requests.

❌ Wrong

# docker-compose.yml — no readiness check
services:
  nginx:
    image: nginx:alpine
    ports: ["80:80"]
    depends_on:
      - app
  app:
    build: .
    expose: ["3000"]
    environment:
      NODE_ENV: production

✅ Fixed

# docker-compose.yml — health-check dependency
services:
  nginx:
    image: nginx:alpine
    ports: ["80:80"]
    depends_on:
      app:
        condition: service_healthy
  app:
    build: .
    expose: ["3000"]
    environment:
      NODE_ENV: production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]  # assumes curl is installed in the app image
      interval: 5s
      timeout: 3s
      retries: 5

depends_on with just a service name only waits for the container to start, not for the process inside it to be ready. Adding a healthcheck and condition: service_healthy makes Docker Compose wait until your application is actually serving traffic before nginx starts routing to it. Your app must expose a health endpoint that returns 2xx quickly — /health or /ready that checks database connectivity is the standard pattern.

Why proxies return 502 instead of the real error

When nginx, HAProxy, or an AWS Application Load Balancer returns a 502, it is reporting a problem with its upstream — the backend server it was trying to forward your request to. The proxy itself received either no response at all, or a response that was so malformed it could not be forwarded. Understanding this distinction matters because it tells you where to look: the nginx access log will show a 502, but the nginx error log and the upstream application log contain the actual failure reason.

The most common upstream failure is a crashed process. A Node.js application that throws an uncaught exception will exit, leaving nginx with nothing to connect to. On Linux you can verify this in seconds: run 'systemctl status app.service' to see if the process is running, or 'ss -tlnp | grep 3000' to confirm something is actually listening on the port nginx expects. If the port has nothing listening, the error log will read 'connect() failed (111: Connection refused)' — a clear signal the upstream is down.

Memory exhaustion is a subtler cause. When a Node.js or Python process is OOM-killed by the Linux kernel, it exits immediately without closing connections cleanly. nginx receives an abrupt connection reset from the upstream and records 'upstream prematurely closed connection while reading response header from upstream'. The kernel's OOM kill event appears in 'dmesg | grep -i oom' or 'journalctl -k | grep oom'. After an OOM kill, restarting the process fixes the immediate 502, but the real fix is increasing container memory limits or reducing memory usage in the application.

Resource pool exhaustion is the hardest variant to diagnose because the upstream process is still running. When a database connection pool is full, the application hangs waiting for a free connection instead of responding to nginx. If nothing else intervenes, nginx reaches its proxy_read_timeout (default 60 seconds), logs 'upstream timed out', and returns 504 Gateway Timeout. The hang turns into a 502 when the application server gives up first: its own worker timeout kills the stuck worker or closes the socket, and nginx logs 'upstream prematurely closed connection'. The upstream application log will show requests queuing or pool-full errors in the same time window. Always check database metrics (active connections, wait queue length, and connection timeouts) whenever 502 or 504 errors appear intermittently under load.

In Kubernetes, 502 errors during deployments almost always come from traffic routing to terminating pods. During a rolling update, the old pod receives a SIGTERM and begins shutting down, but ingress controllers or load balancers may continue routing traffic to it for a few seconds. Setting a preStop sleep hook of 5 to 15 seconds gives the load balancer time to deregister the pod before the process actually stops. Pair this with a properly tuned terminationGracePeriodSeconds in your pod spec.
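
Tuning terminationGracePeriodSeconds comes down to simple arithmetic, because Kubernetes starts the grace period countdown before the preStop hook runs, so the sleep and the request drain share one budget. A minimal Python sketch follows; the function name and the drain figures in the usage note are illustrative assumptions, and 30 seconds is the Kubernetes default grace period.

```python
def grace_period_slack(pre_stop_sleep_s: float,
                       max_drain_s: float,
                       termination_grace_period_s: float = 30.0) -> float:
    """Seconds left in the grace period after the preStop sleep and the
    slowest in-flight request have both finished.

    A negative result means the container would be SIGKILLed mid-drain.
    """
    return termination_grace_period_s - (pre_stop_sleep_s + max_drain_s)
```

With a 10-second preStop sleep and requests that may take up to 15 seconds, grace_period_slack(10, 15) leaves 5 seconds of slack under the default. Raising the sleep to 15 and the drain to 20 gives a negative value, which signals that terminationGracePeriodSeconds needs to be increased.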

Step-by-step 502 diagnosis from logs

Start every 502 investigation in the nginx error log, not the access log. The access log records the 502 but not why. Run 'tail -f /var/log/nginx/error.log' and reproduce the request. The error log message tells you the exact failure mode: 'Connection refused' means nothing is listening, 'upstream prematurely closed connection' means the upstream crashed or closed the socket mid-response, and 'upstream timed out' means the upstream was too slow (nginx reports that last case to the client as 504 rather than 502).
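
The mapping from error log message to failure mode can be captured in a few lines when you need to triage many log lines at once. This is a rough Python sketch under the assumption that matching on message substrings is enough; classify_error_line and FAILURE_MODES are hypothetical names, not part of nginx.

```python
from typing import Optional

# Substrings nginx writes to error_log, paired with the failure mode each implies.
FAILURE_MODES = (
    ("Connection refused", "nothing is listening on the upstream port"),
    ("upstream prematurely closed connection", "upstream crashed or closed mid-response"),
    ("upstream timed out", "upstream was too slow to respond"),
)


def classify_error_line(line: str) -> Optional[str]:
    """Return the failure mode for one error_log line, or None if unrecognized."""
    for needle, meaning in FAILURE_MODES:
        if needle in line:
            return meaning
    return None
```

Feeding each line of the error log through classify_error_line and counting the results shows at a glance whether an incident is dominated by refused connections, crashes, or slow responses.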

For Kubernetes deployments, kubectl is your primary diagnostic tool. Run 'kubectl get pods -n your-namespace' to see pod status and restart count. A pod in CrashLoopBackOff is crashing on startup — 'kubectl logs pod-name --previous' shows the logs from the last crash. A pod that shows 0/1 READY is still alive but has not passed its readinessProbe, so the Service has removed it from the endpoint pool. Run 'kubectl describe pod pod-name' and look at the Events section at the bottom — it will show probe failures, OOM kills, and image pull errors explicitly.

In the browser DevTools Network tab, open the failing request and look at the Response tab. A 502 from nginx will usually include the nginx error page HTML or a brief JSON error body if your proxy adds one. The Timing tab shows whether the connection was refused immediately (fast failure, suggesting the upstream is completely down) or timed out after 30 to 60 seconds (suggesting the upstream is hanging). This timing difference is enough to narrow down the cause before you look at any server logs.

For AWS ALB, navigate to EC2 > Target Groups in the AWS console. Select your target group and check the Targets tab — each registered instance or pod shows its health check status and the failure reason. 'Health checks failed with these codes: 502' or 'Connection refused to health check port' appear directly in the console. Run 'aws elbv2 describe-target-health --target-group-arn arn:...' from the CLI for scriptable checks.

When you suspect a proxy_pass misconfiguration, use 'nginx -t' to validate the config syntax, then run 'curl -v http://127.0.0.1:3000/health' directly on the server to confirm the upstream responds before nginx ever gets involved. Use /tools/http-request-builder to test the endpoint from outside the server network and compare responses. If the direct curl succeeds but nginx returns 502, the problem is in the nginx configuration itself — check proxy_pass host, port, and trailing slash handling.
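
The direct-versus-proxied comparison can be sketched in Python as well. The names fetch_status and locate_fault are hypothetical, and the decision rule is a simplification of the reasoning above rather than a complete diagnosis. Environment proxy settings are bypassed so the direct request really goes straight to the upstream.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional

# An opener with an empty ProxyHandler ignores http_proxy/https_proxy variables.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses still carry a status code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None      # refused, reset, malformed response, or timeout


def locate_fault(direct_url: str, proxied_url: str) -> str:
    """Compare the upstream hit directly with the same path through the proxy."""
    direct = fetch_status(direct_url)
    proxied = fetch_status(proxied_url)
    if direct is None or direct >= 500:
        return "upstream"
    if proxied == 502:
        return "proxy configuration"
    return "no fault reproduced"
```

For example, locate_fault("http://127.0.0.1:3000/health", "http://localhost/health") run on the server itself returns "proxy configuration" when the application answers directly but nginx still returns 502.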

Reliable fixes for each 502 root cause

Once you have identified the root cause from the logs, apply the targeted fix rather than a broad configuration change. For a crashed process, restart it and address the crash cause: add a process manager like systemd or PM2 with automatic restart, configure memory limits to prevent OOM kills, and ensure uncaught exceptions are logged and cause a clean exit rather than a hung process.

For nginx proxy_pass misconfiguration, correct the upstream address and port, then run 'nginx -s reload' to apply the change without dropping connections. A useful pattern is to define the upstream as a named block: 'upstream app { server 127.0.0.1:3000; }' and then reference it with 'proxy_pass http://app;'. This keeps the port in one place and also enables load balancing across multiple upstream instances later.

For connection pool exhaustion, increase the pool size in your application configuration and verify database-side connection limits are not lower. In PostgreSQL, check 'max_connections' in postgresql.conf and compare it against the sum of all application pool sizes. Use PgBouncer as a connection pooler if the application cannot reduce its pool demands. In Redis, ensure the application releases connections after each request — connection leaks are common when error handling paths skip the connection release.
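
The comparison between max_connections and the sum of pool sizes is worth writing down explicitly, since it changes every time you scale out. A small Python sketch follows; pool_headroom is a hypothetical helper, and the default of 3 mirrors PostgreSQL's default superuser_reserved_connections, which you should confirm against your own server settings.

```python
from typing import Iterable


def pool_headroom(pool_sizes: Iterable[int],
                  max_connections: int,
                  reserved: int = 3) -> int:
    """Connections left on the database after every application instance
    fills its pool. pool_sizes has one entry per running instance.

    A negative result means the pools can collectively ask for more
    connections than the database will grant.
    """
    return (max_connections - reserved) - sum(pool_sizes)
```

Four instances with a pool of 20 each against max_connections = 100 leave pool_headroom([20] * 4, 100) == 17 spare connections. Scaling to six instances yields -23, the point at which a pooler such as PgBouncer or a smaller per-instance pool becomes necessary.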

For Kubernetes zero-downtime deployment issues, add a preStop lifecycle hook to your container spec. A simple 'sleep 10' in the preStop hook ensures the ingress controller removes the pod from rotation before the process starts shutting down. Combine this with a well-tuned readinessProbe that checks actual application health — hitting a database or cache — rather than just returning 200 from a process that is not yet fully initialized. Set failureThreshold and periodSeconds conservatively so the probe catches real failures without false positives.

For Docker Compose startup ordering, replace simple depends_on with health-check conditions. Add a lightweight health endpoint to your application that returns 200 only when the database connection is established and the application is ready to serve traffic. This is the same endpoint you should use for Kubernetes readinessProbe and ALB health checks, making it a single source of truth for application readiness across all deployment environments.

Edge cases that look like upstream issues but are not

A 502 is not always caused by the upstream server. Several proxy and network layer misconfigurations produce 502 errors that look identical to upstream failures in the access log but have completely different causes.

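SSL termination mismatch is one example. If nginx is configured with 'proxy_pass https://upstream' but the upstream speaks plain HTTP, the TLS handshake fails and nginx returns 502. The reverse also fails: with 'proxy_pass http://upstream' against a TLS-only port, the upstream rejects the plaintext request or closes the connection. Certificate validity only matters when 'proxy_ssl_verify on' is set, because nginx does not verify upstream certificates by default. Confirm the upstream protocol matches the proxy_pass scheme, and if verification is enabled for internal upstreams, make sure proxy_ssl_trusted_certificate points at the CA that signed their certificates.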

Buffer size limits cause 502 for responses with large headers. nginx reads the upstream response header into a single buffer sized by proxy_buffer_size. If the upstream sends many or large cookies, JWT tokens in headers, or extensive debug headers, the header does not fit, nginx logs 'upstream sent too big header while reading response header from upstream', and the client gets a 502. Increase proxy_buffer_size from its default (4k or 8k, depending on the platform) to 16k and set proxy_buffers to '4 16k' as a starting point.

IPv6 vs IPv4 mismatches appear in containerized environments. If nginx resolves the upstream hostname to an IPv6 address but the upstream only listens on IPv4, or vice versa, the connection fails with a 502. Use explicit IP addresses in proxy_pass during debugging to eliminate DNS resolution as a variable.

Cloudflare and other CDN layers add their own 502 responses that originate from the CDN, not your server. Cloudflare's 502 pages include a distinctive ray ID in the footer. To determine whether the 502 comes from Cloudflare or your nginx, bypass Cloudflare by connecting directly to your server's IP. If the direct connection succeeds, the CDN layer is where the timeout or connection failure is occurring — check Cloudflare's Health Checks settings and origin server connection settings.

Mistakes that make 502 errors harder to diagnose

The most common mistake is reading only the nginx access log and drawing conclusions from its one-line 502 entry. The nginx access log records the status code sent to the client, not what the upstream returned or why the request failed. The error log is where the actual failure is described. Many engineers spend hours looking at the wrong log file.

Ignoring the upstream application log is the second most costly mistake. When nginx returns 502, the application that was supposed to handle the request usually left a log entry too — whether it crashed, refused the connection, or returned a malformed response. The upstream log is almost always more informative than the proxy log. Make sure your application logs to stderr or a file that you can access, and ensure log rotation is not suppressing recent entries.

Changing nginx timeouts without addressing the root cause is a band-aid that creates new problems. Increasing proxy_read_timeout from 60 to 300 seconds may reduce the rate of gateway errors temporarily, but it means clients wait up to 5 minutes for a response when the upstream is slow, leading to connection pile-up and eventual memory exhaustion on the proxy itself. Fix the slow upstream instead of raising the timeout.

Not adding a health check endpoint to the application is a structural omission that makes all future debugging harder. Every application behind a proxy or load balancer should expose a /health or /ready endpoint that the proxy, load balancer, and Kubernetes readinessProbe can poll. This endpoint should verify that the application's critical dependencies — database, cache, message queue — are reachable. Without it, the proxy cannot distinguish between a starting application and a crashed one, and zero-downtime deployments become unreliable.
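
As an illustration of such an endpoint, here is a minimal standard-library Python sketch. It is an assumption-laden example rather than a recommended production server: make_health_handler is a hypothetical name, and the dependency checks are plain callables you would replace with real database, cache, or queue pings.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict


def make_health_handler(checks: Dict[str, Callable[[], bool]]):
    """Build a handler whose /health returns 200 only if every check passes."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            failed = []
            for name, check in checks.items():
                try:
                    ok = check()
                except Exception:
                    ok = False  # a check that raises counts as a failed dependency
                if not ok:
                    failed.append(name)
            body = ("ok" if not failed else "failing: " + ", ".join(failed)).encode()
            # 503 tells the proxy or probe to keep traffic away from this instance.
            self.send_response(200 if not failed else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep health-check polling out of the request log

    return HealthHandler
```

To try it, construct HTTPServer(("0.0.0.0", 3000), make_health_handler({"db": ping_db})) and call serve_forever(), where ping_db is your own function returning True when the database answers. Keeping each check fast matters, because probes typically have timeouts of only a few seconds.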

Using 'docker compose up' without '--wait' or health-check conditions in CI pipelines causes intermittent test failures that show up as 502 errors. The container starts, but the process inside takes 2 to 5 seconds to initialize. Run 'docker compose up --wait', which blocks until services are running or healthy, or use a tool such as dockerize to wait for the port before running tests.

Proxy and upstream practices that prevent 502

A resilient reverse proxy setup requires deliberate configuration at each layer. Start with explicit upstream health checks. In nginx, active checks use the health_check directive, which is available only in NGINX Plus and is placed in the location that proxies to the upstream group; open-source nginx offers passive checks through max_fails and fail_timeout on each upstream server. Alternatively, replicate the behavior with a simple check endpoint that the load balancer polls independently. AWS ALB and GCP Load Balancer both support configurable health check paths, intervals, and thresholds. Tune them for production services instead of accepting defaults such as the ALB's 30-second interval.

Set proxy timeouts explicitly rather than relying on defaults. proxy_connect_timeout should be short — 5 to 10 seconds — because connection establishment to a healthy local upstream should be near-instant. proxy_read_timeout should reflect your application's legitimate maximum response time, not an arbitrary large number. If your API endpoints are designed to respond within 5 seconds, set proxy_read_timeout to 10 seconds and treat any request exceeding that as a bug in the application to fix.

For Kubernetes, define resource requests and limits on every container. Missing CPU requests let a container be starved on a busy node, and CPU limits set too low cause throttling; either way the container responds slowly under load, which eventually triggers 502 or 504 errors at the ingress layer. Memory limits confine a leak to the offending container instead of letting it exhaust the node and take neighboring pods down with it. Use Vertical Pod Autoscaler in recommendation mode to observe actual resource usage before setting hard limits.

Implement graceful shutdown in your application. When the process receives SIGTERM, it should stop accepting new connections, finish in-flight requests, and exit cleanly. Node.js requires explicit handling of the process SIGTERM signal to call server.close(). Without graceful shutdown, a rolling deployment terminates pods that have active requests, producing 502 errors for those requests. Use /tools/http-request-builder to test your health endpoint and simulate the exact request pattern your proxy uses during health checks.
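
The stop-accepting, drain, exit sequence looks much the same in any runtime. Below is a minimal Python sketch using only the standard library; GracefulServer and run_until_sigterm are hypothetical names, and the example assumes a POSIX system and the single-threaded HTTPServer, where shutdown() returns only after the request currently being handled has finished.

```python
import signal
import threading
from http.server import HTTPServer


class GracefulServer:
    """Runs an HTTPServer until a stop is requested, then drains and exits."""

    def __init__(self, server: HTTPServer):
        self.server = server
        self._stop = threading.Event()
        self._worker = threading.Thread(target=server.serve_forever)

    def request_stop(self, signum=None, frame=None):
        # Usable as a signal handler: it only sets a flag and never blocks.
        self._stop.set()

    def run(self):
        self._worker.start()
        self._stop.wait()            # sleep until a stop is requested
        self.server.shutdown()       # stop accepting; the in-flight request completes
        self._worker.join()
        self.server.server_close()   # release the listening socket


def run_until_sigterm(server: HTTPServer) -> None:
    """Install a SIGTERM handler (main thread only) and serve until it fires."""
    graceful = GracefulServer(server)
    signal.signal(signal.SIGTERM, graceful.request_stop)
    graceful.run()
```

The handler deliberately does no work beyond setting the event, so the actual draining happens on the main thread. A threaded or async server needs its own mechanism for waiting on in-flight requests, which this sketch does not cover.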

502 Bad Gateway fix checklist

✓Check nginx error_log — not access_log — for the exact upstream error message
✓Verify the upstream process is running: ss -tlnp | grep PORT or kubectl get pods
✓Confirm proxy_pass host and port exactly match where the app listens
✓Check upstream application log for crashes, OOM kills, or connection errors
✓Run dmesg | grep -i oom to detect kernel OOM kills of the upstream process
✓For Kubernetes: kubectl describe pod and check readinessProbe failure events
✓For Docker Compose: add healthcheck + condition: service_healthy dependency
✓Test the upstream directly with curl before reloading nginx configuration

Frequently asked questions

Is a 502 error the client's fault or the server's fault?

A 502 is always a server-side error. The proxy received a request from the client correctly but could not get a valid response from the upstream backend. The client did nothing wrong — the error is between the proxy (nginx, ALB, Cloudflare) and the application server behind it. The client should retry or wait for the issue to be resolved on the server side.

What does 'upstream prematurely closed connection' mean in nginx?

It means the upstream application server closed the TCP connection before sending a complete HTTP response. This usually happens when the upstream process crashes mid-request, is OOM-killed, or exits due to an unhandled exception. Check the upstream application log and run 'dmesg | grep -i oom' to rule out kernel memory kills. The application process likely needs to be restarted and the crash cause addressed.

How is a 502 different from a 504?

A 502 Bad Gateway means the proxy got an invalid response, or no usable response at all, from the upstream, and it typically fails fast. A 504 Gateway Timeout means the upstream did not respond before the proxy's timeout expired, so the proxy gave up waiting. In practice: 502 usually means the upstream crashed or refused the connection. 504 usually means the upstream is alive but processing slowly, often due to a slow database query or long-running computation.

Why does my Kubernetes pod show 0/1 READY and cause 502?

A pod shows 0/1 READY when it has not yet passed its readinessProbe. Kubernetes removes pods that are not Ready from the Service endpoint pool, so no traffic should reach them. If you are still seeing 502, the readinessProbe may not be configured, allowing traffic to reach a pod that is starting up but not yet able to handle requests. Check 'kubectl describe pod' for probe failure events.

Can a slow database query cause a 502?

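Indirectly. When a database query takes longer than the nginx proxy_read_timeout (default 60 seconds), nginx logs 'upstream timed out', closes the connection, and returns 504 Gateway Timeout rather than 502. A 502 appears when something gives up less gracefully first: the application server's own worker timeout kills the stuck worker, or the process runs out of memory while the query is pending, and nginx sees the connection close before a response arrives. Either way, fix the slow query using EXPLAIN ANALYZE in PostgreSQL or EXPLAIN in MySQL, and add an index or rewrite the query to stay well under the timeout threshold.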

How do I fix 502 errors during rolling Kubernetes deployments?

Add a preStop lifecycle hook with a short sleep of 5 to 15 seconds to your container spec. This gives the ingress controller time to remove the terminating pod from rotation before the process stops. Also ensure your readinessProbe is correctly configured so new pods only receive traffic after they are fully initialized. Combine both to achieve zero-downtime rolling updates without 502 spikes.

Why does nginx return 502 but curl to the upstream works fine?

The most common reason is that nginx is connecting to a different address or port than your manual curl test. Double-check the proxy_pass host and port against what curl uses. Another cause is that the upstream only handles a limited number of concurrent connections, and nginx is exhausting them while your serial curl test works fine. Check upstream connection limits and nginx worker_connections settings.

What is the fastest way to diagnose a 502 in production?

Run 'tail -50 /var/log/nginx/error.log' to see the actual error message, then immediately check whether the upstream process is running with 'systemctl status' or 'kubectl get pods'. If the process is running, check its own application log for errors in the same time window. This three-step sequence identifies the failure layer in under two minutes for most 502 scenarios.

Last updated: 2026-05-06.