HTTP 429 Too Many Requests — Handle Rate Limits Without Breaking Your Integration

Quick answer

💡HTTP 429 Too Many Requests means the server is enforcing a rate limit and your client is sending requests faster than that limit allows. The response often includes a Retry-After header (or X-RateLimit-Reset) telling you when to retry. Read that header and wait before retrying. If you retry immediately, you will stay rate-limited. Use the HTTP Request Builder to inspect the full response headers from the rate-limited API.

Error symptoms

  • HTTP 429 Too Many Requests response status code
  • Response body: { "error": "rate_limit_exceeded" } or similar
  • Retry-After response header with a wait time in seconds
  • X-RateLimit-Remaining: 0 in response headers
  • Requests succeed in testing but fail in production under load
  • GitHub API returns 403 with message 'API rate limit exceeded for...' instead of 429

Common causes

  • Sending too many API requests per minute or per hour without throttling
  • Not reading the Retry-After header and retrying immediately after a 429
  • Multiple instances of your application sharing one API key without a distributed rate limiter
  • Polling an API in a tight loop instead of using webhooks or long polling
  • GitHub API: making unauthenticated requests which have much lower limits than authenticated
  • OpenAI API: exceeding tokens-per-minute limit even when requests-per-minute is within quota

When it happens

  • During bulk data processing that calls an API for each record without throttling
  • In a microservices architecture where multiple services share a single third-party API key
  • After a traffic spike where users trigger many API calls simultaneously
  • During CI/CD pipelines that clone GitHub repos or make API calls without caching
  • When running parallel integration tests that each make real API calls

Examples and fixes

Retrying immediately on 429 keeps the client rate-limited and may get the IP blocked.

Ignoring Retry-After and retrying immediately

❌ Wrong

// Naive retry — hammers the API and stays rate-limited
async function fetchWithRetry(url, options) {
  for (let i = 0; i < 5; i++) {
    const res = await fetch(url, options);
    if (res.status === 429) {
      console.log('Rate limited, retrying...');
      continue; // Retries immediately — wrong
    }
    return res;
  }
  throw new Error('Max retries exceeded');
}

✅ Fixed

// Correct retry — reads Retry-After and waits
async function fetchWithRetry(url, options, maxRetries = 5) {
  for (let i = 0; i < maxRetries; i++) {
    const res = await fetch(url, options);
    if (res.status === 429) {
      const retryAfter = res.headers.get('Retry-After');
      // Retry-After can be seconds (integer) or HTTP-date
      let waitMs = retryAfter
        ? (isNaN(retryAfter) ? new Date(retryAfter) - Date.now() : retryAfter * 1000)
        : NaN;
      // Header missing, unparseable, or already in the past: exponential backoff fallback
      if (!(waitMs >= 0)) waitMs = Math.min(1000 * 2 ** i, 32000);
      await new Promise(r => setTimeout(r, waitMs));
      continue;
    }
    return res;
  }
  throw new Error('Max retries exceeded after rate limiting');
}

The Retry-After header can be either an integer (seconds to wait) or an HTTP-date string. Your code must handle both formats. When the header is absent, fall back to exponential backoff: 1s, 2s, 4s, 8s, with a cap. Add random jitter to that fallback by multiplying the wait by Math.random() so that multiple clients do not retry simultaneously. This thundering herd effect is what causes rate limit cascades. When the server did send Retry-After, treat its value as a minimum and add any jitter on top of it rather than shortening it.
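The backoff fallback described above can be sketched as a small helper. This is a minimal sketch, not a library API: the name backoffDelayMs, its option names, and the injectable random parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: capped exponential backoff with full jitter.
// attempt is zero-based: 0 -> up to 1s, 1 -> up to 2s, 2 -> up to 4s, ...
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 32000, random = Math.random } = {}) {
  // Exponential growth, capped so a long outage cannot produce huge waits.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick a uniform delay in [0, ceiling).
  return Math.floor(random() * ceiling);
}
```

Passing random as a parameter keeps the helper deterministic in tests. In production code the default Math.random is enough.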

Processing 10,000 items in parallel saturates the API rate limit instantly.

Throttling concurrent requests with p-limit

❌ Wrong

// Processes all items in parallel — instant rate limit
async function processAll(items) {
  const results = await Promise.all(
    items.map(item =>
      fetch(`https://api.example.com/process/${item.id}`, {
        headers: { Authorization: `Bearer ${API_KEY}` }
      })
    )
  );
  return results;
}
// 10,000 concurrent requests → immediate 429 storm

✅ Fixed

// Throttled with p-limit: max 5 concurrent, automatic queuing
import pLimit from 'p-limit';
const limit = pLimit(5); // max 5 concurrent requests

async function processAll(items) {
  const results = await Promise.all(
    items.map(item =>
      limit(() =>
        fetch(`https://api.example.com/process/${item.id}`, {
          headers: { Authorization: `Bearer ${API_KEY}` }
        })
      )
    )
  );
  return results;
}
// Items queue up, only 5 run at a time — respects rate limit

p-limit (npm) is the simplest throttling primitive for Node.js. It caps concurrency, not request rate, so set the limit to a value that keeps your request rate comfortably under the API's limit. For a 100 requests/minute API and requests that average 500ms each, 100/60*0.5 = ~0.8 concurrent requests would saturate it. Even a limit of 1 allows 120 requests per minute at that latency, so a quota this tight also needs a delay between requests (about 600ms between request starts for 100 requests/minute). For distributed architectures where multiple Node.js processes share one API key, p-limit is insufficient. Use Bottleneck with Redis-backed distributed rate limiting instead.
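The arithmetic behind choosing a limit can be checked with two small functions. This is a rough sketch under the assumption that every concurrency slot is always busy and latency is constant. The function names are illustrative.

```javascript
// Requests per minute a pool sustains when every slot is always busy.
function sustainedRatePerMinute(concurrency, avgLatencyMs) {
  return (concurrency * 60000) / avgLatencyMs;
}

// Minimum gap between request starts that keeps the rate at or under the limit.
function minSpacingMs(limitPerMinute) {
  return Math.ceil(60000 / limitPerMinute);
}

console.log(sustainedRatePerMinute(1, 500)); // 120
console.log(minSpacingMs(100));              // 600
```

With 500ms requests, a single slot already sustains 120 requests per minute, so a 100 requests/minute quota needs spacing between requests in addition to a concurrency cap.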

How rate limiting algorithms work and what triggers them

HTTP 429 is the server telling your client to slow down. Understanding the algorithm behind a rate limit helps you decide how to respond — and how to avoid the limit in the first place.

The token bucket algorithm is the most common. Imagine a bucket that starts with N tokens and refills at a fixed rate. Each request consumes one token. When the bucket is empty, requests are rejected with 429 until the bucket refills. Token bucket allows short bursts: if you have not made requests recently, you can send several in quick succession, up to the bucket's capacity, without hitting the limit.
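A token bucket fits in a few lines. The sketch below is a simplified illustration, not any provider's implementation. The class name, method name, and the explicit clock argument are assumptions chosen so the behavior is easy to verify.

```javascript
// Minimal token bucket. The clock is passed in so behavior is deterministic.
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // the bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes a token when the request is allowed.
  // Returns false when the bucket is empty (the server would answer 429).
  tryRemove(nowMs = Date.now()) {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With a capacity of 5 and a refill rate of 1 token per second, five back-to-back requests pass, the sixth is rejected, and one more passes a second later.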

The sliding window algorithm tracks requests within a rolling time period. Take a limit of 5,000 requests per hour, the figure GitHub's REST API applies to authenticated users. Rather than a fixed hourly reset, a sliding window moves continuously. If you made 4,990 requests in the past 59 minutes, you have 10 left for the next minute regardless of where the clock hour boundary falls. When an API sends X-RateLimit-Reset, that header gives the Unix timestamp at which your quota is fully restored.
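One common way to implement a rolling window is to keep a log of accepted request times. The sketch below assumes that approach. Real services often use cheaper approximations, and the class and method names here are illustrative.

```javascript
// Sliding window log: at most `limit` accepted requests in any `windowMs` span.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = []; // times of accepted requests, oldest first
  }

  tryAcquire(nowMs = Date.now()) {
    // Drop requests that have aged out of the rolling window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= nowMs - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}
```

Capacity comes back one request at a time as old entries age out, so there is no single moment when the whole quota resets.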

The fixed window algorithm resets counts at a fixed time: every minute, every hour. This creates thundering herd effects at reset boundaries when all rate-limited clients retry simultaneously. Some APIs that use fixed windows add random jitter to their Retry-After header to spread out the retry storm.
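A fixed window counter is the simplest of the three to write, and a sketch makes the boundary effect visible. The class below is illustrative only, with windows assumed to be aligned to multiples of windowMs.

```javascript
// Fixed window counter: the count resets whenever a new window begins.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowIndex = -1;
    this.count = 0;
  }

  tryAcquire(nowMs = Date.now()) {
    const index = Math.floor(nowMs / this.windowMs);
    if (index !== this.windowIndex) {
      // A new window started: forget everything counted so far.
      this.windowIndex = index;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}
```

With a limit of 2 per second, requests at 990ms and 995ms are accepted and so are requests at 1000ms and 1001ms. That is four requests in about 11ms, which is why clients that all wake up at the reset boundary produce a spike.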

OpenAI's API adds a second dimension to rate limiting: tokens per minute (TPM) in addition to requests per minute (RPM). You can stay within RPM but still hit 429 by sending very large prompts. GPT-4 and GPT-3.5-turbo have different TPM limits — check the OpenAI usage dashboard for your specific tier. When you hit TPM limits, the response includes a message specifying which limit you exceeded.

Stripe enforces separate read and write limits: 100 reads per second and 100 writes per second per secret key. The write limit catches bulk create operations: creating 500 customer objects in parallel will exceed the limit immediately. For Stripe retries specifically, always use idempotency keys to ensure that retried requests do not create duplicate charges or subscriptions. Check Stripe's 429 responses for a Retry-After header, which tells you how long to wait.

Reading rate limit headers before the 429 hits

Most APIs send rate limit status headers with every response, not just 429 responses. Reading these headers proactively lets you slow down before hitting the limit rather than recovering from it.

The de facto standard headers are X-RateLimit-Limit (your total allowed requests per window), X-RateLimit-Remaining (how many you have left), and X-RateLimit-Reset (Unix timestamp when the window resets). GitHub, Twitter, and many other major APIs use these names. Check every response for these headers and log them at debug level so you can see how quickly you are consuming your quota.
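Reading these headers takes only a few lines. The sketch below assumes the headers object exposes a get(name) method, as the fetch Response.headers object does, and that X-RateLimit-Reset is a Unix timestamp in seconds as described above. The function name readRateLimit is illustrative.

```javascript
// Reads the common X-RateLimit-* headers. Missing or malformed values become null.
function readRateLimit(headers) {
  const toNumber = (name) => {
    const raw = headers.get(name);
    if (raw == null || String(raw).trim() === '') return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  const resetSeconds = toNumber('X-RateLimit-Reset');
  return {
    limit: toNumber('X-RateLimit-Limit'),
    remaining: toNumber('X-RateLimit-Remaining'),
    // Assumption: the reset value is a Unix timestamp in seconds.
    resetAt: resetSeconds === null ? null : new Date(resetSeconds * 1000),
  };
}
```

Calling readRateLimit(res.headers) after every response and logging the result at debug level shows how fast the quota is being consumed.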

When you do receive a 429, always check the Retry-After header before any other header. This header can be in two formats: an integer representing seconds to wait ('Retry-After: 30') or an HTTP-date string ('Retry-After: Thu, 07 May 2026 12:00:00 GMT'). Your client must handle both. Parse the integer with parseInt(retryAfterHeader, 10) and multiply it by 1000 to get the wait in milliseconds. Parse the HTTP-date with new Date(retryAfterHeader).getTime() - Date.now(), and treat a negative result as zero.
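Both formats can be handled by one parser. This is a minimal sketch. The name retryAfterToMs and the choice to return null for missing or unparseable values are assumptions, made so the caller can fall back to its own backoff.

```javascript
// Converts a Retry-After header value into milliseconds to wait.
// Returns null when the header is missing or cannot be parsed.
function retryAfterToMs(value, nowMs = Date.now()) {
  if (value == null) return null;
  const trimmed = String(value).trim();
  // Delta-seconds form: digits only, for example "30".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // HTTP-date form, for example "Thu, 07 May 2026 12:00:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  // A date that has already passed means there is nothing left to wait for.
  return Math.max(0, dateMs - nowMs);
}
```

Typical use is retryAfterToMs(res.headers.get('Retry-After')), followed by a backoff delay when the result is null.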

In the browser DevTools Network tab, select the failed request and switch to the Response Headers panel. You will see the full list of rate limit headers and their current values. Pay attention to whether Retry-After is present — some APIs (OpenAI included) sometimes omit it and expect clients to implement their own backoff strategy. Use /tools/http-request-builder to capture the complete headers from the API outside your application code, which is useful when debugging whether your application is correctly reading and forwarding headers.

For GitHub specifically, the response carries more detail than the status code. The X-RateLimit-Resource header names the resource being limited (core, search, graphql), X-RateLimit-Reset gives the reset time, and the message in the response body indicates whether the request was authenticated. Authenticated requests have 5,000 requests/hour while unauthenticated requests have only 60/hour. Many rate limit problems with GitHub come simply from running unauthenticated requests that could be authenticated with a personal access token or GitHub App credentials.

Strategies that reliably stay under rate limits

The most robust strategy is to never hit the rate limit in the first place, which requires throttling outgoing requests to stay comfortably below the limit. For single-process Node.js applications, the p-limit package provides simple concurrency control. For distributed systems where multiple processes or services share an API key, you need a centralized rate limiter backed by Redis.

Bottleneck (npm) supports Redis-backed distributed rate limiting. Initialize it with the API's requests-per-second limit and share the Redis key across all application instances. Bottleneck queues excess requests rather than dropping them, which is usually what you want for batch processing. The minTime option sets minimum milliseconds between requests, which directly translates to a per-second rate: minTime: 100 limits you to 10 requests per second.

For GitHub API specifically, use conditional requests with ETag headers. When you fetch a resource, GitHub returns an ETag header. On subsequent requests, send that value as If-None-Match. If the resource has not changed, GitHub returns 304 Not Modified — and crucially, 304 responses do not count against your rate limit. For polling scenarios, this means you can check frequently without consuming quota. Cache the ETag and last response in Redis or an in-memory store.
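The conditional request flow looks roughly like this. The sketch assumes an in-memory Map as the cache and a JSON response body. fetchWithEtag and its parameters are hypothetical names, and the fetch function is injected so it can be swapped for a fake.

```javascript
// cache: Map from URL to { etag, body }. fetchImpl: any fetch-compatible function.
async function fetchWithEtag(url, cache, fetchImpl = fetch) {
  const cached = cache.get(url);
  // Send the stored ETag so the server can answer 304 when nothing changed.
  const headers = cached ? { 'If-None-Match': cached.etag } : {};
  const res = await fetchImpl(url, { headers });

  if (res.status === 304 && cached) {
    return cached.body; // unchanged: reuse the stored body
  }
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const body = await res.json();
  const etag = res.headers.get('ETag');
  if (etag) cache.set(url, { etag, body });
  return body;
}
```

For a multi-process deployment the Map would be replaced by a shared store such as Redis, as the paragraph above suggests.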

For OpenAI API, implement a token counter before each request. Estimate the prompt token count using the tiktoken library (which matches OpenAI's tokenizer exactly) and check your current TPM usage against the limit. If the request would exceed the limit, wait until the window resets before sending. For GPT-4 calls where token costs are high, batch as much context into a single request as possible rather than making many smaller requests.
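A client-side budget for tokens per minute can be sketched as a rolling 60-second log. The sketch takes the token estimate as an input instead of calling a tokenizer, and the class name TokenBudget and its methods are assumptions. The server's own accounting may differ from this approximation.

```javascript
// Tracks estimated tokens sent during the last 60 seconds.
class TokenBudget {
  constructor(tokensPerMinute) {
    this.tokensPerMinute = tokensPerMinute;
    this.entries = []; // { atMs, tokens }, oldest first
  }

  // Tokens recorded within the last minute.
  used(nowMs = Date.now()) {
    while (this.entries.length > 0 && this.entries[0].atMs <= nowMs - 60000) {
      this.entries.shift();
    }
    return this.entries.reduce((sum, e) => sum + e.tokens, 0);
  }

  // Milliseconds to wait before a request of `tokens` fits. 0 means send now.
  waitMs(tokens, nowMs = Date.now()) {
    let used = this.used(nowMs);
    if (used + tokens <= this.tokensPerMinute) return 0;
    for (const entry of this.entries) {
      used -= entry.tokens; // this entry will have expired by then
      if (used + tokens <= this.tokensPerMinute) return entry.atMs + 60000 - nowMs;
    }
    return Infinity; // the request alone is larger than the per-minute limit
  }

  record(tokens, nowMs = Date.now()) {
    this.entries.push({ atMs: nowMs, tokens });
  }
}
```

Before each call, estimate the prompt tokens, sleep for waitMs(estimate), then call record(estimate) when the request is sent.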

For Stripe, use idempotency keys for all write operations. Pass the Idempotency-Key header with a UUID that uniquely identifies the intent of the request. If you retry a failed request with the same key, Stripe returns the original response without creating a duplicate object. This makes retries safe for create operations and means you can retry 429 responses without risk. Store the idempotency key alongside the pending operation in your database so you can resume after process restarts.
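The important detail is that the key is generated once and reused on every attempt. The sketch below shows that pattern against a generic JSON endpoint. Idempotency-Key is the header name Stripe documents, while the function name, option names, and JSON body are placeholders (Stripe's own API takes form-encoded bodies).

```javascript
// Sends a POST and reuses one idempotency key across every retry attempt.
async function postWithIdempotency(url, body, {
  fetchImpl = fetch,
  idempotencyKey = globalThis.crypto.randomUUID(), // generated once, before the first attempt
  maxAttempts = 3,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Idempotency-Key': idempotencyKey, // same key on every attempt
      },
      body: JSON.stringify(body),
    });
    if (res.status !== 429) return res;
    // Simple exponential wait between attempts. Skip it after the last one.
    if (attempt < maxAttempts - 1) await sleep(1000 * 2 ** attempt);
  }
  throw new Error('Still rate limited after retries');
}
```

To survive process restarts, pass in an idempotencyKey loaded from the stored pending operation instead of relying on the generated default.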

The queue-based architecture is the most scalable solution for high-volume API consumers. Instead of calling the API directly from request handlers, add jobs to a queue (SQS, BullMQ, RabbitMQ). A worker pool pulls jobs from the queue at a controlled rate that stays within the API limit. The queue absorbs traffic spikes without triggering rate limits, and you can tune the worker concurrency precisely to the API's quota.
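The worker side of this architecture can be illustrated with an in-memory stand-in for the queue. This is a sketch only: a real deployment would use SQS, BullMQ, or RabbitMQ as described above, and createPacedQueue with its parameters is a hypothetical name.

```javascript
// In-memory stand-in for a job queue: jobs are started at most once per intervalMs.
function createPacedQueue(intervalMs, sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))) {
  const jobs = [];
  let draining = false;

  async function drain() {
    draining = true;
    while (jobs.length > 0) {
      const { task, resolve, reject } = jobs.shift();
      // Start the job without waiting for it to finish.
      Promise.resolve().then(task).then(resolve, reject);
      // Pace job starts, not job completions.
      await sleep(intervalMs);
    }
    draining = false;
  }

  return {
    // Enqueues a function that returns a promise. Resolves with that function's result.
    add(task) {
      return new Promise((resolve, reject) => {
        jobs.push({ task, resolve, reject });
        if (!draining) drain();
      });
    },
  };
}
```

An interval of 600ms corresponds to 100 job starts per minute. Request handlers call add and return, while the queue absorbs the spike.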

Rate limit edge cases that catch experienced engineers

GitHub API returns 403 instead of 429 for rate limit violations in some scenarios. When you exceed the primary rate limit on the REST API, you get 403 with a message body containing 'API rate limit exceeded'. This is a historical inconsistency — GitHub's secondary rate limits (abuse detection) do return 429. Your error handling must check for both 403 with a rate limit body and 429 when working with GitHub.
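A check that covers both status codes might look like the following. It is a heuristic sketch, not GitHub's official guidance: the function name is hypothetical, and the signals used are a quota of zero, a Retry-After header, or a rate limit message in the body.

```javascript
// Returns true when a GitHub REST response looks like a rate limit
// rather than an ordinary permissions error.
function isGitHubRateLimited(status, headers, bodyText) {
  if (status === 429) return true;
  if (status !== 403) return false;
  // A 403 is ambiguous, so look for rate limit signals.
  if (headers.get('X-RateLimit-Remaining') === '0') return true;
  if (headers.get('Retry-After') != null) return true;
  return /rate limit/i.test(bodyText || '');
}
```

A 403 that fails all three checks is treated as a permissions problem and should not be retried.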

Shared API keys across microservices create distributed rate limit exhaustion that is difficult to debug. Service A and Service B both use the same API key. Service A runs a batch job that consumes the entire quota. Service B starts returning 429 errors that appear unrelated to the batch job. The fix is to either allocate separate API keys per service or use a centralized rate limiter with a shared state (Redis) that all services check before making requests.

The thundering herd problem occurs when many clients receive a 429 simultaneously and all wait for the exact Retry-After time, then retry at once. This creates a second wave of 429 responses. The solution is to add random jitter to the retry wait time. For backoff delays you compute yourself, the full jitter strategy multiplies the calculated wait by a random number between 0 and 1: waitMs * Math.random(). For a server-provided Retry-After, wait at least the stated time and add a random extra delay on top, because retrying earlier than the server asked is likely to earn another 429. Either way, retries are spread out and synchronized storms are avoided.
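A variant that never retries earlier than the server asked adds the random component on top of Retry-After instead of scaling it down. The helper below is a sketch of that variant. The name retryDelayWithJitter and the 1000ms default spread are assumptions.

```javascript
// Waits at least retryAfterMs, plus a random extra of up to spreadMs.
function retryDelayWithJitter(retryAfterMs, spreadMs = 1000, random = Math.random) {
  return retryAfterMs + Math.floor(random() * spreadMs);
}

// Three clients that all received "Retry-After: 30" no longer wake up together.
const delays = [0.1, 0.5, 0.9].map((r) => retryDelayWithJitter(30000, 1000, () => r));
console.log(delays); // [ 30100, 30500, 30900 ]
```

A wider spreadMs separates clients further at the cost of a longer average wait.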

Some APIs enforce IP-level rate limits separate from API key limits. Cloudflare and other DDoS protection services may apply IP rate limits to your outbound requests before they even reach the API's authentication layer. These show up as 429 or 503 responses that look identical to API-level rate limits but cannot be fixed by authenticating. If you make API calls from ephemeral infrastructure (Kubernetes pods, serverless) with many IP addresses, each IP may be rate-limited independently even with low per-IP request rates. Using a NAT gateway with a static IP routes all outbound traffic through a single IP address that you can get allowlisted if needed.

Mistakes that keep clients stuck in rate limit loops

Not reading the Retry-After header is the most common mistake and the most expensive one. Engineers implement a fixed retry delay — 'wait 5 seconds and try again' — without checking what the API actually requests. If the API enforces a 60-second window and you retry after 5 seconds, you stay rate-limited for the entire window and waste retries. Always read Retry-After first; use your own backoff strategy only when the header is absent.

Using exponential backoff without a cap creates unpredictable retry durations. Starting at 1 second and doubling each time: 1, 2, 4, 8, 16, 32, 64, 128 seconds. Without a cap, a long outage results in waits measured in hours. Cap the maximum wait at 32 to 60 seconds for most APIs. Some engineers accidentally use seconds when the API returns milliseconds (or vice versa), causing either immediate retries or waits of thousands of seconds.

Not using idempotency keys for write operations means retries can create duplicate records. If a POST request to create an order times out and you retry, you may create two orders. Every POST, PUT, and PATCH to APIs that support idempotency keys should always include them. Generate the key before the first attempt and store it with the operation so it survives process restarts. Stripe, PayPal, and many financial APIs support this pattern explicitly.

Polling a resource in a tight loop instead of using webhooks is a structural problem that wastes rate limit quota on unchanged data. Some APIs that publish webhooks set lower rate limits on polling endpoints by design. Subscribe to webhook events for state changes and only make synchronous API calls when you need to create or modify resources, not to check whether something changed. Use /tools/http-request-builder and /tools/cors-tester to verify your webhook endpoint is reachable from the API provider's servers before investing in the integration.

Rate limit handling that survives production load

Build rate limit handling into a centralized API client rather than scattering retry logic across every API call site. Create a wrapper class or function that accepts a fetch function and rate limit parameters, applies throttling and backoff automatically, and exposes the same interface as the underlying API call. All code that calls the external API uses this wrapper — rate limit handling is not duplicated.
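A centralized wrapper of this kind could be sketched as a factory that returns a fetch-like function. Everything here is illustrative: createApiClient and its options are hypothetical names, and for brevity this version handles only the integer-seconds form of Retry-After and falls back to jittered backoff otherwise.

```javascript
// Returns a fetch-like function with rate limit retries built in.
function createApiClient({
  fetchImpl = fetch,
  maxRetries = 5,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  random = Math.random,
} = {}) {
  return async function request(url, options) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      const res = await fetchImpl(url, options);
      if (res.status !== 429) return res;
      if (attempt === maxRetries) break; // out of retries

      const header = res.headers.get('Retry-After');
      // Only the integer-seconds form is handled in this sketch.
      const seconds = header != null && /^\d+$/.test(header.trim()) ? Number(header) : null;
      const waitMs = seconds !== null
        ? seconds * 1000
        : Math.floor(random() * Math.min(32000, 1000 * 2 ** attempt)); // jittered backoff
      await sleep(waitMs);
    }
    throw new Error('Max retries exceeded after rate limiting');
  };
}
```

Call sites then use const api = createApiClient() and await api(url, options) exactly as they would use fetch. Injecting fetchImpl and sleep also makes the retry behavior straightforward to unit test.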

Expose rate limit metrics in your application observability. Log X-RateLimit-Remaining and X-RateLimit-Reset for every API call, and emit them as metrics (Prometheus gauges, Datadog custom metrics). Alert when remaining drops below 20 percent of the limit — this gives you time to throttle back proactively before reaching zero. Track rate limit errors as a separate error category in your error rate dashboards.

For APIs where you regularly approach the rate limit, request a higher limit from the provider. GitHub has verified organizations with higher rate limits. OpenAI, Stripe, and most B2B APIs have enterprise tiers with higher quotas. Submit a rate limit increase request with your expected usage volume and use case before you hit production issues. Most providers prefer to increase limits proactively over handling incident escalations.

Test rate limit handling explicitly in your test suite. Write a test that mocks the API to return 429 with a Retry-After header and verify your client waits the correct duration and retries successfully. Test both the integer and HTTP-date formats of Retry-After. Test that jitter is applied. Test that the client does not retry after the maximum retry count is exceeded. These scenarios are easy to unit test but rarely written without deliberate effort, and they prevent production regressions when retry logic changes.

429 rate limit fix checklist

  • Read the Retry-After header and wait the specified duration before retrying
  • Parse Retry-After as both integer seconds and HTTP-date string formats
  • Add jitter to your retry wait time to prevent thundering herd on reset
  • Use p-limit or Bottleneck to throttle concurrent outgoing requests
  • For GitHub: use an authenticated token (the unauthenticated limit is 60/hour vs 5,000/hour)
  • For Stripe: always pass Idempotency-Key on POST requests before retrying
  • Monitor X-RateLimit-Remaining and alert when it drops below 20 percent
  • Replace tight polling loops with webhooks for event-driven state changes

Frequently asked questions

What is the difference between Retry-After in seconds vs HTTP-date format?

Retry-After: 30 means wait 30 seconds. Retry-After: Thu, 07 May 2026 12:00:00 GMT means wait until that UTC timestamp. Your parser must handle both: use isNaN(value) to distinguish them. Parse the integer directly as seconds. Parse the date string with new Date(value).getTime() - Date.now() to get milliseconds to wait. Never assume one format.

Why does GitHub return 403 instead of 429 for rate limits?

GitHub returns 403 for primary rate limit exhaustion on the REST API — this is a historical quirk. Secondary rate limits (abuse detection) return 429. Your error handling for GitHub must check both: status 403 with a response body containing 'rate limit exceeded', and status 429. Check the response body message, not just the status code, to distinguish a rate limit 403 from a permissions 403.

What is exponential backoff with jitter and why does it matter?

Exponential backoff means each retry waits twice as long as the previous: 1s, 2s, 4s, 8s. Jitter adds a random multiplier (0 to 1) to spread retries across time. Without jitter, multiple rate-limited clients all retry at the same moment, creating a new traffic spike that immediately triggers another 429. Full jitter (wait * Math.random()) distributes retries evenly across the wait window.

How do I handle rate limits across multiple Node.js processes sharing one API key?

Use the Bottleneck npm package with a Redis datastore. Each process connects to the same Redis instance and Bottleneck coordinates the rate across all processes. Set maxConcurrent, minTime, and reservoir values to match the API's limits. All processes share a single rate limit counter in Redis, preventing any single process from consuming the entire quota.

Do GitHub conditional requests (ETag) count against the rate limit?

304 Not Modified responses to conditional requests do not count against your GitHub rate limit. Send the ETag value you received from a previous response as the If-None-Match header. If the resource has not changed, GitHub returns 304 without deducting from your quota. Cache the ETag alongside the response body and use conditional requests for any polling scenario.

What is OpenAI's tokens-per-minute limit and how do I stay under it?

OpenAI enforces separate requests-per-minute and tokens-per-minute limits. You can be under RPM but still hit 429 by sending large prompts. Use the tiktoken library to count tokens before each request and track your rolling TPM usage. Batch content into larger single requests rather than many small ones, and use cheaper models for tasks that do not require the highest capability.

Should I use webhooks instead of polling to avoid rate limits?

Yes, for any resource that changes asynchronously. Webhooks push state changes to your endpoint as they happen, eliminating the need to poll. Most GitHub, Stripe, and payment events are better handled via webhooks. Reserve polling for resources that do not support webhooks or for initial sync scenarios. This also reduces latency since you receive events in real time rather than on your polling interval.

What are Stripe idempotency keys and when must I use them?

Stripe idempotency keys ensure that retried requests do not create duplicate objects. Pass an Idempotency-Key header with a UUID that identifies the intent of the operation. Generate it before the first attempt and reuse it on retries. Any POST to create a payment, subscription, or customer must use idempotency keys. Without them, a retry after a 429 or network error can create duplicate charges.

All tools run in your browser. Your data never leaves your device. Last updated: 2026-05-06.