Complete URL percent encoding guide based on RFC 3986

Quick answer

💡Percent encoding replaces a byte with % followed by its two-digit hexadecimal value. RFC 3986 defines unreserved characters (A-Z a-z 0-9 - _ . ~) which never need encoding, and reserved characters which act as delimiters and must be encoded when used as data. Spaces encode as %20 in path and fragment contexts; the + sign is only valid as a space replacement in application/x-www-form-urlencoded query strings.

Error symptoms

✕Server returns 400 Bad Request when URL contains characters like spaces, brackets, or angle brackets
✕URL parameter is truncated at & because the value was not encoded
✕Non-ASCII path segments show garbled characters after server round-trip
✕Browser auto-corrects the URL in a way that changes the encoded values
✕International domain name like münchen.de causes DNS lookup failure in programmatic HTTP clients
✕Webhook URL with %20 in the path works but URL with + in the path does not

Common causes

•Not encoding characters that have special meaning in the URL context where they appear
•Confusing the application/x-www-form-urlencoded format (+ for space) with RFC 3986 percent encoding (%20 for space)
•Encoding characters that should remain unencoded as delimiters, such as encoding the / in a path
•Using the deprecated escape() function, which emits Latin-1 escapes such as %FC for characters below U+0100 and %uXXXX for everything above, neither of which is valid UTF-8 percent encoding
•Not handling internationalized domain names (IDN) by converting to punycode before use in HTTP clients
•Expecting the server to receive data placed in a URL fragment, which is never sent in the HTTP request
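
The escape() problem from the list above is easiest to see side by side. A small sketch, assuming an environment where the deprecated escape() global still exists (current browsers and Node.js); the sample characters are illustrative only:

```javascript
// escape() works on UTF-16 code units, not on UTF-8 bytes.
const legacyLatin1 = escape('\u00FC');            // 'ü' -> '%FC'
const legacyUnicode = escape('\u20AC');           // '€' -> '%u20AC'
const utf8Umlaut = encodeURIComponent('\u00FC');  // 'ü' -> '%C3%BC'
const utf8Euro = encodeURIComponent('\u20AC');    // '€' -> '%E2%82%AC'

console.log(legacyLatin1, legacyUnicode, utf8Umlaut, utf8Euro);

// A UTF-8 based decoder cannot read the legacy Latin-1 output.
let decodeFailed = false;
try {
  decodeURIComponent(legacyLatin1);
} catch (err) {
  decodeFailed = err instanceof URIError;
}
console.log(decodeFailed); // true
```

If legacy escape() output has to be read, unescape() is its counterpart; for anything new, encodeURIComponent is the safer default.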

When it happens

•When building REST API clients that construct URLs from user-provided search terms or filters
•When implementing file download endpoints where the filename in the Content-Disposition header must be encoded
•When proxying requests where the proxy must preserve or re-encode the incoming URL
•When embedding URLs in QR codes that may be scanned by devices with strict URL parsers
•When generating sitemap XML where URLs must be valid XML as well as valid URLs

Examples and fixes

Path segments and query string values have different encoding requirements for the same characters.

Encoding query strings vs path segments

❌ Wrong

// Building a URL with user-provided values
const username = 'john doe';
const tag = 'node.js & npm';

// Wrong: encodes the : and / of the base URL, and leaves the spaces and & unencoded in the query
const wrong = encodeURIComponent('https://api.example.com/users/' + username) +
  '?tag=' + tag;
console.log(wrong);
// 'https%3A%2F%2F...' — protocol is encoded, & is not

✅ Fixed

const username = 'john doe';
const tag = 'node.js & npm';

// Correct: encode each component separately
const base = 'https://api.example.com';
const path = '/users/' + encodeURIComponent(username);
const params = new URLSearchParams({ tag });
const url = base + path + '?' + params.toString();
console.log(url);
// https://api.example.com/users/john%20doe?tag=node.js+%26+npm

// Or use the URL constructor:
const urlObj = new URL('/users/' + encodeURIComponent(username), base);
urlObj.searchParams.set('tag', tag);
console.log(urlObj.toString());

URL encoding must be applied at the component level, not to the entire URL string. The base URL (protocol, host, port) must not be encoded. Path segments must be encoded individually with encodeURIComponent so the / delimiters remain intact. Query string values should use URLSearchParams which handles encoding automatically. Using URLSearchParams or the URL constructor eliminates the need to think about which characters need encoding in which context.

Handling malformed percent sequences without crashing.

Decoding percent-encoded values safely

❌ Wrong

// API handler that receives a URL-encoded search query
app.get('/search', (req, res) => {
  // req.query is already decoded by Express, but sometimes
  // clients double-encode values, so you might try to re-decode:
  const query = decodeURIComponent(req.query.q);
  // Throws URIError if req.query.q is malformed; an undefined value silently becomes the string 'undefined'
  res.json({ results: search(query) });
});

✅ Fixed

app.get('/search', (req, res) => {
  const rawQuery = req.query.q;
  if (!rawQuery || typeof rawQuery !== 'string') {
    return res.status(400).json({ error: 'Missing query parameter q' });
  }
  let query;
  try {
    // Re-decode only if a % survived Express's own decoding, which may indicate double encoding
    query = rawQuery.includes('%') ? decodeURIComponent(rawQuery) : rawQuery;
  } catch {
    // Malformed %xx sequence: strip stray % characters and use the rest as-is
    query = rawQuery.replace(/%(?![0-9A-Fa-f]{2})/g, '');
  }
  res.json({ results: search(query.trim()) });
});

decodeURIComponent throws URIError for malformed percent sequences (a lone % not followed by two hex digits) and for broken UTF-8 sequences encoded as percent bytes. In an Express app, req.query is already decoded by the query parser, so double-decoding is usually wrong. If a client is sending double-encoded values, fix the client. If you must handle both, use try/catch around decodeURIComponent. Validate that the query parameter exists and has the right type before processing to provide clear error messages rather than uncaught exceptions.
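
The try/catch pattern described above can be wrapped in a small reusable helper. This is a minimal sketch; safeDecode is a hypothetical name, and returning the original input on failure is one possible policy among several (rejecting the request is another):

```javascript
// Hypothetical helper: decode a percent-encoded component without throwing.
// Returns { ok: true, value } on success, or { ok: false, value: input } when
// the input has a malformed %xx sequence or bytes that are not valid UTF-8.
function safeDecode(input) {
  try {
    return { ok: true, value: decodeURIComponent(input) };
  } catch (err) {
    if (err instanceof URIError) {
      return { ok: false, value: input };
    }
    throw err; // unrelated errors should still surface
  }
}

console.log(safeDecode('caf%C3%A9')); // { ok: true, value: 'café' }
console.log(safeDecode('100%'));      // { ok: false, value: '100%' }
console.log(safeDecode('%E2%82'));    // { ok: false, value: '%E2%82' } (truncated UTF-8)
```

Returning a result object rather than a bare string lets the caller decide whether a failed decode should become a 400 response or fall back to the raw value.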

RFC 3986 percent encoding rules explained

Percent encoding is defined in RFC 3986 (Uniform Resource Identifier: Generic Syntax). The mechanism is simple: any byte can be represented as % followed by two hexadecimal digits giving the byte value. RFC 3986 recommends producing uppercase digits, but %2f and %2F are equivalent and decoders must accept both. The space character (ASCII 32, hex 0x20) encodes as %20. The ampersand (ASCII 38, hex 0x26) encodes as %26. Non-ASCII characters are first encoded as UTF-8, then each byte is percent-encoded: the euro sign € (U+20AC) encodes to three UTF-8 bytes E2, 82, AC, which produces %E2%82%AC.
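
To make the byte-level mechanism concrete, here is an illustrative encoder written by hand. It is a sketch, not a replacement for encodeURIComponent; percentEncode is a hypothetical name, and it encodes everything outside the unreserved set:

```javascript
// Characters RFC 3986 calls unreserved: never need encoding.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: UTF-8 encode each character, then emit %XX per byte.
function percentEncode(text) {
  const encoder = new TextEncoder(); // always produces UTF-8
  let out = '';
  for (const ch of text) { // iterates by code point, not UTF-16 code unit
    if (UNRESERVED.test(ch)) {
      out += ch;
      continue;
    }
    for (const byte of encoder.encode(ch)) {
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncode(' ')); // %20
console.log(percentEncode('&')); // %26
console.log(percentEncode('€')); // %E2%82%AC
```

For these inputs the result matches encodeURIComponent, which performs the same UTF-8 step internally but leaves a few extra characters (! ' ( ) *) unencoded.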

RFC 3986 divides characters into three categories. Unreserved characters (A-Z, a-z, 0-9, -, _, ., ~) may appear anywhere in a URI without encoding and should not be encoded. Reserved characters serve as delimiters: gen-delimiters (: / ? # [ ] @) separate major URI components, and sub-delimiters (! $ & ' ( ) * + , ; =) serve as delimiters within components. Reserved characters must be percent-encoded when they appear as data in a component where they would otherwise be interpreted as delimiters.
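
The three categories can be expressed as a small lookup. This is a sketch for exploration; classify is a hypothetical helper that expects exactly one character:

```javascript
const GEN_DELIMS = ':/?#[]@';
const SUB_DELIMS = "!$&'()*+,;=";

// Hypothetical helper: name the RFC 3986 category of a single character.
function classify(ch) {
  if (/^[A-Za-z0-9\-._~]$/.test(ch)) return 'unreserved';
  if (GEN_DELIMS.includes(ch)) return 'gen-delim';
  if (SUB_DELIMS.includes(ch)) return 'sub-delim';
  return 'other'; // space, <, >, non-ASCII, and so on: always percent-encode
}

console.log(classify('~')); // unreserved
console.log(classify('?')); // gen-delim
console.log(classify('&')); // sub-delim
console.log(classify(' ')); // other

// encodeURIComponent leaves some sub-delimiters untouched:
const leftRaw = [...SUB_DELIMS].filter((c) => encodeURIComponent(c) === c);
console.log(leftRaw); // [ '!', "'", '(', ')', '*' ]
```

The last line matters in practice: if a downstream format treats any of those five characters as delimiters, they have to be escaped by hand after encodeURIComponent.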

The context where a value appears determines which characters must be encoded. In a path segment, the / character must be encoded as %2F when it appears in data (to distinguish from the path separator). The ? and # characters must also be encoded in path data. In a query string value, both & and = must be encoded to prevent them from being interpreted as parameter delimiters. In a fragment, no characters are sent to the server so encoding is only necessary for parsing consistency.
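
As a rough illustration of encoding the same data for two contexts, the sketch below places one awkward value in both a path segment and a query parameter and reads it back. The host api.example.com and the /items/ route are placeholders:

```javascript
const value = 'a/b?c#d&e=f';

const u = new URL('https://api.example.com/items/' + encodeURIComponent(value));
u.searchParams.set('q', value);

console.log(u.pathname); // /items/a%2Fb%3Fc%23d%26e%3Df
console.log(u.search);   // ?q=a%2Fb%3Fc%23d%26e%3Df
console.log(u.hash);     // '' (the # was encoded, so no fragment was created)

// Both contexts round-trip to the original value.
const fromPath = decodeURIComponent(u.pathname.split('/')[2]);
const fromQuery = u.searchParams.get('q');
console.log(fromPath === value, fromQuery === value); // true true
```

encodeURIComponent is deliberately conservative here: it also escapes & and = in the path segment, which is harmless even though a path segment does not strictly require it.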

The percent sign itself (%) must be encoded as %25 when it appears as data, not as the first character of a percent-encoded sequence. A URL validator must verify that every % is followed by exactly two hexadecimal characters. A bare % or a % followed by non-hex characters is a malformed URL that different parsers handle inconsistently — some reject it, some pass it through unchanged, and some attempt to encode it on the fly.
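
The validation rule above fits in one regular expression. A minimal sketch; findMalformedPercent is a hypothetical name, and it checks only the %XX shape, not whether the decoded bytes form valid UTF-8:

```javascript
// Hypothetical helper: return the index of every % that is not followed
// by exactly two hexadecimal digits.
function findMalformedPercent(text) {
  const stray = /%(?![0-9A-Fa-f]{2})/g;
  return [...text.matchAll(stray)].map((match) => match.index);
}

console.log(findMalformedPercent('100%25%20off')); // []
console.log(findMalformedPercent('100% off'));     // [ 3 ]
console.log(findMalformedPercent('%zz%4'));        // [ 0, 3 ]
```

Reporting positions rather than a boolean makes it easier to log exactly where an upstream client produced a bad sequence.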

Diagnosing percent encoding problems in URLs

The fastest diagnostic is to use the browser DevTools Network tab to inspect the actual URL sent in a request. Click on the request, view the Headers tab, and look at the Request URL field. This shows the exact URL including encoding. Compare this against what your code constructed. If the URL was modified by the browser or framework, the difference is visible here.

In Node.js, log the URL at the point of construction before passing it to fetch, axios, or http.request. console.log(url) shows the string as JavaScript sees it. Inspect whether special characters are already percent-encoded or still raw. For incoming requests, log req.url (raw, still percent-encoded) next to req.params and req.query (decoded by Express) to see both forms. Note that req.path is the raw pathname and is not decoded.

To check whether a string is valid percent-encoded, test it with decodeURIComponent wrapped in try/catch. If it throws, the string has malformed percent sequences. To check whether a query string is correctly structured, parse it with new URLSearchParams(queryString) and inspect the resulting entries. Parameters that should be distinct but share a name indicate an unencoded & in a value.
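
The query-string check might look like the following sketch, which parses a broken and a correctly encoded string and compares the entries. The parameter names are made up for illustration:

```javascript
// The & inside the value was never encoded, so it splits the parameter.
const broken = 'tag=node.js & npm&page=2';
const brokenEntries = [...new URLSearchParams(broken).entries()];
console.log(brokenEntries);
// [ [ 'tag', 'node.js ' ], [ ' npm', '' ], [ 'page', '2' ] ]

// The same data with the value encoded parses into the expected two entries.
const correct = 'tag=node.js+%26+npm&page=2';
const correctEntries = [...new URLSearchParams(correct).entries()];
console.log(correctEntries);
// [ [ 'tag', 'node.js & npm' ], [ 'page', '2' ] ]
```

An unexpected extra key with an empty value, like ' npm' here, is the typical signature of an unencoded & in a value.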

For internationalized domain name issues, check whether the HTTP client is handling IDN automatically. Node.js's built-in http and https modules (and fetch) use the WHATWG URL parser which converts IDN to punycode: new URL('https://münchen.de').host returns 'xn--mnchen-3ya.de'. Some older HTTP client libraries do not perform this conversion and require the caller to provide the punycode hostname manually, for example with url.domainToASCII() from Node's url module or the 'tr46' package. Node's built-in punycode module is deprecated.

Applying correct percent encoding by URL component

For query string construction, use URLSearchParams as the primary API. It correctly encodes all values and handles & separators. For API clients that need specific query string formats (such as PHP-style array notation), use the qs package: qs.stringify({ filters: ['a', 'b'] }) produces filters%5B0%5D=a&filters%5B1%5D=b with properly encoded brackets.

For path segments with dynamic data, encode each segment individually: '/files/' + encodeURIComponent(filename). If the path includes multiple levels constructed from user data, map over the segments: [baseSegment, ...userSegments.map(encodeURIComponent)].join('/'). Never apply encodeURIComponent to an entire path that includes / delimiters.
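
The segment-by-segment approach can be sketched as a small function. buildPath is a hypothetical helper, and the segment values are examples only:

```javascript
// Hypothetical helper: join a trusted base segment with untrusted segments,
// encoding each untrusted segment on its own so that / stays a delimiter.
function buildPath(baseSegment, userSegments) {
  const encoded = userSegments.map((segment) => encodeURIComponent(segment));
  return '/' + [baseSegment, ...encoded].join('/');
}

console.log(buildPath('files', ['reports 2024', 'a/b.pdf']));
// /files/reports%202024/a%2Fb.pdf
```

One caveat: encodeURIComponent leaves "." and ".." unchanged, and URL normalization collapses them, so a segment that is exactly "." or ".." should be rejected before it reaches a helper like this.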

For complete URL construction with both path and query components, the URL constructor is the most robust approach. const u = new URL(path, baseUrl); u.searchParams.set('q', query); return u.toString(). This handles encoding, normalizes the path, and produces a correctly formatted URL regardless of what the individual values contain.

For Content-Disposition headers in file download responses, a non-ASCII filename must be percent-encoded using the extended parameter notation from RFC 5987 (since replaced by RFC 8187): Content-Disposition: attachment; filename*=UTF-8''${encodeURIComponent(filename)}. Note that encodeURIComponent leaves ' ( ) and * unescaped, and these characters are not allowed raw in the extended value, so escape them as %27, %28, %29, and %2A as well. The filename* parameter with the UTF-8'' prefix supports non-ASCII characters correctly in all modern browsers. The legacy filename parameter should also be included for compatibility with older clients, with non-ASCII characters transliterated or stripped.
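
A header builder along these lines might look like the sketch below. Both function names are hypothetical, and replacing unsupported characters with an underscore in the fallback is just one possible policy. The extra replace step is there because encodeURIComponent leaves ' ( ) * unescaped, which the extended notation does not allow raw:

```javascript
// Hypothetical helper: percent-encode a value for the filename* parameter.
function encodeExtendedValue(text) {
  return encodeURIComponent(text).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Hypothetical helper: build the header value with an ASCII fallback.
function contentDisposition(filename) {
  const asciiFallback = filename
    .replace(/[^\x20-\x7E]/g, '_') // non-ASCII and control characters
    .replace(/["\\]/g, '_');       // characters that would break the quoted string
  return `attachment; filename="${asciiFallback}"; ` +
    `filename*=UTF-8''${encodeExtendedValue(filename)}`;
}

console.log(contentDisposition('r\u00E9sum\u00E9 (final).pdf'));
// attachment; filename="r_sum_ (final).pdf"; filename*=UTF-8''r%C3%A9sum%C3%A9%20%28final%29.pdf
```

In an Express handler the result would typically be passed to res.setHeader('Content-Disposition', ...).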

IDN, fragment encoding, and edge cases

Internationalized domain names (IDN) use Punycode encoding at the DNS level. The domain münchen.de contains the code point U+00FC (ü), which is not valid in a DNS label. The IDN encoding converts this to the ACE (ASCII Compatible Encoding) form xn--mnchen-3ya.de using the Punycode algorithm defined in RFC 3492. Modern browsers and the WHATWG URL parser perform this conversion automatically. However, older Node.js code that manually constructs hostnames from user input may need to apply Punycode conversion explicitly. The legacy url.parse() API is deprecated and its hostname handling differs from the WHATWG parser; use the WHATWG URL class (new URL()) instead.
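
For code that receives a bare hostname, the WHATWG parser can be used to obtain the ACE form. A minimal sketch, assuming a Node.js build with its default internationalization support and an input that is a hostname only (no path, port, or credentials); toAsciiHostname is a hypothetical name:

```javascript
// Hypothetical helper: let the WHATWG URL parser produce the ACE hostname.
function toAsciiHostname(hostname) {
  return new URL('https://' + hostname).hostname;
}

console.log(toAsciiHostname('m\u00FCnchen.de')); // xn--mnchen-3ya.de
console.log(toAsciiHostname('example.com'));     // example.com
```

ASCII-only hostnames pass through unchanged, so the helper can be applied unconditionally before handing a hostname to an older client.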

URL fragments (the # portion) are never transmitted to the server in HTTP requests. They are processed entirely by the browser. Despite this, percent-encoding rules still apply to fragment content for consistency with URL parsing. Fragment values can contain any Unicode character percent-encoded as UTF-8. If a server-side rendered page needs to set a fragment value that contains special characters, it should percent-encode the value in the HTML href.

The plus sign + has ambiguous status in URL encoding. In application/x-www-form-urlencoded (HTML form submission), + represents a space. In RFC 3986 URI syntax, + is a sub-delimiter that is allowed in path and query components without encoding. When a URL is parsed by a server, + in the path remains as a literal plus sign. + in a query string may be decoded as a space (by frameworks that implement form decoding on query strings) or kept as + (by frameworks that implement strict RFC 3986 decoding). This ambiguity is why %20 is safer than + for spaces in URLs that may be processed by different parsers.
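
The two interpretations of + can be observed directly with standard APIs. A short sketch; the sample strings are arbitrary:

```javascript
// Form decoding (URLSearchParams): + means space, %2B means a literal plus.
const formDecoded = new URLSearchParams('q=a+b%2Bc').get('q');
console.log(formDecoded); // 'a b+c'

// Plain percent decoding: + is left alone.
const percentDecoded = decodeURIComponent('a+b%2Bc');
console.log(percentDecoded); // 'a+b+c'

// Encoding the same data both ways.
const formEncoded = new URLSearchParams({ q: 'a b+c' }).toString();
const percentEncoded = encodeURIComponent('a b+c');
console.log(formEncoded);    // q=a+b%2Bc
console.log(percentEncoded); // a%20b%2Bc

// In a path, + is just a plus sign.
const plusPath = new URL('https://example.com/a+b').pathname;
console.log(plusPath); // /a+b
```

The %20 form decodes to a space under both rules, which is why it is the safer choice when the receiving parser is unknown.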

Data URIs use percent encoding for the data portion when the payload is not base64-encoded. A data URI for an SVG image with non-ASCII characters must percent-encode those characters. Data URIs that use base64 encoding (data:image/png;base64,...) do not percent-encode the base64 data, because the base64 alphabet (A-Z a-z 0-9 + /) and the = padding character are allowed in this position without encoding.
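
Both data URI forms can be produced from the same source string. This sketch assumes Node.js, since it uses Buffer for the base64 variant; the SVG content is a made-up example:

```javascript
const svg = '<svg xmlns="http://www.w3.org/2000/svg"><text>caf\u00E9</text></svg>';

// Percent-encoded payload: every non-unreserved character is escaped.
const percentUri = 'data:image/svg+xml;charset=utf-8,' + encodeURIComponent(svg);

// Base64 payload: no percent encoding is applied to the data.
const base64Uri = 'data:image/svg+xml;base64,' +
  Buffer.from(svg, 'utf8').toString('base64');

// Both forms decode back to the original markup.
const fromPercent = decodeURIComponent(percentUri.slice(percentUri.indexOf(',') + 1));
const fromBase64 = Buffer.from(base64Uri.slice(base64Uri.indexOf(',') + 1), 'base64')
  .toString('utf8');
console.log(fromPercent === svg, fromBase64 === svg); // true true
```

encodeURIComponent escapes more than a data URI strictly needs, which makes the result longer but avoids having to reason about which characters a given consumer tolerates.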

Common URL percent encoding mistakes

Encoding the entire URL string with encodeURIComponent is the most disruptive encoding mistake. This encodes the : and // in the protocol, producing https%3A%2F%2F which is not a valid URL. The fix is always to encode individual components (path segments, query values) and combine them with unencoded structural characters.

Not encoding the # character in redirect URLs is a common security-adjacent mistake. If an OAuth redirect_uri contains a # (such as a single-page application hash route), the code that builds the authorization request must encode the # as %23 in the redirect_uri parameter. If it does not, the first raw # is parsed as the start of the fragment, so the redirect_uri value is truncated and everything after the # never reaches the authorization server.
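
The truncation is easy to reproduce. In this sketch the hosts auth.example.com and app.example.com are placeholders, and only the redirect_uri parameter is shown:

```javascript
const redirectUri = 'https://app.example.com/#/callback';

// Wrong: raw concatenation lets the # start a fragment.
const naive = new URL('https://auth.example.com/authorize?redirect_uri=' + redirectUri);
console.log(naive.searchParams.get('redirect_uri')); // https://app.example.com/
console.log(naive.hash);                             // #/callback

// Right: searchParams.set encodes the value, including the #.
const authUrl = new URL('https://auth.example.com/authorize');
authUrl.searchParams.set('redirect_uri', redirectUri);
console.log(authUrl.search);
// ?redirect_uri=https%3A%2F%2Fapp.example.com%2F%23%2Fcallback
console.log(authUrl.hash); // ''
```

With the naive version, the fragment stays in the browser and the server only ever sees the truncated value.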

Using the wrong encoding for Content-Disposition filename fields breaks file downloads in non-ASCII languages. The legacy filename="file.pdf" field does not support non-ASCII characters. Using UTF-8 bytes directly in this field produces garbled filenames on Windows and is technically invalid. The correct approach is to use the RFC 5987 format: filename*=UTF-8''percent-encoded-name for non-ASCII filenames, with a fallback ASCII filename for older clients.

Ignoring the encoding of matrix parameters (semicolon-separated path parameters like /path;param=value) is less common but causes problems with some Java and .NET frameworks. Matrix parameters use the same percent-encoding rules as query strings. If a matrix parameter value contains a semicolon or equals sign, these must be encoded as %3B and %3D respectively.

Percent encoding best practices for developers

Build a URL abstraction layer in your application that accepts structured inputs and produces correctly encoded URLs, rather than allowing ad-hoc string concatenation throughout the codebase. The URL and URLSearchParams classes are the foundation for this layer in browser and Node.js environments. For server-side URL construction in Express, build full URLs with the URL class rather than the legacy url.format, and if you use the path module on URL paths, use path.posix so the result contains forward slashes on every platform.

Document the encoding contract for every URL field in your API. In an OpenAPI 3 specification, use each parameter's style and explode fields (for query parameters: form, spaceDelimited, pipeDelimited, or deepObject) to specify how the value is serialized. This generates correct encoding in SDK clients and clarifies expectations in documentation.

For URL parameters that contain sensitive data (such as authentication tokens in OAuth flows), prefer encoding the entire value with encodeURIComponent even if it does not strictly require encoding. A value like a JWT access token may contain base64url characters that do not need encoding, but explicitly encoding it prevents future bugs if the token format changes.

Monitor your web server logs for requests containing %2F in path segments. A URL like /files/%2F../../../etc/passwd uses an encoded slash to attempt a path traversal attack. Your URL decoder must handle this by rejecting paths that resolve outside the intended directory even after decoding. Normalize paths with path.resolve and verify the result starts with the expected base directory before serving files.
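
The decode-then-verify check could be sketched as follows. resolveInside is a hypothetical helper, the base directory is an example, and the code assumes Node.js with CommonJS modules:

```javascript
const path = require('node:path');

// Hypothetical helper: decode a requested path and resolve it under baseDir.
// Returns the absolute file path, or null if the request is malformed or
// would escape baseDir.
function resolveInside(baseDir, requestedPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(requestedPath);
  } catch {
    return null; // malformed percent sequence
  }
  if (decoded.includes('\0')) return null; // reject embedded NUL bytes

  const base = path.resolve(baseDir);
  // Prefix with './' so a leading slash cannot turn the input into an absolute path.
  const target = path.resolve(base, './' + decoded);

  // Compare against base plus a separator so '/srv/files-old' does not pass
  // as being inside '/srv/files'.
  const inside = target === base || target.startsWith(base + path.sep);
  return inside ? target : null;
}

console.log(resolveInside('/srv/files', 'reports/q1%20final.pdf'));
console.log(resolveInside('/srv/files', '%2F..%2F..%2F..%2Fetc%2Fpasswd')); // null
```

This covers only the string-level check. Symbolic links inside the base directory can still point elsewhere, so a production handler may also want to compare real paths.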

Quick fix checklist

✓Encode path segments with encodeURIComponent individually, not the entire path string.
✓Use URLSearchParams for query string construction to avoid encoding mistakes.
✓Check for %2520 (a double-encoded space) in URLs: decode once, then re-encode the raw values.
✓Use %20 for spaces in path segments; + is only valid in application/x-www-form-urlencoded query strings.
✓Wrap decodeURIComponent in try/catch to handle malformed %xx sequences.
✓Use new URL() constructor to validate and normalize URLs rather than regex or string checks.
✓For file download headers, use RFC 5987 filename*=UTF-8'' format for non-ASCII filenames.
✓Convert internationalized domain names to punycode for older HTTP clients that do not perform IDN conversion.

Frequently asked questions

What is percent encoding and why is it needed?

Percent encoding replaces a byte with % followed by its two-digit hexadecimal value. URLs can only contain a limited set of ASCII characters. Non-ASCII characters and ASCII characters with special URL meanings must be encoded so they can be transmitted as data rather than being interpreted as URL structure. RFC 3986 defines the encoding rules. Every byte in the character's UTF-8 encoding gets its own %xx escape.

Which characters never need percent encoding in a URL?

The unreserved characters defined by RFC 3986 never need encoding: uppercase letters A-Z, lowercase letters a-z, digits 0-9, hyphen (-), underscore (_), period (.), and tilde (~). Encoding these characters is allowed but unnecessary. Percent-encoding an unreserved character and not encoding it are equivalent; a URL validator must treat %41 the same as A.

What is the difference between %20 and + for spaces?

%20 is the RFC 3986 percent encoding for a space character (ASCII 32 = hex 20). The plus sign + as a space encoding is specific to application/x-www-form-urlencoded format used in HTML form submissions. Most web frameworks decode + as space only in query strings, not in path segments. Use %20 in paths and when in doubt; use + only when producing form-encoded query strings.

How do I encode a URL that contains non-ASCII characters like Chinese or Arabic?

Non-ASCII characters must be encoded as UTF-8 bytes, then each byte percent-encoded. The Chinese character 中 (U+4E2D) encodes to UTF-8 bytes E4 B8 AD, producing %E4%B8%AD. In JavaScript, encodeURIComponent handles this automatically: encodeURIComponent('中') returns '%E4%B8%AD'. The URL constructor also handles non-ASCII path and query characters correctly.

What is punycode and when do I need it?

Punycode is an encoding algorithm for internationalized domain names that converts Unicode labels to ASCII-Compatible Encoding (ACE). The domain münchen.de becomes xn--mnchen-3ya.de in punycode. Modern browsers and the WHATWG URL API (new URL()) convert IDN to punycode automatically. Some older HTTP client libraries may not, requiring manual conversion with the tr46 npm package for domains containing non-ASCII characters.

Is the URL fragment (#) percent-encoded?

Fragment values can contain percent-encoded characters and the fragment portion is processed by the browser, not sent to the server. Characters in fragments that require encoding (non-ASCII, spaces, < > ' " etc.) should be percent-encoded for consistency and correct parsing. The # delimiter itself must be %23 when it appears as data in a query parameter, since the browser interprets the first unencoded # as the fragment start.

How do I encode a URL for use in an HTML attribute?

A URL in an HTML href attribute needs two levels of encoding: the URL characters must be percent-encoded, and the resulting string must be HTML-entity-encoded (replacing & with &amp;). If a URL has a query string with & separators, each & must appear as &amp; in the HTML source. Most templating engines handle HTML escaping automatically. The URL encoding must be applied before HTML escaping, not after.
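
The order of the two encoding steps can be shown in a few lines. This is a sketch for cases where no templating engine does the escaping; escapeHtmlAttr is a hypothetical helper and example.com is a placeholder:

```javascript
// Hypothetical helper: escape a string for use inside a quoted HTML attribute.
function escapeHtmlAttr(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Step 1: percent-encode by building the URL from structured values.
const link = new URL('https://example.com/search');
link.searchParams.set('q', 'fish & chips');
link.searchParams.set('page', '2');
const href = link.toString();
console.log(href); // https://example.com/search?q=fish+%26+chips&page=2

// Step 2: HTML-escape the finished URL when writing it into markup.
const html = `<a href="${escapeHtmlAttr(href)}">results</a>`;
console.log(html);
// <a href="https://example.com/search?q=fish+%26+chips&amp;page=2">results</a>
```

The & inside the value is already %26 after step 1, so only the separator between parameters becomes an entity in step 2.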

What happens if I send a URL with unencoded spaces to a server?

An unencoded space in a URL is a syntax violation. Browsers, and clients built on the WHATWG URL parser such as fetch, automatically encode spaces as %20 before sending the request. Other programmatic HTTP clients may reject the URL or pass it through unmodified; many servers accept the raw space, while some strict proxies and WAFs respond with 400 Bad Request. Always encode URLs before sending them rather than relying on the client's automatic encoding.

Last updated: 2026-05-06.