Base64 vs Hex Encoding: Size, Use Cases, and API Differences

Quick answer

Base64 encodes 3 bytes as 4 ASCII characters (33% size overhead) and is best for embedding binary data in JSON, HTML, or email. Hex encodes each byte as 2 characters (100% overhead) and is best for checksums, hashes, and debug output where human readability matters. For URLs, use base64url (replaces + with - and / with _, removes = padding) instead of standard base64.

Error symptoms

  • InvalidCharacterError: Failed to execute 'btoa' on 'Window': the string to be encoded contains characters outside of the Latin1 range
  • Base64-encoded string contains + and / characters that break URL parsing
  • Hex-encoded string is twice as long as the original binary data
  • SyntaxError: JSON.parse: unexpected character after switching to hex encoding in a JSON payload (hex values must be quoted strings)
  • Buffer mismatch: Node.js Buffer.from(str, 'base64') silently ignores invalid characters
  • Padding error: base64 string length is not a multiple of 4 after URL transmission

Common causes

  • Using standard base64 in URLs without switching to base64url or percent-encoding the + and / characters
  • Confusing the 33% overhead of base64 with the 100% overhead of hex when estimating payload sizes
  • Using btoa() in browsers, which only accepts Latin1 (ISO 8859-1) strings, not arbitrary Unicode text
  • Choosing hex for large binary blobs (images, files) where the 2x size increase significantly impacts performance
  • Forgetting to strip base64 padding (=) before embedding in a URL, causing 400 errors in strict routers
  • Mixing hex lowercase (a-f) and uppercase (A-F) output across services, causing case-sensitive comparison failures

When it happens

  • When embedding a binary file or cryptographic key inside a JSON API response
  • When generating a URL-safe token (CSRF token, password reset link, OAuth state parameter)
  • When displaying a SHA-256 hash or MD5 checksum to a developer or end user for comparison
  • When sending binary data over SMTP email where 8-bit characters are not guaranteed to survive transport
  • When storing a UUID or random token in a URL path segment or query parameter

Examples and fixes

Standard base64 breaks URLs. Use base64url or the Node.js Buffer base64url encoding for URL-safe tokens.

Base64 URL-safe token generation vs standard base64 in a URL

โŒ Wrong

const crypto = require('crypto');
// Standard base64: may contain +, /, and =
const token = crypto.randomBytes(32).toString('base64');
// Illustrative (shortened) value: 'v8K+3mQ/rXlP2sN4yTcBhA=='
// Placing this in a URL breaks query parsing:
const resetUrl = `https://example.com/reset?token=${token}`;
// fetch(resetUrl): + is read as a space by form-style query parsers, / can confuse path routers

✅ Fixed

const crypto = require('crypto');
// base64url: replaces + with -, / with _, strips = padding
const token = crypto.randomBytes(32).toString('base64url');
// Illustrative (shortened) value: 'v8K-3mQ_rXlP2sN4yTcBhA'
// Safe for URLs, path segments, and HTTP headers:
const resetUrl = `https://example.com/reset?token=${token}`;
// Node.js 14.18+ / 15.7+: Buffer.from(token, 'base64url') to decode
const decoded = Buffer.from(token, 'base64url');
console.log(decoded.length); // 32 bytes, always

Standard base64 uses + and / as alphabet characters and = for padding. When a base64 string appears in a URL query parameter, + is interpreted as a space by URL parsers following the application/x-www-form-urlencoded spec, and / can confuse path routers. Base64url (RFC 4648 §5) replaces + with - and / with _, and omits the = padding, making the string safe for direct use in URLs, HTTP headers, and JWT parts. Node.js added 'base64url' as a first-class Buffer encoding in v15.7.0 (backported to v14.18.0). For older Node versions, use str.replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '') after standard base64 encoding.

Hex is the correct format for displaying checksums. Base64 is correct for storing binary data efficiently in databases or JSON.

Choosing hex for SHA-256 checksums vs base64 for binary storage

โŒ Wrong

const crypto = require('crypto');
const fileBuffer = require('fs').readFileSync('archive.tar.gz');
// Hex for storage is wasteful: 1MB file -> 2MB hex string
const storedInDB = fileBuffer.toString('hex');
// Base64 for a checksum is confusing to compare visually:
const checksum = crypto.createHash('sha256')
  .update(fileBuffer).digest('base64');
// 'xDtQZMsLr8F5y2Kp0WcVeA==' (illustrative): hard to compare with sha256sum output

✅ Fixed

const crypto = require('crypto');
const fileBuffer = require('fs').readFileSync('archive.tar.gz');
// Base64 for binary storage: 1MB -> ~1.33MB (33% overhead)
const storedInDB = fileBuffer.toString('base64');
// Hex for human-readable checksums: matches sha256sum CLI output
const checksum = crypto.createHash('sha256')
  .update(fileBuffer).digest('hex');
// 'c43b5064cb0bafc179cb62a9d165c578...': same format as sha256sum output
const overheadPercent = ((storedInDB.length / fileBuffer.length) - 1) * 100;
console.log(`Size overhead: ${overheadPercent.toFixed(1)}%`); // about 33.3% for large files

The choice depends on purpose. Binary data embedded in a database text column or JSON payload should use base64 because its output is only about 33% larger than the original, versus 100% larger for hex. A SHA-256 checksum displayed to a user or written to a manifest file should use hex because it matches what tools like sha256sum, md5sum, and git log output. Hex is also easier to compare character by character in logs: each byte maps to exactly two characters, so developers can immediately identify where two hex strings diverge. Base64 comparison is harder because characters do not align with byte boundaries: with 6 bits per character, a single changed byte can alter two adjacent characters, and the same byte value encodes differently depending on its position within the 3-byte group.
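To make the comparison point concrete, the sketch below changes a single byte and lists which character positions differ in each encoding. The helper name diffPositions is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: list the character positions where two strings differ.
function diffPositions(a, b) {
  const positions = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) positions.push(i);
  }
  return positions;
}

const first = Buffer.from([0x00, 0x00, 0x00, 0x00, 0x00, 0x00]);
const second = Buffer.from([0x00, 0xff, 0x00, 0x00, 0x00, 0x00]); // only byte 1 differs

// Hex: positions 2 and 3, which is exactly byte index 1 (position / 2).
console.log(diffPositions(first.toString('hex'), second.toString('hex')));       // [ 2, 3 ]
// Base64: positions 1 and 2, which straddle the byte boundary.
console.log(diffPositions(first.toString('base64'), second.toString('base64'))); // [ 1, 2 ]
```

In hex, dividing a differing position by two gives the byte offset directly; base64 has no such mapping.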

How base64 and hex encoding work under the hood

Both base64 and hex are binary-to-text encoding schemes: they convert arbitrary bytes into a subset of printable ASCII characters so binary data can safely pass through text-only channels like JSON, email, HTML attributes, and HTTP headers. The key difference is in how many bits each output character represents and therefore how much size overhead each scheme adds.

Hex encoding maps each byte (8 bits) to exactly two hexadecimal characters (each representing 4 bits). The alphabet is 0-9 and a-f (or A-F in uppercase). A 32-byte SHA-256 hash becomes a 64-character hex string. A 1 MB binary file becomes a 2 MB hex string. The overhead is always exactly 100%: output is always twice the input size. The advantage is complete human readability and direct compatibility with standard tools like sha256sum, xxd, and hexdump.
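As a rough illustration of the nibble mapping described above, the following sketch hex-encodes bytes by hand and prints the built-in result next to it. The function name toHex is hypothetical; in practice Buffer's toString('hex') does the same job.

```javascript
// Hypothetical helper: hex-encode a byte array one nibble at a time.
const HEX_ALPHABET = '0123456789abcdef';

function toHex(bytes) {
  let out = '';
  for (const byte of bytes) {
    out += HEX_ALPHABET[byte >> 4];   // high nibble (upper 4 bits)
    out += HEX_ALPHABET[byte & 0x0f]; // low nibble (lower 4 bits)
  }
  return out;
}

const sample = Uint8Array.from([0xde, 0xad, 0xbe, 0xef]);
console.log(toHex(sample));                        // 'deadbeef'
console.log(Buffer.from(sample).toString('hex'));  // 'deadbeef' from the built-in
console.log(toHex(sample).length / sample.length); // 2 characters per byte
```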

Base64 encoding maps every 3 bytes (24 bits) to exactly 4 ASCII characters (each representing 6 bits). The alphabet is A-Z, a-z, 0-9, +, and / (64 characters total), with = used as padding to fill incomplete 3-byte groups. A 32-byte key becomes 44 base64 characters. A 1 MB binary file becomes approximately 1.33 MB of base64. The overhead is approximately 33.3%: output is ceil(n/3)*4 characters for n input bytes.
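The length formulas can be checked with a few lines of code. This is a minimal sketch; the helper names base64Length, base64UrlLength, and hexLength are hypothetical, and the unpadded formula assumes base64url with the = padding stripped.

```javascript
// Hypothetical helpers: predict the encoded length for n input bytes.
function base64Length(n) {
  return Math.ceil(n / 3) * 4;   // standard base64, padded with '='
}
function base64UrlLength(n) {
  return Math.ceil((n * 4) / 3); // base64url with padding removed
}
function hexLength(n) {
  return n * 2;
}

// Compare the predictions with what Buffer actually produces.
for (const n of [16, 32, 100]) {
  const buf = Buffer.alloc(n);
  console.log(n, base64Length(n), buf.toString('base64').length, hexLength(n), buf.toString('hex').length);
}
```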

URL-safe base64 (base64url, RFC 4648 §5) replaces the two URL-unsafe characters: + becomes - and / becomes _. Padding = characters are typically omitted because their presence in a URL causes ambiguity. Base64url is used in JWTs, OAuth tokens, and password reset links. It is not the same as percent-encoding the + and / characters: percent-encoding would produce %2B and %2F, which add more characters, while base64url substitution keeps the same length.
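The length difference between the two approaches is easy to see on a small input. In this sketch the four bytes are chosen (as an assumption for illustration) so that the standard base64 form contains +, /, and =; it relies on the 'base64url' Buffer encoding being available in the Node.js version in use.

```javascript
// Bytes picked so the standard base64 form contains '+', '/' and '='.
const bytes = Buffer.from([0xfb, 0xff, 0xfe, 0xff]);

const standard = bytes.toString('base64');           // '+//+/w=='
const percentEncoded = encodeURIComponent(standard); // every '+', '/', '=' becomes 3 characters
const urlSafe = bytes.toString('base64url');         // '-__-_w'

console.log(standard.length, percentEncoded.length, urlSafe.length); // 8 22 6
```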

Diagnosing encoding errors in base64 and hex

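When btoa() throws "InvalidCharacterError", the input string contains characters outside the Latin1 (ISO 8859-1) range. btoa() does not accept arbitrary Unicode strings: every character must have a code point from 0 to 255. If your string contains emoji, Chinese characters, or other characters above that range, you must encode the string to UTF-8 bytes first and then base64-encode the bytes. Use new TextEncoder().encode(str) to get a Uint8Array, then convert that to base64. When atob() throws the same error name, the cause is different: its input is not valid standard base64, for example because it contains - or _ from a base64url string or has an impossible length.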

When base64 decoding produces the wrong output after URL transmission, the likely cause is that + characters were converted to spaces. A standard base64 string like 'abc+def/ghi=' gets URL-decoded as 'abc def/ghi=' (space instead of +), which decodes to completely different bytes. Check your server-side URL decoding with console.log(req.query.token) before decoding; if you see spaces, the client sent standard base64 in a URL. The fix is either base64url on the client or encodeURIComponent(token) before appending to the URL.
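The + to space conversion can be reproduced without a server using the standard URL class. This is a small sketch; the example.com URL and the token value are the illustrative ones from the text above.

```javascript
const token = 'abc+def/ghi='; // standard base64 string from the paragraph above

// Interpolated raw: the query parser turns '+' into a space.
const rawUrl = new URL(`https://example.com/reset?token=${token}`);
console.log(rawUrl.searchParams.get('token'));  // 'abc def/ghi='

// Percent-encoded first: the value round-trips intact.
const safeUrl = new URL(`https://example.com/reset?token=${encodeURIComponent(token)}`);
console.log(safeUrl.searchParams.get('token')); // 'abc+def/ghi='
```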

For hex comparison failures, check case sensitivity. crypto.createHash('sha256').digest('hex') in Node.js produces lowercase hex. PHP's hash() function produces lowercase by default. But some systems output uppercase hex. If you are comparing a hash produced by one system with one produced by another, normalize both to lowercase with str.toLowerCase() before comparing. This is especially common when comparing git commit hashes, which are always lowercase, against a database value that was stored from an uppercase-outputting source.
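A case-insensitive comparison along these lines might look like the sketch below. The helper name hexEquals is hypothetical, and the uppercase value is only a stand-in for a digest arriving from another system. This plain comparison is not constant-time, so it suits checksums rather than secret values.

```javascript
const { createHash } = require('crypto');

// Hypothetical helper: compare two hex digests regardless of letter case.
function hexEquals(a, b) {
  return a.toLowerCase() === b.toLowerCase();
}

const fromNode = createHash('sha256').update('hello').digest('hex'); // lowercase
const fromOtherSystem = fromNode.toUpperCase(); // stand-in for an uppercase-emitting service

console.log(fromNode === fromOtherSystem);         // false
console.log(hexEquals(fromNode, fromOtherSystem)); // true
```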

Node.js Buffer.from(str, 'base64') does not throw on malformed input. Whitespace is ignored and characters outside the base64 alphabet are skipped rather than reported. According to the Node.js Buffer documentation, the 'base64' decoder also accepts the URL-safe alphabet (- and _), so passing a base64url string under the 'base64' name still decodes correctly. The real danger is corrupted or truncated input, which yields a shorter-than-expected buffer containing the wrong bytes instead of an error. Validate the input format or the decoded length when the value matters, and still name 'base64url' explicitly when you know the input uses the URL-safe alphabet so the intent is clear.
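Because the decoder does not throw, one option is to validate the string before decoding it. The following is a minimal sketch that assumes padded, standard-alphabet input; the regular expression and the helper name decodeBase64Strict are illustrative, not part of Node.js.

```javascript
// Hypothetical strict check for padded, standard-alphabet base64.
const STANDARD_BASE64 = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

function decodeBase64Strict(str) {
  if (!STANDARD_BASE64.test(str)) {
    throw new TypeError('Input is not padded standard base64');
  }
  return Buffer.from(str, 'base64');
}

// The built-in decoder accepts the malformed string without raising an error.
const lenient = Buffer.from('aGVs*bG8=', 'base64');
console.log(lenient.length);

console.log(decodeBase64Strict('aGVsbG8=').toString()); // 'hello'
try {
  decodeBase64Strict('aGVs*bG8=');
} catch (err) {
  console.log(err.message); // 'Input is not padded standard base64'
}
```

For unpadded or URL-safe input the pattern would need to be adjusted accordingly.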

Fixing base64 and hex encoding bugs

For base64 in URLs, switch to base64url. In Node.js 14.18+ (or 15.7+), use buf.toString('base64url') for encoding and Buffer.from(str, 'base64url') for decoding. In older Node versions or browsers, apply the character replacement manually: encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '') after encoding, and add padding back before decoding: str + '='.repeat((4 - str.length % 4) % 4) before passing to Buffer.from(str, 'base64'). In browsers, also map - back to + and _ back to / before calling atob().
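The manual replacement steps can be wrapped in a pair of helpers. This is a sketch for runtimes without the 'base64url' Buffer encoding; the names toBase64Url and fromBase64Url are hypothetical.

```javascript
// Hypothetical helpers for runtimes without the 'base64url' Buffer encoding.
function toBase64Url(buf) {
  return buf.toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, ''); // strip trailing padding
}

function fromBase64Url(str) {
  const standard = str.replace(/-/g, '+').replace(/_/g, '/');
  // Restore the padding so the length is a multiple of 4 again.
  const padded = standard + '='.repeat((4 - (standard.length % 4)) % 4);
  return Buffer.from(padded, 'base64');
}

const original = Buffer.from([0xfb, 0xff, 0xfe, 0xff]);
const encoded = toBase64Url(original);
console.log(encoded);                                 // '-__-_w'
console.log(fromBase64Url(encoded).equals(original)); // true
```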

For btoa() failing on non-Latin1 strings in browsers, use the TextEncoder + Uint8Array pattern. Encode the string to UTF-8 bytes, then convert each byte to a Latin1 character, then call btoa(): btoa(String.fromCharCode(...new TextEncoder().encode(str))). For decoding: new TextDecoder().decode(Uint8Array.from(atob(b64), c => c.charCodeAt(0))). TextEncoder has been available since Chrome 38, and this pattern is the standard workaround for the btoa() Latin1 limitation. For long strings, build the Latin1 string in a loop, because spreading a large array can exceed the engine's argument limit.
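Written out as reusable functions, the pattern might look like the sketch below. The helper names utf8ToBase64 and base64ToUtf8 are hypothetical, and the encoder uses a loop instead of the spread operator so it also copes with long inputs.

```javascript
// Hypothetical helpers: base64-encode and decode arbitrary Unicode text.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one Latin1 character per byte
  }
  return btoa(binary);
}

function base64ToUtf8(b64) {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const encoded = utf8ToBase64('caf\u00e9');           // 'café' contains one two-byte character
console.log(encoded);                                // 'Y2Fmw6k='
console.log(base64ToUtf8(encoded) === 'caf\u00e9');  // true
```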

For the size decision in API design: if you are sending binary data (images, encrypted ciphertext, raw hashes, file content) inside JSON, use base64. JSON requires string values to be valid Unicode, so raw bytes are not valid. Base64 gives a 33% overhead, which is generally acceptable. If bandwidth is critical, consider sending the binary data as a direct HTTP response body with Content-Type: application/octet-stream instead of embedding it in JSON.

For checksums and fingerprints displayed in logs, UIs, or CLI output, hex is almost always the better choice. The output is predictable in length (2 chars per byte), completely lowercase alphanumeric, and identical to what sha256sum, openssl dgst, and git rev-parse produce. This avoids discrepancies when developers manually compare values.

Padding, Unicode, and streaming edge cases

Base64 padding deserves careful attention in streaming contexts. When you encode data in chunks and concatenate the base64 strings, every chunk except the last must be a multiple of 3 bytes long, or the padding characters of an earlier chunk will land in the middle of the output. For example, encoding a 4-byte buffer produces 8 characters: the first 3 bytes fill one group (4 characters), and the leftover byte becomes 2 characters plus 2 = padding characters, so four zero bytes encode as 'AAAAAA=='. Concatenating base64-encoded chunks without first concatenating the underlying byte arrays produces incorrect output if any chunk other than the last is not a multiple of 3 bytes.
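The effect of chunk alignment can be demonstrated directly. In this sketch the helper encodeInChunks is hypothetical and the 24-byte sample string is arbitrary.

```javascript
const data = Buffer.from('streaming base64 example'); // 24 bytes
const whole = data.toString('base64');

// Hypothetical helper: encode each chunk separately and concatenate the text.
function encodeInChunks(buf, chunkSize) {
  const parts = [];
  for (let i = 0; i < buf.length; i += chunkSize) {
    parts.push(buf.subarray(i, i + chunkSize).toString('base64'));
  }
  return parts.join('');
}

console.log(encodeInChunks(data, 4) === whole);  // false: '==' padding lands mid-stream
console.log(encodeInChunks(data, 6) === whole);  // true: 6 is a multiple of 3
console.log(Buffer.alloc(4).toString('base64')); // 'AAAAAA=='
```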

Hex encoding has no padding issue because it operates on individual bytes without any grouping. This makes hex more suitable for streaming scenarios where you do not know the total length in advance: hex-encoded chunks can be concatenated at any byte boundary. In Node.js, digest('hex') on a crypto.Hash object always returns a string of the same length for a given algorithm (64 characters for SHA-256) because the hash output length is fixed.

Browser btoa() and atob() are synchronous and operate on strings, not on ArrayBuffers. There is no streaming btoa(): you must have the complete data before encoding. For large binary data in the browser, use the FileReader API with readAsDataURL(), which produces a data: URI containing base64, or use a chunked conversion approach that processes the ArrayBuffer in slices of at most 65,535 bytes to avoid exceeding the maximum argument count for String.fromCharCode.apply(). That limit is engine-dependent, so smaller slices such as 32,768 bytes are a conservative choice.
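A chunked conversion might be sketched as follows. The helper name bytesToBase64 is hypothetical, and the default slice size of 0x8000 (32,768) bytes is an assumption chosen to stay well under typical argument limits.

```javascript
// Hypothetical helper: base64-encode a large Uint8Array without passing
// too many arguments to String.fromCharCode in a single call.
function bytesToBase64(bytes, sliceSize = 0x8000) {
  let binary = '';
  for (let i = 0; i < bytes.length; i += sliceSize) {
    const slice = bytes.subarray(i, i + sliceSize);
    binary += String.fromCharCode.apply(null, slice); // one character per byte
  }
  return btoa(binary); // encode once, after all slices are joined
}

const big = new Uint8Array(200000).fill(0xff);
console.log(bytesToBase64(big).length); // 266668
```

Because btoa() runs once on the joined string, the slice size does not need to be a multiple of 3.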

Hex strings are case-insensitive when used as input to most APIs: Buffer.from('DEADBEEF', 'hex') and Buffer.from('deadbeef', 'hex') produce identical results in Node.js. However, when used as dictionary keys, map keys, or database primary keys, 'DEADBEEF' and 'deadbeef' are different strings. Always normalize hex to lowercase before storing or comparing to prevent duplicate key errors and lookup failures.

Common base64 and hex encoding mistakes

Double-encoding is a frequent mistake in multi-layer systems. If a middleware base64-encodes an already-base64-encoded value, the output looks like valid base64 but decodes to another base64 string rather than the original data. This often happens in logging pipelines where an engineer adds "encode before logging" at multiple layers. Debug by checking the length: a 32-byte input should produce 44 base64 chars after one encoding. If you see 60 chars, it has been double-encoded, because 44 characters of base64 text re-encode to ceil(44/3)*4 = 60.
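The length check is quick to reproduce. In this sketch, Buffer.alloc(32, 7) is just a stand-in for any 32 bytes of binary data.

```javascript
const key = Buffer.alloc(32, 7);                     // stand-in for 32 bytes of binary data
const once = key.toString('base64');
const twice = Buffer.from(once).toString('base64');  // encodes the base64 text itself

console.log(once.length);  // 44
console.log(twice.length); // 60

// Decoding the double-encoded value once yields base64 text, not the original bytes.
console.log(Buffer.from(twice, 'base64').toString() === once); // true
```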

Using hex for large binary data is a common performance mistake in backend services. If you store a 10 MB image as a hex string in a database text column, it occupies 20 MB. Reading and writing that column is slower, network transfer costs more, and a single value that large can exceed server limits such as MySQL's max_allowed_packet (4 MB by default in MySQL 5.7, 64 MB in 8.0). Use base64 for large binary storage, or better yet, store binary data in a BYTEA column (PostgreSQL) or BLOB column (MySQL) and avoid any text encoding.

Confusing Buffer.from(str) with Buffer.from(str, 'hex') or Buffer.from(str, 'base64') is a source of silent bugs. Buffer.from('deadbeef') treats the string as UTF-8 and creates an 8-byte buffer containing the ASCII codes of the characters d, e, a, d, b, e, e, f. Buffer.from('deadbeef', 'hex') creates a 4-byte buffer containing the bytes 0xDE, 0xAD, 0xBE, 0xEF. Always specify the encoding explicitly when constructing a Buffer from an encoded string.
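The difference is visible as soon as the two buffers are printed side by side:

```javascript
const asUtf8 = Buffer.from('deadbeef');       // no encoding given: treated as 'utf8'
const asHex = Buffer.from('deadbeef', 'hex'); // decoded as hexadecimal

console.log(asUtf8.length); // 8
console.log(asHex.length);  // 4
console.log(asUtf8);        // <Buffer 64 65 61 64 62 65 65 66>
console.log(asHex);         // <Buffer de ad be ef>
```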

Using hex for session tokens and API keys is a common but space-inefficient choice. A 32-byte random value encoded as hex produces a 64-character token. The same value encoded as base64url produces a 43-character token. Both have the same entropy (256 bits), but the hex version is 49% longer. For tokens that appear in HTTP headers, cookies, or URLs, the shorter base64url format is preferable. For secrets stored in .env files that developers read, hex is arguably more readable.
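The two token lengths can be confirmed with a short script, assuming a Node.js version that supports the 'base64url' Buffer encoding:

```javascript
const { randomBytes } = require('crypto');

const secret = randomBytes(32); // 256 bits of entropy either way

const hexToken = secret.toString('hex');
const urlToken = secret.toString('base64url');

console.log(hexToken.length); // 64
console.log(urlToken.length); // 43
console.log(/^[A-Za-z0-9_-]+$/.test(urlToken)); // true: only URL-safe characters
```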

Choosing the right encoding for each use case

Use a simple decision framework based on the destination and purpose of the encoded value. For URL parameters, path segments, JWT tokens, CSRF tokens, and OAuth state values: use base64url. It is compact, URL-safe without percent-encoding, and has native Node.js support through the 'base64url' Buffer encoding since Node 14.18 (and 15.7).

For displaying cryptographic hashes (SHA-256, SHA-512, BLAKE2), content-addressed identifiers (git object IDs), and MAC values to humans or in CLI tools: use lowercase hex. It matches the output format of virtually all cryptographic command-line tools and is easy to compare character by character. Standardize on lowercase in your codebase and use .toLowerCase() on any hex values arriving from external sources.

For embedding binary data inside JSON, XML, HTML data attributes, or email bodies: use standard base64. JSON does not support binary values natively, and base64 is the universally accepted encoding for this purpose. At 33% overhead, it is much more efficient than hex for large payloads. If the JSON will later be placed in a URL, use base64url to avoid a double-encoding step.

For database storage of binary data: prefer native binary columns (BYTEA in PostgreSQL, BLOB/VARBINARY in MySQL/SQLite) over any text encoding. If you must store as text (some ORMs or schema requirements), use base64 rather than hex to cut the stored size by about a third (roughly 1.33x the raw size instead of 2x). Document the encoding choice in the column comment so future developers know which Buffer encoding to use when reading the column.

Quick fix checklist

  • Check whether + or / characters in base64 strings are breaking URL parsing or routing
  • Switch to base64url for any token that will appear in a URL, cookie, or HTTP header
  • Use lowercase hex for checksums and hashes to match sha256sum and other CLI tools
  • Avoid hex encoding for large binary data (100% overhead); use base64 (33% overhead) instead
  • Use Buffer.from(str, 'base64url') not 'base64' when decoding JWT or OAuth tokens
  • Always specify the encoding in Buffer.from(str, 'hex'); omitting it reads the string as UTF-8
  • Normalize hex to lowercase before storing or comparing to prevent case-sensitivity bugs
  • For btoa() in browsers, pre-encode UTF-8 strings with TextEncoder before passing to btoa()

Frequently asked questions

What is the exact size overhead of base64 encoding?

Base64 encodes 3 bytes as 4 characters, so the size ratio is 4/3 = 1.333, a 33.3% increase (slightly more for short inputs because of padding). A 100-byte input produces ceil(100/3)*4 = 136 characters. After base64 encoding, the output length is always a multiple of 4 (padded with = if necessary). In practice, you should also account for the line breaks added by MIME base64, which splits output into lines of at most 76 characters and so adds further overhead.

What is the size overhead of hex encoding?

Hex encodes each byte as 2 characters (one for each nibble), so the output is exactly 2x the input, a 100% increase. A 32-byte SHA-256 hash produces a 64-character hex string. A 1 MB file becomes a 2 MB hex string. There is no padding in hex encoding because every byte always maps to exactly 2 characters with no remainder. Hex is always even-length.

Why does atob() fail on some base64 strings?

atob() expects standard base64: the characters A-Z, a-z, 0-9, +, and /, with optional = padding (ASCII whitespace is ignored). Anything else throws InvalidCharacterError, so atob() rejects base64url strings that contain - or _ instead of + and /. To decode base64url in a browser, first convert - back to + and _ back to /, restore any missing = padding, then call atob(). The Latin1 restriction applies to btoa(), not atob(): btoa() throws when its input contains a character above code point 255. Because atob() returns a binary string with one character per byte, decode UTF-8 text afterwards with Uint8Array and TextDecoder.
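The base64url conversion steps for atob() can be sketched as below. The helper name base64UrlToBytes is hypothetical, and the sample input '-__-_w' is simply a short base64url string that contains both substituted characters.

```javascript
// Hypothetical helper: decode a base64url string to bytes using atob().
function base64UrlToBytes(input) {
  const standard = input.replace(/-/g, '+').replace(/_/g, '/');
  // Restore '=' padding so the length is a multiple of 4.
  const padded = standard + '='.repeat((4 - (standard.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
}

const decodedBytes = base64UrlToBytes('-__-_w');
console.log(Array.from(decodedBytes)); // [ 251, 255, 254, 255 ]
```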

What is the difference between base64 and base64url?

Standard base64 uses + and / as alphabet characters and = for padding. Base64url (RFC 4648 §5) replaces + with -, / with _, and typically omits = padding. This makes base64url safe for use in URLs, HTTP headers, JSON field names, and filenames without percent-encoding. JWTs use base64url for all three parts. Node.js 14.18+ and 15.7+ support 'base64url' as a Buffer encoding directly.

How do I encode a UTF-8 string to base64 in a browser?

Use TextEncoder to convert the string to UTF-8 bytes, then convert those bytes to a Latin1 string for btoa(): btoa(String.fromCharCode(...new TextEncoder().encode(str))). For strings longer than ~32KB, use a loop instead of the spread operator to avoid exceeding the argument limit. Alternatively, use Buffer.from(str).toString('base64') in Node.js, which handles UTF-8 natively.

When should I use hex instead of base64?

Use hex when human readability and comparison are important: displaying cryptographic hashes, content fingerprints, MAC values, or debug dumps. Hex output exactly matches what sha256sum, openssl dgst, and git show produce, making it easy to copy-paste and verify manually. Use base64 when you need to transmit or store binary data efficiently and human readability is not a concern.

Does Buffer.from() require the encoding parameter?

Yes, always specify it when the input is an encoded string. Buffer.from('deadbeef', 'hex') creates a 4-byte buffer. Buffer.from('deadbeef') creates an 8-byte buffer containing the ASCII codes of the characters. The default encoding is 'utf8', not 'binary' or 'hex'. Omitting the encoding is a common source of silent bugs where the call succeeds but the buffer contains completely different bytes.

Can I mix hex and base64 in the same system?

Yes, but document which encoding each field uses. A common pattern is storing binary data (encryption keys, file content) in base64 in the database, while logging and displaying cryptographic hashes in hex for developer readability. The only requirement is consistency: always encode and decode using the same format for each field. Using a type wrapper or Zod schema with a custom transform helps enforce this at the API boundary.

Last updated: 2026-05-06.