Base64 decoding explained with real code examples
Quick answer
💡Call atob() in browsers to decode a standard Base64 string to a binary string, then use TextDecoder to convert bytes to UTF-8 text. In Node.js use Buffer.from(str, 'base64url') for URL-safe tokens or Buffer.from(str, 'base64') for standard strings. In Python use base64.b64decode() after restoring missing = padding. For URL-safe Base64 replace - with + and _ with / before decoding with older APIs.
Error symptoms
- ✕InvalidCharacterError from atob in the browser
- ✕Python raises binascii.Error: Incorrect padding
- ✕Decoded text contains replacement characters or garbled mojibake
- ✕Node.js Buffer.from returns wrong or truncated output without throwing on malformed input
- ✕data:image/png;base64, prefix breaks the decoder
- ✕Decoded binary file is corrupt or truncated
Common causes
- •Missing = padding characters at the end of the string
- •URL-safe Base64 uses - and _ instead of + and /
- •atob returns a binary string, not Unicode text
- •data URL prefix included in the input
- •TextDecoder step skipped for multi-byte UTF-8 text
- •Treating Base64 as encryption rather than encoding
When it happens
- •Decoding JWT payloads in the browser
- •Embedding small images inline in HTML or CSS
- •Passing encoded values between browser and Node.js server
- •Decoding environment variables stored in Base64
- •Reading Base64-encoded binary files or PDFs
How atob() and Buffer.from() handle encoded strings
The browser's atob() function and Node.js's Buffer.from(str, 'base64') are the two dominant Base64 decoders in JavaScript, but they operate at very different abstraction levels — and conflating them is the most common source of decoding bugs.
atob() is specified by the HTML Living Standard. It accepts a standard Base64 string (using the alphabet A-Z, a-z, 0-9, +, / and = padding characters), decodes it byte by byte, and returns a DOMString where each character's code point equals the raw byte value. This is called a binary string. The function performs no character encoding: it simply stores each decoded byte as a Latin-1 character. For ASCII-only content (code points 0-127) this distinction is invisible. The moment your original data contains multi-byte UTF-8 characters (accented letters, emoji, CJK ideographs), each byte of those sequences lands in its own character position, producing mojibake instead of the intended Unicode text. atob() throws InvalidCharacterError for any character outside the standard alphabet, including the URL-safe characters - and _ and the punctuation in a data URL prefix. ASCII whitespace is the exception: the specification's forgiving-base64 decode strips spaces, tabs, and newlines before decoding, and it also accepts input whose trailing = padding has been omitted.
Buffer.from(str, 'base64') in Node.js operates differently. It returns a Buffer (a Uint8Array subclass) containing the raw decoded bytes without applying any text encoding. You then call .toString('utf8') or another encoding on the Buffer to interpret those bytes as text. This two-step design makes UTF-8 handling reliable because Node.js applies the text encoding explicitly after the binary decode. Buffer is also substantially more permissive than atob(): per the Node.js documentation, the 'base64' encoding also accepts the URL-safe alphabet (- and _), ignores whitespace, and tolerates missing padding, and it never throws on malformed input. That permissiveness is a hazard: a string that is corrupted, truncated, or not Base64 at all still yields a Buffer, so the problem surfaces later as a parsing failure or bad data rather than as a decode error.
Node.js also provides a 'base64url' encoding identifier (introduced in v15.7.0 and backported to v14.18.0, so it is available in every Node.js 16+ release). Buffer.from(str, 'base64url') states the expected variant explicitly and accepts the missing padding that URL-safe encoders often omit; in the other direction, .toString('base64url') produces unpadded URL-safe output. This is the right choice for JWT segments, OAuth tokens, and any value encoded by libraries following RFC 4648 Section 5.
Python's base64.b64decode() returns a bytes object, and you then call .decode('utf-8') to get a string. Python is strict about padding: a string whose length is not a multiple of 4 raises binascii.Error: Incorrect padding. The standard workaround is to add extra padding before decoding: s += '=' * (-len(s) % 4). Python also provides base64.urlsafe_b64decode() for the URL-safe variant. These explicit steps make Python's two-stage approach the most transparent of the three environments for understanding what is actually happening.
Example 1
atob returns a binary string; TextDecoder is required for any multi-byte text.
UTF-8 text with emoji in the browser
❌ Wrong
// Broken: treats atob output as final text
const encoded = "SGVsbG8g8J+Zgg=="; // encodes "Hello 🙂"
const text = atob(encoded);
console.log(text);
// Output: Hello ð (garbled multi-byte chars)
const obj = { message: text };
console.log(obj.message.length); // 10: one character per byte, not per code point
// JSON.parse may throw or produce corrupted values
✅ Fixed
// Fixed: convert binary string to bytes, then decode UTF-8
const encoded = "SGVsbG8g8J+Zgg=="; // encodes "Hello 🙂"
const binaryStr = atob(encoded);
const bytes = Uint8Array.from(binaryStr, c => c.charCodeAt(0));
const text = new TextDecoder("utf-8").decode(bytes);
console.log(text);
// Output: Hello 🙂
console.log(text.length); // 8: UTF-16 code units (the emoji takes two)
atob decodes Base64 to a Latin-1 binary string, not Unicode. ASCII content looks fine, but UTF-8 multi-byte characters (accents, emoji, CJK) appear garbled because each byte lands in its own code unit. Converting the binary string to a Uint8Array and feeding it through TextDecoder('utf-8') interprets the bytes correctly and produces the expected Unicode string.
Base64url vs standard Base64: the padding and character difference
RFC 4648 defines two closely related encoding schemes. Section 4 defines standard Base64 using the alphabet A-Z, a-z, 0-9, + and /, with = as the padding character to bring output length to a multiple of 4. Section 5 defines the URL-safe variant, called Base64url, which substitutes + with - and / with _ and makes = padding optional. The two variants are visually similar but not interchangeable with most decoders without normalization.
The URL-safe variant exists because + is interpreted as a space by application/x-www-form-urlencoded parsers, and / is a path separator in URL hierarchies. Standard Base64 embedded in a URL query parameter or path segment requires percent-encoding of those two characters, producing longer and harder-to-read URLs. Base64url avoids the problem by choosing characters — hyphen and underscore — that are safe in URLs and require no escaping. JSON Web Tokens use Base64url for all three segments (header, payload, signature). AWS pre-signed S3 URLs, Google OAuth access tokens, and PKCE code verifiers all use Base64url.
The padding situation compounds the alphabet difference. Standard Base64 output length is always a multiple of 4. If the input byte count is not a multiple of 3, the encoder adds one or two = characters. Many Base64url implementations drop the padding entirely because = can still require escaping in some URL contexts (such as URL parsers that treat it as a key-value separator). When you receive a Base64url string without padding, you can reconstruct the correct number of = characters: (4 - len % 4) % 4 gives you the count. In JavaScript: str.padEnd(Math.ceil(str.length / 4) * 4, '='). In Python: s += '=' * (-len(s) % 4). Compute the exact count rather than appending a fixed '==': Python's b64decode tolerates excess padding in its default non-validating mode, but atob() throws on it.
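The padding arithmetic above can be sketched as a small helper. The name restorePadding is hypothetical (it is not part of any library), and the sketch assumes the input contains only Base64 characters.

```javascript
// Hypothetical helper: restore "=" padding on an unpadded Base64 or Base64url string.
function restorePadding(str) {
  const remainder = str.length % 4;
  if (remainder === 1) {
    // No valid Base64 string has a length of 1 modulo 4, so the input is truncated.
    throw new RangeError("Invalid Base64 length: " + str.length);
  }
  // remainder 0 -> 0 pads, remainder 2 -> 2 pads, remainder 3 -> 1 pad
  return str + "=".repeat((4 - remainder) % 4);
}

console.log(restorePadding("SGVsbG8")); // "SGVsbG8="
console.log(restorePadding("SGk"));     // "SGk="
console.log(restorePadding("YQ"));      // "YQ=="
console.log(restorePadding("YQ=="));    // "YQ==" (already padded, unchanged)
```

Throwing on a remainder of 1 is a design choice in this sketch: it surfaces truncated input at the point where it is cheapest to diagnose.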
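MIME Base64 (used in email attachments) adds a third variant: per RFC 2045 the encoded output is broken into lines of at most 76 characters separated by CRLF. PEM blocks wrap the same alphabet at 64 characters per line. Decoders differ in how they treat these line breaks: atob() and Node.js's Buffer.from() skip ASCII whitespace, while stricter decoders reject it. PEM content also carries -----BEGIN and -----END armor lines that no Base64 decoder accepts. Stripping the armor and then all whitespace with str.replace(/\s/g, '') gives every decoder the same clean input. On the command line, GNU base64 -d accepts newlines in its input; other base64 implementations vary in flags and strictness (macOS has historically used -D to decode), so cross-platform CLI scripts are safer with explicit stripping.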
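As a minimal sketch of that cleanup, the helper below removes PEM-style armor lines and all whitespace before decoding. The function name decodePemBody and the EXAMPLE label are made up for illustration; real PEM blocks carry labels such as CERTIFICATE and hold binary DER data rather than text.

```javascript
// Hypothetical helper: extract and decode the Base64 body of a PEM-style block.
function decodePemBody(pem) {
  const body = pem
    .replace(/-----(BEGIN|END) [^-]+-----/g, "") // drop the armor lines
    .replace(/\s+/g, "");                        // drop CR, LF, spaces, tabs
  return Uint8Array.from(atob(body), (c) => c.charCodeAt(0));
}

// Illustrative input: "Hello, world" wrapped across two lines with CRLF breaks.
const pem = [
  "-----BEGIN EXAMPLE-----",
  "SGVsbG8s",
  "IHdvcmxk",
  "-----END EXAMPLE-----",
].join("\r\n");

const bytes = decodePemBody(pem);
console.log(new TextDecoder().decode(bytes)); // "Hello, world"
```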
A practical detection heuristic: if the string contains + or / it is standard Base64. If it contains - or _ it is Base64url. If it contains only A-Za-z0-9 you cannot determine the variant from the characters alone — rely on documentation or context. Length modulo 4 equal to 0 suggests standard Base64 or padded Base64url; any other remainder suggests Base64url with padding omitted (valid for remainders of 2 or 3, never 1).
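The heuristic can be written down roughly as follows. detectBase64Variant and its return labels are hypothetical names chosen for this sketch, and the result is only a hint: a string of plain letters and digits stays ambiguous.

```javascript
// Hypothetical helper: guess which Base64 variant a string uses.
// Returns "base64", "base64url", "ambiguous", or "invalid".
function detectBase64Variant(str) {
  const s = str.replace(/=+$/, ""); // ignore trailing padding
  if (/[^A-Za-z0-9+\/_-]/.test(s)) return "invalid"; // outside both alphabets
  if (s.length % 4 === 1) return "invalid";          // impossible length
  const hasStandard = /[+\/]/.test(s);
  const hasUrlSafe = /[-_]/.test(s);
  if (hasStandard && hasUrlSafe) return "invalid";   // mixed alphabets
  if (hasStandard) return "base64";
  if (hasUrlSafe) return "base64url";
  return "ambiguous"; // only A-Za-z0-9: rely on documentation or context
}

console.log(detectBase64Variant("8J+Zgg=="));   // "base64"
console.log(detectBase64Variant("a-b_"));       // "base64url"
console.log(detectBase64Variant("eyJhIjoxfQ")); // "ambiguous"
console.log(detectBase64Variant("abcde"));      // "invalid"
```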
JWT tokens use Base64url, not standard Base64
JWT header and payload segments use Base64url: replace - with + and _ with / before calling atob(), or use Buffer.from(segment, 'base64url') in Node.js 16+. Missing = padding is intentional — add it back with padEnd. Do not decode twice: the payload decodes to JSON, not to another Base64 string.
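A minimal sketch of those steps using only browser-compatible APIs is shown here. The function name decodeJwtPayload is hypothetical, the sample token's signature segment is a placeholder rather than a real signature, and the function only reads the payload: it does not verify anything.

```javascript
// Hypothetical helper: decode (not verify) the payload segment of a JWT.
function decodeJwtPayload(jwt) {
  const segments = jwt.split(".");
  if (segments.length !== 3) {
    throw new Error("Expected three dot-separated segments");
  }
  // Normalize Base64url to standard Base64, then restore padding.
  const b64 = segments[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64.padEnd(Math.ceil(b64.length / 4) * 4, "=");
  // atob gives a binary string; convert to bytes, then decode as UTF-8.
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder("utf-8").decode(bytes));
}

// Illustrative token: header {"alg":"HS256"}, placeholder signature "c2ln".
const jwt =
  "eyJhbGciOiJIUzI1NiJ9." +
  "eyJzdWIiOiJ1c2VyLTEyMyIsInNjb3BlIjoicmVhZDphbGwifQ." +
  "c2ln";

console.log(decodeJwtPayload(jwt)); // { sub: 'user-123', scope: 'read:all' }
```

The decoded claims are untrusted until the signature has been checked by a proper JWT library.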
Example 2
JWT and OAuth tokens use the URL-safe alphabet without padding. Use 'base64url' encoding in Node.js 16+.
URL-safe Base64 token in Node.js
❌ Wrong
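// Fragile: Base64url token passed to the standard 'base64' decoder
const token = "eyJzdWIiOiJ1c2VyLTEyMyIsInNjb3BlIjoicmVhZDphbGwifQ";
// Node's lenient 'base64' decoder accepts the URL-safe alphabet and missing
// padding, so this happens to work here. The intent is hidden, though, and
// stricter decoders disagree: atob() rejects - and _, and Python's
// b64decode raises "Incorrect padding" on this unpadded string.
const raw = Buffer.from(token, "base64").toString("utf8");
console.log(JSON.parse(raw));
// Malformed input would not throw here; it would surface later as bad JSON
✅ Fixed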
// Fixed: use 'base64url' encoding in Node.js 16+
const token = "eyJzdWIiOiJ1c2VyLTEyMyIsInNjb3BlIjoicmVhZDphbGwifQ";
const raw = Buffer.from(token, "base64url").toString("utf8");
console.log(JSON.parse(raw));
// → { sub: 'user-123', scope: 'read:all' }
// Node.js < 16 fallback:
const padded = token.replace(/-/g, "+").replace(/_/g, "/")
.padEnd(Math.ceil(token.length / 4) * 4, "=");
const rawLegacy = Buffer.from(padded, "base64").toString("utf8");
Buffer.from(str, 'base64') is lenient: it accepts both alphabets and missing padding and never throws, so it gives no signal when the input is not what you expected. The 'base64url' identifier, available in Node.js 16+, makes the variant explicit and pairs with .toString('base64url') on the encoding side. For older Node.js versions, or for stricter decoders such as atob(), replace the URL-safe characters and restore padding manually before decoding.
Decoding binary data and image files
When the Base64-encoded payload represents binary data — a PNG image, a PDF, a gzip archive, an audio file — the decoding process requires extra care. The decoded bytes must be kept as a byte sequence, not converted to text, because arbitrary binary byte values are not valid UTF-8 sequences. Calling .toString('utf8') or TextDecoder on raw binary bytes will corrupt the data wherever bytes fall outside valid UTF-8 ranges.
In the browser, the standard workflow for a Base64-encoded image is: (1) strip any data URI prefix by splitting on the first comma and taking the second part; (2) call atob() to get the binary string; (3) convert to Uint8Array using Uint8Array.from(binaryStr, c => c.charCodeAt(0)); (4) create a Blob with the correct MIME type: new Blob([bytes], { type: 'image/png' }); (5) create an object URL with URL.createObjectURL(blob) and assign it to img.src. The data URI shortcut (setting img.src = 'data:image/png;base64,...') bypasses all these steps and lets the browser's built-in parser handle it, but it has costs: some browsers have historically imposed size limits on data URIs, and an inline data URI is not cached as a separate resource, so its bytes travel with every copy of the HTML or CSS that embeds it, which matters for repeated loads.
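The five steps can be sketched like this. Both function names are hypothetical, and the sample payload is only the 8-byte PNG signature, which is enough to show the byte handling but is not a renderable image. The Blob and object URL steps sit in a separate function because they need a browser DOM.

```javascript
// Hypothetical helper covering steps 1-3: data URL -> MIME type and raw bytes.
function dataUrlToBytes(dataUrl) {
  const comma = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || comma === -1) {
    throw new Error("Not a data URL");
  }
  const meta = dataUrl.slice(5, comma); // e.g. "image/png;base64"
  if (!meta.endsWith(";base64")) {
    throw new Error("Data URL is not Base64-encoded");
  }
  const mimeType = meta.slice(0, -";base64".length);
  const binaryStr = atob(dataUrl.slice(comma + 1));
  const bytes = Uint8Array.from(binaryStr, (c) => c.charCodeAt(0));
  return { mimeType, bytes };
}

// Hypothetical helper covering steps 4-5 (browser only, not called here).
function showDataUrlImage(dataUrl, imgElement) {
  const { mimeType, bytes } = dataUrlToBytes(dataUrl);
  const blob = new Blob([bytes], { type: mimeType });
  const objectUrl = URL.createObjectURL(blob);
  // Release the object URL once the image has loaded.
  imgElement.onload = () => URL.revokeObjectURL(objectUrl);
  imgElement.src = objectUrl;
}

const { mimeType, bytes } = dataUrlToBytes("data:image/png;base64,iVBORw0KGgo=");
console.log(mimeType, bytes.length); // image/png 8
```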
In Node.js, binary file decoding is simpler because Buffer.from(str, 'base64') already gives you a Buffer (a subclass of Uint8Array) containing the raw bytes. Pass it directly to fs.writeFileSync(path, buffer) or a writable stream without calling .toString(). The error to avoid is calling .toString() at all: Buffer.toString('utf8') on arbitrary binary data replaces every byte sequence that is not valid UTF-8 with the replacement character U+FFFD, and in random binary data most bytes in the range 128-255 fall into that category. The original bytes cannot be recovered afterwards. Node.js will not throw; it will just produce broken file content.
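The corruption is easy to reproduce. The short Node.js sketch below decodes the 8-byte PNG signature, pushes it through a UTF-8 string and back, and compares the result with the original bytes; the variable names are illustrative only.

```javascript
// The Base64 of the 8-byte PNG signature: 89 50 4E 47 0D 0A 1A 0A
const encoded = "iVBORw0KGgo=";

// Correct: keep the decoded data as a Buffer.
const original = Buffer.from(encoded, "base64");

// Incorrect: a detour through a UTF-8 string. The byte 0x89 is not valid
// UTF-8 on its own, so it becomes U+FFFD, which re-encodes as EF BF BD.
const viaText = Buffer.from(original.toString("utf8"), "utf8");

console.log(original.length);          // 8
console.log(viaText.length);           // 10
console.log(original.equals(viaText)); // false
```

Passing original straight to fs.writeFileSync avoids the detour entirely.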
In Python, base64.b64decode() returns a bytes object. Write it with binary mode: with open('output.pdf', 'wb') as f: f.write(base64.b64decode(encoded)). The 'wb' flag (write binary) is essential on every platform: in Python 3, a file opened in text mode ('w') rejects bytes with a TypeError, and converting the bytes to str first corrupts the data. On Windows, text mode additionally rewrites line endings.
For streaming large Base64 payloads (multi-megabyte files) in Node.js, avoid loading the entire string into memory. Use a Transform stream that accumulates input in 4-character chunks and decodes them immediately: this limits memory usage to roughly the chunk size rather than allocating both the full Base64 string and the full decoded buffer simultaneously. The AWS SDK v3 and Google Cloud Storage client use this pattern internally for binary object transfers.
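One way to sketch that pattern in Node.js is shown below. Base64ChunkDecoder and createBase64DecodeStream are hypothetical names, not an existing API, and the sketch assumes padding appears only at the very end of the input. The decoder carries leftover characters between chunks so that every decode call sees a multiple of 4 characters.

```javascript
const { Transform } = require("stream");

// Hypothetical helper class: decodes Base64 text incrementally.
class Base64ChunkDecoder {
  constructor() {
    this.carry = ""; // 0-3 leftover characters that cannot be decoded yet
  }

  // Decode as much of the accumulated text as forms whole 4-character groups.
  push(text) {
    const input = this.carry + text.replace(/\s+/g, "");
    const usable = input.length - (input.length % 4);
    this.carry = input.slice(usable);
    return Buffer.from(input.slice(0, usable), "base64");
  }

  // Decode whatever is left (an unpadded final group of 2 or 3 characters).
  flush() {
    const rest = Buffer.from(this.carry, "base64");
    this.carry = "";
    return rest;
  }
}

// Wrap the decoder in a Transform stream for use with pipeline() or pipe().
function createBase64DecodeStream() {
  const decoder = new Base64ChunkDecoder();
  return new Transform({
    transform(chunk, _encoding, callback) {
      callback(null, decoder.push(chunk.toString("latin1")));
    },
    flush(callback) {
      callback(null, decoder.flush());
    },
  });
}

// Chunk boundaries deliberately fall in the middle of 4-character groups.
const decoder = new Base64ChunkDecoder();
const parts = ["SGVsbG8", "sIHdv", "cmxk"];
const out = Buffer.concat([...parts.map((p) => decoder.push(p)), decoder.flush()]);
console.log(out.toString("utf8")); // "Hello, world"
```

A typical use would be something like fs.createReadStream(inputPath).pipe(createBase64DecodeStream()).pipe(fs.createWriteStream(outputPath)), where both paths are placeholders.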
UTF-8 text roundtrip through Base64
A correct Base64 roundtrip for Unicode text requires explicit UTF-8 encoding on the encode side and explicit UTF-8 decoding on the decode side. JavaScript strings are UTF-16 internally, and atob/btoa operate on Latin-1 byte strings with code points in the range 0-255. Any character above U+00FF will cause btoa() to throw 'The string to be encoded contains characters outside of the Latin1 range.' This is an extremely common surprise for developers who test only with ASCII content.
The canonical encode-then-decode pattern in modern browsers uses TextEncoder and TextDecoder. To encode: (1) new TextEncoder().encode(text) gives you a UTF-8 Uint8Array; (2) String.fromCharCode(...bytes) converts the bytes to a binary string; (3) btoa(binaryStr) produces the Base64 output. For large inputs, convert in slices in step 2, because spreading a very large array into a single call can exceed the engine's argument limit and throw a RangeError. To decode: (1) atob(encoded) produces a binary string; (2) Uint8Array.from(binaryStr, c => c.charCodeAt(0)) produces the byte array; (3) new TextDecoder('utf-8').decode(bytes) produces the correct Unicode string. TextEncoder and TextDecoder are supported in all modern browsers and in Node.js as globals since version 11.
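Put together, the roundtrip might look like the sketch below. The function names utf8ToBase64 and base64ToUtf8 are hypothetical, and the slice size of 0x8000 is an arbitrary conservative choice to stay well below typical argument limits.

```javascript
// Hypothetical helper: Unicode text -> UTF-8 bytes -> binary string -> Base64.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binaryStr = "";
  const SLICE = 0x8000; // convert in slices to avoid huge argument lists
  for (let i = 0; i < bytes.length; i += SLICE) {
    binaryStr += String.fromCharCode(...bytes.subarray(i, i + SLICE));
  }
  return btoa(binaryStr);
}

// Hypothetical helper: Base64 -> binary string -> bytes -> Unicode text.
function base64ToUtf8(encoded) {
  const bytes = Uint8Array.from(atob(encoded), (c) => c.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const encoded = utf8ToBase64("Hello 🙂");
console.log(encoded);               // "SGVsbG8g8J+Zgg=="
console.log(base64ToUtf8(encoded)); // "Hello 🙂"
```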
In Node.js the roundtrip is dramatically simpler because Buffer separates the concerns cleanly. Encode: Buffer.from(text, 'utf8').toString('base64'). Decode: Buffer.from(encoded, 'base64').toString('utf8'). These two lines handle any valid Unicode string — including emoji, combining characters, and characters above the Basic Multilingual Plane — without any intermediate conversion. There is no Latin-1 restriction because Buffer.from with 'utf8' encoding converts the full Unicode string to UTF-8 bytes before the Base64 step.
In Python the pattern is equally explicit. Encode: base64.b64encode(text.encode('utf-8')).decode('ascii'). Decode: base64.b64decode(encoded.encode('ascii')).decode('utf-8'). The encode('utf-8') call converts the Python str to a bytes object that b64encode can process. The b64encode return value is itself bytes containing only ASCII characters, so .decode('ascii') produces a Python str ready for JSON serialization or HTTP headers. On the decode path, .decode('utf-8') at the end converts the raw decoded bytes back to a Python str.
A subtle production pitfall: if your text contains null bytes (U+0000) — possible with binary blobs accidentally routed through text paths — the roundtrip may appear to succeed locally but fail in C-based systems, PostgreSQL VARCHAR columns using C-string semantics, or HTTP headers that treat null as a terminator. Validate that text intended as human-readable strings does not contain null bytes before encoding: text.indexOf('\0') === -1 is the browser check, '\x00' not in text is the Python check. Log a warning and reject the input rather than silently encoding corrupted data.
TextDecoder is the correct tool for UTF-8 text
Always pipe atob() output through new TextDecoder('utf-8').decode(bytes) for any text that might contain non-ASCII characters. Accented letters, emoji, and CJK characters all require this step. Skipping it produces invisible bugs: English content looks fine, but any user with non-Latin characters in their data sees garbled output.
When decoded output looks garbled or truncated
Garbled output after Base64 decoding falls into three distinct patterns, each pointing to a different root cause. Identifying the pattern saves significant debugging time.
The first pattern is mojibake: you see Latin-1 sequences like Ã©, Ã¨, Â£, or other runs of special characters where accented letters or non-Latin characters should appear. This is the signature of UTF-8 bytes being interpreted as Latin-1 or ISO-8859-1. The two UTF-8 bytes encoding é (U+00E9) are 0xC3 and 0xA9. Treated as individual Latin-1 characters those render as Ã and ©. This happens when atob() output is used directly as a string without routing it through TextDecoder. The fix is always the same: pipe through new TextDecoder('utf-8').decode(bytes). The same pattern appears in Python when you call .decode('latin-1') instead of .decode('utf-8') on the decoded bytes.
The second pattern is truncation or replacement characters (the Unicode replacement character U+FFFD, which renders as a question mark or box). This indicates that the Base64 input itself was cut short before being decoded. A truncated Base64 string produces truncated binary output. The most reliable diagnostic is checking the string length: a valid standard Base64 string has length that is a multiple of 4. URL-safe Base64 without padding can have any length as long as the remainder modulo 4 is 0, 2, or 3 — a remainder of 1 is always invalid and proves the string is truncated. Check where the string is constructed and whether any character-length limits in database columns, HTTP headers, or API parameters are clipping it.
The third pattern is complete garbage — seemingly random high-byte characters with no recognizable structure. This almost always means the original encoded content was binary (an image, a compressed file, a serialized binary format) and you are attempting to decode it as UTF-8 text. The bytes decoded correctly but they are not text. Confirm the MIME type or content type of the original data before attempting text decoding. Many API responses include a Content-Type or a type field alongside the Base64 payload that tells you whether to treat the decoded bytes as text, JSON, or binary.
Double-decoding is a related trap. A JWT payload is Base64url-encoded JSON, so the encoded segment typically starts with 'eyJ' (the Base64 encoding of '{"'). Decode it once and you get a JSON string that starts with '{'. Double-decoding usually happens when one layer, such as a library or middleware, has already decoded the segment and a later layer decodes the result again, which produces garbage or an error. Always parse the once-decoded output with JSON.parse rather than decoding again. A simple guard: if the value already starts with '{', it is decoded JSON, so parse it and do not decode it a second time.
Validating Base64 strings before decoding
Validating a Base64 string before attempting to decode it produces better error messages, prevents opaque exceptions from propagating up the call stack, and adds a layer of protection in security-sensitive contexts.
The fastest validation approach is a regular expression combined with a length check. For standard Base64: /^[A-Za-z0-9+/]*={0,2}$/.test(str) validates the alphabet and confirms at most two = padding characters; str.length % 4 === 0 confirms correct length. Both conditions must pass for a valid standard Base64 string. For Base64url without padding: /^[A-Za-z0-9_-]*$/.test(str) validates the alphabet, and str.length % 4 !== 1 confirms the string is not truncated. These checks are pure string operations that need no decode attempt and no output buffer, so they cost far less than decoding and failing downstream.
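Those two checks translate into code roughly as follows. The function names are hypothetical, and the checks are syntactic only: they do not confirm that the decoded bytes are meaningful.

```javascript
// Alphabet patterns from the text above (the slash is escaped for clarity).
const STANDARD_RE = /^[A-Za-z0-9+\/]*={0,2}$/;
const URLSAFE_RE = /^[A-Za-z0-9_-]*$/;

// Hypothetical helper: padded standard Base64 (RFC 4648 Section 4).
function isStandardBase64(str) {
  return str.length % 4 === 0 && STANDARD_RE.test(str);
}

// Hypothetical helper: unpadded Base64url (RFC 4648 Section 5).
function isUnpaddedBase64Url(str) {
  return str.length % 4 !== 1 && URLSAFE_RE.test(str);
}

console.log(isStandardBase64("SGVsbG8g8J+Zgg==")); // true
console.log(isStandardBase64("SGVsbG8"));          // false (length 7)
console.log(isUnpaddedBase64Url("SGVsbG8"));       // true
console.log(isUnpaddedBase64Url("a+b/"));          // false (standard alphabet)
```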
A try-catch around the decode call is the minimum viable approach and is correct for production code, but it gives you the least information. In browsers, atob() throws InvalidCharacterError. In Python, base64.b64decode() raises binascii.Error for bad padding, but by default it silently discards characters outside the alphabet; pass validate=True to make it reject them. In Node.js, Buffer.from() does not throw at all and may silently produce wrong output. The try-catch approach only catches the cases that throw, so it does not detect these silent-corruption scenarios. Combine the regex check with the decode call for reliable coverage.
For security-sensitive contexts — decoding tokens before inspecting claims, decoding encrypted payloads, decoding signed values — validation should include length bounds. A Base64url-encoded 32-byte random nonce should be exactly 43 characters (without padding). A JWT access token from a given issuer will fall within a predictable length range. If the input is wildly longer or shorter than expected, reject it before decoding — do not let unbounded input reach the decode step. The estimate formula Math.floor(str.replace(/=+$/, '').length * 3 / 4) gives the decoded byte count without performing the decode, which lets you enforce size limits cheaply.
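A small sketch of the size estimate and a bounds check built on it follows. Both function names are hypothetical, and the 32-byte nonce is the example from the paragraph above.

```javascript
// Decoded byte count, computed from the string length without decoding.
function decodedByteLength(str) {
  return Math.floor((str.replace(/=+$/, "").length * 3) / 4);
}

// Hypothetical guard: reject input whose decoded size is out of bounds.
function hasExpectedDecodedSize(str, minBytes, maxBytes) {
  const size = decodedByteLength(str);
  return size >= minBytes && size <= maxBytes;
}

// A 32-byte nonce encodes to 43 unpadded Base64url characters.
const nonceLike = "A".repeat(43);
console.log(decodedByteLength(nonceLike));                // 32
console.log(hasExpectedDecodedSize(nonceLike, 32, 32));   // true
console.log(decodedByteLength("SGVsbG8g8J+Zgg=="));       // 10
console.log(hasExpectedDecodedSize("A".repeat(400), 32, 32)); // false
```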
Content validation after decoding is equally important. A string that passes Base64 syntax validation and decodes without errors might still be the wrong type of content. If you expect JSON, verify the decoded string starts with { or [ before calling JSON.parse. If you expect a UUID, verify it matches the UUID format after decoding. If you expect a PNG, verify the decoded bytes start with the PNG magic bytes (89 50 4E 47). Reject content that does not match the expected format at this stage rather than allowing the downstream parser to throw an opaque error.
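As an illustration of post-decode checks, the sketch below tests decoded bytes against the PNG magic bytes and applies the first-character test for JSON. All function names are hypothetical, and these are cheap pre-checks, not a replacement for a real parser.

```javascript
// First four bytes of every PNG file.
const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47];

// Hypothetical helper: decode standard Base64 to bytes.
function base64ToBytes(encoded) {
  return Uint8Array.from(atob(encoded), (c) => c.charCodeAt(0));
}

// Hypothetical helper: do the bytes begin with the given prefix?
function startsWithBytes(bytes, prefix) {
  return bytes.length >= prefix.length && prefix.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: cheap pre-check before calling JSON.parse.
function looksLikeJson(text) {
  const trimmed = text.trimStart();
  return trimmed.startsWith("{") || trimmed.startsWith("[");
}

console.log(startsWithBytes(base64ToBytes("iVBORw0KGgo="), PNG_MAGIC)); // true
console.log(startsWithBytes(base64ToBytes("SGVsbG8="), PNG_MAGIC));     // false
console.log(looksLikeJson('{"sub":"user-123"}'));                       // true
console.log(looksLikeJson("Hello"));                                    // false
```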
For file upload workflows where users paste Base64-encoded files, enforce a maximum encoded length before decoding. Base64-encoded content is approximately 33% larger than the binary original, so a maximum decoded file size of 5 MB (5,000,000 bytes) corresponds to a maximum encoded string length of approximately 6.7 million characters (4 × ceil(5,000,000 / 3) = 6,666,668), or about 7.0 million characters if the limit is 5 MiB. Checking the string length before allocating a Buffer prevents memory exhaustion from pathological inputs.
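The length limit follows directly from the 4-characters-per-3-bytes ratio. The sketch below computes it for padded Base64; the names maxEncodedLength and isWithinUploadLimit are hypothetical.

```javascript
// Longest padded Base64 string that can decode to at most maxDecodedBytes.
function maxEncodedLength(maxDecodedBytes) {
  return Math.ceil(maxDecodedBytes / 3) * 4;
}

// Hypothetical guard: check the string length before any decode or allocation.
function isWithinUploadLimit(encoded, maxDecodedBytes) {
  return encoded.length <= maxEncodedLength(maxDecodedBytes);
}

console.log(maxEncodedLength(5_000_000));            // 6666668
console.log(maxEncodedLength(5 * 1024 * 1024));      // 6990508
console.log(isWithinUploadLimit("SGVsbG8=", 5));     // true  ("Hello" is 5 bytes)
console.log(isWithinUploadLimit("SGVsbG8sIHdvcmxk", 5)); // false
```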
Base64 decode checklist
- ✓Strip any data URL prefix (data:image/png;base64,) before decoding.
- ✓Check if the string uses - and _ (URL-safe) and use 'base64url' or normalize first.
- ✓Restore missing = padding: pad to the next multiple of 4 characters.
- ✓In browsers, pipe atob output through TextDecoder for any UTF-8 text.
- ✓In Node.js, use Buffer.from(str, 'base64url') for JWT and OAuth tokens.
- ✓For binary files, write the Buffer directly without converting to string.
- ✓Validate alphabet and length before decoding in security-sensitive code.
- ✓Avoid decoding twice — JWT payloads decode to JSON, not another Base64 string.
Frequently asked questions
Why does atob() throw InvalidCharacterError?
atob() only accepts the standard Base64 alphabet (A-Z, a-z, 0-9, +, /, and = padding) plus ASCII whitespace, which it strips before decoding. It throws InvalidCharacterError if the string contains - or _ (URL-safe characters), a data URL prefix like 'data:image/png;base64,', misplaced or excess = characters, or a length that leaves a single dangling character. Strip any prefix and replace - with + and _ with / before calling atob().
How do I fix 'Incorrect padding' in Python?
Add trailing = characters until the string length is a multiple of 4. The shortest correct fix is s += '=' * (-len(s) % 4). This formula adds 0, 1, or 2 equals signs as needed and is always safe. If the string uses the URL-safe alphabet, use base64.urlsafe_b64decode(s + '==') — Python ignores excess padding characters.
What is the difference between Base64 and Base64url?
Base64url replaces + with - and / with _ and makes trailing = padding optional, producing strings safe for URL path and query parameters without percent-encoding. Standard Base64 uses + and / and always pads to a multiple of 4. JWTs, OAuth tokens, and PKCE verifiers use Base64url. PEM certificates and email attachments use standard Base64.
Can I use atob() in Node.js?
Yes, since Node.js 16 atob() and btoa() are available as globals that mirror the browser API. However, Buffer.from(str, 'base64url') is preferred for URL-safe tokens and Buffer.from(str, 'base64') for standard strings: they return a Buffer directly, skip the intermediate binary string, and are more explicit about which variant is being decoded. The Node.js documentation marks atob() and btoa() as legacy and recommends the Buffer APIs instead.
Why does my decoded image look corrupt?
Most likely you called .toString('utf8') or TextDecoder on the decoded binary bytes. Binary image data is not valid UTF-8, so the encoding conversion corrupts bytes above 127. For images and all binary files, keep the decoded result as a Buffer or Uint8Array and write it directly to disk or a Blob without any text encoding step.
How do I decode a data URL in JavaScript?
Split on the first comma and decode the remainder: const base64Part = dataUrl.split(',')[1]; const bytes = Uint8Array.from(atob(base64Part), c => c.charCodeAt(0)). The prefix 'data:image/png;base64' before the comma is metadata describing media type and encoding. Passing the full data URL directly to atob() throws InvalidCharacterError because the colon, semicolon, and comma are not valid Base64 characters.
What is the decoded size of a Base64 string?
Every 4 Base64 characters decode to 3 bytes. The formula is Math.floor(str.replace(/=+$/, '').length * 3 / 4) bytes. Base64 adds approximately 33% overhead compared to the original binary size — a 100 KB file becomes roughly 133 KB as Base64. For large payloads consider multipart uploads or pre-signed URLs instead.
Is it safe to decode Base64 from user input?
Decoding is mechanically safe, but trusting the decoded content is not. Validate the decoded output for expected format, size, and content type before using it. For tokens, verify the signature cryptographically — do not trust claims in the decoded payload. Never eval() decoded content or use it to construct SQL queries or shell commands without sanitization.
Why does my decoded output show Ã© instead of é?
This is UTF-8 bytes being interpreted as Latin-1. The two UTF-8 bytes for é (0xC3 0xA9) display as the Latin-1 characters Ã and ©. This happens when atob() output is used directly as a string. Fix: convert the atob binary string to a Uint8Array and pass it to new TextDecoder('utf-8').decode(bytes).
All tools run in your browser. Your data never leaves your device. Last updated: 2026-05-07.