Binary to Text Encoding: How Base64, Hex, and MIME Work

Quick answer

💡Binary-to-text encoding converts arbitrary bytes to a safe subset of printable ASCII so binary data can travel through text-only channels like email, JSON, HTML, and HTTP headers. Base64 is the most compact of the widely supported schemes at about 33% overhead; hex is simpler but doubles the size. For MIME email, use Content-Transfer-Encoding: base64. For URLs, use base64url (RFC 4648 §5), which replaces + with - and / with _.

Error symptoms

  • Email attachment corrupted after SMTP transmission — binary bytes mangled by line-ending conversion
  • JSON.parse error when including raw binary bytes in a JSON string value
  • HTTP 400 Bad Request when posting binary data without Content-Type: application/octet-stream
  • PNG or PDF file is unreadable after being stored as a plain string in a text database column
  • atob() throws InvalidCharacterError on a base64 string received from a non-browser environment
  • Multipart email part missing because the MIME boundary string appeared in the binary attachment

Common causes

  • Treating binary data as a UTF-8 string, causing bytes above 0x7F to be mangled by UTF-8 validation
  • Missing or incorrect Content-Transfer-Encoding header in a MIME email part
  • Using raw binary in a JSON body without encoding, which is invalid JSON
  • SMTP servers performing CRLF normalization on binary data that was not encoded before transmission
  • Selecting the wrong encoding (quoted-printable instead of base64) for binary attachments
  • Not accounting for the 76-character line-length limit in MIME base64, causing decoders to reject the payload

When it happens

  • When sending file attachments over SMTP where 8-bit binary is not guaranteed to pass through unchanged
  • When embedding a binary public key, certificate, or image in a REST API JSON response
  • When storing a file upload in a text-based storage system that does not support binary data
  • When generating a data: URI (data:image/png;base64,...) for inline images in HTML or CSS
  • When creating a multipart/mixed MIME message with both text and binary parts

Examples and fixes

Convert a file buffer to base64 for embedding in JSON, and decode it back on the client side.

Embedding a binary image in a JSON response using base64

❌ Wrong

// Server: sending raw binary in JSON — invalid
const fs = require('fs');
const imageBuffer = fs.readFileSync('logo.png');
res.json({
  name: 'logo.png',
  data: imageBuffer.toString() // UTF-8 decode of binary = garbled
});
// Client receives: { data: '\uFFFD\uFFFD...' } — replacement chars

✅ Fixed

// Server: encode binary to base64 for JSON transport
const fs = require('fs');
const imageBuffer = fs.readFileSync('logo.png');
const base64Data = imageBuffer.toString('base64');
res.json({
  name: 'logo.png',
  mimeType: 'image/png',
  encoding: 'base64',
  data: base64Data // safe ASCII string in JSON
});
// Client: decode back to binary
const imgBuffer = Buffer.from(response.data, 'base64');
// Or for browser: <img src={`data:image/png;base64,${response.data}`} />

Raw binary data cannot be placed in a JSON string. JSON strings must be valid Unicode sequences, but arbitrary binary bytes include values that are not valid UTF-8 sequences. Calling toString() on a Buffer (which defaults to 'utf8') decodes the bytes as UTF-8 and replaces every invalid sequence with the Unicode replacement character U+FFFD, so the data is already damaged before the JSON encoder sees the string. Base64 encoding converts every 3 bytes to 4 printable ASCII characters, producing a string that is valid UTF-8, valid JSON, and identical on every platform. The roughly 33% size overhead is the accepted cost of text-channel compatibility.
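
The difference between the two conversions can be checked directly. The following is a minimal sketch, assuming Node.js; the sample bytes are the 8-byte PNG file signature, chosen only because they contain a byte (0x89) that is not valid UTF-8 on its own.

```javascript
// The 8-byte PNG signature; 0x89 cannot be decoded as UTF-8 by itself
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Lossy: UTF-8 decoding turns 0x89 into U+FFFD, which re-encodes as EF BF BD
const viaUtf8 = Buffer.from(original.toString('utf8'), 'utf8');

// Lossless: base64 maps every 3 bytes to 4 ASCII characters and back
const encoded = original.toString('base64');
const viaBase64 = Buffer.from(encoded, 'base64');

console.log(viaUtf8.equals(original));   // false
console.log(viaBase64.equals(original)); // true
console.log(encoded);                    // iVBORw0KGgo=
```

The UTF-8 round trip does not just alter one byte; it also changes the length (8 bytes in, 10 bytes out), which is why corrupted files often fail to open at all.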

A MIME email part carrying a binary PDF attachment must declare base64 encoding and wrap output at 76 characters per MIME spec.

Correct MIME Content-Transfer-Encoding for email attachments

❌ Wrong

// Wrong: binary content in MIME without encoding
const pdfBytes = fs.readFileSync('invoice.pdf');
const mimeMessage = [
  'MIME-Version: 1.0',
  'Content-Type: application/pdf; name="invoice.pdf"',
  'Content-Disposition: attachment; filename="invoice.pdf"',
  '', // missing Content-Transfer-Encoding
  pdfBytes.toString() // raw binary — SMTP will corrupt it
].join('\r\n');

✅ Fixed

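const fs = require('fs');
const pdfBytes = fs.readFileSync('invoice.pdf');
// Encode to base64 with 76-char line wrapping (MIME RFC 2045)
const raw = pdfBytes.toString('base64');
// match() returns null for an empty string, so fall back to an empty array
const wrapped = (raw.match(/.{1,76}/g) || []).join('\r\n');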
const mimeMessage = [
  'MIME-Version: 1.0',
  'Content-Type: application/pdf; name="invoice.pdf"',
  'Content-Disposition: attachment; filename="invoice.pdf"',
  'Content-Transfer-Encoding: base64',
  '',
  wrapped
].join('\r\n');

SMTP was originally designed for 7-bit ASCII text. Although modern SMTP servers support the 8BITMIME extension, not all mail relays and spam filters guarantee binary data passes through without modification. CRLF normalization, line-length limits, and dot-stuffing rules can all corrupt binary content. RFC 2045 defines base64 as the Content-Transfer-Encoding for arbitrary binary data and requires encoded lines to be no longer than 76 characters, separated by CRLF. The Content-Transfer-Encoding header tells the receiving mail client which decoding to apply when reconstructing the attachment.

Why binary data needs text encoding

Modern computer systems store and process data as bytes, values from 0 to 255. However, many of the protocols and file formats that move data between systems were designed for text, not binary. SMTP email, JSON, XML, HTML attributes, HTTP headers, and many database text column types can only reliably handle printable ASCII characters (0x20 to 0x7E, decimal 32-126) or a well-defined Unicode encoding like UTF-8. Bytes outside these ranges may be silently dropped, modified, or rejected entirely.

The problem is particularly acute for SMTP email. The original SMTP specification (RFC 821) required all message content to be 7-bit ASCII. Although the 8BITMIME extension (originally RFC 1652, now RFC 6152) has been widely supported since the 1990s, not every mail relay, spam filter, or legacy mail server in a message's path guarantees that 8-bit bytes will pass through unchanged. Line-ending normalization, dot-stuffing, and maximum line-length enforcement can all corrupt binary data that was not pre-encoded.

JSON has a different constraint: string values must contain valid Unicode sequences, not arbitrary bytes. A JPEG image file contains many byte sequences that are not valid UTF-8. Embedding raw image bytes in a JSON string forces a bytes-to-string conversion first, and a UTF-8 decoder substitutes the Unicode replacement character (U+FFFD) for each invalid sequence, corrupting the data silently. The receiver decodes what appears to be a valid JSON string but gets a different byte sequence than the original image.

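HTTP header field values are, in practice, limited to visible US-ASCII characters. Older specifications allowed ISO 8859-1, but RFC 7230 and its successor RFC 9110 tell recipients to treat non-ASCII octets as opaque data, and control characters such as CR, LF, and NUL are forbidden. HTTP/2 changes how headers are compressed on the wire (HPACK) but not which characters are safe. Neither version supports arbitrary binary. This means authentication tokens, encryption keys, and other binary values that need to appear in HTTP headers must be encoded as base64 or hex before placement.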
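
As an illustration of encoding a binary value for a header or URL, here is a minimal sketch assuming Node.js. The helper names toBase64Url and fromBase64Url are hypothetical, and the 32-byte random token is only a stand-in for a real key or session identifier.

```javascript
const crypto = require('crypto');

// Standard base64 to the URL- and header-safe alphabet (RFC 4648 §5):
// '+' becomes '-', '/' becomes '_', and trailing '=' padding is dropped.
function toBase64Url(buf) {
  return buf.toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Reverse the mapping and restore padding before decoding.
function fromBase64Url(str) {
  const b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
  return Buffer.from(padded, 'base64');
}

const token = crypto.randomBytes(32);   // stand-in binary value
const headerValue = toBase64Url(token); // 43 characters, all header-safe

console.log(fromBase64Url(headerValue).equals(token)); // true
```

Recent Node.js versions also accept 'base64url' as a Buffer encoding name; the manual mapping above is shown because it makes the alphabet substitution explicit.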

Identifying and diagnosing binary encoding problems

When a file arrives corrupted after email transmission, the most reliable diagnostic is to compare the SHA-256 hash of the attachment as sent with the hash of the file as received. If the hashes differ, the binary data was modified in transit. Open the raw MIME source of the email (most email clients have a Show Original option) and look for the Content-Transfer-Encoding header of the attachment part. If it is missing, or set to '7bit' or '8bit' instead of 'base64', the binary data was sent without encoding and was subject to modification.
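
The hash comparison can be scripted. This is a minimal sketch assuming Node.js and its built-in crypto module; the two buffers are stand-ins for the sent and received files, which in practice would be loaded with fs.readFileSync.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-256 digest of a buffer
function sha256Hex(buf) {
  return crypto.createHash('sha256').update(buf).digest('hex');
}

// Stand-ins: "%PDF\r\n" as sent, and the same bytes with the CR removed in transit
const sent = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x0d, 0x0a]);
const received = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x0a]);

console.log(sha256Hex(sent) === sha256Hex(received)); // false: modified in transit
```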

For corrupted binary data in API responses, log the raw HTTP response body before any JSON parsing. Check whether the Content-Type response header is 'application/json' or 'application/octet-stream'. If JSON is expected but the body contains binary bytes, inspect the bytes at the offset where corruption begins. If you see EF BF BD (the UTF-8 encoding of U+FFFD), the server performed an incorrect UTF-8 conversion on binary data before serializing to JSON.
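
A quick way to look for that byte sequence is sketched below, assuming Node.js and that the response body is available as a Buffer. The function name firstReplacementOffset is hypothetical. A hit is a strong hint rather than proof, since U+FFFD can legitimately appear in text.

```javascript
// UTF-8 encoding of U+FFFD, the Unicode replacement character
const REPLACEMENT = Buffer.from([0xef, 0xbf, 0xbd]);

// Byte offset of the first U+FFFD sequence, or -1 if none is present
function firstReplacementOffset(body) {
  return body.indexOf(REPLACEMENT);
}

// Simulate the bug: binary bytes decoded as UTF-8 and written back out
const binary = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x9c]);
const corrupted = Buffer.from(binary.toString('utf8'), 'utf8');

console.log(firstReplacementOffset(corrupted)); // 4
```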

For data URIs not rendering images, verify the MIME type prefix matches the actual file format. data:image/jpeg;base64,... must contain valid base64-encoded JPEG bytes. A common mistake is using data:image/png for a JPEG file, which some browsers refuse to render. Also check for whitespace or newline characters in the base64 string — data URI parsers in some browsers reject base64 with embedded newlines, even though MIME base64 requires them.

To quickly validate a base64 string, check three conditions: the length is a multiple of 4 (including = padding), all characters are in the set A-Za-z0-9+/=, and the = padding appears only at the end as 0, 1, or 2 characters. For base64url, the valid characters are A-Za-z0-9-_ and padding is usually omitted, so the length need not be a multiple of 4 (although it can never be exactly 1 more than a multiple of 4). Any deviation from these rules means the string is malformed.
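
These checks fit in two small functions. The sketch below assumes plain JavaScript with no dependencies; the function names are hypothetical, and the base64url variant assumes the unpadded form.

```javascript
// Standard base64: length is a multiple of 4, alphabet only, 0-2 '=' at the end
function isWellFormedBase64(str) {
  return str.length % 4 === 0 && /^[A-Za-z0-9+\/]*={0,2}$/.test(str);
}

// Unpadded base64url: URL-safe alphabet only; a length of 4n+1 is impossible
function isWellFormedBase64Url(str) {
  return str.length % 4 !== 1 && /^[A-Za-z0-9_-]*$/.test(str);
}

console.log(isWellFormedBase64('aGVsbG8='));   // true
console.log(isWellFormedBase64('aGVs bG8='));  // false (embedded space)
console.log(isWellFormedBase64('ab=c'));       // false (padding not at the end)
console.log(isWellFormedBase64Url('aGVsbG8')); // true
```

Passing these checks only shows that a string is well formed, not that it was intended as base64.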

Implementing binary-to-text encoding correctly

For MIME email attachments in Node.js, use the nodemailer library which handles Content-Transfer-Encoding: base64 and the 76-character line wrapping automatically when you provide an attachment as a buffer or file path. If you are constructing raw MIME manually, encode with buffer.toString('base64') and then wrap the result: base64Str.match(/.{1,76}/g).join('\r\n'). Always use CRLF (\r\n) as the line separator in MIME, not LF alone, per RFC 2045.

For embedding binary data in JSON, use buffer.toString('base64') on the Node.js side and Buffer.from(str, 'base64') to decode. Include an 'encoding' field in your JSON schema that declares the encoding used, so that future maintainers do not need to reverse-engineer whether a field is hex, base64, or UTF-8. A clear contract looks like: { "data": "...", "encoding": "base64", "mimeType": "image/png" }.

For data URIs in HTML and CSS, construct the URI as: 'data:' + mimeType + ';base64,' + buffer.toString('base64'). Data URIs have practical size limits that vary by browser and version; a limit of roughly 2 MB has been reported for Chrome. For larger images, use object URLs created with URL.createObjectURL(blob) instead. Reserve data URIs for small icons and inline SVGs that are used on every page load, saving one HTTP request per resource.

For UUencoding (the historical predecessor to MIME base64, used in USENET and early email), modern systems should not generate new UUencoded content. If you receive UUencoded files, most Unix systems include the uudecode utility, and Node.js has npm packages for legacy UUencode support. The UUencode alphabet maps 6-bit values to characters starting at ASCII 32 (space), producing output that is visually similar to random text and harder to debug than base64.

MIME line lengths, streaming, and data URI limits

MIME base64 (RFC 2045) requires output to be split into lines of at most 76 characters. This is distinct from standard base64 encoding (RFC 4648), which produces no line breaks. Some MIME parsers reject base64 with lines longer than 76 characters. The safe approach is always to include line breaks when generating MIME base64, and to strip all whitespace when decoding any base64 string that might have come from a MIME source. Node.js Buffer.from(str, 'base64') already ignores whitespace. Browser atob() in current engines also strips ASCII whitespace, following the HTML standard's forgiving-base64 algorithm, but it throws InvalidCharacterError on any other character outside the alphabet, so explicit stripping remains a cheap safeguard for older or non-conforming decoders.
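
The wrap-on-encode, strip-on-decode rule can be sketched as a pair of helpers, assuming Node.js. The names wrapMimeBase64 and unwrapBase64 are hypothetical.

```javascript
// Encode and split into CRLF-separated lines of at most 76 characters
function wrapMimeBase64(buf) {
  const raw = buf.toString('base64');
  const lines = raw.match(/.{1,76}/g) || []; // match() is null for empty input
  return lines.join('\r\n');
}

// Remove all whitespace before decoding, so the result does not depend on
// how tolerant a particular decoder is
function unwrapBase64(text) {
  return Buffer.from(text.replace(/\s+/g, ''), 'base64');
}

const payload = Buffer.alloc(100, 0xab);   // 100 bytes encode to 136 characters
const wrapped = wrapMimeBase64(payload);   // two lines: 76 + 60 characters

console.log(wrapped.split('\r\n').map((line) => line.length)); // [ 76, 60 ]
console.log(unwrapBase64(wrapped).equals(payload));            // true
```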

When streaming large binary data through a base64 encoder, you must buffer input in multiples of 3 bytes to avoid incorrect padding mid-stream. Encoding a 1024-byte chunk produces a 1368-character base64 string with a trailing == because 1024 mod 3 = 1. If the next chunk's output is appended immediately after this padding, the concatenated base64 is invalid because padding should only appear at the end of the complete base64 string, not at chunk boundaries. The correct approach is to accumulate input bytes and only encode when you have complete 3-byte groups, flushing any remainder only when the stream ends.
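
One way to implement the accumulate-and-flush approach is sketched below, assuming Node.js Buffers. The names createChunkedBase64Encoder and encodeInChunks are hypothetical, and a production version would more likely be a stream.Transform.

```javascript
// Encodes only whole 3-byte groups and carries the 0-2 leftover bytes
// into the next call.
function createChunkedBase64Encoder() {
  let carry = Buffer.alloc(0);
  return {
    push(chunk) {
      const data = Buffer.concat([carry, chunk]);
      const usable = data.length - (data.length % 3);
      carry = data.subarray(usable);
      return data.subarray(0, usable).toString('base64');
    },
    // Call once at end of stream; this is the only place padding can appear.
    end() {
      const tail = carry.toString('base64');
      carry = Buffer.alloc(0);
      return tail;
    },
  };
}

// Hypothetical driver that feeds fixed-size chunks through the encoder
function encodeInChunks(input, chunkSize) {
  const encoder = createChunkedBase64Encoder();
  let out = '';
  for (let i = 0; i < input.length; i += chunkSize) {
    out += encoder.push(input.subarray(i, i + chunkSize));
  }
  return out + encoder.end();
}
```

With this design the chunk size no longer matters: encodeInChunks(input, 1024) yields the same string as input.toString('base64').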

Data URIs are cached by the browser as part of the document, not as separate cacheable resources. The same image embedded as a data URI on 100 different pages will be decoded and stored 100 times in memory, whereas a linked image URL would be cached once via the HTTP cache. For performance, prefer linked image URLs for any image used more than once, and reserve data URIs for single-use icons or dynamically generated images.

URL length limits also affect base64 in URLs. While the HTTP specification does not define a URL length limit, practical limits exist: many web servers reject URLs longer than 8192 characters, and AWS CloudFront documents limits of 8,192 bytes for a URL and 20,480 bytes for a whole request excluding the body. A 1 KB binary value encoded as base64url produces about 1,366 characters (1,368 with padding), safely within most limits. Anything larger should use POST request bodies instead of URL parameters.
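
The size arithmetic is easy to get wrong by a few characters, so it can help to compute it. This is a minimal sketch in plain JavaScript; the function names are hypothetical, and maxBytesForBudget assumes unpadded base64url.

```javascript
// Padded base64 (RFC 4648 §4): every 3 input bytes become 4 characters, rounded up
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Unpadded base64url (RFC 4648 §5): the same characters minus trailing '='
function base64UrlLength(byteCount) {
  return Math.ceil((4 * byteCount) / 3);
}

// Largest payload in bytes whose unpadded base64url form fits a character budget
function maxBytesForBudget(charBudget) {
  return Math.floor((charBudget * 3) / 4);
}

console.log(base64Length(1024));      // 1368
console.log(base64UrlLength(1024));   // 1366
console.log(maxBytesForBudget(8192)); // 6144
```

The budget passed to maxBytesForBudget should be what remains after the scheme, host, path, and other parameters are subtracted from the overall URL limit.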

Common binary encoding mistakes in production

Treating base64 encoding as encryption is a common and dangerous misunderstanding. Base64 is a lossless encoding, not encryption. Anyone who receives a base64 string can decode it instantly — base64 provides no confidentiality. A common mistake is encoding a user password in base64 before storing it in a database, believing it provides some protection. It does not. Use bcrypt or Argon2id for password storage.

Using quoted-printable encoding for binary attachments is another mistake. Quoted-printable is designed for text content that is mostly ASCII with occasional non-ASCII characters. It represents each non-ASCII byte as a three-character =XX hex escape. For binary data, a majority of bytes trigger an escape, producing output that can exceed twice the original size (compared with base64's fixed 33% overhead) and is barely readable. Always use base64 for binary attachments in MIME email.
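
The size difference can be estimated without a full quoted-printable encoder. The sketch below, assuming Node.js, computes a lower bound: it ignores soft line breaks and the special handling of trailing spaces, so real quoted-printable output is slightly larger. The function names are hypothetical.

```javascript
// Lower bound on quoted-printable size: tab, space, and printable ASCII
// (except '=') cost 1 character; every other byte costs a 3-character =XX escape.
function quotedPrintableLowerBound(buf) {
  let size = 0;
  for (const byte of buf) {
    const literal = byte === 0x09 || (byte >= 0x20 && byte <= 0x7e && byte !== 0x3d);
    size += literal ? 1 : 3;
  }
  return size;
}

// Unwrapped base64 size for comparison
function base64Size(buf) {
  return 4 * Math.ceil(buf.length / 3);
}

// One of every possible byte value, as a stand-in for binary content
const sample = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

console.log(quotedPrintableLowerBound(sample)); // 578
console.log(base64Size(sample));                // 344
```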

Not handling the 76-character MIME line limit is a frequently encountered bug in custom email generation. A developer generates correct base64 output but does not add line breaks, so the whole attachment becomes a single line far longer than the 998-character line limit that RFC 5322 sets for message lines. The resulting email works with Gmail and Outlook (lenient decoders) but corrupts attachments when relayed through older or stricter SMTP servers, which may truncate or re-wrap the line. Always add 76-character line wrapping to base64 content in MIME messages.

Double-encoding is a common bug in multi-layer systems. If middleware base64-encodes an already-base64-encoded value, the output looks like valid base64 but decodes to another base64 string rather than the original data. Debug by checking the length: a 32-byte input should produce 44 base64 chars after one encoding. If you see 60 or more chars, it has likely been double-encoded. Establish a clear data contract where encoding is applied exactly once at the serialization boundary.
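
The length check and a peel-one-layer check can be combined into a debugging heuristic. This is a sketch assuming Node.js; looksDoubleEncoded is a hypothetical name, and the test can produce false positives when the original data is itself base64-like text, so treat it as a diagnostic aid rather than a validator.

```javascript
// Well-formed, non-empty standard base64
function isBase64Shaped(s) {
  return s.length > 0 && s.length % 4 === 0 && /^[A-Za-z0-9+\/]*={0,2}$/.test(s);
}

// True when the value decodes to something that is again base64-shaped
function looksDoubleEncoded(str) {
  if (!isBase64Shaped(str)) return false;
  const inner = Buffer.from(str, 'base64').toString('latin1');
  return isBase64Shaped(inner);
}

const key = Buffer.alloc(32, 0x9f);                   // stand-in 32-byte value
const once = key.toString('base64');                  // 44 characters
const twice = Buffer.from(once).toString('base64');   // 60 characters

console.log(once.length, looksDoubleEncoded(once));   // 44 false
console.log(twice.length, looksDoubleEncoded(twice)); // 60 true
```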

Best practices for binary-to-text encoding

Choose the right encoding for each channel and document it explicitly. For HTTP APIs: base64 in JSON body fields, with an explicit 'encoding' metadata field. For URLs and HTTP headers: base64url with no +, /, or = characters. For MIME email: base64 with 76-character line wrapping and CRLF. For human-readable checksums: hex. For binary columns in PostgreSQL: no encoding needed — use BYTEA and let the ORM or database driver handle serialization transparently.

Never apply base64 encoding more than once to the same data. Double encoding is a common bug in systems where multiple layers each independently decide to safely encode data. Establish a clear data contract: encoding is applied exactly once at the serialization boundary (when writing to disk, network, or database) and removed exactly once at the deserialization boundary.

For large binary files such as images, videos, and compressed archives, avoid base64 entirely when possible. Use multipart/form-data for file uploads, which sends binary data in its native form with MIME boundaries and no encoding overhead. Use pre-signed URL patterns (AWS S3 pre-signed URLs, Google Cloud Storage signed URLs) for large file downloads to avoid proxying binary data through your application server. Reserve base64 for values that are small enough (under 64 KB) that the 33% overhead is acceptable.

Test your encoding implementation with edge cases: zero-length input (should produce an empty string), inputs of exactly 1, 2, and 3 bytes (tests all three base64 padding scenarios), and inputs containing all 256 possible byte values (stress-tests the character mapping). For MIME, also test with input lengths that are not multiples of 57 bytes, since 57 bytes produce exactly 76 base64 characters — the standard MIME line length.

Quick fix checklist

  • Add Content-Transfer-Encoding: base64 to every MIME part carrying binary data
  • Wrap MIME base64 output at 76 characters per line using CRLF line endings
  • Use base64 not quoted-printable for binary attachments in MIME email
  • Strip whitespace and convert base64url characters (- and _) to + and / before decoding with atob() in the browser
  • Use base64url for tokens in URLs, cookies, and HTTP headers
  • Never use base64 as a substitute for encryption — it provides zero confidentiality
  • Use BYTEA or BLOB columns for binary data in databases rather than text encoding
  • Test encoding with inputs of 1, 2, and 3 bytes to cover all base64 padding cases

Frequently asked questions

What does ASCII-safe mean in binary-to-text encoding?

ASCII-safe means the encoded output only uses characters in the printable ASCII range (0x20 to 0x7E, decimal 32-126). These characters pass unchanged through virtually any text-only channel, including SMTP servers, JSON parsers, XML processors, HTTP headers, and terminal emulators. Bytes above 0x7F and control characters below 0x20 can be modified, rejected, or interpreted as part of multi-byte UTF-8 sequences, corrupting the data.

What is UUencoding and should I still use it?

UUencoding (Unix-to-Unix encoding) was invented in 1980 for transferring binary files through USENET and early email. It was replaced by MIME base64 in the early 1990s because base64 has a cleaner alphabet and better parser support. Do not generate new UUencoded content. If you receive it, decode with uudecode on Unix or the npm uuencode package in Node.js.

What is Content-Transfer-Encoding in MIME email?

Content-Transfer-Encoding is a MIME header that declares how a message part has been encoded for transport. Valid values are: 7bit (plain ASCII text), 8bit (allows bytes 128-255), quoted-printable (text with occasional non-ASCII characters), base64 (arbitrary binary), and binary (raw binary, rarely used). For file attachments, always use base64. For HTML or plain text email bodies with mostly ASCII content, 7bit or quoted-printable are appropriate.

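Why does atob() throw InvalidCharacterError on my base64 string?

The browser's atob() function rejects any character outside the standard base64 alphabet, including the - and _ characters of base64url, and it rejects input whose length is invalid once whitespace is removed. Current browsers follow the HTML standard's forgiving-base64 algorithm, which strips ASCII whitespace (including the newlines in MIME base64) before decoding, but older or non-conforming decoders may not. A safe normalization before calling atob() is base64Str.replace(/\s/g, '').replace(/-/g, '+').replace(/_/g, '/'). Node.js Buffer.from(str, 'base64') is more lenient: it silently ignores whitespace and accepts both alphabets.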

What is a data URI and when should I use it?

A data URI embeds file content directly in a src or href attribute using the format data:mimeType;base64,base64data. It is useful for small icons, inline SVGs, and dynamically generated images where avoiding an HTTP request improves performance. Avoid data URIs for images larger than 5-10 KB — the base64 overhead plus the lack of separate HTTP caching makes them slower for larger resources than a properly cached external URL.

How do I detect if a string is base64-encoded?

Check three conditions: the length is a multiple of 4 (allowing for = padding), all characters are in the set A-Za-z0-9+/=, and the = padding (if present) only appears at the end as 0, 1, or 2 characters. Note that many strings satisfy these conditions by coincidence: 'test' is well-formed base64 that decodes to the three bytes B5 EB 2D, which is almost certainly not what its author meant. Heuristic detection is unreliable; document the encoding in metadata instead.

Does base64 compress well with gzip?

Base64-encoded data generally compresses worse than the original binary. Base64 regroups every 3 input bytes into 4 output characters, so a repeated byte pattern often encodes differently depending on its alignment and the compressor finds fewer matches. Gzip and Brotli can recover much of the 33% alphabet overhead, but rarely all of it. If you need to send large binary data over a text channel, compress the raw binary first with gzip, then base64-encode the compressed result: compression before encoding, not after.
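
The compress-then-encode order can be sketched with Node.js's built-in zlib module. The helper names packForText and unpackFromText are hypothetical, and the repeated sample string is only a stand-in for compressible data.

```javascript
const zlib = require('zlib');

// Compress the raw bytes first, then make the result text-safe
function packForText(buf) {
  return zlib.gzipSync(buf).toString('base64');
}

// Reverse the two steps in the opposite order
function unpackFromText(str) {
  return zlib.gunzipSync(Buffer.from(str, 'base64'));
}

const raw = Buffer.from('binary-to-text encoding '.repeat(200)); // 4800 bytes
const packed = packForText(raw);

console.log(packed.length < raw.toString('base64').length); // true
console.log(unpackFromText(packed).equals(raw));            // true
```

The receiver needs to know that both steps were applied, so a metadata field that records the compression as well as the encoding avoids guesswork.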

What is the maximum practical size for base64 in a JSON body?

There is no hard limit in JSON or HTTP specifications, but practical limits apply. Most HTTP frameworks have default body size limits of 1-10 MB. Base64 adds 33% overhead, so a 10 MB limit supports about 7.5 MB of binary data. AWS API Gateway has a 10 MB payload limit, and Lambda integrations are further bound by Lambda's 6 MB invocation payload limit, which applies to the base64-encoded body. For files larger than a few MB, use direct binary upload with multipart/form-data or pre-signed upload URLs instead.

Last updated: 2026-05-06.