Unicode Encoding Guide — UTF-8, UTF-16, Code Points, and Bytes

💡 Unicode assigns every character a code point, and an encoding such as UTF-8 or UTF-16 maps those code points to bytes so text can move safely between systems. UTF-8 is the default on the modern web, and the ToolDock Base64 and URL tools help inspect encoded payloads when text breaks.

Pattern Examples

Wrong source encoding

❌ Wrong

Buffer.from('café', 'latin1').toString('utf8')

✅ Fixed

Buffer.from('café', 'utf8').toString('utf8')

Decoding with the wrong source bytes corrupts accented characters.
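The corruption can go in either direction. The following is a minimal Node.js sketch, assuming the global `Buffer` API; the variable names are illustrative only.

```javascript
// Encode 'café' as UTF-8 bytes, then misread those bytes as latin1.
const utf8Bytes = Buffer.from('café', 'utf8');        // 63 61 66 c3 a9
const misreadAsLatin1 = utf8Bytes.toString('latin1'); // 'cafÃ©'

// Encode 'café' as latin1 bytes, then misread those bytes as UTF-8.
const latin1Bytes = Buffer.from('café', 'latin1');    // 63 61 66 e9
const misreadAsUtf8 = latin1Bytes.toString('utf8');   // 'caf' + U+FFFD

console.log(misreadAsLatin1, misreadAsUtf8);
```

The first case produces classic mojibake that can often be reversed. The second loses information, because the invalid byte is replaced rather than preserved.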

URL without encoding

❌ Wrong

https://example.com/search?q=München

✅ Fixed

https://example.com/search?q=M%C3%BCnchen

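Non-ASCII characters in a query string should be percent-encoded as their UTF-8 bytes (`ü` becomes `%C3%BC`) so that every client, proxy, and server reads the same URL.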
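As a small sketch of how the fixed URL can be produced in JavaScript, the built-in `encodeURIComponent` and the WHATWG `URL` API both apply UTF-8 percent-encoding. The variable names are illustrative.

```javascript
// encodeURIComponent percent-encodes the UTF-8 bytes of each non-ASCII character.
const query = 'München';
const encoded = encodeURIComponent(query); // 'M%C3%BCnchen'

// The URL API applies the same UTF-8 percent-encoding to query parameters.
const url = new URL('https://example.com/search');
url.searchParams.set('q', query);

console.log(encoded);
console.log(url.href); // 'https://example.com/search?q=M%C3%BCnchen'
console.log(decodeURIComponent(encoded) === query); // true
```

Building the URL through `searchParams` avoids hand-concatenating strings, which is where unencoded characters usually slip in.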

Emoji in limited charset DB

❌ Wrong

store '✅ done' in a latin1 column

✅ Fixed

store '✅ done' in a utf8mb4-capable column

Single-byte charsets such as latin1 have only 256 slots, so they cannot represent emoji or most non-Western scripts.
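One way to check text before writing it to a column is to look at its highest code point. The helpers below are hypothetical and treat "latin1" as ISO-8859-1 (U+0000 to U+00FF), which is an approximation of what a given database calls latin1.

```javascript
// Hypothetical helpers; "latin1" here means ISO-8859-1 (U+0000 to U+00FF).
function maxCodePoint(text) {
  let max = 0;
  // for...of walks the string by code point, so surrogate pairs stay together.
  for (const ch of text) {
    max = Math.max(max, ch.codePointAt(0));
  }
  return max;
}

const fitsInLatin1 = (text) => maxCodePoint(text) <= 0xff;
// Code points above U+FFFF take four bytes in UTF-8.
const needsFourByteUtf8 = (text) => maxCodePoint(text) > 0xffff;

console.log(fitsInLatin1('✅ done'));      // false
console.log(needsFourByteUtf8('✅ done')); // false
console.log(needsFourByteUtf8('😀 done')); // true
```

MySQL's legacy `utf8` charset (utf8mb3) stores at most three bytes per character, so any text for which `needsFourByteUtf8` returns true needs a utf8mb4 column.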

Real-World Usage

CSV import bug

Buffer.from(file, 'latin1').toString('utf8')

The wrong source encoding turns names like `José` into corrupted text.
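A defensive import can try strict UTF-8 first and fall back only when the bytes are invalid. This is a sketch, not a definitive detector: `decodeCsvBytes` is a hypothetical helper, and it assumes a Node.js build with ICU (the default), where `TextDecoder` supports the `fatal` option.

```javascript
// Hypothetical helper: decode CSV bytes as UTF-8, falling back to latin1
// only when the bytes are not valid UTF-8.
function decodeCsvBytes(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError on invalid UTF-8.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return Buffer.from(bytes).toString('latin1');
  }
}

const utf8File = Buffer.from('name\nJosé\n', 'utf8');
const latin1File = Buffer.from('name\nJosé\n', 'latin1');

console.log(decodeCsvBytes(utf8File) === decodeCsvBytes(latin1File)); // true
```

The fallback is a heuristic. Some latin1 byte sequences happen to be valid UTF-8, so an explicit charset from the file's producer is more reliable when it is available.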

API payload mismatch

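fetch('/api', { method: 'POST', body: JSON.stringify({ city: 'München' }) })

A request with a body needs a method such as POST, because `fetch` defaults to GET and rejects GET requests that carry a body. `fetch` sends string bodies as UTF-8. If the other side assumes a different encoding, it sees different bytes, so signatures and body hashes stop matching.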
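The mismatch is easy to reproduce by hashing the same JSON string under two encodings. This sketch assumes Node.js and its built-in `node:crypto` module; the `sha256` helper is illustrative.

```javascript
const { createHash } = require('node:crypto');

const body = JSON.stringify({ city: 'München' });

// Hash the exact bytes that go on the wire; the encoding decides those bytes.
const sha256 = (bytes) => createHash('sha256').update(bytes).digest('hex');

const utf8Digest = sha256(Buffer.from(body, 'utf8'));
const latin1Digest = sha256(Buffer.from(body, 'latin1'));

console.log(utf8Digest === latin1Digest); // false
```

Signing and verifying the raw request bytes, rather than a re-serialized string, sidesteps this class of bug.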

Database migration

ALTER DATABASE app CHARACTER SET utf8mb4

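Unicode-safe storage prevents emoji and multilingual text loss. In MySQL and MariaDB, `ALTER DATABASE` only changes the default for tables created afterwards. Existing tables keep their charset until each one is converted with `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4`.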

Frequently Asked Questions

What is Unicode encoding?

It is the system that maps characters to bytes so text can be stored, transmitted, and decoded consistently across systems.
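To make the distinction between code points and bytes concrete, the sketch below prints one character's code point next to its UTF-8 and UTF-16LE bytes. `describe` is a hypothetical helper built on Node.js `Buffer`.

```javascript
// Show a character's code point and its bytes in UTF-8 and UTF-16LE.
function describe(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  return {
    codePoint: 'U+' + hex,
    utf8: Buffer.from(ch, 'utf8').toString('hex'),
    utf16le: Buffer.from(ch, 'utf16le').toString('hex'),
  };
}

console.log(describe('A'));  // { codePoint: 'U+0041', utf8: '41', utf16le: '4100' }
console.log(describe('é'));  // { codePoint: 'U+00E9', utf8: 'c3a9', utf16le: 'e900' }
console.log(describe('😀')); // { codePoint: 'U+1F600', utf8: 'f09f9880', utf16le: '3dd800de' }
```

The emoji sits above U+FFFF, so UTF-16 needs a surrogate pair (two 16-bit units) and UTF-8 needs four bytes. This is also why `'😀'.length` is 2 in JavaScript.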

Why is UTF-8 used so often?

UTF-8 is compact for ASCII-heavy text, backward-compatible with ASCII, and widely supported across browsers, servers, and databases.
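Both properties can be checked directly. This is a small Node.js sketch with an arbitrary ASCII sample string.

```javascript
const text = 'GET /index.html HTTP/1.1';

// Pure ASCII text produces identical bytes under ASCII and UTF-8.
const sameBytes = Buffer.from(text, 'ascii').equals(Buffer.from(text, 'utf8'));

// UTF-8 spends one byte per ASCII character; UTF-16 spends two.
const utf8Size = Buffer.byteLength(text, 'utf8');
const utf16Size = Buffer.byteLength(text, 'utf16le');

console.log(sameBytes, utf8Size, utf16Size);
```

For text dominated by non-Latin scripts the size advantage can shrink or reverse, since many of those characters take three bytes in UTF-8 and two in UTF-16.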

What causes mojibake or broken characters?

The most common cause is decoding bytes with the wrong character encoding, such as treating UTF-8 data as latin1.
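When UTF-8 text has been misread as latin1 exactly once, the damage can often be reversed by retracing the steps. `repairLatin1Mojibake` below is a hypothetical helper, and it assumes the misreading used true ISO-8859-1 rather than Windows-1252.

```javascript
// Hypothetical helper: undo one round of "UTF-8 bytes read as latin1".
function repairLatin1Mojibake(garbled) {
  // Re-encode as latin1 to recover the original bytes, then decode as UTF-8.
  return Buffer.from(garbled, 'latin1').toString('utf8');
}

console.log(repairLatin1Mojibake('cafÃ©'));    // 'café'
console.log(repairLatin1Mojibake('MÃ¼nchen')); // 'München'
```

Running this on text that was never garbled will corrupt it, so apply it only to data known to have been misdecoded.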
