Unicode Encoding Guide — UTF-8, UTF-16, Code Points, and Bytes
💡 Unicode encoding maps characters to bytes so text can move safely between systems. UTF-8 is the default on the modern web, and the ToolDock Base64 and URL tools help inspect encoded payloads when text breaks.
Pattern Examples
Wrong source encoding
❌ Wrong
Buffer.from('café', 'latin1').toString('utf8')
✅ Fixed
Buffer.from('café', 'utf8').toString('utf8')
Decoding bytes with an encoding other than the one that produced them corrupts accented characters.
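To see why the first call breaks, it can help to look at the bytes. The following is a minimal Node.js sketch, not a definitive diagnostic; the variable names are illustrative, and the `é` is written as the escape `\u00E9` so the result does not depend on how the source file itself is saved.

```javascript
// "café" with a precomposed é (U+00E9).
const text = 'caf\u00E9';

// Latin-1 stores é as the single byte 0xE9.
const latin1Bytes = Buffer.from(text, 'latin1');
// UTF-8 stores é as the two bytes 0xC3 0xA9.
const utf8Bytes = Buffer.from(text, 'utf8');

// A lone 0xE9 is not a complete UTF-8 sequence, so the decoder
// substitutes the replacement character U+FFFD.
const broken = latin1Bytes.toString('utf8');
const intact = utf8Bytes.toString('utf8');

console.log(latin1Bytes.toString('hex')); // 636166e9
console.log(utf8Bytes.toString('hex'));   // 636166c3a9
console.log(broken === 'caf\uFFFD');      // true
console.log(intact === text);             // true
```

The replacement character is lossy: once `0xE9` has become U+FFFD, the original letter cannot be recovered from the decoded string.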
URL without encoding
❌ Wrong
https://example.com/search?q=München
✅ Fixed
https://example.com/search?q=M%C3%BCnchen
Non-ASCII characters in query strings should be percent-encoded as UTF-8 bytes for transport safety.
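The fixed URL does not need to be built by hand. As a short sketch using the standard `encodeURIComponent` function and the `URL` class (the `example.com` address is the placeholder from the example above):

```javascript
// "München" with ü written as an escape (U+00FC).
const city = 'M\u00FCnchen';

// Percent-encode a single query value: ü becomes its UTF-8 bytes C3 BC.
const encoded = encodeURIComponent(city);
console.log(encoded); // M%C3%BCnchen

// Or let URL / URLSearchParams handle the encoding.
const url = new URL('https://example.com/search');
url.searchParams.set('q', city);
console.log(url.href); // https://example.com/search?q=M%C3%BCnchen

// Decoding restores the original text.
console.log(decodeURIComponent(encoded) === city); // true
```

Encoding each value separately, rather than the whole URL string, keeps reserved characters such as `&` and `=` inside a value from being mistaken for delimiters.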
Emoji in limited charset DB
❌ Wrong
store '✅ done' in a latin1 column
✅ Fixed
store '✅ done' in a utf8mb4-capable column
Older charsets cannot represent many modern characters and emoji.
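A quick pre-flight check can flag text that a column will not hold. The sketch below is an illustration only: `fitsLatin1` and `maxUtf8BytesPerCodePoint` are hypothetical helper names, and the first helper treats latin1 as the plain ISO-8859-1 range (U+0000 to U+00FF), which is a simplification of what a given database calls latin1.

```javascript
// True when every code point falls in the ISO-8859-1 range.
function fitsLatin1(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return false;
  }
  return true;
}

// Largest number of UTF-8 bytes needed by any single code point in the text.
function maxUtf8BytesPerCodePoint(text) {
  let max = 0;
  for (const ch of text) {
    max = Math.max(max, Buffer.byteLength(ch, 'utf8'));
  }
  return max;
}

const status = '\u2705 done';  // "✅ done"
const smiley = '\u{1F600}';    // a code point outside the Basic Multilingual Plane

console.log(fitsLatin1('caf\u00E9'));          // true
console.log(fitsLatin1(status));               // false
console.log(maxUtf8BytesPerCodePoint(status)); // 3
console.log(maxUtf8BytesPerCodePoint(smiley)); // 4
```

A result of 4 matters for MySQL in particular: its older `utf8` charset stores at most three bytes per character, so four-byte code points need `utf8mb4`.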
Real-World Usage
CSV import bug
Buffer.from(file, 'latin1').toString('utf8')
The wrong source encoding turns names like `José` into corrupted text.
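One defensive approach for imports is to decode strictly as UTF-8 and only fall back when the bytes are not valid UTF-8. The sketch below assumes the file has been read as raw bytes and that Latin-1 is the only other encoding in play; `decodeCsvBytes` is a hypothetical helper name, not part of any library.

```javascript
// Decode raw file bytes, preferring UTF-8 and falling back to Latin-1.
function decodeCsvBytes(bytes) {
  try {
    // fatal: true makes the decoder throw on invalid UTF-8
    // instead of silently inserting U+FFFD.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    // Assumption: anything that is not valid UTF-8 is Latin-1.
    return Buffer.from(bytes).toString('latin1');
  }
}

// The same row ("José,Madrid") saved in two different encodings.
const row = 'Jos\u00E9,Madrid\n';
const utf8File = Buffer.from(row, 'utf8');
const latin1File = Buffer.from(row, 'latin1');

console.log(decodeCsvBytes(utf8File) === row);   // true
console.log(decodeCsvBytes(latin1File) === row); // true
```

This is a heuristic: it cannot tell Latin-1 apart from other single-byte encodings, so a known, declared source encoding is still preferable when one is available.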
API payload mismatch
fetch('/api', { body: JSON.stringify({ city: 'München' }) })
If the two sides serialize the body with different encodings, they hash different bytes, so signatures and body hashes stop matching.
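The mismatch is easy to reproduce, because a hash covers bytes rather than characters. This Node.js sketch uses the built-in `crypto` module; `sha256Hex` is an illustrative helper name, and SHA-256 stands in for whatever digest or signature scheme an API actually uses.

```javascript
const { createHash } = require('node:crypto');

// Hex SHA-256 digest of a byte buffer.
function sha256Hex(bytes) {
  return createHash('sha256').update(bytes).digest('hex');
}

// Same JSON text as in the example above, with ü written as an escape.
const body = JSON.stringify({ city: 'M\u00FCnchen' });

const utf8Body = Buffer.from(body, 'utf8');     // ü -> C3 BC
const latin1Body = Buffer.from(body, 'latin1'); // ü -> FC

console.log(utf8Body.length);   // 19
console.log(latin1Body.length); // 18
console.log(sha256Hex(utf8Body) === sha256Hex(latin1Body)); // false
```

Hashing and sending the exact same byte buffer, and declaring its charset in the `Content-Type` header, avoids the two sides diverging.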
Database migration
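ALTER DATABASE app CHARACTER SET utf8mb4;
Unicode-safe storage prevents loss of emoji and multilingual text. In MySQL, this statement changes only the default for newly created tables; existing tables and columns must be converted separately.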
Frequently Asked Questions
What is Unicode encoding?
Unicode assigns each character a numeric code point, and an encoding such as UTF-8 or UTF-16 defines how those code points become bytes, so text can be stored, transmitted, and decoded consistently across systems.
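To make the character-to-bytes mapping concrete, here is a small Node.js sketch that prints the code point, the UTF-8 bytes, and the UTF-16 (little-endian) bytes for a single character. `describeChar` is a hypothetical helper name used only for illustration.

```javascript
// Summarize how one character is represented at each layer.
function describeChar(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
    utf8: Buffer.from(ch, 'utf8').toString('hex'),
    utf16le: Buffer.from(ch, 'utf16le').toString('hex'),
  };
}

console.log(describeChar('A'));
// { codePoint: 'U+0041', utf8: '41', utf16le: '4100' }

console.log(describeChar('\u00E9')); // é
// { codePoint: 'U+00E9', utf8: 'c3a9', utf16le: 'e900' }

console.log(describeChar('\u{1F600}')); // an emoji outside the BMP
// { codePoint: 'U+1F600', utf8: 'f09f9880', utf16le: '3dd800de' }
```

The last case shows why code points, code units, and bytes are distinct: one code point takes four bytes in UTF-8 and two 16-bit code units (a surrogate pair) in UTF-16.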
Why is UTF-8 used so often?
UTF-8 is compact for ASCII-heavy text, backward-compatible with ASCII, and widely supported across browsers, servers, and databases.
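Both properties can be checked directly. The sketch below compares byte counts in Node.js; the sample strings are arbitrary choices for illustration.

```javascript
// ASCII-only text: UTF-8 produces exactly the same bytes as ASCII.
const request = 'GET /index.html';
const asAscii = Buffer.from(request, 'ascii');
const asUtf8 = Buffer.from(request, 'utf8');

console.log(asAscii.equals(asUtf8));                 // true
console.log(Buffer.byteLength(request, 'utf8'));     // 15
console.log(Buffer.byteLength(request, 'utf16le'));  // 30

// The advantage is specific to ASCII-heavy text: these three
// CJK characters take 3 bytes each in UTF-8 but 2 each in UTF-16.
const japanese = '\u65E5\u672C\u8A9E'; // "日本語"
console.log(Buffer.byteLength(japanese, 'utf8'));    // 9
console.log(Buffer.byteLength(japanese, 'utf16le')); // 6
```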
What causes mojibake or broken characters?
The most common cause is decoding bytes with the wrong character encoding, such as treating UTF-8 data as latin1.
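The UTF-8-as-latin1 case produces the familiar `Ã©` pattern, and because Latin-1 maps every byte to a character, the damage can sometimes be reversed. The following is a sketch under the assumption that the text was mis-decoded exactly once and nothing was lost along the way.

```javascript
const original = 'Jos\u00E9'; // "José"

// UTF-8 bytes (é = C3 A9) read as Latin-1 become two characters: Ã and ©.
const mojibake = Buffer.from(original, 'utf8').toString('latin1');
console.log(mojibake === 'Jos\u00C3\u00A9'); // true, displayed as "JosÃ©"

// Reversing the mistake: turn the characters back into the same bytes,
// then decode those bytes as UTF-8.
const repaired = Buffer.from(mojibake, 'latin1').toString('utf8');
console.log(repaired === original); // true
```

The round trip fails if a later step replaced or stripped any of the intermediate characters, so fixing the decoding at the source remains the safer option.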