The Unicode Text Converter transforms text between multiple Unicode representations — Unicode escape sequences, HTML entities, code point notation, UTF-8 hex bytes, and plain text. Use it to encode special characters for source code, HTML templates, JSON strings, and regex patterns, or to decode escape sequences back to readable text.
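The conversions between plain text, U+XXXX code point notation, and UTF-8 hex bytes can be sketched in a few lines of Python. This is a minimal sketch, not the converter's actual implementation. The function names are hypothetical, and it assumes space-separated tokens on input.

```python
def to_code_points(text: str) -> str:
    """Render each code point as U+XXXX (zero-padded to at least 4 hex digits)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def from_code_points(notation: str) -> str:
    """Parse space-separated U+XXXX tokens back into text."""
    chars = []
    for token in notation.split():
        if not token.upper().startswith("U+"):
            raise ValueError(f"expected U+XXXX, got {token!r}")
        chars.append(chr(int(token[2:], 16)))
    return "".join(chars)


def to_utf8_hex(text: str) -> str:
    """Encode text as UTF-8 and show each byte as two hex digits."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


def from_utf8_hex(hex_bytes: str) -> str:
    """Decode space-separated hex bytes (e.g. "C3 A9") as UTF-8."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


if __name__ == "__main__":
    sample = "\u00e9\U0001F600"  # é followed by 😀
    print(to_code_points(sample))  # U+00E9 U+1F600
    print(to_utf8_hex(sample))     # C3 A9 F0 9F 98 80
    assert from_code_points(to_code_points(sample)) == sample
    assert from_utf8_hex(to_utf8_hex(sample)) == sample
```

A supplementary character such as 😀 occupies one code point but four UTF-8 bytes. This is why the two notations are listed as separate representations.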
Input can be plain text or any of these notations: \uXXXX escapes, Python \UXXXXXXXX escapes, HTML numeric entities (&#xXXXX;), CSS Unicode escapes (\XXXX), or raw code point values (U+XXXX).
Different languages and formats use different conventions for embedding Unicode characters as escape sequences:
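JavaScript: \uXXXX for BMP characters (U+0000–U+FFFF). Supplementary characters use a surrogate pair, for example \uD83D\uDE00 for 😀 (U+1F600). ES6 added the \u{1F600} syntax to avoid surrogate pairs.
Python: \uXXXX for BMP characters and \UXXXXXXXX (8 hex digits) for supplementary characters. Python 3 strings are Unicode natively and source files default to UTF-8, so escapes are optional in ordinary string literals. They are most useful for invisible, combining, or hard-to-type characters. The re module also accepts \uXXXX and \UXXXXXXXX in str patterns, but bytes literals do not interpret \u or \U escapes.
HTML: decimal (&#8364;) or hexadecimal (&#x20AC;) numeric entities, shown here for the Euro sign (€). Named entities such as &amp; or &copy; are supported for common characters.
CSS: a backslash followed by 1 to 6 hex digits (\XXXX), optionally terminated by a single whitespace character. These escapes are used in content: declarations, strings, and identifiers. Example: \2764 for ❤. The unicode-range descriptor in @font-face uses U+XXXX notation rather than backslash escapes.
Java: \uXXXX (4 hex digits). Supplementary characters require two \u escapes forming a surrogate pair, as in JavaScript.
C/C++: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits) in string literals. Supported in C99 and C++11 and later.
Unicode code points are written as U+ followed by 4 to 6 hex digits, zero-padded to at least four (U+00E9, U+1F600, U+10FFFF). Common ranges include the Basic Multilingual Plane (BMP, U+0000–U+FFFF), which covers most modern scripts, and the supplementary planes (U+10000–U+10FFFF), which hold most emoji, historic scripts, and rarer CJK ideographs.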
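The formats above differ mainly in how they handle code points above U+FFFF and how an escape is terminated. The Python sketch below produces several of them for non-ASCII characters and leaves ASCII unchanged. The names (escape_non_ascii, ENCODERS, and the format keys) are hypothetical illustrations, not the converter's real options. Java output matches the "js" form, and C/C++ universal character names match the "python" form.

```python
import html
import json
import string

HEX_DIGITS = set(string.hexdigits)


def js_escape(cp: int) -> str:
    """JavaScript/Java \\uXXXX; code points above U+FFFF become a UTF-16 surrogate pair."""
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    v = cp - 0x10000
    return f"\\u{0xD800 + (v >> 10):04X}\\u{0xDC00 + (v & 0x3FF):04X}"


def es6_escape(cp: int) -> str:
    """ES6 \\u{...} form: no surrogate pair needed."""
    return f"\\u{{{cp:X}}}"


def python_escape(cp: int) -> str:
    """Python (and C/C++) \\uXXXX inside the BMP, \\UXXXXXXXX above it."""
    return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"


def html_hex(cp: int) -> str:
    return f"&#x{cp:X};"


def html_dec(cp: int) -> str:
    return f"&#{cp};"


ENCODERS = {"js": js_escape, "es6": es6_escape, "python": python_escape,
            "html_hex": html_hex, "html_dec": html_dec}


def css_escape(text: str) -> str:
    """CSS \\XXXX escapes, adding a terminating space only where one is required."""
    out = []
    for i, ch in enumerate(text):
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        out.append(f"\\{ord(ch):X}")
        nxt = text[i + 1 : i + 2]
        # A following hex digit or whitespace would be absorbed by the escape,
        # so emit one terminating space (CSS consumes it).
        if nxt and (nxt in HEX_DIGITS or nxt.isspace()):
            out.append(" ")
    return "".join(out)


def escape_non_ascii(text: str, fmt: str) -> str:
    """Escape every non-ASCII character in the chosen format; ASCII passes through."""
    if fmt == "css":
        return css_escape(text)
    encode = ENCODERS[fmt]
    return "".join(ch if ord(ch) < 0x80 else encode(ord(ch)) for ch in text)


if __name__ == "__main__":
    emoji = "\U0001F600"
    print(escape_non_ascii(emoji, "js"))      # \uD83D\uDE00
    print(escape_non_ascii(emoji, "es6"))     # \u{1F600}
    print(escape_non_ascii(emoji, "python"))  # \U0001F600
    print(escape_non_ascii("\u20ac", "html_dec"), escape_non_ascii("\u20ac", "html_hex"))  # &#8364; &#x20AC;
    print(escape_non_ascii("\u00e9a", "css"))  # \E9 a
    # Decode back with the standard library to check each round trip.
    assert json.loads(f'"{escape_non_ascii(emoji, "js")}"') == emoji
    assert escape_non_ascii(emoji, "python").encode("ascii").decode("unicode_escape") == emoji
    assert html.unescape(escape_non_ascii("\u20ac", "html_hex")) == "\u20ac"
```

The surrogate-pair arithmetic in js_escape subtracts 0x10000. It then splits the remaining 20 bits into a high half (added to 0xD800) and a low half (added to 0xDC00).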
"😀" and "\u{1F600}" both represent 😀, but str.length returns 2 for the surrogate pair form. Use the ES6 spread or Array.from() to get the correct character count.&) with numeric entities (&) is valid but inconsistent. Parsers handle both, but tools that process HTML as text may not expand numeric entities unless explicitly configured.é and é are equivalent), but some validators or linters enforce a specific case. Check your style guide.\E9 a encodes éa, not U+00E9A.\uXXXX escapes for control characters (U+0000–U+001F). Characters above U+FFFF must use surrogate pairs in strict JSON; some parsers accept the ES6 \u{...} extension."😀".codePointAt(0).toString(16) → 1f600. In Python, ord("😀") → 128512, then hex(128512) → 0x1f600. In the browser DevTools console, the same JavaScript expression works instantly.?) means the font does not contain a glyph for that code point, or the text was decoded with the wrong encoding. Install a Unicode-complete font (Noto, Symbola) for the missing script, or verify the encoding used to read the source data.é can be a single code point (U+00E9) or two (e + U+0301 combining acute accent). String length operations count code points, not grapheme clusters. Use ICU or language-specific Unicode segmentation libraries when counting user-visible characters.I think it is inevitable that people program poorly. Training will not substantially help matters. We have to learn to live with it.
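The length, normalization, and JSON points above can be checked with a short Python sketch. utf16_length and same_text are hypothetical helper names. utf16_length emulates JavaScript's str.length by counting UTF-16 code units, and same_text compares strings after NFC normalization. Grapheme-cluster counting is not shown here; as noted above, it needs ICU or a dedicated segmentation library.

```python
import json
import unicodedata


def utf16_length(s: str) -> int:
    """Count UTF-16 code units, which is what JavaScript's str.length reports."""
    return len(s.encode("utf-16-le")) // 2


def same_text(a: str, b: str) -> bool:
    """Compare strings after NFC normalization so composed and decomposed forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    emoji = "\U0001F600"
    print(len(emoji), utf16_length(emoji))  # 1 2  (code points vs UTF-16 code units)
    print(hex(ord(emoji)))                  # 0x1f600
    composed, decomposed = "\u00e9", "e\u0301"
    print(len(composed), len(decomposed))   # 1 2
    print(composed == decomposed, same_text(composed, decomposed))  # False True
    print(json.dumps(emoji))                      # "\ud83d\ude00" (escaped as a surrogate pair)
    print(json.dumps(emoji, ensure_ascii=False))  # "😀" (unescaped is also valid JSON)
    print(json.dumps("\x01\n"))                   # "\u0001\n"
```

Normalizing to NFC before comparing or counting avoids treating the two spellings of é as different strings. Normalization still does not turn code point counts into user-visible character counts.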