{ Unicode Code Point Inspector }

// inspect code points, names, blocks, and escapes

Inspect Unicode code points, names, blocks, and escape sequences for any character or string. Free, browser-based, no sign-up required.

HOW TO USE

  1. Paste your string

    Type or paste any text into the input box: plain text, emoji, symbols, CJK, RTL, or any Unicode characters.

  2. Choose escape format

    Select your preferred output format: JavaScript, Python, CSS, HTML entities, URL encoding, or plain U+ code points.

  3. Inspect and copy

    Click Inspect to see every code point, its Unicode name, block, plane, and escape sequence. Copy individual rows or the full table.

FEATURES

  • Code Points
  • Unicode Names
  • Block Detection
  • Multi-format Escapes
  • Emoji Support
  • Surrogate Pairs

USE CASES

  • Debugging Unicode encoding issues in code
  • Finding the escape sequence for a special character
  • Identifying unknown glyphs or symbols
  • Working with emoji and multi-codepoint sequences
  • Preparing Unicode strings for the CSS content property
  • Auditing user input for invisible or control characters

WHAT IS THIS?

The Unicode Code Point Inspector breaks any string into its individual Unicode code points, revealing the name, block, category, plane, and escape sequences for every character, including emoji, CJK ideographs, combining marks, and invisible control characters.

FREQUENTLY ASKED QUESTIONS

What is a Unicode code point?

A Unicode code point is a unique number assigned to every character in the Unicode standard. Written as U+XXXX (e.g. U+0041 for "A"), code points range from U+0000 to U+10FFFF, covering over 1.1 million possible values across 17 planes.

Why does one emoji show as multiple code points?

Many emoji are sequences of multiple code points. For example, family emoji combine base characters with Zero Width Joiners (U+200D), and flag emoji use pairs of Regional Indicator letters. This tool displays each code point individually so you can see every component.
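As a rough illustration of that breakdown, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `code_points` is made up for this example and is not part of the tool.

```python
import unicodedata

def code_points(text: str) -> list[tuple[str, str]]:
    """List (U+ notation, Unicode name) for every code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# A family emoji written with escapes so the joiners are visible in source:
# MAN + ZWJ + WOMAN + ZWJ + GIRL
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for notation, name in code_points(family):
    print(notation, name)
# U+1F468 MAN
# U+200D ZERO WIDTH JOINER
# U+1F469 WOMAN
# U+200D ZERO WIDTH JOINER
# U+1F467 GIRL
```

What looks like a single glyph on screen is five code points in the string.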

What is a Unicode block?

Unicode is divided into named blocks: contiguous ranges of code points grouped by script or purpose. Examples include "Basic Latin" (U+0000–U+007F), "CJK Unified Ideographs" (U+4E00–U+9FFF), and "Emoticons" (U+1F600–U+1F64F). Knowing the block usually helps identify the script a character belongs to, although a single script can span several blocks.

What is the difference between UTF-8, UTF-16, and code points?

A code point is the abstract number for a character (e.g. U+1F600). UTF-8 and UTF-16 are encoding schemes that store those numbers as bytes. UTF-8 uses 1–4 bytes per code point; UTF-16 uses 2 or 4 bytes. This tool shows the code point value, which is the same in every encoding, along with the number of bytes the text occupies in UTF-8.
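The "2 or 4 bytes" in UTF-16 comes from surrogate pairs: code points above U+FFFF are split into two 16-bit units. The sketch below shows the standard arithmetic in Python; the function name `utf16_units` is an assumption for illustration.

```python
def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units (one or two) for a single code point."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP code points fit in one 16-bit unit.
        return [cp]
    # Supplementary code points: subtract 0x10000, then split the
    # remaining 20 bits into a high and a low 10-bit half.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]

print([f"{u:04X}" for u in utf16_units("A")])           # ['0041']
print([f"{u:04X}" for u in utf16_units("\U0001F600")])  # ['D83D', 'DE00']
```

The result for U+1F600 matches what Python's own encoder produces for `"\U0001F600".encode("utf-16-be")`.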

What escape formats are supported?

The tool supports JavaScript (\uXXXX / \u{XXXXX} for supplementary characters), Python (\uXXXX / \UXXXXXXXX), CSS (\XXXXXX), HTML numeric entities (&#xXXXXX;), URL percent-encoding, and plain U+ notation.

Can it handle right-to-left (RTL) text?

Yes. Arabic, Hebrew, and other RTL scripts are fully supported. The inspector identifies each code point regardless of writing direction, and includes any directional control characters (like U+200F RIGHT-TO-LEFT MARK) that may be present in the string.
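To see how direction is recorded per code point, the following sketch prints each character's bidirectional class using Python's `unicodedata.bidirectional`. The helper name `bidi_classes` is hypothetical, and this is not necessarily how the tool itself reports direction.

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each code point with its Unicode bidirectional class."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]

# Latin "a", Hebrew alef, RIGHT-TO-LEFT MARK, Arabic alef, digit "1"
sample = "a\u05D0\u200F\u06271"
for notation, bidi in bidi_classes(sample):
    print(notation, bidi)
# U+0061 L
# U+05D0 R
# U+200F R
# U+0627 AL
# U+0031 EN
```

`L` and `R` mark strongly left-to-right and right-to-left characters, `AL` is used for Arabic letters, and `EN` is a European number.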

Are invisible or control characters shown?

Absolutely. Control characters, zero-width spaces, non-breaking spaces, directional marks, and other invisible characters are often the cause of hard-to-debug text issues. This tool makes them visible by displaying their code point, official Unicode name, and category.

Is there a character limit?

The tool works entirely in your browser with no server-side processing, so there is no hard limit enforced. For very long strings (thousands of characters), rendering may slow slightly, but the inspection will still complete accurately.

What Is the Unicode Code Point Inspector?

The Unicode Code Point Inspector is a free browser-based developer tool that breaks any string, no matter how complex, into its individual Unicode code points. For every character in your input, it reveals the official Unicode code point value (written as U+XXXX), the character's assigned name from the Unicode standard, the Unicode block it belongs to, the plane, and multiple escape sequence formats. Whether you're dealing with plain ASCII text, emoji sequences, CJK ideographs, combining diacritics, or invisible control characters, this tool makes every code point visible and understandable.

Understanding Unicode Code Points

Unicode is the universal character encoding standard that assigns a unique number to every character used in written language worldwide. These numbers, called code points, range from U+0000 to U+10FFFF, providing space for 1,114,112 code points, of which approximately 149,000 were assigned to characters as of Unicode 15.0. Code points are organized into 17 planes, each containing 65,536 positions. The Basic Multilingual Plane (BMP, Plane 0) covers most commonly used characters, while supplementary planes house historic scripts, musical notation, mathematical symbols, and emoji.
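Because every plane holds exactly 0x10000 positions, the plane number falls out of a simple shift. A minimal Python sketch follows; the function name `describe` is an assumption for illustration.

```python
def describe(ch: str) -> tuple[str, int]:
    """Return the U+ notation and the plane number for one character."""
    cp = ord(ch)
    # U+ notation uses upper-case hex, padded to at least four digits.
    notation = f"U+{cp:04X}"
    # Each plane spans 0x10000 (65,536) code points, so the plane
    # number is the code point divided by 0x10000.
    plane = cp >> 16
    return notation, plane

for ch in "A\u20AC\U0001F600":  # A, euro sign, grinning face
    print(*describe(ch))
# U+0041 0
# U+20AC 0
# U+1F600 1
```

The last code point, U+10FFFF, lands in plane 16, which gives the 17 planes numbered 0 through 16.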

When developers write code, they frequently need to express characters that are difficult to type directly, such as a zero-width non-breaking space (U+FEFF), a soft hyphen (U+00AD), or a complex emoji like 👨‍👩‍👧‍👦. Escape sequences provide a way to represent these characters in source code. Different programming languages use different escape syntax, and keeping track of which format to use where is exactly the kind of friction this tool eliminates.

Escape Sequences Across Programming Languages

JavaScript represents Unicode characters using \uXXXX for code points in the BMP and the ES6 syntax \u{XXXXX} for supplementary characters. Python uses \uXXXX for BMP characters and \UXXXXXXXX (eight hex digits) for code points above U+FFFF. CSS uses a backslash followed by up to six hex digits, for example \1F600, most commonly in the content property; the unicode-range descriptor in @font-face declarations uses U+ notation rather than backslash escapes. HTML uses numeric character references such as &#x1F600; for the grinning face emoji. URL encoding converts a character's UTF-8 bytes into percent-encoded sequences, so the same emoji becomes %F0%9F%98%80.
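The formats above can be generated mechanically from the code point. The following Python sketch shows one way to do it; the function name `escapes` and the dictionary keys are assumptions for illustration and do not reflect the tool's internals.

```python
from urllib.parse import quote

def escapes(ch: str) -> dict[str, str]:
    """Build common escape forms for a single code point."""
    cp = ord(ch)
    bmp = cp <= 0xFFFF
    return {
        # JavaScript: four hex digits in the BMP, braces above it.
        "javascript": f"\\u{cp:04X}" if bmp else f"\\u{{{cp:X}}}",
        # Python: four hex digits in the BMP, eight above it.
        "python": f"\\u{cp:04X}" if bmp else f"\\U{cp:08X}",
        # CSS: backslash plus one to six hex digits.
        "css": f"\\{cp:X}",
        # HTML: hexadecimal numeric character reference.
        "html": f"&#x{cp:X};",
        # URL: percent-encode the UTF-8 bytes.
        "url": quote(ch, safe=""),
        "u+": f"U+{cp:04X}",
    }

for fmt, value in escapes("\U0001F600").items():
    print(f"{fmt:>10}: {value}")
```

One caveat with the CSS form: when the escape is followed by a hex digit or whitespace, a trailing space is needed after the escape so the parser knows where it ends.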

Unicode Blocks and Scripts

The Unicode character chart is divided into named blocks: contiguous ranges of code points allocated to a specific script, symbol set, or purpose. The Basic Latin block (U+0000–U+007F) is identical to ASCII. The Latin Extended blocks cover accented characters used in European languages. The Arabic block (U+0600–U+06FF) contains the letters, vowel marks, and punctuation used in Arabic script. The Hangul Syllables block (U+AC00–U+D7AF) holds the 11,172 precomposed Korean syllable blocks, which occupy U+AC00–U+D7A3. The Miscellaneous Symbols and Pictographs and Emoticons blocks contain many of the most commonly used emoji.
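Python's standard library does not expose block names, so a block lookup needs its own range table. The sketch below uses a tiny hand-picked subset of blocks and a binary search; a real implementation would load the full list from the Unicode Character Database. The names `BLOCKS` and `block_of` are assumptions for illustration.

```python
from bisect import bisect_right

# A small illustrative subset of (start, end, name), sorted by start.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0600, 0x06FF, "Arabic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
STARTS = [start for start, _, _ in BLOCKS]

def block_of(ch: str) -> str:
    """Find the block containing ch, or report that the table lacks it."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its end.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        _, end, name = BLOCKS[i]
        if cp <= end:
            return name
    return "Unknown (not in this sample table)"

print(block_of("A"))           # Basic Latin
print(block_of("\u4E2D"))      # CJK Unified Ideographs
print(block_of("\U0001F600"))  # Emoticons
```

Because blocks never overlap, a sorted list plus binary search is enough; no interval tree is needed.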

Knowing a character's block is useful when debugging font rendering issues, because fonts typically cover specific scripts and symbol sets. If characters appear as blank boxes (often called "tofu"), it usually means the current font has no glyph for those code points. This is different from the replacement character U+FFFD, which indicates that the underlying bytes could not be decoded.

Emoji and Multi-Codepoint Sequences

Modern emoji are often not a single code point but a sequence of multiple code points joined together. Skin tone modifiers (U+1F3FB through U+1F3FF) follow a base emoji to apply a color variant. The Zero Width Joiner (ZWJ, U+200D) is used to combine multiple emoji into a single glyph. For example, the family emoji 👨‍👩‍👧 consists of three separate person emoji connected by two ZWJ characters. Flag emoji are encoded as pairs of Regional Indicator Symbol Letters (U+1F1E6–U+1F1FF). This tool displays each code point in the sequence individually, making it clear exactly how an emoji is constructed at the Unicode level.
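The flag encoding is simple enough to sketch directly: each Regional Indicator letter sits at a fixed offset from its ASCII counterpart. The Python helpers below, `flag_to_region` and `region_to_flag`, are hypothetical names for this example.

```python
REGIONAL_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
REGIONAL_Z = 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER Z

def flag_to_region(flag: str) -> str:
    """Map a two-code-point flag emoji to its two-letter region code."""
    if len(flag) != 2 or not all(REGIONAL_A <= ord(ch) <= REGIONAL_Z for ch in flag):
        raise ValueError("expected exactly two Regional Indicator symbols")
    return "".join(chr(ord(ch) - REGIONAL_A + ord("A")) for ch in flag)

def region_to_flag(code: str) -> str:
    """Map a two-letter region code (A-Z) to the flag sequence."""
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected two upper-case ASCII letters")
    return "".join(chr(ord(ch) - ord("A") + REGIONAL_A) for ch in code)

japan = "\U0001F1EF\U0001F1F5"  # regional indicators J + P
print(flag_to_region(japan))           # JP
print(region_to_flag("JP") == japan)   # True
```

Whether a given pair renders as a flag depends on the platform's emoji font; the encoding itself accepts any two indicator letters.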

Invisible and Control Characters

Some of the most frustrating bugs in text processing involve characters that are invisible in most text editors. The zero-width space (U+200B), zero-width non-joiner (U+200C), and zero-width joiner (U+200D) affect how text is laid out or how characters are combined without contributing visible glyph width. The byte order mark (U+FEFF) is often silently prepended by text editors when saving UTF-8 files, causing unexpected parsing errors. Bidirectional control characters like U+202A (LEFT-TO-RIGHT EMBEDDING) and U+202B (RIGHT-TO-LEFT EMBEDDING) can cause text to display in unexpected directions. Pasting content copied from a PDF or word processor frequently introduces non-standard whitespace such as the non-breaking space (U+00A0) or the en quad (U+2000). The Unicode Code Point Inspector makes all of these visible, named, and identifiable.
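A quick way to surface such characters in your own code is to filter by general category. The sketch below is one possible heuristic in Python, not the tool's actual logic: it flags control (`Cc`) and format (`Cf`) characters plus any space separator (`Zs`) other than the ordinary ASCII space. The function name `find_invisible` is an assumption.

```python
import unicodedata

def find_invisible(text: str) -> list[tuple[int, str, str]]:
    """Return (index, U+ notation, name) for likely-invisible characters."""
    hits = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cc = control, Cf = format (ZWSP, ZWJ, BOM, bidi controls),
        # Zs = space separator; the plain ASCII space is not flagged.
        if category in ("Cc", "Cf") or (category == "Zs" and ch != " "):
            # Control characters have no name, so supply a fallback.
            name = unicodedata.name(ch, "<control>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits

sample = "\uFEFFprice:\u00A010\u200B USD"
for hit in find_invisible(sample):
    print(hit)
# (0, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')
# (7, 'U+00A0', 'NO-BREAK SPACE')
# (10, 'U+200B', 'ZERO WIDTH SPACE')
```

Note that this heuristic also reports tabs and newlines, since they are control characters; filter those out if they are expected in your input.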

UTF-8 Byte Counts

Unicode code points are abstract numbers. When text is stored or transmitted, those numbers must be encoded into bytes. UTF-8 is the dominant encoding on the web, and its byte count for a character depends on the code point range: ASCII characters (U+0000–U+007F) use one byte; code points U+0080–U+07FF, which cover most European and Middle Eastern scripts, use two bytes; the rest of the BMP (U+0800–U+FFFF), including most CJK characters, uses three bytes; and supplementary characters (including most emoji) use four bytes. This tool shows the total UTF-8 byte count for your entire input string, which is useful when you need to know how many bytes a particular string will occupy in a database column or HTTP header.
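The range rule can be checked against a real encoder. In the Python sketch below, `utf8_length` is a hypothetical helper that derives the byte count from the code point alone, and the loop compares it with what `str.encode` actually produces.

```python
def utf8_length(ch: str) -> int:
    """Number of UTF-8 bytes for one code point, from its range alone."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4

# LATIN CAPITAL A, e with acute, a CJK ideograph, grinning face
text = "A\u00E9\u4E2D\U0001F600"
for ch in text:
    # The range-based answer should agree with the real encoder.
    assert utf8_length(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}", utf8_length(ch))
# U+0041 1
# U+00E9 2
# U+4E2D 3
# U+1F600 4

print("total bytes:", len(text.encode("utf-8")))  # total bytes: 10
```

For sizing a database column or header, `len(text.encode("utf-8"))` is the number that matters; `len(text)` counts code points, not bytes.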

Developer Use Cases

The Unicode Code Point Inspector is useful in many real-world development scenarios. When writing CSS, you might need to use a Unicode escape in the content property for a custom icon font, and this tool instantly shows you the correct CSS escape syntax. When debugging a regular expression that's not matching as expected, checking for invisible characters or unexpected code points in your test string can reveal the problem. When building multilingual applications, verifying that user-submitted strings do not contain homoglyph attacks, where characters from different scripts look visually identical to ASCII letters, is an important security check. When working with APIs that return escaped Unicode JSON strings, this tool helps you decode and understand the actual characters being transmitted.
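Two of these scenarios can be approximated in a few lines of Python. The sketch below decodes an escaped JSON string with the standard `json` module and applies a rough mixed-script check. The script check is only a heuristic: the standard library does not expose the Unicode Script property, so it uses the first word of each letter's name as a stand-in. Both function names are assumptions for illustration.

```python
import json
import unicodedata

def decode_json_string(raw: str) -> str:
    """Decode a JSON string literal, resolving escapes and surrogate pairs."""
    return json.loads(raw)

def rough_scripts(text: str) -> set[str]:
    """Guess the scripts in text from the first word of each letter's name.

    Heuristic only: names such as "CYRILLIC SMALL LETTER A" stand in
    for the real Script property, which the stdlib does not provide.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }

# An API payload with an escaped BMP character and a surrogate pair.
decoded = decode_json_string('"caf\\u00e9 \\ud83d\\ude00"')
print([f"U+{ord(ch):04X}" for ch in decoded])

# The second string swaps Latin "a" for the look-alike Cyrillic U+0430.
print(rough_scripts("paypal"))        # {'LATIN'}
print(rough_scripts("p\u0430ypal"))   # LATIN and CYRILLIC (set order varies)
```

A string that mixes scripts is not necessarily malicious, so a check like this is best treated as a signal for closer review rather than a verdict.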
