{ Zero-Width Character Detector }

// scan text for invisible zero-width characters

Scan any text for invisible zero-width characters: ZWSP, ZWNJ, ZWJ, BOM, and other hidden Unicode. Highlight, count, and remove them instantly.

Supports plain text, code, HTML, Markdown — any content

HOW TO USE

  1. Paste Text

    Copy any suspicious text (emails, copied web content, code, or documents) into the input field.

  2. Click Scan

    Hit "Scan for Hidden Chars" to instantly detect all zero-width and invisible Unicode characters.

  3. Review & Clean

    See highlighted positions in context, review the findings table, then click "Clean Text" to strip them all.

DETECTED CHARACTERS

  • U+200B ZWSP
  • U+200C ZWNJ
  • U+200D ZWJ
  • U+FEFF BOM
  • U+00AD Soft Hyphen
  • U+2060 Word Joiner
  • U+200E LRM
  • U+200F RLM
  • U+180E MVS
  • U+2061–U+2064 Invisible Ops

USE CASES

  • 🔍 Detect plagiarism-watermarking tricks
  • 🔐 Find hidden data exfiltration characters
  • 🐛 Debug rendering glitches in web content
  • 📋 Clean pasted text from web or CMS editors
  • 🔒 Security audit of user-submitted content

WHAT IS THIS?

Zero-width characters are invisible Unicode code points that have no visible glyph. They are used legitimately for text shaping (like ZWJ in emoji sequences) but are increasingly misused for watermarking copied text, hiding secret messages, bypassing spam filters, and tracking data leaks.

This tool scans your text client-side, highlights every hidden character in context, and lets you remove them all with one click.

FREQUENTLY ASKED QUESTIONS

What is a zero-width character?

A zero-width character is a Unicode code point that has no visible width or glyph when rendered. Examples include the Zero-Width Space (U+200B), Zero-Width Non-Joiner (U+200C), and Zero-Width Joiner (U+200D). They are invisible in normal text but can be detected programmatically.

Why would zero-width characters be in my text?

They appear for several reasons: copy-pasting from websites that inject them for tracking, CMS editors that auto-insert them, intentional watermarking to track document leaks, or legitimate Unicode text shaping (such as emoji sequences using ZWJ). Not all occurrences are malicious.

Is it always safe to remove zero-width characters?

In most cases yes, especially for plain text, emails, and web content. However, in certain scripts (such as Arabic and Persian, or Indic scripts like Devanagari, which is used for Hindi), ZWNJ and ZWJ are needed for correct text shaping. For those contexts, review before removing rather than blindly stripping all occurrences.

Can zero-width characters be used maliciously?

Yes. They are increasingly used to watermark leaked documents (unique invisible patterns identify the recipient), to bypass spam or keyword filters, to hide secret messages (steganography), and to alter the appearance of URLs in phishing attacks while keeping them visually identical to legitimate ones.

What is the difference between ZWSP, ZWNJ, and ZWJ?

ZWSP (U+200B) is a zero-width space that allows line breaking without a visible space. ZWNJ (U+200C) prevents two adjacent characters from joining or forming a ligature. ZWJ (U+200D) requests a joined form or ligature where one exists, and is used in emoji sequences (e.g., family emoji = man + ZWJ + woman + ZWJ + child).
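The emoji case can be checked directly. The short Python sketch below builds the family sequence from its parts and shows what is left once the ZWJ characters are removed; the variable names are illustrative only.

```python
ZWJ = "\u200D"  # Zero-Width Joiner

man, woman, girl = "\U0001F468", "\U0001F469", "\U0001F467"

# The family emoji is three emoji glued together by two ZWJ characters.
family = man + ZWJ + woman + ZWJ + girl

# List the code points that make up the sequence.
code_points = [f"U+{ord(ch):04X}" for ch in family]
print(code_points)

# Stripping the joiners leaves three independent emoji.
separated = family.replace(ZWJ, "")
print(len(family), len(separated))
```

Whether the joined sequence renders as a single glyph depends on the font and platform, which is why the sketch inspects code points instead of relying on what is displayed.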

What is a BOM and why is it harmful?

The Byte Order Mark (U+FEFF) was originally used to indicate byte order in UTF-16 files. When present at the start of UTF-8 text it is technically unnecessary and can cause bugs — breaking JSON parsing, CSV imports, command-line scripts, and HTTP headers. It is invisible but frequently causes hard-to-debug issues.
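As a small illustration of the JSON case, the Python sketch below decodes the same bytes twice: once as plain UTF-8, which keeps the BOM as a leading U+FEFF, and once with the standard `utf-8-sig` codec, which drops it. The sample payload and the `try_parse` helper are made up for this example.

```python
import json

# UTF-8 bytes that start with a BOM (EF BB BF), as some editors save them.
raw = b'\xef\xbb\xbf{"name": "example"}'

text_with_bom = raw.decode("utf-8")    # BOM survives as U+FEFF
text_clean = raw.decode("utf-8-sig")   # codec strips a leading BOM


def try_parse(text):
    """Return the parsed object, or a short error string if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"error: {exc.msg}"


print(try_parse(text_with_bom))
print(try_parse(text_clean))
```

Decoding with `utf-8-sig` is usually the least intrusive fix, since it leaves files without a BOM untouched.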

Does this tool send my text to a server?

No. All scanning and processing happens entirely in your browser using JavaScript. Your text is never sent to any server. The tool runs completely client-side, making it safe to use with sensitive or confidential content.

How do I check for zero-width characters programmatically?

In JavaScript, use a regex like /[\u200B-\u200F\u2060-\u2064\uFEFF\u00AD\u180E]/g. In Python, use the same character class with the re module, or use the unicodedata module to look up the name and category of each character. In PHP, use preg_match with the u (UTF-8) modifier and \x{200B}-style escapes.
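A minimal Python version of that check might look like the following. The helper names `has_hidden` and `strip_hidden` are assumptions made for this sketch, not part of any library.

```python
import re

# Character class covering the code points listed under DETECTED CHARACTERS.
HIDDEN = re.compile(r"[\u200B-\u200F\u2060-\u2064\uFEFF\u00AD\u180E]")


def has_hidden(text: str) -> bool:
    """True if the text contains at least one targeted invisible character."""
    return HIDDEN.search(text) is not None


def strip_hidden(text: str) -> str:
    """Return the text with every targeted invisible character removed."""
    return HIDDEN.sub("", text)


sample = "\ufeffa\u00adb\u2063c"
print(has_hidden(sample), repr(strip_hidden(sample)))
```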

What Are Zero-Width Characters and Why Do They Matter?

Zero-width characters are a family of Unicode code points that render with no visible width on screen. They have no glyph — you cannot see them, you cannot select them easily, and copy-pasting text containing them silently carries them along. Yet they are fully present in the underlying data, and their presence can cause surprising, frustrating, and sometimes serious problems.

The most common zero-width characters include the Zero-Width Space (ZWSP, U+200B), the Zero-Width Non-Joiner (ZWNJ, U+200C), the Zero-Width Joiner (ZWJ, U+200D), directional marks like the Left-To-Right Mark (LRM, U+200E) and Right-To-Left Mark (RLM, U+200F), the Byte Order Mark (BOM, U+FEFF), and the Soft Hyphen (SHY, U+00AD). There are also four invisible mathematical operators (U+2061 through U+2064), which sit in the General Punctuation block next to the Word Joiner (U+2060).

Legitimate Uses of Zero-Width Characters

Not every zero-width character is a threat. Several serve important typographic and Unicode shaping functions. The Zero-Width Joiner is essential for complex emoji sequences: the "family" emoji 👨‍👩‍👧 is actually three separate emoji (man, woman, girl) joined by ZWJ characters. Remove them and the family splits into three distinct emoji. ZWJ is also used in certain South Asian and Arabic scripts to control how adjacent characters join into ligatures.

The Zero-Width Non-Joiner has the opposite function — it prevents a ligature from forming when two characters that normally join appear adjacent. In Persian and Arabic typography, this is crucial for correct text rendering. The Soft Hyphen (U+00AD) is used by HTML authors to indicate optional hyphenation points in long words, allowing the browser to hyphenate at that point if needed for line wrapping.

The BOM (Byte Order Mark) was originally used to indicate byte order in UTF-16 encoded files. UTF-8 has no byte-order ambiguity, so a BOM there serves at most as an encoding signature, yet some text editors on Windows still prepend it automatically, which can cause subtle bugs when the file is read by parsers expecting clean UTF-8.

Malicious and Problematic Uses

The invisible nature of zero-width characters makes them attractive for several types of misuse. Document watermarking is one of the most common: a unique pattern of ZWSP characters invisibly embedded in a document can uniquely identify which recipient leaked it. If the document appears online, the watermark reveals the leaker. This technique is used by both legitimate security teams (to catch internal leaks) and malicious actors (to track stolen data).
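To make the idea concrete, here is a hedged Python sketch of one way such a watermark could work. The scheme is an assumption chosen for illustration (ZWSP stands for a 0 bit, ZWNJ for a 1 bit, and the bits are hidden after the first word); real watermarking systems may use different characters and placement, and `embed_id` and `extract_id` are hypothetical names.

```python
from typing import Optional

# Assumed encoding for this sketch: ZWSP = bit 0, ZWNJ = bit 1.
ZERO, ONE = "\u200B", "\u200C"


def embed_id(text: str, recipient_id: int, bits: int = 16) -> str:
    """Hide recipient_id as invisible characters after the first word."""
    payload = "".join(
        ONE if (recipient_id >> i) & 1 else ZERO
        for i in reversed(range(bits))  # most significant bit first
    )
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail


def extract_id(text: str) -> Optional[int]:
    """Recover the hidden ID, or None if no marker characters are present."""
    bit_chars = [ch for ch in text if ch in (ZERO, ONE)]
    if not bit_chars:
        return None
    return int("".join("1" if ch == ONE else "0" for ch in bit_chars), 2)


original = "Quarterly results attached"
marked = embed_id(original, 1234)
print(extract_id(marked))
```

The marked string displays the same as the original in most renderers, which is why stripping zero-width characters from copied text removes this kind of fingerprint.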

Spam and content filter evasion is another major concern. A filter looking for the word "casino" will not match "ca​si​no" if zero-width spaces are inserted between letters. This allows spam messages and phishing content to bypass keyword-based filters while appearing identical to human readers. Similarly, zero-width characters can be used to defeat plagiarism detectors.
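The evasion, and the usual countermeasure of stripping before matching, can be sketched in a few lines of Python. The blocklist and both filter functions are hypothetical and far simpler than a real spam filter.

```python
import re

# Zero-width characters commonly inserted inside words.
ZERO_WIDTH = re.compile(r"[\u200B-\u200D\u2060\uFEFF]")

BLOCKED_WORDS = ("casino",)  # illustrative blocklist


def naive_filter(message: str) -> bool:
    """Flag a message if it contains a blocked word verbatim."""
    lowered = message.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def hardened_filter(message: str) -> bool:
    """Strip zero-width characters first, then apply the same check."""
    return naive_filter(ZERO_WIDTH.sub("", message))


disguised = "Visit our ca\u200Bsi\u200Bno today"
print(naive_filter(disguised), hardened_filter(disguised))
```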

In security contexts, zero-width characters in URLs are particularly dangerous. A URL containing invisible characters may look identical to a legitimate URL in a browser's address bar or in plain text, but point to a completely different domain. This technique has been used in sophisticated phishing attacks targeting developers and security researchers.

How Zero-Width Characters End Up in Your Text

The most common source is copy-pasting from websites. Many content management systems, rich text editors (like Medium, Notion, and WordPress's Gutenberg editor), and web pages silently insert zero-width spaces at paragraph boundaries, after headings, or within links. When you copy text from these sources and paste it into your own content, code, or database, these invisible characters travel along.

Another common source is PDF extraction. Converting PDF documents to text often introduces zero-width characters, especially at line boundaries or around complex typographic elements. Markdown editors and email clients can also introduce them when rendering and then re-exporting content.

Smart quotes, special dashes, and other typographic replacements made by word processors are sometimes accompanied by invisible characters. Even some code editors have been known to silently insert zero-width characters in certain contexts.

Debugging Problems Caused by Hidden Characters

Zero-width characters cause a distinctive class of bugs: code that looks correct but does not work. A variable name with an invisible character embedded in it looks exactly like the correct name, yet the compiler or interpreter sees something else. Depending on the language, it is either rejected with a confusing syntax error (Python does this for ZWSP) or accepted as a different identifier (JavaScript permits ZWNJ and ZWJ inside identifiers). This is maddening to debug without a tool specifically designed to reveal hidden characters.

JSON and CSV parsing failures are common when a BOM or ZWSP appears at the start of a file or value. HTTP header parsing can break when zero-width characters appear in header values. Database queries fail when string comparisons include unexpected invisible characters. String length checks produce unexpected values. Regular expression matches fail despite apparently matching text.
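A few of these failures are easy to reproduce. The Python sketch below uses made-up sample values to show a comparison, a length check, and a CSV header lookup going wrong because of a single invisible character.

```python
import csv
import io

expected = "user_id"
pasted = "user\u200B_id"  # renders like "user_id" but contains a ZWSP

# Comparison and length checks disagree with what the eye sees.
same = expected == pasted
lengths = (len(expected), len(pasted))
print(same, lengths)

# A BOM at the start of CSV text becomes part of the first column name.
reader = csv.DictReader(io.StringIO("\ufeffname,age\nAda,36\n"))
row = next(reader)
print("name" in row, list(row))
```

Printing values with `repr()` is a quick way to spot the problem, since it shows escapes such as `\u200b` and `\ufeff` instead of rendering them invisibly.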

Our Zero-Width Character Detector solves this by visually annotating every hidden character inline within its context, so you can see exactly where in the text each invisible character sits, what type it is, and how many there are. The findings table provides a complete breakdown by character type, and the one-click clean function removes them all.
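The inline annotation idea can be approximated in a script as well. The following Python sketch is not the tool's own implementation; it is one possible approach, with hypothetical function names, that replaces each hidden character with a visible tag and lists where each one sits.

```python
import re

HIDDEN = re.compile(r"[\u200B-\u200F\u2060-\u2064\uFEFF\u00AD\u180E]")


def annotate(text: str) -> str:
    """Replace each hidden character with a visible tag such as [U+200B]."""
    return HIDDEN.sub(lambda m: f"[U+{ord(m.group()):04X}]", text)


def positions(text: str) -> list:
    """Return (index, code point) pairs for every hidden character."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in HIDDEN.finditer(text)]


sample = "pay\u200Bpal\u200D"
print(annotate(sample))
print(positions(sample))
```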

Programmatic Detection and Removal

For developers who need to handle zero-width characters in production code, the key code points to target are: U+200B through U+200F (the core zero-width and directional marks), U+FEFF (BOM), U+00AD (soft hyphen), U+2060 (Word Joiner), U+2061 through U+2064 (invisible mathematical operators), and U+180E (Mongolian Vowel Separator). A single character class covering all of these can be applied consistently across JavaScript, Python, PHP, Java, and most other modern languages that support Unicode regular expressions.
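One way to keep such a pattern maintainable is to derive it from a table of named code points, which also gives a per-type count similar to a findings table. The Python sketch below assumes the labels shown here; the `findings` helper is hypothetical.

```python
import re
from collections import Counter

# Code points named in this article, with short labels.
NAMES = {
    "\u200B": "ZWSP", "\u200C": "ZWNJ", "\u200D": "ZWJ",
    "\u200E": "LRM", "\u200F": "RLM",
    "\uFEFF": "BOM", "\u00AD": "Soft Hyphen",
    "\u2060": "Word Joiner", "\u180E": "MVS",
    "\u2061": "Function Application", "\u2062": "Invisible Times",
    "\u2063": "Invisible Separator", "\u2064": "Invisible Plus",
}

# Build the character class from the table so the two never drift apart.
PATTERN = re.compile("[" + "".join(re.escape(ch) for ch in NAMES) + "]")


def findings(text: str) -> Counter:
    """Count hidden characters in the text, grouped by label."""
    return Counter(NAMES[ch] for ch in PATTERN.findall(text))


print(findings("\ufeffa\u200Bb\u200Bc\u2062"))
```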

In security-sensitive applications — user-submitted content, authentication systems, URL handling, configuration parsers — it is good practice to sanitize input by stripping unknown zero-width characters before processing, while preserving ZWJ and ZWNJ when the application handles non-Latin scripts.
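A sanitizer following that advice could take a flag that decides whether the joiners survive. This Python sketch is a minimal illustration under that assumption; the `preserve_joiners` parameter and the sample Persian word are chosen for the example.

```python
import re

# Everything on the detection list.
STRIP_ALL = re.compile(r"[\u200B-\u200F\u2060-\u2064\uFEFF\u00AD\u180E]")

# Same list minus ZWNJ (U+200C) and ZWJ (U+200D).
STRIP_EXCEPT_JOINERS = re.compile(
    r"[\u200B\u200E\u200F\u2060-\u2064\uFEFF\u00AD\u180E]"
)


def sanitize(text: str, preserve_joiners: bool = False) -> str:
    """Remove hidden characters, optionally keeping ZWNJ and ZWJ."""
    pattern = STRIP_EXCEPT_JOINERS if preserve_joiners else STRIP_ALL
    return pattern.sub("", text)


# A Persian word that relies on ZWNJ for correct rendering.
word = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645"
dirty = "\ufeff" + word + "\u200B"

print(sanitize(dirty, preserve_joiners=True) == word)
print("\u200C" in sanitize(dirty))
```

Keeping two explicit patterns makes the difference between the strict and the script-aware mode easy to review.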