Unicode Inspector

Analyze text to reveal hidden Unicode characters, code points, categories, and encoding information

  • Total Characters: 29
  • Unique Characters: 20
  • UTF-8 Bytes: 35
  • UTF-16 Bytes: 58
Showing 20 of 20 characters
| Character | Code Point | Decimal | Hex | Category | Block | UTF-8 |
|---|---|---|---|---|---|---|
| \t (CTRL) | U+0009 | 9 | 0x0009 | Control Character | Basic Latin | 0x09 |
| \n (CTRL) | U+000A | 10 | 0x000A | Control Character | Basic Latin | 0x0A |
| ␣ (WS) | U+0020 | 32 | 0x0020 | Space Separator | Basic Latin | 0x20 |
| ! | U+0021 | 33 | 0x0021 | Other Punctuation | Basic Latin | 0x21 |
| , | U+002C | 44 | 0x002C | Other Punctuation | Basic Latin | 0x2C |
| : | U+003A | 58 | 0x003A | Other Punctuation | Basic Latin | 0x3A |
| H | U+0048 | 72 | 0x0048 | Uppercase Letter | Basic Latin | 0x48 |
| N | U+004E | 78 | 0x004E | Uppercase Letter | Basic Latin | 0x4E |
| T | U+0054 | 84 | 0x0054 | Uppercase Letter | Basic Latin | 0x54 |
| a | U+0061 | 97 | 0x0061 | Lowercase Letter | Basic Latin | 0x61 |
| b | U+0062 | 98 | 0x0062 | Lowercase Letter | Basic Latin | 0x62 |
| e | U+0065 | 101 | 0x0065 | Lowercase Letter | Basic Latin | 0x65 |
| i | U+0069 | 105 | 0x0069 | Lowercase Letter | Basic Latin | 0x69 |
| l | U+006C | 108 | 0x006C | Lowercase Letter | Basic Latin | 0x6C |
| n | U+006E | 110 | 0x006E | Lowercase Letter | Basic Latin | 0x6E |
| o | U+006F | 111 | 0x006F | Lowercase Letter | Basic Latin | 0x6F |
| w | U+0077 | 119 | 0x0077 | Lowercase Letter | Basic Latin | 0x77 |
| 世 | U+4E16 | 19990 | 0x4E16 | Unassigned | CJK Unified Ideographs | 0xE4 0xB8 0x96 |
| 界 | U+754C | 30028 | 0x754C | Unassigned | CJK Unified Ideographs | 0xE7 0x95 0x8C |
| 🌍 | U+1F30D | 127757 | 0x1F30D | Unassigned | Unknown Block | 0xF0 0x9F 0x8C 0x8D |

Character Categories

  • Lu: 3 characters
  • Ll: 8 characters
  • Po: 3 characters
  • Zs: 1 character
  • Cn: 3 characters
  • Cc: 2 characters

Unicode Blocks

  • Basic Latin: 17 characters
  • CJK Unified Ideographs: 2 characters
  • Unknown Block: 1 character

Pro Tips:

  • Control characters are shown with escape sequences (\t for tab, \n for newline)
  • Spaces are shown as ␣ and non-breaking spaces as ⍽ for visibility
  • Use filters to focus on specific character types like control or non-ASCII characters
  • UTF-8 and UTF-16 byte representations help understand storage requirements
  • Unicode blocks group related characters from the same writing system or purpose

About This Tool

How It Works

  • Analyzes each character in your text individually
  • Reveals Unicode code points, categories, and properties
  • Shows hidden control characters and whitespace
  • Displays UTF-8 and UTF-16 byte representations
  • Categorizes characters by Unicode blocks and types
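
The steps above can be approximated with Python's standard library. This is a minimal sketch, not the tool's actual implementation: the function name `inspect` and the row layout are assumptions made for illustration, and the category codes come from Python's bundled `unicodedata` module.

```python
import unicodedata

def inspect(text: str) -> list:
    """Return one row of details per distinct character, ordered by code point."""
    rows = []
    for ch in sorted(set(text)):  # strings sort by code point
        cp = ord(ch)
        rows.append({
            "char": ch,
            "code_point": f"U+{cp:04X}",
            "decimal": cp,
            "category": unicodedata.category(ch),
            # Control characters have no name, so supply a default.
            "name": unicodedata.name(ch, "<unnamed>"),
            "utf8": " ".join(f"0x{b:02X}" for b in ch.encode("utf-8")),
        })
    return rows

for row in inspect("Hi\t世"):
    print(row)
```

Each row carries the same kinds of fields the results table shows (code point, decimal value, category, UTF-8 bytes), so the output can be compared against the tool's display.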

Common Use Cases

  • Debugging text encoding and character issues
  • Identifying hidden or invisible characters
  • Analyzing internationalization problems
  • Understanding Unicode composition of text
  • Validating character compatibility across systems

Frequently Asked Questions

What is a Unicode inspector and why would I need one?

A Unicode inspector analyzes text to reveal detailed information about each character, including Unicode code points, categories, encoding properties, and hidden characters. It's essential for debugging text encoding issues, identifying invisible characters causing problems, and understanding how text is composed at the Unicode level.

How does the tool help identify hidden or invisible characters?

The tool displays every character in your text, including control characters (such as tabs and newlines), non-breaking spaces, and other invisible Unicode characters that might be causing formatting or processing issues. Control characters are shown with escape sequences (\t, \n), and whitespace is shown with special symbols for visibility.
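
A rough way to get a similar view in a script is to substitute visible markers for invisible characters. The sketch below assumes a small substitution table modeled on the symbols this page mentions (␣ and ⍽); the `VISIBLE` mapping, the `reveal` name, and the `<U+XXXX>` fallback are illustrative choices, not the tool's behavior.

```python
import unicodedata

# Assumed display substitutions, modeled on the symbols described on this page.
VISIBLE = {"\t": "\\t", "\n": "\\n", "\r": "\\r", " ": "␣", "\u00a0": "⍽"}

def reveal(text: str) -> str:
    """Replace invisible characters with visible markers."""
    out = []
    for ch in text:
        if ch in VISIBLE:
            out.append(VISIBLE[ch])
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            # Other control (Cc) and format (Cf) characters, e.g. zero width space.
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

print(reveal("a\tb\u200bc d\u00a0e"))  # a\tb<U+200B>c␣d⍽e
```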

What information does the tool provide for each character?

For each character, the tool shows: the character itself, Unicode code point (U+xxxx), decimal and hexadecimal values, Unicode category and block, character name, UTF-8 and UTF-16 byte representations, and properties like whether it's printable, ASCII, control character, or whitespace.

What are Unicode categories and blocks?

Unicode categories classify characters by their general type (like Uppercase Letter, Decimal Number, Punctuation). Unicode blocks group characters by script or purpose (like Basic Latin, Greek and Coptic, Mathematical Operators). These help understand the nature and origin of characters in your text.
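
As a point of comparison, Python's `unicodedata.category` returns the two-letter general category for any character. With the Unicode data bundled in current Python versions, 世 and 界 come back as Lo (Other Letter) and 🌍 as So (Other Symbol). Python's standard library does not expose block names, so the three-entry `BLOCKS` table below is a hypothetical stand-in for the full list in the Unicode Character Database file Blocks.txt.

```python
import unicodedata
from collections import Counter

# Hypothetical three-entry block table for illustration only; the complete
# list is published in the Unicode Character Database (Blocks.txt).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
]

def block_of(ch: str) -> str:
    """Look up the block name for a character in the small table above."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Unknown Block"

def summarize(text: str):
    """Count distinct characters per general category and per block."""
    unique = set(text)
    categories = Counter(unicodedata.category(ch) for ch in unique)
    blocks = Counter(block_of(ch) for ch in unique)
    return categories, blocks

categories, blocks = summarize("Hello, 世界! 🌍")
print(dict(categories))
print(dict(blocks))
```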

How can I use this tool for debugging encoding problems?

The tool helps identify encoding issues by showing unexpected characters, revealing byte sequences that don't match expected encoding, displaying characters from wrong Unicode blocks, and highlighting control characters that shouldn't be present in your data.
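
One common source of unexpected characters is UTF-8 text that was decoded with a single-byte encoding. The sketch below shows how such garbling arises and one way to test for it; the helper name `undo_mojibake` and the default of Latin-1 as the wrongly applied encoding are assumptions for illustration, and real data may involve other code pages.

```python
from typing import Optional

def undo_mojibake(text: str, wrong: str = "latin-1") -> Optional[str]:
    """Try to reverse a 'UTF-8 bytes decoded as `wrong`' mix-up.

    Returns the repaired text, or None if the text does not fit that pattern.
    """
    try:
        repaired = text.encode(wrong).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    return repaired if repaired != text else None

# Simulate the mistake: encode as UTF-8, then decode as Latin-1.
garbled = "café".encode("utf-8").decode("latin-1")
print(garbled)                  # cafÃ©
print(undo_mojibake(garbled))   # café
print(undo_mojibake("café"))    # None
```

A successful round trip is only a hint, not proof, so it is worth checking the repaired text by eye before replacing the original.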

What do the UTF-8 and UTF-16 byte representations show?

These show how each character is encoded as bytes under two Unicode encoding forms. UTF-8 uses 1 to 4 bytes per code point, while UTF-16 uses 2 bytes for code points up to U+FFFF and 4 bytes (a surrogate pair) for code points above it. This information helps you estimate storage requirements and check encoding compatibility between systems.
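
The byte counts can be reproduced by encoding the text and measuring the result. This is a minimal sketch; the function name `byte_lengths` is an assumption, and `utf-16-le` is used so that no byte order mark is added to the count. The sample statistics on this page (29 characters, 58 UTF-16 bytes) would be consistent with counting UTF-16 code units rather than code points, although the page does not say which unit it counts.

```python
def byte_lengths(text: str) -> dict:
    """Measure a string in code points, UTF-16 code units, and encoded bytes."""
    utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark
    return {
        "code_points": len(text),
        "utf16_units": len(utf16) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(utf16),
    }

print(byte_lengths("Hi 世界 🌍"))
# {'code_points': 7, 'utf16_units': 8, 'utf8_bytes': 14, 'utf16_bytes': 16}
```

The emoji accounts for the difference between 7 code points and 8 UTF-16 code units, because it lies above U+FFFF and needs a surrogate pair.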

How do the filtering and sorting options work?

You can filter characters by type (all, control characters, whitespace, printable, ASCII, non-ASCII) to focus on specific character sets. Sorting options include by Unicode code point, character appearance, or category to organize the analysis based on your needs.
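
The filter and sort options can be modeled as small lookup tables of predicates and sort keys. The definitions below are assumptions about what each filter name means (for example, "control" is taken to be general category Cc); the tool's own rules may differ.

```python
import unicodedata

# Assumed meaning of each filter name; the tool's exact rules are not documented here.
FILTERS = {
    "all": lambda ch: True,
    "control": lambda ch: unicodedata.category(ch) == "Cc",
    "whitespace": lambda ch: ch.isspace(),
    "printable": lambda ch: ch.isprintable(),
    "ascii": lambda ch: ord(ch) < 128,
    "non-ascii": lambda ch: ord(ch) >= 128,
}

SORT_KEYS = {
    "code_point": ord,
    # Break ties within a category by code point.
    "category": lambda ch: (unicodedata.category(ch), ord(ch)),
}

def select(text: str, filter_name: str = "all", sort_by: str = "code_point") -> list:
    """Return the distinct characters that pass a filter, in the chosen order."""
    chars = {ch for ch in text if FILTERS[filter_name](ch)}
    return sorted(chars, key=SORT_KEYS[sort_by])

print(select("b\ta 世", "non-ascii"))       # ['世']
print(select("bA!", sort_by="category"))    # ['b', 'A', '!']
```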

Can this tool help with internationalization (i18n) issues?

Yes, the tool is excellent for i18n debugging. It helps identify characters from unexpected scripts, verify proper Unicode composition for different languages, detect encoding issues in multilingual text, and ensure character compatibility across different systems and locales.
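
A frequent composition issue is the same visible text stored in two different forms, for example é as one precomposed code point or as e followed by a combining accent. The sketch below uses `unicodedata.normalize` to compare strings regardless of form; the helper name `same_text` and the choice of NFC are assumptions for illustration.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(composed == decomposed)            # False
print(same_text(composed, decomposed))   # True
print(len(composed), len(decomposed))    # 1 2
```

An inspector makes this difference visible because the decomposed form shows up as two rows, one of them a combining mark.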

How does the tool handle different writing systems and scripts?

The tool supports all Unicode characters and scripts, including Latin, Cyrillic, Arabic, Chinese, Japanese, Korean, Thai, Hebrew, and many others. It correctly identifies the Unicode block for each character, helping you understand which writing systems are present in your text.

What should I do if I find unexpected characters in my text?

First, note the Unicode code point and category of unexpected characters. Check if they're control characters that can be safely removed, encoding artifacts that need fixing, or legitimate characters from a different script. The tool's copy feature lets you extract specific characters or their code points for further investigation.

How can I use this tool for data validation and cleaning?

Use the tool to detect unwanted characters in datasets, verify that text contains only expected character types, identify and locate problematic characters for removal or replacement, and ensure data meets specific Unicode requirements for your application or database.
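
Once the unwanted characters are known, a cleaning pass can remove them and report where they were. This sketch assumes that control (Cc) and format (Cf) characters are unwanted except for tab and newline; the `KEEP` set and the `clean` name are hypothetical, and some format characters (such as the zero width joiner in emoji sequences) are legitimate in real text, so the rule should be adjusted per dataset.

```python
import unicodedata

KEEP = {"\t", "\n"}  # assumed to be acceptable in this dataset

def clean(text: str):
    """Drop control and format characters; return cleaned text and a removal log."""
    kept, removed = [], []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) in ("Cc", "Cf") and ch not in KEEP:
            removed.append((index, f"U+{ord(ch):04X}"))
        else:
            kept.append(ch)
    return "".join(kept), removed

cleaned, log = clean("a\u200bb\x00c\n")
print(repr(cleaned))  # 'abc\n'
print(log)            # [(1, 'U+200B'), (3, 'U+0000')]
```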

Can I copy characters or their properties from the analysis?

Yes, each character row includes copy buttons to copy the character itself or its Unicode code point. This makes it easy to extract specific characters for testing, documentation, or further analysis in other tools or applications.
