Unicode Character Detector

How to Use the Unicode Character Detector

With this simple tool, you can instantly identify GSM characters and Unicode symbols in your text messages. Characters in the GSM charset will be grey, while Unicode special characters will be highlighted in red.

  • Step #1: Copy and paste a text message into the empty box. Characters will automatically be displayed in the results box.
  • Step #2: Identify the different symbols in your SMS message. GSM characters will be displayed in grey, Unicode characters will appear in red and escape characters will be displayed in orange.
  • Step #3: The tool also calculates the number of characters in the text and the number of parts of a split message, thus allowing you to control concatenation.

Why you should use the Unicode character detector

As you probably already know, a single text message is limited to 160 characters if every character comes from the GSM character set. However, if your text contains even one Unicode symbol, it will be limited to 70 characters instead of 160.

Of course, messages longer than 70 characters can still be sent, but they will become multipart. Each part of a multipart Unicode message carries only 67 characters, because a few bytes are reserved for the information needed to reassemble the parts. This means that a 160-character SMS message will be split into three text messages if it contains Unicode symbols. This can be extremely frustrating. What is even more frustrating is when your client’s phone crashes due to the Unicode character strings (this has actually happened on several occasions).
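The part count can be estimated with a short Python sketch. It assumes the usual limits of 160/153 characters per part for GSM text and 70/67 for Unicode text, and counts extension characters such as € as two units. The `count_parts` function is a hypothetical name, and the simple division ignores the edge case where a two-unit character would straddle a part boundary.

```python
import math

# GSM 03.38 default alphabet and its extension table.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM_EXTENDED = set("^{}\\[~]|€\f")


def count_parts(text):
    """Return (encoding, units, parts) for a message; an estimate only."""
    if all(ch in GSM_BASIC or ch in GSM_EXTENDED for ch in text):
        encoding = "GSM-7"
        # Extension characters are sent as ESC + character, so they cost 2.
        units = sum(2 if ch in GSM_EXTENDED else 1 for ch in text)
        single, multi = 160, 153
    else:
        encoding = "UCS-2"
        # Count 16-bit units; characters outside the BMP take two.
        units = len(text.encode("utf-16-be")) // 2
        single, multi = 70, 67
    parts = 1 if units <= single else math.ceil(units / multi)
    return encoding, units, parts


print(count_parts("a" * 160))        # all GSM characters
print(count_parts("a" * 159 + "ы"))  # one Cyrillic letter switches the encoding
```

With these limits, the first message fits in one part, while the second, also 160 characters long, needs three.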

By using the Unicode character detector, you can identify and replace symbols that aren’t part of the 7-bit GSM charset to avoid splitting text messages into multiple segments.

Why we built this tool

Unicode characters not only break up text, but sometimes they do not show up at all, or they appear as the dreaded □ □ □. To ensure that the information is passed correctly to the SMS gateway, text messages must be properly encoded. The problem is that many characters are extremely difficult to encode, and because the GSM 03.38 charset is so hard to support properly, many providers have given up on it altogether.

We created the Unicode character detector tool to help our clients avoid the problems listed above and to ensure that your messages are delivered as intended.

Benefits of using the Unicode character detector

Here are the main benefits of using our Unicode character detection tool:

  • Identify GSM and Unicode characters in your text messages.
  • Identify the number of characters and parts in a text.
  • Based on the number of Unicode characters, find out if the text will be segmented.
  • Remove Unicode symbols and replace them with GSM characters.
  • Preview your text messages before sending them to customers.
  • Control how a text message will be split if it contains Unicode.

Why are text messages that contain Unicode segmented?

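When you try to send a text message with symbols that fall outside the GSM character set, the whole message has to be encoded as Unicode (UCS-2), which can represent characters that aren’t part of the standard charset. Because each character then takes 16 bits instead of 7, the same 140-byte message that holds 160 GSM characters holds only 70 Unicode characters. Some symbols, such as most emoji, take two 16-bit units each, so a message made up entirely of them holds as few as 35. This is why you will only be able to send text messages of 35–70 characters.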
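The arithmetic behind these limits can be checked with a few lines of Python. This is a minimal sketch that assumes the standard 140-byte SMS payload; `utf16_units` is a hypothetical helper name.

```python
PAYLOAD_BYTES = 140  # user data available in a single SMS

# GSM characters are packed 7 bits each; Unicode (UCS-2) uses 16 bits each.
gsm_capacity = PAYLOAD_BYTES * 8 // 7
ucs2_capacity = PAYLOAD_BYTES * 8 // 16


def utf16_units(text):
    """Number of 16-bit units the text occupies when sent as Unicode."""
    return len(text.encode("utf-16-be")) // 2


print(gsm_capacity, ucs2_capacity)          # 160 70
print(utf16_units("ы"), utf16_units("😀"))  # 1 2
print(ucs2_capacity // utf16_units("😀"))   # 35
```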

Can I avoid text message segmentation and still use Unicode?

To avoid SMS segmentation and to convert Unicode symbols to Latin only, you can use our Text Transliterator.

GSM describes the protocols for second-generation cellular networks and mobile devices. Presently, it is the standard for mobile communications, holding over 90% of the market share. Therefore, a message sent to such devices must stay within the standard GSM charset to get the full 160 characters.

When a text message contains non-GSM characters, it will be limited to 70 characters. The only solution to avoid having your texts split is to check for Unicode characters and to replace them with their equivalent in the GSM charset (if such an equivalent exists).
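A common source of accidental Unicode is "smart" punctuation pasted from word processors. The Python sketch below swaps a handful of such characters for GSM equivalents. The replacement table is a small illustrative sample chosen for this example, not the mapping used by the Text Transliterator, and `replace_common` is a hypothetical name.

```python
# Illustrative sample only: typographic characters and a GSM-safe substitute.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}


def replace_common(text):
    """Swap known typographic characters; leave everything else untouched."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


print(replace_common("\u201cSale\u201d ends today \u2013 don\u2019t miss it\u2026"))
```

Characters with no entry in the table pass through unchanged, so the result should still be checked for remaining Unicode symbols.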

What characters are part of the GSM charset?

The standard GSM character set contains the letters of the English alphabet, digits and some special characters, including a few Greek ones.

GSM character list: here

What characters are part of the Unicode charset?

The Unicode character list contains characters from scripts such as Cyrillic, Chinese, Arabic, Korean (Hangul) and Japanese (including kanji). It also contains several special symbols (such as emoticons and emoji).

Unicode character list: here
