Confused about Unicode, code points and other techie details about inserting symbols? Here’s how it works and how it can work for you in Word, Excel, PowerPoint and other apps.
Like many things in modern computing, you mostly don’t have to know the details of how things work. But occasionally things don’t work the right way and then a little background info will help you fix the problem and get working again.
Computers only understand numbers. That means every letter, digit, symbol and emoji needs its own number linked to that character. For example, capital A is 65 (decimal) and lower case m is 109.
Normally these numbers are hidden from view because the characters are on your keyboard or there’s a shortcut. You don’t need to know that typing capital K is saved as code 75 (decimal) or 4B (Hex) in the document because Word and other programs handle that for you.
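If you're curious, you can see those hidden numbers for yourself. Here's a small Python sketch (Python isn't part of Office, and the function name `describe` is just made up for this example) that looks up the decimal and Hex code for a character.

```python
def describe(ch: str) -> tuple[int, str]:
    """Return the decimal code and the Hex code for one character."""
    code = ord(ch)                   # character -> number
    return code, format(code, "X")   # same number, also written in Hex


for ch in ["A", "m", "K"]:
    decimal, hexadecimal = describe(ch)
    print(ch, decimal, hexadecimal)

# chr() goes the other way: number -> character
print(chr(75))
```

The loop prints 65, 109 and 75 for A, m and K, with 75 shown as 4B in Hex, matching the values mentioned above.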
The code numbers become important if you want to quickly type some less common, non-keyboard, non-shortcut symbols. There are thousands to choose from in the Unicode standard.
A short history of computer symbol standards
Translating characters into computer numbers needs a standard so that a document or email from one computer is understood elsewhere even if the receiver uses totally different software.
The original standards for computer symbols were ASCII and ANSI (effectively the same for this discussion). ASCII defines 128 characters and the 8-bit ‘ANSI’ character sets extend that to 256, enough for Latin-based languages (English, French, German etc). That covers the 26 letters in upper and lower case, accented characters like àáæçèëðñøüýÿ, numbers and some common symbols like $£ %^&½©®™()[]{} etc.
256 characters were enough in the early days of computers, but clearly not enough for global use. There are many other characters needed for Asian, African and other languages. Simplified Chinese alone needs 8,105 characters, before you count Traditional Chinese, Japanese, Thai, Hindi, Arabic and plenty of other languages. Something much, much bigger than ASCII was necessary for computers to work globally and seamlessly.
There were complex systems to handle non-Latin languages but they weren’t standardized. There was a lot of confusion and difficulty.
Enter UNICODE
Unicode started in 1991 as a new character standard that could handle a much larger range of symbols. Not just the current languages but plenty of room for future expansion into areas not yet envisioned. That need for future expansion happened sooner and, in a way, perhaps not expected by the original Unicode designers.
Unicode started by adopting ASCII (strictly, ISO/IEC 8859-1) for the first 256 character values, U+0000 to U+00FF. That made software and document compatibility a lot easier.
Then they allocated character codes for more symbols and for characters in other languages. There are also commonly used symbols like musical notes, chess pieces, maths and technical symbols, even Egyptian hieroglyphs.
Over time, the Unicode Consortium has added more symbols and languages. It’s now up to version 13 and has room for just over a million characters (1,114,112 code points). All major companies adopted Unicode including Microsoft and Apple. They gradually adapted their software to work with Unicode and that’s been in place for a long time.
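That compatibility is easy to check. The sketch below is in Python and the function name is invented for this example. It decodes every possible single-byte value as ISO/IEC 8859-1 (which Python calls "latin-1") and confirms that the character that comes out has the same number in Unicode.

```python
def latin1_matches_unicode() -> bool:
    """Check that each ISO/IEC 8859-1 byte value equals its Unicode code point."""
    for value in range(256):
        ch = bytes([value]).decode("latin-1")  # old single-byte code -> character
        if ord(ch) != value:                   # the Unicode number should be identical
            return False
    return True


print(latin1_matches_unicode())
print(bytes([0xA9]).decode("latin-1"))  # byte A9 is the copyright sign
```

In other words, an old document that stored © as the value A9 lines up with Unicode, where the same symbol is U+00A9.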
Most fonts don’t have all the Unicode symbols. Some cover a lot of symbols (Arial Unicode MS has wide coverage) but most limit themselves to a tiny fraction of the Unicode possibilities.
Long-time Word users will remember seeing a box symbol ▯ (itself a Unicode symbol, U+25AF) when the font didn’t have that character. Modern Word has a fallback feature to stop you choosing a font which doesn’t have the symbol you’ve typed.
These days Unicode is the accepted computing standard for all manner of characters and symbols. You don’t have to worry about compatibility across the globe.
Yes, there are a few minor exceptions to the Unicode standard that we’ll explain later but almost always Unicode rules, which is great for everyone.
U+nnnn for Code Points
Each character in Unicode has a ‘code point’, the number allocated to that symbol.
Unicode values are normally shown in the form U+nnnn, with four to six digits after the plus sign.
The number is in Hexadecimal (base 16) so there are letters (A to F) as well as numbers.
Leading zeros can be ignored, so U+00A9 (Copyright ©) is the same as U+A9.
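To make the U+ notation concrete, here's a minimal Python sketch that converts in both directions. The two function names are made up for this illustration, and nothing here is specific to Office.

```python
def to_u_notation(ch: str) -> str:
    """Write a character's code point as U+ followed by at least four Hex digits."""
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Turn 'U+00A9', 'u+a9' or plain 'A9' back into the character."""
    digits = text.strip().upper()
    if digits.startswith("U+"):
        digits = digits[2:]
    return chr(int(digits, 16))  # int() ignores leading zeros


print(to_u_notation("©"))
print(from_u_notation("U+00A9") == from_u_notation("U+A9"))
```

The first line prints U+00A9 and the second prints True, because the leading zeros make no difference to the number.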
That number or code-point is really useful in Office and especially Word for Windows.
Knowing the Unicode value means you can quickly jump to that character in Insert Symbol in any Office app. Type the Unicode value (code point) into the character code box and press Enter.
Word for Windows makes it even easier. Type the Unicode number then press Alt + X and presto! The symbol will appear. No-one expects you to remember all the Unicode numbers but it’s a handy shortcut for any symbols you use regularly.
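For the curious, the idea behind the Alt + X trick can be imitated in a few lines of Python. This is only a simplified sketch and not how Word itself is written. The function name `alt_x` is invented, and it assumes that the code is whatever run of up to six Hex digits sits at the very end of the text.

```python
import re

# Up to six Hex digits at the very end of the text (an assumption for this sketch).
HEX_TAIL = re.compile(r"[0-9A-Fa-f]{1,6}$")


def alt_x(text: str) -> str:
    """Replace a trailing Hex code with the character it stands for."""
    match = HEX_TAIL.search(text)
    if match is None:
        return text                      # nothing that looks like a code
    value = int(match.group(), 16)
    if value > 0x10FFFF:
        return text                      # beyond the last Unicode code point
    return text[:match.start()] + chr(value)


print(alt_x("00A9"))
print(alt_x("Price: 20AC"))
```

One catch the sketch shares with the real thing: letters A to F are also Hex digits, so a code typed straight after a word ending in those letters can be misread. Putting a space before the code avoids that.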
Unicode Blocks or Groups
The long Unicode list is broken up into Blocks or groups of characters (code points). That means similar or related characters are likely to be near each other in the list.
Insert | Symbol in Office calls Unicode Blocks a Subset in the pull-down list at top right.
There are Unicode Blocks for languages and also specialist areas like maths, drawing and emoji.
Remember that when you’re looking for a symbol that’s related to one you know. For example, go to the Currency Symbols block/subset to see many (but not all) currency symbols in one place.
Unicode Blocks aren’t consistent, unfortunately. Sometimes like symbols are in different places for historical or compatibility reasons. The plain old $ dollar and British £ pound signs aren’t in the Currency Symbols block because they are in the ‘original’ 256 characters that Unicode inherited (U+0024 and U+00A3 respectively).
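Here's a small Python sketch of that quirk. Python has no built-in block lookup, so the `BLOCKS` table and the `block_of` function are hand-made for this example and cover only three blocks. The `unicodedata.category` call is standard Python and returns "Sc" for currency symbols, whichever block they live in.

```python
import unicodedata

# A tiny hand-written sample of Unicode Blocks: (first code, last code, name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x20A0, 0x20CF, "Currency Symbols"),
]


def block_of(ch: str) -> str:
    """Find which of the sample blocks a character belongs to."""
    code = ord(ch)
    for first, last, name in BLOCKS:
        if first <= code <= last:
            return name
    return "(not in this sample table)"


for ch in "$£€":
    print(ch, f"U+{ord(ch):04X}", block_of(ch), unicodedata.category(ch))
```

All three signs are tagged as currency symbols (Sc), yet only the euro sign falls in the Currency Symbols block. The dollar is in Basic Latin and the pound is in Latin-1 Supplement.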
In a follow-up article we’ll look at the exceptions and important details of Unicode in the real world.