
Why Word's alphabetical order is different from a dictionary

Which version of alphabetical order does Microsoft Word use? I thought that alphabetical order was, well, alphabetical order. That’s what I was taught at school, and apparently I was taught wrong. There are at least three ways of ordering words alphabetically: two used by dictionaries and another by Microsoft. And that’s for English alone!

Reading Sue Butler’s delightful book ‘The Aitch Factor’ I found that there’s more than one alphabetical order! Naturally, that got me wondering which alphabetical order is used by Microsoft Word and Office when sorting.

Alphabetical orders

Look at these two word lists. Both contain the same words and both are in alphabetical order.

[Image: http://img.office-watch.com/ow/Word%20alphabetical%20order%201.png]
Source: ‘The Aitch Factor’ and the Macquarie Dictionary.

The left-hand list is in the more common order. Spaces and hyphens are ignored, then the entries are sorted letter by letter.

The right-hand list is in another alphabetical order, used by some dictionaries. The key or head word (‘bush’) is followed by its compounds (‘bush wire’).

The left-hand order is more common these days. Ms Butler suggests that this is because spacing and hyphenation vary too much (we’ve seen ‘bushbaby’, ‘bush-baby’ and ‘bush baby’). We think it’s also because it’s easier and faster to program on a computer. Replicating the other method takes more work: splitting the entries up, sorting, then putting the entries back together.
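To make the two dictionary methods concrete, here is a minimal Python sketch of each as a sort key. The word list is borrowed from the Word example further down, not from the image above. Treating a hyphen as a word separator in the head-word method is our assumption, and the function names are purely illustrative.

```python
import re

WORDS = ["bush", "bush breakfast", "bush wire", "bushbaby",
         "bush-bash", "bushed", "bushwhacker"]

def letter_by_letter_key(entry):
    """Common method: drop spaces and hyphens, then compare letter by letter."""
    return re.sub(r"[ \-]", "", entry.lower())

def word_by_word_key(entry):
    """Head-word method: split the entry into words and compare word by word.
    Splitting on hyphens as well as spaces is an assumption made here."""
    return tuple(re.split(r"[ \-]", entry.lower()))

letter_order = sorted(WORDS, key=letter_by_letter_key)
word_order = sorted(WORDS, key=word_by_word_key)

print(letter_order)
print(word_order)
```

The second key is the ‘split, sort, put back together’ approach in miniature: each entry becomes a tuple of words, so the lone head word ‘bush’ and all of its compounds sort ahead of longer words such as ‘bushbaby’.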

Sorting in Word

Now we turn to Microsoft Word. You might expect the same words to be sorted using the left-hand or common method. But no, there’s another ‘alphabetical order’ from Word.

  • bush
  • bush breakfast
  • bush wire
  • bushbaby
  • bush-bash
  • bushed
  • bushwhacker
Sorted using Home tab | Paragraph Sort or Table | Layout | Sort, with Type ‘Text’, English (US, UK or Australian) language and Windows regional settings.

Why is the Word ‘alphabetical order’ different?

The usual answer

Word, and most computer programs, don’t really sort alphabetically. They sort numerically, comparing the ASCII/Unicode value of each character from left to right. A lower character value comes before a higher one.

‘b’ has the value 98, so it comes before ‘c’, which is 99. A space has ASCII code 32, so it sorts ‘above’ any letter. That explains why ‘bush wire’, with a space in the fifth position, is above ‘bushbaby’, where ‘b’ (98) is the fifth character. For alphabetical sorting, upper and lower case letters are treated the same, despite having different ASCII values.
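To see what that explanation predicts, here is a short Python sketch that sorts the same seven words purely by character value. Python compares strings code point by code point, and lower-casing first stands in for ‘upper and lower case are treated the same’. This shows the usual answer at work, not how Word itself is implemented.

```python
WORDS = ["bush", "bush breakfast", "bush wire", "bushbaby",
         "bush-bash", "bushed", "bushwhacker"]

# Character values mentioned in the text: space, hyphen, 'b', 'c'.
print(ord(" "), ord("-"), ord("b"), ord("c"))  # 32 45 98 99

# Sort by code point, ignoring case.
code_point_order = sorted(WORDS, key=str.lower)
print(code_point_order)
```

Note where ‘bush-bash’ lands under this rule: after ‘bush wire’ and before ‘bushbaby’, because the hyphen (45) sits between the space (32) and the letters.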

The usual answer is wrong.

Modern software and a multi-lingual world mean that things are a lot more complicated. The ASCII or Unicode character value does NOT necessarily determine the ordering of a list. You can try this in Excel by comparing a sorted list of characters with the CODE() or UNICODE() values for those sorted cells. As you can see, the sorted Character column doesn’t follow the number values for those symbols.

[Image: http://img.office-watch.com/ow/Word%20alphabetical%20order%202.png]
Sorted ‘A to Z’ by the first column but the code values aren’t in the same order.

For another example, look at ‘bush-bash’ in the Word sorting example above. A hyphen is ASCII 45, so ‘bush-bash’ should sort below the entries with spaces but above ‘bushbaby’ and the other all-letter entries. It doesn’t: Word puts it after ‘bushbaby’.

In a test of single characters (same settings as above), the sort order is space, apostrophe, hyphen, many other characters, digits and finally letters.
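One possible explanation for where ‘bush-bash’ ends up, offered here as an assumption rather than something we have confirmed for Word: Windows documents a ‘word sort’ that treats hyphens and apostrophes differently from other symbols, so that words like ‘coop’ and ‘co-op’ stay together in a sorted list. The Python sketch below imitates that idea by ignoring hyphens and apostrophes when comparing. The function name is hypothetical, and the sketch models only the word list above, not the full single-character order just described.

```python
# The order Word produced in the example above.
WORD_OBSERVED = ["bush", "bush breakfast", "bush wire", "bushbaby",
                 "bush-bash", "bushed", "bushwhacker"]

# Characters assumed to be ignored on the first comparison pass.
IGNORED = str.maketrans("", "", "-'")

def word_sort_key(entry):
    """Compare with hyphens and apostrophes removed (spaces still count);
    fall back to the full text only to break ties."""
    folded = entry.lower()
    return (folded.translate(IGNORED), folded)

# Start from a scrambled (reversed) list so the sort has work to do.
emulated = sorted(reversed(WORD_OBSERVED), key=word_sort_key)
print(emulated == WORD_OBSERVED)  # True
```

With the hyphen ignored, ‘bush-bash’ is compared as ‘bushbash’, which falls between ‘bushbaby’ and ‘bushed’, exactly where Word placed it.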

What’s going on?

Sorting isn’t done by ASCII/Unicode values anymore. That numerical order doesn’t always match the accepted sorting order for a language. For example, accented characters like Á Å Æ Ç É Ó Ö have code values above 127, outside the ASCII range and well after ‘z’, but in English they are normally sorted with their base letter (A, C, E or O).
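A small Python sketch shows the gap between the two orders. It uses the standard unicodedata module to strip accents and recover each base letter. This is a rough stand-in for a real language-aware sort, not a description of what Word does internally, and the helper name is illustrative.

```python
import unicodedata

def base_letter_key(text):
    """Decompose accented letters (NFD), drop the combining marks,
    and compare by the remaining base letters, ignoring case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (stripped.casefold(), text)

# Normalise to composed form so each entry is a single code point.
letters = [unicodedata.normalize("NFC", s) for s in "Z É A Ö E Á O Ç C".split()]

by_code_point = sorted(letters)                        # raw character values
by_base_letter = sorted(letters, key=base_letter_key)  # accents grouped with base

print(by_code_point)   # accented letters all end up after Z
print(by_base_letter)  # each accented letter sits next to its base letter
```

Stripping accents is only an approximation: a character such as Æ has no decomposition, and different languages place accented letters differently, which is why the per-language rules described next are needed.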

Collation order

Each language has a collation order, or sorting order, which defines how to sort text in that language. It allows sorting in any order, regardless of the ASCII/Unicode numeric value.

Collation orders are part of the language setting. You can’t define your own order by hand.

Regional Differences

The Regional Settings affect the collation order (aka sort). The usual advice is to change the Regional Settings in Windows to change the sort order but that’s not necessary in modern Word.

Go to Sort | Options to choose the Sorting language.

Word will automatically select the sorting language based on the language of the selected paragraph. You could change the sorting language to something different from the proofing language for the text.

Curiously, Excel doesn’t have an equivalent setting and you have to rely on the Windows Regional setting.

About this author

Office-Watch.com

Office Watch is the independent source of Microsoft Office news, tips and help since 1996.