Which version of alphabetical order does Microsoft Word use?
I thought that alphabetical order was, well, alphabetical order. That’s what I was taught at school, and apparently I was taught wrong. There are at least three ways to alphabetize: two used by dictionaries and another by Microsoft, and that’s for English alone!
Reading Sue Butler’s delightful book ‘The Aitch Factor’ I found that there’s more than one alphabetical order! Naturally, that got me wondering which alphabetical order is used by Microsoft Word and Office when sorting.
Look at these two word lists. Both contain the same words and both are in alphabetical order.
Source: The Aitch Factor and the Macquarie Dictionary.
The left-hand list is in the more common order. Spaces and hyphens are ignored, then the entries are sorted letter by letter.
The right-hand list is another alphabetical order, used by some dictionaries. The key or head word (‘bush’) is followed by the key word compounds (‘bush wire’).
The left-hand list is more common these days. Ms Butler suggests that’s because spacing and hyphenation can vary too much (we’ve seen ‘bushbaby’, ‘bush-baby’ and ‘bush baby’). But we think it’s also because it’s easier and faster to program on a computer. Replicating the other method takes some work: splitting the entries up, sorting, then putting the entries back together.
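The two dictionary orders can be sketched in a few lines of Python. This is only an illustration, not how any dictionary publisher actually does it. The word list is a small sample built from words mentioned in this article, and treating a hyphen as a word break in the second method is our assumption.

```python
import re

# Sample entries taken from words mentioned in the article (not the book's full list).
words = ["bushbaby", "bush wire", "bush", "bush-bash", "bush breakfast"]

def letter_by_letter_key(entry):
    # Left-hand method: drop spaces and hyphens, then compare letter by letter.
    return re.sub(r"[\s-]", "", entry).casefold()

def word_by_word_key(entry):
    # Right-hand method: split into words, so the head word sorts first,
    # followed by its compounds. Hyphen-as-separator is an assumption.
    return [part.casefold() for part in re.split(r"[\s-]+", entry) if part]

letter_by_letter = sorted(words, key=letter_by_letter_key)
word_by_word = sorted(words, key=word_by_word_key)

print(letter_by_letter)
# ['bush', 'bushbaby', 'bush-bash', 'bush breakfast', 'bush wire']
print(word_by_word)
# ['bush', 'bush-bash', 'bush breakfast', 'bush wire', 'bushbaby']
```

The second key function is where the extra work shows up: each entry has to be split into parts before it can be compared, which is the "splitting up and putting back together" step described above.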
Sorting in Word
Now we turn to Microsoft Word. You might expect the same words to be sorted using the left-hand or common method. But no, there’s another ‘alphabetical order’ from Word 2013 (using Table | Layout | Sort with Type ‘Text’, English (US, UK or Australian) language and Windows regional settings).
- bush breakfast
- bush wire
Why is the Word ‘alphabetical order’ different?
The usual answer
Word, and most computer programs, don’t really sort alphabetically; they sort numerically, using the ASCII/Unicode value of each character from left to right. A lower character value comes before a higher one.
‘b’ has the value 98, so it comes before words starting with ‘c’, which is 99. A space has ASCII code 32, so it sorts ‘above’ any letter. That explains why ‘bush wire’, with a space in the fifth position, is above ‘bushbaby’, where ‘b’ (98) is the fifth character. For alphabetical sorting, upper and lower case letters are treated the same, despite having different ASCII values.
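Here is what that ‘usual answer’ looks like as a minimal Python sketch. Python compares strings by character code, so a plain sort (lower-cased first, to treat upper and lower case the same) behaves the way the explanation above describes. The word list is again just a sample from this article.

```python
# Character codes mentioned in the explanation above.
codes = {ch: ord(ch) for ch in [" ", "-", "b", "c"]}
print(codes)  # {' ': 32, '-': 45, 'b': 98, 'c': 99}

words = ["bushbaby", "bush wire", "bush-bash", "bush breakfast", "bush"]

# Pure code-value sort, ignoring case: space (32) < hyphen (45) < letters (97+).
by_code_value = sorted(words, key=str.lower)
print(by_code_value)
# ['bush', 'bush breakfast', 'bush wire', 'bush-bash', 'bushbaby']
```

Under a pure code-value sort, ‘bush-bash’ lands after the entries with spaces and before ‘bushbaby’.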
The usual answer is wrong.
Modern software and a multi-lingual world mean that things are a lot more complicated. The ASCII or Unicode character value does NOT necessarily determine the ordering of a list. You can try this in Excel by comparing a sorted list of characters with the CODE() or UNICODE() values for the sorted cells; the two lists won’t always be in the same order.
The above example is sorted ‘A to Z’ by the first column but the code values are somewhat mixed up.
What’s going on?
For example, what about ‘bush-bash’ in the Word sorting example above? A hyphen is ASCII 45 so that should sort below spaces but above any letters – but it doesn’t. In a test of single characters (same settings as above), the sort order is space, apostrophe, hyphen, many other characters, digits and finally letters.
We don’t know why ‘bush-bash’ is in that place. Perhaps one of our smart readers can figure it out? Presumably it’s some detail in the US English / other English collation settings.
The Regional Settings affect the sort order (aka collation). The usual advice is to change the Regional Settings in Windows to change the sort order, but that hasn’t been necessary since at least Word 2007.
Go to Table | Layout | Sort | Options to choose the Sorting language.
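To see how a sort order can differ from the character codes, here is a toy Python sketch that ranks single characters in the order we observed in our test: space, apostrophe, hyphen, other characters, digits, then letters. The `class_rank` and `toy_key` functions are hypothetical names of our own. This is not Word’s or Windows’ real collation algorithm, and it makes no attempt to explain where ‘bush-bash’ ends up.

```python
def class_rank(ch):
    # Rank character classes in the order seen in our single-character test.
    if ch == " ":
        return 0
    if ch == "'":
        return 1
    if ch == "-":
        return 2
    if ch.isdigit():
        return 4
    if ch.isalpha():
        return 5
    return 3  # every other character

def toy_key(text):
    # Compare by class first, then by the case-folded character itself.
    return [(class_rank(ch), ch.casefold()) for ch in text]

chars = ["a", "5", "-", "!", " ", "'", "B"]

by_code_value = sorted(chars)
by_toy_collation = sorted(chars, key=toy_key)

print(by_code_value)     # [' ', '!', "'", '-', '5', 'B', 'a']
print(by_toy_collation)  # [' ', "'", '-', '!', '5', 'a', 'B']
```

The two results disagree, which is the same effect you see in Excel when the sorted cells and their CODE() values fall out of step.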
Curiously, Excel doesn’t have an equivalent setting and you have to rely on the Windows Regional setting.