Why Excel PivotTables get messed up and how to stop it.
Time and again we’ve heard from people who try PivotTables and only get nonsense tables. That’s understandable since Microsoft’s hype about PivotTable ignores the basics that we’ll cover in this article.
You’ve probably seen a Microsoft demonstration of PivotTables where it looks sooooo easy. They have a long list, click a couple of buttons and presto! a nice PivotTable all done. Of course, those demos are well rehearsed with carefully selected examples but more than that, there’s a deeper secret.
All the source data for the demo has been carefully checked and tidied up before a PivotTable even gets a chance at it.
That’s it … it’s that simple. PivotTables are only as good as the information you put into it. More often than you might think, that information needs a little ‘massaging’ before it’s ready for the PivotTable treatment. The Excel defaults, in particular ‘General’ cell formatting, can confuse PivotTables.
Of course, what you need to fix depends on your individual list but here’s some things to look for.
Not really numbers
We mentioned this in our first PivotTable article. There’s often data lists that seems like numbers to Excel but are really text labels. Phone numbers, order numbers, customer ID numbers, some postal/ZIP codes etc will look like numbers to Excel. If you don’t change that, PivotTables will try to treat them as numbers it can add up, average etc. The Recommended PivotTables in Excel 2013 can get especially confused.
The fix is simple, make sure that any text data is explicitly formatted as ‘Text’ not ‘General’. Select the data and choose Home | Number | Text (at the bottom of the cell format list).
Office-Watch.com reader Greg P. adds:
“Another one of my aggravations with Pivot Table data sources is leading/trailing spaces, non-printing characters, and excess spaces. And as noted special or inconsistent formatting … the variations of phone numbers (domestic and international), circuit IDs, etc. … Separators can be slashes, dashes, dots, spaces, or even all collapsed together can affect the pivot table reporting.”
Formatting the numbers in the source columns can help setting up a PivotTable. Sure, you can set the formatting in the PivotTable but it’s usually easier to set it correctly in the source columns.
Rather than relying on ‘General’ format to figure it out, select Currency or Number (with the decimals places) that’s appropriate.
PivotTables gets horribly confused if a column of numbers has a blank cell, just one blank cell, it will Count the column instead of Sum.
In a perfect world all the dates and times will arrive in a format that Excel will immediately recognize as a date. In practice, dates arrive in a wide variety of text formats that need some changing into a form Excel can convert into a date.
How that’s done depends on the incoming data and your regional settings. Here’s a quick example:
The source data (copied from a web table) looks like this: 8 Jul, 10:49am
To get a Excel date from that use the DATEVALUE() function plus a little string manipulation to remove the comma and time details e.g.:
As you can see above, there’s now an additional column with the date in a form that Excel can use. It’s that new column which is used as a PivotTable data source.
This is a common trick to make PivotTables easier to make, use and understand. Change the incoming data into a useable or more convenient form for PivotTables. The original data is in columns with some related columns of ‘massaged’ ‘computed’ or ‘helper’ data ready for use in the PivotTable.
An alternative is to make the changes as a Calculated Field within the PivotTable. That’s beyond the scope of this article but you can see Calculated Fields at Analyze | Calculations | Fields, Items, & Sets.
Of course, if you have control of the incoming data you can change the format before it even arrives in Excel and save yourself some work.
Same thing, different name
PivotTables are NOT case sensitive so all the following items are grouped together:
Count = 4
Most of the time that’s a useful feature but occasionally it’s a right nuisance. Microsoft doesn’t provide a simple way to fix this and you have to resort to various workarounds.
This is a good example of how some advance planning can help make things easier later on. If someone is designing a database with case-sensitive information, think again and consider the trouble that can cause later. If you have access to the original source data or the formatting of the output (to a web page or direct SQL output) consider the advantages of formatting the output in an Excel/PivotTable friendly way.
Some smart readers took us to task, politely, over our use of the word ‘normalization‘ to describe a general tidy-up or data clean up. They are right, strictly speaking, ‘data normalization’ is specifically for the rearrangement of data into a more structured form with related tables. We’ve found that the term ‘normalization’ is becoming used in a broader sense and that’s what we intended.