A simple Excel worksheet mistake caused a change in economic policy. Was it a user’s mistake or Microsoft’s?
A few weeks ago, came the news that a much-quoted academic paper was based on a faulty Excel worksheet.
The paper titled “Growth in a Time of Debt.” from Reinhart & Rogoff was widely used to support the idea of limiting government expenditure during the fiscal crisis. If the proper Excel formula was used on all the data, the opposite conclusion would have been reached.
They concluded (wrongly) that there was a 0.1% decrease in economic growth across various countries when the real figure was 2.2% Increase. A difference of 2.3% seems small but that’s a massive difference when we’re talking about economic growth of countries.
There were several problems with the calculation and methodology. We don’t intend to deal with the level of disclosure nor conclusions in the paper let alone the subsequent use in economic policy. That’s an economic and political area that us mere computer nerds are quite unqualified to comment upon.
We’ll just look at the mistake made in Excel. Paul Krugman called it a ‘coding error’ and that phrase was used elsewhere without further detail. The term ‘coding error’ led us to expect that some complex Excel macro or long formula was wrong – but no.
The ‘coding error’ was a simple mistake that any of us could make, and probably has.
Even more interesting were some problems in Excel. We found some situations where we’d expect Excel to fix or highlight problems in the worksheet but did not.
What went wrong
There are columns of numbers with a row at the bottom showing the Average of each column.
Problem is, the AVERAGE formula didn’t include all the cells in the column above.
Instead of AVERAGE(L30:L49) the formula was AVERAGE(L30:L44) – 5 cells at the bottom were omitted.
Perhaps the additional countries/rows were added but the AVERAGE formula wasn’t adjusted?
Excel can help
There are a few Excel features that can help avoid such errors but they aren’t fool-proof and so a little blame can be shared with Microsoft which could do better.
When you add a row to an existing list, the SUM or AVERAGE formulas at the bottom will usually update automatically to include the new row. For example you have =Average(L30:L35) at the bottom of a column and add two more rows then the formula may change to =Average(L30:L37).
We say ‘may‘ because Excel can’t be relied upon to do it every time. We tested Excel using a sample of the data above and was dismayed to see that the Average formulas on the bottom row got out of sync. It seems to happen when you add the rows to the bottom of the list (ie just above the SUM or AVERAGE cell).
It seems that the presence of ‘n.a.’ text in cells confused Excel and prevented proper updating of the Average formulas. The same happened with =SUM formulas at the bottom of the column.
You can call that a bug in Excel (we do) but people should not rely on Excel entirely and check for themselves.
Using SUM or AVERAGE with a mix of number and text cells is asking for trouble. Better to use one of the more subtle versions like SUMA or AVERAGEA which include text fields (instead of ignoring them) or SUMIF / AVERAGEIF , SUMIFS / AVERAGEIFS which give you more control over what to do with text cells.
Using NA( )
The cells with no value have text ‘n.a.’ in them which, as we’ve seen, causes Excel some trouble. An alternative is the NA() function (or just typing ‘#N/A’) which returns an error that shows up as ‘#N/A’ which Excel understands as meaning ‘no value available’.
Visual confirmation of ranges
As you can see from the above image, Excel will show you on the worksheet with a blue box, the range of cells used in the formula. That should have alerted users to the discrepancy.
In some cases, Excel will warn you of inconsistencies in a row or column of cells. A small green triangle will appear in the cell. But that audit feature doesn’t go very far and does NOT necessarily pick up all inconsistencies in the AVERAGE formulas going across a row.
Place NOT all thy trust in Excel. Excel is great but the auditing and inconsistency checking features won’t always warn you of problems.
Check and recheck your worksheets. That’s obvious but mistakes can and do still happen so you also need to …
Share your working with others. In this case the original paper didn’t come with worksheets or other data to support their results. Alarms bells went off when other researchers could not replicate the calculations, but it was only years later when Reinhart & Rogoff released their worksheet to Thomas Herndon that he was able to figure out the problem.
Use named ranges. In the above example there’s no named ranges, just cell references. We don’t know what the original authors used, but named ranges would have helped with the debugging and checking of the results. Comparing the cell ranges of adjacent named ranges would have revealed the inconsistency.
The original paper http://www.nber.org/papers/w15639 from Reinhart & Rogoff
The academic paper from Herndon, Ash and Pollin which revealed the errors
Paul Krugman’s article ‘The Excel Depression’ which brought the issue to the general public. Prof. Krugman expands on the effect of the Reinhart & Rogoff paper in a piece for The New York Review of Books.
Excel screen-shot above from NextNewDeal.net who had help from Herndon, Ash or Pollin
- Excel bug: Row Insertion and cell ranges
- Workarounds for Excel’s addition problems
- More Excel addition strangeness
- Excel bug for sale