What happened with UK COVID tracking and Excel?

The UK has been embarrassed (again) with the news that around 16,000 COVID-19 test results were missing from the official database and so contact tracing wasn’t done.  The culprit?  Excel.

Or, to be fair, bad use of Excel is to blame. Not understanding it’s limits and how to workaround them.

We’ve tried to figure out what happened which isn’t easy because the early explanations are conflicting and messy.

Excel maximum row or column limit

They tried to load too much data (rows or records) into Excel. Modern Excel can cope with up to 1,048,576 rows.  It’s been that way for at least the last decade.  It’s a sign of the global Coronavirus problem that the normal disease tracking systems have been pushed beyond their normal limits.

OR (depending on which report to believe)

The data was imported with each test or case in a separate COLUMN (not row). If true, this is unbeleivabley stupid. Excel’s maximum columns is 16,384 (col XFD)

Whatever the problem was, keep in mind that the UK Test & Trace system is “world beating” (according to PM Boris Johnson) that’s reported to cost £12 billion (US$15.5 Billion) mostly outsourced to private contractors.

Right Excel version – wrong file type

According to The Guardian, Public Health England (PHE) may have been using older Excel which only coped with 65,536 rows.  That’s hard to believe because the last Excel with that limit is Excel 2003. 

It more likely the problem wasn’t the version of Excel being used, it was the old document format.

PHE was using modern Excel but with the older .XLS format. Doing that limits Excel to 65,536 rows. With the number of COVID tests being done each day, it’s easy to go over that threshold. New tests added to the bottom of the CSV were ignored by the crippled Excel.  

(This is a bit we find hard to understand. The ‘new’ Excel format .xlsx is over 13 years old now. Why was such a limited and sub-standard document format still being used? Quite aside from the higher row/record limits, .xlsx is smaller, more robust and more secure.).

PHE gets COVID test results from labs all over England.  Those results come in various formats including CSV.  CSV is a common, plain text way to share data, but there are problems if imported into Excel the wrong way.  See Gene names changed to stop confusing Excel for an example of how not to import CSV into Excel.

Whatever happened and why … the result was an Excel worksheet that maxed out it’s row or column limit.  Ouch.

The simple and immediate fix is to convert the .xls file into .xlsx and let Excel work with much larger data lists. There are workarounds for the million-ish Excel row limit, which is probably why Microsoft hasn’t upped the million row limit since 2007.

Why didn’t anyone notice?

How did this happen without anyone noticing the problem? 

It’s quite possible to exceed the size limits for an XLS workbook open in Excel 2007 or later. Either by importing data or making it in the workbook. Excel will behave normally with over-limit data until it tries to save to the XLS file.

Any way we try exceeding the XLS row or column limit gets a warning when the XLS is saved.  Were those warnings ignored?

A typical Excel warning when saving a XLS workbook that exceeds the row or column limits.

Most likely, an Interdepartmental enquiry will be struck with broad terms of reference which, in the fullness of time and after due care and diligence, will produce a report to be classified under the Official Secrets Act 1989. Or so Sir Humphrey assures us <g>.

Asking for trouble

This isn’t a job for Excel at all. Managing data of this size and complexity needs a proper database system like Access or SQL Server. Use Excel to query that database and analyse that data (PowerQuery and PivotTable are great for that). But using Excel as a database for that amount of information is asking for trouble.

PowerQuery workaround

The best workaround is using PowerQuery which is in Excel 365, 2019 and 2016 for Windows. With PowerQuery you can import records from multiple sources into an Excel Data Model and do filtering and analysis from there.

The Excel Data Model has an upper limit of 2 Billion rows (1,999,999,997).  With Excel 64-bit there’s no limit on data size either.

Access option

Another option is loading the data into an Access database (accdb or SQL Server).  Excel can view and analyse the data from Access.

Access might be a better option because forms can be made to view individual records for contact tracing.

Excel – a history of rows and columns
Gene names changed to stop confusing Excel
Excel’s General problem that messes up what you type
Custom Data Types are great in Excel 365