Scientists still don’t know how to use Excel

Importing genetic codes into Excel is still a problem despite widespread reports of the problem.  And still Microsoft is blamed, despite the real solution is with the lazy scientists who need to learn how to use Excel.

A new report in the journal Nature shows that Autocorrect errors in Excel still creating genomics headache with 30% of papers still showing messed up data after being incorrectly imported into Excel.

The problem happens when .CSV (plain text) data is wrongly imported into Excel.  Gene names like MARCH1 and SEPT1 are converted into dates.  It’s been a problem for a long time, Office Watch first reported on gene naming trouble back in 2016.  The names of those genes have now been changed to prevent the problem, but that takes a long time be implemented.

The accepted wisdom is that this is all Microsoft’s fault by auto-correcting data coming into Excel:

“Microsoft has never indicated that it will alter its software to accommodate the genetics community”

Excel “tries to be helpful but makes the wrong choice and ends up corrupting the data,” … “And this can be silent. And there’s no way to properly turn these smart features off.”

There’s even a convoluted and unnecessary workaround that puts the codes into equations to stop Excel’s autocorrect.

Not Microsoft’s fault

Just for once, we leap to the defence of Microsoft.  It doesn’t happen often, so enjoy this rare event <g>.

The solution has been in the geneticists hands all the time.  They are using Excel badly, taking the easy way to import data rather than one of the two better and more reliable options. Gene names changed to stop confusing Excel

It seems easy to just double-click on a .CSV and open it in Excel.  CSV (comma delimited data list) file type is associated with Excel but it’s a trap that many scientists seem unable to avoid.  Direct opening a .CSV means Excel will treat every incoming cell as if it was typed manually

Directly opening a .CSV text file is OK for small lists that can easily be checked.  For anything else, Excel has TWO different options for importing .CSV data in reliable ways that let the user choose the data formats with no autocorrection.

Text Import Wizard

Instead of double-clicking to open a .CSV go to File | Open (Mac: File | Import), that starts the Text Import Wizard. At Step 3, choose Text as the column data format. 

Get & Transform

Modern Excel for Windows has ‘Get & Transform’ more properly called PowerQuery.

PowerQuery is a powerful data import and management tool.  It can bring in external data files and manipulate them ready for use in Excel. It’s on the Data tab of modern Excel.

Of course, PowerQuery will import text fields without autocorrection.  It can also ‘Replace Values’ checking for old values (e.g. MARCH1) and replacing with the new version (MARCHF1).

Perhaps it would be better if directly opening a .CSV file started the Text Import Wizard but that might not suit many other Excel users.  In the meantime, some lessons in Excel would be more useful than blaming Microsoft.

Gene names changed to stop confusing Excel
Excel’s General problem that messes up what you type
Why Excel causes errors in gene studies and how to fix