Skip to content

Gene names changed to stop confusing Excel

The names of genes are being changed just to stop Excel confusing them with dates during the import process.  We’ll explain two ways Excel already works to avoid the geneticist’s problem.

Genes named like MARCH1 and SEPT1 which Excel converts into dates when imported the wrong way, usually directly via a CSV text file. Office Watch first mentioned this gene naming trouble in 2016.

Genes have short names because the full name is more than a mouthful.  MARCH1 is short for “Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase”

This is a big problem because gene data is often exchanged in text files (CSV or TSV, tab separated). Excel has auto-correct/conversion systems in place to ‘help’ users and those systems can mistake text for something else.

Back in 2016, talked about study found that about 20% of Excel gene list conversions contained errors.  The committee that manages these things is now making the necessary changes.

MARCH1 will now be MARCHF1 while SEPT1 is now SEPTIN1 .  tRNA lists are also being updated to avoid common words. WARS is WARS1 and CARS is CARS1.  There are 27 changes all up.

While it’s mostly gene names, there are other problems like RIKEN identifiers which can be mistaken for large exponential numbers (see above).

This might seem like an obscure problem, we see it quite often.  Incoming data being converted by a ‘helpful’ Excel in ways not appreciated by the customer.  For example incomplete dates ‘Jan 19’ might not be wanted as Excel dates but left as text.

Overly helpful Excel

Whenever importing data, especially from text files, it’s worth looking over it to make sure Excel hasn’t ‘helped’ too much.  In many cases unwanted changes are easy to spot because the cell formatting is also changed and the variations stick out on the screen.

The whole problem would be avoided if people learned how to use Excel properly.  The simple way of opening CSV files isn’t the most reliable option. Scientists still don’t know how to use Excel

There are already two ways to bypass conversion problems in Excel for Windows.

The Excel Fix, part 1

There’s a way to prevent these conversion problems and it’s been in Excel for a long time.

Use File | Open to reach a text data file starts the Text Import Wizard. It’s tempting to zip through the screens, accepting the defaults but Step 3 is there to help gene scientists.

Excel for Mac:  File | Import, select a file then Get Data will start the Text Import Wizard.

Unfortunately, double-clicking the same text file in Explorer opens it without the Text Import Wizard. That’s probably why so many geneticists are getting caught in Excel conversion traps.

Each column of the imported data can be specified to either General, Text or Date format.  The fix is to select a column of gene names and select ‘Text’.

That’s all you need to do.  Excel won’t try to change the text into a date if you choose the right format before importing.

Or, if do you want dates imported, select the date format.

The Excel Fix, part 2

There’s an even better fix for these problems in modern Excel.

Excel’s Get and Transform (aka PowerQuery) is a far better way to import data into Excel.  It’s highly recommended over just opening a .CSV or other data file directly into Excel.

Get and Transform can be told to NOT convert text to other formats.  It’s also got neat tricks for changing and transforming data ready for Excel. Replace Values in PowerQuery can automatically replace any old gene codes like MARCH1 with MARCHF1.

Scientists still don’t know how to use Excel
Great text importing tricks for Excel

Import PDF’s into Excel and refresh later

Excel’s much better data import features

About this author

Office Watch is the independent source of Microsoft Office news, tips and help since 1996. Don't miss our famous free newsletter.