Safely import text into Excel without Mistakes

Modern Excel for Windows has a much better way to import text files and other data which will avoid or fix problems that the older Open file or Text Import Wizard options cause.

Excel tries to help when opening text files like CSV or TSV by converting text into cell types like dates, numbers etc.  But that’s not always wanted.  Numeric Identifying codes or SKU’s are converted to numbers.   Text can be mistaken for dates or even large numbers.

The old way to import text is simply to File | Open and select the file.  That opens the text file with the Text Import Wizard which has some conversion options including forcing a field to text.

CSV files can be opened directly with just a double-click in Explorer because .csv is usually associated with Excel. But that’s a bad move because that bypasses the Text Import Wizard and lets Excel play all merry hell with the original text.

Get and Transform

The better option for importing any type of data, including CSV files, is Get & Transform or PowerQuery (Microsoft plays a bit with naming).  It’s in Excel 365, 2021, 2019, 2016 and 2013 for Windows with features that have improved a lot over the years.

Excel for Mac: Excel 365 for Mac has a version of Get & Transform look at Data | Get Data (PowerQuery)

PowerQuery is an example of a feature that was dropped into Office before it was ready.  The original Get and Transform was limited and buggy.  If you tried and gave up on PowerQuery, we strongly suggest giving it another go.

It can import a wide range of types from file or online sources.  The beauty of this feature is that once the initial import is done, PowerQuery lets you transform the information to suit your needs.

Each time the data is imported, the same transformations and changes can be done automatically.  The same steps can be repeated and even copied to other conversions.

There’s a lot possible in PowerQuery, we’ll just look at a few simple options to get you started.

  • Make top row into headers
  • Forcing imported fields to a specific cell type
  • Substituting specific imported values to something else

We’ve looked at PowerQuery before to do things like transposing a table columns and rows.

It’s also used extensively in our book Real Time Excel to import info like currency exchange rates, stock prices and latest weather data for Excel 365, 2021, 2019 and Excel 2016.

Import Data File

Start at Data | Get and Transform then choose the file to import.  What you see next depends on the data type, here’s a typical text file import dialog.

Most likely Excel has made the correct choices based on the incoming fields but, if not, change the file encoding and delimiter.

File Origin – or file encoding system. UTF-8 is common these days but another setting might be necessary if a wider character set or language is being used in the data.

Delimiter – comma is the most common field separator (the ‘C’ in CSV) but there are other choices plus a custom and fixed width setting.

So far, Get and Transform seems little different from the old Text Import Wizard but here’s where the fun begins.

You could choose ‘Load’ and drop the data file into an Excel worksheet, better to choose Transform Data.

Transform Data

Transform Data takes you into PowerQuery.  Don’t let PowerQuery’s many tabs and options put you off. Yes, there’s a lot of complex options available, ‘Softies have done a good job of making common tasks easily available.

A sample of the incoming info appears in the main window.

On the right is the important ‘Applied Steps’ list.  This is a list of the actions that will be done on the imported text.  Steps can be moved, deleted and sometimes edited from that list.  Any step with a little ‘cog’ icon has underlying settings.

Top Row as Headers

A common need is to make the top row into headers. That’s a one click job from Transform | Use First Row as Headers.

See in ‘Applied Steps’ that ‘Promoted Headers’ is now a step in the query.

Change column type

Look at column headings and the little icon on the left.  That’s the column type indicator.

In our example, Get & Transform has guessed correctly that all three columns are Text. That’s because ‘Get and Transform’ looks at a sample of the incoming fields before deciding what type to use. 

(Unlike Excel importing directly which uses the ‘General’ default and treats each cell/data as a separate element, resulting in a mix of data types in the same column)

PowerQuery forces each field/column into a single type.

If Excel guessed wrongly, choose Transform | Data Type and the correct type.

Notice that the dreaded ‘General’ type has no place in PowerQuery.

And PowerQuery has cell types not immediately available in Excel like Duration, Date/Time/Timezone, True/False and Binary.

Changing Values

Sometimes the incoming data needs some changes, experts call this data normalization.  The rest of us say ‘fixing the list’.

A common need is to change a value.  For example, a sales region or product has changed name but the incoming file has old names.  We once needed to change an employee name that had been spelled wrongly and the incorrect spelling had got into too many lists and databases.

Or for geneticists, they need to change the old gene name like MARCH1 to the new name MARCHF1. We want to change the two gene names to their 2020 versions.

Select the columns to replace in and (optionally) the cell to change then go to Transform | Replace Values

Notice there’s also a ‘Replace Errors’ option that’s really handy for fixing data problems by inserting some value for errors.

Replace Values will search through the selected column/s and do a Find/Replace.

Definitely click ‘Advanced Options’ because it has a vital choice.

Match entire cell contents – defaults off but often it should be on.  If this choice is off, there’s a real risk of a cell value being changed unintentionally by a partial match.

Replace using special characters – let you add normally un-typable characters into the Replace field.

We’ve done that twice to replace two incorrect values. The changes are Applied Steps and you can see the change directly in the preview window.

Over and over again

Open the Query then go to Home | Data Source Settings | Change Source to select a different file.

If the file is the same name and location, just Refresh the query.  The source file will be imported again and the steps applied.

Load into Excel

When you’ve done adding PowerQuery steps go to Close and Load.  That drops the data into an Excel worksheet as a Table.

The Queries and Connections pane shows the Connections.  Right-click and choose Edit if you need to make more changes in PowerQuery.  Refresh will grab the source file again, apply the steps and update the worksheet table.

Gene names changed to stop confusing Excel

Excel’s General problem that messes up what you type

Import PDF’s into Excel and refresh later

Excel’s much better data import features