PowerQuery mystery – how to Remove Duplicates, keeping most recent record

Office for Mere Mortals
Your beginners guide to the secrets of Microsoft Office
Invalid email address
Tips and help for Word, Excel, PowerPoint and Outlook from Microsoft Office experts.  Give it a try. You can unsubscribe at any time.  Office for Mere Mortals has been running for over 20 years, we've never, ever revealed or sold subscriber details.  Privacy policy

Excel 365’s PowerQuery has many wonderful features but occasionally there are common needs that it can’t easily handle.  One of those surprising complexities is removing older duplicates but retaining most recent row for each. We’ll explain the simplest solution then why it works.

Here’s some sample data on the left and the most recent purchase for each customer at right. 

The solution

There are various ways to solve this problem, this is the easiest we’ve found. 

  1. Load the data into Excel PowerQuery.
  2. Change the fields / columns to whatever formats you want.  Make sure the Date field is in Date format.
  3. Sort the Date field, descending (so most recent dates are on top).


The code is : 
Table.Sort(#”Changed Type”,{{“Date”, Order.Descending}}
We’ll come back to this code a little later.

  • Select the Name field/column then Remove Duplicates

     The code is:  Table.Distinct(#”Sorted Rows”, {“Name”})

That’s how you’d expect to get the right result, but it doesn’t work properly.  You have to add a PowerQuery fix that gives the correct result.

  • Go back to the Sorted Rows step, it needs a little extra to fix the PowerQuery problem.

            In the formula bar wrap the existing Table.Sort command with the mysterious Table.Buffer function.

            For example

Table.Sort(#”Changed Type”,{{“Date”, Order.Descending}}

Becomes …

Table.Buffer(Table.Sort(#”Changed Type”,{{“Date”, Order.Descending}})

There are other solutions, most commonly using Group By.  We prefer this method because it’s simple to implement and understand.

Why it works

You’d think it’s a two-step process in PowerQuery would be enough; Sort into date order, descending then Remove Duplicates.  Alas that doesn’t work, here’s what happens when you try.

The result for ‘Stephen’ is wrong – 17 July instead of 22 November,

The problem is the way PowerQuery handles a sort in the middle some applied steps.

Table.Buffer()

Table.Buffer() forces PowerQuery to load the list after sorting.  That means the following steps (like Remove Duplicates) work off the entire sorted list and give the correct results.

All Microsoft says about Table.Buffer is “Buffers a table in memory, isolating it from external changes during evaluation.” which doesn’t even begin to explain it’s vital importance in some situations and the performance gains possible elsewhere.

Excel 365’s data sources now include dynamic arrays
Solved! The problem with Excel Tables and Transpose
Better management of large databases in Excel

Latest news & secrets of Microsoft Office

Microsoft Office experts give you tips and help for Word, Excel, PowerPoint and Outlook.

Give it a try. You can unsubscribe at any time.  Office Watch has been running for over 20 years, we've never, ever revealed or sold subscriber details.  Privacy policy
Invalid email address