Benford’s Law and Excel

Office for Mere Mortals helps people around the world get more from Word, Excel, PowerPoint and Outlook. Delivered once a week. free.


Benford’s Law is an amazing tool that is simple to use in Excel.

Benford’s Law is one of those mathematical laws that seems to defy common sense but works for most naturally occurring number sets.

It says that in most groups of naturally occurring numbers, the leading digit 1 will occur more than 2 as a leading digit and so on down to numbers starting with 9 occurring least often.

Sounds screwy, but it works, the Wikipedia page has a great example where the heights of the 60 tallest buildings are compared. Whether you use meters or feet, Benford’s Law applies!

Benford’s Law in Excel

Applying Benford’s Law in Excel is quite simple.

Firstly, create a column of leading digits only using the LEFT() function. Despite what Excel documentation sometimes says, LEFT() works with numbers (not just text) and will ignore any currency symbol if defined in the cell formatting. For Benford’s Law use LEFT(<cell ref>,1)

Then use COUNTIF to count the instances of each leading digit from 1 to 9 eg COUNTIF(<cell range>,"1") – remember that LEFT() returns a string/text value so the COUNTIF comparison is “1” not the digit 1 .

Finally display the count for each leading digit as a percentage of the total count. See if the % values descend in the order suggested by Benford’s Law.

Leading Digit

% distribution according to Benford’s Law for base 10

1

30.1%

2

17.6%

3

12.5%

4

9.7%

5

7.9%

6

6.7%

7

5.8%

8

5.1%

9

4.6%

With the above table of ideal values according to Benford’s Law you can compare against the actual values using:

  • Make a line chart to compare the two percentage ranges visually. This is the easiest way to compare at a glance.
  • Use the CHITEST function to compare the two ranges. The closer the ChiTest value is to 1, the more alike the two ranges are.
  • Another set of cells showing the variation above or below the Benford’s Law distribution – simply Actual – Benford’s expected
  • Conditional Formatting to show variations above or below the threshold.

You don’t need to copy the Benford’s distribution values above. For a more exact set of comparison values use the formula =LOG10(1/+1)

Example worksheet

We’ve made a deliberately simple example worksheet for you to download and try for yourself. The small (about 30kb) XLSX file is available here.

Benfords Law example worksheet image from Benford

All the comparisons and formulas are there. We’ve shown the leading digit value in a separate column but usually you’d skip that step.

Change the values in Column A (white background cells) to see changes in the leading digit distribution.

A second tab has the same formatting but with random values which normally should not follow Benford’s Law. Press F9 to refresh the values.

Use with care

In the real world, Benford’s Law is often applied to check if data has been tampered with or outright made up.

If someone has faked data or tinkered with the numbers that will affect the Benford’s Law distribution. This makes it a useful tool for auditors or others checking for fraudulent data.

But Benford’s Law needs to be used with care because not all data sets are distributed evenly or widely enough.

When Benford’s Law does NOT work

An example that would NOT work with Benford’s Law is a list of petty cash receipts.  If the petty cash limit is say $40 most of the amounts will have leading digits between 1 and 3 only and probably many just under the $40 limit.

Similarly a list of large check approvals, because of the arbitrary definition of ‘large’ in any organization. However if you had a list of all outgoings from small to large, Benford’s Law might apply.

A series of adult human heights or weights also don’t obey Benford’s Law because most people are within a narrow range of heights or weights (ie you won’t have adults weighing 10lb or 20kg). Telephone numbers won’t work because there are arbitrary prefixes or blocks of numbers issues. On the other hand a large list of street numbers from an address list probably will obey Benford’s Law.

A series of random numbers in Excel (pseudo random) multiplied by 100 will not obey Benford’s Law because the digits are random. Our example worksheet demonstrates that.

Large scale without limits

In short, Benford’s Law is a useful tool for checking data, but it needs to be used with care and understanding of the data source. Large scale numbers without arbitrary limits work best.

A history of Benford’s Law is littered with people who falsely claim fraud based on a mistaken understanding of the data source.


Want More?

Office Watch has the latest news and tips about Microsoft Office. Independent since 1996. Delivered once a week.