Benford’s Law is an amazing tool that is simple to use in Excel.
Benford’s Law is one of those mathematical laws that seems to defy common sense but works for most naturally occurring number sets.
It says that in most groups of naturally occurring numbers, the leading digit 1 will occur more than 2 as a leading digit and so on down to numbers starting with 9 occurring least often.
Sounds screwy, but it works, the Wikipedia page has a great example where the heights of the 60 tallest buildings are compared. Whether you use meters or feet, Benford’s Law applies!
Benford’s Law in Excel
Applying Benford’s Law in Excel is quite simple.
Firstly, create a column of leading digits only using the LEFT() function. Despite what Excel documentation sometimes says, LEFT() works with numbers (not just text) and will ignore any currency symbol if defined in the cell formatting. For Benford’s Law use LEFT(<cell ref>,1)
Then use COUNTIF to count the instances of each leading digit from 1 to 9 eg COUNTIF(<cell range>,"1")
– remember that LEFT() returns a string/text value so the COUNTIF comparison is “1” not the digit 1 .
Finally display the count for each leading digit as a percentage of the total count. See if the % values descend in the order suggested by Benford’s Law.
Leading Digit | % distribution according to
Benford’s Law for base 10 |
---|---|
1 |
30.1% |
2 |
17.6% |
3 |
12.5% |
4 |
9.7% |
5 |
7.9% |
6 |
6.7% |
7 |
5.8% |
8 |
5.1% |
9 |
4.6% |
With the above table of ideal values according to Benford’s Law you can compare against the actual values using:
- Make a line chart to compare the two percentage ranges visually. This is the easiest way to compare at a glance.
- Use the CHITEST function to compare the two ranges. The closer the ChiTest value is to 1, the more alike the two ranges are.
- Another set of cells showing the variation above or below the Benford’s Law distribution – simply Actual – Benford’s expected
- Conditional Formatting to show variations above or below the threshold.
You don’t need to copy the Benford’s distribution values above. For a more exact set of comparison values use the formula =LOG10(1/+1)
Example worksheet
Here’s an example of Benford’s Law at work with full working. We’ve also included the ChiTest() function to give a single, simple value comparing the actual and predicted results.
All the comparisons and formulas are there. We’ve shown the leading digit value in a separate column but usually you’d skip that step. Change the values in Column A (white background cells) to see changes in the leading digit distribution. A second tab has the same formatting but with random values which normally should not follow Benford’s Law. Press F9 to refresh the values.
Get the Benford’s Law example
The Benford’s Law example worksheet is available free to all Office Watch and Office for Mere Mortals subscribers (also free).
The Exclusive Downloads page link is in every issue of both email newsletters. Join Office Watch and Office for Mere Mortals and you’ll get the link with the next issue.
Use with care
In the real world, Benford’s Law is often applied to check if data has been tampered with or outright made up.
If someone has faked data or tinkered with the numbers that will affect the Benford’s Law distribution. This makes it a useful tool for auditors or others checking for fraudulent data.
But Benford’s Law needs to be used with care because not all data sets are distributed evenly or widely enough.
When Benford’s Law does NOT work
An example that would NOT work with Benford’s Law is a list of petty cash receipts. If the petty cash limit is say $40 most of the amounts will have leading digits between 1 and 3 only and probably many just under the $40 limit.
Similarly a list of large check approvals, because of the arbitrary definition of ‘large’ in any organization. However if you had a list of all outgoings from small to large, Benford’s Law might apply.
A series of adult human heights or weights also don’t obey Benford’s Law because most people are within a narrow range of heights or weights (ie you won’t have adults weighing 10lb or 20kg). Telephone numbers won’t work because there are arbitrary prefixes or blocks of numbers issues. On the other hand a large list of street numbers from an address list probably will obey Benford’s Law.
Benford’s Law does NOT work checking vote counts, which is surprising but very, very true. The voting population of an electorate or county is artificially limited by redistricting. Each voting area is managed so the voter count is within a range — for example a county won’t be very small or very large (e.g. an overpopulated county would be split into smaller ones for voting purposes). Enforced size limits mean that a Benford’s analysis will be invalid. If you’d like to know more, this YouTube video is a good explanation or check out “Benford’s Law and the Detection of Election Fraud“
A series of random numbers in Excel (pseudo random) multiplied by 100 will not obey Benford’s Law because the digits are random. Our example worksheet demonstrates that.
Large scale without limits
In short, Benford’s Law is a useful tool for checking data, but it needs to be used with care and understanding of the data source. Large scale numbers without arbitrary limits work best.
A history of Benford’s Law is littered with people who falsely claim fraud based on a mistaken understanding of the data source.
See Also
- Styles in Excel
- Big numbers rounded in Excel
- New Prime Number is too big for Excel
- Comparing series of values in Excel
- Random Numbers in Excel