How to compare one set of numbers with another list of values using ChiTest() in Excel.
ChiTest() or ChiSq.Test() sounds like a complex formula that only expert statisticians can understand – not at all! While the math behind ChiTest() is formidable, it’s easy to use and understand.
We all know how to compare one number against another – equal to, less than, greater than etc. But what if you have a series of numbers that you want to compare, as a group, with a similar set of numbers to see how different each set of numbers is from the other.
Comparing a series of values happens quite often and is a useful tool for professionals and casual Excel users. It’s not only practical but an example of how a once complex mathematical function is now available with a few keyboard strokes.
CHITEST function in Excel
ChiTest() is most commonly used to compare a new set of results from the real world against the standard or expected results. It can also be used to compare one period’s results against another periods or the new results against the past average.
The syntax is:
CHITEST (actual range, expected range)
The above works in all versions of Excel but the preferred naming in Excel 2010 and later:
CHISQ.TEST (actual range, expected range)
Use CHITest if your worksheet may be opened in Excel 2007 or previous.
The function is defined as “the test for independence. The value from the chi-squared distribution for the statistic and the appropriate degrees of freedom.“
In English that means a fully matching set of values returns 1. Any variation in the two series means a value less than one.
The more variation between the two lists – the lower the ChiTest result.
Remember that ChiTest is only a guide to variance and it works better with larger sets of numbers.
Here’s an example of ChiTest using tests of air samples that might be used by Scuba divers to check the air in their tanks.
As you can see, Column B has the expected results for a standard air sample.
Pedantry note: the Standard Air values are approximate and rounded – see Wikipedia for a more precise and detailed list.
The other columns have the results from various tests including some for ‘Enriched Air’ options (aka Nitrox 32 and Nitrox 40) as well as a deliberately extreme set for demonstration purposes.
The Variance row 7 has the ChiTest results comparing the column above with column B eg
Test 1 in column C has exactly the same values as column B so ChiTest returns 1.
Column D has a fairly small variation – so the ChiTest returns 0.99335
The later columns have increasingly greater differences from the Standard Air values in column B.
Once you have the ‘raw’ ChiTest values you can massage them into something more obvious.
In row 8 we’ve copied the Chitest values from row 7 and applied the Percentage format instead.
In row 9 we’ve inverted the ChiTest value to show the percentage ‘off’ standard the test was – 1%, 3% etc.
Which value use depends on whether you want to show how close the results are or how far apart.
How much variation is too much?
Obviously how much ChiTest variance is acceptable depends on the circumstances – this isn’t a function you can use on its own without consulting other measures. It’s a broad test that’s good for narrowing down larger sets of values to a smaller field of candidates.
The ‘Test 2’ result shows only a .007 variance according to ChiTest but even that ‘small’ variation would be unacceptable as breathable air with only 9% oxygen!
Column F and G (Nitrox 32 and Nitrox 40) have almost the same or great ChiTest variance from air as ‘Test 2’ but would be acceptable in some circumstances. In fact they are the correct values for some Enriched Air options used by divers.
The simple conditional formatting we put in row 7 above isn’t appropriate in this situation with only the most extreme result in deep red. In practice you’d need conditional formatting that highlighted anything out of range as well as excessive overall variance.
Here we’ve added some conditional formatting to highlight high or low oxygen values in addition to the ChiTest.
We’ve also changed the formatting on the Variance row 7 to highlight values with more than .995 variance.
Why are we babbling on about the ChiTest function in Excel? Because it’s a useful prelude to a coming article on Benford’s Law.