More useful stats functions in Excel
by David Goodmanson
Continuing our series on Excel’s statistical functions … we started with Average/Mean and now move onto the misunderstood cousin Median plus also Mode.
Median and Mode
If we are looking for work and are told that the average salary is $45,000, then we have an idea of what to expect but it’s not a good indication if the salary range is skewed with extremely low or high values.
Figure 2 – Using median, mode and trimmean
That’s where Median() comes in – it tells you the point at which half the values are above and half are below.
Median and Average are often misunderstood and misused – only a few days ago an ‘average’ was badly misquoted at a Microsoft keynote address! When people who should know better get it wrong, confusion is understandable.
Tabloid newspapers can generate a scary headline simply by taking an average and ignoring the median.
One way to ‘lie’ with statistics is to quote either the average or median as suits your agenda, but not both.
In a list of evenly distributed values, the Average() and Median() will be about the same. But when there are extreme values the Average and Median will be further apart. If you see an Average and Median quoted that are quite different, it’s time to look more closely at the underlying data. This is because the average is influenced by the extreme values whereas the Median is independent of those extremes, and will be therefore closer to the “middle”.
In Excel Median() and Mode() are other types of measures of central tendency that can be used as alternatives or complements to the average.
The Median is the number in the middle of a set of numbers.
Mode()
The Mode is often referred to as the most common number in the set of data or the most frequently occurring number.
It assumes there are at least two exactly matching values if there aren’t any matches you’ll get a #N/A error.
A simple example of Mode using the values:
1,1,2,3,5,8,13,21
Mode will return ‘1’ since it’s the only repeated value even though it’s the lowest value.
If you remove the repeating 1 in the Fibonacci series Mode returns #N/A instead.
For a more complete discussion of the distinctions between Average, Median and Mode go to the Wikiversity
As can be seen from the example above, the Average for the salary data set is $20 million. This is brought about by the $200 million value for Chief Software. This value distorts the centre of the data set and, for this reason, other measures – such as Median and Mode – are more appropriate.
The median formula =MEDIAN(B2:B11) returns $200,000.
The mode formula =MODE(B2:B11) returns $200,000.
Median and Mode are very useful when the average is distorted by very large or very small values. Real Estate prices are often denoted by a Median value. From looking at the salaries data, it makes sense to expect that the likely salary is going to be $200,000 (not $20M), hence the median and mode are going to be representative.
Don’t forget to visit the Microsoft Excel 2007 Statistical Function Reference by clicking on this link.
Excel 2007 Statistical Functions
Well, that is it! There you have the average and some of its derivatives. Next article will look at measures of dispersion – how varied the data is. This will include standard deviation and variance. See you then.