We bring together earlier articles to create a quick and simple Excel data analysis.
by David Goodmanson
In the final article of this series we will bring together what we have learnt in earlier articles to create a data analysis.
Entering all those formulas, even simple ones like Average() is a pain. Thankfully Microsoft has put in a simple way to get statistical details from a list of numbers.
Using the descriptive statistics tool we can create an analysis quickly and easily in Excel 2002 (XP) and later. The descriptive statistics tool simplifies the calculation and display of statistics.
The analysis below starts by considering a sample of student scores from a history test.
Open Excel to a new worksheet. The data consists of a sample of 18 scores from a history test. Key the student results data into columns A and B of your worksheet. Scores are marked out of 100.
Once the data is in, the Excel Data Analysis tool can be used to produce a range of descriptive statistics.
First, select the history scores in column B rows 2 to 19 as shown.
Next, we engage the Excel Data Analysis tool.
In Excel 2007, go to the Data tab on the ribbon and in the Analysis section click on Data Analysis.
If you are using Excel 2003 and Excel 2002 (XP), go to Tools -> Data -> Data Analysis.
You may need to install the Data Analysis add-in from Tools | Add-ins or Excel Options | Add-ins | Go… in Excel 2007/2010.
Clicking “Data Analysis” calls the Data Analysis dialogue box, as shown below. From the Analysis Tools list, select “Descriptive Statistics” and click OK.
This brings up the Descriptive Statistics dialogue. We need to choose all the options we want for our analysis:
In the Descriptive Statistics dialogue box:
- Enter the “Input Range” as B2:B19. (Or collapse the dialogue box and select the input range using the mouse.
- Select “Grouped by” Columns.
- “Labels in first row” should be clear
- Select “Output Range” and enter E1 (this indicates the top left of the output range).
- Then tick “Summary statistics”
- When finished click OK.
The descriptive statistics are inserted as shown below.
As can be seen from the data analysis table, the spread for history scores is from 7 to 100 – equating to a range of 93 marks. This indicates how dispersed the data is, considering the test was scored out of 100. However the Range is limited by the fact that it only uses 2 values. These two values could be outlier values thereby give a false indication of the spread of the data.
The mean and the median are relatively close in value (mean = 53 and median = 54.5) indicating that the distribution of results are not significantly skewed to one end. The spread of values is evenly distributed around the mean and the median. This implies that students were equally as likely to score above the mean as below it.
Finally, there are two other measures that provide information about how the data is spread – they are Percentiles and Quartiles. Percentiles indicate the percentage of values that lie below the percentile value. Quartiles are simply specific Percentiles, e.g. 1st quartile = 25th percentile, 2nd quartile = 50th percentile (also the median) and 3rd quartile = 75th percentile.
Sort the data as shown – from smallest to largest numbers. NB: sorting is not essential for percentiles but it helps us see how Percentile and Quartile work in practice.
For example to calculate the 50th percentile key in, =Percentile(B2:B19, 0.50) into cell D10 (or any cell) and press enter. The percentile function returns 54.5. Therefore 50% of the data lies above 54.5 and 50% below. Similarly the 2nd Quartile (identically equal to the 50th Percentile) can be calculated by entering =Quartile(B2:B19, 2). Again 54.5 is returned.
So, to calculate any percentile simply enter =PERCENTILE(Range of data, decimal fraction). If Quartiles are needed simple use the corresponding percentile as shown above.
Pity that ….
The Descriptive Statistics function creates a nicely formatted and comprehensive list of statistics but there’s one big gotcha – it puts numbers in the cells not formulas.
For example, the Mean above is just the number 53 in a cell not the formula =Average(B2:19) which Excel used to calculate it.
That’s a problem because any change in the source data (eg a students score is altered) the statistics are NOT updated as you’d be entitled to expect. Why Microsoft inserts the results in this way is a mystery. Aside from the risk of inaccurate analysis, inserting the formulas would be a useful teaching tool for Excel novices and experts have no way of knowing exactly how Excel has calculated a particular result.
In the above example, there are three results that are not covered in this article. But here’s some pointers for anyone who is interested.
The Standard Error of the mean is calculated by dividing the standard deviation by the square root of the population size. There is no Excel function.
Kurtosis and Skew
Kurtosis sounds like an alien disease that infects the crew of the Starship Enterprise .. in fact it’s a “a measure of the ‘peakedness‘ of the probability distribution of a real-valued random variable” – details. The Excel function is KURT()
Skew is a complementary measure of the “asymmetry of the probability distribution of a real-valued random variable” – details. The Excel function is SKEW()
Well, that is it for Data Analysis with Excel. I hope you’ve enjoyed the series and you have gained insights into the analytical process using Excel. Thanks very much for your readership and feedback. Much appreciated
- Data Mining ‘fun’ coming soon
- Status Bar stats in Excel
- Variability in Excel
- Median and Mode functions in Excel
- Average / Mean options in Excel