Putting Health Data into Excel
Examine US public health data in Excel.
One of the great things about the recent Internet is the availability of previously hidden data. Large blocks of information have been collected, checked and published online. Combine that with a tool like Excel and you can do your own analysis!
Here’s just one example of what’s available and how to get it into Excel.
Recently the University of Pittsburgh released Project Tycho, a painstaking compilation of US public health data. It’s taken a lot of work to find the information, check it, enter into a computer form and normalize it.
The headline in the press was the unsurprising result that vaccinations improved public health and saved many millions of lives.
We were more interested in looking at the ‘raw’ data and seeing how it looks in Excel. Project Tycho lets you do that for polio, measles, rubella, mumps, hepatitis A, diphtheria and pertussis. Go to university web site, create an account and login.
Just as an example, we selected Diphtheria data for four major cities in table format.
We chose ‘Incidence’ because it gives a number of cases per 100,000 people instead of ‘Cases’ that gives raw numbers with no adjustment for the city population.
The results appear in a table below the selection dialog with a link to an Excel version at the top.
As you can see, data isn’t always available for all dates.
The ‘Excel’ version is actually a CSV file (comma delimited data) that Excel can read but, in case, the columns get muddled up.
The commas separating city and state (Denver, CO) are interpreted as column separators.
You can fix this by adjusting the column labels or copy the data again from the web table.
From there you can ‘play’ with the data to make charts, tables and pivottables. For example, you could use the weekly data to compare the same week in different years to see if there’s any annual pattern,
Note: examining raw data like this can be interesting but can lead to incorrect conclusions. Maybe misunderstanding the data, don’t check the worksheet carefully or start their analysis with a bias or preferred conclusion. Please read the notes accompanying the data.
Citation: Willem G. van Panhuis, John Grefenstette, Su Yon Jung, Nian Shong Chok, Anne Cross, Heather Eng, Bruce Y Lee, Vladimir Zadorozhny, Shawn Brown, Derek Cummings, Donald S. Burke. Contagious Diseases in the United States from 1888 to the present. NEJM 2013; 369(22): 2152-2158.
The Project Tycho™ database is funded by the Bill & Melinda Gates Foundation and the US National Institutes of Health.