Data Mining 'fun' coming soon

The new Wikidata site should provide hours of Excel ‘fun’.

The Wikimedia Foundation, operator of Wikipedia among other sites, is developing a new site called Wikidata.

It will be a home for public ‘structured data’; in other words, lists. There are plenty of lists already in Wikipedia. Wikidata will link with Wikipedia and help structure those lists into standardized forms.

“Unlike Commons, which collects media files, and the Wikipedias, which produce encyclopedic articles, Wikidata will collect data, in a structured form. This will allow easy reuse of that data by Wikimedia projects and third parties, and will enable computers to easily process and ‘understand’ it.”

At the moment you have to manually copy tables from Wikipedia – a messy process detailed in a past article, ‘Copying tables from Wikipedia to Word’.

Hopefully you’ll be able to download these lists in a form directly accessible in Excel – ideally XLS or XLSX. Maybe even ‘subscribe’ to a data feed so you can automatically get any updates to the list.

Hint, hint: Microsoft could assist Wikimedia to ensure Excel compatibility for their data feeds and downloads. How about it, guys?

Currently there’s a lot of publicly available data out there – statistics, lists, etc. – but it is spread all over the place in different formats. Wikidata is an opportunity to bring lots of information into one place with common formatting.

You’d be able to grab information and then massage it yourself with filtered lists, PivotTables and other clever stuff in Excel. If the data is structured well you’ll be able to grab different lists and cross-reference them – for example, get a list of countries with population and another of income distribution, then compare the two.
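
For anyone who would rather script it than use Excel, here is a rough Python sketch of that kind of cross-referencing. It is only an illustration: the two lists, the column names (country, population, median_income) and every figure in them are made-up placeholders, not real Wikidata output, and the helper names read_list and cross_reference are our own. The idea is the same as a VLOOKUP in Excel: match rows from two lists on a shared key, here the country name.

```python
import csv
import io

# Hypothetical sample data standing in for two downloaded lists.
# The country names and figures are placeholders, not real statistics.
POPULATION_CSV = """country,population
Atlantis,1000
Borduria,2500
Freedonia,400
"""

INCOME_CSV = """country,median_income
Atlantis,30000
Freedonia,12000
Ruritania,18000
"""


def read_list(text, value_column):
    """Read a two-column CSV list into a dict keyed by country name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["country"]: float(row[value_column]) for row in reader}


def cross_reference(left, right):
    """Keep only countries present in both lists, pairing their values."""
    shared = sorted(left.keys() & right.keys())
    return [(name, left[name], right[name]) for name in shared]


population = read_list(POPULATION_CSV, "population")
income = read_list(INCOME_CSV, "median_income")
combined = cross_reference(population, income)

for name, people, money in combined:
    print(f"{name}: population {people:.0f}, median income {money:.0f}")
```

Only countries that appear in both lists survive the match, which is why consistent naming across lists matters so much: if one list says one thing and another spells the same country differently, the rows silently drop out.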

We can’t wait …