I've recently read two interesting posts on the process and principles of data journalism. One, by Joel Gunter, captures Matthew Ericson's workshop on data journalism at the New York Times. The other, by Simon Rogers of the Guardian, reviews the processes the paper uses to collect and prepare data for data-driven stories.
Neither of these posts describes any process for assessing or questioning the quality of the data used, or its source. I don't mean to suggest that these institutions aren't concerned with the quality of the data they report - Simon replied to my comment on this point on his blog post. But just as we expect accountability regarding the sourcing of information and redundancy of sources in traditional journalism, we should expect the same sensibilities about data from data journalists.


I'm curious about your comment to Simon that
> The open data movement has, to some degree, spread the assumption that government data is correct.
What's that opinion based on?
Posted by: Markschaver | April 17, 2011 at 10:36 AM
Mark - firstly, yes, this is an opinion. It is based on observations from reading posts by data journalists (e.g. the two cited here), and the data presentations that I see online in places like the NYT, the Guardian, etc. When I see an article or blog post that visualizes some data, cites a government source and draws a conclusion - with no discussion of the data itself - it makes me nervous.
In addition, data engines like DataMarket, Timetric, etc. lack tools that would prompt users of that data to even think about its quality. Where are the margins of error? Where are the tools to discover other data sets that describe the same variables but perhaps with different values?
Eidosearch might be one of the first visible online systems that allows us to ask some of these questions.
Posted by: Matthew Hurst | April 17, 2011 at 10:47 AM