Firstly, what is BuzzData? Functionally, it supports the following features:
creating identity: a user has a profile, along with all the normal social capabilities between objects in the BuzzData universe (following people, following data)
uploading data: as per other data markets, BuzzData permits the uploading of data files
associating objects with data: these can be visualizations (note that it doesn't provide its own visualization technology) or articles (discovered online, relating to the data set in question)
searching for data sets: the usual keyword interaction
This set of functionality supports an ecosystem intended to snowball value onto data sets. Users follow data sets, curate the data (e.g. I find a visualization of a data set and share it), comment on data sets, and so on. Like any ecological system, it has to pick one of two strategies to get going. Either you provide value to individual users independent of the ecosystem (this was exactly the clever part of delicious: it was useful to the user for bookmarking even without all the social effects of discovery and sharing), or you ensure there is no cold start issue (in the case of BuzzData, this would mean that the site was already rich with data sets).
Whatever the use cases, personas or other design intentions of the site are, I'm not sure that BuzzData has yet solved the initial conditions in either of the two ways described above. It doesn't have the data coverage of other sites like Timetric, ZanRan or d8taplex, nor does it provide data tools such as visualization, statistical analytics or manipulation. However, perhaps this points to the intended value proposition of the site - bringing social to data. It is the user base that provides both the data and the tools (or will if things turn out right). That being the case, the data priming challenge is perhaps where the company needs to focus.
Overall, I like the design principles and implementation of the site. True, there are some beta (and alpha) level bugs (I'm having trouble loading up a small data set right now), but that is not exceptional in the highly iterative web application world.
It is going to be very interesting to see how the site grows and evolves as a consequence. Is it a commercial version of IBM's Many Eyes? A twist on DataMarket or InfoChimps? A reimplementation of Swivel (the YouTube of data)?
I've been thinking a little about how to make d8taplex more accessible. One of the challenges is that users don't necessarily know what data - or what variables - are available in the million plus time series in the system. One idea to help surface this sort of information is to mine the tables for concepts in the labels given to the variables.
I've done a little of this and while there is a long way to go, I can see a couple of trends.
Firstly, country names and the names of other geographic and political areas are extremely common. Thus it would seem appealing to provide some sort of location based pivot to the data.
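To make the label mining idea concrete, here is a minimal sketch in Python. It is not the code behind d8taplex: the labels, the tiny place-name list and the function names are all made up for illustration, and a real run would read labels from the crawled tables and use a much larger gazetteer.

```python
import re
from collections import Counter

# Hypothetical variable labels; real ones would come from the crawled tables.
LABELS = [
    "GDP per capita, France",
    "Population, France",
    "Unemployment rate, Germany",
    "GDP per capita, Germany",
]

# A tiny illustrative gazetteer (assumption); a real one would be far larger.
PLACES = {"france", "germany"}


def tokenize(label):
    """Lower-case a label and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", label.lower())


def concept_counts(labels):
    """Count how many labels each token appears in (once per label)."""
    counts = Counter()
    for label in labels:
        counts.update(set(tokenize(label)))
    return counts


def place_counts(labels, places=PLACES):
    """Keep only the tokens that are known place names."""
    counts = concept_counts(labels)
    return {token: n for token, n in counts.items() if token in places}
```

Ranking the output of `concept_counts` by frequency gives a rough list of candidate concepts, and `place_counts` is the kind of signal that would feed a location based pivot.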
In the last roll out of features on d8taplex I included an experimental dynamic filter for data sets. To access it, you expand the graph, click in the filter text box and start typing. As you type, only the time series whose names partially match the regular expression you have entered remain in the graph.
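The matching behind a filter like this can be sketched in a few lines of Python. This is only an illustration of the idea, not the d8taplex implementation: the function name is hypothetical, and both the case-insensitive matching and the handling of half-typed patterns are my assumptions here.

```python
import re


def filter_series(names, pattern):
    """Return the series names that the pattern matches anywhere in the name.

    A half-typed pattern such as "(" is not a valid regular expression, so in
    that case every series is kept rather than raising an error (assumption).
    """
    try:
        regex = re.compile(pattern, re.IGNORECASE)  # case-insensitive: assumption
    except re.error:
        return list(names)
    return [name for name in names if regex.search(name)]
```

Using `search` rather than `match` is what gives the "match in part" behaviour: the pattern can hit anywhere in the name, not just at the start.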
I've finally rolled out some updates to d8taplex that I've been tinkering with this summer:
table title extraction: where possible the system now extracts the title of tables and displays it in the results page (it will be added to the data set pages shortly)
correlated data set visualization: as per a few posts on the topic on this blog, I've added a visualization of correlated time series to the data set page allowing you to spot variables that are highly correlated
improved relevance: the system now uses more of the textual context to help rank time series (though there is still lots of work to be done here)
speed improvements: I've made some improvements to the speed of serving search results
I've also introduced a breaking change that improves the id system for data sets, so some older blog posts that embedded d8taplex data will now show an error message.
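For readers wondering what sits behind the correlated time series item in the list above, here is a minimal sketch. I am assuming Pearson correlation and a fixed threshold purely for illustration; the function names and the 0.9 cut-off are hypothetical and this is not necessarily what the site computes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def highly_correlated(series, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `series` maps a variable name to its list of values; the threshold is an
    illustrative assumption.
    """
    names = sorted(series)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs
```

Comparing every pair is quadratic in the number of variables, which is fine within a single data set but would need more thought across a million plus time series.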
As I've mentioned in several posts about d8taplex, my belief is that there is sufficient data on the web that can be discovered, crawled and automatically interpreted by a system like d8taplex or Timetric. Making the automated access to this data more complex or impossible is against the spirit of open data.
Of course, I'm not 100% sure that the data is not openly crawlable on the Oregon site, but my initial inspection suggests that it isn't - I'd be very happy to be proved wrong on this!
It is 18 months until US citizens will have decided whether to keep their current president or roll in another. While hard for many to imagine, this means that chatter is starting now about who will be running for election, and we have already started to hear sound bites about why A is better than B and how party X did this and party Y did that. Naturally, bikini statistics will play a major part in the discussion. Or rather, they will be used as a tool to bamboozle the electorate.
It doesn't have to be that way.
With services like d8taplex, Timetric, BuzzData, Socrata and other data engines, there is a real opportunity to help people cut through the mumbo-jumbo and go directly to data assets to help make better informed decisions and, perhaps more importantly, to hold the circus accountable for honesty in the use and presentation of data.
A simple idea that I plan to further play with is to create data sets in d8taplex as well as some specialized visualizations to help people understand a number of key points:
The statistical history of their parties
Relative measures of different countries (what does a country with good health care look like?)
Straightforward presentations of scientific data (should we invest in ethanol?)
I rattled out an example of the first area tonight. The graph below shows the spend on national defense in billions of FY 2000 dollars. Overlaid on this data set are coloured areas that represent the party in power at any given time (red = Republican, blue = Democrat). The data is taken from www.census.gov and is currently available in d8taplex (though not in the form below and not as discoverable as it could be).
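The coloured areas boil down to collapsing a year by year record of the party in power into contiguous spans that a charting library can shade. A minimal sketch of that step, with a hypothetical function name and a small illustrative input keyed by the colours used above (this is not the census data, and it assumes "party in power" means the presidency):

```python
from itertools import groupby


def party_spans(party_by_year):
    """Collapse {year: party} into (start_year, end_year, party) runs.

    Assumes the years are consecutive; gaps in the input are not detected.
    """
    years = sorted(party_by_year)
    spans = []
    for party, run in groupby(years, key=party_by_year.get):
        run = list(run)
        spans.append((run[0], run[-1], party))
    return spans


# Illustrative input only: each year is labelled with the colour of the party
# holding the presidency after that year's inauguration.
EXAMPLE = {1999: "blue", 2000: "blue", 2001: "red", 2002: "red"}
```

Each span can then be drawn as a shaded band behind the spending line.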
I would love to see the other data engines help get out the data!