It is fascinating to see how many of these (almost all) are still in business and how rich their online experiences and product suites have become.
Now, there is another site to add to the list of data engines: Quandl. Quandl offers search over 8 million data sets. A search brings up a results page with a list of data sets, related topics and relevant sources. For example, a search for 'french unemployment' brings up the following:
From here, the user can drill down to a specific data set and get the usual interactions with time series graphs, downloading of data sets, etc. The graphing tool allows a number of modifications (e.g. raw data, % change, etc.).
There isn't much on the site about the history of the company, but the Wayback Machine tells me that the root URL was first archived in April 2012, and WHOIS tells me the domain was registered in 2012.
Firstly, what is BuzzData? Functionally, it supports the following features:
creating identity: a user has a profile, etc. with all the normal social capabilities between objects in the BuzzData universe (following people, following data)
uploading data: as per other data markets, BuzzData permits the uploading of data files
associating objects with data: these can be visualizations (note that it doesn't provide its own visualization technology) or articles (discovered online, relating to the data set in question)
searching for data sets: the usual keyword interaction
This set of functionality supports an ecosystem intended to snowball value onto data sets. Users follow data sets and curate the data (e.g. I find a visualization of a data set and share it); I can comment on data sets, and so on. As with any ecological system, one has to pick one of two strategies. Either you provide value to individual users independent of the ecosystem (this was exactly the clever part of delicious: it was useful to the user for bookmarking even without the social effects of discovery and sharing), or you ensure there is no cold start problem (in the case of BuzzData, this would mean that the site was already rich with data sets).
Whatever the use cases, personas or other design intentions of the site are, I'm not sure that BuzzData has yet solved the initial conditions in either of the two ways described above. It doesn't have the data coverage of other sites like timetric, ZanRan or d8taplex, nor does it provide data tools such as visualization, statistical analytics or manipulation. However, perhaps this points to the intended value proposition of the site - bringing social to data. It is the user base that provides both of these (or will if things turn out right). That being the case, the data priming challenge is perhaps where the company needs to focus.
Overall, I like the design principles and implementation of the site. True, there are some beta (and alpha) level bugs (I'm having trouble loading up a small data set right now), but that is not exceptional in the highly iterative web application world.
It is going to be very interesting to see how the site grows and evolves as a consequence. Is it a commercial version of IBM's Many Eyes? A twist on DataMarket or InfoChimps? A reimplementation of Swivel (the YouTube of data)?
I've been thinking a little about how to make d8taplex more accessible. One of the challenges is that users don't necessarily know what data - or what variables - are available in the million plus time series in the system. One idea to help surface this sort of information is to mine the tables for concepts in the labels given to the variables.
I've done a little of this and while there is a long way to go, I can see a couple of trends.
Firstly, country names and the names of other geographic and political areas are extremely common. Thus it would seem appealing to provide some sort of location based pivot to the data.
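As a rough illustration of the label mining described above, here is a minimal sketch. The place list and the labels are made up for the example, and `count_places` is a hypothetical helper, not how d8taplex actually does it.

```python
import re
from collections import Counter

# Hypothetical gazetteer; a real system would need a much fuller list.
PLACES = {"france", "germany", "japan", "oregon", "united states"}

def count_places(labels):
    """Count how many variable labels mention each known place name."""
    counts = Counter()
    for label in labels:
        text = label.lower()
        for place in PLACES:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(place) + r"\b", text):
                counts[place] += 1
    return counts

# Made-up labels, for illustration only.
labels = [
    "Unemployment rate, France",
    "GDP per capita (Germany)",
    "France: consumer price index",
    "Exports to Japan",
    "Total population",
]
place_counts = count_places(labels)
```

Places that come up often in `place_counts` would be natural candidates for a location based pivot.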
In the last roll out of features on d8taplex I included an experimental dynamic filter for data sets. To access it, expand the graph, click in the filter text box and start typing. As you type, only time series whose names are matched, at least in part, by the regular expression you have entered remain in the graph.
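The filtering idea can be sketched in a few lines. This is only an assumption about the behaviour (case-insensitive partial matching, and showing everything while the pattern is still invalid), not the code running on the site. `filter_series` and the series names are hypothetical.

```python
import re

def filter_series(names, pattern):
    """Return the series names that the pattern matches anywhere in the name."""
    try:
        regex = re.compile(pattern, re.IGNORECASE)
    except re.error:
        # A half-typed pattern such as "(" is not valid yet; show everything.
        return list(names)
    # search() matches anywhere in the name, so partial matches count.
    return [name for name in names if regex.search(name)]

# Made-up series names, for illustration only.
series = ["Unemployment, France", "Unemployment, Germany", "Inflation, France"]
visible = filter_series(series, "fran")
```

Re-running `filter_series` on each keystroke gives the "filter as you type" effect.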
I've finally rolled out some updates to d8taplex that I've been tinkering with this summer:
table title extraction: where possible the system now extracts the title of tables and displays it in the results page (it will be added to the data set pages shortly)
correlated data set visualization: as per a few posts on the topic on this blog, I've added a visualization of correlated time series to the data set page allowing you to spot variables that are highly correlated
improved relevance: the system now uses more of the textual context to help rank time series (though there is still lots of work to be done here)
speed improvements: I've made some improvements to the speed of serving search results
I've also introduced a breaking change that improves the id system for data sets, so some older blog posts that embedded d8taplex data will now show an error message.
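For the correlated data set visualization listed above, the underlying idea of spotting highly correlated variables can be sketched as follows. I'm assuming Pearson correlation and a 0.9 threshold here purely for illustration; the function names and the sample table are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; treat as 0
    return cov / sqrt(var_x * var_y)

def correlated_pairs(table, threshold=0.9):
    """List (name_a, name_b, r) for every pair of series with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(table.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs

# Made-up table of three short series, for illustration only.
table = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 1.0, 3.0, 2.0],
}
pairs = correlated_pairs(table)
```

Comparing every pair is quadratic in the number of series, which is fine within a single table but would need more thought across a whole collection.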
As I've mentioned in several posts about d8taplex, my belief is that there is sufficient data on the web that can be discovered, crawled and automatically interpreted by a system like d8taplex or Timetric. Making automated access to this data more complex, or impossible, is against the spirit of open data.
Of course, I'm not 100% sure that the data is not openly crawlable on the Oregon site, but my initial inspection suggests that it isn't - I'd be very happy to be proved wrong on this!