Jure just pinged me about a new project, Memetracker.org, that he’s been working on with Lars Backstrom and Jon Kleinberg. The project analyses content for quotes and then displays them over time using an interactive stacked plot. In their own words:
MemeTracker builds maps of the daily news cycle by analyzing around 900,000 news stories per day from 1 million online sources, ranging from mass media to personal blogs.
We track the quotes and phrases that appear most frequently over time across this entire online news spectrum. This makes it possible to see how different stories compete for news coverage each day, and how certain stories persist while others fade quickly.
I’m assuming they mean that they analyse 900k articles of all kinds, drawn from 1 million sources including MSM and weblogs, rather than 900k news articles in the strict sense.
This is an interesting project, but I’m not a big fan of stacked plots. A peak in the drawn curve for one variable may be an artifact of the aggregate of the variables stacked beneath it, so while stacked plots are good at showing the overall trend, they are poor at showing the trend for an individual item.
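To make the stacked plot complaint concrete, here is a minimal sketch in Python. The counts are made up for illustration and have nothing to do with MemeTracker's real data, and `stacked_tops` is a hypothetical helper, not anything from their code. It computes the upper boundary of each layer the way a stacked plot would draw it.

```python
def stacked_tops(series):
    """Return the upper boundary of each layer in a stacked plot.

    series: list of equal-length lists of values, bottom layer first.
    """
    tops = []
    running = [0] * len(series[0])
    for values in series:
        # Each layer is drawn on top of the sum of the layers below it.
        running = [r + v for r, v in zip(running, values)]
        tops.append(running)
    return tops


# Hypothetical daily mention counts for two quotes (illustrative only).
quote_a = [5, 5, 30, 5, 5]        # bottom layer, spikes on the third day
quote_b = [10, 10, 10, 10, 10]    # top layer, completely flat

tops = stacked_tops([quote_a, quote_b])
top_of_b = tops[1]

# Thickness of the top band, which is the only honest view of quote_b.
thickness_b = [t - a for t, a in zip(top_of_b, tops[0])]

print(top_of_b)     # [15, 15, 40, 15, 15]
print(thickness_b)  # [10, 10, 10, 10, 10]
```

The drawn outline for the second quote jumps to 40 on the third day even though its own count never moves. The reader has to judge the thickness of the band rather than its height, which is hard to do by eye once several layers are stacked.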
See Google’s InQuotes for related stuff.