It's interesting to see an article on BlogPulse's top news stories for yesterday about Intelliseek/BuzzMetrics (and by implication, BlogPulse). It's also interesting to hear that BuzzMetrics is referred to as a 'giant' in this space:
To capture the chatter, Nielsen BuzzMetrics, a giant in the industry,
uses software that collects hundreds of thousands of comments a day.
The technology can scan for specific companies, products, brands,
people -- anything searchable. It can slice data into a range of
categories to quantify the number of times a subject was discussed
online, the individuals who mentioned it and the communities where it
Hundreds of thousands is probably an underestimate. Also, 'anything searchable' undervalues our technology: the expectations of search are far weaker than what true text mining, NLP and categorization technology can do. Anyway, nice article.
The blogosphere has made plenty of noise around the idea that it scoops mainstream media. Personally, I don't believe this happens as often as some would have us believe, though it certainly does happen, and often with clear impact and value, as in the case of certain types of events like natural disasters.
I do believe, however, that there is a steadily increasing delay in ideas getting picked up and amplified by the A-list. Of course, this is the type of claim that needs far more than the single point of anecdotal evidence that I'm going to point to, but the hypothesis is that, as the blogosphere matures, how it operates and the role and influence of the A-listers are going to start mirroring much of mainstream media.
On Feb 24th, Steve Rubel posted about What's Up? - a news/geolocation visualization. I had posted about this on Feb 14th after reading about it on the most excellent Infosthetics that same day. Looking back further, using BlogPulse's Conversation Tracker (o how I love thee), we can see that Peter Conolly posted about it on January 27th when it was being digged. It turns out that there are a number of different URLs pointing to the page, and so the earliest post I can find is actually from Jeroen Leijen, who posted on Jan 4th. Looking at the Alexa stats for the author's site:
shows us the digg day (Jan 27th) and possibly a couple of earlier days (late December and early January).
Searching on digg shows us that the site was put there by MilkAndCookies - I'm guessing related to the site which appears to have posted the link on Feb 8th after digging it.
Now, I'm not 100% sure that Rubel was the first A-lister to blog this (Technorati doesn't yet have Rubel's post as far as I can tell, and the highest ranking blogger for this link when using Technorati's rank by authority is Infosthetics). However, if we follow the story, it shows that Rubel picked this up a couple of months after it was launched, and about a month after it was digged. This is not really a criticism of the system, more an observation and a heads up about how to use A-listers in your reading habits. What I would criticise is that when something like this does surface, the commentary is not really interesting or insightful. Rubel gives an 'isn't this cool' post and fails to link to or compare with other similar services. The whole notion of citizen journalism surely implies something more than passing links around - don't these people have something to say?
On the 12th of February, New Yorkers received the biggest dump of snow the city had ever experienced. Looking at the blogging activity around this event we can see a clear peak reflecting that fact. In addition, there is another larger peak. Memory being what it is, I wondered where that other snowfall had hit. However, as you can see from the trend line for New York, when people blog about the weather, they don't seem to explicitly state where it is. People in New York don't say 'it's snowing in New York', they just say 'it's snowing'. Only a small percentage actually provide both the meteorological and geographic information in the text. This means that, yes, knowing where people are located is a key dimension in analysing online data - otherwise, when the aliens land, we won't know where the hell they are.
Actually, the fun way to view this is to consider how one picks up the smaller signal with the intersection of weather and location information automatically.
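One rough way to picture that intersection is to count, per day, the share of posts that mention the weather at all and the share that mention both the weather and a place. The sketch below is only an illustration and not how BlogPulse works: the keyword sets, the function names and the (date, text) input format are all assumptions, and plain substring matching stands in for real text mining.

```python
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
WEATHER_TERMS = {"snow", "blizzard"}
LOCATION_TERMS = {"new york", "nyc", "manhattan", "brooklyn"}


def mentions_any(text, terms):
    """Naive check: True if any term occurs as a substring of the text."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def daily_signals(posts):
    """Take an iterable of (day, text) pairs.

    Return {day: (weather_pct, weather_and_location_pct)}, where each
    value is a percentage of all posts seen for that day.
    """
    totals = Counter()
    weather = Counter()
    both = Counter()
    for day, text in posts:
        totals[day] += 1
        if mentions_any(text, WEATHER_TERMS):
            weather[day] += 1
            # The smaller signal: weather and location in the same post.
            if mentions_any(text, LOCATION_TERMS):
                both[day] += 1
    return {
        day: (100.0 * weather[day] / totals[day],
              100.0 * both[day] / totals[day])
        for day in totals
    }
```

Plotting the second number next to the first would show how much of the weather chatter carries its own geography; the rest would need location from somewhere else, such as the blogger's profile.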
This trend, showing discussion about earthquakes, suggests that these disasters are blogged about with more geographic information. However, on the one hand the story is international and on the other, the earthquake in question is historical and being discussed in the context of Hurricane Katrina.
The BlogPulse team believes very much in nurturing researchers in the area of blog analysis. This is why we made a data set of blog content available in conjunction with the upcoming Workshop on Weblogging Ecosystems. Recently, there have been a couple of projects launched that use either BlogPulse or Intelliseek data. One is BlogsLikeThis - a system that helps you look for blogs on certain topics. The second is a piece of art/data visualization called The Dumpster which is generating quite a bit of buzz. I find the conversation around the latter to be quite fascinating. Having observed the effort required to create the data that backs the project, the superficial 'this is cool' commentary that it receives is, quite probably, the online equivalent of the few seconds an art museum visitor spends on each piece as they walk, without pause, through the Louvre.
The Super Bowl, which finished just a few minutes ago, provides a great example of real time blog monitoring, as exemplified by BlogPulse's new BlogPulse Live feature. In the graph below, the line for Sports, which - like all the lines in the trend chart - shows the percentage of posts for the topic, has a clear rise after 10 - as the results were becoming clear.
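For anyone curious what 'percentage of posts for the topic' might look like as a calculation, here is a minimal sketch. It assumes each post already carries a timestamp and a single topic label, and it buckets posts by minute; the function name and the input format are hypothetical, and this is not a description of the actual BlogPulse Live implementation.

```python
from collections import Counter, defaultdict


def topic_share_by_minute(posts):
    """Take an iterable of (timestamp, topic) pairs, timestamp a datetime.

    Return {minute: {topic: percent}}, where percent is that topic's share
    of all posts whose timestamp falls within the minute.
    """
    counts = defaultdict(Counter)
    for timestamp, topic in posts:
        # Truncate to the start of the minute to form the time bucket.
        minute = timestamp.replace(second=0, microsecond=0)
        counts[minute][topic] += 1

    shares = {}
    for minute, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[minute] = {
            topic: 100.0 * n / total for topic, n in topic_counts.items()
        }
    return shares
```

Because each bucket is normalised by its own total, a rise in the Sports line means sports took a bigger slice of what was being posted in that minute, not simply that more posts arrived.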
A brief note on a very cool feature that has just gone live on the BlogPulse home page: BlogPulse Live. BlogPulse Live shows a time series of posting behaviour, updated every minute. Let me repeat that: updated every minute. This feature is based on technology created by Robert Stockton - one of our blogosphere wizards.