December 27, 2007

Debugging BlogPulse

I love to use BlogPulse. I always get a kick out of seeing trends like this:

Blogpulsedebug1_2

Or this:


Blogpulsedebug2

These graphs often have a straightforward story behind them, allowing for a reasonable comparison between mentions of different words. Note, of course, that the graphs show the percentage of blog posts that contain a term, giving a normalized view.
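To make the normalization concrete, here is a minimal sketch of the calculation. The function name and the daily counts are made up for illustration; this is not BlogPulse's actual code or data.

```python
def percent_of_posts(term_counts, total_counts):
    """Return, per day, the percentage of all indexed posts that mention a term."""
    return {
        day: 100.0 * term_counts.get(day, 0) / total
        for day, total in total_counts.items()
        if total > 0  # skip days with no indexed posts
    }


# Hypothetical counts: the index doubles in size overnight,
# and so does the number of posts mentioning the term.
total_posts = {"2007-12-01": 200_000, "2007-12-02": 400_000}
guitar_posts = {"2007-12-01": 1_000, "2007-12-02": 2_000}

trend = percent_of_posts(guitar_posts, total_posts)
print(trend)
```

The raw count doubles, but the normalized line stays flat, which is why percentages are the fairer basis for comparing terms.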

But what could explain something like this:

Blogpulsedebug3

Here we see a term 'movie' which appears to have some sort of seasonal trend, dipping in autumn and rising again in the winter. However, the term 'guitar' appears to have a very odd shape, with a dramatic and sharp increase in the winter. Looking at this term on its own, we see:

Blogpulsedebug4

There is a reason for this. If we look at links to weblogs published on MySpace, we see a matching pattern.

Blogpulsedebug5

The reason that there are changes in the number of blog posts which link to MySpace blogs is that BlogPulse (Nielsen Online) is adjusting its crawling strategy over time (the above suggests that there was an increase in July, a decrease in September and another increase in October).
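A rough way to see how a change in the crawl can bend a trend line: the overall percentage is a weighted average of the rate within each source. The sketch below assumes, purely for illustration, that a term is mentioned more often on one source than elsewhere. All of the rates and shares are invented numbers, not BlogPulse figures.

```python
def overall_percent(source_share, rate_in_source, rate_elsewhere):
    """Percentage of all crawled posts mentioning a term, given the crawl mix.

    source_share   -- fraction of crawled posts that come from the source (0..1)
    rate_in_source -- percentage of that source's posts mentioning the term
    rate_elsewhere -- percentage of all other posts mentioning the term
    """
    return source_share * rate_in_source + (1.0 - source_share) * rate_elsewhere


# Hypothetical: nobody's blogging habits change, only the crawl mix does.
before = overall_percent(0.02, rate_in_source=5.0, rate_elsewhere=0.5)
after = overall_percent(0.20, rate_in_source=5.0, rate_elsewhere=0.5)
print(f"before: {before:.2f}%  after: {after:.2f}%")
```

Under these assumptions the term more than doubles on the graph without a single blogger writing about it any more often than before.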

So, while I continue to believe that BlogPulse, and the trending tool in particular, are very useful, one has to be careful (and informed) regarding the base data that these analytics are built on.

 

July 10, 2007

Live Earth Reactions

Some trends associated with the recent Live Earth concerts. Interesting to note the relative association between Live Earth and the bands, brands and celebrities that helped promote the event.

Bands

Bandsliveearth

The graph below suggests that Philips, via their promotion of efficient light bulbs, got the biggest jump. Note that the absolute numbers are pretty low, so we can't read too much into this.

Brands

Brandsliveearth

Below we see that bands, obviously, get far more attention than brands.

Bandsbrands

Celebsliveearth

Finally, the main man:

Algore      

April 09, 2007

BlogPulse Founder Moves to Google

Some exciting news: Natalie Glance, whom I worked with at WhizBang!Labs, Intelliseek and BuzzMetrics, has taken a new position at Google's Pittsburgh location. Natalie was the driving force behind BlogPulse - a portal to the blogosphere. BlogPulse has been responsible for many innovations (some of which have become mainstream features in a number of other sites) and provided some analytical tools which illustrated the power of social media analytics.

BlogPulse was definitely a labour of love for all those involved. BlogPulse ultimately provided significant value to both our employer and to the blog community in general and it is a testament to Natalie's vision that this skunk works project has continued since its inception over 3 years ago.

Google is lucky to have Natalie on board, and I wish her all the best in her new position.

December 09, 2006

The News Width

One of the basic properties of old world programmed media was the fixed time slot. This was a particularly defining feature of the news. There had to be exactly 30 minutes (or whatever) of news to fill the slot. If that day was slow, the news was padded with light stuff; if that day was hot, the light stuff was dropped, as were other moderately important features.

You would think that this approach to media would be one of the first things to be dropped online. We may, perhaps, be forgiving of the online presence of mainstream media - there is fixed real estate on their front page. However, having become aware of the issue, I'm less forgiving of Web 2.0 aggregators including memeorandum/techmeme, TailRank and even our own BlogPulse. Take techmeme for example. Stories are ranked, in part (I'm guessing), according to how many citations they get and from which bloggers. However, on one day a story with citations from A and B may appear on the front page whereas on another it may not - depending not on how important that story is absolutely, but on how important it is relative to all other stories.

I say that we should be looking for interfaces to information that reflect how important that information is and which don't persist artifacts of the very media that we are (apparently) trying to escape.
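To illustrate the difference, here is a small sketch contrasting a fixed-width front page (top N, whatever the day looks like) with one whose width follows an absolute importance threshold. The scores, the threshold and the function names are all hypothetical; I have no knowledge of how techmeme or TailRank actually rank stories.

```python
def ranked(stories):
    """Sort (title, score) pairs from most to least important."""
    return sorted(stories, key=lambda story: story[1], reverse=True)


def fixed_width_front_page(stories, slots):
    """Old-media style: always fill exactly `slots` positions if possible."""
    return [title for title, _ in ranked(stories)[:slots]]


def variable_width_front_page(stories, min_score):
    """Show every story whose absolute score clears the bar, however many."""
    return [title for title, score in ranked(stories) if score >= min_score]


# Hypothetical citation-based scores for a slow day and a busy day.
slow_day = [("A", 3), ("B", 2), ("C", 1)]
busy_day = [("D", 40), ("E", 30), ("F", 25), ("G", 3)]

print(fixed_width_front_page(slow_day, slots=3))
print(fixed_width_front_page(busy_day, slots=3))
print(variable_width_front_page(slow_day, min_score=3))
print(variable_width_front_page(busy_day, min_score=3))
```

With three fixed slots, story A (score 3) makes the front page on the slow day while story G (also score 3) is dropped on the busy day. With the threshold, both are treated the same and the page simply grows or shrinks with the news.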

November 12, 2006

A Well Stocked Bookshop

Blogpulse

May 21, 2006

An Inconvenient Truth

Technorati's tie-in with Paramount to promote Al Gore's 'An Inconvenient Truth' appears now to be active. The news and blogs page contains snippets from blog posts about the movie. Interestingly, the posts appear to be ranked by something other than time (some model of importance/influence). Drilling down, it looks as though all of the posts mention the name of the film, some link to the YouTube version of the trailer, and others link to the home page of the documentary. As for the issue of dealing with appropriate content, one of the blogs currently linked to from Paramount's site for the movie goes by the title 'Americunt'. I certainly don't have any prudish issues with that sort of thing, but I wonder if the name is palatable for others. Here is a sample from the live page:

Climatechange

BlogPulse provides some additional perspective.

Algore1

Algore2_1


Note that the posts that didn't mention "documentary" or "an inconvenient truth" could mention other related terms like "global warming". What we would hope to see is the effect of this documentary on the conversation about global warming. Currently, that is not yet visible:

Globalwarming

It is interesting to note, however, that in terms of percentage of authors, usenet is the place to be for this issue - not blogs. The graph below shows the percentage of posts in blogs, boards and usenet which mention the phrase "global warming".

Globalwarming2


Finally, PR Newswire has distributed the press release for the Technorati-Paramount relationship, but I've yet to see it show up on the Technorati site. On the reciprocal end, Technorati is currently serving ads for Lions Gate's Peaceful Warrior - expect to see Paramount advertising in this space in the near future. As I mentioned earlier, Technorati recently revised their home page to give more real estate to adverts - in anticipation of the Paramount deal going live.

March 06, 2006

BlogPulse Live: The Oscars

Here's the BlogPulse Live graph indicating the burst of blogging around tonight's Oscars (found in the MoviesTV category):

Oscarslive

March 05, 2006

I Work For a Giant

It's interesting to see an article on BlogPulse's top news stories for yesterday about Intelliseek/BuzzMetrics (and by implication, BlogPulse). It's also interesting to hear that BuzzMetrics is referred to as a 'giant' in this space:

To capture the chatter, Nielsen BuzzMetrics, a giant in the industry, uses software that collects hundreds of thousands of comments a day. The technology can scan for specific companies, products, brands, people -- anything searchable. It can slice data into a range of categories to quantify the number of times a subject was discussed online, the individuals who mentioned it and the communities where it appeared.

Hundreds of thousands is probably an underestimate. Also, 'anything searchable' undervalues our technology. The expectations of search are far weaker than what true text mining, NLP and categorization technology can do. Anyway, nice article.

February 25, 2006

The A-List Delay

The blogosphere has made plenty of noise around the idea that it scoops mainstream media. Personally, I don't believe this happens as often as some would have us believe, though it certainly does happen and often, as in the case of certain types of events like natural disasters, with clear impact and value.

I do believe, however, that there is a steadily increasing delay in ideas getting picked up and amplified by the A-list. Of course, this is the type of claim that needs far more than the single point of anecdotal evidence that I'm going to point to, but the hypothesis suggests that, as the blogosphere matures, how it operates, and the role and influence of the A-listers, is going to start mirroring much of mainstream media.

On Feb 24th, Steve Rubel posted about What's Up? - a news/geolocation visualization. I had posted about this on Feb 14th after reading about it on the most excellent Infosthetics that same day. Looking back further, using BlogPulse's Conversation Tracker (o how I love thee), we can see that Peter Conolly posted about it on January 27th when it was being digged. It turns out that there are a number of different URLs pointing to the page, and so the earliest post I can find is actually from Jeroen Leijen, who posted on Jan 4th. Looking at the Alexa stats for the author's site:

Whatsup

shows us the digg day (Jan 27th) and possibly a couple of earlier days (late December and early January).

Searching on digg shows us that the site was put there by MilkAndCookies - I'm guessing related to the site which appears to have posted the link on Feb 8th after digging it.

Now, I'm not 100% sure that Rubel was the first A-lister to blog this (Technorati doesn't yet have Rubel's post as far as I can tell, and the highest ranking blogger for this link when using Technorati's rank by authority is Infosthetics). However, if we follow the story, it shows that Rubel picked this up a couple of months after it was launched, and about a month after it was digged. This is not really a criticism of the system, more an observation and a heads up about how to use A-listers in your reading habits. What I would criticise is that when something like this does surface, the commentary is not really interesting or insightful. Rubel gives an 'isn't this cool' post and fails to link to or compare with other similar services. The whole notion of citizen journalism surely implies something more than passing links around - don't these people have something to say?
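For anyone who wants to check the arithmetic on the delay, here is a quick sketch using the dates mentioned above. The year 2006 is assumed from the date of this post, and the labels are just mine.

```python
from datetime import date

# Dates as given in the post; the year 2006 is an assumption.
events = [
    ("earliest post found (Jeroen Leijen)", date(2006, 1, 4)),
    ("digg day", date(2006, 1, 27)),
    ("my post", date(2006, 2, 14)),
    ("Steve Rubel's post", date(2006, 2, 24)),
]


def days_before_last(events):
    """Days between each event and the final event in the list."""
    last_day = events[-1][1]
    return {label: (last_day - day).days for label, day in events}


lags = days_before_last(events)
for label, lag in lags.items():
    print(f"{label}: {lag} days before Rubel's post")
```

Measured from the earliest blog post I could find, the gap is a little over seven weeks. The Alexa graph hints that the site was live somewhat earlier, which would stretch it towards the couple of months mentioned above.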

February 19, 2006

Where's the Snow?

On the 12th of February, New Yorkers received the biggest dump of snow the city had ever experienced. Looking at the blogging activity around this event we can see a clear peak reflecting that fact. In addition, there is another larger peak. Memory being what it is, I wondered where that other snowfall had hit. However, as you can see from the trend line for New York, when people blog about the weather, they don't seem to explicitly state where it is. People in New York don't say 'it's snowing in New York', they just say 'it's snowing'. Only a small percentage actually provide both the meteorological and geographic information in the text. This means that, yes, knowing where people are located is a key dimension in analysing online data - otherwise, when the aliens land, we won't know where the hell they are.

Actually, the fun way to view this is to consider how one picks up the smaller signal with the intersection of weather and location information automatically.

Snow_1
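As a toy version of that intersection, the sketch below splits weather posts into those that also name a place and those that don't. It uses naive substring matching on invented example posts, which is nothing like a real text mining pipeline, but it shows the shape of the problem.

```python
def mentions_any(text, terms):
    """True if the text contains any of the terms (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def split_weather_posts(posts, weather_terms, place_terms):
    """Separate weather posts into those without and with an explicit place."""
    weather_only, weather_and_place = [], []
    for post in posts:
        if not mentions_any(post, weather_terms):
            continue  # not about the weather at all
        if mentions_any(post, place_terms):
            weather_and_place.append(post)
        else:
            weather_only.append(post)
    return weather_only, weather_and_place


# Invented example posts.
posts = [
    "It's snowing!",
    "So much snow outside my window",
    "Record snow in New York today",
    "Went to a gig in New York",
]

weather_only, located = split_weather_posts(posts, ["snow"], ["new york"])
print(len(weather_only), "weather posts without a place,", len(located), "with one")
```

The posts in the first group are the ones where the location has to come from somewhere other than the text, such as the author's profile.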

This trend, showing discussion about earthquakes, suggests that these disasters are blogged about with more geographic information. However, on the one hand the story is international and on the other, the earthquake in question is historical and being discussed in the context of Hurricane Katrina.

Earthquake
