I'm trying to put together a post on the reaction to BlogPulse's recent launch, which includes a site makeover and the introduction of a number of powerful new features. However, in getting some comments together on the context in which the launch took place (IceRocket's pending launch of 'BlogScour', the visibility of two new services - BIRG and CustomScoop, discussion of Technorati's service, Umbria Communications' site makeover - no doubt anticipating something new coming from them), I got held up reading this post from InterAdvocacy (profile).
Chip Griffin (founding partner of CustomScoop) wrote of CustomScoop's trending demo:
Basically, it's a free service that lets visitors see how two search terms do in head-to-head competition in the online media (including newspapers, mags, blogs and more). Obviously, it serves as a teaser for the full CustomScoop service, but it also generates interesting results in its own right -- not unlike Intelliseek's Blogpulse Trends, except that we have a more balanced mix of coverage where they are exclusively blogs.
It is always encouraging to see how people are taking the notion of trend search (which BlogPulse popularized, though we can't claim to have invented the idea of the time series ;-) and applying it in different areas. However, the idea that you can take a set of different types of data and count results over them to produce some insightful and actionable intelligence has more than a few problems.
Firstly, absolute count versus normalized count. Showing a graph of absolute count can tell you little. You have to factor in the volume of posts on the day (or whatever the granularity is).
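To make the distinction concrete, here is a minimal sketch of normalizing a daily mention count against that day's total post volume. The function name and all of the numbers are hypothetical, chosen only for illustration; this is not a description of how BlogPulse or any other service actually computes its graphs.

```python
def normalized_trend(term_counts, total_counts):
    """Return, for each day, the share of that day's posts mentioning the term.

    term_counts  -- posts mentioning the term, one entry per day
    total_counts -- all posts indexed that day, one entry per day
    Days with no indexed posts get a share of 0.0 to avoid dividing by zero.
    """
    return [
        (mentions / total if total else 0.0)
        for mentions, total in zip(term_counts, total_counts)
    ]


# Hypothetical data: mentions stay flat while overall volume doubles on day 3.
mentions = [10, 10, 10]
totals = [1000, 1000, 2000]

shares = normalized_trend(mentions, totals)
for day, (absolute, share) in enumerate(zip(mentions, shares), start=1):
    print(f"day {day}: absolute={absolute}, normalized={share:.3%}")
```

The absolute line is flat across all three days, while the normalized line halves on the third day. Which of the two is the more useful picture depends on the question being asked, so the denominator needs to be known to the reader either way.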
Secondly, the type of document being counted has a huge implication as to the interpretation of volume from that content source. A single post in the New York Times is quite different from a single post in a LiveJournal blog.
Trend graphs have great potential, but they have to be handled with care. In addition, the type of data being counted and the way in which you mix the results have to be very transparent; otherwise the results cannot be used. Yes, it is true that CustomScoop is getting data from a number of different sources, but there is considerable value in knowing that BlogPulse's graphs are exclusively over blogs.
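One simple way to keep a mixed count transparent is to report the per-type breakdown alongside the total. The sketch below assumes each matching document carries a hypothetical `source_type` label; the field name and the sample documents are invented for illustration and are not taken from either service.

```python
from collections import Counter


def counts_by_source_type(documents):
    """Count matching documents separately for each source type."""
    return Counter(doc["source_type"] for doc in documents)


# Hypothetical matches for a single search term on a single day.
docs = [
    {"source_type": "newspaper"},
    {"source_type": "blog"},
    {"source_type": "blog"},
]

by_type = counts_by_source_type(docs)
total = sum(by_type.values())

print(f"total matches: {total}")
for source_type, count in sorted(by_type.items()):
    print(f"  {source_type}: {count}")
```

With the breakdown visible, a reader can see that a total of three is made up of one newspaper article and two blog posts, and interpret it accordingly.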
It will be interesting to see how free, self-service portals like BlogPulse and CustomScoop deal with the complex issues of multiple data types. These are already being handled by fee-based products (like Umbria's Buzz Report and Intelliseek's BrandPulse).
You make several excellent points here. Mixing data types can be useful to see an overall volume, but you are absolutely correct that more information can be gained by separating out content types.
In fact, our full product does precisely that (and more). But as I noted in the excerpt you used, the free tool is basic and also acts as a teaser for the subscription product.
One point I would differ on is your argument against absolute counts in favor of something normalized against total content for a day. Let's say that there were 10 blog posts about CustomScoop every day last week. We would all agree that represents a steady stream of "buzz." But one day the London bombings happen and far more blog posts occur that day. A normalized chart would show a drop in CustomScoop's activity on the blogs, even though the other content was completely unrelated.
In addition, by normalizing to volume, a service would be effectively representing that it covers the entire universe of blogs (or other media types). That isn't a claim that any service can make. And as they add more blogs to their indexes, the number of total posts increases, thus applying automatic downward pressure to the trend lines.
Obviously your research tells you different, so I'd be interested in more of your perspective on it since BlogPulse obviously went the normalized route...
Posted by: Chip Griffin | July 25, 2005 at 06:59 AM
Chip,
The normalization issue is not without its problems. To take your example a little further, let's imagine that there is a day when there are 11 posts on CustomScoop. What can you tell from that? If you didn't know that on that day, for some strange reason, 50,000 new blogs were created as opposed to, say, 20,000 on previous days, then you couldn't judge your market share. Another twist is that when something big happens (bombings, sports events, etc.) you tend to see a downward trend, as bloggers have a limited capacity (a simple model would be one topic per blogger per day).
As for the notion of covering all blogs - that is certainly the goal. There are broadly two approaches to analysing blogs and bloggers. The top-down approach says there are a number of bloggers out there who have particular influence, and to whom we ought to attend (this is BuzzMetrics' model). The bottom-up model says that we can mine information from the entire data set that will provide insights. Both models have pros and cons; however, if you have the latter, you can always use your data to provide the former, but not vice versa. Consequently, the goal of getting all blogs is a core strategic decision.
Perhaps the ideal in terms of services like BlogPulse and your CustomScoop demo would be to offer a choice (absolute or normalized). In which case, how would you normalize counts that are aggregated over many sources?
Posted by: Matthew Hurst | July 25, 2005 at 01:35 PM
I like your idea about letting the user choose. I see your point about judging market share. I guess since we come at it more from the PR/Public Affairs side, our clients tend not to look at media coverage as a "market share" thing like they would with sales.
My personal feeling is that market share in the media monitoring sense has a role to play ... but probably measured against a subset of blogs/media. To continue the example, if CustomScoop is mentioned in 6 out of 10 PR blogs, that's meaningful market share. But it may be that it's only 6 out of 10 million blogs, which would appear inconsequential, even though it may be best for the business to target certain blog "markets."
Which of course just goes back to your original observation that some sources matter more than others. We are working with some tools to account for this -- and the necessity for it to differ from client to client -- in a future release of our Enterprise Edition product.
Posted by: Chip Griffin | July 25, 2005 at 02:25 PM