A logical continuation of my comments on Sifry's State of the Blogosphere post is to propose which observations about the blogosphere would be acceptable to make. Firstly, we can consider the things that we would like to know:
- The number of new blogs created per day.
- The number of posts published per day.
- The number of blogs which have updated at least N times within the last K time periods of duration D. For example, the number of blogs that have published at least 2 posts per week for the last 10 weeks.
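To make the third metric concrete, here is a minimal sketch in Python. It assumes we already hold a list of post timestamps for each blog, and it reads the definition as "at least N posts in each of the last K periods", which matches the 2-posts-per-week example. The names `is_active` and `count_active_blogs`, and the `posts_by_blog` dictionary, are hypothetical and not taken from any real system.

```python
from datetime import datetime, timedelta

def is_active(post_times, now, n, k, period):
    """Return True if the blog has at least n posts in each of the last k periods.

    Assumption: the criterion applies to every period separately, so a single
    burst of posts in one week does not count as sustained activity.
    """
    counts = [0] * k
    for t in post_times:
        age = now - t
        # Only posts inside the window [now - k*period, now] are counted.
        if timedelta(0) <= age < k * period:
            counts[age // period] += 1  # index 0 is the most recent period
    return all(c >= n for c in counts)

def count_active_blogs(posts_by_blog, now, n, k, period):
    """Count the blogs in posts_by_blog (blog id -> list of datetimes) that qualify."""
    return sum(is_active(times, now, n, k, period)
               for times in posts_by_blog.values())

# Hypothetical data: one blog posting twice every week, one posting in a burst.
now = datetime(2006, 8, 9)
week = timedelta(weeks=1)
posts_by_blog = {
    "steady": [now - timedelta(days=7 * i + d) for i in range(10) for d in (1, 3)],
    "bursty": [now - timedelta(days=1)] * 20,
}
active = count_active_blogs(posts_by_blog, now, n=2, k=10, period=week)
```

With these inputs only the "steady" blog qualifies, even though "bursty" has as many posts in total. That is the point of a rate-based criterion over a simple cumulative count.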
Secondly, we can consider the observations that can be made:
- The number of new blogs discovered per day by some system (e.g. Technorati or BlogPulse).
- The number of posts harvested per day by some system.
- The number of blogs which meet the post rate criteria according to some up-to-date index.
There are two key points here. The first is the distinction between the true numbers that we could report if we had perfect visibility into the blogosphere and the observations that we can actually make by looking inside some index; this is the difference between the two lists above. The second is that the proposed metrics are transparent and useful, rather than accumulations of historical data (as I described in my previous post).
The interesting part - the science - is figuring out how to take observations and project these with some confidence to predict the desired measurements.
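As one illustration of what such a projection could look like (this is a standard technique from population estimation, not something proposed in this post or known to be used by any of the systems mentioned), the overlap between two independently built indexes can be used to estimate how many blogs neither of them has seen. The sketch below uses the Chapman form of the Lincoln-Petersen capture-recapture estimator. The function name and the sample data are hypothetical, and the estimate is only meaningful under the strong assumption that the two indexes discover blogs independently and that every blog is equally likely to be found.

```python
def chapman_estimate(index_a, index_b):
    """Estimate the total population from two overlapping samples.

    Chapman's variant of the Lincoln-Petersen estimator:
        N ~= (|A| + 1) * (|B| + 1) / (|A and B| + 1) - 1
    Assumption: both samples are drawn independently from the same population.
    """
    a, b = set(index_a), set(index_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1

# Hypothetical blog ids discovered on the same day by two different systems.
seen_by_a = {"b1", "b2", "b3", "b4", "b5", "b6"}
seen_by_b = {"b4", "b5", "b6", "b7", "b8", "b9"}
estimated_total = chapman_estimate(seen_by_a, seen_by_b)  # 11.25
```

Here the two systems saw 9 distinct blogs between them, and the estimate suggests about 11 exist. In practice the independence assumption is unlikely to hold, since popular blogs are more likely to be found by both systems, which inflates the overlap and biases the estimate downwards. Reporting that caveat alongside the number is part of projecting "with some confidence".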
Note the comments that Kevin Burton (TailRank) has made regarding Sifry's claims.
Don't forget the tricky bit: filtering out auto-generated blogs that just steal random text from other blogs to generate google-juice...
Posted by: Bug | August 09, 2006 at 04:56 PM