TechCrunch describes an interesting interaction between a Technorati user and Technorati in which it appears that the blog search company has elected to archive data that is more than 6 months old. Most of the discussion around this appears to centre on the impact this might have on search; however, there are a couple of more significant aspects.
Firstly, historical data is important for computing statistics and features of authors. Of course, one can create a theory of influence which only requires 6 months of data, though this is not going to help you do longitudinal research.
Secondly, if Technorati really is in the business of deriving business and marketing intelligence from their huge weblog archive, they will certainly need access to this historical data. A big part of this analytical space is the ability to compare some current phenomenon (e.g. a new product launch) with others in the category - several years of data are required for this.
We have the full historical archive for Spinn3r users. We're working on making it ONLINE at any given moment.
I can certainly sympathize with the guys over at Technorati as this much data is HARD to keep always online.
It's a bit harder on Trati as they're a search engine, whereas we're not.
Kevin
Posted by: Kevin Burton | November 06, 2007 at 07:31 PM