TechCrunch describes an interesting exchange between a Technorati user and Technorati, in which it appears that the blog search company has elected to archive data that is more than 6 months old. Most of the discussion around this appears to centre on the impact that might have on search; however, there are a couple of more significant aspects.
Firstly, historical data is important for computing statistics and features of authors. Of course, one can construct a theory of influence that requires only 6 months of data, but that is not going to help you do longitudinal research.
Secondly, if Technorati really is in the business of deriving business and marketing intelligence from its huge weblog archive, it will certainly need access to this historical data. A big part of this analytical space is the ability to compare some current phenomenon (e.g. a new product launch) with earlier ones in the same category, and several years of data are required for this.