Taking the domains of all the posts in the 24-hour dataset and sorting them by posting frequency, the top 20 domains are as follows:
146920 msn.com
62179 blogspot.com
53041 search-now130.com
40225 findallarticles.com
38509 search-now140.com
28478 look4articles.com
23049 findactions.com
13375 find4news.com
4472 wefindforyou.com
3919 livejournal.com
2633 persianblog.com
2604 blogdrive.com
2232 blogfa.com
2001 cocolog-nifty.com
1627 typepad.com
1141 kotobabooks.com
989 canalblog.com
987 choose-champagne.com
665 myblogsite.com
639 blog.com
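For anyone wanting to reproduce a ranking like this, here is a minimal sketch in Python. It is not the code used to produce the list above. It assumes the posts are available as a plain list of URLs, and it approximates the domain by taking the last two labels of the hostname, which works for the .com hosts shown here but would be wrong for suffixes such as .co.uk. The names `registered_domain` and `top_domains` are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def registered_domain(url):
    """Approximate the domain as the last two labels of the hostname.

    Assumption: a two-label suffix is good enough (true for blogspot.com,
    msn.com, etc., but not for hosts under suffixes like .co.uk).
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def top_domains(urls, n=20):
    """Return the n most frequent domains as (domain, count) pairs."""
    counts = Counter(registered_domain(u) for u in urls)
    return counts.most_common(n)


def format_ranking(ranking):
    """Render the ranking in the 'count domain' layout used in the list."""
    return "\n".join(f"{count} {domain}" for domain, count in ranking)
```

Calling `format_ranking(top_domains(urls))` on the list of post URLs would give one "count domain" line per domain, most frequent first.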
It is interesting to see how a simple time series provides some pretty clear insights. Here is a graph showing the posting frequency for the total dataset as well as for msn.com, blogspot.com and findallarticles.com. Whereas msn.com (blue) and blogspot.com (green) follow the global trend, findallarticles.com (red) shows a very steady, flat trend. Not surprisingly, this is a spam blog. Note also that there are a couple of outage points for the spam blog, which are reflected in the global time series as dips in the data.
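One rough way to put a number on "flat" is the coefficient of variation (standard deviation divided by mean) of a domain's per-hour post counts. This is only an illustrative sketch, not something computed for the graph above: the function name and the idea of feeding it a list of hourly counts are assumptions, and the outage points mentioned above would inflate the value for an otherwise steady spam blog.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean.

    Values near 0 indicate a flat series; larger values indicate a
    series that rises and falls over the day. Returns 0.0 for an empty
    or all-zero series to avoid dividing by zero.
    """
    if not counts:
        return 0.0
    m = mean(counts)
    return pstdev(counts) / m if m else 0.0
```

A domain posting at a constant rate scores 0, while one that follows a daily rhythm scores noticeably higher, so comparing each domain's value against the value for the total series gives a crude flatness check.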
It is also remarkable that blogspot's posts are far less organized than msn's:
I'm not going to guess why that is just now, but it is interesting to say the least. In fact, this graph suggests another interesting graphic: the time series of posts with msn.com's removed, to see how much msn.com accounts for the overall trend.
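That msn-removed series could be sketched along these lines. The input format is an assumption: each ping is taken to be a `(timestamp, domain)` pair with a `datetime` timestamp, and `hourly_series` and `share_of_total` are hypothetical helper names, not part of any existing tool.

```python
from collections import Counter
from datetime import datetime


def hourly_series(pings, exclude=None):
    """Count pings per clock hour, optionally leaving out one domain.

    pings: iterable of (datetime, domain) pairs (assumed format).
    Returns a dict mapping the start of each hour to its count,
    in chronological order. Hours with no pings are omitted.
    """
    counts = Counter()
    for ts, domain in pings:
        if domain == exclude:
            continue
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))


def share_of_total(pings, domain):
    """Fraction of each hour's pings that come from the given domain."""
    pings = list(pings)  # allow two passes over a generator
    total = hourly_series(pings)
    rest = hourly_series(pings, exclude=domain)
    return {hour: 1 - rest.get(hour, 0) / n for hour, n in total.items()}
```

Plotting `hourly_series(pings)` next to `hourly_series(pings, exclude="msn.com")` would show how much of the global shape survives once msn.com is taken out.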
Well, as far as I know, most MSN pings reflect the actual posting time of a blog entry, photo, or list entry. Blogspot pings, on the other hand, are often manual pings coming from spammers, and bear no relation to the actual time of a post. More than 25% of all blogspot pings are for old posts (more than 30 minutes old). I don't think this is the software acting up, but rather that folks are randomly repinging old posts. This might or might not account for some of the oddity that you are seeing.
Posted by: Robert Stockton | August 04, 2005 at 11:31 PM
Based on your comment, it sounds like we could rate the quality of a host based on the coherence of the distribution of pings as well as the general trend. This hides the fact that spam on host-delivered pings (i.e. non-manual pings) wouldn't show up unless it didn't fit the general trend, but I believe it is worth thinking about. In addition, looking at the periodicity of pings might give a clue as to the robot/human nature of the blog post. Okay - better stop there before we say too much!
Posted by: Matthew Hurst | August 05, 2005 at 09:27 AM