I've recently been playing around with a simple news aggregation project (partly inspired by the most excellent backstage from the beeb). I'm currently gathering news from about 700 feeds (finding feeds is interesting, but picking the right feeds is hard). Part of this is going to feed into work I am doing on geocoding, but for now I am simply gathering data and indexing it in Lucene. Here is a graph comparing the volume of posts from the BBC and from the Scotsman. Note that I am only taking the BBC's feed of most recent posts, but am taking all of the Scotsman's feeds, so it is not quite a comparison of like with like.
The granularity of the data is hourly, so the graph tells us that the BBC data is updated far more frequently than the Scotsman's. It may look like the Scotsman has more posts, but this is not actually the case: their data is updated less regularly, and because the graph uses lines to link consecutive points, the hours between updates are drawn as if they contained posts too.
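To make the point about hourly granularity concrete, here is a minimal sketch of how posts could be bucketed by hour, with empty hours recorded as explicit zeros so that a line graph drops back to the axis instead of bridging the gap. This is not the code behind the graph: the function name hourly_counts and the sample timestamps are made up purely for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def floor_to_hour(t):
    """Drop minutes, seconds and microseconds so posts group by hour."""
    return t.replace(minute=0, second=0, microsecond=0)


def hourly_counts(timestamps, start, end):
    """Return (hour, count) pairs for every hour in [start, end).

    Hours with no posts get an explicit 0, so a sparsely updated
    feed is not flattered by lines joining distant points.
    """
    counts = Counter(floor_to_hour(t) for t in timestamps)
    series = []
    hour = floor_to_hour(start)
    while hour < end:
        series.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


# Invented sample data: one feed posts a little every hour,
# the other posts in occasional batches.
bbc = [
    datetime(2005, 7, 10, 9, 5),
    datetime(2005, 7, 10, 10, 15),
    datetime(2005, 7, 10, 11, 40),
    datetime(2005, 7, 10, 12, 20),
]
scotsman = [
    datetime(2005, 7, 10, 9, 30),
    datetime(2005, 7, 10, 9, 45),
    datetime(2005, 7, 10, 12, 10),
]

start = datetime(2005, 7, 10, 9, 0)
end = datetime(2005, 7, 10, 13, 0)

bbc_series = hourly_counts(bbc, start, end)
scotsman_series = hourly_counts(scotsman, start, end)

for name, series in (("BBC", bbc_series), ("Scotsman", scotsman_series)):
    print(name, [count for _, count in series])
```

With the zeros filled in, the batchy feed shows a spike followed by empty hours rather than a line that stays high between updates, and summing each series gives the real totals for comparison.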



interesting.
Posted by: Ryan | July 11, 2005 at 04:46 AM