In the early 2000s, after the collapse of WhizBang!Labs, I joined Intelliseek as part of the group of researchers who formed the ARC. In a fit of hacking and inspiration, we created a simple blog tracking site called BlogPulse. We actually launched the site without informing our bosses, which led to some interesting dialogue.
Back in those days, blogs were thin on the ground (I remember first hearing of them from an early advocate, Fernando Pereira). Instead of crawling RSS feeds, we simply crawled the front page of each blog and did a diff against the previous crawl to extract the new material. We then took that content and ran it through some simple but (generally) elegant keyword and phrase extraction technology to determine, on a daily cadence, the hot topics that people were spending their attention on.
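For readers who want a feel for the front-page diff, here is a minimal sketch in Python. It is not the original BlogPulse code, which is not reproduced here. It assumes two text snapshots of the same front page taken on consecutive crawls, and it uses the standard library's `difflib.SequenceMatcher` to keep only the lines that are new or changed. The helper names `snapshot_lines` and `extract_new_material` are hypothetical.

```python
from difflib import SequenceMatcher


def snapshot_lines(page_text):
    """Split a page's text into stripped, non-empty lines (hypothetical helper)."""
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def extract_new_material(previous_lines, current_lines):
    """Return the lines of the current snapshot that were inserted or changed.

    Illustrative only: a line-level diff stands in for whatever diffing
    the real crawler performed.
    """
    matcher = SequenceMatcher(a=previous_lines, b=current_lines, autojunk=False)
    new_lines = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute text that was not there before.
        if tag in ("insert", "replace"):
            new_lines.extend(current_lines[j1:j2])
    return new_lines


# Made-up example: yesterday's and today's front page of one blog.
yesterday = snapshot_lines("Blog title\nOld post one\nOld post two\n")
today = snapshot_lines("Blog title\nBrand new post\nOld post one\nOld post two\n")
print(extract_new_material(yesterday, today))
```

Diffing at the line level is crude, since a template change or a rotating sidebar would show up as "new" content, but it conveys the idea of finding fresh entries without relying on a feed.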
The Internet Archive's first capture of the site is dated April 8th, 2003. Keyphrases from that day were as follows:
On that date, we crawled 31,926 blogs, which contained 12,954 new entries.
BlogPulse went on to innovate with what at the time was possibly the most popular time series presentation of search results (I don't believe we were the first, but we were copied, in at least one case with an exact replica of our HTML showing up on a competitor's site).
Time series of topics and keyword extraction and trending are both still with us, and going from strength to strength. Of course, the media has changed, though I find proclamations of the death of blogs to be generally exaggerated.
It is interesting to note that unrest in the Middle East is still a key focal point of all media. It is ironic that as the frequency at which we live our lives, and the frequency with which we produce, consume and mine social content, continue to increase, the big picture topics are constant: free speech, british troops, american troops, rubber bullets, ...