Glenn Fannick of Factiva posts about the long tail of the blogosphere:
Factiva Insight is offering content for our clients to mine from nearly 2 million of the most-read blogs (among lots of other content). We often have the discussion about the value of continuing to add more blogs from the end of the long tail. One argument goes that adding blogs that are read by EVEN FEWER people can't be good business. (It costs money to process more content and there has to be a point of diminishing returns.) But at least one person, Chris Anderson, who has authored a blog and a new book called The Long Tail, and an article by the same name in Wired, would disagree. He argues that there's gold in that tail. (Listen to him on NPR's ATC.)
Models like Netflix and Amazon show that there is money to be made when you continue to add lots of esoteric content because there are buyers for seemingly every last title. (Something like 98% of Netflix's huge list of titles have been rented at least once.)
While I respect Glenn's appreciation of the long tail phenomenon, I think this post is somewhat confused. I find it convenient to divide the blogosphere into two major categories: the head, which contains blogs like BoingBoing, Slashdot and probably a few thousand others; and the rest. When someone in the head posts about a bad experience with Dell, Dell needs to read the signal and react. This is a matter of influence.
Mining the rest serves a different purpose. There are two important areas: early rumour detection and alerting; and aggregate mining. It is hugely valuable to gather all those facts about why people dislike Hummers, or like Pepsi or Coke or what have you. The more the merrier: there is no diminishing return, because the more data we have, the better the analytics. This is why we can produce charts like the following and make assertions based on them.
In his book, Chris Anderson discusses many examples of long tails. One is a kitchen gadget that is usually sold in only two colours; online, the company discovered demand for 50 colours. A long tail of colours. The blogosphere is not like that, and the analogy does not hold. There is no ranking of bloggers stretching off into the sunset, each representing a different metaphorical colour of blog, each with its own niche consumer (aka reader). The blogosphere is a network with participants, structure and complexity.
This confusion is also why I believe the iconic 'long tail' distribution is problematic; see my earlier post on the topic.