Marketing Vox posts about Umbria's commentary on the state of the splogosphere:
Though some 80,000 blogs are being created every day, 10-20 percent of them may be spam, according to Umbria Communications, which monitors consumer-generated media, reports AdWeek. Umbria's research has found that 2.7 million blogs out of 20.3 million are spam blogs, or splogs, many of which are created solely as a shady marketing tactic, using stolen web content, often via RSS feeds, to profit from contextual ad programs.
Umbria examined blog search results in October from Technorati, IceRocket and BlogPulse, finding that on average 44 of the top 100 results were splogs.
There are some subtleties here that serve to confound the reader. Umbria claims that 10-20% of all blogs are spam (2.7 million out of 20.3 million is roughly 13%). It then implies that 44% of the results in the listed search engines are spam, suggesting that the search engines are effectively amplifying the problem.
This is a pretty ugly way to go about negative marketing. An important piece of information is missing: Umbria doesn't disclose any details of the search terms used. Did they search for 'real estate' or for 'yoga for toddlers'? Were the searches representative of search user behaviour? I'm guessing not, as Umbria has no consumer-facing search application.
Umbria sells enterprise analysis applications over blog and other data, as do most of the blog search engines cited. The processes involved in delivering quality results to such clients are quite different from those involved in supporting a consumer-facing application like BlogPulse. In blog search engines, the primary dimension for ranking results is time, not relevance. Standard web search, where relevance is the primary ranking dimension, allows spam to simply fall to the bottom of the relevance ranking, never to be seen by the reader. Combining time and relevance is a challenging problem. Determining the relevance of a post globally in a large-scale, data-driven analytical engine is a quite different task, so customers of such systems are exposed to spam in an entirely different way from users of time-ranked blog search.
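To make the ranking point concrete, here is a minimal Python sketch of how the same corpus can show very different spam rates in its top results depending on whether it is sorted by time or by relevance. Everything in it is made up for illustration: the `Post` fields, the relevance scores, and in particular the assumption that the splogs happen to be the newest posts. It is not how Technorati, IceRocket, BlogPulse or Umbria actually rank anything.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    timestamp: int     # larger means newer (hypothetical units)
    relevance: float   # hypothetical relevance score in [0, 1]
    is_spam: bool


def top_k_spam_share(posts, key, k):
    """Fraction of spam among the first k posts when sorted descending by key."""
    top = sorted(posts, key=key, reverse=True)[:k]
    return sum(p.is_spam for p in top) / len(top)


# Toy corpus: 8 genuine posts plus 2 splogs, so 20% of the data is spam.
# Assumption: the splogs are the most recent posts and score low on relevance.
posts = [
    Post(f"genuine-{i}", timestamp=i, relevance=0.5 + 0.05 * i, is_spam=False)
    for i in range(1, 9)
]
posts += [
    Post("splog-a", timestamp=9, relevance=0.1, is_spam=True),
    Post("splog-b", timestamp=10, relevance=0.1, is_spam=True),
]

overall = sum(p.is_spam for p in posts) / len(posts)
by_time = top_k_spam_share(posts, key=lambda p: p.timestamp, k=4)
by_relevance = top_k_spam_share(posts, key=lambda p: p.relevance, k=4)

print(f"spam in corpus:              {overall:.0%}")
print(f"spam in top 4 by time:       {by_time:.0%}")
print(f"spam in top 4 by relevance:  {by_relevance:.0%}")
```

With this toy data the time-ranked top four contains both splogs, while the relevance-ranked top four contains none. The spam share of the first page says as much about the ranking dimension and the query as it does about the underlying data.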