A while back, I wrote about how dangerous trend mining over blogs could be in the wrong hands. Stephen Baker writes something similar in a recent post, but he gets the message completely wrong. Stating that blog analytics are weak is not helping anyone. All the post really says is that keywords offer no insight into, or protection against, the inherent ambiguity of words. Consumer-facing search engines treat documents as objective, surface-form data: sequences of characters forming words and phrases. Ambiguity arises when the intent to express two different meanings results in the same sequence of characters being used. Or, to put it the other way around, one word has two meanings.
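To make the point concrete, here is a toy illustration (my own example, not taken from Baker's post or any real product): the surface form "jaguar" matches documents about both the animal and the car, so purely keyword-based matching conflates two distinct senses.

```python
docs = [
    "the jaguar stalked its prey through the rainforest",
    "the new jaguar has a supercharged v8 engine",
    "jaguar populations are declining in south america",
]

def keyword_match(docs, term):
    """Surface-form matching: return every document containing the term,
    with no awareness of which sense of the word is intended."""
    return [d for d in docs if term in d.split()]

hits = keyword_match(docs, "jaguar")
# All three documents match, even though they span two unrelated meanings.
```

The keyword layer sees identical character sequences; only context (rainforest vs. engine) separates the senses, and that is exactly what it throws away.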
This is not a new problem, and any competent text mining outfit will take care of this issue. That is why any company providing non-trivial analytics over blog data - or any other data, for that matter - has already solved it. The reason it is not yet visible in the consumer-facing search space is often attributed to the lazy user problem. However, I believe it has more to do with the lazy developer problem - not looking for the right interface. Vivisimo is currently leading the way here, with explicit representations of ambiguity in the form of clustering analytics layered on top of search results. Google's take on the problem is to make the underlying search engine so powerful that you won't need to worry about ambiguity at all. This is fine if you are prepared to await the arrival of Deep Thought...
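The clustering idea can be sketched in a few lines. This is a deliberately minimal greedy clusterer of my own devising (Vivisimo's actual algorithm is proprietary and certainly more sophisticated): group search results by the overlap of their context words, so the different senses of an ambiguous query term surface as separate clusters.

```python
def jaccard(a, b):
    """Jaccard similarity between two bags of context words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cluster(results, threshold=0.2):
    """Greedy single-pass clustering: attach each result to the first
    cluster whose seed it resembles, else start a new cluster."""
    clusters = []
    for words in results:
        for c in clusters:
            if jaccard(words, c[0]) >= threshold:
                c.append(words)
                break
        else:
            clusters.append([words])
    return clusters

# Hypothetical result snippets for the query "jaguar", tokenized:
results = [
    "jaguar rainforest predator cat prey".split(),
    "jaguar car engine v8 luxury".split(),
    "jaguar cat rainforest habitat prey".split(),
    "jaguar engine car model luxury".split(),
]
groups = cluster(results)
# The four results fall into two clusters: animal sense and car sense.
```

The point is not the algorithm but the interface: rather than ranking all four results in one undifferentiated list, the engine can present the two clusters and let the user pick the sense they meant.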