A couple of posts have provoked an interesting discussion: William Cohen points to the popularity-contest approach to ranking, which can have undesirable consequences, and The Measurement Standard offers an interesting angle on the fallibility of human judgment. In the area of sentiment analysis, one often hears skepticism about the quality of results (or even the possibility of automating the task at all). It is always informative to see how well humans do at these tasks: most reports of inter-labeler agreement for sentiment in the literature are pretty poor. My feeling is that while an automated approach can never be error free, the systematic nature of its errors leads to a more manageable result than the randomness of human error produced by poor methodology.
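Inter-labeler agreement of the kind reported in the literature is commonly quantified with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch (the two labelers' tags below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two labelers, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both labelers pick the same class at random.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two labelers tagging the same ten snippets (made-up data).
a = ["pos", "pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "neu", "pos", "pos", "neg", "neg"]
print(round(cohens_kappa(a, b), 3))  # → 0.516
```

Here the labelers agree on 7 of 10 items, but kappa discounts that to about 0.52 once chance agreement is removed — which is roughly the range often reported for sentiment labeling.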
As for the automated ranking of web pages: the problem cited above exposes the frailty of addressing a content problem (finding a document whose text is appropriate) with an orthogonal structural solution. The structural solution (counting links and propagating the results) may do well in domains where it serves as a proxy for 'authority', but the ambiguity in the structure cannot be resolved, leading to the kind of problem William cites. This is where solutions like Powerset come in.
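The "counting links and propagating the results" approach can be sketched as a PageRank-style power iteration; the point is that the ranking is driven entirely by the link graph, never by what the pages say. The four-page graph below is invented:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration over a toy link graph: {page: [pages it links to]}."""
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            targets = outs or pages  # dangling page: spread its rank everywhere
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank

# Invented graph: three pages all link to "c", so "c" ranks highest
# regardless of whether its text is appropriate for any given query.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # → c
```

If the pages linking to "c" do so ironically, critically, or maliciously, the structure cannot tell — which is exactly the ambiguity described above.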
Note: I'm really happy to see that William is writing - grab his feed!
I don't agree - I see random inter-labeler error as rebalancing bias in the classification. A systematic error retains that bias towards incorrect results. Systematic error is fine if we know the nature of the resultant bias, but randomness is better if we don't. I don't know enough maths to prove this - perhaps someone can?
Posted by: Nathaniel Faery | July 06, 2007 at 05:34 AM
Consider the case where a client wants some sentiment data over time. In the first period, labeler A reports a certain result. In the second period, labeler B reports a result. Due to the difference in labeling, there is no real continuity between these periods, and any observation that sentiment is trending up or down is not really reliable.
Posted by: Matthew Hurst | July 06, 2007 at 09:08 AM
This reminds me of another story. I was playing with a French version of SO-PMI (Turney and Littman, 2003), which rates a French word as positive (praising) or negative (criticizing) by measuring its co-occurrence with seven positive French paradigm words and seven negative French paradigm words. I discovered that the word "fort" (the masculine form of "strong") is highly positive, whereas the word "forte" (the feminine form of "strong") is highly negative. Could be just random noise, I suppose, but it seemed revealing to me.
Turney, P.D., and Littman, M.L. (2003), Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems (TOIS), 21 (4), 315-346.
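The SO-PMI measure described above can be sketched over a toy corpus. Turney and Littman estimated the counts from search-engine hits over seven positive and seven negative paradigm words; the three-sentence corpus and counting scheme below are invented purely to exercise the formula:

```python
import math

# The English paradigm words from Turney & Littman (2003); the French
# experiment used translated equivalents.
POSITIVE = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NEGATIVE = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def hits(words, sentences, near=None):
    """Count sentences containing any of `words` (and `near`, if given)."""
    return sum(1 for s in sentences
               if any(w in s for w in words) and (near is None or near in s))

def so_pmi(word, sentences, eps=0.01):
    """Semantic orientation: positive score means the word keeps better
    company with praising words than with criticizing ones."""
    near_pos = hits(POSITIVE, sentences, near=word) + eps
    near_neg = hits(NEGATIVE, sentences, near=word) + eps
    base_pos = hits(POSITIVE, sentences) + eps
    base_neg = hits(NEGATIVE, sentences) + eps
    return math.log2((near_pos * base_neg) / (near_neg * base_pos))

# Invented corpus where "strong" co-occurs only with positive paradigm words.
corpus = [{"the", "food", "was", "excellent", "and", "strong"},
          {"a", "strong", "and", "good", "effort"},
          {"weak", "and", "poor", "results"}]
print(so_pmi("strong", corpus) > 0)  # → True
```

The fort/forte asymmetry suggests the corpus itself carried a gendered skew in the company each form keeps — the kind of systematic, inspectable error discussed in the post.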
Posted by: Peter Turney | July 06, 2007 at 11:25 AM
Thanks for the link, Matt - but it's not just a content problem; it's also a problem of understanding the user's information needs and deciding what is appropriate to present. Even if I perfectly understand the content of every web page, it's still not obvious what the right answer to the query should be. For the query "jew", do you present the most authoritative source on Judaism as judged by the world as a whole, or the most authoritative source as judged by the world of anti-semites?
BTW this isn't a unique case. Try Googling "Hellary Clinton" and note the polarity of the top-ranked documents.
Posted by: William Cohen | July 06, 2007 at 12:53 PM
I think it raises an interesting question. For instance, I work at Fair Isaac, and we get lots of criticism of credit scoring. Yet credit scores replaced human judgment and meant that good credit was determined by your behavior, not by how well you knew your bank manager or what color your skin was. People say they want humans, not machines, to make judgments, but I think that's more an emotional reaction than anything. Automated decisions can offer a lot of consistency and precision, but we should not underestimate the need to get people to understand and support that value, since their gut reaction is to dislike "machine-made" decisions.
JT
Author, with Neil Raden, of "Smart (Enough) Systems", a book on this topic.
http://www.smartenoughsystems.com
Posted by: FICO | July 06, 2007 at 04:44 PM