Geoffrey Wiseman, in commenting on Positive and Negative National Influence, points to Sucks/Rocks - a site which uses search counts from Yahoo! to measure sentiment for search terms. Sucks/Rocks gives a score from 0 to 10 by normalizing the counts. Opinmind is a system that provides a search interface to blogs and offers a similar measure. I thought it would be interesting to compare the two. The chart below shows the Sucks/Rocks scores on the X axis and the Opinmind scores on the Y axis (I've translated Opinmind from a percent to a 0-10 score). The points are for the countries mentioned in the original post.
Sucks/Rocks uses a simple approach to measurement - it looks for a small set of expressions:
When you enter a search term, sucks/rocks searches the web for several positive and negative phrases using that term. The score is the fraction of positive results to the sum of positive and negative results, normalized to 10.
The negative phrases are: X sucks, X is lame, X is crap, I hate X.
The positive phrases are: X rocks, X is sweet, X is awesome, I love X.
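The scoring rule quoted above is easy to write down. Here is a minimal sketch, assuming the hit counts for the positive and negative phrases have already been fetched from a search engine; the function name and the choice to return None when there are no hits at all are my own assumptions, not something Sucks/Rocks documents.

```python
from typing import Optional


def sucks_rocks_score(positive_hits: int, negative_hits: int) -> Optional[float]:
    """Score a term on a 0-10 scale from its positive and negative hit counts.

    The score is the fraction of positive hits among all hits, scaled to 10.
    Returns None when there are no hits (an assumption: the quoted
    description does not say what happens in that case).
    """
    total = positive_hits + negative_hits
    if total == 0:
        return None
    return 10.0 * positive_hits / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(sucks_rocks_score(300, 100))
```

With 300 positive and 100 negative hits, three quarters of the results are positive, so the score is 7.5.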
Opinmind probably uses something a little more sophisticated (certainly their set of positive and negative terms, and the way in which those terms are associated with the search term, are both more advanced).
It is interesting to note how different the results are.
Here are some examples of false positives from Opinmind:
- We recommend that the Regulations for Great Britain make clear that the prohibition on discrimination applies to the curriculum and thereby avoid the considerable uncertainty to which the Northern Ireland Regulations have given rise on this question. [positive]
- I'm with Scamp: this little 3 liner from the Miss Great Britain organisation must be one of the best press releases ever ( which is why I'll repeat it here )..... [positive]
- In Japan it is considered rude to talk with mindi in your mouth. [negative]
- they might go undefeated even tho that fuckin pitcher from Japan fuckin sucks. [negative]
These examples illustrate some basic problems with an index/search based approach. Firstly, term ambiguity: 'that pitcher from Japan' is not about the nation Japan, 'Miss Great Britain' is not the nation Great Britain. Secondly, association: 'In Japan it is considered rude to talk with mindi in your mouth' - the negative term 'rude' is being associated with the act of talking with your mouth full of mindi (whatever that is), not with Japan.
The association problem, in my opinion, requires at least grammatical analysis of the text and, in the big picture, probably discourse analysis.
However, when the goal is only to provide a measurement of sentiment, it is entirely possible that a reliable system could be built that accommodates this type of error through a deeper understanding of the distribution of expressions, term ambiguity and association ambiguity. That said, relying on knowledge of such biases is likely to lead to measurements that cannot be transferred from one domain to another, so better approaches to the core sentiment mining problem cannot really be avoided.
This is going to be an interesting year for sentiment mining with at least one new company likely to appear - more on that later!
In some ways, that's the strength of sucks-rocks; it doesn't attempt something sophisticated, so while it certainly leaves behind many positive and negative comments that it doesn't understand, those it does understand are mostly unambiguous.
(Although it's not hard to construct a phrase that would fool sucks-rocks: "I can't say that I love Germany" might well come off in Germany's favour).
Posted by: Geoffrey Wiseman | March 09, 2007 at 09:34 AM
Sucks/rocks' queries are laughably naive - so far, we've just been trying to keep it afloat in the face of the search engines' tiny query limits. One of the worst offenders is its namesake - searching for "x rocks" is very problematic, and much less reliable than searching for "i love x" or "x is awesome". This screws up searches for "apache", for example, because it brings up many results for an album named "apache rocks". The most sophisticated thing sucks/rocks does is blindly search for both the singular and plural forms. It will search for "x rocks" and "x rock", "x is awesome" and "x are awesome", etc.
We actually talked about analyzing the distribution of expressions a bit while we were writing it. For example, we could analyze many known-good search terms and determine that the ratio of "x rocks" to "i love x" should be about 1:8 (a completely fabricated example). When we see that the ratio for "apache" is actually 50:1 (which it is), we could throw out "x rocks" as a search term.
Unfortunately, doing this requires making *many* more queries. Right now, we OR all of our queries into one big positive query and one big negative one ("x rocks" OR "x rock" OR "x is awesome" OR ...). Since we're already butting up against our search limits, we can't afford to do every query separately to improve the quality of our results. If SOMEONE (*cough*Yahoo) would respond to our request for higher search limits, we might be able to do it. :)
Posted by: Gary Bernhardt | March 09, 2007 at 10:43 AM
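The ratio check Gary describes could look something like the sketch below. It is not Sucks/Rocks code: the function name, the choice of "i love x" as the reference phrase, the factor of 10 and all the counts are assumptions for illustration, and it needs a separate hit count per phrase, which is exactly the extra querying he says they cannot afford.

```python
from typing import Dict, List


def phrases_to_drop(
    counts: Dict[str, int],
    baseline_ratios: Dict[str, float],
    reference: str = "i love x",
    factor: float = 10.0,
) -> List[str]:
    """Return phrases whose hit count looks inflated for this search term.

    counts          -- hits per phrase for the term being scored
    baseline_ratios -- expected count(phrase) / count(reference), learned
                       from known-good terms (hypothetical values here)
    factor          -- how far above the baseline a ratio may go before
                       the phrase is thrown out (an arbitrary threshold)
    """
    reference_hits = counts.get(reference, 0)
    if reference_hits == 0:
        return []  # nothing to compare against
    dropped = []
    for phrase, expected in baseline_ratios.items():
        observed = counts.get(phrase, 0) / reference_hits
        if observed > factor * expected:
            dropped.append(phrase)
    return dropped


if __name__ == "__main__":
    # Invented counts that mimic the "apache" case: 50:1 against a 1:8 baseline.
    apache = {"x rocks": 5000, "i love x": 100, "x is awesome": 50}
    baseline = {"x rocks": 1 / 8, "x is awesome": 0.5}
    print(phrases_to_drop(apache, baseline))
```

With these invented numbers only "x rocks" is flagged, since its observed ratio of 50 is far above ten times the 1:8 baseline, while "x is awesome" sits right at its expected ratio.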
Great post, Matt! There is definitely more than what you see on Opinmind.com. Our advanced query interface is not currently available publicly. More on that later.
Thanks
- Charles
Posted by: Charles | March 09, 2007 at 02:34 PM
If I write that Sucks/Rocks sucks, does its value go down? (As in, it just did?) Even though I secretly like it.
Posted by: Jeremy Kandah | March 09, 2007 at 04:16 PM