Nathan Gilliat (o excellent blogger) posts about BuzzLogic's new partnership with KDPaine, which will deliver sentiment scores to BuzzLogic's clients. There are a number of approaches to delivering sentiment analysis, including many automated approaches and some manual ones. Customers are still skeptical of automated methods and generally more comfortable with manual ones. Paine writes:
"Computers can do a lot of things well, but differentiating between positive and negative comments in consumer generated media isn’t one of them,” explained Katie Delahaye Paine, CEO of KDPaine & Partners. “The problem with consumer generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment."
This kind of statement is particularly unhelpful. Let's break it down.
Why Do Sentiment Analysis?
There are a number of reasons for doing sentiment analysis. Firstly, to track the ups and downs of aggregate attitudes to a brand or product. Secondly, to compare the attitudes of the public (that is, of course, the blogging public) between one brand or product and another. Thirdly, to pull out examples of particular types of positive or negative statements on some topic.
The Challenge of Automated Approaches
Sentiment can be characterized as a triple of <author, polarity, object>. In addition to figuring out the direction of the sentiment, sentiment analysis needs to associate the evaluative language with a target and the whole statement with a speaker. Many automated methods do a weak job of these tasks, if they attempt them at all.
In addition to these association tasks, the basic problem of dealing with sarcasm and so on, as Paine rightly states, is hard.
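To make the <author, polarity, object> triple concrete, here is a minimal sketch in Python. The class names, the field names and the example sentence are my own illustrative assumptions, not anything taken from a particular vendor's system.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1


@dataclass(frozen=True)
class Sentiment:
    """One <author, polarity, object> triple (hypothetical representation)."""
    author: str         # who holds the opinion
    polarity: Polarity  # direction of the opinion
    target: str         # the "object" the opinion is about


# A single sentence can carry several triples. For a post by "blogger_a" saying
# "The camera is great but the battery is awful", a document-level score would
# blur two opposite opinions about two different targets.
triples = [
    Sentiment("blogger_a", Polarity.POSITIVE, "camera"),
    Sentiment("blogger_a", Polarity.NEGATIVE, "battery"),
]


def polarity_by_target(items):
    """Group polarity values by the target they are attached to."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.target, []).append(item.polarity.value)
    return grouped


print(polarity_by_target(triples))
```

The point of the structure is that polarity alone is not the output: a system that cannot fill in the author and target fields has only done part of the job.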
The Challenge of Manual Approaches
While customers are often more comfortable with manual approaches, this comfort is not always well founded. Manual approaches often have to sample (to me, one of the key propositions in the space is being able to listen to every statement that is made, not a tiny fraction). Sampling well is hard: at a minimum, it relies on comprehensive data acquisition. In addition, you may well be surprised to see the agreement rate between different human labelers. The literature reports agreement rates as low as 40% in some cases!
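For readers who have not measured labeler agreement before, here is a rough sketch of how it might be computed for two annotators. The raw agreement rate is the kind of figure quoted above. Cohen's kappa is an extra I have added, a common chance-corrected alternative, and the labels are made up purely for illustration.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two labelers chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(labels_a)
    p_observed = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both labelers pick the same label independently.
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both labelers used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up labels from two hypothetical human annotators on ten posts.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "neg"]

print(observed_agreement(annotator_1, annotator_2))
print(cohens_kappa(annotator_1, annotator_2))
```

With only two labels in play, a fair amount of raw agreement arises by chance, which is why a chance-corrected number is worth reporting alongside the headline rate.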
Application Details
If you are attempting to track the ebb and flow of sentiment, automated methods are very likely fine, as aggregate analysis can often be robust to a certain amount of measurement error. If you need to surface individual positive or negative comments, you want to make sure that they really are positive or negative, in which case the confidence scores often available with machine learning approaches can be used to rank remarks (though this does introduce an unknown bias). It should also be noted that, while there are many obviously problematic areas, the distribution of these problems needs to be understood before evaluating a solution that does better or worse on individual examples.
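Both points can be sketched in a few lines of Python. The first function assumes, purely for illustration, that the classifier flips each label independently with the same probability (a symmetric-noise assumption that real systems will not satisfy exactly). The second ranks remarks by a confidence score. The function names, the tuple layout and the example data are all hypothetical.

```python
def expected_observed_share(true_share, error_rate):
    """Expected share of items labeled positive under symmetric label noise.

    Assumes each label is flipped independently with probability error_rate.
    """
    return true_share * (1 - error_rate) + (1 - true_share) * error_rate


def top_confident(scored, label, min_confidence, k):
    """Return up to k remarks with the given label, most confident first.

    scored is a list of (text, predicted_label, confidence) tuples.
    """
    hits = [row for row in scored if row[1] == label and row[2] >= min_confidence]
    return sorted(hits, key=lambda row: row[2], reverse=True)[:k]


# Aggregate tracking: a rising true trend still rises after 20% label noise,
# although the swing is compressed toward 50%.
true_weekly_share = [0.40, 0.50, 0.60]
observed = [expected_observed_share(p, 0.20) for p in true_weekly_share]
print(observed)

# Surfacing individual comments: keep only high-confidence positives.
scored_remarks = [
    ("Love this phone", "pos", 0.97),
    ("Yeah, great, it broke in a day", "pos", 0.55),
    ("Terrible support", "neg", 0.91),
    ("Best purchase this year", "pos", 0.88),
]
for text, _, confidence in top_confident(scored_remarks, "pos", 0.80, 2):
    print(confidence, text)
```

Note that the confidence cut-off quietly drops the sarcastic remark. That keeps the surfaced examples clean, but it also means they are no longer a representative sample, which is the bias mentioned above.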
In summary, the challenges of the space are more complex than Paine's statement suggests, and the comment is likely more of a marketing strategy, supporting the comparative statements that all vendors in this space need to make to distinguish themselves from the competition. There is still research to be done in this space (and it is good to see companies like Cymfony hiring scientists in the field of Computational Linguistics), and the game certainly isn't over yet!