Nathan Gilliat (an excellent blogger) posts about BuzzLogic's new partnership with KDPaine, which will deliver sentiment scores to BuzzLogic's clients. There are a number of approaches to delivering sentiment analysis, including many automated ones and some manual ones. Customers remain skeptical of automated methods and are generally more comfortable with manual ones. Paine writes:
"Computers can do a lot of things well, but differentiating between positive and negative comments in consumer generated media isn’t one of them,” explained Katie Delahaye Paine, CEO of KDPaine & Partners. “The problem with consumer generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment."
This kind of statement is particularly unhelpful. Let's break it down.
Why Do Sentiment Analysis?
There are a number of reasons for doing sentiment analysis. Firstly, to track the ups and downs of aggregate attitudes to a brand or product. Secondly, to compare the public's attitudes (that is, of course, the blogging public's) to one brand or product against its attitudes to another. Thirdly, to pull out examples of particular types of positive or negative statements on some topic.
The Challenge of Automated Approaches
Sentiment can be characterized as a triple: <author, polarity, object>. In addition to figuring out the direction of the sentiment, sentiment analysis needs to associate the evaluative language with its target and the whole statement with its speaker. Many automated methods do a weak job of these association tasks, if they attempt them at all.
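To make that representation concrete, here is a minimal sketch in Python (the names and structure are my own invention, not any vendor's format):

```python
from dataclasses import dataclass

@dataclass
class Sentiment:
    """One sentiment statement: who said what about what."""
    author: str    # the speaker the statement is attributed to
    polarity: str  # e.g. "positive" or "negative"
    obj: str       # the target of the evaluative language

# "I love the new phone, but the battery is awful." -- a single post
# yielding two triples, each evaluative phrase tied to its own target:
triples = [
    Sentiment(author="blogger_42", polarity="positive", obj="phone"),
    Sentiment(author="blogger_42", polarity="negative", obj="battery"),
]
```

The association problem is precisely the difficulty of filling the author and object slots correctly, not just the polarity one.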
Beyond these association tasks, there is the basic problem of dealing with sarcasm and the like, which, as Paine rightly states, is hard.
The Challenge of Manual Approaches
While customers are often more comfortable with manual approaches, this comfort is not always well founded. Manual approaches often have to sample (to me, one of the key propositions in the space is being able to listen to every statement that is made, not a tiny fraction). Sampling is itself hard to do well: at a minimum it relies on comprehensive data acquisition. In addition, you may well be surprised by the agreement rate between different human labelers. The literature reports agreement rates as low as 40% in some cases!
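To see why a figure like that matters, here is a tiny worked example (the labels are invented purely for illustration):

```python
# Two human labelers' polarity judgments on the same ten comments
# (hypothetical data, showing how low raw agreement can be).
labeler_a = ["pos", "neg", "pos", "neu", "neg", "pos", "neu", "neg", "pos", "neu"]
labeler_b = ["pos", "pos", "neu", "neg", "neg", "pos", "pos", "neu", "pos", "neg"]

matches = sum(a == b for a, b in zip(labeler_a, labeler_b))
print(f"raw agreement: {matches / len(labeler_a):.0%}")  # -> raw agreement: 40%
```

If two trained humans only agree 40% of the time, quoted accuracy figures for any method, manual or automated, have to be read against that ceiling.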
Application Details
If you are attempting to track the ebb and flow of sentiment, it is very likely that automated methods are fine, as aggregate analysis can often be robust to a certain amount of measured error. If you need to surface individual positive or negative comments, you want to be sure that they really are positive or negative; in that case, the confidence scores often available from machine learning approaches can be used to rank remarks (though this does introduce an unknown bias). It should also be noted that while there are many obvious problem areas, the distribution of these problems needs to be understood before a solution that does better or worse on individual examples can be evaluated.
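As a sketch of the ranking idea (invented data and threshold; the point is only that most learning-based classifiers expose some usable confidence score):

```python
# Hypothetical classifier output: (comment, polarity, confidence).
scored = [
    ("Best purchase I've made all year.",  "positive", 0.97),
    ("I guess it works... sort of.",       "positive", 0.55),
    ("Utterly disappointed with support.", "negative", 0.93),
    ("Meh.",                               "negative", 0.51),
]

# Surface only remarks the model is confident about, best first.
# This trades recall for precision -- and, as noted above, biases
# the surfaced sample toward comments with "easy" language.
confident = sorted(
    (s for s in scored if s[2] >= 0.9),
    key=lambda s: s[2],
    reverse=True,
)
for text, polarity, confidence in confident:
    print(f"{confidence:.2f}  {polarity:>8}  {text}")
```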
In summary - the challenges of the space are more complex than Paine's statement suggests, and the comment is likely more a marketing move in support of the comparative claims that all vendors in this space need to make to distinguish themselves from the competition. There is still research to be done here (and it is good to see companies like Cymfony hiring scientists in the field of Computational Linguistics) - the game certainly isn't over yet!
I have to agree wholeheartedly with what you have said about human classification versus automated classification of sentiment.
In this field the human vs. automated discussion comes up often. With the volumes of data involved, human classification can lead to a very expensive solution, and as you point out the accuracy of different labellers varies greatly.
Using our automated approach we hit about 80% accuracy, which will get better with time as we retrain our machine learning algorithms. The incorrect classifications we see are generally due to sarcasm or misspellings.
We always let the end user view the classification and the text too, so they can very quickly see the underlying data. This allows us to show a good high-level view of overall sentiment on their brand, while also allowing clients to see individual comments.
We don't feel that the accuracy of this automated approach is much different from a human-based approach, and the cost savings allow us to bring www.sentimentmetrics.com to market well below the competitors.
Thanks
Leon
Posted by: Leon | December 10, 2007 at 11:04 AM
Hi Matthew:
Great summary - and I couldn't agree more. Accurate, useful and meaningful sentiment scoring is hard - and complex.
Doing a good job of automating it requires a hybrid approach - man + machine.
More here:
http://humanvoice.wordpress.com/2007/12/10/sentiment-detection-mining/
TO'B
Posted by: Tom O'Brien | December 10, 2007 at 02:44 PM
If Microsoft doesn't trust computers to read for tone and sentiment, why should anyone else? They require Cymfony to use human coders for all their analysis. The answer is not automated vs. human, but a combination of the two.
Posted by: KD Paine | December 10, 2007 at 03:58 PM
I agree with the hybrid approach... or am I being sarcastic right now? :)
Posted by: the constant skeptic | December 10, 2007 at 08:19 PM
I would suggest that there is a distinction between expressed authorial sentiment (the manifest measure outlined here) and the latent measure of the sentiment that individual texts help to build among members of various social media communities. Any automated process would be limited to either the manifest conceptualization or a latent patterned measure (which would require well-tested operations), while a latent projective measure, used widely to approach constructs that rest in a community's or person's experience, is still best approached through raters trained to within an acceptable level of intercoder reliability.
I certainly agree that crude agreement of .4 is blatantly unacceptable (as all crude agreement is - I prefer Cohen's kappa), but the mistakes of the few, in that regard, cannot be considered a stain on the content analysis methodologies used throughout the social sciences. I tend to agree with Neuendorf when it comes to latent measures: when instructing coders to count chairs, we can rely on their existing operations of what a chair is, rather than risk measurements made invalid by coders following the letter, rather than the spirit, of a patterned law.
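For readers unfamiliar with the statistic, a quick sketch of the computation (toy labels, not real coder data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

coder_1 = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "pos"]
coder_2 = ["pos", "neg", "neg", "neg", "neu", "pos", "pos", "pos"]
print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")  # kappa = 0.58
```

Observed agreement here is 0.75, but kappa deflates it to 0.58 once chance agreement on the skewed marginals is accounted for; that is exactly why crude agreement flatters.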
The question of volume is a different matter entirely, but it can also be addressed through representative sampling, if general brand sentiment is the research question in mind, as long as findings are generalizable within an acceptable confidence interval.
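The usual back-of-the-envelope calculation makes the point (the standard normal-approximation formula for a proportion; it assumes a simple random sample from a well-defined frame, which for consumer generated media is of course the hard part):

```python
import math

def sample_size(margin_of_error, z=1.96, p=0.5):
    """Posts to sample so a sentiment proportion is estimated
    within +/- margin_of_error at ~95% confidence (z = 1.96)."""
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

print(sample_size(0.05))  # 385  -> +/-5% needs ~385 posts
print(sample_size(0.02))  # 2401 -> +/-2% needs ~2401 posts
```

Notably, these numbers do not grow with the size of the overall stream, which is what makes sampling attractive at volume; the validity of the frame you sample from is the real problem.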
A definite hat tip to everyone on the forefront of computer-aided text analysis. Its incorporation into content analysis is welcome, and I for one am looking forward to the release of Diction 6.0.
Posted by: Peter Kowalski | December 11, 2007 at 02:01 PM
A very interesting post (along with its comments, of course) on the long-debated sentiment analysis of media articles and blog posts. Actually, the debate did not wait for the advent of Web 2.0 to take place.
As has been mentioned above, human sentiment scoring of large data sets relies on the differing mental approaches of the individuals doing the analysis, hence different interpretations of similar content. This difficulty can, however, be partially overcome with the help of rigorous analysis grids - questionnaire-like guides established by the study manager for the case at hand, together with their clients for instance. With such grids giving guidelines on how to interpret an article filled with sarcasm, quotations, understatements and so on, the analyst is not left sitting alone in front of the articles awaiting analysis. Furthermore, media analysis (and monitoring) having become a true profession with dedicated experts, media analysts tend to incrementally fine-tune the way they interpret articles, based for instance on the language and the culture of the country where they are operating - as a corollary, foreign outsourcing of media analysis should be done with great caution.
As far as the Internet is concerned, I actually think sampling is rather a good approach to CGM analysis. In the incredibly vast sea of articles and posts on a given topic, only a few actually emerge to the surface, becoming visible to the human eye - and producing an impact on the human brain. The question is not whether sampling should be done, but rather how it should be done: how do you determine which posts are visible, or have authority if you will, on a given topic? At the end of the day, that's what brands will care about: what's being said that could have an impact on my target audiences. Of course, the surface of opinions rests upon the various depths of less visible opinions sitting right below, hence the interest in analysing those as well. That's where our hybrid approach, human + machine, really makes sense.
Posted by: anham | December 13, 2007 at 09:32 AM
I was looking into this a few months ago, but wasn't able to find good literature on it. Any recommendations for NLP/ML algorithms in this field?
Posted by: sdey | December 14, 2007 at 10:48 AM
Interesting post.
Shouldn't (author, polarity, object) be (author, time, polarity, object), as sentiment is not necessarily static? Perhaps it depends on the resolution of capture with respect to time; however, for individuals, sentiment can change drastically in a short space of time, and being able to map that could definitely be useful. Add to that correlation of timelines and it may indicate who is talking to/reading whom (not all bloggers link to their sources).
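Something like this rough sketch (field names invented):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Sentiment:
    author: str
    time: datetime   # when the statement was made
    polarity: str
    obj: str

mentions = [
    Sentiment("blogger_42", datetime(2007, 12, 1), "positive", "phone"),
    Sentiment("blogger_42", datetime(2007, 12, 20), "negative", "phone"),
]

# One author's drift on one object is then just a sort away:
history = sorted(
    (s for s in mentions if s.author == "blogger_42" and s.obj == "phone"),
    key=lambda s: s.time,
)
for s in history:
    print(s.time.date(), s.polarity)
```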
Thoughts?
Posted by: indo | January 06, 2008 at 07:33 AM
Could any readers send me info or links to rivals of, or alternative and better service providers than, Sentiment Metrics that offer a downloadable tool for our account management people to work with?
My email is [email protected]
Posted by: charlie salem | March 27, 2008 at 09:13 AM