Sentimetrix: Sentiment Analysis Company
I've been keeping an eye out for information on Sentimetrix for the past six months or so. Today I came across this PDF document from UMaryland which gives a little bit of concrete information about the company:
With a University of Maryland College Park technology spurring the idea, a group of local businessmen and scientists have launched a new company that measures a tough to quantify concept – worldwide opinion.
SentiMetrix, of Bethesda, Md., was launched in fall 2006 by university professor V.S. Subrahmanian, graduate student Diego Reforgiato, and two businessmen who spent time with online giant AOL LLC, Vadim Kagan and Michael Rozenman. The company is based on Subrahmanian’s Opinion Analysis SYStem, commonly known as OASYS. It was developed at the University of Maryland Institute for Advanced Computer Studies, and is based on a series of complex algorithms.
“Recent surveys show that the marketing research market, including opinion research and brand monitoring, is growing rapidly, as new technologies get applied to electronic media, both professionally and consumer generated media,” Rozenman said. “We have started SentiMetrix because we believe that the OASYS technology is the best response to what these markets need today: sentiment tracking in multi-lingual data, done in a timely, cost effective way.”
OASYS, which was a finalist for the OTC Invention of the Year Award, is capable of tracking the media on the Internet in many languages, measuring the intensity of sentiment expressed on a variety of subjects. For this reason, it is unique, the businessmen say: most programs detect just “polarity” on a subject (like/don’t like, for example), and most are not multi-lingual.
So far, the founders said they intend to build “an extensive data collection operation” for traditional and consumer-generated media, including mainstream news, blogs and message boards, starting first with English-language sources. They will then move on to other languages, starting with the most frequented Web sites.
A SentiMetrix customer will use OASYS through a controlled access Web site, with a search engine-like interface available to run queries. A visual representation and quantitative data is available, and a free Web site with limited options will be developed as a market tool, Rozenman said.
Thus far, OASYS has won Computerworld Magazine’s 2006 Horizon Award, which goes to the most innovative pre-commercial technology.
Opinion analysis program leads to new start-up company
The most interesting paragraph in this text to me is (my emphasis):
OASYS, which was a finalist for the OTC Invention of the Year Award, is capable of tracking the media on the Internet in many languages, measuring the intensity of sentiment expressed on a variety of subjects. For this reason, it is unique, the businessmen say: most programs detect just “polarity” on a subject (like/don’t like, for example), and most are not multi-lingual.
Much of the published work on sentiment/opinion fails to really define what sentiment or opinion is (taking the machine learner's path of least resistance: a data set, an algorithm and a result). As a customer, I'd first want to know what their precise definition of sentiment is and then how they measure intensity. In addition, there are many types of expressions which, while not opinions or sentiment, still convey important information about topics and products. For example, 'my Hummer broke down' isn't a subjective, opinionated statement but closer to an objective reporting of facts. It is still an important class of statement to capture as it reflects on the quality of the product.
One of the challenges of creating a single-valued metric for sentiment or opinion is that equally mixed (aggregated) opinion tends to score the same as neutral opinion. In the simple case, if you have one person expressing a strongly negative opinion and another expressing a strongly positive opinion, then the aggregate may be some number in the middle of the range (say, 0 on a -1 to +1 scale). However, let's say that for a topic there is no expression of opinion at all: that too would score a 0. From the description provided here of the system, it seems that OASYS may suffer from this problem.
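To make the problem concrete, here is a minimal sketch in Python. It assumes each mention of a topic has already been scored somewhere in the -1 to +1 range; the function name summarize and the sample scores are hypothetical, and none of this says anything about how OASYS actually computes its numbers. It simply shows that a plain average cannot tell polarized opinion from no opinion, while keeping the positive and negative mass (and the mention count) separate can.

```python
from statistics import mean


def summarize(scores):
    """Summarize per-mention sentiment scores, each assumed to lie in [-1, 1].

    Returns the plain average alongside the average positive and negative
    mass, so that mixed opinion and absent opinion can be told apart.
    """
    if not scores:
        # No mentions at all: every figure is zero, and count says why.
        return {"mean": 0.0, "positive": 0.0, "negative": 0.0, "count": 0}
    n = len(scores)
    positive = sum(s for s in scores if s > 0) / n   # average positive mass
    negative = sum(-s for s in scores if s < 0) / n  # average negative mass
    return {"mean": mean(scores), "positive": positive,
            "negative": negative, "count": n}


polarized = summarize([-1.0, 1.0])  # one strong critic, one strong fan
neutral = summarize([0.0, 0.0])     # two mentions, no opinion expressed
silent = summarize([])              # topic never mentioned

for label, result in [("polarized", polarized),
                      ("neutral", neutral),
                      ("silent", silent)]:
    print(label, result)
```

All three cases have a mean of 0, but only the polarized one has non-zero positive and negative components, and only the silent one has a count of 0.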
There are a number of publications available describing the research in OASYS.



You really want two scales, amount of positiveness and amount of negativeness. Critically, you need to know what they're about.
Most of the reviews we've looked at had elements of both, often in the same sentence, often qualified by product and feature. We found things like "loved the monitor but hated the keyboard on my new powerbook". This statement's about the powerbook, but the sentiment is about two different aspects of it, the monitor and keyboard. Overall, the review's about some powerbook (we don't know which one) and expresses both strong positive and strong negative sentiment.
Posted by: Bob Carpenter | May 22, 2007 at 11:18 AM
Per Bob's comment, you may recall University of Washington's Professor Oren Etzioni's presentation of Opine. This seemed like a good way to deal w/the nuances of product reviews. Here's how they describe Opine:
"OPINE is an unsupervised information extraction system which mines product review data in order to build a model of important product features, their evaluation by reviewers and their relative quality across products. First, the system automatically identifies product features, both explicit and implicit. For example, the sentence "Our room's temperature was just right" mentions the explicit feature "RoomTemperature" whereas the sentence "The hotel is ridiculously expensive" refers to the implicit feature "HotelPrice". Second, the system identifies opinions regarding product features and establishes their polarity. For example, "fantastic" is a positive opinion, whereas "disappointing" is a negative opinion. Finally, the system ranks opinions corresponding to the same feature based on their strength. For example, "great" is stronger than "almost great" which in turn is stronger than "mostly ok"."
You can go check it out at http://www.cs.washington.edu/research/knowitall/opine/. Using this method, a breakdown of the important parameters can make the task of judging sentiment on each much easier. Again, per Bob's note above, it's important to understand that while there might be an overall sentiment for the powerbook, understanding that the monitor sentiment is different from the keyboard sentiment is important information for both product managers of powerbooks and prospective customers.
Posted by: p-air | May 22, 2007 at 12:40 PM
While I agree with the above two comments, I think that there is far more to be said. For example, consider expressions like 'X is good' and compare them with expressions like 'Y (a part of X) is good'. How does one aggregate here? What are the biases (perhaps a certain feature produces a different distribution of positive/negative than another)? How does one moderate the bias of an individual speaker? If author A always makes negative statements and B always positive, is the aggregation weighted? How?
Posted by: Matthew Hurst | May 22, 2007 at 01:09 PM
Some other companies out there doing similar stuff are: -
1. Lexalytics: http://www.lexalytics.com
2. OpinMind: http://www.opinmind.com
3. Corpora Software: http://www.corporasoftware.com
4. Nstein: http://www.nstein.com
5. Nielsen Buzzmetrics: http://www.nielsenbuzzmetrics.com
I'm not sure how many of them score sentiment at a "mention" level (e.g. for a particular brand) or whether they just assign it at article/document level. I've seen some of Lexalytics' software and they claim to be able to score sentiment at mention level. It's certainly a growing area and it will be interesting to see what the next year holds.
Posted by: Dave | May 31, 2007 at 10:43 AM