Subjectivity in Text
An essential element of sentiment mining, and other attitudinal analyses of textual data, is the notion of subjectivity. I've been re-reading some publications by Janyce Wiebe and others on the topic, in the process of performing a more extensive literature survey of the field. Something that strikes me as problematic about current definitions is that they attempt to have two distinct categories (subjectivity and objectivity) which are to be used as labels for spans of text.
Firstly, I think that subjectivity is not a feature of the text, but of the intention of the author(s). Secondly, I believe that one can (ideally) measure the difference between what is 'true' and what is understood by the reader of the text. Thirdly, I believe that spans of text can intermix what we traditionally might regard as being subjective and objective intentions.
For example, consider the following:
The idiot won the election.
It may be factually correct that some person (referred to by the phrase 'the idiot') won some election (referred to by 'the election'). However, the term 'idiot' indicates a certain amount of subjectivity. The author may wish to express regret by reporting factual information in an opinionated manner. It may also be the case that that particular person didn't win the election (though the author believes that he did). In this case, the author's intention is to be objective, though in fact they are simply mistaken.



That subjective/objective binary is a pretty impoverished approach to analyzing human consciousness.
You pretty much decimated its use in a paragraph or two. And, of course, you weren't the first.
Yet it will continue.
Sigh.
Posted by: Clyde Smith | September 24, 2006 at 07:18 AM
In Lillian Lee and Bo Pang's paper, they set out to classify movie reviews into "positive" and "negative". They found it useful to first train a sentence-level "subjective"/"objective" classifier.
They take "subjective" to mean a reviewer's opinion and "objective" to mean a statement about the movie. They train a binary classifier over sentences using sentences extracted from user-submitted reviews for the subjective category and sentences extracted from descriptions of movies for the objective category.
They then train a classifier on the texts of whole customer reviews with categories "positive" and "negative" using 4- and 5-star and 1- and 2-star reviews respectively. They found that using only the "subjective" sentences from the reviews during training and classification worked better than using whole reviews. As far as I can recall, that's all they claimed in their paper.
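The two-stage setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the paper: the small naive Bayes classifier, the helper names (`NaiveBayes`, `subjective_part`, `star_label`, `classify_review`), and all of the toy sentences and reviews are hypothetical stand-ins. The only parts taken from the description are the sentence-level subjective/objective filter, the star-rating cutoffs, and the use of only the "subjective" sentences for both training and classification.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def train(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def subjective_part(review, subjectivity_model):
    """Keep only the sentences that the first-stage model labels 'subjective'."""
    sentences = [s.strip() for s in review.split(".") if s.strip()]
    kept = [s for s in sentences if subjectivity_model.classify(s) == "subjective"]
    return " ".join(kept)


def star_label(stars):
    """4-5 stars -> positive, 1-2 stars -> negative, 3 stars -> unused."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None


# Stage 1: sentence-level subjectivity classifier (hypothetical toy data:
# opinion sentences from reviews vs. sentences from movie descriptions).
subjectivity = NaiveBayes()
subjectivity.train(
    ["i loved this film", "i hated the acting", "what a boring film i thought",
     "the film follows a detective in paris", "the story is set in paris",
     "the detective investigates a robbery"],
    ["subjective"] * 3 + ["objective"] * 3,
)

# Stage 2: polarity classifier trained only on the subjective sentences
# of star-rated reviews (again, hypothetical toy data).
rated_reviews = [
    ("The film follows a detective in paris. I loved this film.", 5),
    ("The story is set in paris. I hated the acting.", 1),
]
polarity = NaiveBayes()
for text, stars in rated_reviews:
    label = star_label(stars)
    if label is not None:
        polarity.train([subjective_part(text, subjectivity)], [label])


def classify_review(review):
    """Filter to subjective sentences, then classify polarity."""
    return polarity.classify(subjective_part(review, subjectivity))


print(classify_review("The story is set in paris. I loved this story."))
```

The point of the sketch is only the shape of the pipeline: the "subjective" and "objective" labels are defined by where the training sentences came from, not by any theory of what subjectivity is.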
I think readers searching for a theory of cognition in the class labels of a machine learning experiment are perhaps looking to the wrong literature. There was a recent discussion of this on Hal Daume III's NLP-ers blog: Stat NLP is not NLP but just stats: http://nlpers.blogspot.com/2006/09/statistical-nlp-is-not-nlp-but-just.html
Posted by: Bob Carpenter | September 25, 2006 at 02:28 PM
In Yu and Hatzivassiloglou's 'Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences', they cite labeler agreement for Fact as 46% and for Opinion as 77%; worse, for positive it was 29% and for negative 51% (albeit for small counts). Numbers like this come from ill-defined tasks, a result of there being no theory of opinion. The selection of labels for a machine learning experiment *on text* must be sensitive to the manner in which that text was created (i.e. by a person) and some notion of natural or reasonable labels. One of the big turn-offs from much of the recent work on supervised classification applied to text is simply that it does not significantly improve our knowledge of language or of machine learning.
Posted by: Matthew Hurst | September 25, 2006 at 02:56 PM