September 24, 2006

Comments

Clyde Smith

That subjective/objective binary is a pretty impoverished approach to analyzing human consciousness.

You pretty much decimated its use in a paragraph or two. And, of course, you weren't the first.

Yet it will continue.

Sigh.

Bob Carpenter

In Lillian Lee and Bo Pang's paper, they set out to classify movie reviews into "positive" and "negative". They found it useful to first train a sentence-level "subjective"/"objective" classifier.

They take "subjective" to mean a reviewer's opinion and "objective" to mean a statement about the movie. They train a binary classifier over sentences using sentences extracted from user-submitted reviews for the subjective category and sentences extracted from descriptions of movies for the objective category.

They then train a classifier on the texts of whole customer reviews with the categories "positive" and "negative", using 4- and 5-star reviews for the former and 1- and 2-star reviews for the latter. They found that using only the "subjective" sentences from the reviews during training and classification worked better than using whole reviews. As far as I can recall, that's all they claimed in their paper.
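To make the two-stage setup concrete, here is a minimal sketch of the idea: a sentence-level subjective/objective filter feeding a review-level positive/negative classifier. It is not the paper's implementation. The tiny Naive Bayes class, the split on periods, the function names (`subjective_extract`, `classify_review`) and all of the training sentences are assumptions made up for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.labels:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data: opinion-like sentences stand in for review snippets,
# description-like sentences stand in for plot summaries.
SUBJECTIVE = ["I loved this film", "The acting was terrible",
              "What a boring waste of time", "A wonderful and moving story"]
OBJECTIVE = ["A detective investigates a murder in chicago",
             "The film follows two brothers on a road trip",
             "A young woman inherits a farm from her uncle",
             "The story is set in paris during the war"]

# Stage 1: sentence-level subjectivity classifier.
subjectivity_clf = NaiveBayes().fit(
    SUBJECTIVE + OBJECTIVE,
    ["subjective"] * len(SUBJECTIVE) + ["objective"] * len(OBJECTIVE))


def subjective_extract(review, clf):
    """Keep only sentences labeled subjective (naive split on periods)."""
    sentences = [s.strip() for s in review.split(".") if s.strip()]
    kept = [s for s in sentences if clf.predict(s) == "subjective"]
    return ". ".join(kept) if kept else review  # fall back to the whole review


# Hypothetical star-rated reviews: high ratings -> positive, low -> negative.
REVIEWS = [
    ("The film follows two brothers on a road trip. I loved this film", "positive"),
    ("The story is set in paris during the war. A wonderful and moving story", "positive"),
    ("A detective investigates a murder in chicago. The acting was terrible", "negative"),
    ("A young woman inherits a farm from her uncle. What a boring waste of time", "negative"),
]

# Stage 2: polarity classifier trained on the subjective extracts only.
polarity_clf = NaiveBayes().fit(
    [subjective_extract(text, subjectivity_clf) for text, _ in REVIEWS],
    [label for _, label in REVIEWS])


def classify_review(review):
    """Filter to subjective sentences, then predict positive/negative."""
    return polarity_clf.predict(subjective_extract(review, subjectivity_clf))
```

Calling `classify_review` on a new review first strips the plot-description sentences and then classifies what remains, which is the filtering step described above. With real data, the interesting comparison is this pipeline against a polarity classifier trained on the unfiltered reviews.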

I think readers searching for a theory of cognition in the class labels of a machine learning experiment are perhaps looking to the wrong literature. There was a recent discussion of this on Hal Daume III's NLP-ers blog: Stat NLP is not NLP but just stats: http://nlpers.blogspot.com/2006/09/statistical-nlp-is-not-nlp-but-just.html

Matthew Hurst

In Yu and Hatzivassiloglou's 'Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences', they cite labeler agreement of 46% for Fact and 77% for Opinion; worse, it was 29% for positive and 51% for negative (albeit for small counts). Numbers like these come from ill-defined tasks, a result of there being no theory of opinion. The selection of labels for a machine learning experiment *on text* must be sensitive to the manner in which that text was created (i.e. by a person) and to some notion of natural or reasonable labels. One of the big turn-offs of much of the recent work on supervised classification applied to text is simply that it does not significantly improve our knowledge of language or of machine learning.
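For readers unfamiliar with agreement figures, here is a rough sketch of two ways agreement between a pair of labelers can be measured. The per-label definition used here (items both labelers gave a label, divided by items either gave it) is an assumption and may not match the measure used in the paper. Cohen's kappa is included as a standard chance-corrected alternative. The label sequences are invented for illustration.

```python
from collections import Counter


def per_label_agreement(a, b, label):
    """Share of items given `label` by either labeler that both gave it."""
    both = sum(1 for x, y in zip(a, b) if x == label and y == label)
    either = sum(1 for x, y in zip(a, b) if x == label or y == label)
    return both / either if either else float("nan")


def cohens_kappa(a, b):
    """Agreement between two labelers, corrected for chance agreement."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:  # both labelers used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators over the same six sentences.
labeler_1 = ["fact", "fact", "opinion", "opinion", "opinion", "fact"]
labeler_2 = ["fact", "opinion", "opinion", "opinion", "fact", "fact"]

fact_agreement = per_label_agreement(labeler_1, labeler_2, "fact")
opinion_agreement = per_label_agreement(labeler_1, labeler_2, "opinion")
kappa = cohens_kappa(labeler_1, labeler_2)
```

Low per-label agreement between human labelers puts a practical ceiling on what a classifier trained on those labels can be said to have learned.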
