By forming a query out of the candidate names for each party:
- (clinton OR obama OR edwards OR gravel OR kucinich)
- (giuliani OR huckabee OR keyes OR mccain OR paul OR romney OR thompson)
and combining this with terms that are commonly used around the presidential election:
- (president OR presidential OR campaign OR election OR primary OR primaries)
we can get some understanding of where attention lies at the party level.
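The query construction above can be sketched in a few lines of code. This is just an illustration of the string assembly, not any particular search engine's API; the function names and the AND syntax are assumptions.

```python
# Candidate and context term lists from the post.
democrats = ["clinton", "obama", "edwards", "gravel", "kucinich"]
republicans = ["giuliani", "huckabee", "keyes", "mccain", "paul", "romney", "thompson"]
election_terms = ["president", "presidential", "campaign", "election", "primary", "primaries"]

def or_group(terms):
    """Join terms into a single parenthesized OR clause."""
    return "(" + " OR ".join(terms) + ")"

def party_query(candidates, context=election_terms):
    """Combine a candidate clause with the shared election-context clause."""
    return or_group(candidates) + " AND " + or_group(context)

print(party_query(democrats))
# Narrowing to an issue, per the suggestion below, is one more AND clause:
print(party_query(republicans) + " AND iraq")
```

Dropping a candidate from the list (e.g. 'paul', as discussed in the comments) is then a one-line change to the input list.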
The graph above suggests that until November both parties were getting a similar amount of attention. Prior to the first batch of primaries, the Republican candidates were getting a little more, but the Democratic race now appears to be proving a more engaging topic than the Republican one.
You can play around with this query to see where attention lies for different issues by adding terms such as +health, +war, +iraq, +immigration, etc.
With just slightly more complicated queries, you could get an estimate of whether the buzz is positive or negative:
http://arxiv.org/abs/cs.CL/0309034
Posted by: Peter Turney | January 23, 2008 at 10:47 AM
Peter,
I disagree. Determining sentiment requires generating tuples of the form (speaker, target, polarity). While determining candidate targets and polarity may be possible, the association is a hard problem. Many approaches to this make a simple sentence assumption: a sentence mentioning Obama which is also positive is positive about Obama. Unfortunately, this has many problems. Sure, in some domains it works out well, but in the political domain - where there are many targets and often opposing opinions - things are harder.
I do believe that it is possible to create a system which does a good job of sentiment analysis for politics. But I don't believe it can be done with 'slightly more complicated queries.'
That being said - I'd be happy to be proven wrong. Could you show some examples of the queries you have in mind and their accuracy?
Posted by: Matthew Hurst | January 23, 2008 at 10:55 AM
"Many approaches to this make a simple sentence assumption: a sentence mentioning Obama which is also positive is positive about Obama."
I expect that this assumption works well enough, given large sample sizes. I assume the errors would cancel out, if you take the average of a big sample. But you're right, it is a big assumption, and I really don't know how well it would work.
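That intuition can be checked with a quick simulation under stated assumptions: a true positive rate and a labeler that flips each sentence's label with some symmetric error probability. Both parameters here are made up for illustration.

```python
# Simulate a noisy sentence-level labeler to see how errors average out.
import random

random.seed(0)
TRUE_POSITIVE_RATE = 0.6   # assumed true share of positive sentences
FLIP_PROB = 0.2            # labeler flips the correct label 20% of the time

def observed_positive_rate(n):
    positives = 0
    for _ in range(n):
        truth = random.random() < TRUE_POSITIVE_RATE
        label = (not truth) if random.random() < FLIP_PROB else truth
        positives += label
    return positives / n

# With symmetric errors the observed rate converges to
# p*(1-f) + (1-p)*f = 0.6*0.8 + 0.4*0.2 = 0.56, not 0.6: the noise
# pulls estimates toward 0.5, though if f were known one could correct
# for it, and relative comparisons between candidates are preserved.
print(observed_positive_rate(100))
print(observed_positive_rate(100000))
```

So the errors do not cancel exactly, but with a large sample the estimate is stable, and the bias is systematic rather than random.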
Posted by: Peter Turney | January 23, 2008 at 12:09 PM
Now how about those graphs without Ron Paul? That's what I'd like to see.
Posted by: Kevin | January 23, 2008 at 01:19 PM
Removing the term 'paul' appears to have a pretty significant impact.
http://tinyurl.com/yrfx9v
Posted by: Matthew Hurst | January 23, 2008 at 03:24 PM