Matt Cutts points to a neat new feature in Google News search that extracts quotes by individuals and displays them at the top of the result set. You can click through to more quotes by the same person. Regardless of what you might think of the value of this, it does expose some key capabilities on the linguistic side.
- Disambiguation: a search for Hillary Clinton produces quotes like the following, which requires the system to resolve 'Clinton' to 'Hillary Clinton': "When it comes to finishing the fight, Rocky and I have a lot in common. I never quit," Clinton said recently.
- Pronoun resolution: the same search produces quotes qualified by 'she': She said last week that she knows, "what it means to get knocked down, but I've never stayed down."
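To make those two steps concrete, here is a toy sketch in Python. It is not how Google does it (I have no idea what is under the hood); it just assumes a small hand-made list of people (`KNOWN_PEOPLE`), two crude regular expressions for "quote, then name said" and "She/He said ... quote", and the rule that a pronoun refers to the most recently resolved speaker. All the names in it are made up for illustration.

```python
import re

# Hypothetical list of people the system knows about (an assumption for this sketch).
KNOWN_PEOPLE = ["Hillary Clinton", "Gordon Brown"]
PRONOUNS = {"she", "he"}

# '"<quote>" <Name> said ...'
QUOTE_THEN_NAME = re.compile(r'"([^"]+)"\s+([A-Z][\w. ]*?)\s+said')
# 'She said ... "<quote>"'
PRONOUN_THEN_QUOTE = re.compile(r'\b(She|He)\s+said\b[^"]*"([^"]+)"')


def resolve_speaker(mention, last_speaker=None, known_people=KNOWN_PEOPLE):
    """Map a surface mention to a full name, or None if it cannot be resolved."""
    token = mention.strip()
    # Pronoun resolution: fall back to the most recently resolved speaker.
    if token.lower() in PRONOUNS:
        return last_speaker
    # Disambiguation: accept a full name or a surname, but only if it is unique.
    matches = [p for p in known_people if token == p or token == p.split()[-1]]
    return matches[0] if len(matches) == 1 else None


def attribute_quotes(sentences, known_people=KNOWN_PEOPLE):
    """Return (speaker, quote) pairs for the sentences, processed in order."""
    results = []
    last_speaker = None
    for sentence in sentences:
        m = QUOTE_THEN_NAME.search(sentence)
        if m:
            quote, mention = m.group(1), m.group(2)
        else:
            m = PRONOUN_THEN_QUOTE.search(sentence)
            if not m:
                continue
            mention, quote = m.group(1), m.group(2)
        speaker = resolve_speaker(mention, last_speaker, known_people)
        if speaker is not None:  # drop the quote rather than guess
            results.append((speaker, quote))
            last_speaker = speaker
    return results


if __name__ == "__main__":
    text = [
        '"When it comes to finishing the fight, Rocky and I have a lot in '
        'common. I never quit," Clinton said recently.',
        'She said last week that she knows, "what it means to get knocked '
        'down, but I\'ve never stayed down."',
    ]
    for speaker, quote in attribute_quotes(text):
        print(speaker, "->", quote)
```

Note that if you add 'Bill Clinton' to the list, a bare 'Clinton' becomes ambiguous and the sketch drops the quote instead of guessing, which is the kind of choice you make when you care more about precision than recall.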
I'm guessing that the product has been tuned heavily for precision (that is, after all, what web search companies are all about). Thus, a search for just 'clinton' on the front end only presents results for 'Hillary Rodham Clinton', and a search for 'Bill Clinton' produces no quote results. My guess is that there is some general technology underneath this, with a strong editorial layer on top designed to ensure that all the results are of high quality, at the expense of recall. This is not surprising and quite reasonable.
It'd be interesting to know who is on the list of people who get passed through. I see Gordon Brown, but not Tony Blair. There is no sign of the Dalai Lama saying anything quotable, even though the top news search result has this very quotable passage:
"From the very beginning I have supported the Olympics," said the Dalai Lama. "We must support China's desires. Even after this sad situation in Tibet, today I support the Olympics." Still, he said he fully understands why people would express frustration and protest.