Update: TechCrunch picks up the story. One thing that is getting a little fuzzy here is where NLP ought to be applied in search. Understanding the query is only one part; the other part (to me, the more interesting part) is understanding the text of the documents. This also requires smarts for understanding the structure and layout of documents and the function of different document areas (navigational, content-bearing, etc.). This point appears to be missed in the TechCrunch post (and in the readers' comments there).
Briefly, Barney has written up in detail his vision regarding Powerset, NLP in search and the restrictions of keyword-based search. This is worth reading, and I'll post more on it later. I wanted to link this article to a recent post by Paul Kedrosky, which expressed satisfaction with current search engines:
I have no idea anymore what "better" search would mean. I find pretty much everything I want now, and while natural-language processing always sounds great, improvements in how I submit searches do diddly for me.
My analogy is as follows: imagine interacting with a reference librarian, but you could only speak in 2-3 word statements. The reason this analogy is useful is that it points out that the search problem, when not constrained by the assumptions and expectations of the text box, is far richer and more complex than we've been blinkered to believe.
Oh yes, I also wanted to quote Barney on a most excellent turn of phrase for the language of keyword-based queries, which he describes as "a grunting pidgin language." Excellent!
The application of NLP to IR is not new, and I'm wondering if there is prior work showing NLP actually improving results; I cannot think of any. It's a fine story, but truly natural language interaction with a system comes close to requiring natural language understanding, which is a tall order. The recent successes in the subdisciplines of information extraction and interaction are probably better avenues to explore for improving and changing search.
Posted by: Fernando Diaz | October 06, 2006 at 03:56 PM