Fernando is still skeptical about the potential of NLP to play a major role in search. I may be putting words in Fernando's mouth, but I believe he says this because he is assessing NLP's impact against the standard search interaction (type words in a box, get a list of URLs back). This is missing the point.
When one is dealing with text (that is to say, an abstraction of content which is ultimately a sequence of letters represented in some form) there is a considerable amount of ambiguity. Sure, there are some manipulations that can be performed (ok - there is one, it's called stemming), but ultimately, you are at the mercy of the most trivial representation of the intended communication of the speaker/author. It is not surprising, then, that the standard search paradigm (which is equivalent to the IR paradigm) is to deliver a list of documents: text in, documents out.
When one is dealing with language, one is dealing at a higher level of abstraction. Rather than sequences of characters (or tokens - what we might rudely refer to as words) we are dealing with logical symbols. Rather than the primary relationships being before and after (as in this word is before that word) we can capture relationships that are either grammatical (pretty interesting) or semantic (extremely interesting). To have transformed simple text into logical representations, one has had to resolve a lot of ambiguity. The current search paradigm instead relies on a number of statistical qualities relating the query to the text of all the documents in the index, and resolves these ambiguities with the help of the user: something interesting ought to be found somewhere in the documents at the top of the heap - please go and find it. When the content system itself is dealing with the ambiguity, the interface no longer has that job, and so search systems (in fact, they won't even be called search systems) will be able to provide far more interesting applications.
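To make the contrast concrete, here is a minimal Python sketch, not a description of any real search engine. The tiny corpus, the hand-written (subject, relation, object) triples and the function names keyword_search and relational_search are all hypothetical; the triples simply stand in for whatever an NLP pipeline might extract.

```python
# Hypothetical toy corpus; the sentences are illustrative only.
DOCS = {
    "d1": "smoking causes cancer",
    "d2": "cancer causes fatigue",
}

def keyword_search(query, docs):
    """Text in, documents out: ids of documents containing every query token."""
    terms = set(query.lower().split())
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if terms <= set(text.lower().split())
    )

# Hand-written (subject, relation, object) triples. This is an assumption:
# they stand in for the output of an NLP pipeline, which is not shown here.
TRIPLES = [
    ("smoking", "causes", "cancer"),
    ("cancer", "causes", "fatigue"),
]

def relational_search(relation, obj, triples):
    """Relation in, answers out: every subject X such that (X, relation, obj) holds."""
    return sorted(s for s, r, o in triples if r == relation and o == obj)

# Both documents contain the tokens "causes" and "cancer", so the
# bag-of-words match cannot tell which one is about a cause of cancer.
print(keyword_search("causes cancer", DOCS))           # ['d1', 'd2']

# The triples record who does what to whom, so the answer is direct.
print(relational_search("causes", "cancer", TRIPLES))  # ['smoking']
```

The keyword version hands the user two documents and leaves the disambiguation to them; the relational version can only answer directly because the ambiguity was resolved when the triples were built.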
This is why I find the notion of the DataWeb so interesting (see this post on Cognos for an illustration).
In some sense, I've broken the clear distinction I made earlier, in the post Fernando responded to, between the back and front end of search: I'm claiming that changes to the back end will enable fundamental changes to how 'results' are served.
Fernando states:
...search quality can be and has to be traded off with search cost.
If you change the game (e.g. by changing the way results are provided) then the notion of quality has been disrupted. I'm not sure what the costs are that Fernando is referring to. CPU (e.g. time to process all content)? Response time?
Does NLP mean Neuro-Linguistic Programming first and foremost to anyone else? I know what you use it to represent but as a marketer I think of Neuro-Linguistic Programming before Natural Language Processing.
Posted by: Jake Lockley | February 01, 2007 at 08:27 PM
"Rather than the primary relationships being before and after (as in this word is before that word) we can capture relationships that are either grammatical (pretty interesting) or semantic (extremely interesting)."
Task #4 in SemEval 2007 is specifically concerned with a new kind of search that is enabled by capturing semantic relations:
http://nlp.cs.swarthmore.edu/semeval/tasks/task04/description.shtml
We envision a relational search engine that would allow queries such as:
- list all X such that X causes cancer
- list all X such that X is part of an automobile engine
- list all X such that X is material for making a ship's hull
- list all X such that X is a type of transportation
- list all X such that X is produced from cork trees
Posted by: Peter Turney | February 02, 2007 at 05:12 PM
Maybe it would be productive if NLP could be applied to social search instead of focusing on algorithmic search.
Posted by: clique | February 05, 2007 at 12:12 AM
@Jake: NLP refers to Natural Language Processing
It's strange to see this approach gaining traction first in "whole web" searching, when the social media space seems like it'd be more conducive to early successes.
Posted by: Hans | February 12, 2007 at 11:59 AM