Those following the parallel debates concerning NLP and search (the NLP discussion from the technical side, in parallel with the thou-shalt-not-hype discussion from the Web 2.0 pundit-sphere) may be interested in this post by John Battelle from October 12th, 2004 (!). In the context of recent discussion, one hardly knows where to begin quoting:
"Named entity extraction" is a relatively new project [] which Norvig said Google had been working on for about six months. As Norvig explained the concept - essentially identifying semantically important concepts and the meaning wrapped around them[.]
This is in the context of a technology demo which Google gave around that time. Battelle continues, quoting Norvig in an eWeek story:
For example, Norvig said, researchers are looking for ways to break down sentences by looking for a phrase like "such as" and grabbing the names that follow it. The goal is to not only pull out the name but also its clusters, so that a name such as "Java" can be associated both with the computer language and with language in general, Norvig said.
"We want to be able to search and find these [entities] and the relationships between them, rather than you typing in the words specifically," Norvig said.
Battelle then goes on to speculate about how these capabilities might surface in the Google UI. The last sentence in the above quote seems so close - at least in terms of vision - to some of the current wave of NLP search debate that it provokes the question: what happened to this project? Did Google try and fail? If you read it closely, you'll see that Norvig is talking about some key NLP concepts:
- Entities (typed concepts expressed in short spans of text, generally noun phrases)
- Ontologies (Java IS_A programming language)
- Relationships (between entities)
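To make the "such as" trick from the eWeek quote concrete, here is a minimal sketch in Python. It is a toy illustration only, not Google's method: the function names (`extract_is_a`, `build_clusters`), the regular expression, and the example sentences are all assumptions of mine. It treats the one or two words before "such as" as the category and uses capitalization as a crude stand-in for real entity recognition.

```python
import re
from collections import defaultdict

# Toy pattern: "<category> such as <name>, <name> and <name>".
# Assumption: the category is the one or two words right before "such as".
SUCH_AS = re.compile(r"(\w+(?:\s\w+)?),?\s+such as\s+([^.;:!?]+)")


def extract_is_a(sentence):
    """Return (name, "IS_A", category) triples found via the "such as" pattern."""
    triples = []
    for category, tail in SUCH_AS.findall(sentence):
        # Split the list that follows "such as" on commas and on "and" / "or".
        names = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", tail)
        for name in names:
            name = name.strip()
            # Crude stand-in for entity recognition: keep capitalized names only.
            if name and name[0].isupper():
                triples.append((name, "IS_A", category.lower()))
    return triples


def build_clusters(sentences):
    """Group the categories seen for each name, so one name can sit in several clusters."""
    clusters = defaultdict(set)
    for sentence in sentences:
        for name, _, category in extract_is_a(sentence):
            clusters[name].add(category)
    return dict(clusters)


sentences = [
    "He writes programming languages such as Java, Python and Perl.",
    "She visited Indonesian islands such as Java and Sumatra.",
]
print(extract_is_a(sentences[0]))
print(build_clusters(sentences)["Java"])
```

On these two hand-picked sentences, "Java" ends up in both the "programming languages" cluster and the "indonesian islands" cluster, which is the kind of ambiguity Norvig's "clusters" remark points at. Real text would break this quickly (the two-word category window and the capitalization test are both very brittle), so take it as an illustration of the idea rather than a working extractor.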
I mean - couldn't you build a next-gen search engine on such wonderful ideas?
It would also suggest that Powerset's claimed supremacy from the use of NLP (for their yet-to-be-released service) may need to be tempered in light of work that Google has already explored and may be ready to release if and when appropriate.
Posted by: p-air | February 15, 2007 at 11:48 AM