Those following the parallel debates concerning NLP and search (the NLP discussion on the technical side, running in parallel with the thou-shalt-not-hype discussion in the Web 2.0 pundit-sphere) may be interested in this post by John Battelle from October 12th, 2004 (!). In the context of recent discussion, one hardly knows where to begin quoting:
"Named entity extraction" is a relatively new project [] which
Norvig said Google had been working on for about six months. As Norvig
explained the concept - essentially identifying semantically important
concepts and the meaning wrapped around them[.]
This is in the context of a technology demo which Google gave around that time. Battelle continues, quoting Norvig in an eWeek story:
For example, Norvig said, researchers are looking for ways to
break down sentences by looking for a phrase like "such as" and
grabbing the names that follow it. The goal is to not only pull out the
name but also its clusters, so that a name such as "Java" can be
associated both with the computer language and with language in
general, Norvig said.
"We want to be able to search and find these [entities] and the
relationships between them, rather than you typing in the words
specifically," Norvig said.
Battelle then goes on to speculate about how these capabilities might surface in the Google UI. The last sentence in the above quote seems so close, at least in terms of vision, to some of the current wave of NLP search debate that it provokes the question: what happened to this project? Did Google try and fail? If you read it closely, you'll see that Norvig is talking about some key NLP concepts:
- Entities (typed concepts expressed in short spans of text, generally noun phrases)
- Ontologies (Java IS_A programming language)
- Relationships (between entities)
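To pin these three concepts down, here is a toy Python sketch. It assumes a hand-built store of (subject, relation, object) triples. The relation names IS_A and WORKS_ON, the helper functions `match` and `ancestors`, and the facts themselves are illustrative assumptions, not anything Google described. In a real system the triples would be extracted from text rather than typed in. Following IS_A edges transitively is one simple way to get Norvig's example of "Java" being associated both with a computer language and with language in general, and querying by subject and relation is a small taste of searching by entities and relationships "rather than you typing in the words specifically".

```python
from typing import List, Optional, Tuple

# A triple is (subject, relation, object). The relation names IS_A and
# WORKS_ON, and the facts below, are illustrative assumptions only.
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("Java", "IS_A", "programming language"),          # ontology edge
    ("programming language", "IS_A", "language"),      # ontology edge
    ("Google", "WORKS_ON", "named entity extraction"),  # relationship between entities
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    relation: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples matching every given field; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (relation is None or t[1] == relation)
        and (obj is None or t[2] == obj)
    ]


def ancestors(triples: List[Triple], entity: str) -> List[str]:
    """Follow IS_A edges transitively and return every class reached."""
    seen: List[str] = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(triples, subject=current, relation="IS_A"):
            if parent not in seen:  # skip classes already reached, so cycles terminate
                seen.append(parent)
                frontier.append(parent)
    return seen


if __name__ == "__main__":
    # Query by entity and relationship rather than by keywords.
    print(ancestors(TRIPLES, "Java"))  # ['programming language', 'language']
    print(match(TRIPLES, subject="Google", relation="WORKS_ON"))
```

Using None as a wildcard keeps a single lookup function for every query shape; an actual engine would index the triples instead of scanning a list.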
I mean - couldn't you build a next gen search engine on such wonderful ideas?
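The "such as" trick Norvig describes in the eWeek quote is a lexico-syntactic pattern of the kind usually called a Hearst pattern, and a deliberately crude version fits in a few lines of Python. The assumptions are loud ones: a "name" is any capitalized token, the class is just the single word right before "such as", and the sample sentences are made up. The functions `extract_is_a` and `build_is_a_index` are hypothetical. Collecting every class a name shows up with gives a rough picture of the clustering Norvig mentions; a serious system would use a part-of-speech tagger or noun-phrase chunker rather than regular expressions.

```python
import re
from collections import defaultdict

# Crude assumptions, not Google's method: a "name" is a capitalized token
# (allowing + and # for names like C++ or C#), and the class label is the
# single word immediately before "such as".
NAME = r"[A-Z][\w+#]*"
SEP = r"(?:,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+)"
SUCH_AS = re.compile(
    rf"\b(?P<hypernym>\w+),?\s+such\s+as\s+(?P<names>{NAME}(?:{SEP}{NAME})*)"
)


def extract_is_a(sentence):
    """Return (name, class) pairs found via the "X such as A, B and C" pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hypernym").lower()
        # Pull each capitalized name out of the enumerated list.
        for name in re.findall(NAME, match.group("names")):
            pairs.append((name, hypernym))
    return pairs


def build_is_a_index(sentences):
    """Map each extracted name to the set of classes it was seen with."""
    index = defaultdict(set)
    for sentence in sentences:
        for name, hypernym in extract_is_a(sentence):
            index[name].add(hypernym)
    return index


if __name__ == "__main__":
    # Made-up toy corpus for illustration.
    corpus = [
        "Developers compare programming languages such as Java, Python and C++.",
        "Browsers ran technologies such as Java or Flash.",
    ]
    for name, classes in sorted(build_is_a_index(corpus).items()):
        print(name, "IS_A", sorted(classes))
```

Here "Java" ends up attached to two classes, which is the kind of signal a clustering step could then work with.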