Greg recently pointed me to evri.com. The site does a number of really interesting things. Firstly, it attempts to solve the named entity extraction problem in a broad way. Named entity recognition is often limited to the names of people, places and organizations. Evri doesn’t seem to have any limit on the types of things it discovers – music, bands, movies, books. Secondly, it looks for relationships between those entities, largely via collocation in a document. Thirdly, it attempts to disambiguate names with more than one possible type: Blue, for example, could be a film, a band or an album (not to mention a colour). Finally, it gives access to the web via an interface which allows the user both to search and to wander across the relationships between entities.
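To make the collocation idea concrete, here is a minimal sketch of relating entities by counting how often they appear in the same document. This is not evri's implementation, which I have no visibility into; the function name `cooccurrence_counts` and the sample documents are my own assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs_entities):
    """Count how often each unordered pair of entities shares a document."""
    counts = Counter()
    for entities in docs_entities:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same order.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical, hand-labelled entity mentions for three documents.
docs = [
    ["Peter Jackson", "The Lord of the Rings", "New Zealand"],
    ["The Lord of the Rings", "Peter Jackson"],
    ["New Zealand", "Wellington"],
]

counts = cooccurrence_counts(docs)
# Pairs that co-occur most often are the strongest candidate relationships.
print(counts.most_common(1))
```

A real system would presumably weight these counts (by distance within the document, or against how common each entity is overall), but raw co-occurrence is enough to show where the relationships come from.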
In named entity recognition, there are three kinds of entity type, and which kind is involved largely determines the nature of the task:
- Inherent types: a person name is generally recognizable as such without context (with some obvious exceptions – names like White, Black, etc.)
- Syntactic types: product names and addresses often have some syntactic pattern that gives internal coherence.
- Cultural types: the name of a book, film, video game, etc. is often simply some number of words from the language (The Lord of the Rings, Lips, Wanted, Today).
The third type – cultural entities – is the hardest to match, and this is exactly the type that evri appears to excel at. It does have some trouble with the harder cases (It – the book or film, Today – the US television show).
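A small sketch shows why the cultural type is so much harder than the syntactic one. Everything here is an assumption made for illustration: the address pattern is deliberately simplistic, the title list is just the examples mentioned above, and none of it reflects how evri actually works.

```python
import re

# Hypothetical gazetteer of titles, taken from the examples in the text.
TITLES = {"The Lord of the Rings", "Lips", "Wanted", "Today", "It"}

# Illustrative (and far from complete) street address pattern:
# a number, one or more capitalised words, then a street-type suffix.
ADDRESS = re.compile(r"\b\d+\s+(?:[A-Z][a-z]+\s+)+(?:Street|Avenue|Road)\b")


def find_addresses(text):
    """Syntactic type: the internal pattern alone identifies the entity."""
    return ADDRESS.findall(text)


def find_title_candidates(text):
    """Cultural type: a plain lookup matches ordinary words as well."""
    hits = []
    for title in sorted(TITLES):
        if re.search(r"\b" + re.escape(title) + r"\b", text):
            hits.append(title)
    return hits


sentence = "Today I wanted to visit 221 Baker Street. It was closed."
print(find_addresses(sentence))
print(find_title_candidates(sentence))
```

The address pattern picks out the one address, while the title lookup flags "It" and "Today", neither of which is a title in this sentence. Telling those apart needs context, which is exactly the disambiguation step.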
In evri, I see a glimpse of the future – a new way to craft the user’s relationship not just to the documents of the web, but to the information on the web.