When dealing with text, you can get a certain way with simple tricks. But eventually, the lack of any real model comes back to bite you. I recently saw a referral to this blog from a search on Google for
Google's results, due to the 'AI' used in understanding the query, included pages which didn't contain 'powerset' at all, but which did contain some stem (for lack of a better word) such as 'powers'. You can get the results you intended if you put powerset in quotes as well (contrary to the belief some hold that this is a crazy thing to do).
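To make the effect concrete, here is a toy sketch in Python of how this kind of result could arise. It assumes, purely for illustration, that the engine adds close variants to any unquoted term and leaves quoted terms alone. The `VARIANTS` table, the helper names, and the two sample documents are all invented for the example and say nothing about how Google actually works.

```python
import re

# Hypothetical variant table: a stand-in for whatever stemming or
# spell-correction data a real engine might use. Purely illustrative.
VARIANTS = {"powerset": {"powers"}}


def parse_query(query):
    """Return (term, is_quoted) pairs. Only single-word quoted terms are handled."""
    pairs = []
    for quoted, bare in re.findall(r'"(\w+)"|(\w+)', query.lower()):
        pairs.append((quoted, True) if quoted else (bare, False))
    return pairs


def expand(term, is_quoted):
    """Quoted terms match exactly; unquoted terms also match their variants."""
    if is_quoted:
        return {term}
    return {term} | VARIANTS.get(term, set())


def search(query, docs):
    """Return sorted ids of docs that match every query term (or a variant of it)."""
    terms = parse_query(query)
    hits = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"\w+", text.lower()))
        if all(expand(term, is_quoted) & words for term, is_quoted in terms):
            hits.append(doc_id)
    return sorted(hits)


# Made-up sample documents.
DOCS = {
    "a": "Powerset is a natural language search engine",
    "b": "The powers of the search engine committee",
}

print(search("powerset", DOCS))    # unquoted: variants are allowed
print(search('"powerset"', DOCS))  # quoted: exact match only
```

In this sketch the unquoted query also returns document "b", which never mentions powerset, while the quoted query returns only document "a".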
Discovery Engine, Powerset - both search engines - so why do I get adverts for 'Find Landrover Parts?' Or, more amusing, 'Engine Set'?


Simple. It thinks (based on user input and click-through) that a relatively rare word like "powerset" could be a misspelling of something else. So its intelligent spell correction throws alternatives into the query right after initial parsing, the index-serving code pulls out the pages that contain these alternatives (along with the pages that contain "powerset"), and the ranker ranks them pretty high, because powerset isn't very well known outside NLP/IR circles.
Quoting "powerset" excludes it from implicit spell correction, thereby forcing an exact index match.
Posted by: Dmitry | December 06, 2007 at 01:45 AM
"why do I get adverts for 'Find Landrover Parts?'"
Because, prosaically, the Discovery is a common Landrover model, and there will be many owners looking for replacement engines.
Posted by: Derry | December 06, 2007 at 03:43 AM