

October 15, 2006

Comments

Patrick Herron

Well, you're assuming that Google has a context for you. Personalization, if you will. There needs to be some context before it can disambiguate effectively.

As for making a stab at disambiguation, Google does do so for some terms. Oftentimes an ambiguous term generates a results page with multiple sections, each referring to one sense or another. So Google lets you choose the right sense for your needs. Google's decision to pursue things in this fashion is, I think, the right one.

Google does have some distribution data as to which senses garner the most clicks. Probably NLP as neuro-linguistic programming gets many more selections than NLP as natural language processing. Again, a reasonable approach for a contextless system like Google.
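
A tiny sketch of how that click distribution could drive a default ordering when no user context is available (the counts and sense labels here are invented for illustration):

```python
# Hypothetical click counts per sense of the ambiguous query "NLP" (numbers invented).
clicks = {
    "neuro-linguistic programming": 8200,
    "natural language processing": 1900,
}

# With no context about the user, a search engine could simply lead with the
# most-clicked sense and list result sections for the other senses after it.
default_order = sorted(clicks, key=clicks.get, reverse=True)
print(default_order)  # ['neuro-linguistic programming', 'natural language processing']
```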

Clusty is doing a nice job, certainly better in some respects than Google. Google is, after all, the market leader; their winning strategy is not to innovate in one product but rather to diversify into many. Google is leaving it to other companies (Vivisimo, and now, suddenly, Microsoft) to innovate.

Now, one working solution that might supply context, and hence working disambiguation, would be for Google users to have some sort of client-side Google app that contains a model of their personal preferences and proclivities and re-weights search results on the client side. Another, purely server-side, approach would be to model preferences for every IP address that issues a search query. Or, now that Google has a myriad of authentication-based services, search behaviors could be tied to accounts. The problem is that I'm convinced Google has no economic incentive to innovate with search.
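
A minimal sketch of the client-side idea, assuming the preference model is just a set of weighted keywords and the results arrive as plain text snippets (all names and numbers are invented for illustration, not any real Google API):

```python
# Hypothetical sketch: re-rank search results on the client side using a simple
# model of the user's preferences expressed as weighted keywords.

def preference_score(snippet, preferences):
    """Score a result snippet by summing the weights of preference keywords it contains."""
    text = snippet.lower()
    return sum(weight for term, weight in preferences.items() if term in text)

def rerank(results, preferences, blend=0.5):
    """Blend the engine's original ordering with the personal preference score."""
    n = len(results)
    def combined(item):
        rank, snippet = item
        engine_score = (n - rank) / n  # higher for results the engine ranked earlier
        return blend * engine_score + (1 - blend) * preference_score(snippet, preferences)
    return [snippet for _, snippet in sorted(enumerate(results), key=combined, reverse=True)]

# Example: a user whose model leans toward computational linguistics.
prefs = {"parsing": 1.0, "corpus": 0.8, "linguistics": 0.6}
results = [
    "NLP: neuro-linguistic programming techniques for personal change",
    "Statistical parsing and corpus methods in natural language processing",
]
print(rerank(results, prefs))
```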

Matthew Hurst

Patrick,

Thanks for the thoughtful comment. I think your last sentence is a good summary point.

Harish Kumar

I very much concur with your observation that we could do quite a bit about word sense disambiguation from the interface side. It is rather refreshing to read this, given the extent to which server-side processing is emphasised in this context.

I suppose that blog publishing interfaces would be a good starting point for this. All that is needed is a simple interface that offers to disambiguate some of the terms just before posting. The disambiguation would of course have to be done using some kind of unique identifiers. The big blog search engines are in a good position to establish this set of identifiers.
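
A rough sketch of what such an interface hook might do, with invented sense identifiers standing in for whatever scheme the big blog search engines might agree on:

```python
# Illustrative sketch (not any real blogging API): scan a draft post for known
# ambiguous terms and annotate them with unique sense identifiers chosen by the author.

AMBIGUOUS_TERMS = {
    # hypothetical identifiers a blog search engine might standardise on
    "NLP": ["sense:natural-language-processing", "sense:neuro-linguistic-programming"],
    "Java": ["sense:programming-language", "sense:island", "sense:coffee"],
}

def suggest_senses(draft):
    """Return the ambiguous terms found in the draft together with candidate sense IDs."""
    return {term: senses for term, senses in AMBIGUOUS_TERMS.items() if term in draft}

def annotate(draft, chosen):
    """Embed the author's chosen sense IDs as inline markup a search engine could index."""
    for term, sense_id in chosen.items():
        draft = draft.replace(term, f'<span data-sense="{sense_id}">{term}</span>', 1)
    return draft

draft = "Some thoughts on NLP and search engines."
print(suggest_senses(draft))
print(annotate(draft, {"NLP": "sense:natural-language-processing"}))
```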

Given that bloggers are constantly looking to improve the chances of their posts being discovered, there is a natural incentive for them to adopt this. Not unlike tagging! It's just that semantic disambiguation would be a lot more useful.

Matthew Hurst

Harish,

I don't believe that this problem can really be solved at the user end - the added task of disambiguating terms manually is too much of a barrier. I do believe, however, that there is enough information in the data to automate the disambiguation. As for tagging, this is actually a pretty broken model, primarily because it doesn't use a meta-level of symbols - the tags are just more text. Research presented recently at WWW in Edinburgh (I don't have the citation at hand) showed that informationally, there isn't any real value in tags.
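
As a sketch of what "enough information in the data" could mean in practice, here is a crude co-occurrence disambiguator; the sense signatures are hand-picked assumptions, and a real system would learn them from the corpus rather than have them written by hand:

```python
import re

# Crude sketch: pick a sense for "NLP" by the overlap between the surrounding words
# and a signature vocabulary for each sense. The signatures are illustrative only.
SENSE_SIGNATURES = {
    "natural language processing": {"parsing", "corpus", "grammar", "semantics", "syntax"},
    "neuro-linguistic programming": {"therapy", "hypnosis", "coaching", "rapport", "reframing"},
}

def disambiguate(context):
    """Return the sense whose signature overlaps most with the context, plus the scores."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(words & signature) for sense, signature in SENSE_SIGNATURES.items()}
    return max(scores, key=scores.get), scores

sentence = "The workshop covered parsing, corpus construction and grammar induction in NLP."
print(disambiguate(sentence))
# ('natural language processing', {'natural language processing': 3, 'neuro-linguistic programming': 0})
```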

Alexandre Rafalovitch

Matthew,

How much of this is not because Google is not smart enough, but because NLP as Natural Language Processing is not the only term people use when talking about the subject area? There is also Computational Linguistics (different, but similar enough) and specialised field names, such as Head-Driven Phrase Structure Grammar.

On the other hand, NLP as in NeuroLinguistic Programming is one term that is used very much as a brand, with individual techniques (e.g. Reframing) not having anywhere near the same number of mentions.

Therefore, pages using NLP in the NeuroLinguistic Programming sense are cross-linked much more strongly and get higher Google standing.
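
A toy illustration of that effect, using a simplified PageRank-style power iteration over an invented link graph (it ignores details such as dangling-node handling):

```python
# Pages in a densely cross-linked cluster (the "brand" sense) accumulate more score
# than pages on a topic whose vocabulary is spread across several terms.
# The graph and page names below are invented for illustration.

DAMPING = 0.85

def pagerank(links, iterations=50):
    """Simple power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - DAMPING) / len(pages) for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing) if outgoing else 0
            for target in outgoing:
                new[target] += DAMPING * share
        rank = new
    return rank

# Three tightly interlinked "brand" pages vs. three loosely linked topic pages.
links = {
    "nlp-brand-1": ["nlp-brand-2", "nlp-brand-3"],
    "nlp-brand-2": ["nlp-brand-1", "nlp-brand-3"],
    "nlp-brand-3": ["nlp-brand-1", "nlp-brand-2"],
    "comp-ling":   ["hpsg"],
    "hpsg":        [],
    "nat-lang":    ["comp-ling"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```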

On the other issue, if you use Google search after signing into your Google account, they collect history on you and will eventually be able to tailor the results to your context. At the moment, I suspect it is too computationally expensive with their map-reduce algorithms, but that might change in the near future. They have already run similar experiments in Google Labs for reordering the results on the first page.

The comments to this entry are closed.
