March 20, 2008



It seems from the examples that they are attempting to categorize the content in the general case, without considering its context. We do it context-based, although when dealing with generic Web results some amount of "noise" is inevitable. I think they may have a better chance of success in a vertical, by creating solid taxonomies tuned for a specific area of content. That's why I think the challenge asks for app ideas for a vertical.

Frank Goertzen

Perhaps the proper semantic name for this contest should be the AmericanSemanticHackerContest.


Interesting examples. I'm guessing they got Maharashtra from MSR, since MSR appears on the Wikipedia Maharashtra page, though a good tokenizer would not have fallen for it.

I wouldn't expect a good disambiguation solution any time soon; the problem is just that hard. Personally, I find it even more striking that what they're looking for is a business plan. So they have some technology, and it might not be perfect, but keep in mind how tough the problem is -- categorizing some random text without any domain restriction. That's extremely ambitious. But what they're lacking is a good business application for it. Isn't this indicative of text mining in general? Great algorithms and ideas, but not many viable practical applications?

I think the bigger question is what are good applications of "crummy NLP" in Church and Hovy's sense...

