
May 06, 2007

Comments

p-air

Another point that Prof. Lotfi Zadeh (from the Computer Science Division, Dept. of EECS, Berkeley) made, which I thought was well explained, was that "imprecision in natural language cannot be dealt with through the use of bivalent logic and probability theory. What are needed are new concepts and new techniques" beyond what's available today. Where he summarized the future of search as being "question answering", he elegantly elaborated on the shortcomings of natural language. I have to agree that communication and question answering go beyond natural language alone. Communication and answers are always relative, frequently worked out through some exchange between people, relying not just on the words used but also on the context in which they're used, where that context can be one's facial expressions, tone of voice, time of day or time in history, our mutually shared experiences, or all of the above and other contexts. Even the written word carries baggage along these lines, except perhaps for tone of voice and facial expressions.

The use of language to understand intent, as positioned by NLP, falls short in that language is but one channel in a multi-channel communication. Understanding a question by its words alone falls short of truly understanding the question. This leads me to the thought that Prof. Zadeh may have it right.

It was good meeting you there, Matt, and great to hear your perspectives live.

Kevin Burton

The fact that the majority might not be correct is exactly why you use a trust metric.

Trust certain nodes over others.

Imagine these applied to voting systems. :)
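A minimal sketch of the idea, for readers who want something concrete. This is only one simple reading of "trust certain nodes over others": each node's vote is weighted by a trust score. The function names, node names, and trust values are all hypothetical and do not come from the discussion.

```python
from collections import Counter, defaultdict


def simple_majority(votes):
    """Return the answer given by the largest number of nodes."""
    return Counter(votes.values()).most_common(1)[0][0]


def trust_weighted(votes, trust):
    """Return the answer with the largest total trust behind it.

    votes: dict mapping node name -> answer
    trust: dict mapping node name -> non-negative weight (assumed scores)
    Nodes with no trust entry count for nothing.
    """
    totals = defaultdict(float)
    for node, answer in votes.items():
        totals[answer] += trust.get(node, 0.0)
    return max(totals, key=totals.get)


# Hypothetical data: three low-trust nodes agree, one high-trust node dissents.
votes = {"n1": "X", "n2": "X", "n3": "X", "n4": "Y"}
trust = {"n1": 0.2, "n2": 0.2, "n3": 0.2, "n4": 0.9}

print(simple_majority(votes))        # the raw majority answer
print(trust_weighted(votes, trust))  # the answer favoured once trust is applied
```

With these made-up numbers the raw majority and the trust-weighted result disagree, which is the whole point of the metric. How the trust scores themselves are obtained is the hard part and is not addressed here.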

Mike Love

I'm pretty baffled by Zadeh's comments and the questions about correctness. It seems like a resignation that is only acceptable outside of business. Imagine people at a search company saying "There is just too much context to try to figure out what this person is looking for based on keywords alone."

The natural language search panel devolved into the quest for a perfect question-answer machine that knows, for instance, that Tesla invented the light bulb. It's not clear to me why we would want our search technology determining the correctness of the pages. I just want search to be best at finding the pages I've described.

Patrick Herron

Using massive volumes of data creates problems if term interdependence is relied upon. Building representations of text collections using Latent Semantic Indexing, for example, becomes counterproductive when a document set reaches a certain size (somewhere on the order of 100,000 documents). I imagine that's the very source of "he" replacing "she." LSI is one of many term interdependence schemes that are supposed to be clever and are often used to represent features of documents in large data sets. Using bigrams, trigrams, phrasers, etc., are all clever approaches for which you will receive nice stars from your professors. But in practice they are not only a pain; they are ultimately really dumb, truly, as document sets get larger and more "realistic." The lesson is not that data sets can become too large. There are approaches that mitigate the tyrannical majorities of pure frequencies. Ultimately term independence must be enforced.
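To make the distinction concrete, here is a rough sketch contrasting a term-independent representation (plain unigram counts) with one simple term-interdependent representation (adjacent-word bigrams). The toy documents and helper names are hypothetical. It only illustrates that the interdependent feature space grows faster than the vocabulary; it does not reproduce the LSI behaviour or the 100,000-document threshold mentioned above.

```python
from collections import Counter


def unigram_features(tokens):
    """Term-independent view: each word is counted on its own."""
    return Counter(tokens)


def bigram_features(tokens):
    """Term-interdependent view: each adjacent word pair is a feature."""
    return Counter(zip(tokens, tokens[1:]))


def feature_space(docs, featurize):
    """Collect the distinct features seen across a list of documents."""
    space = set()
    for doc in docs:
        space.update(featurize(doc.split()))
    return space


# Hypothetical toy corpus: the same three words in different orders.
docs = [
    "query matches document",
    "document matches query",
    "matches query document",
]

unigram_space = feature_space(docs, unigram_features)
bigram_space = feature_space(docs, bigram_features)

print(len(unigram_space), "unigram features")
print(len(bigram_space), "bigram features")
```

Even on three tiny documents the bigram space is larger than the vocabulary, and the gap widens as the collection grows, so each interdependent feature is backed by fewer observations.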

Alas, too clever, whether by algorithm or via learning of large data sets, is never smart enough.

I still don't really know what "natural language" means.

The comments to this entry are closed.
