A subtle but powerful feature on Google is question answering. If you type in a question like 'what is the population of UK', you will see an answer at the top of the page.
When I was working at the University of Edinburgh, I was involved in a project that required the extraction of information from tables. To cut a long story short, this train of research resulted in my thesis: Understanding Tables in Text.
Why do I mention this? Google is using table understanding to come up with the answers to these questions. If you take a look at the source in the above example, you can see that it is a view of the data presented in the CIA World Factbook. The page is, in fact, a big table of country names and population statistics. If you browse the other listed sources, you will find that they all contain some element of structured content pertaining to the answer to the question. It may not be an explicit table - there may be, for example, the name of the country and the key-value pair population: #.
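To make the idea concrete, here is a minimal sketch of how a question could be answered from key-value pairs alone. It is my own illustration, not Google's implementation: the PAGES dictionary, the question pattern and the function names are all assumptions, and the "#" stands in for a number as it does above.

```python
import re

# Hypothetical structured content keyed by entity name. The religion line is
# the one quoted in this post; "#" is a placeholder, not a real statistic.
PAGES = {
    "nepal": "Religion: 90% Hindu, 5% Buddhist, 3% Muslim, 2% other\nPopulation: #",
}

# Assumed question shape: "what/who is/are the <attribute> of <entity>?"
QUESTION = re.compile(
    r"^(?:what|who) (?:is|are) the (?P<attr>.+?) of (?P<entity>.+?)\??$",
    re.IGNORECASE,
)


def parse_pairs(text):
    """Turn 'Key: value' lines into a dict with lower-cased keys."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            pairs[key.strip().lower()] = value.strip()
    return pairs


def answer(question):
    """Return the value stored for the asked attribute, or None."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    page = PAGES.get(match.group("entity").lower())
    if page is None:
        return None
    return parse_pairs(page).get(match.group("attr").lower())


print(answer("what is the religion of nepal?"))
```

Note that nothing here understands the question. The attribute name is matched as a literal string against the keys found on the page.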
Once you grok this, you can get Google to do some impressive party tricks. By entering 'what is the religion of nepal?' you can find an answer: 'Religion: 90% Hindu, 5% Buddhist, 3% Muslim, 2% other' from this page. By reverse engineering the table to create a question, you can get Google to answer something very interesting, like: 'who are the major trading partners of nepal?' It is a neat trick that suggests real intelligence in the machine.
Of course, as Google is doing a lightweight trick which avoids any real semantics, you can get it to give odd answers:
- What is the density of France? 110.
- Who is the Queen? is the female head of a royal family.
- Who is the King? town (1990 pop. 4,059), Stokes Co., North, North Carolina, 15 mi/24 KM NNW of Winston-Salem;
- Where is the end? Country: UK
My guess is that for certain sources (like Wikipedia) which they have directly wrapped, they trust the answers. For other sources, where a loose match is found, they require multiple sources of evidence - what I call the WoodStein inference paradigm (Woodward and Bernstein wouldn't report anything unless they had multiple sources).
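My guess can be written down as a small sketch. Everything in it is an assumption made for illustration: the set of trusted sources, the threshold of two agreeing sources and the function name are mine, and I have no knowledge of how Google actually weighs evidence.

```python
# Assumed for illustration: sources whose answers are accepted on their own.
TRUSTED_SOURCES = {"wikipedia"}

# Assumed threshold: how many distinct untrusted sources must agree.
MIN_AGREEING_SOURCES = 2


def choose_answer(candidates):
    """Pick an answer from (source, answer) pairs, or return None.

    A trusted source wins outright. Otherwise an answer is reported only
    if enough distinct sources back it up.
    """
    for source, candidate in candidates:
        if source in TRUSTED_SOURCES:
            return candidate

    # Group the distinct sources behind each candidate answer.
    support = {}
    for source, candidate in candidates:
        support.setdefault(candidate, set()).add(source)
    if not support:
        return None

    best, sources = max(support.items(), key=lambda item: len(item[1]))
    return best if len(sources) >= MIN_AGREEING_SOURCES else None
```

With this rule a single loosely matched page is never enough by itself, which is the Woodward and Bernstein part of the idea.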
Google plans to roll out this feature gradually as it becomes more powerful and as the answers become more reliable. When they step over the threshold from simple text-based answers to actual semantics, they will have taken a first step into a world where search no longer means 'give me a list of documents'.

