There were a couple of papers that I really liked at WWW. One was "Mapping the World's Photos" by David Crandall et al.; another was "SOFIE: A Self-Organizing Framework for Information Extraction" by Fabian Suchanek et al. Both papers work in different domains, but both combine features from orthogonal spaces to automatically mine new facts from a data set: images, tags and geocoding in the first case, and entities, relations, text patterns and logic in the second.
However, once the initial impression of how great these systems are wears off (and they are great), one realises that the facts they surface are already known. The first system excels at discovering landmarks and images of landmarks, but landmarks are a well-known body of knowledge: by their very definition, landmarks are facts that have already been discovered. In the second case, the fact used in the example (and we shouldn't, of course, judge the entire system by one example) was also well known.
What one would want to see in these papers, and certainly in their presentations, is the long tail of facts. The head of the knowledge-sphere is well known by definition; rather than discovering it, we should assume it. The long tail will have weaker signals, and that is where the power of these systems really matters.
Other papers in this and related areas from the conference include:
- "Exploiting Web Search to Generate Synonyms for Entities" by Surajit Chaudhuri et al.
- "Measuring the Similarity between Implicit Semantic Relations from the Web" by Danushka Bollegala et al.
Actually, mapping the world's photos does help you discover landmarks that are largely unknown. The "Mapping the World's Photos" paper builds on our own work at Yahoo! (see their "Related Work" section, and the landmark/World Explorer papers here: http://scils.rutgers.edu/~mor). When we did that work, we created a live exploration/visualization demo, which is still up and running at http://tagmaps.research.yahoo.com/worldexplorer.php. One of my favorite examples of discovery in that interface is the "Yoda" tag that appears when you zoom into San Francisco's Presidio. I did not know about that "landmark" until we had built this new exploration system.
So, yes, these are "discovered" facts, but they are known only to some. The beauty of this analysis, as with any good visualization tool, is that it brings this information to the surface. What makes the difference is the flexibility of seeing information at the right resolution: very detailed information appears when I zoom into SF, which I know well, whereas I get a high-level view of Berlin, where I've never been.
In addition, we suggested using these extracted landmarks when someone searches for photos of a region; e.g., photos of NYC are really a set of photos of its landmarks. This idea was even bucket-tested in Yahoo! Image Search, though I am not sure what its current status is.
Finally, our "landmark" work focused on selecting representative photos for these landmarks, which remains an issue even when you already know what the landmarks are.
Posted by: Mor | May 11, 2009 at 12:18 PM
There is an interesting approach to IR systems and a good review of Knowledge Management Systems in this blog. You should check it out.
http://whatisprymas.wordpress.com/
Posted by: Antonio | April 12, 2010 at 07:47 AM