I spent Thursday and Friday last week in the Bay Area. On Friday I participated in Berkeley's Future of Search (FoS) event; Thursday was spent partly chatting to people at BuzzLogic (thanks, Todd, for breakfast at a great little place in South Park) and to Barney et al. at Powerset.
I had previously considered BuzzLogic to be a competitor to BuzzMetrics. However, I now have a better understanding of what they are up to and how influence, if harnessed and measured correctly, is currency that they can take to publishers/bloggers, enterprises and advertisers. In other words - while they are fundamentally in the measurement business, the application of that measurement is horizontal.
I had previously thought of Powerset as a competitor to Google - well, I guess I'm still right about that. I fully understand the skepticism that previous efforts in this space have established, but having seen the latest version of the demo, and having seen the presentations made at FoS by Yahoo! and Google, it seems clear that strategically they are on target and technologically they have a winner. The key, then, is release, user education, product management and so on.
The Future of Search event was a quick progression of short talks and panels. The talks, given by Andrei Broder (Yahoo!), Peter Norvig (Google), Marti Hearst (UC Berkeley) and Eric Brill (Microsoft), covered a number of areas. Some of the highlights of the talks:
- Andrei Broder characterized the evolution of search in terms of generations. These were as follows:
  - Syntactic search
  - The use of meta-data (anchor text, links, etc.)
  - Semantic search (by which I believe he also meant intention understanding of a sort)
  - The transformation from information retrieval to information supply (here he talked about a system that supplies additional annotations on current views of, say, a document - e.g. annotations on a newspaper; the point is that there is no explicit query required to receive information)
- Peter Norvig, rather than providing a forward-looking view in his talk, characterised the current bucketing of challenges in terms of what he called the 'ice cream cone of search.' This is really what is called in common parlance 'the long tail of search' (if you're going to use an ice cream as a metaphor, ice cream-ness needs to figure: flavours, toppings, melting, etc.). Essentially, the idea was that there are queries (e.g. 'clinton') which occur very often and for which there is a lot of data. In such cases, there are many opportunities to derive value from the data (there is a lot of it), and much reason to do so (many customers). On the other hand, there is a long tail (or a tip-of-the-cone in Peter's language) of queries with low frequency and for which there is little data. In these cases, more sophisticated methods are needed to add value. The most interesting thing that Peter said (to me at least), however, was not in his presentation but in the Q&A. He acknowledged that there is still much to be done (for Google) in terms of understanding document structure, in particular the interpretation of tables. I've never seen any real evidence of good document analysis in mainstream search, which is very surprising given that the constraints document structure provides can only help with relevance and other issues.
- Marti Hearst summarized work being done at Berkeley which touched on faceted search and the role of tagging in social media. I'm a tag skeptic, but I was interested in some of the ideas here which explored the relationship between tags, ontologies and faceted search. To me, the problem with tags is the mess they introduce (they aren't clearly meta-data).
- Eric Brill discussed fundamental issues relating search to personalization and intention.
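The head-versus-tail split in Peter Norvig's 'ice cream cone' can be made concrete with a small sketch. This is only an illustration, not anything shown at the event: the function name `split_head_tail`, the `min_count` cut-off and the toy query log are all assumptions of mine.

```python
from collections import Counter


def split_head_tail(query_log, min_count=3):
    """Split a query log into frequent 'head' queries and rare 'tail' queries.

    query_log: iterable of raw query strings (hypothetical input format).
    min_count: arbitrary, illustrative cut-off between head and tail.
    """
    # Normalise lightly so 'Clinton ' and 'clinton' count as one query.
    counts = Counter(q.strip().lower() for q in query_log)
    head = {q: n for q, n in counts.items() if n >= min_count}
    tail = {q: n for q, n in counts.items() if n < min_count}
    return head, tail


# Toy log: one popular query and two rare ones.
log = ["clinton"] * 5 + ["Clinton ", "table parsing in pdf", "rare query"]
head, tail = split_head_tail(log)
```

Here the head holds the single popular query with plenty of data behind it, while the tail holds the rare queries for which, as Peter argued, more sophisticated methods are needed.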
The event is timely and there have been a number of recent posts on the topic which contribute to the discussion:
- The Hakia blog has an interesting (if ad hoc) poll around what users want to see in search engines.
- Arnaud's summary of Scott Newcombe's question about the future of Google.
- Greg Linden's post quoting Dyson, who suggests that search ought to be replaced by a different type of interaction with online data and services.
My contribution to the debate centered on the distinction between social media (e.g. blogs) and the mainstream web. Currently, the mainstream search engines are doing a poor job of integrating social media in their results. Conversely, blog search engines take no advantage of the mainstream web to analyse the influence and content of blogs. Social media may be thought of as an annotation stream on the mainstream web. This model suggests some interesting ways to get leverage out of the differences between these two areas of the web. The mainstream web can be used to rank social media content, and social media (blog) content could be used to rank the mainstream web.
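To make the two-way ranking idea a little more tangible, here is a minimal sketch under some loud assumptions: blogs are represented only by the mainstream URLs they link to, each mainstream page already has some authority score (from whatever link analysis you like), and the names `rank_both`, `blog_links` and `page_authority` are invented for the example. It is a toy, not a description of any real engine.

```python
from collections import defaultdict


def rank_both(blog_links, page_authority):
    """Score each side of the web using the other.

    blog_links: dict mapping blog name -> set of mainstream URLs it cites
                (hypothetical input format).
    page_authority: dict mapping URL -> prior authority score, assumed given.
    """
    # Social media -> mainstream: count the distinct blogs annotating a page.
    page_buzz = defaultdict(int)
    for urls in blog_links.values():
        for url in set(urls):
            page_buzz[url] += 1

    # Mainstream -> social media: a blog inherits the average authority
    # of the mainstream pages it cites (0.0 if it cites none).
    blog_score = {}
    for blog, urls in blog_links.items():
        urls = set(urls)
        total = sum(page_authority.get(u, 0.0) for u in urls)
        blog_score[blog] = total / len(urls) if urls else 0.0

    return dict(page_buzz), blog_score


# Toy data: two blogs cite page p1, one of them also cites p2.
links = {"blog_a": {"p1", "p2"}, "blog_b": {"p1"}, "blog_c": set()}
authority = {"p1": 0.8, "p2": 0.2}
buzz, scores = rank_both(links, authority)
```

In this toy, p1 collects more 'buzz' than p2 because more blogs annotate it, and blog_b outranks blog_a because it cites only the higher-authority page. A real system would presumably iterate between the two scores rather than compute each once.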