Danny Sullivan has an interesting post over at Search Engine Land about the integration of real-time (i.e. Twitter) search results into Google's results page and how they performed when the American actress Brittany Murphy died.
The most interesting thing about the post, and I think this is something happening across the board when it comes to real-time data and search, is that while Sullivan makes reasonably fair and valid observations, it is not clear what requirements he has in mind for the ideal system, or when he expects real-time search to really be of value.
Firstly, the case of Murphy's death isn't time-critical from his point of view as a user (what action can he take?).
Secondly, and perhaps most importantly, there is no real discussion of the relationship between latency and quality. It seems to me a matter of course that the lower the latency (time between publication and surfacing in search results) the lower the quality. This is simply because there is less data to go on to make a determination of the value or quality of the 'document' (i.e. a tweet).
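To make the latency and quality trade-off a little more concrete, here is a minimal sketch. Everything in it is my own assumption rather than anything Google or Sullivan describes: I pretend each tweet collects "positive" signals (say, retweets) out of some number of views, and I use the lower bound of a Wilson score interval as a cautious quality estimate. The counts in `snapshots` are made up for illustration.

```python
import math


def wilson_lower_bound(positives: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion.

    Used here as a cautious quality estimate (an assumption of this sketch):
    the fewer observations we have, the lower the bound.
    """
    if total == 0:
        return 0.0  # no evidence at all, so the cautious estimate is zero
    p = positives / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


# Hypothetical snapshots of the same tweet at increasing latency.
# The observed ratio stays at 0.6; only the amount of evidence grows.
snapshots = [("1 minute", 6, 10), ("10 minutes", 60, 100), ("1 hour", 600, 1000)]

for label, positives, total in snapshots:
    observed = positives / total
    bound = wilson_lower_bound(positives, total)
    print(f"{label:>10}: observed {observed:.2f}, lower bound {bound:.3f}")
```

The observed ratio is the same in every snapshot, but the cautious estimate only approaches it as evidence accumulates. A system that has to rank the tweet after one minute is working with the least trustworthy number.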
Thirdly, the results that we are seeing in the scenario he investigates perhaps indicate the inherent nature of the content. That is to say, much of the content is simply a reaction to the news published elsewhere and is at best an indication of the interest in the story and at worst an indication of the botware and so on that merrily populates these streams.
If you think about scoops in this type of story, the source would have to be somehow close to the event. Given that we can't predict who is going to keel over next, the source could effectively be some random person (with or without a Twitter account). Is that person going to drop a message to TMZ? Is that person likely to be well connected and thus have their gossip spread by their loyal followers?
I'd like to hear Sullivan's take on what the valid scenarios are for this type of content. My take: longer format content (blogs) has a good chance of substantiating real stories; short form content (Twitter) is useful not for simply reflecting what is being passed around but for situations in which the social aspects are part of the story (Iran) or when the bit rate is low and the immediacy of the situation promotes clarity (earthquake). In other words, Twitter turns the search problem from a collect, index and rank problem (serving documents) into a data mining problem (making sense of all these documents as a whole). Consequently, it has the potential to radicalize the form of search itself (which is why Google's naive integration into the SERP is so jarring to Sullivan).
There are two aspects of real-time search that we should be watching out for in 2010. The first is the continued experimentation with integrating real-time results into traditional search results. The second is the impact this data will have on discovering and surfacing other content (i.e. not the real-time documents themselves, but the things they point to).