Any system that uses some sort of inference to generate user value is at the mercy of the quality of its input data and the accuracy of its inference mechanism. As neither of these can be guaranteed to be perfect, users of the system will inevitably come across incorrect results.
In web search we see this all the time when irrelevant pages are surfaced. In the context of track // microsoft, I see it in the form of articles that are assigned to the wrong cluster, or articles that are incorrectly assigned to no cluster at all, becoming orphans.
It is important, therefore, to take these imperfections into account when building the interface. This is not a matter of pretending that errors don't exist, or of tricking the user. Rather, it is a problem of eliciting an appropriate reaction to error. The average user is not conversant in error margins and the like, and thus tends to over-weight errors, leading to a perception of poorer quality even in the good results.
In designing the UI for track // microsoft, I dealt with these challenges in the following way.
Firstly, incorrect cluster assignment. The key goal of the cluster of articles associated with a story is not to provide access to every article (the user isn't going to read every single opinion about whether or not Rovio will make a Windows Phone version of Angry Birds). Rather, it is to provide an indication - a visual indication - of the amount of content being created about that topic. If that happens to be 10 articles +/- 2, it shouldn't really matter to the user. The design, then, bundles the large mass of articles under their feed name rather than their article name.
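To make the idea concrete, here is a rough sketch of rendering a cluster by feed rather than by article. This is not the site's actual code; the data layout and names are hypothetical.

```python
from collections import defaultdict

def render_cluster(articles):
    """Summarize a story's articles by source feed. The UI conveys
    volume (how much is being written about the topic) rather than an
    exhaustive list of headlines, so a couple of misclustered
    articles barely changes what the user sees."""
    by_feed = defaultdict(list)
    for article in articles:
        by_feed[article["feed"]].append(article)
    # One line per feed, e.g. "Engadget (3)"
    return [f"{feed} ({len(items)})" for feed, items in by_feed.items()]
```

A cluster of 10 articles +/- 2 renders as roughly the same handful of feed names either way, which is exactly the point.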
Secondly, the incorrect non-assignment of articles to clusters. The problem here is that there isn't enough data available to determine that two articles are about the same topic (there are, of course, many improvements to be made on that front, but that is the subject of another post). The design addresses this with two columns: the first shows the clusters of stories, giving the user a view of what the major topics are; the second shows a more immediate list of articles for which no clustering is claimed. This still exposes these articles while giving the user room to forgive the system for not grouping them.
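Again as a hypothetical sketch (the field names are made up, not taken from the site), the two-column split might look like this:

```python
def split_columns(articles, clusters):
    """Partition articles into the two columns: clustered stories on
    the left, and a recency-ordered stream of unclustered orphans on
    the right. An article the clusterer missed still appears; the
    stream simply makes no claim that it has been grouped correctly."""
    clustered_ids = {a["id"] for c in clusters for a in c["articles"]}
    orphans = [a for a in articles if a["id"] not in clustered_ids]
    stream = sorted(orphans, key=lambda a: a["published"], reverse=True)
    return clusters, stream
```

The stream sets a different expectation than the cluster view: it presents raw recency, so an orphaned article reads as "new" rather than as a clustering failure.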
By taking the UI into account, one can, hopefully, avoid some of the concerns generated by incorrect inferences while retaining all the benefits of an automated system.
In some sense, web search has been doing this for a long time. However, the method used there is more intrinsic to the nature of the application. By positioning itself as an intermediary between humans (the searcher and the author of the document), the search engine can be forgiven for errors to some extent, while at the same time leveraging the human's ability to hunt and peck through the results. As we move into the world of the entity web, we will have to get far more sophisticated at balancing utility against the over-weighting of negative perceptions.
Please take a look at the site and let me know what you think.
I dig it, though it's hard to overlook the expectation, built from years of looking at Techmeme daily, that the clusters are ranked by importance and that the stream is the same content ranked by recency and linked to its respective location in the importance-ranked left bar.
For anyone who hasn't spent years training themselves to have those expectations, though, I think I can see what you're saying: clusters help show what's important, while the stream captures everything else and sets the expectation that not everything is in a cluster? Generally speaking, I really appreciate reading about the idea of visual design being used to offset end users' exaggerated negative perceptions of quality.
Posted by: Marshallk | March 28, 2012 at 02:05 AM
Excellent point. UI can indeed be a way to indicate the accuracy of inference (or the lack of it) on a set of results to a user. I would also add workflow design as a means of utilizing less-than-perfect inference algorithms. To quote an example: disclosures in financial research are typically mind-numbing to generate, especially one called the 'mentioned company' disclosure, which basically requires a report author to state which companies have been spoken about in the document. Apart from it being time consuming to read, say, 50 pages of text + tables, it is also difficult to do a brute-force search, since IBM could also be called Big Blue and so on. Here we have used classification algorithms to great effect while also structuring the workflow + UI in a manner that minimizes the chance for recognition errors to slip through.
Again, excellent post. More than anything, it helps seal the case for making UI design an essential element in building an inferencing system.
Posted by: Account Deleted | March 31, 2012 at 05:27 AM