Any system that uses some sort of inference to generate user value is at the mercy of the quality of the input data and the accuracy of the inference mechanism. As neither of these can be guaranteed to be perfect, users of the system will inevitably come across incorrect results.
In web search we see this all the time with irrelevant pages being surfaced. In the context of track // microsoft, I see this in the form of articles that are added to the wrong cluster, or articles that are assigned to no cluster at all, becoming orphans.
It is important, therefore, to take these imperfections into account when building the interface. This is not necessarily a matter of pretending that they don't exist, or tricking the user. Rather, it is a problem of eliciting an appropriate reaction to error. The average user is not conversant in error margins and the like, and thus tends to over-weight errors, leading to a perception that the good results are of poorer quality than they actually are.
In designing the UI for track // microsoft, I dealt with these challenges in the following way.
Firstly, incorrect cluster assignment. The key goal of the cluster of articles associated with a story is not to provide access to all the articles (the user isn't going to read every single opinion about whether or not Rovio is going to make a Windows Phone version of Angry Birds). Rather, it is to provide an indication - a visual indication - of the amount of content being created about that topic. If this happens to be 10 articles +/- 2, it shouldn't really matter to the user. The design, then, bundles the large mass of articles under their feed name rather than article name.
Secondly, the incorrect non-assignment of articles to clusters. The problem here is that there isn't enough data available to determine that two articles are about the same topic (of course, there are many improvements to make on that, but that is the subject of another post). The design addresses this by having two columns - the first shows the clusters of stories, giving the user a view of what the major topics are; the second shows a more immediate list of articles for which clustering doesn't matter. This still exposes orphaned posts while giving the user room to forgive the system for not grouping them.
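To make this failure mode concrete, here is a minimal sketch of how threshold-based clustering produces orphans. This is an illustration only - the post doesn't describe the site's actual algorithm - and the title tokenization, cosine similarity, and threshold value are all my assumptions:

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(titles, threshold=0.3):
    """Greedy single-pass clustering (hypothetical): each article
    joins the first cluster whose seed article it resembles closely
    enough; otherwise it starts a cluster of its own. Singleton
    clusters are the 'orphans' discussed above."""
    vectors = [Counter(t.lower().split()) for t in titles]
    clusters = []  # each cluster is a list of article indices
    for i, v in enumerate(vectors):
        for c in clusters:
            if cosine(v, vectors[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

clusters = cluster([
    "angry birds coming to windows phone",
    "rovio brings angry birds to windows phone",
    "kinect research demo",
])
```

With sparse data (short titles, few shared terms) even genuinely related articles can fall below the threshold, which is exactly why the second column tolerates unclustered posts instead of hiding them.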
By designing the UI with these imperfections in mind, one can, hopefully, avoid some of the concerns generated by incorrect inferences while retaining all the benefits of an automated system.
In some sense, web search has been doing this for a long time. However, the method used there is more intrinsic to the nature of the application. By positioning a search engine as an intermediary between humans (the searcher and the author of the document), the engine can be forgiven to some extent for errors while at the same time leveraging the human's ability to hunt and peck through the results. As we move into the world of the entity web, we will have to get far more sophisticated in balancing utility against the over-weighting of negative perceptions.
Please take a look at the site and let me know what you think.
I've rolled out a small update to track // microsoft (a site for tracking blogosphere buzz about Microsoft - my employer) which provides a more compact view of the cluster of articles on a related topic.
As you can see from the cluster below, the story around the retina display for Windows 8 is presented in a more space-efficient view. This has the downside of hiding the per-story Twitter and Bitly statistics, but I'm working on a way to bring those to the fore. Stay tuned.
Microsoft is an incredibly diverse company. I've just celebrated 5 years here and still don't have a full appreciation of the breadth and depth of products and innovation that the corporation generates. After BlogPulse was unplugged, I felt something of a hankering to continue to follow the buzz around Microsoft, partly as a way to better follow what the company is doing and how it is perceived in the online world.
I'm a big fan of TechMeme, but it has some challenges when it comes to tracking news and trends around a specific company. Firstly, I don't know the sources that are used and the ranking mechanisms in place, so it is hard to really understand quantitatively what it represents. Secondly, with limited real estate, while a big story may be happening for a company of interest, it can be crowded out by other events. Thirdly, I can't help but think it has a strong valley culture bias. Fourthly, it hasn't evolved much in the years that I've been visiting it.
So I've put together an experimental site called track // microsoft which follows a few blogs, clusters posts that are related and uses Bitly and Twitter data to rank the articles and clusters of stories. In doing this, I observed that many posts in the blogosphere about Microsoft would contain videos (be they of Windows 8 demos or the latest research leveraging the Kinect platform).
The site has three basic columns. The first contains established stories, represented by clusters of articles. The second represents a more timely view of posts. Both of these columns use Bitly and Twitter statistics to rank, with a bias to recency. The third column shows videos which have been embedded in posts multiple times.
Thus far, I find the stories and videos that surface here to be very interesting. This is where I first learned about: