A quick reminder: the pre-conference registration period for ICWSM 2008 will be over on the 7th of March. If you are coming to the meeting, check out our CrowdVine network - a great place to meet others who are going to be there.
There were a couple of sessions at CIFOO which related to two areas that I believe are very important. The first was a session on GapMinder/Trendalyzer - the software that Hans Rosling uses to illustrate and motivate his wonderful presentations. The second was a session on data visualization.
GapMinder is the birthplace of Trendalyzer, a piece of software that Google acquired recently. GapMinder has three great assets: a data collection (publicly available statistics describing economic, environmental and other national and cultural measures), some software (which is basically a scatter graph with a temporal aspect - elegant, but quite simple) and a personality (Hans Rosling).
In the session on data visualization, I brought up a number of issues which I believe get at a central question: how do we go beyond the list as the only interface to the huge wealth of information that can be found online? These included:
The user is lazy - users will discover the simplest path of least commitment.
Value of the web cannot be expressed as a list.
A user's investment in an interaction is related to the expected quality and value of results (thus, if you have low expectations, you will only suffer close to immediate response times, but if you have high expectations you will be prepared to wait longer for a response).
So what relates these two sessions? I believe that the Trendalyzer software is a great example of the type of step we need to make to better educate people about the power of visualization and the power of data types other than lists of documents. It is a single idea, a single way to look at things and such focus is a powerful way to keep the attention of the user and remove possible distractions and complexity.
Ola Rosling, who presented the project, had something of a tough time answering questions about the direction and plans for the system, however. Will corporations be able to upload their data and visualize it? Is the system intended purely for (high school) educational purposes? I assume that there is actually a well-thought-out plan for the system, but the Googlification of the project may make these objectives less than transparent at this time.
One clear answer that Ola gave which, I'm afraid, indicates a real lost opportunity, is that Trendalyzer will not involve any community aspect. The system does allow for the sharing of links that reconstitute a specific state of the interface, but there won't be any Many Eyes-style community. There are a number of reasons why I believe this to be a mistake:
Community will help to build attention and thus lead to increased visibility and educational opportunities.
Community will help refine existing data (those pesky 'official' stats may be less than true) and identify new sources of data.
Community will help develop the tool.
GapMinder and Trendalyzer have always been very cool things - I hope that somewhere there is a strong vision guiding them forward to even more ambitious heights leveraging their new home.
(BTW, I am aware that the GapMinder organization is separate from Google - it is just the software that was acquired).
Checking in with the ratio of blog posts mentioning Obama and those mentioning Clinton we see that Obama continues to trend upwards. The vertical scale shows the ratio (log scale) and the horizontal shows days starting from 27th of December 2007 up to 25th February 2008 (today).
Glenn Fannick shows similar data from media mentions (MSM I guess) - I'd like to see him provide the ratio graph like that above.
Something that I really liked at CIFOO was the presentation by Jason Hunter of MarkLogic's MarkMail product (Jason blogs on the MarkMail weblog). MarkMail is a growing repository of 8MM posts to mailing lists, growing at approximately 2k per day. There are two attractions for me to this system. Firstly, it deals with a nice community source: mailing lists. Secondly, they have made some innovations in the interface which break from the standard search interface.
A search brings up a multi-functional panel which includes a time series of posts that match the search term. In addition, information about the lists, authors and other details of the matched messages is provided. They also have a nice little touch which slides the interface left and right to provide access to the original full post.
Jason was really open about sharing some of the cool features of this system as well as some of the acknowledged challenges in the space (e.g. what is a sensible ranking function for this type of data set).
I'm back from the Collective Intelligence Foo Camp - it was a great experience. The format of the meeting, which is essentially a collection of spontaneous discussions, informal presentations and demos, really made me think about how to get the best value out of other, more traditional meetings (like ICWSM!). I met a bunch of great people (many of whom I had been reading in the blogosphere) and participated in some very engaging and rich discussions.
I plan to write a number of posts about the camp, but perhaps the best place to start in terms of covering the meeting is at the end, when the group discussed definitional issues of collective intelligence. Definitions are always problematic as they can often define away interesting related areas that aren't captured by some 'pure' or elegant description. Thus I think it worthwhile summarizing some of the dimensions which surfaced in the discussion.
Parallelism - a key aspect of collective intelligence is the notion that there are many agents, all of whom are working in parallel in some way.
Homogeneity - are all the agents identical?
Systemic effects - does the system that networks the agents contribute in some way to the quality or form of the result? Another way to think about this is: are all the agents doing work which is simply summed together linearly, implying that a single agent, with enough time, could effectively carry out the same task?
Efficiency - does the system result in quicker solutions?
Intelligence granularity - is the resulting behaviour producible by the component agents? In other words, is the emergent intelligence attributed to the system at a higher level than that of the agents? A single ant (which we assume is not intelligent to any great extent) couldn't devise an algorithm for the efficient gathering of food, but the colony can.
Of course, for definitions one could do worse than refer to Wikipedia:
Collective intelligence is a form of intelligence that emerges from the collaboration and competition of many individuals. Collective intelligence appears in a wide variety of forms of consensus decision making in bacteria, animals, humans, and computers. The study of collective intelligence may properly be considered a subfield of sociology, of business, of computer science, and of mass behavior — a field that studies collective behavior from the level of quarks to the level of bacterial, plant, animal, and human societies.
The use of the term 'collaboration' is interesting here as - one might argue - collaboration requires intention.
The display of expats (shown below) could, however, show much more. While it shows the breadth of expat Brits, it doesn't, for example, reflect the percentage by country population (which would be very interesting). The tool does, though, have some depth. One can explore regions and get to some country by country stats. With visualizations like this, one always has to make a decision between the simplicity of exploration and the power of combined elements.
While a graph from BlogPulse allows one to see the attention around certain keywords - as in the example below for 'obama' and 'clinton', it doesn't allow one to compare how these terms are doing relative to each other.
The graph below takes the percentage values from the above and plots them as ratios (Obama as a ratio of Clinton buzz and Clinton as a ratio of Obama buzz).
What we can see here is that while certain parties may like to characterize the battle for nomination as a 'some you win, some you lose' affair, the battle for attention appears to have a clear trend.
Note: producing this type of chart is pretty straightforward. The HTML page that presents the BlogPulse trend results includes client-side image map data with the percentages included as labels. One can transform this into appropriate tab-separated data for a spreadsheet and then compute the ratios.
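For anyone who would rather script the last step than use a spreadsheet, here is a minimal sketch in Python. It assumes the percentages have already been pulled out of the page into tab-separated text; the column names (date, obama, clinton), the function name and the sample values are all made up for illustration and are not real BlogPulse output.

```python
import csv
import io

def buzz_ratios(tsv_text):
    """Return (date, obama/clinton, clinton/obama) tuples from tab-separated percentages.

    Assumes a header row with hypothetical columns: date, obama, clinton.
    """
    rows = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for rec in reader:
        obama = float(rec["obama"])
        clinton = float(rec["clinton"])
        if obama == 0 or clinton == 0:
            continue  # a ratio is undefined when either term has no mentions
        rows.append((rec["date"], obama / clinton, clinton / obama))
    return rows

# Illustrative values only - not real BlogPulse percentages.
SAMPLE = (
    "date\tobama\tclinton\n"
    "2008-01-01\t2.0\t4.0\n"
    "2008-01-02\t3.0\t3.0\n"
    "2008-01-03\t6.0\t3.0\n"
)

for date, o_over_c, c_over_o in buzz_ratios(SAMPLE):
    print(f"{date}\t{o_over_c:.2f}\t{c_over_o:.2f}")
```

Days where either term has a zero percentage are skipped rather than plotted, which is a choice of this sketch and not something the chart above necessarily does.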
Update: here is the logarithmic version of the chart as suggested by Moritz Stefaner.
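A short sketch of why the log scale helps, using made-up numbers and a hypothetical helper name: on a linear scale a 2:1 lead plots at 2.0 while a 1:2 deficit plots at 0.5, so the two look lopsided. Taking the logarithm of the ratio puts them the same distance either side of zero.

```python
import math

def log_ratio(a_pct, b_pct):
    """Base-10 log of a/b; positive when a leads, negative when b leads."""
    return math.log10(a_pct / b_pct)

# Illustrative percentages only.
lead = log_ratio(6.0, 3.0)     # one term has twice the buzz of the other
deficit = log_ratio(3.0, 6.0)  # the same gap seen from the other side

print(lead, deficit)
print(math.isclose(lead, -deficit))  # the two curves mirror each other around zero
```

This is also why a single log-ratio line is enough: the second curve carries no extra information.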
Chris Anderson continues to jot down observations about FREEdom in the wild. His latest post is illustrated by an image of powerlines transmitting the word 'copy' (I guess indicating the transmission of 'free' copies of content over a network). Ironically, this illustration captures one of the major issues which Chris has yet to address: the externalities of free stuff.
Now, I'm quite sure that Chris fully understands this issue, but I do wish he would address it to some degree in his musings as he builds up to the book. I'm also quite open to the idea that Chris' concept of 'free' is quite different from mine and may somehow elegantly sidestep this issue (though this post does specifically equate cost with value, hinting that externalities - which are a cost - will be addressed, even though the post in question fails to do so). That being said, the idea of 'free' enterprise productivity tools is so paradoxical as to be almost ridiculous. Just as you can't print money, you can't simply make costs evaporate. You can certainly move those costs around a little.