The 3rd International Workshop on the Weblogging Ecosystem was held yesterday at the WWW Conference in Edinburgh. This was the final event in the series (next year will see the launch of a new conference in the area), and I felt it was extremely successful. The room was at capacity. That was partly due to the popularity of the workshop and partly due to some interesting organizational decisions by the conference organizers, more of which later.
I believe the papers at this workshop were the best set of all three events. In this post I'd like to summarize some of the points that really interested me. The full programme can be found on the workshop's home page. The workshop blog has posts summarizing the papers, along with pictures of the presenters in action!
The programme kicked off with a paper by Gilad Mishne, who interned with us (Nielsen BuzzMetrics) last year, and my colleague Natalie Glance. Their paper, Leave a Reply: An Analysis of Weblog Comments, examines the world of weblog comments. It has already received some commentary from the blogosphere.
A couple of papers applied social network analysis to blog data. I posted yesterday on one of them, Thomas Lento et al.'s The Ties that Blog: Examining the Relationships Between Social Ties and Continued Participation in the Wallop Weblogging System. Belle Tseng's group from NEC presented Discovery of Blog Communities based on Mutual Awareness, which has many similarities with some work I have briefly presented on this blog. It is fascinating to see communities emerge from the huge blogosphere graph. Defining and understanding these communities is at the heart of true weblog research. Both Lento's and Tseng's papers added a temporal dimension to this research. Tseng's paper also showed how, once a community has been discovered, its topics can be identified by mining keywords, as in this example:
The NEC paper made use of a data set that BuzzMetrics (Intelliseek at the time) released to support research in weblog analysis. The applied research group at BuzzMetrics has been fortunate to work within a company that values strong relationships with the academic community. The data set is now on general release.
Mike Thelwall of the University of Wolverhampton also presented work made possible by the data set. We chose the time period covered by the data because it contained a number of significant events. Mike presented Bloggers during the London attacks: Top information sources and topics, which analyzes this period and shows how the blogosphere reacted to the events. Mike's paper also mentioned some of the difficulties of working with the data set. The workshop had a very active discussion on how to carry out research in this space, the role of data sets, and how research on such data can be supported by free tools. Natalie's notes on the discussion capture some of these issues.
One of the hottest topics in the blogosphere just now, particularly from an industrial perspective, is internationalization (meaning, in general, coverage of non-English or non-US blogs). It was great to hear Yasaman Soltan-Zadeh present her group's paper, Experiments on Persian Weblogs, which looked at a number of aspects of Weblogistan. The data she presented on the connected components of the Persian blogosphere raise the question: are community structures the same or different across the regional subsets of the global blogosphere? This issue is also at the heart of Lento's paper, which compared the Chinese-language and English-language users of Microsoft's Wallop.
Another question I'd love to know the answer to is: how does spam vary regionally? We had three papers on weblog spam: Characterizing the Splogosphere (presented by Tim Finin from UMBC), Detecting Blog Spams using the Vocabulary Size of All Substrings in Their Copies (presented by Kazuyuki Narisawa from Kyushu University) and Collaborative Blog Spam Filtering Using Adaptive Percolation Search (by a group from KAIST in Korea). Each paper addressed a different aspect of blog spam analysis.
Krisztian Balog presented Decomposing Bloggers' Moods, a paper he co-authored with Maarten de Rijke (of ISLA, University of Amsterdam, the same group with which Gilad Mishne is associated). The paper continues the group's research on LiveJournal mood data.
Extracting Topics From Weblogs Through Frequency Segments, presented by Mizuki Oka of Tsukuba University in Japan, mined terms from weblog data based on the characteristics of their distribution over time. This is somewhat related to the work I presented at AAAI, entitled Temporal Text Mining. This type of analysis has clear applications in industrial settings, where clients are interested in emerging and declining trends in online conversation.
The workshop concluded with two papers that looked at the client side of the blogosphere: BLOGRANGER - A Multi-faceted Blog Search Engine (presented by Ko Fujimura of NTT) and Browsing System for Weblog Articles based on Automated Folksonomy (presented by Tsutomu Ohkura of Tokyo University). The latter sparked an interesting discussion about using social tags to build supervised classifiers for blog data. The point raised was that tagging systems (folksonomies) exist, in part, as a reaction against highly structured and automated metadata systems.
That concluded our workshop. The event clearly demonstrated the role of scientific research and its value to both industrial and academic institutions. Interesting and novel research was presented, and the questions and discussion made it clear that there is a strong community of researchers fascinated by the blogosphere. Science, academic work and industrial R&D form an ecology, one that all actors must embrace if progress is to be made, both in human knowledge and in industrial advantage.