Anjo follows up a post on conversation graphing with an updated visualization. In addition, relating to the upcoming workshop and the data set we are releasing, he asks:
Apart from stating that the data set consists of 10M posts little more is said about what potential participants are supposed to do with it. Throwing away 99,99% of the posts would be a good starting point, I think.
We released the data set for a number of reasons. Not least is that a number of researchers have asked for exactly what we are providing: a large, comprehensive collection of blog posts. The second reason, which I think Anjo might be looking for, is that we'd like to see what happens when the full spectrum of interesting research on blog data is pointed at a single data set. So the answer to the question "what are potential participants supposed to do with it?" is: do exactly what you are doing now, just do it with this data set!
In other words, we'd really like to see what Anjo and colleagues' conversation graphs look like when run over a large volume of data:
- will there be distinct types of conversation?
- what is the scale of conversations?
- do they span long or short time periods?
- how many participants?
- are they mainly trees, full graphs - what is their structure like?
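To make those questions concrete, here is a rough sketch of how they could be measured once posts and the links between them have been extracted. It is only an illustration: the `posts` and `links` layouts and the function name `conversation_stats` are assumptions made for this example, not the format of the released data set, and treating every connected group of linked posts as one "conversation" is just one possible definition.

```python
from collections import defaultdict
from datetime import datetime

def conversation_stats(posts, links):
    """posts: {post_id: (author, datetime)}; links: iterable of (from_id, to_id).

    Both layouts are hypothetical, chosen only for this sketch.
    """
    neighbours = defaultdict(set)
    edges = set()
    for src, dst in links:
        # Ignore self-links and links to posts outside the collection.
        if src in posts and dst in posts and src != dst:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
            edges.add(frozenset((src, dst)))

    seen, stats = set(), []
    for start in list(neighbours):  # posts without links are skipped
        if start in seen:
            continue
        # Depth-first search collects one connected group of posts.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component

        times = [posts[p][1] for p in component]
        n_edges = sum(1 for e in edges if e <= component)
        stats.append({
            "posts": len(component),                       # scale
            "participants": len({posts[p][0] for p in component}),
            "span": max(times) - min(times),               # long or short?
            # A connected graph is a tree iff it has n - 1 distinct edges.
            "is_tree": n_edges == len(component) - 1,
        })
    return stats
```

Run over a toy input of three linked posts by two authors plus one unlinked post, this yields a single conversation record; on the full data set the distribution of these values would start to answer the questions above.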
So Anjo, and anyone else working in the blogspace - I'd like to encourage you to request a copy of the data and do your thing.



Hi Matthew,
I will at least obtain the data set. An issue is that it only covers three weeks which is a little short for conversation analysis and community identification.
Posted by: Anjo Anjewierden | December 03, 2005 at 11:58 AM