[Update: I've created a new image with some improved qualities.]
What does the blogosphere look like? Well, I'm not really sure what the question means. However, it is certainly intuitive to think about the graphical structure of the blogosphere, where nodes are blogs and edges are the links between them (from blogrolls, trackbacks or in-post links). I've tried a few experiments to draw this graph, and basically they have demonstrated that the blogosphere is one giant hairball. As a graph drawing problem, this visualization challenge has two solutions: use a different graph layout algorithm, or draw a different graph.
For the upcoming workshop, I'm really keen to produce a good visualization, so I've been thinking about drawing a different graph. The problem seems to be that there is essentially no typology to links. Topical blogs are good sources of rich link structure because they stay on topic. The majority of blogs, however, are diaries in nature, and so their contents tend to be somewhat random (hey - have you seen this?). Consequently, the link structure is a mess.
There are some pretty obvious things one can do to start removing links based on trivial count-based filters: remove all links between two blogs that have fewer than t instances, remove blogs which have fewer than c citations, etc. I'm interested in something a little more subtle, however. I want to look at the blogosphere from the point of view of robust, rich community structure. Basically, I want a magic filter that removes all blogs which don't participate in a community of some sort.
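To make the trivial count-based filters concrete, here is a minimal sketch in Python. It is not the filtering method used for the image below. It assumes links arrive as directed (source, target) pairs, one per observed link, and that a "citation" is an incoming link instance that survived the first filter; the function name `filter_links` and the parameters are hypothetical, with `t` and `c` standing for the thresholds mentioned above.

```python
from collections import Counter


def filter_links(links, t, c):
    """Apply the two count-based filters in a single pass.

    links: iterable of (source_blog, target_blog) pairs, one per link instance
           (an assumed input format, not the workshop data format).
    t: minimum number of link instances a pair of blogs needs to be kept.
    c: minimum number of citations (incoming link instances) a blog needs.
    """
    # Count how often each directed pair occurs, ignoring self-links.
    pair_counts = Counter((src, dst) for src, dst in links if src != dst)

    # Filter 1: drop pairs with fewer than t link instances.
    strong = {pair: n for pair, n in pair_counts.items() if n >= t}

    # Filter 2: drop blogs with fewer than c citations among the surviving links.
    citations = Counter()
    for (_, dst), n in strong.items():
        citations[dst] += n
    keep = {blog for blog, n in citations.items() if n >= c}

    # Keep only links whose two endpoints both survived.
    return {
        pair: n
        for pair, n in strong.items()
        if pair[0] in keep and pair[1] in keep
    }
```

Note that this is a single pass: removing a blog can push another blog below c, so a stricter version would repeat the second filter until nothing changes.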
The following image is my first pass at doing this. I'm not yet ready to talk about the filtering method used. However, it does attempt to follow the basic goal above. The nodes displayed represent blogs. The size of a node is a rough indication of its number of citations. The colour of a node indicates its host: Livejournal (blue), Blogspot (red), Typepad (green), WordPress (cyan) and Weblogsinc (pink) - all other blogs are gray. The layout algorithm is a variation on the force-based organic method and has been iterated 1,000 times. The basic interpretation: blogs that are near each other cite each other more than those that are further apart. The data was taken from the workshop data (which contains approximately 1 million blogs and 10 million posts).
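For readers unfamiliar with force-based layouts, here is a rough sketch of the general idea in plain Python. It is a generic Fruchterman-Reingold-style loop, not the variation used for the image; the function name `force_layout`, the starting temperature and the linear cooling schedule are all assumptions made for illustration. Every pair of nodes repels, linked nodes attract, and each step is capped by a "temperature" that shrinks over the iterations.

```python
import math
import random


def force_layout(nodes, edges, iterations=1000, seed=0):
    """Return {node: [x, y]} positions from a simple spring-style layout."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / max(len(nodes), 1))  # preferred distance between nodes
    temp = 0.1                                # maximum step size (assumed value)
    cooling = temp / (iterations + 1)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of nodes pushes each other apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Attraction: linked nodes pull each other together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force

        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temp)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step

        temp -= cooling

    return pos
```

The all-pairs repulsion step is quadratic in the number of nodes, so a loop like this only suits small, already filtered graphs.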
A couple of observations:
- Livejournal is very self-referential and keeps away from the rest of the blogosphere.
- Typepad and Blogspot appear to be well mixed with the rest of the blogosphere.
- There is a small group of WordPress blogs off to the right.
- Weblogsinc blogs (pink) form a tight little cluster - probably due to lots of interlinking.
Note that this is a very preliminary result. Note also that this data is the largest connected component in the entire graph - I cut out the rest of the blogosphere that wasn't linked to this cluster.
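Cutting the graph down to its largest connected component can be sketched as follows. This is a minimal illustration rather than the code used here: it assumes links are treated as undirected for the purpose of connectivity, and the function name `largest_component` is hypothetical.

```python
from collections import defaultdict, deque


def largest_component(edges):
    """Return the set of blogs in the largest connected component.

    edges: iterable of (blog_a, blog_b) pairs, treated as undirected.
    """
    # Build an undirected adjacency map.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best
```

Blogs outside the returned set are the ones that would be cut from the picture.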
Feels like I just had the first glimpse into a new galaxy!
I agree with you about LiveJournal. I just visited a LJ blog and I was appalled. What a messy scene.
This drives me to speculate that there are Mini-Blogospherias within One Blogiverse of all blogs and blogoid objects.
MySpace and Xanga are other sub-cultural sets within, not the Blogosphere, but the Sub-Blogosphere, the marginal area of hook-up dating sites, romantic networking, encouraged, in MySpace (see my SoMeEx blog in MySpace) by Erotic Ads, "it's nice to be naughty", ads with a woman in panties pulling her sweater up over her head.
MySpace "bulletins", "friends lists", "music bands as friends", a high school Pseudo-Blogoteria.
Posted by: steven streight aka vaspers the grate who conducts micro-blog experiments like this | January 27, 2006 at 10:00 PM