Much work on modeling the influence or importance of blogs relies on the structure of the blogosphere itself, in other words, on in-links. However, in-links from blogs are only one channel through which readers get to the content. Most bloggers pay close attention to their referral logs in order to monitor how people arrive at their sites (RSS readers, and stats produced by such services as FeedBurner, are set aside for the purposes of this study). By looking at my own referral log, I can see people arriving through a number of different channels:
- Main Stream Web (MSW) search engines (Google, Yahoo, Microsoft, Ask, etc.),
- Weblog search engines (Technorati, BlogPulse, Sphere, Google BlogSearch, etc.),
- Other blogs,
- My own blog,
- Unknown referrals.
I can get a feel for how people get to my blog from this stream, but to answer the question "where does blog traffic come from?" with any authority, one would need to aggregate a lot of data from many blogs.
Rather than doing that, I've collected a small set of referral data from 5 different blogs - I'd like to get an idea of what is going on to see if there are any hypotheses that might be considered, and which can be tested against a larger data set.
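As a rough illustration of how a referral log might be bucketed into the channels listed above, here is a minimal Python sketch. The host lists, the function names, and the idea of passing in a set of known blog hosts are all assumptions made for the example; the post does not describe the actual classification rules used. It also treats every referral from a blog search engine's host as a search referral, which is a simplification.

```python
from urllib.parse import urlparse

# Hypothetical host lists, loosely based on the engines named in the post.
MSW_SEARCH = {"google.com", "yahoo.com", "msn.com", "ask.com"}
BLOG_SEARCH = {"technorati.com", "blogpulse.com", "sphere.com",
               "blogsearch.google.com"}


def _matches(host, domains):
    """True if host equals, or is a subdomain of, any domain in the set."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_referrer(referrer, own_host, known_blog_hosts=frozenset()):
    """Assign one referrer URL to a channel name.

    own_host and known_blog_hosts are assumed inputs; a real study would
    need a much larger list of blog hosts than anyone can write by hand.
    """
    if not referrer or referrer == "-":
        return "unknown"
    host = (urlparse(referrer).hostname or "").lower()
    if not host:
        return "unknown"
    if host.startswith("www."):
        host = host[4:]
    if host == own_host or host.endswith("." + own_host):
        return "own blog"
    # Check blog search first: blogsearch.google.com is also under google.com.
    if _matches(host, BLOG_SEARCH):
        return "blog search"
    if _matches(host, MSW_SEARCH):
        return "msw search"
    if _matches(host, known_blog_hosts):
        return "other blog"
    return "unknown"
```

Counting the labels over a whole log (for example with `collections.Counter`) would then give the kind of channel breakdown discussed in the rest of this post.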
Let's take a high-level view of one of the sites: an A-list blogger.
- 26 % of referrals are unknown. There is not much we can say about this part of the sample. When looking at aggregate data, and comparing different blogs, it may be important to make clear assumptions about the similarity of bias in this unseen segment.
- 61 % of all referrals (including unknowns) are from the .com TLD.
- 24 % of the .com referrals come from google.com (that is, 15 % of all referrals). 1 % of these Google referrals are attributed to Google's blogsearch system.
- 5 % of the .com referrals come from bloglines.com (that is, 3 % of all referrals) - from readers clicking through from the popular RSS reader.
- 3 % of the .com referrals come from technorati.com (that is, 2 % of all referrals). Of these, 17 % come from search (either keyword or tag based).
- At least 40 % of .com referrals come from blogs and memetrackers (including the one being studied).
From this, we might make the following observations:
- Google accounts for a significant amount of traffic to this blog, more than specialized blog search engines.
- The blogosphere itself (blogs and memetrackers) is a significant traffic driver.
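The figures for the A-list blog quote each source twice: once as a share of the .com referrals and once as a share of all referrals. The conversion is just a multiplication by the .com share (61 %). A small sketch, using only the percentages quoted above (the variable and function names are mine, chosen for illustration):

```python
# Share of ALL referrals (including unknowns) that come from the .com TLD.
COM_SHARE = 0.61

# Share of the .com referrals attributed to each host, as quoted in the post.
SHARE_OF_COM = {
    "google.com": 0.24,
    "bloglines.com": 0.05,
    "technorati.com": 0.03,
}


def share_of_all(share_within_com, com_share=COM_SHARE):
    """Convert a share of .com referrals into a share of all referrals."""
    return share_within_com * com_share


if __name__ == "__main__":
    for host, share in SHARE_OF_COM.items():
        pct = round(share_of_all(share) * 100)
        print(f"{host}: about {pct} % of all referrals")
```

Rounded to whole percentages, this reproduces the 15 %, 3 % and 2 % figures given in the list.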
If we take the 5 blogs in aggregate, our sample accounts for 26,699 referrals.
- 39 % are unknown referrals.
- 50 % of the .com referrals are from MSW search engines (Google, MSN, Yahoo).
- <2 % of the .com referrals are from blog search engines (Google BlogSearch, Technorati, BlogPulse, Sphere).
There are two main hypotheses to draw from this initial foray:
- MSW search engines are a major channel into the blogosphere - more so than weblog search engines.
- Visitors are far more likely to arrive at your blog via blogosphere navigation than via a blog search engine.
These hypotheses have been derived from a look at a sample of referral data from only 5 weblogs (3 A-list and 2 others). Consequently, the directions indicated above may be way off. However, it does seem pretty clear that MSW search engines are a major part of the blogosphere (perhaps bringing readers who are not fully aware of blogs, or who don't surf the blogosphere in any regular manner). This relationship between the blogosphere and the MSW search engines also brings up the question of separability in the web: should blogs, with their peculiar and different linking characteristics, be part of the main stream search domain?
The key thing that I personally get from this is that the field is totally open for blog-focused portals. If one can crack the discovery process in a manner which suits the navigation preferences of the blog reader, then one can deliver a compelling interface to the blogosphere. One wants to be the Yahoo! of the blogosphere, not the Google.
The above statements are all based on hypothesis (for the ID readers, you may want to substitute 'theory' for 'hypothesis'). There are many admitted gaps in this mini-study, not the least of which is the big lump of 'unknown' referrals.
For reference, and fun, here is a treemap which shows referrals from the .com TLD for all blogs.
Wonderful visualization. I was wondering which tool you used to create this. Is it available in the public domain or is it proprietary?
Posted by: saurab | July 21, 2006 at 01:00 PM
Saurab - homebrew.
Posted by: Matthew Hurst | July 23, 2006 at 01:01 AM
This is the thing I love most about data mining: the simple and clear analysis of fact.
The patterns you have discovered will become temporal as the use of Google's blog search increases. With the general public becoming more and more intrigued by the world of blogging, there will be a very different view.
I would be interested to see the results in 6 months time to make a comparison.
Posted by: Data Mining | July 27, 2006 at 11:26 AM