David Sifry, in his latest spin on blog statistics, illustrates the size of the long tail:

To give a sense of scale, if this chart was kept to the same scale and I printed out the additional sheets necessary on regular 8.5 x 11 inch sheets of paper in landscape mode to show the entire long tail, the length of the complete graph would be about 120 pages long, making the entire chart about 110 feet long!

Reading this, I suddenly realised that the notion of the long tail is completely misleading. The long tail takes its name from the type of chart that David is using - one which orders items (blogs) by some quantity (inlinks). This produces the now familiar curve with a tall initial area (the big head), a rapid decline and the rest of the data points vanishing over the horizon in the long tail. The reason that this metaphor is misleading is that most of the mass of the long tail consists of data points for which the quantity being graphed (here, the number of inlinks) is equal.

Take a simple example. Let's say we have 10 data points. The first (A) has a quantity of 10, the second (B) a quantity of 3 and the remainder (C-J) a quantity of 1. A is the big head, B is the drop in the curve (B-lister bloggers) and C-J represent the long tail. But - in the long tail metaphor, C-J have to be ordered even though they all have the same quantity. In fact, they represent a fat tail - a big blob of data points which are all equivalent.
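
For anyone who wants to poke at the toy example, here is a minimal Python sketch of it. The variable names (`inlinks`, `ranked`, `ties`) are my own and the numbers are just the ten made-up data points above, not real blog data. It builds both views of the same data: the rank-ordered list that the usual chart draws, and a count of how many items share each quantity.

```python
from collections import Counter

# The ten hypothetical data points from the example:
# A has 10 inlinks, B has 3, and C through J have 1 each.
inlinks = {"A": 10, "B": 3}
inlinks.update({name: 1 for name in "CDEFGHIJ"})

# Rank-ordered view (what the long tail chart draws):
# every item gets its own x position, even when values tie.
ranked = sorted(inlinks.values(), reverse=True)

# Grouped view: how many items share each quantity.
ties = Counter(inlinks.values())

# Size of the "tail" (items with exactly 1 inlink) and its share of all items.
tail_size = ties[1]
tail_share = tail_size / len(inlinks)

print(ranked)
print(dict(ties))
print(tail_size, tail_share)
```

In the ranked list the eight items with a quantity of 1 are forced into some order and stretch along the x-axis. In the grouped view they collapse into a single bucket holding 8 of the 10 items, which is the fat blob described above.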

Visual metaphors are tricky, apparently.

OK, but the length of the x-axis would still be the same, right? So it's both fat and long?

Posted by: Niall Cook | February 14, 2006 at 08:27 AM

Hm... it would be nice to graph this out with reliable data over the long term. I wonder if it's migrating from a Zipf distribution to a more linear distribution.

I'm too pessimistic to believe that the short head will go away anytime soon.

I'd love to be proven wrong though!!! :)

Posted by: Kevin Burton | February 14, 2006 at 04:47 PM