
April 05, 2009

Comparing Twitter and Blogosphere Latency

With all the buzz around Twitter, and its apparent ability to walk on water right up to the brass ring of search, I’m still looking for the definitive study comparing discovery times in Twitter with those in other forms of social media. The study is pretty easy to perform:

  1. Grab some Twitter data and extract some URLs
  2. Use Twitter search to find the first mention of those URLs
  3. Use a blog search engine to find the first mention in the blogosphere
  4. Vice versa: start from blog data and look for the first mention on Twitter

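The comparison in these steps can be sketched in a few lines of Python. This is a minimal sketch, assuming the mentions have already been collected into hypothetical `(source, url, timestamp)` records, with `source` either `"twitter"` or `"blog"`; it does not call any real search API. Computing one signed difference covers step 4 as well: a negative value means the blogosphere got there first.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: (source, url, timestamp), where source is
# "twitter" or "blog". Collecting these records is out of scope here.
Mention = Tuple[str, str, datetime]


def first_mentions(mentions: List[Mention]) -> Dict[str, Dict[str, datetime]]:
    """Map each URL to the earliest timestamp seen for each source."""
    firsts: Dict[str, Dict[str, datetime]] = {}
    for source, url, ts in mentions:
        per_source = firsts.setdefault(url, {})
        if source not in per_source or ts < per_source[source]:
            per_source[source] = ts
    return firsts


def latency_hours(mentions: List[Mention]) -> Dict[str, Optional[float]]:
    """Blog first-mention time minus Twitter first-mention time, in hours.

    Positive values mean Twitter got there first; None means the URL was
    only seen in one of the two sources.
    """
    result: Dict[str, Optional[float]] = {}
    for url, per_source in first_mentions(mentions).items():
        if "twitter" in per_source and "blog" in per_source:
            delta = per_source["blog"] - per_source["twitter"]
            result[url] = delta.total_seconds() / 3600
        else:
            result[url] = None
    return result


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    data = [
        ("twitter", "http://example.com/a", datetime(2009, 4, 5, 3, 0)),
        ("blog", "http://example.com/a", datetime(2009, 4, 5, 4, 0)),
        ("blog", "http://example.com/b", datetime(2009, 4, 4, 12, 0)),
        ("twitter", "http://example.com/b", datetime(2009, 4, 5, 12, 0)),
        ("twitter", "http://example.com/c", datetime(2009, 4, 5, 9, 0)),
    ]
    print(latency_hours(data))
    # {'http://example.com/a': 1.0, 'http://example.com/b': -24.0, 'http://example.com/c': None}
```

The hard part in practice is not this arithmetic but deciding when two URLs refer to the same story.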
This news article, ‘Defiant N Korea launches rocket’, first appeared on Twitter here 5 hours ago, but the earliest I can find it in the blogosphere is here, 4 hours ago.

‘Endangered right whales appear to be on the rebound’ appeared in the blogosphere a full day before Twitter.

Neither the blogosphere nor Twitter cares about ‘Ice bridge ruptures in Antarctic’, and it seems to be neck and neck for ‘Russia to unveil spaceship plans’.

Of course, one would have to do a bit more work to really see what is going on. Twitter search doesn’t dereference URLs (it doesn’t follow redirects such as those behind shortened links), so you can’t reliably search for the first mention of something, never mind the whole can of worms that is URL identity (the BBC usually has two URLs per story…).
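Part of the URL problem can be handled by canonicalizing URLs before comparing them. A minimal sketch, assuming that host case, a leading `www.`, trailing slashes, fragments and a hypothetical set of tracking parameters (`TRACKING_PARAMS`) never change the story being pointed to. It does not follow redirects (dereferencing a shortened link needs an HTTP request per URL) and cannot merge genuinely different URLs such as the BBC’s two per story.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters treated as noise; adjust per site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def canonicalize(url: str) -> str:
    """Reduce trivially different URLs for the same page to one key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower() or "http"
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/story/" and "/story" as the same page (an assumption).
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    # Drop tracking parameters and sort the rest so order doesn't matter.
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    ))
    # Drop the fragment: it never changes which document is fetched.
    return urlunsplit((scheme, host, path, query, ""))


if __name__ == "__main__":
    print(canonicalize("HTTP://www.Example.com/story/?utm_source=twitter&id=7#top"))
    # http://example.com/story?id=7
```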

And which stories do you want to check anyway? Those that snowball into big piles of links may matter less, since some signal will reach the user regardless.

Twitter search’s trends currently include ‘North Korea’, but then again, Google’s blogsearch has a top cluster for the same topic (and its label, taken from a post title, ‘North Korea launches rocket’, is more informative). Twitter also lists ‘DSI’, whereas Google has ‘Console Review: Nintendo DSi’.

January 16, 2009

Blog Hosting Statistics

Pingdom has published some interesting statistics regarding the use of different blogging platforms and technologies within the Technorati top 100. They are interesting not just for the basic distributional data (WordPress accounts for 27 of the top 100, Movable Type for 8, TypePad for 16, Blogger for 3) but also for what the report reveals about how Technorati computes the top 100. For example, Weblogs, Inc. accounts for 14 of the top 100, Wired for 9 and Gawker for 9. I’d be interested in looking at the linking between sister sites in these networks.
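One way to look at sister linking is to measure, for each network, what share of its blogs’ outbound links point at other blogs in the same network. A minimal sketch, assuming a hypothetical list of `(source, target)` links and a hypothetical blog-to-network mapping; the names below are placeholders, not real Technorati or Pingdom data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sister_link_share(
    links: Iterable[Tuple[str, str]], network_of: Dict[str, str]
) -> Dict[str, float]:
    """For each network, the fraction of its blogs' outbound links that
    point at another blog in the same network."""
    total: Counter = Counter()
    internal: Counter = Counter()
    for source, target in links:
        net = network_of.get(source)
        if net is None or source == target:
            continue  # skip blogs outside any network, and self-links
        total[net] += 1
        if network_of.get(target) == net:
            internal[net] += 1
    return {net: internal[net] / total[net] for net in total}


if __name__ == "__main__":
    # Placeholder blogs a1, a2 in network "A" and g1, g2 in network "G".
    network_of = {"a1": "A", "a2": "A", "g1": "G", "g2": "G"}
    links = [("a1", "a2"), ("a1", "g1"), ("a2", "a1"), ("g1", "g2"), ("x", "a1")]
    print(sister_link_share(links, network_of))
```

A share well above what the networks’ size would predict would suggest that sister linking helps lift network blogs up the rankings.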

Another part of this signal has to do with comment spam: the more popular the blog, the more important it is that its authors have control over spam.

November 01, 2008

Google Blog Search Changes

I’ve been noticing some weird behaviour in Google’s blog search recently that makes me suspect they are testing new features and possibly a new back end or index. First, I noticed that results changed between different attempts to search for the same item: there was a significant reduction in the number of results, with many legitimate posts missing. Second, I’ve started to notice hits for search terms that are not in the post. For example, a search for “political streams” brings up this post from Blog About Stats, which doesn’t mention the phrase but which (as of the time of writing) has it in the title of a post in its ‘Recent Posts’ list. The strange thing is that this feed isn’t partial. I had originally thought that Google was attempting to fill in partial feed data and getting it wrong, but this doesn’t seem to be the case.

October 30, 2008

Interactive Visualization of Blog Search

Briefly, an interesting blog (via Twingly) recording progress in a project to create a new user experience for blog search.

October 20, 2008

Blogging is alive and well

This month’s Wired has an article titled ‘Kill Your Blog’ – it’s a great article. Great, that is, as an example of poor writing, logic, journalism, etc. It’s written by Paul Boutin of Valleywag, so it may be complete fiction. The basic theme of the article is: blogging has been overwhelmed by corporate content (that, or blogs have become corporations), so little-guy blogging is a waste of time – you will never be heard.

In keeping with the great tradition of blog-o-journalism, Paul uses Jason Calacanis and Robert Scoble as evidence of what is happening in the blogosphere. This is like comparing Larry Ellison’s mega-yachts to a second-hand rowboat you are thinking of buying on Craigslist – i.e. not an exemplar for the population.

Boutin also uses the Technorati 100 list as evidence that something is wrong with the blogosphere, and random, obnoxious comments left on posts as reasons to stop writing. The attitude seems to be: let’s have everything stay the same, wait for bit rot to set in, then declare the space done with.

Personally, I found the article to be a perfect list of reasons to do something really useful with social media: again, Google’s blogsearch is the thin end of the thin end of the wedge, as is Political Streams.

Sigh, I know Wired’s there to sell copy and one way to do this is to be crazy and controversial – but this is just lame.

October 16, 2008

Animated Weblog Diffusion

As part of my presentation at the Personal Democracy Forum 2008, I showed an animation of a simulation of diffusion through the blogosphere. Anthony and Guilhem (also presenting at PDF) have just released a version of this on real data over on their Presidential Watch 08 site. They talk about it here, and the two demonstrations can be found here and here.

The visualization augments the existing graph-based view with a new display/control showing attention over time to the video that is the subject of diffusion.


October 15, 2008

Google Blogsearch Update

I wrote on Monday about Google Blogsearch and their graphs. It looks like some events do produce more interesting temporal signatures. Below is the graph relating to the divorce of Madonna and Guy Ritchie:

 

October 13, 2008

Google Blog Search Update

[This is late, but I’m getting back to my normal posting rate…]

Google recently updated their blog search home page to include analytics over recent blogging activity. The update is essentially an application of clustering similar to that found in other memetrackers and on Google’s news product.
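The post doesn’t say how Google’s clustering works, so the following is only a toy stand-in to show the general idea: a greedy single-pass clustering of post titles by Jaccard similarity of their word sets. The `threshold` value is an arbitrary assumption; memetrackers typically also draw on links and full post text.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union."""
    return len(a & b) / len(a | b) if a or b else 0.0


def cluster_titles(titles: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass clustering: each title joins the first cluster
    whose seed (first member) is similar enough, otherwise starts a new one."""
    clusters: List[List[str]] = []
    seeds: List[Set[str]] = []
    for title in titles:
        toks = tokens(title)
        for i, seed in enumerate(seeds):
            if jaccard(toks, seed) >= threshold:
                clusters[i].append(title)
                break
        else:
            clusters.append([title])
            seeds.append(toks)
    return clusters


if __name__ == "__main__":
    print(cluster_titles([
        "North Korea launches rocket",
        "Defiant N Korea launches rocket",
        "Console Review: Nintendo DSi",
        "Nintendo DSi review",
    ]))
```

Single-pass clustering is order-dependent and compares only against each cluster’s first member, which keeps it cheap but makes it a rough approximation.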


In addition, drilling down on a cluster provides (can you guess?) a time series of attention around the topic (I assume the values are the number of blogs per day in the cluster).


It is great to see Google pushing beyond the simple search interface to provide something more appropriate to (a certain use case in) the blogosphere. However, I’ve yet to find a graph that looks much different from the one above, and I’m not sure what this means. I imagine that the clusters are not very dynamic: there is no clear set of interesting temporal signatures for the stories, at least not at the volume Google is crawling.
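One way to make “interesting temporal signature” concrete is to summarize each cluster’s daily series with a couple of shape statistics, so flat series can be told apart from spiky ones automatically. The two measures below (peak share and coefficient of variation) are illustrative choices, not anything Google is known to compute, and the input is assumed to be a non-empty list of daily counts.

```python
from statistics import mean, pstdev
from typing import Dict, List, Union


def signature(daily_counts: List[int]) -> Dict[str, Union[int, float]]:
    """Summarize the shape of a per-day attention series (non-empty)."""
    total = sum(daily_counts)
    peak_day = max(range(len(daily_counts)), key=daily_counts.__getitem__)
    mu = mean(daily_counts)
    return {
        "peak_day": peak_day,
        # Fraction of all attention that lands on the single busiest day.
        "peak_share": daily_counts[peak_day] / total if total else 0.0,
        # Coefficient of variation: 0 for a flat series, large for spikes.
        "cv": pstdev(daily_counts) / mu if mu else 0.0,
    }


if __name__ == "__main__":
    print(signature([10, 10, 10, 10]))  # flat: cv is 0.0
    print(signature([1, 40, 8, 2, 1]))  # spike on day 1: cv well above 1
```

If most clusters score similarly on measures like these, that would support the impression that the graphs all look alike.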

What we’ve done at Political Streams is to apply a similar sort of clustering, but to trend the content of the blog posts rather than the clusters themselves. This gives the user an understanding of both what is being discussed and how that discussion is trending over time. Note also that Political Streams is not limited to weblogs, which allows the system to surface news stories that are breaking in other social media ecologies before they even reach the blogosphere.
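Trending content rather than clusters can be sketched as comparing how often each term appears in a recent window of posts against a baseline window. This is a minimal sketch, not the Political Streams implementation: the tokenizer is deliberately crude, there is no stop-word list, and the `smoothing` parameter is an assumed add-one style prior so that terms absent from the baseline don’t cause a division by zero.

```python
import re
from collections import Counter
from typing import List, Tuple


def term_counts(posts: List[str]) -> Counter:
    """Number of posts each lowercase term appears in."""
    counts: Counter = Counter()
    for post in posts:
        # Count each term once per post so one long post can't dominate.
        counts.update(set(re.findall(r"[a-z]+", post.lower())))
    return counts


def rising_terms(recent: List[str], baseline: List[str], top: int = 5,
                 smoothing: float = 1.0) -> List[Tuple[str, float]]:
    """Rank terms by how much more often they appear in recent posts than
    in baseline posts, normalizing for the number of posts in each window."""
    rc, bc = term_counts(recent), term_counts(baseline)
    n_r, n_b = max(len(recent), 1), max(len(baseline), 1)
    scores = {
        term: ((rc[term] + smoothing) / n_r) / ((bc[term] + smoothing) / n_b)
        for term in rc
    }
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    baseline = ["economy jobs report", "economy stimulus vote"]
    recent = ["rocket launch korea", "korea rocket sanctions", "economy jobs"]
    print(rising_terms(recent, baseline, top=2))
    # [('korea', 2.0), ('rocket', 2.0)]
```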

September 10, 2008

What Should Blog Search Look Like?

This summer, Marti Hearst has been visiting Microsoft Research. Marti, with contributions from Sue Dumais and myself, put together a position paper on weblog search to be presented in October at CIKM: What Should Blog Search Look Like?

July 09, 2008

Weblog Activation Simulation

This post is partly to show the animation that I presented at PDF2008 and partly to test Vimeo (thanks to Jeff Jarvis and Jake for the recommendation).

This graph animation illustrates the reach over time of a blog post. When a blog is activated (when a node is selected by the pointer) other blogs that link to it have some chance of referring to the injected post. The animation is intended to give an impression of how information might spread in the blogosphere.

Firstly, I activate a few peripheral blogs – these don’t have many connections and so we don’t see much spread. Then a small community in the south west corner is activated – information spreads within the community but doesn’t break out. Finally I look at a few blogs in the central core. Here we see information spreading through the core.

Note that the graphical data is real – this is a visualization of the core of the blogosphere circa 2007; the spread of information is only a (simulated) illustration. Note also that there are many other (better) ways to model diffusion – I’ll publish more animations as and when I start looking into this area!
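The spread rule described above (an activated blog gives each blog linking to it some chance of picking up the post) resembles an independent cascade model. A minimal sketch, assuming a hypothetical adjacency dictionary `links_to` and a single uniform pickup probability `p`; the animation itself may have used different parameters or rules.

```python
import random
from collections import deque
from typing import Dict, List, Set


def simulate_spread(links_to: Dict[str, List[str]], seed: str, p: float,
                    rng: random.Random) -> Set[str]:
    """Independent-cascade-style spread from a seed blog.

    links_to[b] lists the blogs that b links to. When a blog is activated,
    each blog linking to it gets one chance, with probability p, of picking
    up the post; newly activated blogs then propagate in turn.
    """
    # Invert the link graph: for each blog, the blogs that link to it.
    linked_from: Dict[str, List[str]] = {}
    for src, targets in links_to.items():
        for dst in targets:
            linked_from.setdefault(dst, []).append(src)

    active = {seed}
    frontier = deque([seed])
    while frontier:
        blog = frontier.popleft()
        for reader in linked_from.get(blog, []):
            if reader not in active and rng.random() < p:
                active.add(reader)
                frontier.append(reader)
    return active


if __name__ == "__main__":
    # Hypothetical chain: b links to a, c links to b, d links to c.
    graph = {"b": ["a"], "c": ["b"], "d": ["c"], "e": []}
    rng = random.Random(42)
    reach = [len(simulate_spread(graph, "a", 0.5, rng)) for _ in range(1000)]
    print(sum(reach) / len(reach))  # average number of blogs reached
```

Averaging reach over many seeded runs gives the expected reach of a post injected at a given blog, which is what the animation shows qualitatively for peripheral versus core blogs.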


Weblog Activation Simulation from matthew hurst on Vimeo.

The weblogs that I use in this example are:


    Blog powered by TypePad