
October 31, 2007

International Conference on Weblogs and Social Media 2008: Update

We are fast approaching the deadline (December 3rd) for technical paper submissions to next year's meeting, so now is a good time to give a broad update on the conference.

Firstly, we have a new invited speaker to join Bernardo Huberman and David Sifry: Brad Fitzpatrick will be with us in Seattle. Brad is probably best known for founding LiveJournal, one of the biggest blogging/network platforms out there.

Secondly, the venue has been selected. ICWSM 2008 will be held in the Seattle Hilton. Hotel booking will be available online in the near future at the conference web site.

Finally, a reminder: ICWSM 2008 will feature two compelling tutorials - Subjectivity and Sentiment Analysis (Wiebe) and Graph Mining Techniques for Social Media Analysis (McGlohon and Faloutsos).

Candy - It's What's for Dinner

Halloween

Google's Universal Search - Confused?

A short time after the fires in SoCal hit the news I noticed quite a bit of traffic coming from Google's main search page. At that time, a search for 'Santiago Fire' incorporated blog posts at the bottom of the page and one of the 3 highlighted posts was mine. The inclusion of blog posts is part of Google's 'universal search' paradigm in which, for any search, a mixture of different types of results is presented interlaced in the standard one-dimensional result page.

One of the maxims of interface design is predictability. An interface shouldn't change from under you. A search now for 'Santiago Fire' doesn't produce blog results. A search for 'Gordon Brown' produces, at the top of the list, first images of the man himself, then news articles then organic results. Actually - it does if I do the search on IE6, but not if I do the search on Firefox (Japanese).

Is there any way you can predict what you will get? With some luck and some guesswork, one can look at Google News and produce search results that will contain blog posts. A search (right now) for Robert Goulet - the Camelot star who died yesterday - produces links to these three blog posts (note that the last, a post on TMZ, results in a redirect to the main page; perhaps they removed the post - oops!). A search for 'Google Phone', also very visible on the Google News pages, doesn't produce any blogs. Note also that the blog posts that are highlighted on the search page are not the most relevant posts (according to Google's blog search engine). Two of the Robert Goulet posts are on the first page of relevant blog posts, but not the third.

In addition to the lack of predictability in the results page, one can't page through the mixed results. The first page offers the interlaced results, but going to the next page only offers organic results.

I think it is great that Google is addressing the interface problem - the web is so much more than a bland list of pages. However, the current approach seems more confusing than, say, Ask's 3D layout (though I notice that that is shifting over time too...)

October 29, 2007

Social Media Analysis - Different Strokes

Last week, Nielsen BuzzMetrics - a former employer - staged their second CGM Summit (you may recall the silly backlash that occurred after their/our first event - it was off limits for bloggers, which produced some whinging from those left out and in the dark as to what it was really about). I wasn't at this year's event, but according to Peter Kim, Carson (CEO) discussed their plans for moving into 15 new international markets.

No mention of new products and in the near term the focus will be on international expansion, i.e. 15 countries.

Social media analysis at scale requires a sensible mixture of automated and human analysis. There are many opportunities for applied text and data mining to help support the scaling and efficiency that are required for survival in this space. However, rapid expansion into international markets will require the development of these systems in a number of different languages. The cost there will be linear in the number of languages (speakers of these languages will be required to develop the systems and to provide analysis).

Meanwhile, Nathan Gilliat (whose blog is a good read for keeping up to date in this space) mentions the rollout of a cheap, off-the-shelf alternative to BuzzMetrics from Andiamo. Andiamo offer a free trial of their system (which I've not yet taken). They join BuzzLogic in taking a very different approach to pricing.

Some posts that discuss the CGM Summit:

October 28, 2007

Listas: Live Labs Technical Preview

Live Labs recently announced a tech preview of a new social content platform named Listas. From the Live Labs blog:

Listas is a tool for the creation, management and sharing of lists, notes, favorites, and more. It allows you to quickly and easily edit lists, share them with others for reading or wiki-style editing, and discover the public lists of other users.

Listas has one fundamental data structure - the list. The list allows for indenting - effectively the creation of hierarchical lists, or sub-lists. In addition to wiki-like editing of list data, Listas comes with another key component: a toolbar. With the toolbar installed, users can - upon encountering something they would like to store in a list - push data to their account on Listas. The toolbar page explains:

Quickly navigate to Listas web site.
One click to add a link to your current page to a list.
Select page content and click once to add it to a list.
Highlight web page "clippings" and add them to a list.

Listas also offers a community aspect which allows for the sharing and joint editing of lists.

October 24, 2007

Map of the Santiago Fire

[Update: A comment by Jeffrey links to Bruce Henderson's blog; Google's LatLong blog points to more resources.]

The OC Register has put up a map of the continuing fire.

[Image: OC Register map of the Santiago Fire]

Of course, being in Google Maps, we can switch to alternate views!

[Image: the same fire map in an alternate Google Maps view]

October 23, 2007

Technorati Brings Charts Back

I'm happy to see that Technorati has brought back the display of charts showing the number of blog posts matching a search term over time. Below is the chart for "south africa".

[Image: Technorati chart of blog posts matching "south africa" over time]

The Most Important Blogs for Efficient Readers

Current systems for ranking blogs are largely about inlinks. Technorati and BlogPulse both use this basic measure of citation to create their lists; TechMeme - whose new list created plenty of discussion on the topic - takes the algorithm it uses for placing stories on its home page (essentially, another citation based approach) and aggregates visibility information. Additional features to consider include the number of feed subscribers and the number of visitors to the blog site. However, there are plenty of alternative approaches to creating a list of important blogs.
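To make the inlink idea concrete, here is a minimal sketch in Python. It is not how Technorati, BlogPulse or TechMeme actually compute their lists (those details aren't public here); the function name `rank_by_inlinks`, the input format and the choice to count each linking blog only once are all my own assumptions for illustration.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Rank blogs by the number of distinct other blogs linking to them.

    `links` is an iterable of (source_blog, target_blog) pairs. This input
    format is a hypothetical one chosen for the sketch.
    """
    # Drop self-links and collapse repeated links from the same source, so a
    # single enthusiastic blog cannot inflate another blog's score.
    distinct = {(src, dst) for src, dst in links if src != dst}
    counts = Counter(dst for _, dst in distinct)
    # Sort by descending inlink count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


example_links = [
    ("a", "b"),
    ("c", "b"),
    ("a", "b"),  # repeated link, counted once
    ("b", "c"),
    ("c", "c"),  # self-link, ignored
]
ranking = rank_by_inlinks(example_links)
```

Feed subscribers or visitor counts could be folded in as extra terms in the sort key, though how to weight them is exactly the kind of open question discussed here.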

The above approaches are motivated by some (vague) notion of influence - a term that is central to the analysis of social media and blogs in particular, but one which has not really been given a full, well grounded definition in the space. However, there is also the issue of reader efficiency - ensuring that the consumer of blog data maximises the value they get from reading blogs.

A group of researchers at CMU have been considering a notion of blog importance based on how likely a set of blogs is to ensure that you will be informed of topics bursting in the blogosphere. By analogy, they consider a graph of water pipelines. Their paper - Cost-Effective Outbreak Detection in Networks (Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance) - poses the problem:

Given a water distribution network, where should we place sensors to quickly detect contaminants? Or, which blogs should we read to avoid missing important stories? These seemingly different problems share common structure: Outbreak detection can be modeled as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading of a virus or information as quickly as possible.

As a result of this work, the authors have published some blog lists which answer a fundamentally important question in terms of weblog reading habits: Which weblogs should I read to be most up to date? The lists answering this question - generated by the approach described in their paper - come in a number of varieties to be found on the project's page.
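For a feel of the "which blogs should I read" problem, here is a simplified sketch. It is a plain greedy maximum-coverage heuristic, not the authors' algorithm: the paper's method and objectives (which also account for reading cost and how quickly a story is detected) are more elaborate. The function name, the input format and the "a story counts as detected if any chosen blog mentioned it" objective are assumptions of mine.

```python
def greedy_blog_selection(stories, budget):
    """Pick up to `budget` blogs that together cover the most stories.

    `stories` maps a story id to the set of blogs that mentioned it
    (a hypothetical input format for this sketch).
    Returns (chosen_blogs, detected_stories).
    """
    # Invert the mapping: blog -> set of stories that blog mentioned.
    coverage = {}
    for story, blogs in stories.items():
        for blog in blogs:
            coverage.setdefault(blog, set()).add(story)

    chosen, detected = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for blog in sorted(coverage):
            if blog in chosen:
                continue
            # Marginal gain: stories this blog adds beyond those already detected.
            gain = len(coverage[blog] - detected)
            if gain > best_gain:
                best, best_gain = blog, gain
        if best is None:
            break  # no remaining blog adds a new story
        chosen.append(best)
        detected |= coverage[best]
    return chosen, detected


example_stories = {
    "s1": {"a", "b"},
    "s2": {"a"},
    "s3": {"c"},
    "s4": {"b", "c"},
}
chosen, detected = greedy_blog_selection(example_stories, budget=2)
```

Note how blog "b" is never picked in the example even though it covers as many stories as the others: once "a" is chosen, most of what "b" offers is redundant. That is the sense in which this kind of list differs from a ranking by raw popularity.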

Highlights from the work include the top 10 and bottom 10 from the list of blogs to read to be the most up to date on stories if you only have time to read 100 blogs. It must be noted that this work is a theoretical exploration - the dataset mined to create the list is not a live corpus of blogs; thus some of the blogs may be stale or even abandoned.

1 http://instapundit.com
2 http://donsurber.blogspot.com
3 http://sciencepolitics.blogspot.com
4 http://www.watcherofweasels.com
5 http://michellemalkin.com
6 http://blogometer.nationaljournal.com
7 http://themodulator.org
8 http://www.bloggersblog.com
9 http://www.boingboing.net
10 http://atrios.blogspot.com
... ...
91 http://www.saysuncle.com
92 http://www.privacydigest.com
93 http://www.londonist.com
94 http://www.shanghaiist.com
95 http://markshea.blogspot.com
96 http://www.singleservecoffee.com
97 http://jeremy.zawodny.com/blog
98 http://www.scienceblogs.com
99 http://www.basicthinking.de/blog
100 http://scobleizer.wordpress.com

Note that another view of the data - which blogs to read if you can only read 500 posts - generates quite a different list of blogs.

October 22, 2007

Images are Data

Information Aesthetics points to this page with a zoomable interface to a treemap like visualization of the internet. A friend working on the Gigapan project mentioned to me that it was surprising to see what types of 'panoramas' people had uploaded to the site - some not photos at all. I'd love to see this internet map uploaded to Gigapan!

[Image: treemap-like visualization of the internet]

October 21, 2007

Feed Problems

A number of readers have contacted me to let me know of some problems with the feed for this blog. As a result of posting about some FeedBurner/Google issues, Dick Costolo of FeedBurner has been looking into things and doing a great job of keeping me informed. As far as I understand it, there are some issues relating to both TypePad's creation of the feed as well as the link between TypePad and FeedBurner. I've been testing the feed on a number of sites in a number of browsers (Bloglines, Google Reader x IE7, Firefox and Safari) and haven't actually noticed any issues (which makes the whole thing that bit more mysterious).

Please sit tight - I'm sure there will be resolution RSN.
