May 15, 2008

Industry Standard's Top B-Z Blogs

The Industry Standard has produced a list of B-Z blogs of note.

These are the blogs you won't see on the Techmeme Leaderboard, Technorati's Top 100 blogs, or the CrunchBase BloggerBoard ... at least not yet. They include VCs, entrepreneurs, coders, experts, and observers, and they bring a delicious mix of insight, experience, and passion to their blogs. While they may not have the right amount of link love, they need to be on your radar screens.

I'm very happy that this blog has been included on the list!

May 14, 2008

Future of News: Day 1 Summary

Had a great day of presentations, panels and discussion today at the Future of News event here in Princeton (which reminds me a lot of Cambridge). In summary, I heard both optimism and pessimism regarding the future of news. Things that seem to be of concern:

  • The collapsing of the newspaper model (which has had plenty of coverage) - though the Guardian's model was held out as an exception.
  • Of greater concern: the lack of watchdog journalism implicit in "decentralized non-market" forms of media.

Optimism was expressed largely by those who were actively trying to push the evolution of the space (the best way to predict the future is to invent it).

I was mistaken for a BBC employee - can I put that on my CV?

May 13, 2008

Worldwide Telescope

Microsoft Research's Worldwide Telescope is available for download. I'd encourage you to go and take a look. I would have written up some details of it for this blog, but it is so easy to spend time exploring the sky and the planets that I'm left with no time right now! Note that it provides planet models (including Earth) as well as astral data.


May 12, 2008

Brand Tags

This is a great idea: the user is shown branding and asked to provide a single tag. You can then click through to see the tag clouds (which could be displayed a little better) for each brand.


[Via Nathan]

Powerset Factz, Star Wars

One of the best things about Powerset is its Factz feature. If you look at a page for a movie, you can see a pretty neat, completely automated summary of the plot. Have a look at Star Wars.


Powerset Launches!

Powerset, which provides a new relationship with web data via innovative interfaces and natural language processing, launched this evening. Take a look at this video:

I'll write more later, but for now, check out other posts I've made on Powerset and NLP. I'll try to keep abreast of the commentary as it comes in. Meanwhile, I'm waiting for Fernando to pounce.

Update: OK, some comments. There are a couple of things that people are going to get hung up on. Firstly, writers seem to be referring to the technology as context or contextual search - why not call it NLP? Not sure where that is coming from. Secondly (actually, this is more important), pundits are going to write about the Wikipedia-only issue. They're not getting it. 90% of search results come from a tiny fraction of web pages due to the huge redundancy on the web and the differences between searcher needs and author/publisher intents. The task isn't to always search that huge set, but to get the answers to the user.

May 11, 2008

Spectra Visual News Reader

Another interesting find from Information Aesthetics. News classes, selected via the top menu, populate a rotating column of articles that are then read at the bottom of the display. Fun - not sold on the utility.


News, Opinion and Efficiency

Jeff Jarvis writes up some thoughts springboarding from Nick Denton's post regarding the news/opinion divide. At the highest level, this is about the value of humanizing information. There are two related points that I think are missing from this discussion. The first is the value a source of information provides to users by enabling them to be efficient consumers of that information. The second is a little more complex, and has to do with network effects and homophily.

Efficiency: news sources, or rather, news aggregators, must make decisions about which pieces of news to present to the consumer. In addition, they must figure out how to present this news. Objectivity and the editorial role play into this by removing distractions and providing a relevance function over the possible set of news items. Opinion - that is to say, removing either or both of these filters - may well lead to a lack of efficiency on the side of the consumer.

Homophily: consumers, being human, are subject to homophily. Thus, the more human/emotional an information source is, the more it will strengthen reading behaviours that are driven by this seeking of like-minded writers. With ideal information distribution goals in mind (allowing information consumers to be more efficient and better informed), this will do a disservice to readers.

I think the bigger picture here is to do with trust. If we could trust our news sources, then objectivity and editorial control would be fine. However, the forces that determine what a news source reports work directly against trust as they are financial. Bringing in the emotional element - the personality of the writer - into the picture provides a powerful connection with the reader, thus replacing trust with a personal relationship.

May 08, 2008

Non-engaged Blogging and Reimagining Social Networks for the Blogosphere

The discussion over the definition of blogging is as old as the practice itself. For some, a blog is simply a publication mechanism - thus any use of that mechanism is blogging; for others, it is a certain publication and interaction behaviour through the web. One aspect of the application of social media infrastructure that I'm becoming more aware of is the level of engagement. For example, a typical blogger may write posts that link to other bloggers, and is likely to follow up with comments posted on their own blog. In addition, such a blogger may well respond to posts that link to their blog via the comments on that other blog or via posts on their own blog. Such an individual is engaged in the blogosphere.

At the other extreme, we have those who write blog posts that never link to other bloggers and, though they may receive a large number of comments, don't respond to these comments via their own commenting system. Such an individual is, we might say, a non-engaged blogger. Another example of this is the tweeter who has plenty of followers but who never issues an @'d tweet.

While the definition of blogging may still be in debate, the behaviours above can certainly be determined from pretty clear signals automatically. I'm guessing that someone has already done this analysis - anyone know of a paper?
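As a rough illustration of how those signals might be turned into a label, here is a minimal sketch. The `BloggerStats` fields, the `engagement_label` function, and the "any activity at all counts" thresholds are all assumptions made for the example; they are not taken from any existing study or tool.

```python
from dataclasses import dataclass


@dataclass
class BloggerStats:
    """Hypothetical per-blogger counts gathered from a crawl."""
    links_to_other_blogs: int   # outbound links in posts to other bloggers
    comments_received: int      # comments left by readers
    replies_by_author: int      # comments the blogger left on their own posts


def engagement_label(stats: BloggerStats) -> str:
    """Label a blogger using the two signals described in the post.

    Assumption: a single outbound link or a single reply is enough to
    count as that behaviour being present.
    """
    links_out = stats.links_to_other_blogs > 0
    replies = stats.replies_by_author > 0
    if links_out and replies:
        return "engaged"
    if not links_out and not replies:
        return "non-engaged"
    return "partially engaged"


# A blogger with many comments but no links out and no replies.
broadcaster = BloggerStats(links_to_other_blogs=0,
                           comments_received=500,
                           replies_by_author=0)
print(engagement_label(broadcaster))
```

A real analysis would presumably want rates rather than raw presence (replies per comment received, say), but the two booleans are enough to separate the extremes described above.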

An area of social media research that this measure has an impact on is social network analysis. Typically, when inducing a social network from blog data, researchers look for reciprocal links. However, many political bloggers, while being of the non-engaged type, catalyze discussion in other blogs, or even simply within the many comments that each of their posts receives. Thus, one might argue, the simple notion of a tie between nodes should be abandoned for a model that can capture the different types of behaviour precipitated by different types of applications of social media publication technology.
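To make the limitation concrete, here is a small sketch of reciprocal-link tie induction. The function name `reciprocal_ties` and the toy link data are hypothetical; the point is only that a blog which is linked to by many others but never links back ends up with no ties at all.

```python
def reciprocal_ties(links):
    """Return undirected ties for pairs of blogs that link to each other.

    `links` is an iterable of (source, target) pairs, one per directed link.
    """
    link_set = set(links)
    return {
        frozenset((a, b))
        for a, b in link_set
        if a != b and (b, a) in link_set
    }


# Toy data: "hub" is a non-engaged blog that others link to,
# while "alice" and "bob" link to each other.
links = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "hub"),
    ("bob", "hub"),
    ("carol", "hub"),
]

ties = reciprocal_ties(links)
print(ties)
```

Here "hub" receives three inbound links yet does not appear in any tie, which is exactly the kind of influence the reciprocal-link model fails to capture.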

I've long been suspicious of the wholesale adoption of real world social network analytics applied to social media, and blogging in particular (just as I am skeptical of the use of terms like 'conversation' when applied to this data). The above ideas, to me, seem to capture something of the reason for this discomfort.

May 05, 2008

The TechMeme Bikini

Some stats regarding the distribution of headlines on TechMeme between A-listers and others seem to be getting some attention. Attention, but little real thought. The basic observation is that while 70% of the headlines on TechMeme are accounted for by the top 100 ranked sources (according to the leaderboard), 30% are from the long tail.

For the sake of argument, let's assume that the data is static - that is to say, that the leaderboard 100 is always the same (it isn't). Let's also assume that TechMeme crawls 10k weblogs (I don't know that it crawls that many). Let's make some more assumptions: that every weblog posts 1 post a day and that there are 10 headlines per day on TechMeme. Thus, there are 100 posts per day from the top 100 sources and 7 of them will appear on TechMeme. Thus, each of the top 100 sources has a 7% chance of producing a headline on any given day.

The other 3 headlines come from the remaining 9,900 sources. Thus, if they are also producing 1 post a day, each source has a 3/9,900 = 0.03% chance of getting noticed. So while the 2:1 ratio of A-listers to others sounds good, for any individual, it actually translates to roughly 233:1 odds (7/0.03).

Of course, the assumptions above are a little rough and there is absolutely no accounting for how network effects really get things done in the blogosphere. The point is, there is a difference of two orders of magnitude between what an individual can expect and what the groups (A-listers/others) can expect.
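The back-of-the-envelope arithmetic can be reproduced with a few lines of Python. This is only a sketch under the post's own assumptions (10k crawled blogs, 1 post per blog per day, 10 headlines per day, 7 of them from the top 100); the function and parameter names are made up for the example. Note that without rounding the tail probability to 0.03% first, the ratio comes out at 231:1 rather than 233:1.

```python
def headline_odds(total_sources=10_000, top_sources=100,
                  top_headlines=7, tail_headlines=3):
    """Per-source daily chance of a headline for top and long-tail blogs.

    Assumes every source posts once a day, so the chance per source is
    simply headlines divided by the number of sources in the group.
    """
    tail_sources = total_sources - top_sources
    p_top = top_headlines / top_sources      # 7 / 100
    p_tail = tail_headlines / tail_sources   # 3 / 9,900
    return p_top, p_tail, p_top / p_tail


p_top, p_tail, ratio = headline_odds()
print(f"top source:  {p_top:.2%} chance per day")
print(f"tail source: {p_tail:.4%} chance per day")
print(f"individual odds ratio: {ratio:.0f}:1")
```

Changing `total_sources` shows how sensitive the conclusion is to the guess about how many blogs are crawled: the group-level split stays at 7 to 3, while the individual ratio scales with the size of the tail.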
