
August 31, 2007

Influence is not Authority

Let's imagine we have three blogs: A, B and C. A has 10,000 readers, B has 100 readers and C has 10 readers. Let's also characterize the topics that these blogs write about. A writes about topics t1, t2, ..., t10, B writes about t5 and t6, and C writes about t5 only.

Now, suppose C writes something interesting on topic t5 and both A and B link to this post, adding their own particular commentary. Who will drive more traffic to C? A or B? While A has many more readers than B, it is topically a very broad blog. The writer doesn't have the time (or expertise) to really go deep into the issues of all these topics. Consequently, her audience is not made up of experts in those areas and reads the blog to get a high-level picture over a broad range of topics. On the other hand, B's readers pretty much just go there for two topics. B has time to go into detail and understand those topics and is probably spending plenty of offline time in that topical space as well. Consequently, B will actually send more traffic (not relatively more, but absolutely more) to C than A will, due to the specialization of his audience.
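
To make the arithmetic behind that claim concrete, here is a back-of-the-envelope sketch. The readership figures come from the example above; the click-through rates are invented purely for illustration and are not measurements of any real blog. The function names are likewise hypothetical.

```python
# Readership figures are from the example; the click-through rates below
# are illustrative assumptions, not measured values.
readers = {"A": 10_000, "B": 100}
click_through = {"A": 0.005, "B": 0.60}  # assumed share of readers who follow a t5 link


def expected_clicks(blog):
    """Expected visits sent to C: readers times the assumed click-through rate."""
    return readers[blog] * click_through[blog]


def break_even_ratio(big, small):
    """How many times higher the small blog's click-through rate must be
    for it to send as much traffic as the big blog."""
    return readers[big] / readers[small]
```

With these assumed rates, A sends roughly 50 visitors and B roughly 60. Note what the break-even ratio says: because A has 100 times the readers, B only wins in absolute terms if its readers are more than 100 times as likely to click through. That is a strong requirement, and it is exactly the kind of thing per-topic readership data could confirm or refute.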

What I'm describing above is the difference between some notion of popularity (which may be called influence) and some other notion of authority (or expertise), and how these are related to both the blogger (blog) and the readers of that blog or feed. Measuring readership on topics is key to really modeling this stuff in social media, which is why FeedBurner is such an asset to Google. It is also why metrics for bloggers should capture notions of topic (something which BuzzLogic understands).

[Thanks to Akshay Java for discussions that highlighted this issue.]

August 29, 2007

Bloglines Beta

Briefly, beta.bloglines.com:

Bloglines

August 28, 2007

Google/Virtual Earth Wish List

I've been happily watching the march of features and data encouraged by the competition to win users of 3D Earth systems. Here is a short list of next generation features that I'd love to see:

  1. Day/Night simulation. With the increasing availability of building models and the greater detail in elevation data, the ability to accurately demonstrate to the user the environmental differences that lighting can bring to a scene will improve the user's ability to understand and explore new environments. In addition, the ability to set and animate the changes between night and day, and to simulate urban lighting (and even the inclusion of celestial data), would bring further value. Adding a user-controlled temporal element would also broaden the types of data and their exploration (imagine browsing historical weather data on the globe).
  2. Atmospheric simulation. The views in Google Earth and Virtual Earth are impressive, but they lack many qualities of the real world. One key element of reality that has a great effect on rendering scenic views is that of atmospheric perspective. This is the fading out of things as they recede into the distance.
  3. Meteorological simulation. What does Pittsburgh look like on a cloudy muggy summer afternoon? How about an April shower in Edinburgh? Imagine looking up to the sky in one of these systems and seeing realistic clouds (even clouds that are driven by real time weather data).
  4. Ambient sound. The larks in the sky, the sea, Niagara, ...
  5. Vehicle and Pedestrian extraction and simulation. Most of the better photographic data available on earth simulators has recognizable vehicles and sometimes people. By recognizing where vehicles are, the systems could certainly place token models. In addition, the density of traffic and pedestrians could be calculated and simulated.
  6. Vegetation simulation. Trees and other land use features have been studied in GIS systems for quite a while. Imagine if we could identify these features and place realistic trees and other vegetation.
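
As an aside on item 2: atmospheric perspective is commonly approximated in rendering with exponential distance fog, where a surface color is blended toward a haze color as distance grows. The sketch below is only that generic technique; the haze color and density are values I picked for illustration, and nothing here reflects how Google Earth or Virtual Earth actually render.

```python
import math


def apply_haze(color, distance_m, haze=(0.75, 0.82, 0.90), density=0.00005):
    """Blend an RGB color (components in 0..1) toward an assumed haze color.

    visibility = exp(-density * distance) is the share of the original
    color that survives; the remainder is replaced by haze.
    """
    visibility = math.exp(-density * distance_m)
    return tuple(
        visibility * c + (1.0 - visibility) * h
        for c, h in zip(color, haze)
    )
```

A nearby hillside keeps its own color, while one tens of kilometers away drifts toward the pale blue of the haze, which is the fading effect described above.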

August 26, 2007

Finding Hot/Cold Beaches with HotMap

My hope is that eventually the web, data mining and online applications will make arranging vacations (especially to the beach) trivial. One problem that beachgoers must deal with is the balance between how good the location is and how well known it is. HotMap - a project by Danyel Fisher at Microsoft Research (which I've written about before) - could be used to help with this. Locations that look good (according to Microsoft's imagery) but which haven't been inspected by too many people may turn out to be the secret beaches that we are all searching for.

Hotmap3

Perhaps Danyel could create a new version: ColdMap.

August 25, 2007

Game Time

Wired has a nice article about Halo 3 focusing on the testing regime being used. Of the tools used in testing, I like the look of this time stamp map which is explained in the article:

In early tests, players wandered lost around the Jungle level: Colored dots showing player location at five-second intervals (each color is a new time stamp) were scattered randomly. So Bungie fixed the terrain to keep players from backtracking. Sure enough, the dots clustered by color, showing that players were moving smoothly through the map.

Wiredhalo

As well as being a great way to debug levels, it acts as an interesting visual representation of the linearity of the level. In the image above, there is basically one way through the terrain.
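
One way to put a number on "the dots clustered by color" is to bucket position samples by time stamp and measure how spread out each bucket is. This is my own sketch, not Bungie's tooling: the sample format `(seconds, x, y)` and the function name are assumptions, and the spread measure (mean of the per-axis standard deviations) is just one simple choice.

```python
from collections import defaultdict
from statistics import pstdev


def spread_by_timestamp(samples, interval=5):
    """Group (seconds, x, y) samples into time buckets of `interval` seconds
    and return {bucket_index: spread}, where spread is the mean of the
    population standard deviations of x and y within that bucket."""
    buckets = defaultdict(list)
    for seconds, x, y in samples:
        buckets[int(seconds // interval)].append((x, y))
    return {
        bucket: (pstdev([p[0] for p in pts]) + pstdev([p[1] for p in pts])) / 2
        for bucket, pts in sorted(buckets.items())
    }
```

A small spread in every bucket means players are in roughly the same place at the same time, i.e. the level reads as linear; large spreads correspond to the scattered dots of the early Jungle tests.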

Scoble (mis)Reads the Twitter Tea Leaves

I'm currently reading Fooled by Randomness by Nassim Nicholas Taleb, which may explain my lack of appreciation on this, but I'm having a bad reaction to Robert Scoble's Fast Company post on Twitter and other micro-blogging platforms. My general problem is that it is all anecdote, heavy breathing and hearsay when in fact it could simply have been grounded in data. Scoble says:

[Micro-blogging] services mix contacts, instant messaging, blogging, and texting, and they're poised to make email feel as antiquated as the mimeograph.

It is pretty standard to underestimate the massive volume of email that we generate. According to Yahoo Answers:

As best as we can figure then, the number of emails sent each day far exceeds 2.25 billion. It may be approaching 62 billion.

Ballpark, blogging is around <small int> millions of posts per day, and Twitter is less. As for its growth, I don't see significant growth now in Twitter (though ask Akshay for a more informed opinion).

On the marketing potential of mining these <141 character tweets, Scoble states:

Sales and marketing are lagging in seeing the potential here. When I used all these services to tell the world that my wife and I were expecting a child in September, I anticipated hearing from the world's largest consumer-products companies begging me to try their latest diapers, food, car seats, and financial instruments. What came back? Nothing.

Firstly, there are two major paths in handling unsolicited expressions regarding products or product opportunities online (connecting with everyone or connecting with the influencers). Given that blogs and message boards contain far richer data, and thus yield far more accurate and relevant mining results, why would one jump on Twitter long before that space had been fully established?

Secondly, while I'm sure Scoble has plenty of anecdotal evidence, how much of the online conversation is relevant? Did he use Twitterment or some other search service to estimate the signal to noise ratio? If so he doesn't mention it in his post.

Well, perhaps when you have over 1000 blogs to read each day there isn't time for any really deep analysis and one must, as one so often sees in the blogosphere, go on instinct.

Personally, I think there is huge value in the Twitter data, but it is at the aggregate level (most likely intersected with geographic information and other filters).

KDD 2007 Videos

KDD has put up videos of some of the talks on the excellent Video Lectures site. Of note:

August 23, 2007

GigaPixel Images in Google Earth

Frank Taylor at the Google Earth Blog has posted a video demonstrating a new layer in Google Earth (v 4.2 required). The layer essentially adds portals to high resolution images onto the map and allows for modal interaction with the image. The interaction starts with a sweep down to the geolocated image, which is then aligned with the surrounding 3D space. You can then navigate into the image, which is progressively refined using the standard tiling approach seen in mapping sites, giving you access to the full gigapixel experience.

Here you can see the image (of Pittsburgh) situated in position against its subject.

Gegiga1

Selecting the interaction mode for the image then repositions the camera so that the image is aligned with the 3D context:

Gegiga2

Zooming in, we can then see the great detail in the image (note the full image displayed in the top right hand corner which indicates which part of the image we are now viewing).

Gegiga3

August 21, 2007

Super Hero Social Network

My colleague, Mukund Narasimhan, pointed me to this paper by P. M. Gleiser on deriving the social networks of super heroes in the Marvel universe.

We analyze a collaboration network based on the Marvel Universe comic books. First, we consider the system as a binary network, where two characters are connected if they appear in the same publication. The analysis of degree correlations reveals that, in contrast to most real social networks, the Marvel Universe presents a disassortative mixing on the degree. Then, we use a weight measure to study the system as a weighted network. This allows us to find and characterize well defined communities. Through the analysis of the community structure and the clustering as a function of the degree we show that the network presents a hierarchical structure. Finally, we comment on possible mechanisms responsible for the particular motifs observed.

Of course, super hero networks aren't like those of us mere mortals!
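
For the curious, here is a minimal sketch of the two ideas in the first half of that abstract: the binary co-appearance network, and degree assortativity measured as the Pearson correlation between the degrees at either end of each edge. The function names and any cast lists you feed it are my own illustration, not the paper's code or data.

```python
from itertools import combinations


def co_appearance_edges(publications):
    """Binary network: connect two characters if they share a publication.
    `publications` is a list of cast lists (hypothetical input format)."""
    edges = set()
    for cast in publications:
        for pair in combinations(sorted(set(cast)), 2):
            edges.add(frozenset(pair))
    return edges


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.
    Negative values indicate disassortative mixing (hubs link to low-degree
    nodes). Undefined when every node has the same degree."""
    degree = {}
    for edge in edges:
        for node in edge:
            degree[node] = degree.get(node, 0) + 1
    # Count each edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for edge in edges:
        a, b = tuple(edge)
        xs += [degree[a], degree[b]]
        ys += [degree[b], degree[a]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return cov / var
```

A toy input in which one character appears alongside several others who never meet each other forms a star, the most disassortative shape possible: the hub only ever connects to low-degree characters.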

Below we see some of the graph, with Spider-Man (SM), Thing (T), Beast (B), Captain America (CA), Namor (N) and Hulk (H).

Supernetwork

Microsoft, Google to Sponsor ICWSM

Briefly, as noted on the blog for the International Conference on Weblogs and Social Media, Microsoft and Google will be sponsoring our 2008 event in Seattle. Subscribe to the blog to keep up to date on conference news!
