
June 30, 2008

In Theory

Below is the first part of a post in response to Chris Anderson's latest cover article in Wired magazine, entitled "The End of Theory: The Data Deluge Makes The Scientific Method Obsolete". I had the whole post finished and ready to go, but hadn't counted on the poor quality of the software that Six Apart (which hosts this blog) has rolled out in the latest version of its post editing system. Most of my work was lost while checking spelling. Rather than try to recover it, I'll point you to John Timmer's post at Ars Technica.

//

First of all, let's be abstract. A system S produces a number of events. Any single event E may generate a set of observable data O. This data is interpreted by some observer and may be recorded in some form. The system might be the weather, the event might be a hurricane, the observable data might be the change in atmospheric pressure.

Now, let's imagine that you have a big collection of data. You can look back at it and say I saw X, then I saw Y (perhaps two subsequent readings of a barometer). In fact, if you saw X right now, you might be inclined to say that you expect to see Y shortly in the future. However, let's imagine that you see X', a reading that isn't in your collection. What could you say about your expectations for the next reading? Without some model, you can't really say anything. Now this model might be a model of the data. That is to say, you might fit a function to the data and use that to predict the next point. Or the model could be that of the underlying system (which you can't observe directly). Either way, you have stepped over the line from data to a model.
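To make the distinction concrete, here is a minimal sketch in Python. The barometer readings are invented purely for illustration, and a straight-line least-squares fit is just one assumed choice of "model of the data"; nothing here is meant as a real forecasting method. The lookup table stands in for raw data with no model, and the fitted line stands in for the model.

```python
# Hypothetical barometer readings (hour, pressure in hPa); invented for illustration.
hours = [0.0, 1.0, 2.0, 3.0]
pressures = [1012.0, 1010.0, 1008.0, 1006.0]

# Data only: a table of what was actually observed.
observed = dict(zip(hours, pressures))


def lookup(hour):
    """Answer from the data alone; returns None for anything never observed."""
    return observed.get(hour)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the fitted line to estimate a reading that was never observed."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(hours, pressures)

print(lookup(4.0))          # the data alone has nothing to say about hour 4
print(predict(model, 4.0))  # the model offers an estimate for hour 4
```

The estimate is only as good as the assumption that the trend is linear, which is exactly the point: the moment you predict an unseen reading, you have committed to some model.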

The really neat thing about models is they allow us to peer through the thin veneer of data and glimpse the next layer of the world. They extend our context and our understanding. In addition, and from a more utilitarian point of view, they allow us to predict things in the future based on observations that haven't been made before.

Another example. I come across a word that I've never seen before. Immediately, I can use that word and apply all manner of morphology to it such that others who speak my language can understand. It is new data, but because I have at some level abstracted the language (we might say, modeled the language) I can painlessly handle that novelty.

And yet another example. No matter how many times we observe an apple falling from a tree, our data will tell us nothing about the trajectory required to send a rocket to the moon. The only questions we can ask of the data are about the past, limited to events that have already occurred. With no model - with no theory of the underlying system (S) - we can't ask questions about things that we have never experienced.

June 28, 2008

Theory - Meta Post

Chris Anderson's latest article in Wired about the lessening need for theory in science brought on a fit of cognitive hysteresis. I was pretty annoyed by the article, but didn't write anything as I wanted to make sure I had a clear, thoughtful post to put out (Chris is a smart guy and deserves it). I hope to be getting that post published soon.

For now, I'd like to point out an interesting artifact of the current world-o-media that we live in. This post, by Kevin Kelly, relating to Chris' article starts off thus:

There's a dawning sense that extremely large databases of information, starting in the petabyte level, could change how we learn things

I find this type of language quite strange. It sounds like a collective is experiencing this 'sense'. But who is this collective? What is the evidence that Kevin observes that results in this statement? This kind of language leverages authority but fails to support the statement - in fact, the intuitive phenomenon that is imagined denies the need for any evidence.

Perhaps something for the Language Log?

Summer Animation

Kung-fu Panda did $20MM on its first day. Wall-E is getting plenty of attention as it opens, so it will be interesting to see if it beats the Dragon Warrior to the punch.


Weather Tweets

[Updated with additional map.]

Walter Rafelsberger has created a weather map which uses an analysis of tweets to determine current conditions. Below is a screen shot of the current weather in the US.

[Image: Weathertweets2 - tweet-derived weather map of the US]

Compare this with the current satellite map.

[Image: Weather2 - satellite map of the US]

Here's another example with Austria:

[Image: Weathertweets - tweet-derived weather map of Austria]

Compare this with a map from a more traditional service.

[Image: Weather - map of Austria from a more traditional service]

June 24, 2008

Dealing With Real Time Media

I attended a panel yesterday (at the Personal Democracy Forum 2008) on the live web (Robert Scoble, Bhaskar Roy, Max Haot, Keith McSpurren). A couple of observations about streaming live video, tweets, etc.

  1. The production of these streams leaves little room - or time - for self editing.
  2. The consumption of this data leaves little room for filtering - why would we consume real time data yesterday?

One problem that I see with the tools we currently have for handling this data is that they will follow the path of email. Email clients rapidly evolved to the 3-pane approach to consumption and never went anywhere else. There are no real analytical tools (except, perhaps, Xobni) for helping us deal with email.

I see an opportunity in this space for user facing tools that leverage the advances in social media analysis (including text mining, network analysis, etc.) to help us summarize and select the data, making the data more relevant, and the consumption more efficient.

June 23, 2008

Politics, meet Social Media Analysis

In preparing for speaking at Personal Democracy Forum 2008 I spoke with a number of colleagues in the social media analysis space asking if they were engaged with any of the presidential or primary campaigns. I didn't hear of any who were.

There was a panel in the morning session today in which a number of staffers from the various campaigns present and past talked about the tools and strategies of campaigning. It was apparent from their answers that they didn't use any real social media analytics systems. They did talk about dealing with lots of email, and building internal systems to help with that. Again, it didn't sound like they were using any text mining tools to help with this.

So, it seems like there is a big opportunity for companies in the social media analysis space, but there is also something of a disappointment in hearing that the politicians are not exploring what can be done with these technologies.

June 21, 2008

Personal Democracy Forum 2008

I'll be in New York next week speaking at Personal Democracy Forum 2008. The title of my talk is 'When Worlds Collide: Social Media, Mainstream Media and Politics'. In it I will discuss the value of large aggregates of social media data as a lens for understanding the political process and landscape. The presentation is very visual and I hope to have some version of it online soon. Readers of this blog will recognize much of the content, but there is some new stuff in there as well!

Google Trends Update

Briefly, Google Trends now provides results for website traffic data (via TechCrunch). I had complained about the frustrating lack of y-axis units, but Mary points out in the comments that you can get the missing y-axis info if you log in to your account (also - don't look for results for Google, they aren't available).

[Image: Googletrends - Google Trends website traffic chart]

Rethinking The Search Metaphor

Binary search is an elegant way for a computer program to find a specific item in a set of sorted objects. It takes the sorted set and makes progressively more accurate guesses as to where the desired item might be, using the knowledge that the set is sorted.
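For readers who haven't seen it written down, here is a minimal sketch of binary search in Python. The function name and the choice to return None for a missing item are my own assumptions for illustration; Python's standard library offers the bisect module for real use.

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in the sorted sequence items, or None if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # guess the middle of the remaining range
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1          # target can only be in the upper half
        else:
            hi = mid - 1          # target can only be in the lower half
    return None                   # range is empty: target is not present


shelf = [2, 3, 5, 8, 13, 21, 34]
print(binary_search(shelf, 13))
print(binary_search(shelf, 4))
```

Each guess discards half of what remains, which is only possible because the program can rely on the whole set being sorted.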

When I'm looking for a book in my office, I stand in front of the bookshelf and home in on the location based on many factors - rough organization of books by topic, a vague recollection of the book cover and so on.

These are examples of what we might call holistic search (if you have a better term, leave a comment). What I mean by that is that we have some sort of access to the entire space of objects - not necessarily in fine-grained detail, but certainly at some useful level of summary. Searching a game space is similar, though here the search space is intensional (defined by the rules of the game), not extensional (laid out item by item).

Now think of finding a resource in a library. We ask the librarian for a book on X and the librarian uses their knowledge and cataloging tools to determine what would be a good suggestion. We could call this agent search - an expert (agent) interfaces between the need on one side and the data and tools for exploring that data on the other.

To complete the picture, let's think about fishing. We put some attractive bauble (or morsel) on the end of a line and throw it out and into the water, not having visibility as to what lies below the surface but making some good guesses based on our knowledge of the behaviour of our prey.

So, from the user's perspective, we have holistic search (visibility into the result space), agent search (collaboration with an intelligent intermediary) and fishing. Which of these models best describes our interactions with the web through 'search engines'? In the above, I assume that the desired result is a single resource (an item in a list, a book, a fish). Of course, more often the desired result is some piece of information which may be synthesized from the object resource material (a feature of the item, a fact in the book or a fish pie).

June 19, 2008

Twingly: Innovating The (Blog) Search Interface

I really like the vision behind this interface to the blogosphere that the six Twingly interns are working on. It manages to push all the right buttons: blogosphere, graphs, zoomable interfaces and fun.

