Below is the first part of a post in response to Chris Anderson's latest cover article in Wired magazine entitled The End of Theory: The Data Deluge Makes The Scientific Method Obsolete. I had the whole post finished and ready to go, but hadn't counted on the poor quality of software that Sixapart (which hosts this blog) has rolled out in the latest version of its post editing system. Most of my work was lost while checking spelling. Rather than try to recover it, I'll point you to John Timmer's post at ars technica.
//
First of all, let's be abstract. A system S produces a number of events. Any single event E may generate a set of observable data O. This data is interpreted by some observer and may be recorded in some form. The system might be the weather, the event might be a hurricane, the observable data might be the change in atmospheric pressure.
Now, let's imagine that you have a big collection of data. You can look back at it and say I saw X, then I saw Y (perhaps two subsequent readings of a barometer). In fact, if you saw X right now, you might be inclined to say that you expect to see Y shortly. However, let's imagine that instead you see X'. What could you say about your expectations for the next reading? Without some model, you can't really say anything. Now this model might be a model of the data. That is to say, you might fit a function to the data and use that to predict the next point. Or the model could be a model of the underlying system (which you can't observe directly). Either way, you have stepped over the line from data to a model.
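To make "fit a function to the data and use that to predict the next point" concrete, here is a minimal sketch. All the numbers are invented for illustration, and the "model of the data" is deliberately the simplest one available: a least-squares straight line through a handful of barometer readings, extrapolated one step ahead.

```python
# Hypothetical barometer readings (hPa) taken at hourly intervals.
readings = [1013.2, 1011.8, 1010.1, 1008.0, 1005.6]
hours = list(range(len(readings)))

# Model of the *data*: fit a straight line by least squares.
n = len(readings)
mean_x = sum(hours) / n
mean_y = sum(readings) / n
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, readings))
den = sum((x - mean_x) ** 2 for x in hours)
slope = num / den
intercept = mean_y - slope * mean_x

# Use the fitted line to predict the next (unseen) reading.
next_reading = slope * n + intercept
print(f"expected next reading: {next_reading:.1f} hPa")  # prints 1004.0
```

Note what the fit does and doesn't give you: it summarizes the X-then-Y pattern already in the data, but it says nothing about *why* the pressure is falling. A model of the underlying system (a storm approaching) would let you ask questions this curve cannot answer.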
The really neat thing about models is that they allow us to peer through the thin veneer of data and glimpse the next layer of the world. They extend our context and our understanding. In addition, and from a more utilitarian point of view, they allow us to make predictions about observations that haven't yet been made.
Another example. I come across a word that I've never seen before. Immediately, I can use that word and apply all manner of morphology to it such that others who speak my language can understand me. It is new data, but because I have at some level abstracted the language (we might say, modeled the language) I can painlessly handle that novelty.
And yet another example. No matter how many times we observe an apple falling from a tree, our data will tell us nothing about the trajectory required to send a rocket to the moon. The only questions we can ask of the data are about the past, limited to events that have already occurred. With no model, no theory of the underlying system S, we can't ask questions about things that we have never experienced.
Models are only useful when what they predict is what happens; the problem comes when something else happens. As obvious as this sounds, it is at the very heart of the fallacy of models. A model might make sense in an aesthetic way, but for any real application you cannot ever say that it is correct, only that it is useful.
Let's give a real-life example. If you model your weather observations with a nice curve, and omit the extreme temperatures that occur on a couple of unpredictable days every year, you will miss the melting of the polar ice caps. Your model might make sense for simplifying the observations and might work most of the time for predicting the next day's temperature, but we will eventually face the painful consequences of poorly grounded confidence in it.
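A toy sketch of that failure mode, with entirely made-up temperatures: the "nice curve" is reduced here to the crudest possible smoothing, a mean taken after discarding readings far from the median, which is exactly the step that throws the extremes away.

```python
# Made-up daily temperatures (degrees C): a year of mild days with a
# few extreme days mixed in.
temps = [15.0] * 360 + [42.0, 43.0, 41.0, 15.0, 15.0]

# The "nice curve": average the data after discarding anything more
# than 10 degrees from the overall median -- i.e., drop the extremes.
median = sorted(temps)[len(temps) // 2]
smoothed = [t for t in temps if abs(t - median) <= 10]
model_temp = sum(smoothed) / len(smoothed)

print(f"model says {model_temp:.1f} C; the record high was {max(temps):.1f} C")
```

The smoothed model is right on the vast majority of days, which is precisely what makes it dangerous: the days it ignores are the ones that carry the consequences.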
I like the example about the rocket; it nicely summarizes the misconception about models. Saying that without models we could not have sent a rocket to the moon is forgetting the thousands of unsuccessful rocket launches that preceded the one successful mission. It is the ultimate unfairness to the people who spent their blood and tears to put the rocket into the sky, and also the kind of oversimplification and forgetfulness that builds false confidence in models...
Models are beautiful, you can hang them on your wall like a landscape painting; but don't try to climb into the damn thing.
Posted by: Topkara | July 01, 2008 at 11:11 AM