First, a reminder: Netflix is offering $1MM to any team that delivers an algorithm that outperforms its current recommendation system (that is to say, its ability to predict how a subscriber would rate a movie) by 10%. Here are the stats taken from the leader board page for the Grand Prize:
There are currently 16660 contestants on 13466 teams from 124 different countries.
We have received 3059 valid submissions from 1001 different teams; 74 submissions in the last 24 hours.
The leader board shows the latest submission for each team. If we treat these submissions as though they all came from a single team and plot the results (the RMSE) over time, we get the following graph:
Remember, we are looking at RMSE, so the lower the value the better. Of course, what we'd really like to do is track the results for each team over time. So how can we interpret this graph? Firstly, the older points represent teams that have probably dropped out. Secondly, the competition is still pretty heated, with 6 teams whose latest submissions carry today's time stamp (note that as there have been 74 submissions in the last 24 hours, it is likely that teams are submitting many results per day). Thirdly, there does seem to be a trend pushing lower and lower. Will the competition be over before the year is out?
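For anyone who wants a concrete picture of the metric, here is a minimal sketch of how RMSE (root mean squared error) is computed from predicted and actual ratings. The function name and the sample ratings are made up for illustration; this is not Netflix's scoring code, just the standard definition of the measure.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    # Sum the squared differences, average them, then take the square root.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))

# Hypothetical ratings on a 1-5 star scale (not contest data).
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]

score = rmse(predicted, actual)
print(score)  # 0.75
```

Because the errors are squared before averaging, a prediction that is off by a whole star hurts much more than two predictions that are each off by half a star, and a perfect set of predictions scores 0.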
I'm pretty sure you can only submit once per day.
Posted by: Eytan Adar | December 06, 2006 at 11:22 PM
Yep, there is a limit of one submission per team per day in order to stop people from training their results on the oracle's answers.
Posted by: Spiros Denaxas | December 07, 2006 at 05:42 AM
Nice blog, almost everything you have in it is factually incorrect. If I were writing a blog I would try to keep my facts reasonably accurate.
It is not true that most of the past results were teams that have dropped out. wxyzconsulting and several of the other leaders have had results from early in the contest. Sure, many teams have dropped out.
From the submission rate (typically 100 subs per day, give or take a couple of dozen) and the fact that each team can only submit once per day, you can estimate that there are perhaps 200 "serious" teams, and of those about 20 teams are "contenders" as of this moment.
Of course, there are persistent rumors on the prize boards that several teams with stunningly low results are waiting for the Jan 2 deadline to submit even their first result. Mathematical methods have been posted to the prize boards showing how to test your submissions without ever showing up on the leaderboard (quite clever really), so it is even possible to confirm your results are low and never show up.
Jan 2 is critical because it is the earliest date that the contest can be declared in the "final 30 day" mode. The theory (by some) is that if you have a score under 0.85 that would trigger the $1MM prize, you should wait until Jan 2 to submit it, in order to blindside your opponents.
We shall see.
In any case, there is more accurate information in my post here than was contained in the original blog.
Posted by: drang | December 09, 2006 at 09:53 AM
Factually incorrect - you mean like your fake email address?
Posted by: Matthew Hurst | December 09, 2006 at 03:14 PM
Matthew, I enjoy your blog, keep up the good work. Have you looked into the Netflix data yourself?
Shane
Posted by: Shane | December 14, 2006 at 11:12 PM
January 2 has come and gone without the flurry of new results that people had been expecting.
The leader board is not the beginning and the end of this contest. There are over a thousand teams who have submitted at least one validly formatted prediction. Most teams consist of a single person. That means that over 90% of the teams who registered to download the massive amount of data gave up before submitting a prediction. One contestant, Simon Funk, posted details of his method and that seems to have breathed new life into many of the participants. It looks like there are currently between 200 and 500 people actively working on this problem.
Posted by: Joe Smith | January 03, 2007 at 12:12 AM