Netflix - the name of my personal virtual DVD library of 65,000 titles - has announced a very interesting challenge/competition. If you can improve their movie recommendation system by at least 10%, they will pay out $1MM. Not only that, but Hal has added another $10 if the solution uses some application of Natural Language Processing. Not to be outdone, I'll throw in another $11 if the solution integrates some form of auxiliary social media analytics.
The idea is roughly as follows. Netflix has released a lot of data - 100MM data points - that can be used to train and test approaches to predicting the 1-5 rating that a Netflix customer would assign to a movie.
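To make the 10% target a bit more concrete, here is a minimal sketch in Python. It assumes the improvement is measured as a relative reduction in root mean squared error (RMSE) against a baseline predictor. The ratings and both sets of predictions below are invented for illustration and do not come from the Netflix data; the function names are hypothetical too.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Invented example: five true ratings on the 1-5 scale.
actual = [4, 3, 5, 1, 2]
# A naive baseline that always predicts the same value.
baseline = [3.0, 3.0, 3.0, 3.0, 3.0]
# A hypothetical candidate system's predictions.
candidate = [4.0, 3.2, 4.4, 2.0, 2.5]

baseline_score = rmse(baseline, actual)
candidate_score = rmse(candidate, actual)
gain = relative_improvement(baseline_score, candidate_score)

print(f"baseline RMSE:  {baseline_score:.4f}")
print(f"candidate RMSE: {candidate_score:.4f}")
print(f"improvement:    {gain:.1%} (target: at least 10%)")
```

In the real contest the baseline would be Netflix's own system rather than a constant guess, so a 10% gain there is much harder to reach than in this toy setup.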
I'm keen to look at this data, even simply to peer into some of the distributional aspects. One problem that I'm aware of when it comes to customer predictions is what might be called the Amazon Effect, which is roughly as follows: online systems have no control over the number of people that use a single identifier in a rating system. For example, Wakako and I use the same Netflix account, we both provide ratings, and there are plenty of movies for which we have very different opinions. Netflix describes the data as follows:
The training data set consists of more than 100 million ratings from over 480 thousand randomly-chosen, anonymous customers on nearly 18 thousand movie titles.
However, they can't, in fact, ensure that each 'customer' is an individual. Consequently, there won't necessarily be a single correct answer for each prediction. This doesn't, of course, mean that one can't succeed.
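A toy calculation shows why a shared account puts a floor under the achievable error. The sketch below assumes two people behind one customer ID who would rate the same movie 5 and 1, that either of them is equally likely to be the one who submits the rating, and that error is measured as RMSE. The numbers and function names are made up for illustration.

```python
import math


def best_single_prediction(ratings):
    """The mean is the single value that minimises squared error."""
    return sum(ratings) / len(ratings)


def rmse_of_constant(prediction, ratings):
    """RMSE when one fixed prediction is scored against each rating."""
    squared_errors = [(prediction - r) ** 2 for r in ratings]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical: two people on one account, same movie, opposite opinions.
shared_account_ratings = [5, 1]

best = best_single_prediction(shared_account_ratings)
floor = rmse_of_constant(best, shared_account_ratings)

print(f"best single prediction: {best}")
print(f"error that remains:     {floor}")

# Any other single guess does worse, for example predicting 4:
print(f"error when guessing 4:  {rmse_of_constant(4, shared_account_ratings):.4f}")
```

No model that sees only the customer ID can get below that floor for this account, but the floor applies equally to every competing system, which is why success is still possible.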
Chris Anderson has some interesting commentary on the challenge.
Respected sir,
My name is Sankar. I am doing an M.Tech project in data mining. My project title is "Applying Web Usage Mining to Discover Potential Browsing Problem of Users". This paper was published in 2007 by IEEE. I have one question about this project: it uses the terms Upstairs, Downstairs, Mountain and Fingers. Please send me the meanings of those words.
Thank you, sir,
Posted by: sanku | November 03, 2008 at 01:14 AM