Rexer Analytics Data Mining Survey
Since I am at KDD (Knowledge Discovery and Data Mining) right now, and have just sat down for a chat with Karl Rexer, I thought it fitting to post the summary Karl shared of his recent data mining survey:
2007 HIGHLIGHTS:
· 27-item survey of data miners, conducted online in early 2007
· 314 responses from individuals in 35 countries
· Regression, decision trees and cluster analysis were the most commonly used algorithms (mean number of algorithms used: 6.8)
· The top challenges data miners report are dirty data, data access, and explaining data mining to others
· SPSS, SPSS Clementine, and SAS are the three most frequently utilized tools (mean number of tools used: 4.5)
· There is increasing interest in the Oracle Data Mining tool, and decreasing interest in C4.5/C5.0/See5
· The primary factors data miners consider when selecting an analytic tool are: 1) the dependability and stability of software, 2) the ability to handle large data sets, and 3) data manipulation capabilities
· The findings vary somewhat depending on the domain in which the data miner works, the tools used, geography, and several other dimensions




