One of the most interesting aspects of working on text mining applications in the social media space is the recognition of language which indicates evaluation, judgement or emotion. The term 'affect' is often used to indicate emotional state. In searching for articles on affect, I hit Google with 'affect text documents'. Google, being smarter than me, decided that I really wanted 'effect' where I had typed 'affect', and so the results were pretty useless. This can be fixed by searching for '+affect text documents', but the damage is done. Neither Yahoo nor Microsoft tried to be too smart, so in this case their results were more useful to me.
Why is this important? If I hadn't known about the + operator, I would have assumed that there weren't any documents out there relevant to me. As the 'smart' part of the relevance algorithm grows, this kind of mismatch is simply going to happen more and more, and the user's ability to fix it with special operators is going to degrade. I'd much prefer an interface that tells me what it is thinking and then lets me decide what action to take.
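To make that last wish concrete, here is a rough sketch of what I have in mind. It is not how any real search engine works. The rewrite table, the `QueryPlan` class and the `plan_query` function are all names I made up for illustration. The point is only that the engine proposes a change, explains it, and leaves the decision to the user.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical rewrite table (an assumption for this sketch); a real engine
# would presumably learn such substitutions from query logs.
REWRITES: Dict[str, str] = {"affect": "effect"}


@dataclass
class QueryPlan:
    original: List[str]
    # Maps a term the user typed to the replacement the engine suggests.
    proposed: Dict[str, str] = field(default_factory=dict)

    def explain(self) -> str:
        """Tell the user what the engine is thinking before it acts."""
        if not self.proposed:
            return "Searching for exactly what you typed."
        notes = [f"'{old}' -> '{new}'" for old, new in self.proposed.items()]
        return "Suggested changes (not yet applied): " + ", ".join(notes)

    def resolve(self, accepted: Set[str]) -> List[str]:
        """Apply only the rewrites the user explicitly accepted."""
        return [
            self.proposed[term] if term in accepted and term in self.proposed else term
            for term in self.original
        ]


def plan_query(query: str) -> QueryPlan:
    """Split the query into terms and collect suggestions without applying them."""
    terms = query.lower().split()
    proposed = {t: REWRITES[t] for t in terms if t in REWRITES}
    return QueryPlan(original=terms, proposed=proposed)


plan = plan_query("affect text documents")
print(plan.explain())
print(plan.resolve(accepted=set()))       # user declines: query is left alone
print(plan.resolve(accepted={"affect"}))  # user accepts the substitution
```

The default here is to search for what was typed. The substitution only happens if the user says yes, which is the opposite of what happened to me.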
Update: note that Google's results for 'affect text documents' do ask 'did you mean: effect text documents', but they are different from the results for 'affect text documents +affect', which asks the same question but promotes documents containing the word 'affect' to the top.