I'm completing a set of reviews for a reasonably high-quality conference that touches on data mining and text mining problems. Perhaps the industrial setting has jaded me with respect to academic papers, but there seem to be some key points that - for me - really matter in the writing of a good paper (and, implicitly, in the selection of interesting areas of research).
- Be clear on the context. Many papers I see start off with some general, weak and unsupported statements about some big trend in the data domain (e.g. the huge growth of social networks) and the imperative for analysing that data. These statements, however, are rarely supported with actual statistics. Nor are they accompanied by a comparison to the last big imperative. For example, it seems we must analyse Twitter to get insights into what people are thinking - but what does this data give you above and beyond other corpora, like blogs?
- Be clear on the contribution. Given the current state of the art in the particular field and a formulation of the problem (which has to cover the linguistic qualities of the corpora), what specific areas require progress (or remain untouched), and how has an analysis of that specific problem space motivated the proposed solution?
- Be clear on the utility. So what - perhaps you have applied your favourite technique to a problem and moved the needle a little. What do we now know that we didn't before? Do we know something about the problem itself (20% of the corpus had this particular construct, and my solution handles that construct)? Do we know something about the technique (previous applications of this technique suffered from a fundamental conceptual problem, and I fixed that by introducing this significant evolution)?
- Be clear! When you've figured out the above, think hard about the most efficient and impactful way of communicating the research. Examples, using real text, are very useful for getting straight to the heart of the matter (and be careful to communicate how representative these examples are). Don't throw in assumptions and leave the reader standing (every vague assumption will cast doubt on downstream results). Make sure the evaluation is transparent (you did remember to include an evaluation, right?)