Most of the predictions that were flying around at the end of last year concerned company acquisitions, foresight into the boom or bust of various technologies and the maturation of others. Few, however, made any attempt at predicting where major technology themes were going and what major advances were to be delivered. For example, no one claimed:
- 2006 is going to be the year of natural language front ends to vertical search systems.
- There will be an important breakthrough in parsing that will enable semantic driven text mining.
- A new approach to ontologies will make inference a key differentiator for a small search engine.
Fundamental breakthroughs are hard to predict, whereas business trends are relatively easy to call by inspecting the last few years and applying some intuition about the boom and bust world we live in. Perhaps this is not the time to go into the arrival of Lexxe - an NLP driven search engine. We can, however, have a look at Jack Vinson's report on Glenn Fannick's talk at KM Chicago. The predictions for the text mining and visualization space are as follows:
- Enhanced ability to understand how words and terms are related to one another: does one company name appear frequently near another company name? Or is there a phrase that appears near a company name? Text miners have massive archives of proper nouns they can monitor, as well as place and region names. They are beginning to understand how people are related to a chunk of text (about, author, quoted, interviewed, etc.) and can apply the same kinds of questions.
- Is a given combination of terms happening more or less frequently in a given time period? i.e. What does the occurrence of Apple + Motorola + iTunes look like in relation to the discussion of the ROKR phone? What about commentary on the success or failure of the venture?
- Is a particular class of topics gaining / losing currency over time?
- Re-engineering search results. Rather than strictly get a list of results, attempt to apply the text mining capabilities to the results to pull out key concepts, companies, names to help the user focus their energies and give a wider context as to what is in the results. I got the feeling that this was similar to what Technorati is doing with adding Flickr, Furl and del.icio.us matches to the search results. (I understand that search and mining are very different activities.)
- One of the interesting aspects of visualization is to use the time data to show the frequency of occurrence over time, whether that is a single term or term combinations.
These have me really puzzled. Everything in here is, AFAIK, reasonably mature technology already and, in many cases, already present in enterprise and consumer facing products.
Time series for terms over time/term currency monitoring? BlogPulse, IceRocket and Technorati all have this, to say nothing of systems like g-metrics, which provide this interface for Google and other search engines.
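To show how little machinery the basic version of this needs, here is a minimal Python sketch that counts, per month, the posts mentioning every term in a combination. It is not how any of the products above work: the function name `monthly_cooccurrence`, the whitespace tokenisation and the four sample posts are all invented for illustration.

```python
from collections import Counter
from datetime import date


def monthly_cooccurrence(posts, terms):
    """Count, per (year, month), the posts that mention every term in `terms`.

    `posts` is an iterable of (date, text) pairs. Matching is a naive,
    case-insensitive whole-word test on whitespace-split text.
    """
    wanted = {term.lower() for term in terms}
    counts = Counter()
    for posted_on, text in posts:
        words = set(text.lower().split())
        if wanted <= words:  # every wanted term appears in this post
            counts[(posted_on.year, posted_on.month)] += 1
    return dict(sorted(counts.items()))


# Toy data, made up for this example only.
posts = [
    (date(2005, 9, 7), "Apple and Motorola launch the ROKR with iTunes"),
    (date(2005, 9, 20), "Motorola ROKR reviews are mixed"),
    (date(2005, 10, 3), "Apple iTunes sales and the Motorola ROKR"),
    (date(2005, 10, 15), "Apple Motorola iTunes partnership questioned"),
]

trend = monthly_cooccurrence(posts, ["Apple", "Motorola", "iTunes"])
print(trend)
```

Plotting `trend` against time gives the kind of frequency curve the blog trend tools display. A real system would need proper tokenisation and entity recognition rather than `str.split`.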
Re-engineering search results - Vivisimo has made a business out of this feature for several years and is going into a big year with the winning of government contracts.
Word/term/concept association and significance - many solutions already use some form of mutual information scoring (at the trivial end) to provide this capability.
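For readers who have not met it, the "trivial end" can be sketched in a few lines of Python. This computes pointwise mutual information (PMI) for term pairs from document-level co-occurrence, which is one simple form of mutual information scoring. The function name `pmi_scores` and the toy documents are assumptions made up for this example, not taken from any particular product.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return PMI (log base 2) for each term pair that co-occurs at least once.

    `documents` is a list of token lists. Probabilities are estimated as the
    fraction of documents containing a term (or both terms of a pair).
    """
    n_docs = len(documents)
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        terms = sorted(set(doc))  # unique terms, sorted so pair keys are stable
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))

    scores = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = term_counts[a] / n_docs
        p_b = term_counts[b] / n_docs
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores


# Toy documents, invented for illustration.
docs = [
    ["apple", "motorola", "itunes"],
    ["apple", "itunes"],
    ["motorola", "rokr"],
    ["apple", "motorola", "rokr", "itunes"],
]

scores = pmi_scores(docs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A positive score means the pair appears together more often than independent occurrence would predict; a negative score means less often. Raw PMI is known to favour rare terms, so practical systems usually add frequency thresholds or smoothing.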
So I'm confused. Are these predictions for enterprise solutions? If so, they are way off, in that these are all complete, mature technologies. Are they for consumer facing applications? In which case, some are existent but others have yet to find integration in the consumer interface. Is the reason for my confusion to do with inter-planetary collision?



Maybe Glenn Fannick will see this, but my impression was that while the technologies are all there, no one has brought them together in ways that non-technical people could easily use them to do their work. Particularly, Glenn showed easy-to-comprehend screenshots of searches.
Maybe that's the future: someone finally figures out how to make this stuff work for everyone else.
Jack Vinson
Knowledge Jolt: http://blog.jackvinson.com
Posted by: jackvinson | January 07, 2006 at 08:53 PM
Jack,
I think that is approximately what I was thinking wrt these predictions. However, this means the prediction really should read 'someone figures out how to provide a more interesting user interface than the simple and restricting text box search with listed results'. User behaviour is the biggest problem in text mining - not the smarts to do the mining!
Matt
Posted by: Matthew Hurst | January 07, 2006 at 08:56 PM