Word Trees
Neoformix posts about a new corpus visualization available on Many Eyes called Word Trees. Fun to play with - a spin on a more traditional tool called KWIC (key word in context).
However, let's consider the utility here. The visualization uses font size to indicate frequency. This certainly gives one an intuitive feel for the relative distribution of patterns, but it gives no quantitative information. Also, because language has so much localized variation, it would be very useful to see this interface extended to include patterns of some sort, or at least wild cards.
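To make the two wishes above concrete, here is a minimal sketch, not anything Many Eyes actually does: it matches the key word with a regular expression (standing in for wild cards) and reports the words that follow it as plain counts rather than font sizes. The function name `next_word_counts`, the crude tokenizer, and the sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter

def next_word_counts(text, pattern):
    """Count the words that immediately follow each token matching `pattern`.

    `pattern` is a regular expression matched against whole tokens, so
    r"inform\w*" plays the role of a wild card such as inform*.
    """
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    key = re.compile(pattern)
    counts = Counter()
    for i in range(len(tokens) - 1):
        if key.fullmatch(tokens[i]):
            counts[tokens[i + 1]] += 1
    return counts

# Hypothetical sample text, invented for this sketch.
sample = ("Information of this kind is integrated into the display. "
          "The information of interest was introduced early. "
          "Informative examples of usage help.")

for word, n in next_word_counts(sample, r"inform\w*").most_common():
    print(f"{n:3d}  {word}")
```

The counts carry the same information as the font sizes, only in a form you can read off and compare exactly.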
For an interesting variation on this, take a look at this paper by Futrelle et al. which includes this display:





The key elements of the KWIC plus display from Futrelle et al. are:
- readable real examples (slightly compromised by the need to squeeze the middle
column of black in some way)
- aligning corresponding elements of the lines so as to show relationships
- sorting so as to bring together the lines that are related in interesting
ways (all the "of's" after "information", but also "integrated" and "introduced"
leading to interpolations in the otherwise contiguous sequences of "of's")
The quantitative information is implicit in the visible long sequence of "of's" rather
than made numerical. This is a good choice perceptually if the examples
are few enough to allow it while maintaining readability. I wonder if the tool ever
does vertical ellipsis, somehow indicating how many examples were left out because it
thought they were dull.
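For what it's worth, the alignment, sorting, and vertical ellipsis ideas can be sketched in a few lines. This is only an illustration under my own assumptions, not the method from the Futrelle et al. paper: `kwic_lines`, `render`, and the `max_run` cutoff are hypothetical names, and "dull" here just means "another line whose first right-context word repeats the previous ones".

```python
import re

def kwic_lines(text, keyword, width=3):
    """Return (left, key, right) context triples, sorted by right context."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    # Sorting on the right context brings related lines together.
    rows.sort(key=lambda row: row[2].lower())
    return rows

def render(rows, max_run=2):
    """Align the key word in one column and elide long runs with a count."""
    pad = max((len(left) for left, _, _ in rows), default=0)
    indent = " " * pad
    out, prev, shown, hidden = [], None, 0, 0
    for left, key, right in rows:
        first = right.split()[0].lower() if right else ""
        if first != prev:
            if hidden:
                # Vertical ellipsis: say how many lines were left out.
                out.append(f"{indent}  ... ({hidden} more)")
            prev, shown, hidden = first, 0, 0
        if shown < max_run:
            out.append(f"{left:>{pad}}  {key}  {right}")
            shown += 1
        else:
            hidden += 1
    if hidden:
        out.append(f"{indent}  ... ({hidden} more)")
    return out

# Hypothetical sample text, invented for this sketch.
text = ("information of a, information of b, information of c, "
        "information integrated here")
for line in render(kwic_lines(text, "information", width=2)):
    print(line)
```

Right-aligning the left context is what keeps the key word in a single column, and the "... (n more)" line restores the number that is lost when repeated lines are dropped.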
If we had general AI, a modest consequence would be to make generalized KWIC
displays better. This deals completely with the long-standing difficulty of selling
AI to lexicographers.
samples it shouldn't show
Posted by: Chris Brew | September 13, 2007 at 05:34 PM