Jason Priem recently pinged me with a link to a project he’s working on: FeedViz. FeedViz provides several dimensions along which to explore and consume feeds: time (via a time series), tags (via a linearized tag cloud) and specific blogs (via a list). Selecting on any of these dimensions updates the display of the other two. Finally, you can read the posts that exist at the intersection of (the settings for) these dimensions.
The tag cloud is generated using “two numbers for each word:
- The first is frequency. Frequency says how many times a word is used per 1000 words. If you hover over a word, you'll see its frequency to the left of the frequency change value.
- The second is frequency change. Often, a word will be more (or less) popular than usual in a certain time period (for instance, "election" in early November). Frequency change measures that difference as a percentage: greener words are unusually popular; redder words are the opposite.”
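To make the two numbers concrete, here is a minimal sketch of how they could be computed. This is my own reading of the description quoted above, not FeedViz's actual code: the function names are hypothetical, and I'm assuming that "usual" means the word's frequency over some longer baseline period.

```python
from collections import Counter


def per_thousand(tokens):
    """Frequency of each word, expressed as occurrences per 1000 words."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: 1000.0 * count / total for word, count in counts.items()}


def frequency_change(period_tokens, baseline_tokens):
    """Percentage difference between a word's frequency in the selected
    period and its frequency in the baseline (assumed to be the 'usual' rate).

    Words that never appear in the baseline are skipped, since a percentage
    change from zero is undefined.
    """
    period = per_thousand(period_tokens)
    baseline = per_thousand(baseline_tokens)
    changes = {}
    for word, freq in period.items():
        base = baseline.get(word)
        if base:
            changes[word] = 100.0 * (freq - base) / base
    return changes


# Toy data: "election" doubles its share in the selected period.
period = ["election"] * 2 + ["the"] * 8
baseline = ["election"] * 1 + ["the"] * 9
change = frequency_change(period, baseline)
```

With this toy data, "election" comes out positive (greener, in FeedViz's terms) and "the" comes out slightly negative (redder).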
While I really like the design, animations and implementation, I’m not convinced that the above approach is the best way to surface keywords. Of course, it depends on what the purpose of the keywords is (descriptive, discriminative, or trendive), but I’d love to see this stuff running on something like BLRT or TF.IDF.
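For comparison, here is a rough sketch of what a TF.IDF weighting might look like. There are many TF.IDF variants; this one uses relative term frequency and an unsmoothed log inverse document frequency, and treats each post as a document, which is my assumption rather than anything FeedViz does. The `tf_idf` name is just for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score every word in every document.

    documents: a list of token lists, one per post.
    Returns a list of {word: score} dicts, in the same order.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        total = len(tokens)
        counts = Counter(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


posts = [
    ["election", "the", "the"],
    ["the", "budget"],
    ["the", "weather"],
]
scores = tf_idf(posts)
```

The point of the weighting is that a word appearing in every post (here "the") scores zero no matter how often it is used, while a word concentrated in a few posts (here "election") is pushed up. That makes it a better fit for discriminative keywords than raw frequency.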
Wow--I'm so excited to be mentioned in your blog, which is one of my favorites. You raise an important point about the limitations of simple word frequency for keywording. Although this particular project was more about the interface, finding better ways of getting meaning from the text is definitely something I want to explore in future work. Thanks for the post!
Posted by: Jason Priem | December 07, 2008 at 06:35 PM