

September 18, 2011


Michał Tatarynowicz

The idea of visualizing lexical novelty is very interesting. Could you possibly also visualize word uniqueness, i.e. which paragraphs contain words that are used the least in the whole book, or maybe even in the author's whole body of work? Perhaps that could provide a more visually balanced view of where the author was at her most creative.
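One way to make the word-uniqueness idea concrete is sketched below. It is only an illustration under stated assumptions, not anything from the post: the function name `rare_word_share`, the `max_count` cutoff, and the simple regex tokenizer are all hypothetical choices. Each paragraph is scored by the share of its tokens that are rare across the whole book.

```python
import re
from collections import Counter


def rare_word_share(paragraphs, max_count=1):
    """Score each paragraph by the share of its tokens that are rare book-wide.

    A word counts as rare if it occurs at most `max_count` times in all
    paragraphs combined (an assumed cutoff, chosen for illustration).
    """
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    tokenized = [re.findall(r"[a-z']+", p.lower()) for p in paragraphs]
    # Word frequencies over the whole book.
    counts = Counter(w for tokens in tokenized for w in tokens)
    shares = []
    for tokens in tokenized:
        rare = sum(1 for w in tokens if counts[w] <= max_count)
        # Empty paragraphs get a score of 0.0 to avoid dividing by zero.
        shares.append(rare / len(tokens) if tokens else 0.0)
    return shares
```

Passing in the paragraphs of an author's entire body of work instead of a single book would give the wider comparison.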


Matthew, we had a very similar problem. We needed to find the top-k prominent words in tweets during specified (6- or 12-hour) time intervals. We also needed to account for how these words were used in future intervals (book sections, in your case). If a word went on to be used frequently, the first interval in which it appeared became more important.

We modified tf-idf to work with time. First, we computed a prominence value for each word in each interval; then we updated that prominence based on later usage. You can find out more about it in the paper:


Very interesting visualization.

It immediately made me wonder what the map of Gravity's Rainbow would be. I'm guessing solid red.


I think my digital humanities friends will love this. I've only been visualizing term occurrences in my vis. tool, but you just made me realize there are so many other metrics that can benefit from this kind of visualization.


This is an interesting set of visualizations of novels, but I have some reservations about your initial question and the results. I know the work is at a very preliminary stage, so my comments are more provocations than criticisms. First, it's not entirely clear what you mean by 'novelty'. It seems like your methodology is entirely about novelty within each text, in which case the results you get for the two Austen novels aren't very surprising. Almost by definition, the early parts of a novel will be all 'new' words because they haven't been used before in that work, and as you go through the novel it stands to reason that the number of novel words would decrease. The one strange thing is that when you change the threshold to .1, Stevenson looks much more anomalous and doesn't follow the pattern, either of the other two novels or of itself when the threshold is .25. Without some detail about what is actually driving this (the text itself), it's difficult to say what this means. It would be great if pointing to a given color block in the visualization let the user see what the novel words were in that passage.

Matthew Hurst

Great comments, Seth. The notion of novelty here is simply whether the word has already been seen. There are other things we could do (e.g. novel combinations of words). I agree with you about the arbitrary nature of the .25 threshold. I'll see if I can spend some time addressing both of these issues.

The basic goal is to provide accurate and interesting visualizations of the texture of literature. A long way to go!
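For readers who want to experiment, the "has this word been seen already" notion of novelty can be sketched roughly as follows. This is a minimal illustration, not the code behind the visualizations: the function name `novelty_by_chunk`, the fixed `chunk_size`, the regex tokenizer, and the way a threshold is applied in the usage note are all assumptions.

```python
import re


def novelty_by_chunk(text, chunk_size=100):
    """Return, for each chunk of `chunk_size` words, the fraction of its
    tokens that are the first occurrence of a word in the text so far."""
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    scores = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        new = 0
        for w in chunk:
            if w not in seen:
                # First time this word appears anywhere in the text.
                seen.add(w)
                new += 1
        scores.append(new / len(chunk))
    return scores
```

A threshold such as .25 could then be used to flag high-novelty chunks, for example `[s >= 0.25 for s in novelty_by_chunk(text)]`, though how the threshold is actually applied in the visualizations is not spelled out here. As Seth notes, the scores will tend to fall as the text proceeds, simply because the set of seen words keeps growing.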

Weston Platter

Very fascinating. Is the code open source somewhere? I am curious to try it out.

