
February 09, 2009



Time-sensitive corpora existed long before Twitter. The only difference these days is that we are also recording temporal _query_ data.

I suggest you look at the large body of topic detection and tracking literature:

J. Allan, editor. Topic Detection and Tracking: Event-based Information Organization, volume 12 of The Information Retrieval Series. Springer, New York, NY, USA, 2002.

Also see the following papers, along with their backward and forward references:

F. Diaz and R. Jones. Using temporal profiles of queries for precision prediction. In SIGIR ’04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 18–24, New York, NY, USA, 2004. ACM Press.

F. Diaz. Integration of news content into web results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, 2009.

X. Li and W. B. Croft. Time-based language models. In CIKM ’03: Proceedings of the twelfth international conference on Information and knowledge management, pages 469–475. ACM Press, 2003.

R. Swan and D. Jensen. TimeMines: Constructing timelines with statistical models of word usage. In ACM SIGKDD Workshop on Text Mining, 2000.

And probably a ton more I am not aware of.

Matthew Hurst


Sometimes, humour is so hard to capture in plain text.


Temporal information is not just timestamp-based data. Documents also contain quite a bit of time-sensitive information in the form of temporal expressions (e.g. "New Year", "next Friday", "01/02/2009"), and these can be used for a lot of cool things.
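To make the idea concrete, here is a minimal sketch of pulling temporal expressions out of raw text. The pattern set, the labels, and the function name `find_temporal_expressions` are all assumptions made up for illustration. A real temporal tagger covers far more cases and also normalizes each expression to a calendar value, which this sketch does not attempt.

```python
import re

# Hypothetical, deliberately tiny pattern set: one pattern per example
# mentioned above. Labels are illustrative, not a standard tag set.
TEMPORAL_PATTERNS = [
    ("numeric_date", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("relative_day", re.compile(
        r"\b(?:next|last|this)\s+(?:mon|tues|wednes|thurs|fri|satur|sun)day\b",
        re.IGNORECASE)),
    ("named_day", re.compile(r"\bNew Year\b", re.IGNORECASE)),
]


def find_temporal_expressions(text):
    """Return (start_offset, label, matched_text) tuples sorted by position."""
    hits = []
    for label, pattern in TEMPORAL_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    sample = "The New Year sale ends next friday, not on 01/02/2009."
    for start, label, expr in find_temporal_expressions(sample):
        print(start, label, expr)
```

Note that spotting an expression is the easy part: "01/02/2009" is January 2 or February 1 depending on locale, and "next friday" only resolves to a date relative to when the document was written, which is where the document timestamp comes back in.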


Another paper on this topic you may be interested in:

