I've put together a simple system that reads news feeds (the BBC, NPR, the Economist and Reuters) in approximately real time and maintains a record of the distribution of terms found in the articles. It then shows the articles in a stream visualization and highlights, within each one, the unique terms the system is observing for the first time. As a result, articles that contain no new terms at all are grayed out.
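The post doesn't describe the implementation. As a rough illustration of the core bookkeeping only, here is a minimal Python sketch. It assumes a naive lowercase word tokenizer and an in-memory `Counter` as the "distribution of terms". It leaves out feed polling, stemming, stop words and the visualization. The names `tokenize` and `TermNoveltyTracker` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption): lowercase alphabetic runs,
# allowing internal apostrophes such as "don't".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Split article text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


class TermNoveltyTracker:
    """Running count of every term seen so far across all processed articles."""

    def __init__(self):
        self.term_counts = Counter()

    def process(self, article_text):
        """Return terms never seen before, then fold the article into the counts.

        New terms come back in order of first appearance in the article.
        An empty list means the article introduced nothing new (grayed out).
        """
        tokens = tokenize(article_text)
        # dict.fromkeys de-duplicates while preserving order.
        new_terms = [t for t in dict.fromkeys(tokens) if t not in self.term_counts]
        self.term_counts.update(tokens)
        return new_terms


if __name__ == "__main__":
    tracker = TermNoveltyTracker()
    for headline in ["Markets rally on rate news", "Rate news lifts markets", "Markets rally"]:
        new = tracker.process(headline)
        print("grayed out" if not new else "new: " + ", ".join(new))
    # prints:
    # new: markets, rally, on, rate, news
    # new: lifts
    # grayed out
```

Because the tracker only grows, early articles flag almost every word as new. A real system would probably need a warm-up corpus or a decay scheme before the "new term" signal becomes meaningful.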
The larger idea here is to build a 'linguistic dashboard' for the web that captures the real-time evolution of language.
Follow this link to take a look.


I'd love to see something similar for Wikipedia.
Posted by: Visnup | October 09, 2011 at 03:15 PM
Can I ask how you did that? It's *really* interesting, and I would like to do something similar for Italian. Have you published anything on it that I could read?
Thank you,
Stefania
Posted by: Sspina | October 16, 2011 at 07:06 AM