Abdur has written up a post summarizing the year in terms of what people were tweeting about. For example, here's a list of 'news' topics:
2. Swine Flu
9. Earth Hour
Reading this, I feel somewhat disappointed. Twitter represents one of the most interesting data sets out there for text mining. Here are some thoughts on what he could have published:
- term clusters: words that are strongly associated with each other (e.g. 'bush' and 'shoe')
- terms associated with links: when people share a link, they are annotating that content. What are the popular links, and what are the annotations?
- long-term trends: remember, a trend can be up or down or any shape. Were there any terms or topics that were decreasing over time? How about terms that were slowly burning over the year and had a final flourish right now (Avatar?)
- diffusion patterns: were there topics that had very broad diffusion? topics that had very narrow diffusion? topics that reached a lot of people very quickly? topics that took longer?
- cross-cultural linking: how about the nature of following across country/language borders? do a lot of Scottish people follow Obama? how about French people following Armstrong?
- what was the uptake of businesses using Twitter for promotion?
- how about spam? spam is an arms race, any patterns destroyed? new ones emerging?
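To make the long-term trends idea concrete, here is a minimal sketch that fits a least-squares slope to a term's weekly counts. The `trend_slope` helper and the weekly numbers are made up for illustration; they are not Twitter data, and a real analysis would probably normalize by overall tweet volume first.

```python
def trend_slope(counts):
    """Least-squares slope of counts against their period index (0, 1, 2, ...)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Hypothetical weekly mention counts; every term has the same total (100).
weekly = {
    "rising": [10, 20, 30, 40],
    "fading": [40, 30, 20, 10],
    "flat": [25, 25, 25, 25],
}

slopes = {term: trend_slope(counts) for term, counts in weekly.items()}
for term, slope in sorted(slopes.items(), key=lambda kv: kv[1]):
    print(f"{term}: {slope:+.1f} mentions per week")
```

All three terms would tie on a year-end frequency list, yet their slopes separate the one that is fading from the one that is building.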
Let's not forget - in social systems, things are popular because they are popular, so frequency is not always the best thing to look for.
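One way to look past raw frequency, and to get at the term clusters mentioned above, is to score word pairs by pointwise mutual information (PMI), which asks how much more often two words appear together than their individual popularity would predict. This is only a sketch: `pmi_pairs` is a hypothetical helper, the tweets are invented, and whitespace splitting stands in for real tokenization.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_pairs(tweets, min_count=2):
    """Score co-occurring word pairs by pointwise mutual information (base 2)."""
    n = len(tweets)
    word_counts = Counter()
    pair_counts = Counter()
    for text in tweets:
        # Count each word once per tweet; sort so pairs have a stable order.
        words = sorted(set(text.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    scores = {}
    for (a, b), together in pair_counts.items():
        if together < min_count:
            continue  # PMI is unreliable for pairs seen only once
        scores[(a, b)] = math.log2(together * n / (word_counts[a] * word_counts[b]))
    return scores


# Invented example tweets, not real data.
tweets = [
    "the bush shoe video",
    "bush dodges the shoe",
    "the weather is nice",
    "the game is on",
]

scores = pmi_pairs(tweets)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Here 'the' is the most frequent word by far, but its pairings score 0 because it shows up everywhere, while 'bush' and 'shoe' score 1.0 because they show up together more than chance would suggest.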