

February 19, 2009


Seth Grimes

Matthew, thanks for the mention. I'd venture that tweet mineability is also easier because short messages cover a single topic.

Short messages are easy to post, so folks can post more frequently. So maybe the more interesting thing to mine from Twitter is message propagation. Then, from propagation threads and connectedness patterns, one could infer influence networks and knowledge about the types, topics and forms of messages that travel farthest and fastest.
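As a rough illustration of the propagation idea, here is a minimal sketch that measures how far one message travels. It assumes you have already collected reshare pairs of the form (source user, resharing user); the sample data, `build_graph` and `cascade_stats` are hypothetical names made up for this example, not anything Twitter provides.

```python
from collections import defaultdict, deque

def build_graph(reshares):
    """Map each user to the set of users who reshared from them.

    reshares: iterable of (source_user, resharing_user) pairs (assumed input).
    """
    graph = defaultdict(set)
    for source, resharer in reshares:
        graph[source].add(resharer)
    return graph

def cascade_stats(graph, root):
    """Breadth-first walk from root; returns (reach, depth).

    reach = number of distinct users the message reached beyond root
    depth = longest chain of reshares starting at root
    """
    seen = {root}
    queue = deque([(root, 0)])
    depth = 0
    while queue:
        user, hops = queue.popleft()
        depth = max(depth, hops)
        for nxt in graph.get(user, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return len(seen) - 1, depth

# Hypothetical sample data: alice's message is reshared down a short chain.
reshares = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("dave", "erin")]
graph = build_graph(reshares)
reach, depth = cascade_stats(graph, "alice")
print(reach, depth)
```

Ranking users by reach or depth across many messages would be one crude way to start sketching an influence network; a real analysis would also need timestamps to say anything about "fastest".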


I don't get it, anyone can mine twitter for sentiment (using the search API)... why would twitter reinvent the wheel?

Antony Mayfield

Nice analysis, Matthew - I'd also say that, since part of the way people use Twitter is to share links to interesting content/conversations elsewhere, analysing the networks around the Twitter streams is very important indeed.


There's the reality of what the raw data is worth to the marketplace, which I'll get to in a minute. Regardless, short messages may very well be harder to search, not easier. Here are some reasons why:

* For indexing purposes, it's not only the size of the text corpus that matters, it's the number of objects. So a search architecture has to take that into account. It's a non-trivial problem, especially with the kind of volumes involved here. Not to mention that servers are going to be thrashed with reading/writing if anything is meant to be done in real time. (Perhaps less so for batch analysis, of course.)

* Next we have the nature of the messages themselves. Due to the 140-character limit, there's an increase in odd acronyms even beyond the brb, lol, etc. Perhaps synonym dictionaries could be produced, but the variability here seems extreme, just based on anecdotal experience.

* Regarding sentiment mining, that's difficult enough in larger text, and may be harder in short text. Not for raw sentiment, where the phrases are obvious. But sentiment analysis lags with regard to humor and sarcasm, which may need more markers to divine actual meaning.
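The synonym-dictionary idea from the second point could look something like the sketch below. The `ACRONYMS` table and the `expand_acronyms` helper are made up for illustration; a real table would have to be much larger and would still miss plenty, which is the variability problem mentioned above.

```python
import re

# Hypothetical, hand-built table; real coverage would need to be far larger.
ACRONYMS = {
    "brb": "be right back",
    "lol": "laughing out loud",
    "imo": "in my opinion",
}

# Match whole runs of letters so "imo" inside "limo" is left alone.
TOKEN = re.compile(r"[A-Za-z']+")

def expand_acronyms(text, table=ACRONYMS):
    """Replace known acronyms (case-insensitively) before indexing or scoring."""
    def swap(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    return TOKEN.sub(swap, text)

print(expand_acronyms("BRB, lol that was great"))
```

Running this kind of normalization before indexing or sentiment scoring only helps with the acronyms you already know about; it does nothing for sarcasm.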

These are solvable problems. And in the latter case, it might not matter that terribly much if some stuff gets missed as general trends can still be spotted easily enough. Personally, I feel confident someone(s) will work this out to some reasonable degree of satisfaction.

Next, as to the dollars. I can tell you from experience the industry does not value the raw data terribly highly for specific social media data streams. The value added analysis? Yes. The actual data? Not so much. This is because it's easy enough for a variety of people to crawl blogs, forums and so forth. And several do, though in some cases there's really only a couple of providers feeding data to the 60+ reputation monitoring companies.

Unless Twitter made itself the sole source of the full data stream, they wouldn't be able to command that great a price. I'm just guesstimating based on past experience with other data types here, but MAYBE 1M / month if they sold to every rep services company out there. (Who would in turn add analysis and re-sell for much more.) That's decent money, but it's not 'to the moon' money. I could be wrong here. People are valuing this stuff more highly. But to really capitalize on it, there's no way they could just let anyone suck down all they could eat off the stream. Which means less open. Which is fine. They're entitled to do so.

We'll see!

video Promotion

Nice post. I have included this blog into my rss subscriptions. Very nicely put on data mining using social media. I honestly have not thought about it in this much detail but it makes sense and could be used as a great competitor intelligence tool!

I'm still working on it, but Twitter data sure is tasty. Lots of goodies!

Thanks for the rundown.


Themos Kalafatis

There is a lot of potential in analyzing tweets: segmentation of users and sentiment analysis, to name a few. In my experience, the fact that tweets are at most 140 characters makes things easier, both for catching emerging trends and for text analysis.

Combining Information Extraction and Ontologies (using IE to mark Text and using NLP to insert information to an Ontological Setting) is the way to go although it requires considerable effort.

The comments to this entry are closed.
