
December 30, 2007

[Social Media]

Max writes more about his ping pong game of definitional problems with the term 'Social Media'. I'm not a friend of Wordpress' comment mechanism (especially when it doesn't seem to work) so I'm following up here. What bugs me about the debate is its lack of simplicity. Start from the two words: media - the content, the form of the content, tools and materials used to create; social - 'the interaction of the individual and the group'; 'tending to form cooperative and interdependent relationships with others of one's kind' (MW).

Social Media, then, is content which permits, facilitates and is enhanced by interactions both between the object data (links) and between content creators. Social Media (publishing) platforms allow agents to create and publish content and facilitate interactions.

I suspect that people get confused by the term social. Does it refer to the content (is the content social?) or the creators (does the author get some social benefit from the act of participation?). From my perspective, we should enjoy this ambiguity.

Much of the other to-ing and fro-ing comes from interpretations or definitions which layer the term, and the content, in all the ancillary, derivative, cocooning business mumbo-jumbo that has jumped on board as 'social media' has become a buzzphrase associated with the next big thing (that is to say, it *will* be monetized).

Finally, the term ought to be protected as a neutral expression that prevents horrible biases and misperceptions, something that less well considered expressions such as consumer generated media clearly fail to achieve.

[Wikipedia does a reasonable job, but it uses the term 'democratization' which for me is a big turn off.]

December 27, 2007

Debugging BlogPulse

I love to use BlogPulse. I always get a kick out of seeing trends like this:

Blogpulsedebug1_2

Or this:


Blogpulsedebug2

These graphs often have a straightforward story behind them, allowing for a reasonable comparison between mentions of different words. Note, of course, that the graphs show the percentage of blog posts that contain a term, giving a normalized view.
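As a rough sketch of what that normalization means (the counts below are invented for illustration and are not BlogPulse data; the function name is hypothetical):

```python
def percent_of_posts(matching_per_day, total_per_day):
    """Return, for each day, the percentage of crawled posts mentioning a term."""
    return [
        100.0 * matching / total if total else 0.0
        for matching, total in zip(matching_per_day, total_per_day)
    ]


# Hypothetical counts: raw mentions double, but so does the number of posts crawled.
mentions = [50, 100]
crawled = [10_000, 20_000]

shares = percent_of_posts(mentions, crawled)
print(shares)
```

The raw count doubles between the two days while the normalized share stays flat, which is why the percentage view is the fairer one for comparing terms.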

But, what could explain something like this:

Blogpulsedebug3

Here we see a term 'movie' which appears to have some sort of seasonal trend, dipping in autumn and rising again in the winter. However, the term 'guitar' appears to have a very odd shape, with a dramatic and sharp increase in the winter. Looking at this term on its own, we see:

Blogpulsedebug4

There is a reason for this. If we look at links to weblogs published on MySpace, we see a matching pattern.

Blogpulsedebug5

The reason that there are changes in the number of blog posts which link to MySpace blogs is that BlogPulse (Nielsen Online) is adjusting its crawling strategy over time (the above suggests that there was an increase in July, a decrease in September and another increase in October).
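A toy illustration of how a change in crawl coverage alone could move a trend line. All numbers are made up, the function name is hypothetical, and nothing here reflects how BlogPulse actually crawls or counts:

```python
def overall_share(sources):
    """sources: list of (posts_crawled, posts_mentioning_term) pairs.

    Returns the percentage of all crawled posts that mention the term.
    """
    total = sum(crawled for crawled, _ in sources)
    hits = sum(mentioning for _, mentioning in sources)
    return 100.0 * hits / total


# Before: a single pool of blogs where 1% of posts mention the term.
before = overall_share([(100_000, 1_000)])

# After: the crawler adds a second pool (say, music-heavy blogs) where 4% do.
after = overall_share([(100_000, 1_000), (50_000, 2_000)])

print(before, after)
```

Nobody in either pool changed what they write about, yet the overall percentage rises once the second pool is included in the crawl.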

So, while I continue to believe that BlogPulse, and the trending tool in particular, are very useful, one has to be careful (and informed) regarding the base data that these analytics are built on.

 

December 25, 2007

IceRocket Relaunch?

I'm sure I'm very late in noticing this, but just after I saw Technorati's relaunch as a memetracker, I noticed that IceRocket is starting to look different, with a new (to me) looking front page, ranked video, movies and news. According to Compete, Technorati's shift looks like it may have resulted in an upswing of visitors, while IceRocket is still languishing.

Icerocket_comtechnorati_com_uv

Of course, searching in blogs for mentions of an IceRocket redesign is hopeless. A search on IceRocket just brings up spam, a search on Technorati brings up anything with an IceRocket tag (even a search for "icerocket new design -"icerocket tags"" doesn't seem to help). Perhaps no-one noticed.

Interestingly, a search on Google unearthed this:

Icerocket

None of the Twitter search engines appear to have any posts indexed mentioning IceRocket (Twitter is hard to crawl comprehensively). 

December 22, 2007

Linkfluence: Presidential Watch 08

Guilhem Fouetillou left a comment pointing to Presidential Watch 08 - a site created by Linkfluence. This is another site tracking the US presidential election campaign from the point of view of the blogosphere. However, I believe that they have put plenty of effort into the design of the data visualization and the overall look and feel to really make the site stand apart from others in this space.

The site presents a map of the political blogosphere, similar to the original work of Adamic and Glance.

Linkfluence1

Data-wise, this is pretty standard fare; however, the implementation of the network is very nicely done. One can switch from the above rendering to the more interesting fish-eye version.

Linkfluence2

The map shows the relationships between blogs of a certain leaning (Progressive, Independent, Conservative) and mass media web sites.

In addition, there is a trending area which shows blog and news citation trends for the candidates.

Linkfluence3

Linkfluence, in their own words:

Established in 2007, Linkfluence, Inc, is the US affiliate of RTGI, a cutting-edge French social media measurement company.

Linkfluence is an innovative start up developing a full set of digital tracking, monitoring and analytics solutions for the social web: blogs, discussion boards, wikis, social networks, search engines, mainstream media’s online presence and their community spaces.

December 20, 2007

Slate Political Dashboard

Slate has a political map/viz/thingy. Not the prettiest thing out there, but it does concentrate some interesting information.

Slate

December 15, 2007

Religion and the Election - the Non-Issue Issue

There's nothing like attempting to make something not an issue for making it an issue. I wonder if the Mormon faith would inform Mitt Romney's decisions as president the way in which the Christian values of compassion, love and charity have done such a great job for George Bush.

Religion1

  Religion2_2

Super Bowl Ad Monitoring

Both Cymfony and BuzzMetrics (Nielsen Online) have recently described offerings aimed at monitoring the attention around adverts aired during the Super Bowl (the final championship game for US American Football teams). Cymfony's product seems to be presented with a little more information at this time compared with Nielsen's press release, though the products seem pretty comparable. Nielsen makes a little more of the real time nature of their offering (the timeliness of report delivery, or database loading, will more and more become a key distinguishing feature in this market). The Nielsen offering also mentions orthogonal measurements delivered from surveys that they intend to carry out before and after the game.

Carrying out social media analysis specifically around the advertising aired at the Super Bowl is, I suspect, more of a way to generate leads than it is to provide real in-depth insights. Social media analysis does best (in general) when there is plenty of data, and the amount of content that is generated discussing specific adverts is probably less than desirable for any real volumetric analysis.

Superbowl_2

[Note: it is a sign of how long I have been in the US that I don't find the idea of discussion about *adverts* to be weird. For some - I'm not making this up - the adverts are the reason to watch the game in the first place. Nielsen and Cymfony will do well if those individuals also happen to blog or publish in other social media forums.]

December 13, 2007

ECOresearch | US Election 2008 Web Monitor

Arno Scharl writes to point me to ECOresearch | US Election 2008 Web Monitor. This system tracks discussion around the presidential candidates for the upcoming election. The site supports the browsing of mentions and sentiment around candidate names in mainstream media and a small set of political weblogs. One of the things I like about this site is the intuitive navigation (e.g. click on the picture of a candidate to add or remove them from the data being viewed). Rather than go all out on features (an easy thing to do in this space), they have concentrated a little more on user experience.

Here we see a view of candidate names in political blogs.

Eco

They have also classified media source by country, allowing the user to compare, e.g., mentions of Clinton in the UK with those from Canada. In their own words:

The US Election 2008 Web Monitor provides weekly snapshots of global Web coverage. The results reflect attention and sentiment towards the US presidential candidates. Lists of keywords summarize the most important issues associated with each candidate

December 12, 2007

Scout Labs

Scout Labs (discussed on TechCrunch) is a new entrant in the social media analysis space. While they don't have much to show yet on their website other than a short video (I never know why such things are called 'demos'), from what they do show it seems that they are a clear competitor for the usual crowd (BuzzMetrics, Umbria, Cymfony) and also plan to cover the actionable results of monitoring - managing the process of responding to online content by posting comments etc. This latter function puts them in competition with Visible Technologies.

December 10, 2007

Sentiment Mining: The Truth

Nathan Gilliat (o excellent blogger) posts about BuzzLogic's new partnership with KDPaine which will deliver sentiment scores to BuzzLogic's clients. There are a number of approaches to delivering sentiment analysis including many automated approaches and some manual ones. The customers are still skeptical of automated methods and generally more comfortable with manual methods. Paine writes:

"Computers can do a lot of things well, but differentiating between positive and negative comments in consumer generated media isn’t one of them,” explained Katie Delahaye Paine, CEO of KDPaine & Partners. “The problem with consumer generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment."

This kind of statement is particularly unhelpful. Let's break it down.

Why Do Sentiment Analysis?

There are a number of reasons for doing sentiment analysis. Firstly, to track the ups and downs of aggregate attitudes to a brand or product. Secondly, to compare the attitudes of the public (that is, of course, the blogging public) between one brand or product and another. Thirdly, to pull out examples of particular types of positive or negative statements on some topic.

The Challenge of Automated Approaches

Sentiment can be characterized as a triple of <author, polarity, object>. Sentiment analysis, in addition to figuring out the direction of the sentiment, needs to associate the evaluative language with a target and the whole statement with a speaker. Many automated methods do weak jobs of these tasks, if they attempt them at all.
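One way to picture the triple is as a small record type. This is only a sketch: the field names, the integer polarity encoding and the example sentence are all assumptions made for illustration, not any vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentiment:
    author: str    # who holds the opinion
    polarity: int  # assumed encoding: +1 positive, -1 negative, 0 neutral
    target: str    # the "object" of the triple: what the opinion is about


# Hypothetical sentence: "I love the screen but the battery is awful."
# A single sentence can carry two triples with opposite polarity.
triples = [
    Sentiment(author="blogger_a", polarity=+1, target="screen"),
    Sentiment(author="blogger_a", polarity=-1, target="battery"),
]

for t in triples:
    print(t)
```

A method that assigns one polarity score to the whole sentence has nowhere to record that the praise and the complaint attach to different targets, which is the association problem described above.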

In addition to these association tasks, the basic problem of dealing with sarcasm and so on, as Paine rightly states, is hard.

The Challenge of Manual Approaches

While customers are often more comfortable with manual approaches, this comfort is not always well founded. Manual approaches often have to sample (to me, one of the key propositions in the space is being able to listen to every statement that is made, not a tiny fraction). Sampling is hard: at a minimum, it relies on comprehensive data acquisition. In addition, you may well be surprised to see the agreement rate between different human labelers. The literature reports agreement rates as low as 40% in some cases!
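For readers who want to check agreement on their own labeled samples, here is a minimal sketch of two common measures: raw agreement and Cohen's kappa (which corrects for agreement expected by chance). The labels are invented, and I am not claiming that either measure is the one behind the 40% figure.

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two labelers chose the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) if chance agreement is exactly 1,
    e.g. when both labelers use one identical label throughout.
    """
    n = len(labels_a)
    observed = agreement_rate(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two human annotators on the same five posts.
annotator_1 = ["pos", "pos", "neg", "neg", "neu"]
annotator_2 = ["pos", "neg", "neg", "neu", "neu"]

print(agreement_rate(annotator_1, annotator_2))
print(cohens_kappa(annotator_1, annotator_2))
```

Kappa comes out lower than the raw rate because some of the matches would be expected even if both annotators labeled at random with the same label frequencies.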

Application Details

If you are attempting to track the ebb and flow of sentiment, it is very likely that automated methods are fine, as aggregate analysis can often be robust to a certain amount of measured error. If you need to surface individual positive or negative comments, you want to make sure that they really are positive or negative; in that case, the confidence scores often available with machine learning approaches can be used to rank remarks (though this does introduce an unknown bias). It should also be noted that while there are many obviously problematic areas, the distribution of these problems needs to be understood before a solution that does better or worse on individual examples is evaluated.
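A minimal sketch of the ranking idea, assuming a classifier that returns a label and a confidence score for each remark. The remarks, scores, threshold and function name are all hypothetical.

```python
def top_confident(remarks, label, min_confidence=0.9):
    """Keep remarks predicted as `label` with confidence at or above the
    threshold, sorted from most to least confident.

    remarks: list of (text, predicted_label, confidence) tuples.
    """
    picked = [r for r in remarks if r[1] == label and r[2] >= min_confidence]
    return sorted(picked, key=lambda r: r[2], reverse=True)


# Hypothetical classifier output.
remarks = [
    ("Love this phone", "positive", 0.97),
    ("Great, it broke again", "positive", 0.55),
    ("Best purchase this year", "positive", 0.92),
    ("Terrible support", "negative", 0.95),
]

top = top_confident(remarks, "positive")
for text, predicted, confidence in top:
    print(f"{confidence:.2f}  {text}")
```

The threshold trades coverage for precision: only remarks the classifier is sure about get surfaced, which is also where the unknown bias comes from, since whatever the model finds easy to score is what ends up being shown.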

In summary - the challenges of the space are more complex than Paine's statement suggests and this comment is likely more a marketing strategy to support comparative statements that all vendors in this space need to make to distinguish themselves from the competition. There is still research to be done in this space (and it is good to see companies like Cymfony hiring scientists in the field of Computational Linguistics) - and the game certainly isn't over yet!
