Steve Rubel has written a summary of the work that Technorati is doing for his employer, Edelman, in the area of international blogs. The post contains some interesting state-of-the-blogosphere-style factoids. The internationalization work (which I assume relies on language classifiers that tag blogs by language, with the Technorati platform then run against those sub-slices of the blogosphere) currently covers German, French and Italian.
The international blogosphere is definitely a key growth area for any search engine or portal in the social media space. When one looks into these new areas, it is essential to retain native speakers of those languages in order to ensure data quality (a.k.a. spam filtering). In addition, there are issues specific to each geographic location and language. For example, how many blogs in France use the ping services common in the US blogosphere? What sort of outages do walled gardens cause?
Rubel correctly remarks that there are challenges in distinguishing between the different flavours of English (US, UK, Australia, etc.). However, this is also true for Spanish, Portuguese, French and, in fact, most languages. Although the requirement from customers comes in the form of language filtering, what is really required, when one gets to the bottom of it, is the ability to intersect location and language. It is when one steps up to these problems that one needs a pretty strong team of researchers with experience in text, machine learning, NLP, etc.
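To make the "intersect location and language" point concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how Technorati actually does it: the Blog record, its language and country fields, the select helper and the example URLs are all hypothetical, and the sketch assumes that a language classifier and some location signal have already labelled each blog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blog:
    """Hypothetical record for one indexed blog."""
    url: str
    language: str  # assumed output of a language classifier, e.g. "en"
    country: str   # assumed location signal, e.g. "GB"


def select(blogs, language, country):
    """Return only the blogs that match BOTH the language and the country."""
    return [b for b in blogs if b.language == language and b.country == country]


# Made-up sample data: language alone cannot separate the first two blogs
# or the last two.
blogs = [
    Blog("http://example.com/a", "en", "US"),
    Blog("http://example.com/b", "en", "GB"),
    Blog("http://example.com/c", "fr", "FR"),
    Blog("http://example.com/d", "fr", "CA"),
]

uk_english = select(blogs, "en", "GB")
canadian_french = select(blogs, "fr", "CA")
```

The filter itself is trivial. The hard part, and the reason for the research team, is producing reliable language and country labels in the first place.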
In passing, it is interesting to note that Rubel is becoming the blogger of record for Technorati's enterprise business efforts, rather than either the Technorati blog or Sifry's personal blog.