I was very pleased to read this post by Om Malik (profile) on tags. It is hard to find criticism of tags, but this does a good job of summarizing the problems (as well as pointing to some other posts on the topic). Sifry's recent post on tags quotes Clay Shirky:
This is something the ‘well-designed metadata’ crowd has never understood — just because it’s better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger. And the cost of tagging large systems rigorously is crippling, so fantasies of using controlled metadata in environments like Flickr are really fantasies of users suddenly deciding to become disciples of information architecture.
First of all, tags on text are not metadata. You cannot legislate metadata by a syntactic construct. In other words, tags are just words. It is only when tags become references to a new system of symbols with a clear relationship to object data that they can be called metadata. Okay, so I'm making a slightly stronger case here than even I believe in, but the point has to be made. The systemic result of this is that there are many tags that you need in order to capture everyone's idea of metadata. It is ironic that Sifry's post on tags includes such repeats as blog, weblog, weblogs, just to make sure that everyone gets the idea of what the post is about. Clearly these all refer to the same thing. In addition, they all mean the same thing to David. David knows that they mean the same thing to others, but he still has to use them all.
Shirky talks about cost. But he is only talking about write-time cost, not the cost that the spaghetti imposes on the system over the life of the content.
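To make the blog, weblog, weblogs point concrete, here is a minimal sketch of what treating tags as references to a shared system of symbols could look like: variant spellings are mapped to one canonical tag. The synonym table and the function name are hypothetical, made up for illustration; no existing tagging service is being described.

```python
# Hypothetical synonym table: each variant tag points at one canonical tag.
# In a real system someone would have to build and maintain this mapping.
CANONICAL = {
    "blog": "weblog",
    "blogs": "weblog",
    "weblogs": "weblog",
}

def canonicalize(tags):
    """Return the tags with variants collapsed, preserving first-seen order."""
    seen = []
    for tag in tags:
        key = tag.strip().lower()
        key = CANONICAL.get(key, key)  # unknown tags pass through unchanged
        if key not in seen:
            seen.append(key)
    return seen

print(canonicalize(["blog", "weblog", "weblogs", "tags"]))
# → ['weblog', 'tags']
```

With a mapping like this the author could write any one of the variants and still be found under all of them. The catch is that the table itself is exactly the kind of controlled vocabulary whose cost Shirky is pointing at.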
And I am not done yet. When I look at my referrer log, I can see which tags bring more traffic. With a simple utilitarian model of a blog author, how do you think I would react to that knowledge?
The big win with tags is going to be the applications that are built on top of inferences made from them. The tags will become the object data and the inferences will become the new metadata. But hang on, can't we just do that from the text, like tagCloud does?
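As a rough illustration of deriving tags from the text itself, the following sketch counts word frequencies and proposes the most common words as candidate tags. This is an assumption about the general idea only, not a description of how tagCloud actually works; the stopword list and the function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "and", "of", "to", "is", "in", "on", "that", "it", "are", "not"}

def candidate_tags(text, k=3):
    """Return up to k of the most frequent non-stopword words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]

post = "Tags are words. Tags on text are not metadata; tags are just words."
print(candidate_tags(post, 2))
# → ['tags', 'words']
```

The output is still just words pulled from words, so it inherits the same semantic problems as author-supplied tags. It does, however, cost the author nothing at write time.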
UPDATE: Kevin Burton comments with a pointer to his posts on tagging issues where he discusses the problem of third party tagging. Note that this is yet another issue - the stuff I am talking about is more to do with the semantics of the author's tags, though the semantics issue doesn't go away with third party tags.
If you want more interesting reading about the problem with tags I've been blogging about the subject a lot recently:
http://www.feedblog.org/tags/
Posted by: Kevin Burton | August 05, 2005 at 08:42 PM
I do not think one (search agent) can get sufficient information from current web page text alone. If I post the lyrics to my song about the trials and tribulations of being born ugly (a song I might call "Frog Pissed") a keyword finder may correlate the webpage with frogs, ponds, amphibians, French, and toilets based on my lyrics. (And, is there a corollary to "Six degrees of Kevin Bacon" which states that all webpages are only 3 degrees of keyword correlation away from porn?)
The keyword classification might be "fair", but this is one dimensional: perhaps the key is to know that these are lyrics to a song. Certainly that they are lyrics by rapper "Ugly Eric" will change the interpretation of what those words mean.
Catching that classification, a contextual dimension, especially given other subjects and even advertisements on the same webpage (which, if Google Ads, will reinforce the keywords, not the context) could be quite non-trivial without another layer of tagging.
But it is still a ripe field, I think, even if there is more thinking to be done on the subject.
-EP
P.S. I have been mulling over an approach to tagging that is a bit different. I think a successful approach has to involve (1) an authoritative repository of tag definitions, with the tags openly created by the web community; (2) web page writers themselves; and (3) web surfers, via a feedback mechanism.
Posted by: Xtags developer | August 07, 2005 at 11:19 PM
"The systemic result of this is that there are many tags that you need in order to capture everyone's idea of metadata."
Just one point: in an open, taggable system, there is no need for an individual to capture everyone's idea. Just capture your very own associations (and if you consider yourself a system that is able to draw a relationship to object data from the words you assign yourself, then those are metadata) and wait for others to fill in the complementary tags, so that recall is decent for the people whose associations you missed.
Posted by: saurier | August 08, 2005 at 06:36 PM
I think Saurier here makes a good point. Everyone is concerned with all other people's tags. Is that really a concern? Do you really care about everyone else's tags? Do you really care about 20 billion web pages that Yahoo now indexes? Do you really care that your web search resulted in 347,084 matches in 0.2 seconds?
I think the answer to all these is negative. I care mostly about my own information. Then I care about the information and knowledge of people I respect or associate with, my friends, family, colleagues, respected and knowledgeable individuals. The last ring around me is everyone and their tags. I care about them, but only a little. I care about them only because they provide a big pool of various levels and types of knowledge, which allows me to venture out of my circle, find something or someone new and exciting, and make a connection with my inner circle, pull the link from that external entity to something closer to the core.
It is not a coincidence that this mimics the real social life of us humans.
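The rings described above can be sketched as a weighting scheme: a tag counts for more when it comes from a closer ring. The ring names, the weights, and the function below are hypothetical, chosen only for illustration, and say nothing about how Simpy or any other service works.

```python
from collections import defaultdict

# Hypothetical weights: my own tags count most, everyone else's count least.
RING_WEIGHT = {"me": 1.0, "trusted": 0.5, "everyone": 0.1}

def rank_tags(assignments):
    """Rank tags by total weight; assignments is a list of (tag, ring) pairs."""
    scores = defaultdict(float)
    for tag, ring in assignments:
        scores[tag] += RING_WEIGHT[ring]
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))

assignments = [
    ("python", "me"),
    ("snakes", "everyone"),
    ("snakes", "everyone"),
    ("code", "trusted"),
]
print(rank_tags(assignments))
# → ['python', 'code', 'snakes']
```

Two strangers tagging the page "snakes" still rank below one trusted contact tagging it "code", which is the behaviour the rings imply. The outer ring is kept rather than dropped, so it can still surface something new.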
For what I'm talking about - watch Simpy ( http://simpy.com ) in the coming months.
Posted by: Otis Gospodnetic | August 09, 2005 at 02:28 PM