
December 31, 2006

Comments

Greg Linden

It is not clear to me that a blog search engine has limited time to consider whether a weblog in the ping ecology is spam. Why is that?

Seems like you could assume all new weblogs are spam, withholding their posts until you have done a thorough check. Then, assume weblogs that have a history of no spam are not spam, but then eliminate them retroactively if they switch to spam.

That would seem to allow plenty of time to analyze the evidence, would it not?
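A minimal sketch of the policy Greg describes might look like the following. The `Blog` class, the function names, and the `is_spam` verdict are all hypothetical, invented for illustration; the thread does not say how any real blog search engine implements this.

```python
from dataclasses import dataclass, field


@dataclass
class Blog:
    url: str
    trusted: bool = False  # has the blog passed review and left the sandbox?
    held_posts: list = field(default_factory=list)     # withheld pending review
    indexed_posts: list = field(default_factory=list)  # visible in search results


def receive_post(blog, post):
    """Index posts from trusted blogs at once; hold everything else."""
    if blog.trusted:
        blog.indexed_posts.append(post)
    else:
        blog.held_posts.append(post)


def review(blog, is_spam):
    """Apply a spam verdict to a blog.

    `is_spam` stands in for whatever thorough check the engine runs
    (an assumption; no such check is specified in the discussion).
    """
    if is_spam:
        # Retroactive removal: drop everything, including indexed posts.
        blog.trusted = False
        blog.held_posts.clear()
        blog.indexed_posts.clear()
    else:
        # Release the withheld posts and trust the blog from now on.
        blog.trusted = True
        blog.indexed_posts.extend(blog.held_posts)
        blog.held_posts.clear()
```

Under this sketch, a new blog's posts stay invisible until `review` clears it, which is where the extra analysis time comes from.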

Matthew Hurst

Greg,

While it is true that you can retroactively remove spam, the sandboxing method that you mention is, I believe, not possible given current expectations. Let's imagine it takes 10 posts before a blog is permitted out of the sandbox and into the visible index - that is going to be a serious negative for legit new bloggers (or new blogs). In addition, spamming is a very sophisticated process, with a number of decloaking practices (in which a blog posts non-spam content for a while and then flips to spam mode). The problem is also compounded by various endorsed and non-endorsed aggregators and rebloggers.

Personally, I believe that we are somewhat victims of our own expectations (or alternatively, victims of the infrastructure that the blogosphere is built on). Of course, an additional issue is the use of a different ranking function. In mainstream web search, relevance is a powerful antidote to spam - spam simply ranks low. With time as the primary ranking function, relevance can't play this role.
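The ranking point can be made concrete with a toy example. Everything here is an assumption for illustration: the posts, their relevance scores, and their ages are invented, and real engines use far richer signals.

```python
# Invented sample posts; the relevance scores and ages are made-up values.
posts = [
    {"title": "thoughtful essay", "relevance": 0.9, "age_minutes": 120, "spam": False},
    {"title": "quick note", "relevance": 0.6, "age_minutes": 30, "spam": False},
    {"title": "buy pills now", "relevance": 0.1, "age_minutes": 1, "spam": True},
]


def rank_by_relevance(posts):
    """Web-search style ordering: most relevant first."""
    return sorted(posts, key=lambda p: p["relevance"], reverse=True)


def rank_by_recency(posts):
    """Blog-search style ordering: newest first."""
    return sorted(posts, key=lambda p: p["age_minutes"])
```

With these values, the spam post lands last under `rank_by_relevance` but first under `rank_by_recency`, since being new is the only thing a time-ordered index asks of a post.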

Kevin Burton

I would agree that it's a hard problem. The benefit of solving a hard problem though is that you'll lack any competition in the market and be able to make a return on your investment.

That said... I think your comment is a false dichotomy:

"One can have high quality - as the memetrackers give us - but limited coverage and discoverability; or one can have near real-time and better discovery (coverage) with the danger of spam."

I don't think so. It's certainly a hard problem but we're planning on expanding at least one order of magnitude and should be able to remove and handle spam.

The trick is to figure out the right blog-to-spam ratio. At 58M blogs it's pretty hard for Technorati to keep out spam.

Of course, again... if you can solve it you're in a good position.

Kevin

