April 08, 2007

Comments

Mitch Ratcliffe

What's happening is that we are not including a link in the crawler that adequately explains what we do with the data. It's something everyone doing spidering should do, but many don't. In our case, it's been on the board to fix after previous flare-ups of this type, because people are rightfully concerned about being observed. It is not, however, the case that we do all the things speculated about on that board. A void fills with speculation, but I don't believe it represents a fundamental problem with the business.

Kevin Burton

Mitch... one solution is to put it in your User-Agent header. Shouldn't be too hard. That's what Spinn3r does.

Onward!
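
For anyone wondering what putting the link in the User-Agent header might look like, here is a minimal Python sketch using only the standard library. The bot name, version, and disclosure URL are placeholders I made up for illustration; nothing here is taken from Spinn3r or any other crawler mentioned in this thread.

```python
import urllib.request

# Hypothetical values for illustration only; not from any real crawler.
BOT_NAME = "ExampleBot"
BOT_VERSION = "1.0"
INFO_URL = "https://crawler.example.com/about"


def build_user_agent(name, version, info_url):
    """Return a User-Agent string in the common 'Name/version (+URL)' bot style."""
    return f"{name}/{version} (+{info_url})"


def make_request(url):
    """Create a request that identifies the crawler and links to its data policy."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", build_user_agent(BOT_NAME, BOT_VERSION, INFO_URL))
    return req
```

A site owner who sees the string in their logs can then follow the URL to read what the crawler does with the data it collects.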

Matthew Hurst

Mitch,

Actually, the primary issue here is the request for the page 'file:///...' This looks like a problem in the crawler. That is the thing I'm interested in finding out about.
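
The actual cause of the bug isn't stated in this thread, so the following is only a guess at one common safeguard: resolving each extracted link against the page URL and dropping anything that is not a web address before it is queued. The function name and the allowed-scheme set are assumptions made for this sketch.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: the crawler should only ever fetch web URLs.
ALLOWED_SCHEMES = {"http", "https"}


def resolve_link(base_url, href):
    """Resolve href against base_url; return None unless the result is a web URL."""
    absolute = urljoin(base_url, href)
    parts = urlsplit(absolute)
    # Reject file:, mailto:, javascript: and similar, plus URLs with no host.
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        return None
    return absolute
```

With a check like this, a link copied from someone's local disk into a page would be skipped instead of being requested from the server.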

Mitch Ratcliffe

Kevin -- Agreed, and it is what I've suggested doing along the way and, I think, was actually implemented then broken. We should be using both the User-Agent header and a referring link that discloses the information usage.

Matt -- I think you are seeing a bug that is being fixed or was already fixed, because the thread is older. I've asked Todd to come over and comment, too.

Todd Parsons

Matt, we did uncover a nasty bug and should have a fix pushed this weekend. Thanks again for the heads up!

The comments to this entry are closed.