I've been trying to figure out requests for the following page from my blog:
file:///data/thumbnailer/work/home-2007-04-07-11:52:17.737/2007-04-07-22:08:56.459-in.html
I don't have quite as much access to server logs as I'd like (I use TypePad), but SiteMeter gives me enough information to lead me to believe that this is a result of the same issue described on this board, as reported by the author of CoffeeSage.
The summary seems to be that BuzzLogic's crawler (or something within BuzzLogic's network) is messing up. For me it is only an annoyance: it wastes bandwidth and creates bogus user stats. However, the thread on the board linked above suggests that for others it represents a real problem with the business that BuzzLogic is in.
Mitch - could you confirm this is a problem on your side and, if so, look into it?
What's happening is that we are not including a link in the crawler that adequately explains what we do with the data. It's something everyone doing spidering should do, but many don't. In our case, it's been on the board to fix after previous flare-ups of this type, because people are rightfully concerned about being observed. It is not, however, the case that we do all the things speculated about on that board. A void fills with speculation, but I don't believe it represents a fundamental problem with the business.
Posted by: Mitch Ratcliffe | April 08, 2007 at 02:39 PM
Mitch... one solution is to put it in your User-Agent header. Shouldn't be too hard. That's what Spinn3r does.
Onward!
Posted by: Kevin Burton | April 08, 2007 at 06:52 PM
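For readers who haven't seen this done, here is a minimal sketch of what Kevin is suggesting: a crawler that names itself in the User-Agent header and includes a link to a page explaining what it does with the data. The bot name, version, and info URL below are made-up placeholders, not BuzzLogic's or Spinn3r's actual values, and the `+URL` form in parentheses is just a common convention among crawlers.

```python
from urllib.request import Request

# Hypothetical values for illustration only; a real crawler would use its own.
BOT_NAME = "ExampleBot"
BOT_VERSION = "1.0"
INFO_URL = "https://example.com/bot-info"  # page describing how crawled data is used


def build_user_agent(name: str, version: str, info_url: str) -> str:
    """Return a User-Agent string that identifies the bot and links to its info page."""
    return f"{name}/{version} (+{info_url})"


def make_request(url: str) -> Request:
    """Create a request object that carries the self-describing User-Agent header."""
    user_agent = build_user_agent(BOT_NAME, BOT_VERSION, INFO_URL)
    return Request(url, headers={"User-Agent": user_agent})
```

Anyone who then spots the bot in their stats can follow the link straight from the log entry instead of guessing.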
Mitch,
Actually, the primary issue here is the request for the page 'file:///...' This looks like a problem in the crawler. That is the thing I'm interested in finding out about.
Posted by: Matthew Hurst | April 08, 2007 at 06:57 PM
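I don't know what is actually going wrong inside the crawler, but as a general guard against this kind of request, a crawler can check a URL's scheme before fetching or reporting it. The sketch below is my own illustration, not anything from BuzzLogic; the function name and the allowed-scheme list are assumptions.

```python
from urllib.parse import urlsplit

# Assumption: a web crawler should only ever deal in http and https URLs.
ALLOWED_SCHEMES = {"http", "https"}


def is_crawlable(url: str) -> bool:
    """Return True only for URLs with an allowed scheme and a host part."""
    parts = urlsplit(url)
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)
```

A local path such as a `file:///` URL has the scheme `file` and no host, so it fails both checks and would be dropped before it ever left the crawler's machine.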
Kevin -- Agreed, and it is what I've suggested doing along the way and, I think, was actually implemented then broken. We should be using both the User-Agent header and a referring link that discloses the information usage.
Matt -- I think you are seeing a bug that is being fixed or was already fixed, because the thread is older. I've asked Todd to come over and comment, too.
Posted by: Mitch Ratcliffe | April 08, 2007 at 09:14 PM
Matt, we did uncover a nasty bug and should have a fix pushed this weekend. Thanks again for the heads up!
Posted by: Todd Parsons | April 13, 2007 at 08:42 PM