I recently enabled the Facebook 'like' feature on this blog, which is hosted by SixApart's Typepad service. Of late, I haven't been blogging at anything like the rate I'd like, but - O happy day - my traffic (according to Typepad) has been increasing quite a bit. Which blogger wouldn't be happy to see this:
While I might have suddenly become more relevant, I suspect the reason that I'm getting this increase in traffic (except for that large peak, which is legitimate) is related in some way to Facebook's 'like' feature.
Consider the following from my traffic details in my Typepad dashboard:
Generally, this is to be read as 'a visitor came from www.facebook.com/plugins to the page with the path /data_mining/datamining.' But the pattern is too regular to be easily explained by human visitors.
A possible explanation of the problem is the following: when someone visits a view of my blog that aggregates posts (say, the home page, which collects the ten most recent posts), the Facebook 'like' button gets rendered. Facebook wakes up and decides to pull the page, somehow leaving behind this plugins reference. Unfortunately, this seems to happen almost every time, rather than in a sensible, cache-supported manner.
I'm pretty sure I don't have all the details right. For example, the two other services I use to track traffic don't appear to register these references from Facebook. Does that point to something in the setup between Typepad and Facebook, or to an issue with how Typepad collects and displays visits? Perhaps the other two services are incorrectly filtering these references out? Generally, when a robot crawls your site, it doesn't leave an indication of where it came 'from', since it is just fetching from a list of effectively arbitrary URLs. Does that suggest it is some sort of crawler presenting itself as a human visitor?
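If the extra hits really are Facebook's scraper, one way to tell it apart from human visitors is by User-Agent rather than by referrer, since Facebook's crawler identifies itself with the token `facebookexternalhit`. A minimal sketch of that filtering, assuming access to raw request logs (the sample entries and their layout below are invented for illustration; Typepad doesn't expose logs this way):

```python
# Sketch: separating Facebook's scraper from human visitors by
# User-Agent. Facebook's crawler identifies itself with the token
# 'facebookexternalhit'; the sample log entries below are invented.

def is_facebook_scraper(user_agent):
    """True if the request came from Facebook's page scraper."""
    return "facebookexternalhit" in user_agent.lower()

# (user_agent, path) pairs standing in for raw access-log entries
requests = [
    ("facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
     "/data_mining/datamining"),
    ("Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0",
     "/data_mining/datamining"),
]

human_hits = [path for ua, path in requests if not is_facebook_scraper(ua)]
print(len(human_hits))  # 1: only the Firefox visit counts
```

A traffic dashboard that did this would count the scraper's fetches separately instead of folding them into visitor numbers, which may be exactly what my other two tracking services are doing.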
The Facebook API documentation says:
When does Facebook scrape my page?
Facebook needs to scrape your page to know how to display it around the site.
Facebook scrapes your page every 24 hours to ensure the properties are up to date. The page is also scraped when an admin for the Open Graph page clicks the Like button and when the URL is entered into the Facebook URL Linter. Facebook observes cache headers on your URLs - it will look at "Expires" and "Cache-Control" in order of preference. However, even if you specify a longer time, Facebook will scrape your page every 24 hours.
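The headers Facebook mentions are ordinary HTTP caching headers. As a rough illustration of what a server would emit for Facebook's scraper to honor (on a hosted platform like Typepad these are controlled by the platform, not the blogger; the helper function here is purely hypothetical):

```python
# Sketch: the 'Expires' and 'Cache-Control' headers Facebook's scraper
# says it observes. On a hosted service like Typepad the platform, not
# the blogger, controls these; this helper is purely illustrative.
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def cache_headers(max_age_seconds=86400):
    """Headers declaring the page fresh for max_age_seconds. Note that
    Facebook caps its re-scrape interval at 24 hours even if you
    specify a longer time."""
    expires_at = datetime.now(timezone.utc) + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        "Expires": format_datetime(expires_at, usegmt=True),
    }

print(cache_headers()["Cache-Control"])  # public, max-age=86400
```

If Typepad were emitting headers like these on post pages, a well-behaved scraper would fetch each URL at most once per day rather than on nearly every page render.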
At any rate, while I'd be happy to be getting the increased traffic, I'd rather have accurate traffic reports.
Does anyone have any insights? Anyone from SixApart or Facebook?