Tuesday, October 30, 2007

GoogleBot and AdSense MediaBot are now intertwingled?

I once made a silly mistake and coded a webpage (new location) to use HTTP GET instead of POST. This was silly for various reasons, and only sensible for two:

Silly

  1. Caused the URL to be ugly
  2. Caused the button id to go into the URI. Sigh. I'm sure I could have avoided that.
  3. Caused Google AdSense to want to crawl every resulting URL just in case its content is different. Unlike with other Google tools (Analytics, for example), this can't be avoided.

Sensible

  1. It allows examples to be encoded within links. Useful for Wikipedia.
  2. It allows other people to mash up the tool easily. Admittedly, anyone who can code the parsing of the HTML but can't code a POST would be an oddity, but GET is easier and avoids any state problems.

Suffice it to say the sensible reasons have been found post factum :-) My page now defaults to using POST but still has to support GET.
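
For anyone in the same boat, here is a rough Python sketch of what "defaults to POST but still supports GET" can look like on the server side. It is not the code behind my page: the function name `extract_params` and the button field name `submit` are made up for illustration, and it assumes ordinary URL-encoded form data.

```python
from urllib.parse import parse_qs

# Hypothetical name of the submit button field (an assumption for this sketch).
BUTTON_FIELD = "submit"


def extract_params(method, query_string="", body=""):
    """Return form parameters from a GET query string or a POST body."""
    method = method.upper()
    if method == "POST":
        raw = body
    elif method == "GET":
        raw = query_string
    else:
        raise ValueError(f"unsupported method: {method}")
    # parse_qs maps each name to a list of values; keep only the first one.
    params = {name: values[0] for name, values in parse_qs(raw).items()}
    # Drop the button id so it never affects the result, whichever method was used.
    params.pop(BUTTON_FIELD, None)
    return params
```

Both `extract_params("GET", "q=hello&submit=Go")` and `extract_params("POST", body="q=hello")` give the same dictionary, so old GET links keep working while the form itself posts.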

The point of this post is that Google AdSense used to crawl the GET URI after every use. Recently, and possibly related to the Google PageRank changes, the plain GoogleBot has started crawling these URLs: about 20-30 in a batch (which may be all of them), with requests 1-10 seconds apart, very friendly like. The official AdSense MediaBot is still there, but much less frequent.
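
If you want to look for the same batching in your own logs, something like the following Python sketch would do. It assumes you have already pulled the request times out of the log as seconds; `group_batches` and the 10 second cut-off are my own choices for illustration, taken from the gaps I saw rather than from anything Google documents.

```python
def group_batches(timestamps, max_gap=10.0):
    """Group request times (in seconds) into batches.

    A new batch starts whenever the gap to the previous request
    is larger than max_gap seconds.
    """
    batches = []
    for t in sorted(timestamps):
        if batches and t - batches[-1][-1] <= max_gap:
            batches[-1].append(t)   # close enough: same batch
        else:
            batches.append([t])     # first request, or gap too large
    return batches
```

For example, `group_batches([0, 5, 12, 100, 103])` splits into two batches, one of three requests and one of two.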

This makes sense: both were doing largely the same upfront job, and linking them together provides a variety of benefits to the search side, including knowing the URLs are actually in use and finding different points of entry into the same site.

My evidence for this assertion is the change in behaviour described above, and that the bot's user agent used to be "Mediapartners-Google/2.1" (MediaBot) and is now just the standard "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (GoogleBot). I don't have any proof yet that the AdSense-induced crawls are being used in the indexing, but I can't see a good argument why they wouldn't be grist for that mill.
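
To check which bot is hitting your pages, a small Python sketch like this can tally requests by user agent. The substrings it matches come from the two agent strings quoted above; the function names and the idea of feeding it a plain list of agent strings are assumptions for the sake of the example, and a user agent can of course be spoofed.

```python
from collections import Counter


def classify_agent(user_agent):
    """Label a user agent string as MediaBot, GoogleBot, or other."""
    if "Mediapartners-Google" in user_agent:
        return "MediaBot"
    if "Googlebot" in user_agent:
        return "GoogleBot"
    return "other"


def count_bots(user_agents):
    """Count how many requests came from each kind of agent."""
    return Counter(classify_agent(ua) for ua in user_agents)
```

Running `count_bots` over the agent column of a day's log, before and after the change, should show the shift from MediaBot to GoogleBot if your site is seeing the same thing.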

Anyone else noticed this yet?

Update: I am correct

(back to 1place in a later post, I see Ben has dug out some more info and I'm overdue responding to the comments)
