Liam Delahunty

Identifying the Googlebot

gethostbyname & gethostbyaddr

After reading the post "How to verify Googlebot" on the Official Google Webmaster Central Blog, I'm looking into using gethostbyname & gethostbyaddr to verify whether a bot claiming to be Googlebot actually is.

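Firstly, Googlebot names itself in the User-Agent header, so we can test every request to the page with a simple if statement. Anyone can send that header, though, so a match only tells us the request claims to be Googlebot.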

So, from their post we know:

Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; eg:

> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.
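The "in the googlebot.com domain" step from the quote is easy to get subtly wrong: a plain substring search for "googlebot" would also accept a name such as googlebot.com.spam.example. Here is a minimal sketch of a suffix check instead. The helper name hostname_in_domain() and the two rejected hostnames are made up for illustration; they do not come from Google's post.

```php
<?php
// Hypothetical helper: true when $hostname is $domain itself or a
// subdomain of it. The comparison is case-insensitive.
function hostname_in_domain(string $hostname, string $domain): bool
{
    // DNS tools often print a trailing dot; drop it before comparing.
    $hostname = strtolower(rtrim($hostname, '.'));
    $domain   = strtolower(rtrim($domain, '.'));
    if ($hostname === $domain) {
        return true;
    }
    // Require a dot before the domain so "fakegooglebot.com" is rejected.
    $suffix = '.' . $domain;
    return strlen($hostname) > strlen($suffix)
        && substr($hostname, -strlen($suffix)) === $suffix;
}

var_dump(hostname_in_domain('crawl-66-249-66-1.googlebot.com.', 'googlebot.com')); // bool(true)
var_dump(hostname_in_domain('googlebot.com.spam.example', 'googlebot.com'));       // bool(false)
var_dump(hostname_in_domain('fakegooglebot.com', 'googlebot.com'));                // bool(false)
```

The suffix check alone still trusts whoever controls the reverse DNS zone for the IP, which is why the forward lookup described in the quote is needed as well.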

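// check if it's really the google bot and not some hideous spam thing.
// eregi() was removed in PHP 7, so stripos() and preg_match() are used instead.
if (stripos($_SERVER['HTTP_USER_AGENT'] ?? '', 'Googlebot') !== false) {
    // it says it's the lovely google
    $ip = $_SERVER['REMOTE_ADDR'];
    // Reverse lookup. gethostbyaddr() returns the IP unchanged if there is no name.
    $name = (string) gethostbyaddr($ip);
    // Now we have the name, look up the corresponding IP address.
    // gethostbyname() returns the name unchanged if the lookup fails.
    $host = gethostbyname($name);
    // The name must END in .googlebot.com, not merely contain "googlebot".
    if (preg_match('/\.googlebot\.com$/i', $name)) {
        if ($host === $ip) {
            // lovely, let it in
        } else {
            // evil, send it away
        }
    } else {
        // Liar, Liar, Pants on fire
    }
} else {
    // Continue
}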
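The check above can also be wrapped in a function so it can be tried out without touching real DNS. This is only a sketch: is_verified_googlebot() and its two resolver parameters are names I have made up, the lookup tables are canned data, and 203.0.113.9 and 198.51.100.7 are documentation-range addresses standing in for a spoofer and an ordinary visitor. When no resolvers are passed in, it falls back to PHP's gethostbyaddr() and gethostbyname().

```php
<?php
// Hypothetical wrapper around the reverse + forward DNS check.
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    // Use PHP's built-in lookups unless test resolvers are injected.
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbyname';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged on
    // failure, or false on malformed input.
    $name = $reverse($ip);
    if (!is_string($name) || $name === $ip) {
        return false;
    }

    // Step 2: the name must end in .googlebot.com.
    if (!preg_match('/\.googlebot\.com$/i', $name)) {
        return false;
    }

    // Step 3: the forward lookup must lead back to the same IP.
    return $forward($name) === $ip;
}

// Canned lookup tables (assumed data) so the example runs offline.
$ptr = [
    '66.249.66.1' => 'crawl-66-249-66-1.googlebot.com', // genuine, from the host output
    '203.0.113.9' => 'crawl-66-249-66-1.googlebot.com', // spoofed reverse record
];
$a = ['crawl-66-249-66-1.googlebot.com' => '66.249.66.1'];

// Mimic the built-ins: return the input unchanged when nothing is found.
$reverse = function (string $ip) use ($ptr) { return $ptr[$ip] ?? $ip; };
$forward = function (string $name) use ($a) { return $a[$name] ?? $name; };

var_dump(is_verified_googlebot('66.249.66.1', $reverse, $forward));  // bool(true)
var_dump(is_verified_googlebot('203.0.113.9', $reverse, $forward));  // bool(false)
var_dump(is_verified_googlebot('198.51.100.7', $reverse, $forward)); // bool(false)
```

The second call is the spoofing case from the quote: the reverse record claims a googlebot.com name, but the forward lookup points somewhere else, so the request is rejected. Note that gethostbyname() only resolves IPv4 addresses, so this sketch does not cover IPv6 visitors.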

Test result using the known Google IP of 66.249.66.1