Liam Delahunty: Home Tips Web Contact

Identifying the Googlebot

gethostbyname & gethostbyaddr

After reading the post "How to verify Googlebot" on the Official Google Webmaster Central Blog, I looked into using gethostbyname & gethostbyaddr to verify whether a bot claiming to be Googlebot actually is one.

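Firstly, the Googlebot identifies itself in the User Agent header, so we can screen every request to the page with a simple if statement. The header is trivial to fake, so this only tells us which requests claim to be Googlebot and need the DNS check.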
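As a minimal sketch of that first screen, assuming PHP 7.1 or later: the helper name claimsToBeGooglebot is my own invention for illustration, not something from the Google post.

```php
<?php
// Hypothetical helper: does the request claim to be Googlebot?
// This is only a screen; the User Agent header is easy to fake.
function claimsToBeGooglebot(?string $userAgent): bool
{
    // stripos() is case-insensitive and returns false when nothing is found.
    return $userAgent !== null && stripos($userAgent, 'Googlebot') !== false;
}

var_dump(claimsToBeGooglebot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')); // bool(true)
var_dump(claimsToBeGooglebot('CCBot/2.0 (https://commoncrawl.org/faq/)')); // bool(false)
```

In a real page you would pass in $_SERVER['HTTP_USER_AGENT'] ?? null, since the header can be missing altogether.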

From their post we know:

Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; eg:

> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.
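The reverse-then-forward check described in the quote can be sketched as one function. This is only an illustration under my own assumptions: the name isVerifiedGooglebot and the two optional callables are hypothetical, added so the logic can be tried with stubbed lookups instead of live DNS. By default it falls back to PHP's gethostbyaddr and gethostbyname (the latter resolves IPv4 only).

```php
<?php
// Hypothetical function; $reverse and $forward default to PHP's own resolvers
// but can be replaced by stubs so the logic can be exercised without DNS.
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbyname';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged on
    // failure and false on malformed input.
    $name = $reverse($ip);
    if (!is_string($name) || $name === $ip) {
        return false;
    }

    // Step 2: the name must sit inside the googlebot.com domain.
    $name = rtrim($name, '.');
    if (!preg_match('/\.googlebot\.com$/i', $name)) {
        return false;
    }

    // Step 3: the forward lookup must lead back to the same IP.
    return $forward($name) === $ip;
}

// Stubbed lookups standing in for DNS. The second call models a spoofer who
// controls reverse DNS for their own address but cannot change googlebot.com.
$reverseStub = function ($ip) { return 'crawl-66-249-66-1.googlebot.com'; };
$forwardStub = function ($name) { return '66.249.66.1'; };

var_dump(isVerifiedGooglebot('66.249.66.1', $reverseStub, $forwardStub)); // bool(true)
var_dump(isVerifiedGooglebot('203.0.113.9', $reverseStub, $forwardStub)); // bool(false)
```

The second call fails at step 3: the spoofed name resolves forward to Google's real address, not to the caller's.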

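// Check whether it's really the Googlebot and not some hideous spam thing.
// eregi() was removed in PHP 7, so stripos() and preg_match() are used instead.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($agent, 'Googlebot') !== false) {
    // It says it's the lovely Google.
    $ip = $_SERVER['REMOTE_ADDR'];
    // Reverse lookup: IP address to host name (returns the IP unchanged on failure).
    $name = gethostbyaddr($ip);
    // Now we have the name, look up the corresponding IP address.
    $host = gethostbyname($name);
    // The name must be inside the googlebot.com domain, not merely contain the word.
    if (preg_match('/\.googlebot\.com$/i', $name)) {
        if ($host == $ip) {
            // lovely, let it in
        } else {
            // evil, send it away
        }
    } else {
        // Liar, Liar, Pants on fire
    }
} else {
    // Continue
}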
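One detail worth spelling out is the domain test on the host name. A plain "does it contain Googlebot" match would also accept names such as googlebot.example.net, which anyone can register, whereas the quoted advice asks for a name in the googlebot.com domain. Here is a small sketch of a stricter suffix check; the helper name isInGooglebotDomain is hypothetical.

```php
<?php
// Hypothetical helper: is $name inside the googlebot.com domain?
function isInGooglebotDomain(string $name): bool
{
    // Drop a trailing dot (as printed by `host`) and ignore case.
    $name = strtolower(rtrim($name, '.'));
    $suffix = '.googlebot.com';
    return substr($name, -strlen($suffix)) === $suffix;
}

var_dump(isInGooglebotDomain('crawl-66-249-66-1.googlebot.com'));         // bool(true)
var_dump(isInGooglebotDomain('crawl-66-249-66-1.googlebot.com.'));        // bool(true)
var_dump(isInGooglebotDomain('googlebot.example.net'));                   // bool(false)
var_dump(isInGooglebotDomain('crawl-1-2-3-4.googlebot.com.example.net')); // bool(false)
```

The suffix check on its own proves nothing about the caller, since reverse DNS can be set to any name; it only matters together with the forward lookup back to the same IP.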

Test result using the known Google IP 66.249.66.1

The word googlebot is found in the $name returned by gethostbyaddr($ip). (crawl-66-249-66-1.googlebot.com)

The IP address returned by gethostbyname($name) for that host name equals the original IP, so the check passes. (66.249.66.1)

Your User Agent, IP Address, host by address and host by name

$_SERVER['HTTP_USER_AGENT'] : CCBot/2.0 (https://commoncrawl.org/faq/)

$_SERVER['REMOTE_ADDR']: 54.167.15.6

$name = gethostbyaddr($ip): ec2-54-167-15-6.compute-1.amazonaws.com

$host = gethostbyname($name): 54.167.15.6

On my Online Sales site, I published a script that will email you when the Googlebot visits.
