Web Crawler & User Agent Blocking Techniques

This is a simple script that allows hackers to block specific crawlers based upon website requests from specific user-agents. This is useful when you don’t want certain traffic from being able to load certain content – usually a phishing page or a malicious download.

if(preg_match(‘/bot|crawler|spider|facebook|alexa|twitter|curl/i’, $_SERVER[‘HTTP_USER_AGENT’])) {
logger(“[BOT] {$_SERVER[‘REQUEST_URI’]} – 500”);

header(‘HTTP/1.1 500 Internal Server Error’);
exit();
}

Using preg_match, the script looks for certain known crawler strings in the user-agent.

Continue reading Web Crawler & User Agent Blocking Techniques at Sucuri Blog.

Source: Sucuri