How can a website quickly identify Baidu spiders?

Posted by zaiwu on 19/05/2020 in Baidu Search & PPC, Doing Business in China

Webmasters often ask questions like these: What is Baidu Spider? Baidu Spider has been crawling so often that my server is overloaded; what can I do? Baidu Spider has stopped visiting; what should I do? Many sites also want Baidu Spider's IP ranges so that they can add them to a whitelist. However, the IP ranges change dynamically, so no fixed list is published.

 

So how can we tell a genuine Baidu spider from an impostor? This article from Domatters shows how to identify Baidu Spider in two practical steps:

 

First, check the UA information

 

If the UA (User-Agent) string does not match, the visitor can be ruled out as a Baidu search spider immediately. A matching UA is not proof by itself, because UA strings are easy to forge. At present, the UA falls into three application scenarios: mobile, PC, and applet. The UA strings for each are as follows:

 

  • Mobile UA:

 

Mozilla/5.0(Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko)Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0;+http://www.baidu.com/search/spider.html)

 

Or

 

Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;+http://www.baidu.com/search/spider.html)

 

  • PC UA:

 

Mozilla/5.0(compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

 

Or

 

Mozilla/5.0(compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)

 

  • Applet UA:

 

Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;Smartapp; +http://www.baidu.com/search/spider.html)
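
The UA check above can be automated when scanning access logs. The following is a minimal Python sketch, not an official Baidu tool. The function names looks_like_baiduspider and classify_baidu_ua are hypothetical, and the mobile/PC/applet classification is a heuristic inferred only from the UA strings listed above (the "Smartapp" and "Mobile" tokens), so it may need adjusting if Baidu changes its UA strings.

```python
import re

# Product token seen in every UA string listed above:
# "Baiduspider/2.0" or "Baiduspider-render/2.0".
_BAIDU_TOKEN = re.compile(r"\bBaiduspider(-render)?/\d+\.\d+", re.IGNORECASE)


def looks_like_baiduspider(user_agent):
    """Return True if the UA contains a Baiduspider product token.

    This is only a first-pass filter; a UA header is trivial to forge.
    """
    return bool(_BAIDU_TOKEN.search(user_agent or ""))


def classify_baidu_ua(user_agent):
    """Return "applet", "mobile", "pc", or None (heuristic, see lead-in)."""
    if not looks_like_baiduspider(user_agent):
        return None
    lowered = user_agent.lower()
    # Check "smartapp" first: the applet UA also contains "Mobile".
    if "smartapp" in lowered:
        return "applet"
    if "mobile" in lowered:
        return "mobile"
    return "pc"
```

A request that fails this check can be discarded as a non-Baidu visitor; a request that passes still needs the DNS verification described in the next step.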

 

Second, verify with two-way DNS resolution

 

Step 1: Run a reverse DNS lookup on the IP address

 

Run a reverse DNS lookup on the visiting IP address recorded in your server log to determine whether the visitor really is a Baidu search spider. Baiduspider hostnames follow the format *.baidu.com or *.baidu.jp. Any hostname that does not match *.baidu.com or *.baidu.jp is an impersonator.

 

The command to use depends on the platform. For example, on Linux, Windows (or IBM OS/2), and macOS, the methods are as follows:

 

1). On Linux, use the host command. Enter host xxx.xxx.xxx.xxx (the IP address) to resolve the IP to a hostname and determine whether the request came from Baidu Spider.

 

2). On Windows or IBM OS/2, use the nslookup command. Open a command prompt and enter nslookup xxx.xxx.xxx.xxx (the IP address) to resolve the IP to a hostname and determine whether the request came from Baidu Spider.

 

3). On macOS, use the dig command. Open a terminal and enter dig -x xxx.xxx.xxx.xxx (the IP address) to resolve the IP to a hostname and determine whether the request came from Baidu Spider.
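
The same reverse lookup can be scripted instead of typed by hand. Below is a minimal Python sketch using the standard library's socket.gethostbyaddr. The names BAIDU_SUFFIXES, is_baidu_hostname, and reverse_lookup are hypothetical helpers introduced for illustration; the two suffixes are the ones stated in this article.

```python
import socket

# Hostname suffixes stated in the article. The leading dot matters:
# it stops look-alike names such as "fakebaidu.com" from matching.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname):
    """Return True if the hostname has the form *.baidu.com or *.baidu.jp."""
    # PTR answers often carry a trailing dot; DNS names are case-insensitive.
    name = hostname.rstrip(".").lower()
    return name.endswith(BAIDU_SUFFIXES)


def reverse_lookup(ip):
    """Return the hostname for an IP address, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname
```

For example, calling reverse_lookup on an IP taken from the log and passing the result to is_baidu_hostname reproduces the manual check above. A hostname that passes still has to be confirmed with the forward lookup in step 2.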

 

Step 2: Run a forward DNS lookup on the hostname

 

Run a forward DNS lookup on the hostname returned in step 1 and check that it resolves back to the original IP address recorded in your log. If the addresses match, you can confirm that the spider is from the Baidu search engine.

 

For example:

 

> host 111.206.198.69

69.198.206.111.in-addr.arpa domain name pointer baiduspider-111-206-198-69.crawl.baidu.com.

> host baiduspider-111-206-198-69.crawl.baidu.com

baiduspider-111-206-198-69.crawl.baidu.com has address 111.206.198.69
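
Both steps can be combined into one check. The following Python sketch is one possible way to do it under stated assumptions: verify_baiduspider and its helper functions are hypothetical names, the forward lookup uses socket.gethostbyname_ex and therefore handles IPv4 only, and the resolver functions are passed in as parameters so that the logic can be tested without network access.

```python
import socket

# Suffixes stated in the article.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def _reverse(ip):
    """Step 1: IP address -> hostname, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def _forward(hostname):
    """Step 2: hostname -> set of IPv4 addresses (empty on failure)."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def verify_baiduspider(ip, reverse=_reverse, forward=_forward):
    """Return True only if the IP passes both DNS checks."""
    hostname = reverse(ip)
    if not hostname:
        return False
    # The hostname must have the form *.baidu.com or *.baidu.jp.
    if not hostname.rstrip(".").lower().endswith(BAIDU_SUFFIXES):
        return False
    # The hostname must resolve back to the IP address we started from.
    return ip in forward(hostname)
```

Calling verify_baiduspider("111.206.198.69") performs the same two lookups as the host commands above. The result depends on live DNS, so it is worth caching per IP address when processing large logs.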
