Baidu URL parameter analysis

When crawling Baidu search results with Python, you will notice that Baidu's search URLs are very long and carry a large number of parameters, most of which are not required. For example, a search for the keyword java needs nothing more than http://www.baidu.com/s?wd=java, yet the following far more complex URL performs the same search:


http://www.baidu.com/s?wd=java&rsv_spt=1&rsv_iqid=0xd3c8c51900052eb3&issp=1&f=8
&rsv_bp=1&rsv_idx=2&ie=utf-8&tn=baiduhome_pg&rsv_enter=1&oq=python%20org&inputT=801
&rsv_t=8810tNAXi7Yc2PivScHthQ7bBz%2B4eIBHvrdmB59u%2FlLVYrhnyyTg1%2FYJzQM9EAEgSPn5
&rsv_pq=8f0a85f900051202&rsv_sug3=15&rsv_sug2=0&rsv_sug7=000&rsv_sug4=801&rsv_sug=2
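To see exactly which parameters a URL like this carries, it can be split apart with Python's standard urllib.parse module. The following is a minimal sketch; the helper name query_params is hypothetical, and the URL is a shortened copy of the one above.

```python
from urllib.parse import urlsplit, parse_qsl

# Shortened copy of the long search URL shown above.
long_url = (
    "http://www.baidu.com/s?wd=java&rsv_spt=1&rsv_iqid=0xd3c8c51900052eb3"
    "&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&tn=baiduhome_pg&rsv_enter=1"
    "&oq=python%20org&inputT=801"
)


def query_params(url):
    """Return the query-string parameters of `url` as a dict of decoded values."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


params = query_params(long_url)
for name, value in params.items():
    print(f"{name} = {value}")
```

parse_qsl also decodes percent-escapes, so oq comes back as "python org" rather than "python%20org".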

So what are all these extra parameters for?

wd

The search keyword, that is, what you want to search for.


rn

The number of search results shown per page. The default is 10, and it can be set as high as 50.


pn

The offset of the first result shown. It is 0 by default (the first page) and increases by rn for each subsequent page. With the default rn, the third page has pn=20.
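The relationship between pn and rn can be sketched as follows. The function name baidu_search_url and its 1-based page argument are assumptions made for illustration; only the wd, pn and rn parameters come from the description above.

```python
from urllib.parse import urlencode


def baidu_search_url(keyword, page=1, rn=10):
    """Build a search URL for the 1-based `page`, with `rn` results per page."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # pn is the offset of the first result: 0 for page 1, rn for page 2, ...
    params = {"wd": keyword, "pn": (page - 1) * rn}
    if rn != 10:
        # Only send rn when it differs from the default of 10.
        params["rn"] = rn
    return "http://www.baidu.com/s?" + urlencode(params)


print(baidu_search_url("java", page=3))
print(baidu_search_url("java", page=2, rn=50))
```

With the default rn of 10, page 3 gives pn=20, which matches the example in the text.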


ie

The encoding of the query keyword. The default is GB2312, that is, Simplified Chinese.
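For non-ASCII keywords, the percent-encoded bytes of wd should match the encoding declared in ie. A minimal sketch of this using urlencode's encoding argument follows; the helper name search_url is hypothetical, and how Baidu handles a mismatch is not covered here.

```python
from urllib.parse import urlencode


def search_url(keyword, encoding="utf-8"):
    """Percent-encode `keyword` in `encoding` and declare that encoding in ie."""
    params = {"wd": keyword, "ie": encoding}
    return "http://www.baidu.com/s?" + urlencode(params, encoding=encoding)


# The same keyword yields different escaped bytes under each encoding.
print(search_url("百度", "utf-8"))
print(search_url("百度", "gb2312"))
```

Passing ie explicitly avoids relying on the GB2312 default when the keyword is encoded as UTF-8.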


tn

The source of the search request. Many websites embed a Baidu search box, and this parameter identifies which site the current search came from. For example, the following URL is produced by the Baidu search box on www.hao123.com:


https://www.baidu.com/s?word=java&tn=sitehao123&ie=utf-8

Note: many parameter names are abbreviations. wd, for instance, is short for word, and the full form word is accepted as well.


rsv_bp

Indicates which search box on Baidu's pages was used. The value is 0 when searching through the box in the middle of Baidu's home page, and 1 when searching through the box at the top of the search results page.

rsv_spt

The exact meaning of this parameter is not clear. In my tests, it appears with the value 1 when searching from the home page while logged in to a Baidu account; it does not appear when not logged in, or when searching from the search results page. Of the explanations I found online, the most plausible is that it indicates the type of home page used for the search: 1 means the new-style Baidu home page (logged in to a Baidu account), 2 means Baidu real-time hot search (logged in to a Baidu account), and 3 means the traditional Baidu search page.


cl

The type of search submitted, for example 3 when searching web pages and 2 when searching news.

oq

Some posts online say this parameter is related to the search drop-down suggestion list, but according to my tests it now seems to hold only the previous search keyword.


There are many other parameters, such as the rsv_* family, whose meanings are not clear. In addition, my testing of the parameters above may not be thorough and my understanding may be off; if you spot a mistake, corrections are welcome.
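Since most of the extra parameters are not needed, a crawler can reduce a long URL to the parameters described above. The sketch below assumes a hypothetical helper simplify and a whitelist KEEP chosen for illustration; whether Baidu returns identical results for the shortened URL is an assumption that should be checked by testing.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters described in this article; everything else (rsv_*, oq, ...) is dropped.
KEEP = ("wd", "word", "rn", "pn", "ie", "tn", "cl")


def simplify(url, keep=KEEP):
    """Return `url` with only the query parameters named in `keep`."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query) if k in keep]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(simplify(
    "http://www.baidu.com/s?wd=java&rsv_spt=1&ie=utf-8"
    "&tn=baiduhome_pg&rsv_bp=1&oq=python%20org"
))
```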

