Python Crawler Structure_Importing Modules in Python

激活谷笔记 • 2026-05-19 15:26

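In a Python crawler, custom request headers are usually set to mimic a browser and get past basic anti-crawler checks. These are *request* headers, which the client sends. They are not response headers, which come from the server. The common ways to set them are: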

1. Pass a dict of custom request headers through the `headers` parameter of `requests`.

```python
import requests

url = 'http://www.example.com'

# Request headers that mimic a real browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 YaBrowser/19.7.0.1635 Yowser/2.5 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
}

# Pass the dict through the headers parameter; always set a timeout
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # raise an error on 4xx/5xx responses
print(response.text)
```
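If a crawler sends many requests, repeating `headers=` on every call gets tedious. A minimal sketch, assuming you want the same browser-like headers on every request, is to put them on a `requests.Session` with `session.headers.update()`. Any per-request headers are merged on top of the session headers and win on conflicts. The helpers `build_session` and `preview_headers` and the header values are hypothetical. `preview_headers` calls `Session.prepare_request()` to show the final merged headers without sending anything over the network.

```python
import requests

# Hypothetical header values for illustration; copy real ones from your own browser.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Referer': 'http://www.baidu.com/',
}


def build_session(headers):
    """Hypothetical helper: a Session whose default headers go out with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def preview_headers(session, url, extra_headers=None):
    """Hypothetical helper: return the headers the session would send, with no network I/O."""
    req = requests.Request('GET', url, headers=extra_headers or {})
    prepared = session.prepare_request(req)  # merges session and per-request headers
    return dict(prepared.headers)


if __name__ == '__main__':
    session = build_session(BROWSER_HEADERS)
    # The per-request Referer overrides the session-level one
    sent = preview_headers(session, 'http://www.example.com',
                           {'Referer': 'http://www.example.com/'})
    for name, value in sent.items():
        print(f'{name}: {value}')
```

In real use you would call `session.get(url, timeout=10)`. The session also reuses connections and keeps cookies between requests, which is usually what a crawler wants.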

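2. Use the standard-library `urllib.request` module. You can set request headers through the `headers` argument of `Request` or by calling its `add_header()` method.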

```python
from urllib.request import Request, urlopen

url = 'http://www.example.com'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 YaBrowser/19.7.0.1635 Yowser/2.5 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
}

# Request() applies each dict entry with add_header() internally
req = Request(url, headers=headers)

# Close the response automatically and set a timeout
with urlopen(req, timeout=10) as response:
    print(response.read().decode('utf-8'))
```
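Step 2 mentions `add_header()`. The small sketch below calls it directly instead of passing a dict. One detail is worth knowing: `urllib.request.Request` stores header names via `str.capitalize()`, so `'User-Agent'` is kept as `'User-agent'`. As a result, `get_header()` must be queried with that spelling. The function `build_request` is a hypothetical helper for illustration, and nothing here touches the network.

```python
from urllib.request import Request

UA = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36')


def build_request(url):
    """Hypothetical helper: build a Request and attach headers one at a time."""
    req = Request(url)
    req.add_header('User-Agent', UA)
    req.add_header('Referer', 'http://www.baidu.com/')
    return req


req = build_request('http://www.example.com')
# Names are stored capitalized: 'User-Agent' becomes 'User-agent'
print(req.header_items())
print(req.get_header('User-agent'))
```

Pass the resulting `req` to `urlopen(req, timeout=10)` exactly as in step 2.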

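3. Open the `Network` tab in the browser's developer tools, copy the request headers the browser actually sends, and reuse them in your own request.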

```python
import requests

url = 'http://www.example.com'

# Request headers copied from the browser's Network tab
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 YaBrowser/19.7.0.1635 Yowser/2.5 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
}

# Send the request with the browser's headers
response = requests.get(url, headers=browser_headers, timeout=10)
print(response.text)
```
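Copying headers out of the `Network` tab one by one is error-prone. A minimal sketch, assuming you have pasted the raw request headers into a string with one `Name: value` per line, is shown below. The hypothetical helper `parse_raw_headers` turns that text into a dict you can pass as `headers=`. It skips the request line and HTTP/2 pseudo-headers such as `:authority`. As a design choice, it also drops `Host` and `Content-Length` so that the HTTP library computes them itself.

```python
# Hypothetical choice: let the HTTP library compute these headers itself
SKIP = {'host', 'content-length'}


def parse_raw_headers(raw):
    """Hypothetical helper: convert pasted 'Name: value' lines into a headers dict."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(':')  # split on the first colon only
        name = name.strip()
        # Skip lines without a colon, pseudo-headers (empty name),
        # and request lines such as 'GET http://... HTTP/1.1' (name contains a space)
        if not sep or not name or ' ' in name:
            continue
        if name.lower() in SKIP:
            continue
        headers[name] = value.strip()
    return headers


RAW = """
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://www.baidu.com/
"""

print(parse_raw_headers(RAW))
```

You would then call `requests.get(url, headers=parse_raw_headers(RAW), timeout=10)`.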

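The code above shows how to set request headers with `requests` and `urllib.request` and send a request. Adjust the header values to match the site you are crawling.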
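Whichever method you use, it can help to confirm which headers actually reach the server. The sketch below uses only the standard library. The hypothetical `EchoHeadersHandler` runs on a throwaway local `http.server` and replies with the headers it received. The hypothetical `echo_headers` sends one `urllib` request to that server. Header names are lowercased in the reply because HTTP header names are case-insensitive and `urllib` may change their capitalization.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: reply with the received request headers as JSON."""

    def do_GET(self):
        seen = {name.lower(): value for name, value in self.headers.items()}
        body = json.dumps(seen).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def echo_headers(headers):
    """Hypothetical helper: send one GET to a local echo server, return what it saw."""
    server = HTTPServer(('127.0.0.1', 0), EchoHeadersHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.handle_request, daemon=True)
    thread.start()
    host, port = server.server_address
    opener = build_opener(ProxyHandler({}))  # bypass any proxy set in the environment
    try:
        req = Request(f'http://{host}:{port}/', headers=headers)
        with opener.open(req, timeout=5) as resp:
            return json.loads(resp.read().decode('utf-8'))
    finally:
        thread.join(timeout=5)
        server.server_close()


if __name__ == '__main__':
    print(echo_headers({'User-Agent': 'my-test-agent/1.0',
                        'Referer': 'http://www.baidu.com/'}))
```

The same local URL also works with `requests.get(...)` if you want to check the headers from step 1 instead.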
