There are many ways to fetch multiple URLs in Python. Here are some common approaches:
1. Using the `requests` and `BeautifulSoup` libraries:
```python
import requests
from bs4 import BeautifulSoup

urls = [
    'http://www.example.com/page1',
    'http://www.example.com/page2',
    'http://www.example.com/page3',
]

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Extract the page title and the body text
    title = soup.title.string
    content = soup.find('body').get_text()
    print('Title:', title)
    print('Body:', content)
```
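If the URL list is long, the sequential loop above can be slow because each request waits for the previous one to finish. Below is a minimal sketch of fetching the same pages concurrently with the standard-library `concurrent.futures` module; the `fetch_title` helper and the worker count of 5 are illustrative choices, not part of the original example.

```python
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

urls = [
    'http://www.example.com/page1',
    'http://www.example.com/page2',
    'http://www.example.com/page3',
]

def fetch_title(url):
    # Fetch one page and return its title, or None if the request fails
    try:
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')
        return soup.title.string if soup.title else None
    except requests.RequestException:
        return None

# Run up to 5 requests in parallel; executor.map preserves input order
with ThreadPoolExecutor(max_workers=5) as executor:
    for url, title in zip(urls, executor.map(fetch_title, urls)):
        print(url, '->', title)
```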
2. Using the `Scrapy` framework with a recursive `parse` method:
```python
from scrapy.spiders import Spider

class QiubaiSpider(Spider):
    name = 'qiubai'
    # allowed_domains must contain domains only, not URL paths
    allowed_domains = ['www.qiushibaike.com']
    start_urls = ['https://www.qiushibaike.com/text/']

    def parse(self, response):
        # Extract all URLs on the page and follow each one recursively
        for link in response.css('a::attr(href)').getall():
            yield response.follow(link, self.parse)
```
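A spider like this is usually started from a Scrapy project with `scrapy crawl qiubai`, but it can also be driven from a plain script. A short sketch using Scrapy's `CrawlerProcess`, assuming the `QiubaiSpider` class above is in scope:

```python
from scrapy.crawler import CrawlerProcess

# Start the spider in-process; start() blocks until the crawl finishes
process = CrawlerProcess(settings={'LOG_LEVEL': 'INFO'})
process.crawl(QiubaiSpider)
process.start()
```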
3. Using the `lxml` library with XPath expressions:
```python
import requests
from lxml import html

# Download a page first so that html_content is defined,
# then extract every link with an XPath expression
html_content = requests.get('http://www.example.com').text
tree = html.fromstring(html_content)
links = tree.xpath('//a/@href')
for link in links:
    print(link)
```
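The `href` values returned by the XPath query are often relative, so they cannot be fetched directly. A small sketch of normalizing them with the standard-library `urllib.parse.urljoin` (the sample links here are made up for illustration):

```python
from urllib.parse import urljoin

base_url = 'http://www.example.com'
relative_links = ['/page1', 'page2', 'http://other.example.com/page3']

# urljoin resolves relative links against the base URL and leaves
# absolute ones untouched, so mixed lists are safe to normalize
absolute_links = [urljoin(base_url, link) for link in relative_links]
print(absolute_links)
```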
4. Using the `urllib` and `BeautifulSoup` libraries:
```python
from bs4 import BeautifulSoup
import urllib.request

Upageurls = {}    # on-site links found so far, mapped to their HTTP status code
websiteurls = {}  # links that have already been processed elsewhere

def scanpage(url):
    # Collect all on-site links from the page, then check each one's status
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    pageurls = soup.find_all('a', href=True)
    for link in pageurls:
        href = link.get('href')
        if url in href and href not in Upageurls and href not in websiteurls:
            Upageurls[href] = 0
    for link in Upageurls:
        try:
            # Fetch each link once and record its status code
            Upageurls[link] = urllib.request.urlopen(link).getcode()
        except Exception:
            print('connect failed')
```
5. Fetching the URLs of Baidu search results in bulk:
```python
import requests

DOMAIN = 'https://www.baidu.com/s?wd='
keyword = input('Enter a search keyword: ')
pages = int(input('Enter the number of pages to crawl: '))

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/83.0.4103.61 Safari/537.36',
    'Cookie': 'PSTM=; BIDUPSID=C6D409FA9EC7DBCD64A2D7581; BD_UPN=;',
}

# Baidu paginates with the `pn` parameter in steps of 10 results
for offset in range(0, (pages - 1) * 10 + 1, 10):
    url = DOMAIN + keyword + '&pn=' + str(offset)
    response = requests.get(url, headers=headers)
    # Process the response content here
```
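One way to fill in the processing step is to pull the result links out of each page with `BeautifulSoup`. This sketch assumes each organic result title is an `<a>` tag inside an `<h3>` heading; Baidu's markup changes over time, so the selector may need adjusting:

```python
from bs4 import BeautifulSoup

def extract_result_urls(html_text):
    # Assumption: result titles are <a> tags nested inside <h3> headings
    soup = BeautifulSoup(html_text, 'html.parser')
    return [a['href'] for a in soup.select('h3 a[href]')]

# Usage inside the loop above:
# for link in extract_result_urls(response.text):
#     print(link)
```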
The examples above show how to fetch multiple URLs with different Python libraries and tools. Choose the approach that best fits your specific needs.