Building a Web Page with Python | Which Websites Can a Python Crawler Scrape

激活谷笔记 • 2025-01-01 10:21 • 21 views

There are several common ways to collect the links on a web page in Python:

1. Using the `requests` library:

        import re
        import requests

        # Fetch the page content
        url = 'http://www.example.com'
        response = requests.get(url)
        html_content = response.text

        # Extract all absolute links with a regular expression
        link_list = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', html_content)

        # Print the extracted links
        for link in link_list:
            print(link)
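
To see what the regular expression used above actually matches, it can help to run it on a fixed string instead of a live page. The following is a minimal sketch under that assumption: `SAMPLE_HTML` and `extract_urls` are hypothetical names introduced here for illustration, and the pattern is the same one as in the `requests` example.

```python
import re

# Same pattern as in the requests example, compiled once for reuse.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

# Hypothetical sample markup, used here instead of a live request.
SAMPLE_HTML = (
    '<a href="http://www.example.com/a">A</a> '
    '<a href="/relative/path">B</a> '
    '<a href="https://www.example.com/b?x=1&y=2">C</a>'
)


def extract_urls(html):
    """Return every absolute http(s) URL found in the text, in order."""
    return URL_PATTERN.findall(html)


if __name__ == '__main__':
    for found in extract_urls(SAMPLE_HTML):
        print(found)
```

The relative link `/relative/path` is not reported, because the pattern only matches text that begins with `http://` or `https://`.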

2. Using the `urllib2` library (Python 2.x only; in Python 3 this functionality lives in `urllib.request`):

        import re
        import urllib2

        # Connect to the URL and read the response body
        url = 'http://www.example.com'
        website = urllib2.urlopen(url)
        html_content = website.read()

        # Extract all absolute links with a regular expression
        link_list = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', html_content)

        # Print the extracted links
        for link in link_list:
            print(link)
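
Since `urllib2` exists only in Python 2, a rough Python 3 counterpart may be useful. The sketch below is one possible translation using `urllib.request`, not the original author's code. The helper names `links_from_bytes` and `fetch_links`, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration.

```python
import re
from urllib.request import urlopen

# Same pattern as in the examples above.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)


def links_from_bytes(raw, charset=None):
    """Decode a response body and return the absolute URLs found in it."""
    # Assumption: fall back to UTF-8 when the server declares no charset.
    html_content = raw.decode(charset or 'utf-8', errors='replace')
    return URL_PATTERN.findall(html_content)


def fetch_links(url, timeout=10):
    """Download a page with urllib.request and extract its links."""
    with urlopen(url, timeout=timeout) as response:
        # In Python 3 the body arrives as bytes, so it must be decoded first.
        charset = response.headers.get_content_charset()
        return links_from_bytes(response.read(), charset)
```

Calling `fetch_links('http://www.example.com')` performs the network request and returns a list of URL strings. Keeping the decoding and matching in `links_from_bytes` lets that part be checked without network access.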

3. Using the `BeautifulSoup` library to parse the HTML:

        import requests
        from bs4 import BeautifulSoup

        # Fetch the page content
        url = 'http://www.example.com'
        response = requests.get(url)
        html_content = response.text

        # Parse the HTML with BeautifulSoup
        soup = BeautifulSoup(html_content, 'html.parser')

        # Extract the href of every <a> tag, keeping only absolute links
        for link in soup.find_all('a'):
            href = link.get('href')
            if href and href.startswith('http'):
                print(href)
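
The `startswith('http')` check above skips relative links such as `/docs`. If those are wanted too, they can be resolved against the page URL with `urllib.parse.urljoin`. The sketch below illustrates the idea using the standard library's `html.parser.HTMLParser` in place of BeautifulSoup, so that it runs without third-party packages. `LinkCollector` and `collect_links` are hypothetical names, and the choice to keep only `http`/`https` results is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # urljoin leaves absolute URLs unchanged and resolves relative ones.
                absolute = urljoin(self.base_url, value)
                # Assumption: keep web links only, dropping mailto:, javascript:, etc.
                if absolute.startswith(('http://', 'https://')):
                    self.links.append(absolute)


def collect_links(html, base_url):
    """Return all http(s) links in the HTML, with relative hrefs resolved."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

With BeautifulSoup, the same resolution step would amount to calling `urljoin(url, href)` inside the `find_all('a')` loop before filtering.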

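The examples above show how to collect the links on a web page with `requests`, `urllib2`, and `BeautifulSoup`. The two regular-expression versions scan the raw page text for absolute `http`/`https` URLs, while the `BeautifulSoup` version reads the `href` of each `<a>` tag and keeps those that start with `http`. Choose the method that suits your needs.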
