Extracting URLs with Python: A Python Web Scraping Tutorial

激活谷笔记 • 2025-05-25 22:49 • 198 views

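There are several ways to get URLs in Python, depending on whether you want the URL of a response, the links inside a page, or URLs embedded in plain text. Here are some commonly used approaches: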

1. Using the `requests` library:

    import requests

    url = 'https://example.com/'
    response = requests.get(url, timeout=10)
    print(response.url)  # print the final URL (after any redirects)

2. Using the `BeautifulSoup` library to parse HTML content:

    from bs4 import BeautifulSoup
    import requests

    url = 'https://example.com/'
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    for link in soup.find_all('a'):
        print(link.get('href'))  # print each link's URL (None if the tag has no href)
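Because `link.get('href')` returns `None` for anchors without an `href`, and real pages often mix in `mailto:`, `javascript:`, and fragment-only links, the raw list usually needs filtering. The following is a minimal sketch, assuming you only want HTTP(S) and relative links. The helper name `keep_web_links` and the sample list are hypothetical and not part of BeautifulSoup:

```python
from urllib.parse import urlparse


def keep_web_links(hrefs):
    """Return hrefs that look like web links.

    Drops None/empty values, fragment-only links, and non-HTTP schemes.
    """
    kept = []
    for href in hrefs:
        if not href:  # None or empty string
            continue
        href = href.strip()
        if not href or href.startswith('#'):
            continue
        scheme = urlparse(href).scheme.lower()
        # An empty scheme means a relative URL, which is still a web link.
        if scheme in ('', 'http', 'https'):
            kept.append(href)
    return kept


# Hypothetical values of the kind link.get('href') can return.
sample = ['https://example.com/a', None, '#top', 'mailto:me@example.com',
          '/docs/page', 'javascript:void(0)']
print(keep_web_links(sample))
```

In the loop above, you could collect `link.get('href')` values into a list and pass that list to this helper before printing.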

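3. Using the `urllib` library to fetch the page (the HTML is still parsed with `BeautifulSoup`):

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = 'https://example.com/'
    html = urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')

    # href=True skips <a> tags that have no href attribute
    for link in soup.find_all('a', href=True):
        print(link['href'])  # print each link's URL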
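If installing `BeautifulSoup` is not an option, the same link extraction can be approximated with the standard library alone. This is a rough sketch using `html.parser.HTMLParser`; the class name `LinkCollector` and the sample HTML string are made up for illustration, and the parser is less forgiving of badly broken markup than a dedicated library:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                self.links.append(value)


# Hypothetical HTML; in practice this would be the decoded page content.
html = ('<p><a href="/about">About</a> <a name="x">no href</a> '
        '<a href="https://example.com/">Home</a></p>')

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

When feeding bytes returned by `urlopen(url).read()`, decode them to `str` first, since `feed()` expects text.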

4. Using a regular expression to match URLs in text:

    import re

    text = 'This is a URL: https://example.com'
    urls = re.findall(r'https?://[^\s]+', text)
    print(urls)  # print the matched URLs
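One caveat with the pattern `https?://[^\s]+` is that it matches everything up to the next whitespace, so punctuation that directly follows a URL in a sentence ends up in the match. Below is a small sketch of one way to handle this, assuming that trailing punctuation is never part of the URL (which is not always true, for example for URLs that legitimately end in `)`). The names `URL_PATTERN`, `TRAILING`, and `find_urls` are hypothetical:

```python
import re

# Stop at whitespace, angle brackets, and quotes, which commonly delimit URLs.
URL_PATTERN = re.compile(r'https?://[^\s<>"\']+')

# Punctuation that often follows a URL in running text (an assumption).
TRAILING = '.,;:!?)]}'


def find_urls(text):
    """Find http(s) URLs in text and strip punctuation that trails them."""
    return [match.rstrip(TRAILING) for match in URL_PATTERN.findall(text)]


text = 'Docs: https://example.com/docs. Also see (http://example.org/a?b=1), thanks!'
print(find_urls(text))
```

For anything stricter than a quick scan of free text, parsing the surrounding format (HTML, Markdown, JSON) is usually more reliable than a regular expression.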

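5. Using the `Selenium` library to get the URL of a dynamically rendered page:

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get('https://example.com/')
    print(driver.current_url)  # print the current URL
    driver.quit()  # close the browser when done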

The methods above cover the common ways to get URLs in Python. If you need to convert relative URLs into absolute ones, use the `urljoin` function:

    from urllib.parse import urljoin

    base_url = 'https://example.com/'
    relative_url = '/path/to/page'
    absolute_url = urljoin(base_url, relative_url)
    print(absolute_url)  # print the absolute URL: https://example.com/path/to/page
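How `urljoin` resolves a link depends on the form of the relative URL and on whether the base ends with a slash. The sketch below illustrates the main cases; the base URL and paths are made-up examples:

```python
from urllib.parse import urljoin

base = 'https://example.com/docs/guide/'

# A plain relative path is resolved against the base "directory".
print(urljoin(base, 'intro.html'))    # https://example.com/docs/guide/intro.html

# '..' climbs one level up from the base path.
print(urljoin(base, '../api/'))       # https://example.com/docs/api/

# A leading slash replaces the whole path but keeps scheme and host.
print(urljoin(base, '/path/to/page')) # https://example.com/path/to/page

# A scheme-relative URL keeps only the scheme of the base.
print(urljoin(base, '//cdn.example.com/app.js'))  # https://cdn.example.com/app.js

# An absolute URL is returned unchanged.
print(urljoin(base, 'https://other.example.org/x'))  # https://other.example.org/x

# Without a trailing slash, the last path segment of the base is replaced.
print(urljoin('https://example.com/docs/guide', 'intro.html'))  # https://example.com/docs/intro.html
```

When resolving links scraped from a page, the URL the page was finally served from (after redirects) is generally the safer choice of base than the URL originally requested.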

Choose the method that best fits your specific needs.

