Processing Text Files in Python: Reading Text Files

激活谷笔记 • 2025-05-17 11:51 • 93 views

Working with text data in Python typically involves the following steps:

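Opening a file:

Use the `open` function to open a file and obtain a file handle. Passing `encoding='utf-8'` explicitly avoids depending on the platform's default encoding.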

    with open('file.txt', 'r', encoding='utf-8') as file:
        content = file.read()
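
Opening a file in `'r'` mode raises `FileNotFoundError` when the file does not exist. The following is a minimal sketch of one way to handle that case; the helper name `read_text_or_default` and the file names are hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        return default


# Work inside a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = Path(tmp_dir) / "file.txt"
    existing.write_text("Hello, World!", encoding="utf-8")

    found = read_text_or_default(existing)
    missing = read_text_or_default(Path(tmp_dir) / "missing.txt", default="<no file>")

print(found)    # Hello, World!
print(missing)  # <no file>
```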

Reading file contents:

Use the file handle's `read` method to read the entire file into a single string.

    with open('file.txt', 'r', encoding='utf-8') as file:
        content = file.read()

    print(content)
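
`read` loads the whole file into memory at once. For larger files, iterating over the file handle reads one line at a time instead. A small sketch, assuming a hypothetical helper named `read_lines` and sample content made up for the demonstration:

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Yield each line of the file without its trailing newline."""
    with open(path, "r", encoding="utf-8") as file:
        for line in file:  # the file object is an iterator over its lines
            yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "file.txt"
    sample.write_text("first\nsecond\nthird\n", encoding="utf-8")
    lines = list(read_lines(sample))

print(lines)  # ['first', 'second', 'third']
```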

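Writing to a file:

Use the `write` method to write content to a file. Opening a file in `'w'` mode creates it if it does not exist and overwrites any existing content.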

    with open('file.txt', 'w', encoding='utf-8') as file:
        file.write('Hello, World!')
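
Neither `write` nor `writelines` adds line breaks, so each newline has to be written explicitly. The sketch below writes several lines to a temporary file and reads them back; the sample strings are made up for illustration.

```python
import tempfile
from pathlib import Path

lines = ["alpha", "beta", "gamma"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "file.txt"

    with open(path, "w", encoding="utf-8") as file:
        # write() returns the number of characters written
        chars_written = file.write("header\n")
        # writelines() writes each string as-is, so append "\n" ourselves
        file.writelines(line + "\n" for line in lines)

    with open(path, "r", encoding="utf-8") as file:
        written = file.read()

print(chars_written)  # 7
print(written)
```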

Appending content:

Open the file in `'a'` mode to add content at the end of the file without overwriting what is already there.

    with open('file.txt', 'a', encoding='utf-8') as file:
        file.write('\nThis is a new line.')
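
The difference between `'w'` and `'a'` is easiest to see side by side. This sketch uses a temporary file and reuses the strings from the examples above; `'a'` keeps the existing content, while a later `'w'` discards it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "file.txt"

    with open(path, "w", encoding="utf-8") as file:
        file.write("Hello, World!")

    # 'a' keeps the existing content and writes at the end
    with open(path, "a", encoding="utf-8") as file:
        file.write("\nThis is a new line.")
    after_append = path.read_text(encoding="utf-8")

    # 'w' truncates the file before writing
    with open(path, "w", encoding="utf-8") as file:
        file.write("Overwritten")
    after_overwrite = path.read_text(encoding="utf-8")

print(after_append)
print(after_overwrite)  # Overwritten
```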

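Closing a file:

A file opened without a `with` statement must be closed by calling its `close` method. When a `with` statement is used, the file is closed automatically as soon as the block exits, so no explicit call is needed.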

    with open('file.txt', 'r', encoding='utf-8') as file:
        content = file.read()

    # The file has been closed automatically at this point
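
For comparison, here is a sketch of the manual pattern next to the `with` form. The `closed` attribute of a file object reports whether it has been closed; the variable names are illustrative only.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "file.txt"
    path.write_text("Hello, World!", encoding="utf-8")

    # Manual pattern: try/finally guarantees close() runs even if read() fails
    file = open(path, "r", encoding="utf-8")
    try:
        content = file.read()
    finally:
        file.close()
    closed_manually = file.closed

    # Context manager: the same guarantee with less code
    with open(path, "r", encoding="utf-8") as managed:
        open_inside = not managed.closed
    closed_by_with = managed.closed

print(closed_manually, open_inside, closed_by_with)  # True True True
```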

Text processing:

Use Python's built-in string methods together with third-party libraries such as `nltk` to process text.

    import string

    import nltk
    from nltk.corpus import stopwords

    # Requires the NLTK stop word corpus; fetch it once with nltk.download('stopwords')
    text = "Natural Language Processing with Python is interesting and challenging!"

    # Convert to lowercase
    text = text.lower()

    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))

    # Remove stop words
    stop_words = set(stopwords.words('english'))
    filtered_text = ' '.join([word for word in text.split() if word not in stop_words])
    print(filtered_text)
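
When `nltk` is not installed, the same lowercase, strip-punctuation, filter pipeline can be approximated with the standard library alone. In the sketch below, `STOP_WORDS` is a tiny hand-written set used purely for illustration; it is not NLTK's English stop word list, and `clean_words` is a hypothetical helper name.

```python
import string
from collections import Counter

# Illustrative stop words only; NLTK's English list is much larger.
STOP_WORDS = {"with", "is", "and", "the", "a", "of"}


def clean_words(text):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOP_WORDS]


text = "Natural Language Processing with Python is interesting and challenging!"
words = clean_words(text)
counts = Counter(words)  # word -> number of occurrences

print(words)
# ['natural', 'language', 'processing', 'python', 'interesting', 'challenging']
```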

String operations:

Use operations such as slicing and concatenation to manipulate strings.

    text = "Hello, Python!"

    # Get the substring starting at index 7 (the eighth character)
    substring = text[7:]
    print(substring)  # Output: Python!

    # Concatenate with the + operator
    greeting = "Hello"
    name = "World"
    result = greeting + " " + name + "!"
    print(result)  # Output: Hello, World!
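
Beyond slicing and `+`, a handful of built-in string methods cover most everyday cleanup tasks. The following sketch shows `strip`, `replace`, `split`, `join`, negative slicing, and f-strings on made-up sample values.

```python
raw = "  Hello, Python!  "

trimmed = raw.strip()                          # remove surrounding whitespace
replaced = trimmed.replace("Python", "World")  # substitute a substring
last_word = trimmed[-7:]                       # negative indices count from the end

parts = "a,b,c".split(",")                     # split on a delimiter
joined = "-".join(parts)                       # join a list back into one string

name = "World"
formatted = f"Hello, {name}!"                  # f-string instead of + concatenation

print(trimmed)    # Hello, Python!
print(replaced)   # Hello, World!
print(last_word)  # Python!
print(joined)     # a-b-c
print(formatted)  # Hello, World!
```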

Text preprocessing:

Preprocessing includes steps such as removing punctuation, tokenization, and part-of-speech tagging.

    from nltk.tokenize import word_tokenize
    from nltk import pos_tag

    # Requires NLTK tokenizer and tagger data, which can be fetched with nltk.download()
    text = "Natural Language Processing with Python is interesting and challenging!"

    # Tokenize
    tokens = word_tokenize(text)

    # Tag parts of speech
    tagged = pos_tag(tokens)
    print(tagged)
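
To get a feel for what tokenization does without downloading any NLTK data, a regular expression can serve as a rough stand-in. This is a simplified sketch, not NLTK's tokenizer: the hypothetical `simple_tokenize` splits text into runs of word characters and single punctuation marks, and it does no part-of-speech tagging.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and individual punctuation marks."""
    # \w+      matches a run of letters, digits, or underscores
    # [^\w\s]  matches one character that is neither word nor whitespace
    return re.findall(r"\w+|[^\w\s]", text)


text = "Natural Language Processing with Python is interesting and challenging!"
tokens = simple_tokenize(text)

print(tokens)
# ['Natural', 'Language', 'Processing', 'with', 'Python', 'is',
#  'interesting', 'and', 'challenging', '!']
```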

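The steps above cover the basic workflow for processing text data in Python, with example code for each. Choose the text processing methods that fit your actual requirements.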