Extracting Article Keywords in Python: Word Frequency Counting Code

激活谷笔记 • 2025-02-26 14:16

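Extracting high-frequency keywords in Python usually involves the following steps:

Text preprocessing:

Tokenize the text, remove stop words, and so on.

Word frequency counting:

Count how many times each word occurs in the text.

Keyword extraction:

Select keywords according to their frequency.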
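
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not a production pipeline: the regex tokenizer, the tiny `STOP_WORDS` set, and the `top_keywords` helper are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def top_keywords(text, n=3):
    # Step 1, preprocessing: lowercase, split into word tokens, drop stop words.
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Step 2, word frequency counting.
    counts = Counter(tokens)
    # Step 3, keyword extraction: keep the n most frequent words.
    return counts.most_common(n)


sample = "The cat chased the mouse. The cat caught the mouse, and the cat slept."
print(top_keywords(sample, 2))  # [('cat', 3), ('mouse', 2)]
```

A real tokenizer and a full stop word list would replace the regex and the hand-written set here.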

Below is sample code that uses the `nltk` library to extract high-frequency keywords from an English article:

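    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    from collections import Counter

    # Make sure the NLTK stop word list and the punkt tokenizer model are downloaded
    nltk.download('punkt')
    nltk.download('punkt_tab')  # newer NLTK releases look for this resource instead
    nltk.download('stopwords')

    # Read the article
    with open('article.txt', 'r', encoding='utf-8') as f:
        article = f.read()

    # Tokenize
    tokens = word_tokenize(article)

    # Remove stop words; also lowercase and drop punctuation so they do not skew the counts
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [
        word.lower() for word in tokens
        if word.isalpha() and word.lower() not in stop_words
    ]

    # Count word frequencies
    word_counts = Counter(filtered_tokens)

    # Extract the high-frequency words (top 20 here; adjust as needed)
    most_common_words = word_counts.most_common(20)

    # Print each high-frequency word with its count
    for word, count in most_common_words:
        print(f"{word}: {count}")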

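Chinese text has no spaces between words, so it must be segmented first. The `jieba` library can do the segmentation, after which the same steps apply. Here is sample code that uses `jieba` to extract Chinese keywords: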

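    import jieba
    from collections import Counter

    # Read the article
    with open('article.txt', 'r', encoding='utf-8') as f:
        article = f.read()

    # Segment the text with jieba
    words = list(jieba.cut(article))

    # Remove stop words (a small inline Chinese stop word list is used here)
    stop_words = set(["的", "了", "和", "是", "就", "都", "而", "及", "與", "著", "或", "一個", "沒有", "我們", "你們", "妳們", "他們", "她們", "是否"])
    # Also skip whitespace and punctuation tokens, which jieba returns as separate items
    filtered_words = [
        word for word in words
        if word not in stop_words and any(ch.isalnum() for ch in word)
    ]

    # Count word frequencies
    word_counts = Counter(filtered_words)

    # Extract the high-frequency words (top 20 here; adjust as needed)
    most_common_words = word_counts.most_common(20)

    # Print each high-frequency word with its count
    for word, count in most_common_words:
        print(f"{word}: {count}")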
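
An inline stop word list gets unwieldy quickly, so in practice the list is often kept in a text file with one word per line. The sketch below shows one way to load such a file. The file name `stopwords.txt`, the helper names `load_stop_words` and `count_words`, and the hard-coded token list (standing in for the output of `jieba.cut`) are all assumptions made for this example.

```python
import os
import tempfile
from collections import Counter


def load_stop_words(path):
    """Read one stop word per line from a UTF-8 text file."""
    with open(path, "r", encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def count_words(words, stop_words):
    """Count tokens that are not in the stop word set."""
    return Counter(w for w in words if w not in stop_words)


# Demo: write a throwaway stop word file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("的\n了\n是\n")
    stop_words = load_stop_words(path)

# Hypothetical pre-segmented tokens, standing in for list(jieba.cut(article)).
tokens = ["自然", "语言", "处理", "是", "有趣", "的", "自然", "语言", "技术"]
counts = count_words(tokens, stop_words)
print(counts["自然"], counts["语言"])  # 2 2
```

Keeping the list in a file makes it easy to swap in a published stop word list without touching the code.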

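Note that keyword extraction methods and their results vary with the text, the domain, and the requirements. You may need to try different tokenizers and keyword extraction algorithms and compare them experimentally.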
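
As one example of a different algorithm, TF-IDF weights a word's frequency in one document by how rare it is across a collection, so words that appear everywhere score low. The article itself only covers raw frequency counts, so the following is a swapped-in technique shown as a rough sketch. The `tf_idf` helper and the three toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {word: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({
            # Term frequency times inverse document frequency.
            w: (c / total) * math.log(n_docs / df[w])
            for w, c in tf.items()
        })
    return scores


# Hypothetical, already tokenized documents.
docs = [
    ["python", "keyword", "extraction", "keyword", "python"],
    ["python", "web", "framework"],
    ["python", "data", "analysis", "data"],
]
scores = tf_idf(docs)
top = max(scores[0], key=scores[0].get)
print(top)  # keyword
```

Here "python" is the most frequent word in the first document, yet it scores zero because it occurs in every document, while "keyword" ranks first.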
