How to Count Chinese Words with the jieba Library: jieba Functions in Python

激活谷笔记 • 2026-05-22 11:28

The steps below show how to count word frequencies with Python's jieba library:

1. Install the jieba library:

pip install jieba

2. Import jieba and read the text file:

python

import jieba

# Read the text file (replace the placeholder path with your own file)
with open('your_text_file.txt', 'r', encoding='utf-8') as file:
    text = file.read()

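3. Segment the text with jieba:

python

# Segment the text. jieba.cut returns a generator that can be iterated only once,
# so jieba.lcut is used here to get a reusable list of tokens.
words = jieba.lcut(text)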
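One detail is worth knowing before counting: `jieba.cut` returns a generator, which can be iterated only once, while `jieba.lcut` returns a list. The sketch below illustrates the pitfall with a hypothetical stand-in generator (`lazy_tokens`, which simply splits on whitespace and is not part of jieba), so it runs without jieba installed.

```python
def lazy_tokens(text):
    """Hypothetical stand-in for a lazy tokenizer: yields tokens one at a time."""
    for token in text.split():
        yield token

words = lazy_tokens("我 喜欢 学习 中文")

first_pass = list(words)   # consumes the generator
second_pass = list(words)  # the generator is already exhausted

print(first_pass)   # ['我', '喜欢', '学习', '中文']
print(second_pass)  # []

# Materializing the tokens once makes them reusable across several passes.
reusable = list(lazy_tokens("我 喜欢 学习 中文"))
print(len(reusable), len(reusable))  # 4 4
```

With jieba itself, `jieba.lcut(text)` or `list(jieba.cut(text))` gives the same kind of reusable list.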

4. Count word frequencies:

python

# Create a dictionary to store the word frequencies
word_count = {}
for word in words:
    word_count[word] = word_count.get(word, 0) + 1
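The manual dictionary loop can also be written with `collections.Counter` from the standard library. The sketch below is one possible variant rather than the article's method: the helper name `count_words` and the `min_len` parameter are assumptions introduced here, and dropping whitespace and one-character tokens is an illustrative choice that also discards most punctuation. It accepts any iterable of tokens, such as the output of `jieba.cut`.

```python
from collections import Counter

def count_words(tokens, min_len=2):
    """Count tokens, skipping whitespace and tokens shorter than min_len."""
    counts = Counter()
    for token in tokens:
        token = token.strip()
        if len(token) >= min_len:
            counts[token] += 1
    return counts

# Sample tokens written by hand; in practice they would come from jieba.cut(text).
sample_tokens = ["我", "喜欢", "学习", "，", "也", "喜欢", "编程", "\n"]
word_count = count_words(sample_tokens)
print(word_count["喜欢"])  # 2
```

Setting `min_len=1` keeps single-character words, which matters for Chinese because many meaningful words are one character long.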

5. Print the results:

python

# Print each word with its frequency
for word, count in word_count.items():
    print(f'{word}: {count}')
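Iterating over `word_count.items()` prints words in insertion order, which is rarely what a frequency report needs. Here is a minimal sketch of printing the most frequent words first; the helper `top_words` and the sample counts are made up for illustration.

```python
def top_words(word_count, n=10):
    """Return the n (word, count) pairs with the highest counts."""
    # Sort by count descending, then by word so ties come out in a stable order.
    return sorted(word_count.items(), key=lambda item: (-item[1], item[0]))[:n]

# Made-up counts standing in for the dictionary built in step 4.
word_count = {"学习": 3, "中文": 5, "编程": 3, "喜欢": 1}

for word, count in top_words(word_count, n=3):
    print(f"{word}: {count}")
```

If the counts are held in a `collections.Counter`, its `most_common(n)` method gives a similar result without a custom sort.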

6. (Optional) Filter out stopwords:

python

# Define the stopword list (English examples; add Chinese stopwords for Chinese text)
stopwords = ['is', 'the', 'and', 'in', 'to', 'of', 'a', 'an', 'for', 'with', 'about', 'as', 'by', 'on', 'at', 'from', 'that', 'which', 'who', 'whom', 'whose', 'this', 'these', 'those', 'there', 'where', 'when', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', 'now']

# Remove stopwords. Segmenting again with jieba.lcut guarantees a fresh list,
# even if an earlier token generator was already consumed.
filtered_words = [word for word in jieba.lcut(text) if word not in stopwords]

# Count word frequencies again
word_count = {}
for word in filtered_words:
    word_count[word] = word_count.get(word, 0) + 1

# Print the filtered word frequencies
for word, count in word_count.items():
    print(f'{word}: {count}')
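The list above contains English stopwords only, so for Chinese text you would typically load a Chinese stopword list from a file instead. The sketch below assumes a UTF-8 file with one stopword per line; the file name `stopwords.txt`, the helper names, and the sample tokens are hypothetical. A set is used because membership tests on a set are faster than on a list.

```python
import os
import tempfile

def load_stopwords(path):
    """Read one stopword per line from a UTF-8 file into a set."""
    with open(path, "r", encoding="utf-8") as file:
        return {line.strip() for line in file if line.strip()}

def remove_stopwords(tokens, stopwords):
    """Keep tokens that are non-blank and not in the stopword set."""
    return [t for t in tokens if t.strip() and t not in stopwords]

# Write a tiny demo stopword file so the example runs on its own.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "stopwords.txt")
with open(demo_path, "w", encoding="utf-8") as file:
    file.write("的\n了\n是\n")

stopwords = load_stopwords(demo_path)
tokens = ["今天", "的", "天气", "是", "晴天", " "]
print(remove_stopwords(tokens, stopwords))  # ['今天', '天气', '晴天']
```

In real use, `tokens` would be the output of jieba's segmentation and `demo_path` would point to your own stopword file.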

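The steps above show how to do basic word-frequency counting with jieba. For more advanced features, such as drawing a word cloud, you can use the wordcloud library.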
