Python TF-IDF Keyword Extraction

激活谷笔记 • 2026-03-16 20:08

Keywords can be extracted in Python in several ways. Some commonly used methods are listed below:

1. Chinese word segmentation and keyword extraction with the jieba library:

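```python
import jieba
import jieba.analyse

text = "新闻，也叫消息，是指报纸、电台、电视台、互联网经常使用的记录社会、传播信息、反映时代的一种文体，具有真实性、时效性、简洁性、可读性、准确性的特点。"

# Segment the text into words (jieba.cut returns a generator)
fenci_text = jieba.cut(text)

# Load the stopword list: one word per line in stopwords.txt
with open('stopwords.txt', encoding='utf-8') as f:
    stopwords = set(line.strip() for line in f)

# Keep only non-empty words that are not stopwords
final = " ".join(word for word in fenci_text if word.strip() and word not in stopwords)
print(final)

# Rank keywords with jieba's built-in TF-IDF, using the same stopword file
jieba.analyse.set_stop_words('stopwords.txt')
print(jieba.analyse.extract_tags(text, topK=5))
```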
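The segmentation step performed by `jieba.cut` can be illustrated with a simplified, stdlib-only sketch of the dictionary-based idea behind jieba's default mode. It first lists every dictionary word that starts at each character, which forms a DAG. It then uses dynamic programming to pick the split with the highest total log word frequency. The real library also applies an HMM to words missing from its dictionary, and this sketch omits that step. The five-word dictionary and its frequencies are hypothetical.

```python
import math

# Hypothetical toy dictionary: word -> corpus frequency
freq = {"研究": 100, "研究生": 10, "生命": 50, "命": 5, "起源": 40}

def build_dag(sentence, freq):
    """For each start index, list the end indices of all dictionary words starting there."""
    max_len = max(len(word) for word in freq)
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, min(len(sentence), i + max_len)) if sentence[i:j + 1] in freq]
        # Always allow a single character so that every sentence can be segmented
        dag[i] = ends if i in ends else [i] + ends
    return dag

def segment(sentence, freq):
    """Choose the split that maximizes the sum of log word probabilities."""
    dag = build_dag(sentence, freq)
    log_total = math.log(sum(freq.values()))
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        # Characters missing from the dictionary get frequency 1
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words

print(segment("研究生命起源", freq))  # ['研究', '生命', '起源']
```

"研究生" (graduate student) is a dictionary word, but the higher-frequency path 研究 / 生命 / 起源 wins. This ambiguity is why a frequency-based search is used instead of greedy longest matching.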

2. Keyword extraction with the TF-IDF algorithm:

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "这是第一篇文章的内容",
    "这是第二篇文章的内容",
    "这是第三篇文章的内容"
]

# Chinese text has no spaces, so segment it with jieba instead of the default regex tokenizer
vectorizer = TfidfVectorizer(tokenizer=jieba.lcut, token_pattern=None)
tfidf_matrix = vectorizer.fit_transform(documents)
feature_names = vectorizer.get_feature_names_out()
tfidf_scores = tfidf_matrix.toarray()

for doc_index in range(len(documents)):
    print(f"Document {doc_index + 1} keywords:")
    for term_index, term in enumerate(feature_names):
        # Print only terms whose TF-IDF weight exceeds 0.1
        if tfidf_scores[doc_index, term_index] > 0.1:
            print(f" - {term} ({tfidf_scores[doc_index, term_index]:.2f})")
```
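The weights that `TfidfVectorizer` produces with its default settings (raw term counts, `smooth_idf=True`, `norm='l2'`) can be reproduced by hand. For n documents, the smoothed IDF is idf(t) = ln((1 + n) / (1 + df(t))) + 1. Each document vector is count × idf, scaled to unit length. The sketch below implements that formula with the standard library only. The hand-segmented token lists are an assumption and may not match jieba's actual output.

```python
import math
from collections import Counter

def tfidf(tokenized_docs):
    """Return one {term: weight} dict per document, mirroring TfidfVectorizer's defaults."""
    n = len(tokenized_docs)
    # Document frequency: the number of documents containing each term
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    results = []
    for doc in tokenized_docs:
        # Raw term counts times IDF, then L2-normalize the document vector
        weights = {term: count * idf[term] for term, count in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        results.append({term: w / norm for term, w in weights.items()})
    return results

# Hand-segmented versions of the three example documents (assumed segmentation)
docs = [
    ["这", "是", "第一", "篇", "文章", "的", "内容"],
    ["这", "是", "第二", "篇", "文章", "的", "内容"],
    ["这", "是", "第三", "篇", "文章", "的", "内容"],
]
weights = tfidf(docs)
# Terms unique to one document outrank terms shared by all three
for term, w in sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{term} ({w:.2f})")
```

Only "第一" is specific to the first document, so it receives the highest weight. Every shared term gets idf = 1 and ends up with the same lower weight.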

3. Keyword extraction with the TextRank algorithm:

```python
import jieba.analyse

text = "新闻，也叫消息，是指报纸、电台、电视台、互联网经常使用的记录社会、传播信息、反映时代的一种文体，具有真实性、时效性、简洁性、可读性、准确性的特点。"

# jieba's TextRank ranks words on a co-occurrence graph; withWeight=True returns (word, score) pairs
keywords = jieba.analyse.textrank(text, topK=5, withWeight=True)

for word, score in keywords:
    print(f"{word} has a score of {score:.2f}")
```
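TextRank works as follows. It links words that appear near each other into an undirected graph. It then repeatedly applies the PageRank update S(w) = (1 − d) + d · Σ S(v) / deg(v) over the neighbors v of w, with damping d = 0.85. The stdlib-only sketch below is an unweighted simplification that assumes a co-occurrence window of 2 (adjacent words only), and the token list is hypothetical. Library implementations may differ, for example by weighting edges by co-occurrence counts or by filtering words by part of speech.

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an unweighted word co-occurrence graph."""
    graph = defaultdict(set)
    # Link each word to the words that follow it within `window` positions
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        # S(w) = (1 - d) + d * sum(S(v) / deg(v)) over neighbors v of w
        scores = {
            word: (1 - damping) + damping * sum(scores[v] / len(graph[v]) for v in graph[word])
            for word in graph
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical pre-tokenized input (stopwords already removed)
tokens = ["keyword", "extraction", "keyword", "ranking", "keyword", "graph", "keyword", "score"]
for word, score in textrank(tokens):
    print(f"{word}: {score:.3f}")
```

"keyword" is connected to every other word, so it collects the most score and ranks first.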

4. Keyword extraction with the RAKE algorithm:

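```python
import nltk
from rake_nltk import Rake

# rake_nltk relies on NLTK's English stopwords and sentence tokenizer
# ('punkt_tab' is the tokenizer resource name in newer NLTK releases)
for resource in ("stopwords", "punkt", "punkt_tab"):
    nltk.download(resource)

# RAKE's default configuration targets English text
text = "Keyword extraction is the task of finding keyword phrases in a text."
r = Rake()
r.extract_keywords_from_text(text)
print(r.get_ranked_phrases())
```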
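RAKE proceeds in three steps. It splits text into candidate phrases at punctuation and stopwords. It scores each word as degree / frequency, where degree counts the words the word co-occurs with inside candidate phrases, including itself. Finally, it sums the word scores within each phrase. The sketch below shows that scoring with the standard library only. It uses a tiny hypothetical stopword set instead of the full NLTK English list that rake_nltk loads by default.

```python
import re
from collections import defaultdict

def rake(text, stopwords):
    """Score candidate phrases with RAKE's degree-to-frequency word scores."""
    phrases = []
    # Split into fragments at punctuation, then into phrases at stopwords
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree: the word itself plus the other words in the same phrase
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

stopwords = {"is", "the", "of", "in", "a"}  # hypothetical, tiny stopword list
text = "Keyword extraction is the task of finding keyword phrases in a text."
for phrase, score in rake(text, stopwords):
    print(f"{score:.1f}  {phrase}")
```

Longer phrases built from well-connected words score highest. This is why RAKE favors multi-word keyphrases over single words.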

5. Keyword extraction with KeyBERT:

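```python
from keybert import KeyBERT

# KeyBERT's default embedding model is English-only; use English input here
text = "Keyword extraction is the task of finding keyword phrases in a text."
model = KeyBERT()
# Returns a list of (keyword, cosine similarity) pairs
keywords = model.extract_keywords(text, top_n=5)
print(keywords)
```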
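KeyBERT embeds the document and each candidate word or phrase with the same model, then ranks the candidates by cosine similarity to the document. It can optionally diversify the results with its `use_mmr` or `use_maxsum` options, and that step is omitted here. The ranking step can be sketched with the standard library alone. The 3-dimensional vectors below are hypothetical stand-ins for real sentence-transformers embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_candidates(doc_vec, candidate_vecs, top_n=5):
    """Rank candidate keywords by cosine similarity to the document embedding."""
    scored = [(word, cosine(doc_vec, vec)) for word, vec in candidate_vecs.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical 3-d vectors standing in for real sentence-transformers embeddings
doc_vec = [0.9, 0.1, 0.3]
candidates = {
    "keyword": [0.8, 0.2, 0.3],
    "extraction": [0.7, 0.0, 0.5],
    "banana": [0.0, 1.0, 0.1],
}
print(rank_candidates(doc_vec, candidates, top_n=2))
```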

6. Keyword extraction from English text with NLTK:

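```python
import nltk

# Tokenizer and POS-tagger models (newer NLTK releases use the *_tab / *_eng names)
for resource in ("punkt", "punkt_tab", "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource)

text = "This is a sample sentence to extract keywords from."
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)

# Noun-phrase grammar: optional determiner, any adjectives, one or more nouns
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = chunker.parse(tagged)

matches = []
for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    # Leaves are (word, tag) pairs; keep only the words
    matches.append(" ".join(word for word, tag in subtree.leaves()))

print(matches)
```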
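Section 6 aims to collect noun phrases (chunks labeled `'NP'`). A common way to define them is the tag pattern `<DT>?<JJ>*<NN.*>+`: an optional determiner, any adjectives, then one or more nouns. To show how such a pattern works, the stdlib-only sketch below encodes each POS tag as one letter and runs an ordinary regular expression over the resulting string. The Penn Treebank tags are hand-assigned for illustration and are not actual tagger output.

```python
import re

def chunk_noun_phrases(tagged):
    """Return word spans matching the tag pattern <DT>?<JJ>*<NN.*>+."""
    def code(tag):
        # Encode each POS tag as one letter so a regex can match tag sequences
        if tag == "DT":
            return "D"
        if tag == "JJ":
            return "J"
        if tag.startswith("NN"):
            return "N"
        return "x"
    codes = "".join(code(tag) for _, tag in tagged)
    # Each regex match covers the tokens of one noun phrase
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in re.finditer(r"D?J*N+", codes)
    ]

# Hand-assigned Penn Treebank tags for the sample sentence (hypothetical, not tagger output)
tagged = [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("sample", "JJ"), ("sentence", "NN"),
          ("to", "TO"), ("extract", "VB"), ("keywords", "NNS"), ("from", "IN"), (".", ".")]
print(chunk_noun_phrases(tagged))  # ['a sample sentence', 'keywords']
```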

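These methods range from simple Chinese word segmentation to more elaborate ranking algorithms. jieba-based segmentation suits Chinese text. scikit-learn's TfidfVectorizer works for any language once the text is tokenized. RAKE, KeyBERT's default model, and the NLTK example are set up for English. Choose the method that fits your language and requirements.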
