
Text Summarization

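Text summarization produces a short summary of a large body of text that still conveys what the text is about. The example below uses the gensim library and its summarize function to do this. Note that the gensim.summarization module exists only in gensim releases before 4.0; it was removed in gensim 4.0. Install the following package to run the examples: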

pip install gensim_sum_ext
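Whether the examples run depends on the installed gensim version, because the summarization module is absent from gensim 4.x. The following is a minimal sketch for checking this before running the examples. The helper name has_gensim_summarization is hypothetical, and it only tests whether the module can be imported, not how it behaves:

```python
import importlib.util


def has_gensim_summarization() -> bool:
    """Return True if the legacy gensim.summarization module can be imported."""
    try:
        # For a dotted name, find_spec imports the parent package (gensim) first,
        # so a missing or broken gensim install surfaces as an ImportError.
        return importlib.util.find_spec("gensim.summarization") is not None
    except ImportError:
        return False


if __name__ == "__main__":
    if has_gensim_summarization():
        print("gensim.summarization is available.")
    else:
        print("gensim.summarization is missing; it requires a gensim release older than 4.0.")
```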

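The following passage describes the plot of a film. The summarize function builds the summary by selecting a few sentences from the text itself rather than writing new ones.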

from gensim.summarization import summarize
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
       "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"  + \
       "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "  + \
       "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "  + \
       "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
       " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "  + \
       "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"  + \
       "refused their advances; the men received minimal punishment from the presiding judge. " + \
       "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's" + \
       "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
       "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
       "to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
        "future service if necessary."

print(summarize(text))

Running the program above produces the following output:

He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding  day.
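gensim's summarize is extractive: it ranks the sentences with a variant of the TextRank graph-ranking algorithm and returns the top-ranked ones unchanged, which is why the output above is a sentence copied verbatim from the input. The sketch below illustrates the same extractive idea with a simpler scoring rule swapped in. It scores each sentence by the average document frequency of its words instead of using TextRank, splits sentences with a naive regex, and uses a small hand-picked stopword list. All of these are assumptions for illustration, not gensim's behavior, and the function names are hypothetical:

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption for this sketch, not gensim's list).
STOPWORDS = {
    "a", "an", "and", "are", "as", "because", "by", "for", "from", "he", "his",
    "in", "is", "of", "on", "she", "the", "their", "to", "was", "who", "whose",
    "with",
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def frequency_summary(text: str, n_sentences: int = 1) -> list[str]:
    """Extractive summary: return the n top-scoring sentences in original order."""
    sentences = split_sentences(text)
    # Word frequencies across the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        # Mean frequency of the sentence's words, so length alone does not win.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the top n, then restore text order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


if __name__ == "__main__":
    sample = (
        "The Don hears requests at the wedding. "
        "The Don agrees to help the undertaker. "
        "Guests dance."
    )
    print(frequency_summary(sample))  # ['The Don agrees to help the undertaker.']
```

Averaging rather than summing word frequencies keeps long sentences from winning simply by containing more words. Like gensim's summarize, the result is always a subset of the original sentences.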

Extracting Keywords

The keywords function in gensim can also extract keywords from a body of text, as shown below.

from gensim.summarization import keywords
text = "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones " + \
       "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"  + \
       "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "  + \
       "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "  + \
       "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding " + \
       " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "  + \
       "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"  + \
       "refused their advances; the men received minimal punishment from the presiding judge. " + \
       "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's" + \
       "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, " + \
       "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees " + \
       "to have his men punish the young men responsible (in a non-lethal manner) in return for " + \
        "future service if necessary."

print(keywords(text))

Running this program prints the following keywords:

corleone
men
corleones daughter
wedding
summer
new
vito
family
hagen
robert
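gensim's keywords function applies TextRank to words rather than sentences. The sketch below shows that core idea in plain Python: it links each candidate word to the next one, runs the PageRank update on the resulting graph, and returns the best-scoring words. It is a simplified illustration under stated assumptions (regex tokenization, a small hand-picked stopword list, a co-occurrence window of two words, a fixed 50 iterations), not gensim's implementation. In particular, it does not merge adjacent keywords into phrases such as "corleones daughter" from the output above. The function name textrank_keywords is hypothetical:

```python
import re
from collections import defaultdict

# Small hand-picked stopword list (an assumption for this sketch, not gensim's list).
STOPWORDS = {
    "and", "are", "because", "for", "from", "his", "the", "their", "was",
    "who", "whose", "with",
}


def textrank_keywords(text: str, top_n: int = 5, damping: float = 0.85,
                      iterations: int = 50) -> list[str]:
    """Rank words with a simplified TextRank: PageRank over a word co-occurrence graph."""
    # Candidate words: lowercase alphabetic tokens longer than 2 chars, minus stopwords.
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]

    # Undirected graph: connect each word to the next candidate word (window of 2).
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)

    # Repeat the PageRank update; every node in the graph has at least one neighbor.
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }

    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    demo = "corleone vito corleone connie corleone michael corleone"
    print(textrank_keywords(demo, top_n=1))  # ['corleone']
```

In the demo, "corleone" sits at the center of a star-shaped graph, so it collects score from every other word and ranks first.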
