Chunking English text - Python forum

Chunking English text

clay444

2018-01-02

Bounty: 100 forum coins (unsolved)

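import nltk

# Requires the NLTK tokenizer and POS tagger models (see nltk.download()).
text = "the little yellow dog barked at the cat."

# Split into sentences, tokenize each sentence, then POS-tag the tokens.
sens = nltk.sent_tokenize(text)
words = [nltk.word_tokenize(sentence) for sentence in sens]
tags = [nltk.pos_tag(tokens) for tokens in words]

# NP chunk grammar: optional determiner or possessive pronoun, any number of
# adjectives, then a noun; or a run of proper nouns.
# nltk.pos_tag emits Penn Treebank tags, where the possessive pronoun is PRP$ (not PP$).
grammar = r"""
  NP: {<DT|PRP\$>?<JJ>*<NN>}
      {<NNP>+}
"""
cp = nltk.RegexpParser(grammar)
result = cp.parse(tags[0])  # chunk the first (and only) sentence
print(result)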

The result is:
(S
  (NP the/DT little/JJ yellow/JJ dog/NN)
  barked/VBD
  at/IN
  (NP the/DT cat/NN)
  ./.)

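This result isn't useful to me as it stands. What I want is ['the little yellow dog', 'barked', 'at', 'the cat'], because I want to run a word-frequency analysis over the chunks. How can I convert the resulting Tree into that form? Or is there another way to do this?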
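One possible way to get that list, as a rough sketch rather than a definitive answer: the object returned by cp.parse() is an nltk.Tree, which can be iterated like a list. Each child is either a chunk subtree (here an NP) or a plain (word, tag) tuple, so the chunks can be joined into strings and the other tokens kept as they are. The helper name tree_to_chunks is made up for this example, and the tree is rebuilt by hand from the output shown above so the snippet does not need the tagger models.

```python
from collections import Counter

from nltk import Tree


def tree_to_chunks(tree):
    """Flatten a one-level chunk tree into a list of strings (hypothetical helper)."""
    chunks = []
    for node in tree:
        if isinstance(node, Tree):
            # A chunk subtree: join the words of its (word, tag) leaves.
            chunks.append(" ".join(word for word, tag in node.leaves()))
        else:
            # A token outside any chunk is a plain (word, tag) tuple.
            word, tag = node
            chunks.append(word)
    return chunks


# Hand-built copy of the parse printed above; in the original script
# this would simply be the value returned by cp.parse(tags[0]).
result = Tree("S", [
    Tree("NP", [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN")]),
    ("barked", "VBD"),
    ("at", "IN"),
    Tree("NP", [("the", "DT"), ("cat", "NN")]),
    (".", "."),
])

chunks = tree_to_chunks(result)
print(chunks)  # ['the little yellow dog', 'barked', 'at', 'the cat', '.']

# Chunk-level frequency counts.
freq = Counter(chunks)
print(freq.most_common(3))
```

The final '.' is kept because it is a token like any other; if punctuation should not be counted, it could be filtered out before building the Counter. For a text with several sentences, the same helper could be applied to each parsed sentence and the resulting lists fed into a single Counter.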

All replies

clay444

2018-1-3 09:52:20

Can nobody help me with this?
