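import nltk

# Needs the tokenizer and POS-tagger models (fetch them once via nltk.download()).
text = "the little yellow dog barked at the cat."
sens = nltk.sent_tokenize(text)                              # split into sentences
words = [nltk.word_tokenize(sentence) for sentence in sens]  # tokenize each sentence
tags = [nltk.pos_tag(tokens) for tokens in words]            # POS-tag each token list
# NP = optional determiner/possessive pronoun + any adjectives + a noun,
# or a run of proper nouns. PRP$ is the possessive tag nltk.pos_tag emits.
grammar = r"""
NP: {<DT|PRP\$>?<JJ>*<NN>}
    {<NNP>+}
"""
cp = nltk.RegexpParser(grammar)
result = cp.parse(tags[0])  # chunk the first (and only) sentence
print(result)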
The output is:
(S
  (NP the/DT little/JJ yellow/JJ dog/NN)
  barked/VBD
  at/IN
  (NP the/DT cat/NN)
  ./.)
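This result isn't useful to me. What I want is ['the little yellow dog', 'barked', 'at', 'the cat'], because I want to do a word-frequency analysis of the chunks. How can I convert the resulting Tree? Or is there another way to do this?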
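A minimal sketch of one way to do the conversion, assuming NLTK 3's `nltk.Tree` (a `list` subclass, so iterating it yields its children, which here are either chunk subtrees or plain `(word, tag)` tuples). The helper name `chunk_strings` is hypothetical, not an NLTK function. The tree is rebuilt by hand from the printed output so the sketch runs without the tokenizer and tagger models; with real data you would pass `cp.parse(tags[0])` instead.

```python
from collections import Counter

from nltk import Tree


def chunk_strings(tree):
    """Hypothetical helper: flatten a chunked Tree into one string per top-level child.

    Chunk subtrees (e.g. NP) become their words joined by spaces;
    tokens outside any chunk ((word, tag) tuples) become just the word.
    """
    pieces = []
    for node in tree:
        if isinstance(node, Tree):
            # A chunk: its leaves are the (word, tag) pairs it covers
            pieces.append(" ".join(word for word, _tag in node.leaves()))
        else:
            # A token that was not chunked: keep only the word
            word, _tag = node
            pieces.append(word)
    return pieces


# The tree printed above, rebuilt by hand (in practice: result = cp.parse(tags[0]))
result = Tree("S", [
    Tree("NP", [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN")]),
    ("barked", "VBD"),
    ("at", "IN"),
    Tree("NP", [("the", "DT"), ("cat", "NN")]),
    (".", "."),
])

chunks = chunk_strings(result)
print(chunks)  # ['the little yellow dog', 'barked', 'at', 'the cat', '.']

# Frequency of each chunk string
freq = Counter(chunks)
print(freq.most_common(3))
```

Punctuation such as `.` comes through as its own entry; skip tokens tagged `.` if it should not be counted. To count over a whole text, call `freq.update(chunk_strings(cp.parse(sent)))` for each tagged sentence in `tags`. If only the NP chunks matter, `result.subtrees(filter=lambda t: t.label() == "NP")` yields just those subtrees.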