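After building the DFM (document-feature matrix), the output comes out garbled. The odd part is that it is sometimes correct and sometimes garbled, and I can't figure out why. Could anyone point me in the right direction? Many thanks.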
-------------------------------------------------------------------------------------------------------------------------------------
data_stw <- readLines("stop.txt", encoding = "UTF-8")  # stop-word list
library(quanteda)
my_corpus <- corpus(reuters)
docvars(my_corpus, "language") <- "zh_CN"
metadoc(my_corpus, "language") <- "zh_CN"
myDfm <- dfm(my_corpus, what = "fastestword", remove = data_stw, stem = FALSE, remove_punct = FALSE)  # document-feature matrix
topfeatures(myDfm, 20)  # the 20 most frequent features
Warning message:
In strsplit(code, "\n", fixed = TRUE) :
input string 1 is invalid in this locale
钀ヤ笟鍘\x85 浜哄憳杩濊 鎶曡瘔 涓氬姟 鍛婄煡 鍔炵悊 鐢ㄦ埛 鍙楃悊 鐢佃垂
430 186 182 181 179 139 109 103 89
鏈\xaa 缂磋垂 涓嶆弧 鐢靛崱 闈炲父涓嶆弧 鎴峰彿 瀵规 鐢佃瘽 绛斿
88 84 84 80 64 59 57 56 53
鐢佃〃 杩濊勮屼负
48 46