2014-07-13

Mahout currently has two implementations of Bayesian classifiers.  One is the traditional Naive Bayes approach, and the other is called Complementary Naive Bayes.
Implementations
  • Naive Bayes (MAHOUT-9)
  • Complementary Naive Bayes (MAHOUT-60)
The Naive Bayes implementations in Mahout follow the paper http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf. Before we get to the actual algorithm, let's discuss the terminology.
Given an input set of classified documents with:
  • j = 0 to N features
  • k = 0 to L labels
Then:
  • The Normalized Frequency of a term (feature) in a document is its term frequency divided by the root mean square of the term frequencies in that document.
  • The Weight Normalized Tf of a given feature in a given label is the sum of the Normalized Frequencies of that feature across all documents in the label.
  • The Weight Normalized Tf-Idf of a given feature in a label is the Weight Normalized Tf multiplied by the standard idf.
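The three definitions above can be sketched in a few lines. This is a minimal illustration over a hypothetical toy corpus (label → documents as term-count dicts), not the actual Mahout implementation:

```python
import math

# Hypothetical toy corpus: label -> list of documents, each a term->count dict.
docs_by_label = {
    "sports": [{"ball": 2, "team": 1}, {"ball": 1, "score": 3}],
    "tech":   [{"code": 4, "ball": 1}],
}
all_docs = [d for docs in docs_by_label.values() for d in docs]

def normalized_frequency(doc, term):
    # Term frequency divided by the root mean square of the term
    # frequencies in this document.
    rms = math.sqrt(sum(f * f for f in doc.values()) / len(doc))
    return doc.get(term, 0) / rms

def weight_normalized_tf(label, term):
    # Sum of Normalized Frequencies of the feature across all
    # documents in the label.
    return sum(normalized_frequency(d, term) for d in docs_by_label[label])

def idf(term):
    # Standard idf: log(total documents / documents containing the term).
    df = sum(1 for d in all_docs if term in d)
    return math.log(len(all_docs) / df)

def wn_tfidf(label, term):
    # Weight Normalized Tf-Idf = Weight Normalized Tf * standard idf.
    return weight_normalized_tf(label, term) * idf(term)
```

For example, "code" appears in only one of the three toy documents, so its idf is log(3), and its W-N-Tf-Idf is nonzero only for the "tech" label.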
Once the Weight Normalized Tf-Idf (W-N-Tf-Idf) is calculated, the final weight matrices for Bayes and CBayes are calculated as follows.
First, we calculate the sum of W-N-Tf-Idf over all features in a label, called Sigma_k or sumLabelWeight.
For Bayes
Weight = Log [ ( W-N-Tf-Idf + alpha_i ) / ( Sigma_k + N  ) ]
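The Bayes weight formula translates directly; a sketch, where alpha_i is the Laplace smoothing parameter (the default of 1.0 is an assumption here, not taken from the source):

```python
import math

def bayes_weight(wn_tfidf, sigma_k, n_features, alpha_i=1.0):
    # Weight = log((W-N-Tf-Idf + alpha_i) / (Sigma_k + N)), where
    # Sigma_k is the sum of W-N-Tf-Idf over all features in the label
    # and N is the number of features.
    return math.log((wn_tfidf + alpha_i) / (sigma_k + n_features))
```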

For CBayes
We calculate the sum of W-N-Tf-Idf across all labels for a given feature; we call this sumFeatureWeight or Sigma_j.
We also sum the W-N-Tf-Idf weights over all (feature, label) pairs in the training set; call this Sigma_jSigma_k.
The final weight is calculated as
Weight = Log [ ( Sigma_j - W-N-Tf-Idf + alpha_i ) / ( Sigma_jSigma_k - Sigma_k + N  ) ]
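The CBayes weight is the same shape, but computed against the complement (all labels other than k); a sketch with the same assumed alpha_i smoothing default as above:

```python
import math

def cbayes_weight(wn_tfidf, sigma_j, sigma_k, sigma_jk, n_features, alpha_i=1.0):
    # Weight = log((Sigma_j - W-N-Tf-Idf + alpha_i)
    #              / (Sigma_jSigma_k - Sigma_k + N)), where Sigma_j is the
    # feature's weight summed across all labels and Sigma_jSigma_k is the
    # total weight over all (feature, label) pairs.
    return math.log((sigma_j - wn_tfidf + alpha_i)
                    / (sigma_jk - sigma_k + n_features))
```

Subtracting the label's own contributions (W-N-Tf-Idf from the numerator, Sigma_k from the denominator) is what makes this the "complementary" model: the weight reflects how strongly the feature is associated with every class except k.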

Examples
In Mahout's example code, there are two samples that can be used: