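Translated abstract:
To address the problem of spam email on the Internet, this paper presents a comparative study of spammer-behavior modeling based on Naïve Bayes and artificial neural networks (ANN). Keyword-based spam filtering techniques cannot model spammer behavior well, because spammers frequently change tactics to evade these filters. The evasive tactics that spammers use are themselves patterns that can be modeled. The study shows that Naïve Bayes and artificial neural networks are best suited to modeling spammers' common patterns. Experimental results show that both methods achieve a detection rate of around 92%, a considerable improvement over existing keyword-based filtering methods.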
---
English title:
Modeling Spammer Behavior: Naïve Bayes vs. Artificial Neural Networks
---
Authors:
Md. Saiful Islam, Shah Mostafa Khaled, Khalid Farhan, Md. Abdur Rahman, and Joy Rahman
---
Year of latest submission:
2010
---
Categories:
Top-level category: Computer Science
Subcategory: Information Retrieval
Category description: Covers indexing, dictionaries, retrieval, content and analysis. Roughly includes material in ACM Subject Classes H.3.0, H.3.1, H.3.2, H.3.3, and H.3.4.
--
Top-level category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
---
English abstract:
Addressing the problem of spam email on the Internet, this paper presents a comparative study of Naïve Bayes and Artificial Neural Network (ANN) based modeling of spammer behavior. Keyword-based spam filtering techniques fall short of modeling spammer behavior because the spammer constantly changes tactics to circumvent these filters. The evasive tactics that the spammer uses are themselves patterns that can be modeled to combat spam. It has been observed that both Naïve Bayes and ANN are best suited for modeling common spammer patterns. Experimental results demonstrate that both achieve a promising detection rate of around 92%, a considerable improvement in performance over contemporary keyword-based filtering approaches.
---
PDF link:
https://arxiv.org/pdf/1008.3282