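Abstract (translated from the Chinese):
In many applications it is useful to know which patterns occur at a balanced interval, for example a set of phone numbers that is called almost every Friday, or a group of products that sells heavily on Tuesdays and Thursdays. In previous work we proposed a new measure of support (the number of occurrences of a pattern in a dataset), in which we count how often a pattern occurs (nearly) in the middle between two of its other occurrences. If the number of non-occurrences between two occurrences of a pattern stays almost the same, we call the pattern balanced. We observed that some very frequent patterns trivially occur at a balanced interval because they appear in every transaction. More interesting patterns may also exist, however, such as one that occurs once every three transactions. Here we discuss a solution that uses the standard deviation and the average. We also propose a simpler approach to pruning patterns with a balanced interval, which makes estimating the pruning threshold more intuitive.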
---
English title:
Mining Patterns with a Balanced Interval
---
Authors:
Edgar de Graaf, Joost Kok, Walter Kosters
---
Year of latest submission:
2007
---
Classification:
Primary category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary category: Computer Science
Subcategory: Databases
Category description: Covers database management, data mining, and data processing. Roughly includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1.
--
---
English abstract:
In many applications it will be useful to know those patterns that occur with a balanced interval, e.g., a certain combination of phone numbers is called almost every Friday or a group of products is sold a lot on Tuesday and Thursday. In previous work we proposed a new measure of support (the number of occurrences of a pattern in a dataset), where we count the number of times a pattern occurs (nearly) in the middle between two other occurrences. If the number of non-occurrences between two occurrences of a pattern stays almost the same, then we call the pattern balanced. It was noticed that some very frequent patterns obviously also occur with a balanced interval, meaning in every transaction. However, more interesting patterns might occur, e.g., every three transactions. Here we discuss a solution using standard deviation and average. Furthermore, we propose a simpler approach for pruning patterns with a balanced interval, making estimating the pruning threshold more intuitive.
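To make the idea of a balanced interval concrete, the following is a minimal Python sketch, not the authors' method: the abstract does not give the exact measure, so the function names `gaps_between_occurrences` and `interval_stats`, the use of the population standard deviation, and the toy transactions are all assumptions made for illustration. It counts the non-occurrences between consecutive occurrences of a pattern and reports their average and standard deviation; a standard deviation near zero suggests a balanced interval, and the average distinguishes a pattern present in every transaction (average 0) from one that recurs, say, every three transactions (average 2).

```python
from statistics import mean, pstdev


def gaps_between_occurrences(transactions, pattern):
    """Count the non-occurrences between consecutive occurrences of `pattern`.

    `transactions` is an ordered list of item sets; `pattern` occurs in a
    transaction when all of its items are present (an assumed definition).
    """
    pattern = set(pattern)
    positions = [i for i, t in enumerate(transactions) if pattern <= set(t)]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def interval_stats(transactions, pattern):
    """Return (average gap, population standard deviation), or None
    when the pattern occurs fewer than two times."""
    gaps = gaps_between_occurrences(transactions, pattern)
    if not gaps:
        return None
    return mean(gaps), pstdev(gaps)


# Hypothetical toy data: {"a", "b"} occurs in every third transaction,
# while {"c"} occurs at irregular positions.
transactions = [
    {"a", "b", "c"}, {"c"}, {"d"},
    {"a", "b"}, {"d"}, {"c"},
    {"a", "b", "c"}, {"d"}, {"d"},
    {"a", "b"},
]

print(interval_stats(transactions, {"a", "b"}))
print(interval_stats(transactions, {"c"}))
```

A pruning rule could then keep only patterns whose standard deviation falls below a chosen threshold and whose average gap is above zero; the threshold itself is a free parameter in this sketch.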
---
PDF link:
https://arxiv.org/pdf/0705.1110