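Abstract (translated):
Many databases store data in relational format, with different types of entities and information about the links between them. The field of statistical-relational learning (SRL) has developed many new statistical models for such data. In this paper we focus on learning class-level, or first-order, dependencies, which model general database statistics over the attributes of linked objects and of the links themselves (for example, the percentage of A grades given in computer science classes). Class-level statistical relationships are important in their own right, and they support applications such as policy making, strategic planning, and query optimization. Most current SRL methods can find class-level dependencies, but their main task is to support instance-level predictions about the attributes or links of specific entities. We focus only on class-level prediction and describe algorithms for learning class-level models that are orders of magnitude faster for this task. Our algorithms learn Bayes nets with relational structure, taking full advantage of the efficiency of single-table, nonrelational Bayes net learners. An evaluation of our methods on three data sets shows that they are computationally feasible for realistic table sizes and that the learned structures represent the statistical information in the databases well. Once learning has compiled the database statistics into a Bayes net, querying those statistics through Bayes net inference is faster than querying them with SQL, and its cost does not depend on the size of the database.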
---
English title:
Learning Class-Level Bayes Nets for Relational Data
---
Authors:
Oliver Schulte, Hassan Khosravi, Flavia Moser, Martin Ester
---
Year of latest submission:
2009
---
Classification:
Primary category: Computer Science
Subcategory: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
---
Abstract (original English):
Many databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning (SRL) has developed a number of new statistical models for such data. In this paper we focus on learning class-level or first-order dependencies, which model the general database statistics over attributes of linked objects and links (e.g., the percentage of A grades given in computer science classes). Class-level statistical relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. Most current SRL methods find class-level dependencies, but their main task is to support instance-level predictions about the attributes or links of specific entities. We focus only on class-level prediction, and describe algorithms for learning class-level models that are orders of magnitude faster for this task. Our algorithms learn Bayes nets with relational structure, leveraging the efficiency of single-table nonrelational Bayes net learners. An evaluation of our methods on three data sets shows that they are computationally feasible for realistic table sizes, and that the learned structures represent the statistical information in the databases well. After learning compiles the database statistics into a Bayes net, querying these statistics via Bayes net inference is faster than with SQL queries, and does not depend on the size of the database.
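To make the idea of a class-level statistic concrete, here is a minimal Python sketch of the abstract's own example, the share of A grades in computer science classes. It is not the paper's learning algorithm: the toy `registrations` rows, the function names, and the fixed two-node structure (department pointing to grade) are all assumptions made for illustration. The sketch computes the statistic once by scanning the rows, as an SQL aggregate would, and once by looking it up in a conditional probability table estimated from the same rows.

```python
from collections import Counter

# Hypothetical toy data: one row per course registration.
registrations = [
    {"dept": "CS", "grade": "A"},
    {"dept": "CS", "grade": "B"},
    {"dept": "CS", "grade": "A"},
    {"dept": "CS", "grade": "C"},
    {"dept": "Math", "grade": "A"},
    {"dept": "Math", "grade": "C"},
]


def scan_frequency(rows, dept, grade):
    """Class-level statistic by scanning every row (cost grows with the data)."""
    in_dept = [r for r in rows if r["dept"] == dept]
    if not in_dept:
        raise ValueError(f"no registrations for department {dept!r}")
    return sum(r["grade"] == grade for r in in_dept) / len(in_dept)


def fit_cpt(rows):
    """Estimate P(grade | dept) by maximum likelihood.

    The structure dept -> grade is assumed here, not learned.
    """
    pair_counts = Counter((r["dept"], r["grade"]) for r in rows)
    dept_counts = Counter(r["dept"] for r in rows)
    return {(d, g): n / dept_counts[d] for (d, g), n in pair_counts.items()}


# Compile the statistics once; later queries only touch the small table.
cpt = fit_cpt(registrations)

by_scan = scan_frequency(registrations, "CS", "A")
by_lookup = cpt[("CS", "A")]
print(by_scan, by_lookup)
```

In this toy case the two answers coincide because the table stores exact conditional frequencies. The lookup touches only the compiled table, whose size depends on the number of attribute values rather than the number of rows, which is the intuition behind the abstract's claim that query cost does not depend on database size.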
---
PDF link:
https://arxiv.org/pdf/0811.4458