English title:
《Causality Networks》
---
Author:
Ishanu Chattopadhyay
---
Latest submission year:
2014
---
English abstract:
While correlation measures are used to discern statistical relationships between observed variables in almost all branches of data-driven scientific inquiry, what we are really interested in is the existence of causal dependence. Designing an efficient causality test that can be carried out without restrictive presuppositions on the underlying dynamical structure of the data at hand is non-trivial. Nevertheless, the ability to computationally infer statistical prima facie evidence of causal dependence may yield a far more discriminative tool for data analysis than the calculation of simple correlations. In the present work, we present a new non-parametric test of Granger causality for quantized or symbolic data streams generated by ergodic stationary sources. In contrast to state-of-the-art binary tests, our approach formalizes and computes the degree of causal dependence between data streams, without making any restrictive assumptions, linearity or otherwise. Additionally, without any a priori imposition of specific dynamical structure, we infer explicit generative models of causal cross-dependence, which may then be used for prediction. These explicit models are represented as generalized probabilistic automata, referred to as crossed automata, and are shown to be sufficient to capture a fairly general class of causal dependence. The proposed algorithms are computationally efficient in the PAC sense; $i.e.$, we find good models of cross-dependence with high probability, with polynomial run-times and sample complexities. The theoretical results are applied to weekly search-frequency data from the Google Trends API for a chosen set of socially "charged" keywords. The causality network inferred from this dataset reveals, quite expectedly, the causal importance of certain keywords. It is also illustrated that correlation analysis fails to gather such insight.
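The abstract emphasizes computing a *degree* of causal dependence between symbolic streams rather than a yes/no verdict. The sketch below illustrates that idea with a plainly swapped-in technique, not the paper's crossed-automaton method: a normalized, transfer-entropy-style plug-in ratio that asks how much of the uncertainty in stream `y`'s next symbol, given `y`'s own last `k` symbols, disappears when `x`'s last `k` symbols are also known. The function names `conditional_entropy`, `dependence_degree` and `pearson`, the lag parameter `k`, and the assumption that both streams are already quantized and time-aligned are all hypothetical. Plug-in entropy estimates are biased on short series, and this sketch carries none of the PAC guarantees claimed in the abstract.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple

# HYPOTHETICAL illustration: a normalized, transfer-entropy-style plug-in
# estimate of directed dependence between two aligned symbolic streams.
# This is NOT the crossed-automaton algorithm from the paper.


def conditional_entropy(samples: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Empirical H(target | context) in bits from (context, target) samples."""
    joint = Counter(samples)
    context_counts = Counter(c for c, _ in samples)
    n = len(samples)
    h = 0.0
    for (c, _), count in joint.items():
        h -= (count / n) * log2(count / context_counts[c])
    return h


def dependence_degree(x: Sequence[Hashable], y: Sequence[Hashable], k: int = 1) -> float:
    """Fraction of the uncertainty in y[t], given y's own last k symbols,
    that is removed by also conditioning on x's last k symbols.

    0.0 means x's past adds nothing; 1.0 means y[t] becomes fully determined.
    """
    if len(x) != len(y):
        raise ValueError("streams must be aligned and of equal length")
    own: List[Tuple[Hashable, Hashable]] = []
    both: List[Tuple[Hashable, Hashable]] = []
    for t in range(k, len(y)):
        y_past = tuple(y[t - k:t])
        x_past = tuple(x[t - k:t])
        own.append((y_past, y[t]))
        both.append(((y_past, x_past), y[t]))
    h_own = conditional_entropy(own)
    if h_own == 0.0:
        return 0.0  # y is already predictable from its own past
    return 1.0 - conditional_entropy(both) / h_own


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Lag-0 Pearson correlation, for comparison with dependence_degree."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(5000)]  # i.i.d. random bits
    y = [0] + x[:-1]                               # y copies x with a one-step delay
    print("x -> y:", dependence_degree(x, y))      # 1.0: x's past fixes y[t]
    print("y -> x:", dependence_degree(y, x))      # close to 0
    print("lag-0 correlation:", pearson(x, y))     # close to 0
```

In this synthetic example `y` is `x` delayed by one step, so `x`'s past determines `y` exactly and the x-to-y score is 1.0, while the reverse score and the lag-0 correlation both stay near zero. On toy data only, this mirrors the abstract's point that a directed dependence measure can reveal structure that a simple correlation misses.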
---
Subject classifications:
Top-level category: Computer Science
Subcategory: Machine Learning
分类描述:Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Top-level category: Computer Science
Subcategory: Information Theory
分类描述:Covers theoretical and experimental aspects of information theory and coding. Includes material in ACM Subject Class E.4 and intersects with H.1.1.
--
Top-level category: Mathematics
Subcategory: Information Theory
分类描述:math.IT is an alias for cs.IT. Covers theoretical and experimental aspects of information theory and coding.
--
Top-level category: Quantitative Finance
Subcategory: Statistical Finance
分类描述:Statistical, econometric and econophysics analyses with applications to financial markets and economic data
--
Top-level category: Statistics
Subcategory: Machine Learning
分类描述:Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--
---