for $i = 1,2, \ldots, \tilde{n}$ and $r=0,1,\ldots,m-1$, so that the first model is a Vector Logistic Smooth Transition AutoRegressive (VLSTAR) model. The ML estimator of $\theta$ is obtained by solving the optimization problem
$$
\hat{\theta}_{ML} = \arg\max_{\theta} \log L(\theta)
$$
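For concreteness, a minimal numerical sketch of this kind of ML estimation is given below; it uses a simple i.i.d. Gaussian log-likelihood as a hypothetical stand-in, since the VLSTAR log-likelihood itself is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in log-likelihood: i.i.d. Gaussian observations with
# unknown mean mu and log-standard-deviation log_sigma (the actual VLSTAR
# log-likelihood would replace this function).
def log_likelihood(theta, y):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)          # reparametrize to keep sigma > 0
    return np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                  - 0.5 * ((y - mu) / sigma) ** 2)

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=2.0, size=500)

# theta_ML = argmax_theta log L(theta)  <=>  minimize the negative log-likelihood
res = minimize(lambda th: -log_likelihood(th, y), x0=np.zeros(2), method="BFGS")
print("ML estimate:", res.x[0], np.exp(res.x[1]))
```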
**Newton method**
The Newton method obtains the iterates based on the gradient $\nabla f$ and the Hessian ${\sf H}$ of the objective function $f(\mathbf{x})$ as follows:
$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - {\sf H}^{-1}(\mathbf{x}^{(k)})\nabla f(\mathbf{x}^{(k)})$$
* For the function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} -
\mathbf{b}^T\log(\mathbf{x})$, the gradient and Hessian are given by
$$\begin{array}{ll}
\nabla f(\mathbf{x}) &= \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x}\\
{\sf H}(\mathbf{x}) &= \boldsymbol{\Sigma} + {\sf Diag}(\mathbf{b}/\mathbf{x}^2).
\end{array}$$
* For the function $f(\mathbf{x}) = \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} -
\mathbf{b}^T\log(\mathbf{x})$, the gradient and Hessian are given by
$$\begin{array}{ll}
\nabla f(\mathbf{x}) &= \boldsymbol{\Sigma}\mathbf{x}/\sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}/\mathbf{x}\\
{\sf H}(\mathbf{x}) &= \left(\boldsymbol{\Sigma} - \boldsymbol{\Sigma}\mathbf{x}\mathbf{x}^T\boldsymbol{\Sigma}/\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}\right) / \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} + {\sf Diag}(\mathbf{b}/\mathbf{x}^2).
\end{array}$$
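As a minimal sketch (not taken from the text), the Newton iteration for the first objective, $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})$, could be coded as follows; the backtracking step is an added safeguard to keep the iterates strictly positive, and the toy covariance matrix is assumed for illustration only.

```python
import numpy as np

def newton_min(Sigma, b, x0, tol=1e-8, max_iter=100):
    """Damped Newton iterations for f(x) = 0.5 x'Sigma x - b' log(x).

    Uses the gradient Sigma x - b/x and Hessian Sigma + Diag(b/x^2)
    given above; the backtracking step is an added safeguard (not part
    of the text) that keeps the iterates strictly positive.
    """
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        grad = Sigma @ x - b / x
        if np.linalg.norm(grad) < tol:
            break
        H = Sigma + np.diag(b / x**2)
        dx = np.linalg.solve(H, grad)      # Newton direction H^{-1} grad
        t = 1.0
        while np.any(x - t * dx <= 0):     # backtrack until x stays positive
            t *= 0.5
        x = x - t * dx
    return x

# Toy usage with an assumed 3x3 covariance matrix and vector b
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
b = np.ones(3) / 3
print(newton_min(Sigma, b, x0=np.ones(3)))
```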
**Cyclical coordinate descent algorithm**
This method simply minimizes in a cyclical manner with respect to each element
of the variable $\mathbf{x}$ (denote $\mathbf{x}_{-i}=[x_1,\ldots,x_{i-1},0,x_{i+1},\ldots,x_N]^T$),
while holding the other elements fixed.
* For the function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} -
\mathbf{b}^T\log(\mathbf{x})$, the minimization w.r.t. $x_i$ is
$$\underset{x_i>0}{\textsf{minimize}} \quad \frac{1}{2}x_i^2\boldsymbol{\Sigma}_{ii} + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i\log{x_i}$$
with gradient $\nabla_i f = x_i\boldsymbol{\Sigma}_{ii} + (\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i/x_i$.
Setting the gradient to zero gives us the second order equation
$$x_i^2\boldsymbol{\Sigma}_{ii} + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i}) - b_i = 0$$
with positive solution given by
$$x_i^\star = \frac{-(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})+\sqrt{(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})^2+
4\boldsymbol{\Sigma}_{ii} b_i}}{2\boldsymbol{\Sigma}_{ii}}.$$
* The derivation for the function
$f(\mathbf{x}) = \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x})$
follows similarly. The update for $x_i$ is given by
$$x_i^\star = \frac{-(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})+\sqrt{(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{\cdot,i})^2+
4\boldsymbol{\Sigma}_{ii} b_i \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}}}}{2\boldsymbol{\Sigma}_{ii}}.$$
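A minimal sketch of the cyclical coordinate descent update for the first objective is given below (the second objective only rescales the $b_i$ term by $\sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}}$, as derived above); the toy data are assumed for illustration only.

```python
import numpy as np

def coordinate_descent(Sigma, b, x0, n_cycles=200, tol=1e-10):
    """Cyclical coordinate descent for f(x) = 0.5 x'Sigma x - b' log(x).

    Each coordinate x_i is set to the positive root of
    Sigma_ii x_i^2 + (x_{-i}' Sigma_{.,i}) x_i - b_i = 0,
    holding the other coordinates fixed.
    """
    x = x0.astype(float).copy()
    N = len(x)
    for _ in range(n_cycles):
        x_old = x.copy()
        for i in range(N):
            # x_{-i}' Sigma_{.,i}: drop the contribution of x_i itself
            cross = Sigma[:, i] @ x - Sigma[i, i] * x[i]
            x[i] = (-cross + np.sqrt(cross**2 + 4 * Sigma[i, i] * b[i])) \
                   / (2 * Sigma[i, i])
        if np.linalg.norm(x - x_old) < tol:
            break
    return x

# Same assumed toy data as in the Newton sketch
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
b = np.ones(3) / 3
print(coordinate_descent(Sigma, b, x0=np.ones(3)))
```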
The basis for Bayesian inference is derived from Bayes' theorem. Here is Bayes' theorem, equation \ref{bayestheorem}, again
$$\Pr(A | B) = \frac{\Pr(B | A)\Pr(A)}{\Pr(B)}$$
Replacing $B$ with observations $\textbf{y}$, $A$ with parameter set $\Theta$, and probabilities $\Pr$ with densities $p$ (or sometimes $\pi$ or function $f$), results in the following
$$p(\Theta | \textbf{y}) = \frac{p(\textbf{y} | \Theta)p(\Theta)}{p(\textbf{y})}$$
where $p(\textbf{y})$ will be discussed below, $p(\Theta)$ is the set of prior distributions of parameter set $\Theta$ before $\textbf{y}$ is observed, $p(\textbf{y} | \Theta)$ is the likelihood of $\textbf{y}$ under a model, and $p(\Theta | \textbf{y})$ is the joint posterior distribution, sometimes called the full posterior distribution, of parameter set $\Theta$, which expresses uncertainty about parameter set $\Theta$ after taking both the prior and data into account. Since there are usually multiple parameters, $\Theta$ represents a set of $j$ parameters, and may be considered hereafter in this article as
$$\Theta = \{\theta_1,\ldots,\theta_j\}$$
The denominator
$$p(\textbf{y}) = \int p(\textbf{y} | \Theta)p(\Theta) \, d\Theta$$
defines the ``marginal likelihood'' of $\textbf{y}$, or the ``prior predictive distribution'' of $\textbf{y}$, and may be set to an unknown constant $\textbf{c}$. The prior predictive distribution\footnote{The predictive distribution was introduced by \citet{jeffreys61}.} indicates what $\textbf{y}$ should look like, given the model, before $\textbf{y}$ has been observed. Only the set of prior probabilities and the model's likelihood function are used for the marginal likelihood of $\textbf{y}$. The presence of the marginal likelihood of $\textbf{y}$ normalizes the joint posterior distribution, $p(\Theta | \textbf{y})$, ensuring it is a proper distribution that integrates to one.
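As a small numerical illustration of this normalization (a hypothetical beta-binomial setup, not taken from the text), the marginal likelihood $p(\textbf{y})$ can be approximated on a grid and used to normalize the posterior:

```python
import numpy as np
from scipy import stats

# Hypothetical setting: y = 7 successes in n = 10 Bernoulli trials,
# with a Beta(2, 2) prior on the success probability theta.
y, n = 7, 10
theta = np.linspace(1e-6, 1 - 1e-6, 10001)      # grid over the parameter space
d_theta = theta[1] - theta[0]

prior = stats.beta.pdf(theta, 2, 2)             # p(theta)
likelihood = stats.binom.pmf(y, n, theta)       # p(y | theta)

# Marginal likelihood p(y) = integral of p(y | theta) p(theta) d theta,
# approximated here by a Riemann sum; it normalizes the posterior.
p_y = np.sum(likelihood * prior) * d_theta
posterior = likelihood * prior / p_y            # p(theta | y)

print("p(y):", p_y)
print("posterior integrates to:", np.sum(posterior) * d_theta)
```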