Forum › Data Science & AI › Data Analysis & Data Science › MATLAB and Mathematical Software Board
2020-12-15 17:46:45
The basis for Bayesian inference is derived from Bayes' theorem. Here is Bayes' theorem, equation \ref{bayestheorem}, again

$$\Pr(A | B) = \frac{\Pr(B | A)\Pr(A)}{\Pr(B)}$$

Replacing $B$ with observations $\textbf{y}$, $A$ with parameter set $\Theta$, and probabilities $\Pr$ with densities $p$ (or sometimes $\pi$ or function $f$), results in the following

$$
p(\Theta | \textbf{y}) = \frac{p(\textbf{y} | \Theta)p(\Theta)}{p(\textbf{y})}$$

where $p(\textbf{y})$ will be discussed below, $p(\Theta)$ is the set of prior distributions of parameter set $\Theta$ before $\textbf{y}$ is observed, $p(\textbf{y} | \Theta)$ is the likelihood of $\textbf{y}$ under a model, and $p(\Theta | \textbf{y})$ is the joint posterior distribution, sometimes called the full posterior distribution, of parameter set $\Theta$, which expresses uncertainty about $\Theta$ after taking both the prior and data into account. Since there are usually multiple parameters, $\Theta$ represents a set of $j$ parameters, and may be considered hereafter in this article as

$$\Theta = \theta_1,...,\theta_j$$

The denominator

$$p(\textbf{y}) = \int p(\textbf{y} | \Theta)p(\Theta) d\Theta$$

defines the ``marginal likelihood'' of $\textbf{y}$, or the ``prior predictive distribution'' of $\textbf{y}$, and may be set to an unknown constant $\textbf{c}$. The prior predictive distribution\footnote{The predictive distribution was introduced by \citet{jeffreys61}.} indicates what $\textbf{y}$ should look like, given the model, before $\textbf{y}$ has been observed. Only the set of prior probabilities and the model's likelihood function are used for the marginal likelihood of $\textbf{y}$. The presence of the marginal likelihood of $\textbf{y}$ normalizes the joint posterior distribution, $p(\Theta | \textbf{y})$, ensuring it is a proper distribution and integrates to one.
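To make the integral concrete, here is a small numerical sketch (my own illustration, not from the original text): a beta-binomial model, chosen because its marginal likelihood is also known in closed form, so the quadrature result can be checked.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import comb, beta as beta_fn

# y: k = 7 successes in n = 10 Bernoulli trials; prior theta ~ Beta(2, 2)
n, k = 10, 7
a, b = 2.0, 2.0

# p(y) = integral of p(y | theta) p(theta) d(theta)
integrand = lambda t: stats.binom.pmf(k, n, t) * stats.beta.pdf(t, a, b)
p_y, _ = integrate.quad(integrand, 0.0, 1.0)

# Closed-form beta-binomial marginal likelihood for comparison
p_y_exact = comb(n, k) * beta_fn(a + k, b + n - k) / beta_fn(a, b)

print(p_y, p_y_exact)  # the two agree
```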

2020-12-15 17:47:21
By replacing $p(\textbf{y})$ with the constant of proportionality $\textbf{c}$, the model-based formulation of Bayes' theorem becomes

$$p(\Theta | \textbf{y}) = \frac{p(\textbf{y} | \Theta)p(\Theta)}{\textbf{c}}$$

By removing $\textbf{c}$ from the equation, the relationship changes from `equals' ($=$) to `proportional to' ($\propto$)\footnote{For those unfamiliar with $\propto$, this symbol simply means that two quantities are proportional if they vary in such a way that one is a constant multiplier of the other. This is due to the constant of proportionality $\textbf{c}$ in the equation. Here, this can be treated as `equal to'.}

$$
p(\Theta | \textbf{y}) \propto p(\textbf{y} | \Theta)p(\Theta)
$$
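A minimal grid sketch of this proportionality (my own illustration, with made-up data and an assumed normal model): evaluate likelihood $\times$ prior on a grid, and the normalizing sum plays the role of $\textbf{c}$.

```python
import numpy as np
from scipy import stats

# Grid sketch of p(theta | y) ∝ p(y | theta) p(theta) for a normal mean
y = np.array([1.2, 0.4, 0.9, 1.7, 1.1])          # illustrative data
theta = np.linspace(-3.0, 5.0, 2001)             # grid over the unknown mean
dtheta = theta[1] - theta[0]
prior = stats.norm.pdf(theta, 0.0, 2.0)          # p(theta): N(0, 2^2)
like = np.prod(stats.norm.pdf(y[:, None], theta, 1.0), axis=0)  # p(y | theta)

unnorm = like * prior            # proportional to the posterior
c = unnorm.sum() * dtheta        # the constant of proportionality
posterior = unnorm / c           # now integrates to one on the grid
print(posterior.sum() * dtheta)  # ≈ 1.0
```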

2020-12-15 17:47:45
This form can be stated as the unnormalized joint posterior being proportional to the likelihood times the prior. However, the goal in model-based Bayesian inference is usually not to summarize the unnormalized joint posterior distribution, but to summarize the marginal distributions of the parameters. The full parameter set $\Theta$ can typically be partitioned into

$$\Theta = \{\Phi, \Lambda\}$$

where $\Phi$ is the sub-vector of interest, and $\Lambda$ is the complementary sub-vector of $\Theta$, often referred to as a vector of nuisance parameters. In a Bayesian framework, the presence of nuisance parameters does not pose any formal, theoretical problems. A nuisance parameter is a parameter that exists in the joint posterior distribution of a model, though it is not a parameter of interest. The marginal posterior distribution of $\Phi$, the sub-vector of interest, can simply be written as

$$p(\Phi | \textbf{y}) = \int p(\Phi, \Lambda | \textbf{y}) d\Lambda$$

In model-based Bayesian inference, Bayes' theorem is used to estimate the unnormalized joint posterior distribution, and finally the user can assess and make inferences from the marginal posterior distributions.
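The integration over the nuisance parameter can be sketched numerically (my own illustration, with made-up data): a normal model where the mean is of interest and the standard deviation is the nuisance parameter, on a two-dimensional grid.

```python
import numpy as np
from scipy import stats

# Joint posterior of (mu, sigma) on a grid; sigma is the nuisance parameter
y = np.array([4.8, 5.1, 5.6, 4.9, 5.3])          # illustrative data
mu = np.linspace(3.0, 7.0, 400)
sigma = np.linspace(0.05, 3.0, 400)
M, S = np.meshgrid(mu, sigma, indexing="ij")

# Unnormalized joint posterior: likelihood x weak prior on sigma
# (the prior on mu is flat over the grid; all choices are illustrative)
log_like = np.sum(stats.norm.logpdf(y[:, None, None], M, S), axis=0)
joint = np.exp(log_like) * stats.halfnorm.pdf(S, scale=10)
joint /= joint.sum() * (mu[1] - mu[0]) * (sigma[1] - sigma[0])

# Marginal p(mu | y): integrate the nuisance parameter out
marg_mu = joint.sum(axis=1) * (sigma[1] - sigma[0])
print(mu[np.argmax(marg_mu)])   # posterior mode of mu, near the sample mean
```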

2020-12-15 17:48:51
The flat prior was historically the first attempt at an uninformative prior. The unbounded, uniform distribution, often called a flat prior, is

$$\theta \sim \mathcal{U}(-\infty, \infty)$$

where $\theta$ is uniformly-distributed from negative infinity to positive infinity. Although this seems to allow the posterior distribution to be affected solely by the data with no impact from prior information, this should generally be avoided because this probability distribution is improper, meaning it will not integrate to one, since the integral of the assumed $p(\theta)$ is infinity (which violates the assumption that the probabilities sum to one). This may cause the posterior to be improper, which invalidates the model.

2020-12-15 17:50:10
It is important for the prior distribution to be proper. A prior distribution, $p(\theta)$, is improper\footnote{Improper priors were introduced in \citet{jeffreys61}.} when

$$\int p(\theta) d\theta = \infty$$

As noted previously, an unbounded uniform prior distribution is an improper prior distribution because $p(\theta) \propto 1$, for $-\infty < \theta < \infty$. An improper prior distribution can cause an improper posterior distribution. When the posterior distribution is improper, inferences are invalid, it is non-integrable, and Bayes factors cannot be used (though there are exceptions).

To determine the propriety of a joint posterior distribution, the marginal likelihood must be finite for all $\textbf{y}$. Again, the marginal likelihood is

$$p(\textbf{y}) = \int p(\textbf{y} | \Theta) p(\Theta) d\Theta$$

Although improper prior distributions can be used, it is good practice to avoid them.

2020-12-15 17:50:44
Prior distributions may be estimated within the model via hyperprior distributions, which are usually vague and nearly flat. Parameters of hyperprior distributions are called hyperparameters. Using hyperprior distributions to estimate prior distributions is known as hierarchical Bayes. In theory, this process could continue further, using hyper-hyperprior distributions to estimate the hyperprior distributions. Estimating priors through hyperpriors, and from the data, is a method to elicit the optimal prior distributions. One of many natural uses for hierarchical Bayes is multilevel modeling.

Recall that the unnormalized joint posterior distribution (equation \ref{jointposterior}) is proportional to the likelihood times the prior distribution

$$p(\Theta | \textbf{y}) \propto p(\textbf{y} | \Theta)p(\Theta)$$

The simplest hierarchical Bayes model takes the form

$$p(\Theta, \Phi | \textbf{y}) \propto p(\textbf{y} | \Theta)p(\Theta | \Phi)p(\Phi)$$

where $\Phi$ is the set of hyperparameters, with hyperprior distribution $p(\Phi)$. By reading the equation from right to left, it begins with the hyperprior $p(\Phi)$, which is used conditionally to estimate the prior $p(\Theta | \Phi)$, which in turn is used, as per usual, in the likelihood $p(\textbf{y} | \Theta)$, and finally the posterior is $p(\Theta, \Phi | \textbf{y})$.
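The right-to-left reading above corresponds exactly to how data are generated under a hierarchical model. Here is a minimal simulation sketch (my own illustration; the normal-at-each-level choices and the eight-group setup are assumptions, not from the original):

```python
import numpy as np

rng = np.random.default_rng(1)

# Reading the hierarchical factorization right to left:
# hyperprior p(Phi) -> prior p(Theta | Phi) -> likelihood p(y | Theta)
mu0 = rng.normal(0.0, 5.0)            # Phi: hyperparameter from the hyperprior
tau = abs(rng.normal(0.0, 2.0))       # Phi: group-level scale (half-normal)
theta = rng.normal(mu0, tau, size=8)  # Theta | Phi: one parameter per group
y = rng.normal(theta[:, None], 1.0, size=(8, 20))  # y | Theta: 20 obs/group

print(y.shape)  # (8, 20)
```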

2020-12-15 17:51:24
Although the gamma distribution is the conjugate prior distribution for the precision of a normal distribution \citep{spiegelhalter03},

$$\tau \sim \mathcal{G}(0.001, 0.001),$$

better properties for scale parameters are yielded with the non-conjugate, proper, half-Cauchy\footnote{The half-t distribution is another option.} distribution, with a general recommendation of scale=25 for a weakly informative scale parameter \citep{gelman06},

$$\sigma \sim \mathcal{HC}(25)$$
$$\tau = \sigma^{-2}$$

When the half-Cauchy is unavailable, a uniform distribution is often placed on $\sigma$ in hierarchical Bayes when the number of groups is, say, at least five,

$$\sigma \sim \mathcal{U}(0, 100)$$
$$\tau = \sigma^{-2}$$

When conjugate distributions are used, a summary statistic for a posterior distribution of $\theta$ may be represented as $t(\textbf{y})$ and said to be a sufficient statistic \citep[p. 42]{gelman04}. When nonconjugate distributions are used, a summary statistic for a posterior distribution is usually not a sufficient statistic. A sufficient statistic is a statistic that has the property of sufficiency with respect to a statistical model and the associated unknown parameter. The quantity $t(\textbf{y})$ is said to be a sufficient statistic for $\theta$, because the likelihood for $\theta$ depends on the data $\textbf{y}$ only through the value of $t(\textbf{y})$. Sufficient statistics are useful in algebraic manipulations of likelihoods and posterior distributions.
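The defining property of a sufficient statistic is easy to check numerically (my own illustration): for Bernoulli data, two samples with the same $t(\textbf{y}) = \sum \textbf{y}_i$ but different orderings yield identical likelihood functions for $\theta$.

```python
import numpy as np
from scipy import stats

# Two Bernoulli samples with the same t(y) = sum(y) but different orderings
y1 = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y2 = np.array([0, 1, 0, 1, 1, 0, 0, 1])
assert y1.sum() == y2.sum()

theta = np.linspace(0.01, 0.99, 99)
L1 = np.prod(stats.bernoulli.pmf(y1[:, None], theta), axis=0)
L2 = np.prod(stats.bernoulli.pmf(y2[:, None], theta), axis=0)

# Identical likelihood functions: the data enter only through t(y)
print(np.allclose(L1, L2))  # True
```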

2020-12-15 17:52:01
In order to complete the definition of a Bayesian model, both the prior distributions and the likelihood\footnote{Ronald A. Fisher, a prominent frequentist, introduced the term likelihood in 1921 \citep{fisher21}, though the concept of likelihood was used by Bayes and Laplace. Fisher's introduction preceded a series of the most influential papers in statistics (mostly in 1922 and 1925), in which Fisher introduced numerous terms that are now common: consistency, efficiency, estimation, information, maximum likelihood estimate, optimality, parameter, statistic, sufficiency, and variance. He was the first to use Greek letters for unknown parameters and Latin letters for the estimates. Later contributions include F statistics, design of experiments, ANOVA, and many more.} must be approximated or fully specified. The likelihood, likelihood function, or $p(\textbf{y} | \Theta)$, contains the available information provided by the sample. The likelihood is

$$p(\textbf{y} | \Theta) = \prod^n_{i=1} p(\textbf{y}_i | \Theta)$$

The data $\textbf{y}$ affects the posterior distribution $p(\Theta | \textbf{y})$ only through the likelihood function $p(\textbf{y} | \Theta)$. In this way, Bayesian inference obeys the likelihood principle, which states that for a given sample of data, any two probability models $p(\textbf{y} | \Theta)$ that have the same likelihood function yield the same inference for $\Theta$. For more information on the likelihood principle, see section \ref{lprinciple}.

2020-12-15 17:53:17
In non-technical parlance, ``likelihood'' is usually a synonym for ``probability'', but in statistical usage there is a clear distinction: whereas ``probability'' allows us to predict unknown outcomes based on known parameters, ``likelihood'' allows us to estimate unknown parameters based on known outcomes.

In a sense, likelihood can be thought of as a reversed version of conditional probability. Reasoning forward from a given parameter $\theta$, the conditional probability of $\textbf{y}$ is the density $p(\textbf{y} | \theta)$. With $\theta$ as a parameter, here are relationships in expressions of the likelihood function

$$\mathscr{L}(\theta | \textbf{y}) = p(\textbf{y} | \theta) = f(\textbf{y} | \theta)$$

where $\textbf{y}$ is the observed outcome of an experiment, and the likelihood ($\mathscr{L}$) of $\theta$ given $\textbf{y}$ is equal to the density $p(\textbf{y} | \theta)$ or function $f(\textbf{y} | \theta)$. When viewed as a function of $\textbf{y}$ with $\theta$ fixed, it is not a likelihood function $\mathscr{L}(\theta | \textbf{y})$, but merely a probability density function $p(\textbf{y} | \theta)$. When viewed as a function of $\theta$ with $\textbf{y}$ fixed, it is a likelihood function and may be denoted as $\mathscr{L}(\theta | \textbf{y})$, $p(\textbf{y} | \theta)$, or $f(\textbf{y} | \theta)$\footnote{Note that $\mathscr{L}(\theta | \textbf{y})$ is not the same as the probability that those parameters are the right ones, given the observed sample.}.

2020-12-15 17:53:37
For example, in a Bayesian linear regression with an intercept and two independent variables, the model may be specified as

$$\textbf{y}_i \sim \mathcal{N}(\mu_i, \sigma^2)$$
$$\mu_i = \beta_1 + \beta_2\textbf{X}_{i,1} + \beta_3\textbf{X}_{i,2}$$

The dependent variable $\textbf{y}$, indexed by $i=1,...,n$, is stochastic, and normally-distributed according to the expectation vector $\mu$, and variance $\sigma^2$. Expectation vector $\mu$ is an additive, linear function of a vector of regression parameters, $\beta$, and the design matrix \textbf{X}.

Since $\textbf{y}$ is normally-distributed, the probability density function (PDF) of a normal distribution will be used, and is usually denoted as

$$f(\textbf{y}) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2\sigma^2}(\textbf{y}_i-\mu_i)^2\right]; \quad \textbf{y} \in (-\infty, \infty)$$

By considering a conditional distribution, the record-level likelihood in Bayesian notation is

$$p(\textbf{y}_i | \Theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2\sigma^2}(\textbf{y}_i-\mu_i)^2\right]; \quad \textbf{y} \in (-\infty, \infty)$$

2020-12-15 17:53:54
In both theory and practice, and in both frequentist and Bayesian inference, the log-likelihood is used instead of the likelihood, on both the record- and model-level. The model-level product of record-level likelihoods can underflow the range of numbers a computer can store, a problem that worsens as sample size grows. By estimating record-level log-likelihoods rather than likelihoods, the model-level log-likelihood becomes the sum of the record-level log-likelihoods rather than their product.

$$\log[p(\textbf{y} | \theta)] = \sum^n_{i=1} \log[p(\textbf{y}_i | \theta)]$$

rather than

$$p(\textbf{y} | \theta) = \prod^n_{i=1} p(\textbf{y}_i | \theta)$$
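The numerical motivation can be seen directly (my own illustration with simulated standard-normal data): the product of record-level likelihoods underflows to zero for a large sample, while the sum of log-likelihoods stays finite.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=100_000)   # large simulated sample

dens = stats.norm.pdf(y)                 # record-level likelihoods
print(np.prod(dens))                     # underflows to 0.0 in double precision

log_like = stats.norm.logpdf(y).sum()    # record-level log-likelihoods summed
print(log_like)                          # finite (a large negative number)
```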

As a function of $\Theta$, the unnormalized joint posterior distribution is the product of the likelihood function and the prior distributions. To continue with the example of Bayesian linear regression, here is the unnormalized joint posterior distribution

$$p(\beta, \sigma^2 | \textbf{y}) \propto p(\textbf{y} | \beta, \sigma^2)p(\beta_1)p(\beta_2)p(\beta_3)p(\sigma^2)$$

More usually, the logarithm of the unnormalized joint posterior distribution is used, which is the sum of the log-likelihood and prior distributions. Here is the logarithm of the unnormalized joint posterior distribution for this example

$$\log[p(\beta, \sigma^2 | \textbf{y})] = \log[p(\textbf{y} | \beta, \sigma^2)] + \log[p(\beta_1)] + \log[p(\beta_2)] + \log[p(\beta_3)] + \log[p(\sigma^2)]$$

The logarithm of the unnormalized joint posterior distribution is maximized with numerical approximation.
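This maximization can be sketched with a general-purpose optimizer (my own illustration on simulated data; the vague normal priors on $\beta$ and half-Cauchy prior on $\sigma$ follow the conventions used elsewhere in this text, and $\sigma$ is optimized on the log scale, which requires a Jacobian term):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 vars
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0.0, 1.0, n)

def neg_log_post(par):
    beta, log_sigma = par[:3], par[3]
    sigma = np.exp(log_sigma)  # optimize on the log scale so sigma > 0
    ll = stats.norm.logpdf(y, X @ beta, sigma).sum()            # log-likelihood
    lp = stats.norm.logpdf(beta, 0, 1000).sum()                 # priors on beta
    lp += stats.halfcauchy.logpdf(sigma, scale=25) + log_sigma  # prior + Jacobian
    return -(ll + lp)

fit = optimize.minimize(neg_log_post, x0=np.zeros(4), method="BFGS")
print(fit.x[:3])  # MAP estimates, close to beta_true
```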

2020-12-15 17:54:37
Approximate Bayesian Computation (ABC), also called likelihood-free estimation, is a family of numerical approximation techniques in Bayesian inference. ABC is especially useful when evaluation of the likelihood, $p(\textbf{y} | \Theta)$, is computationally prohibitive, or when suitable likelihoods are unavailable. As such, ABC algorithms estimate likelihood-free approximations. ABC is usually faster than a similar likelihood-based numerical approximation technique, because the likelihood is not evaluated directly, but replaced with an approximation that is usually easier to calculate. The approximation of a likelihood is usually estimated with a measure of distance between the observed sample, $\textbf{y}$, and its replicate given the model, $\textbf{y}^{rep}$, or with summary statistics of the observed and replicated samples.
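A minimal ABC rejection sampler makes the idea concrete (my own sketch, not from the original; the normal model, sample-mean summary statistic, and tolerance are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.normal(3.0, 1.0, size=50)       # observed sample (true mean 3)
s_obs = y_obs.mean()                        # summary statistic

# ABC rejection: draw theta from the prior, simulate y_rep, keep theta when
# the summary statistics of y_rep and y_obs are close (tolerance eps)
eps, accepted = 0.1, []
for _ in range(20_000):
    theta = rng.normal(0.0, 10.0)                  # prior draw
    y_rep = rng.normal(theta, 1.0, size=50)        # simulate under the model
    if abs(y_rep.mean() - s_obs) < eps:            # distance on summaries
        accepted.append(theta)

print(len(accepted), np.mean(accepted))  # approximate posterior, mean near 3
```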

2020-12-15 17:55:08
The ``posterior predictive distribution'' is either the replication of $\textbf{y}$ given the model (usually represented as $\textbf{y}^{rep}$), or the prediction of a new and unobserved $\textbf{y}$ (usually represented as $\textbf{y}^{new}$ or $\textbf{y}'$), given the model. This is the likelihood of the replicated or predicted data, averaged over the posterior distribution $p(\Theta | \textbf{y})$

$$p(\textbf{y}^{rep} | \textbf{y}) = \int p(\textbf{y}^{rep} | \Theta)p(\Theta | \textbf{y}) d\Theta$$

or

$$p(\textbf{y}^{new} | \textbf{y}) = \int p(\textbf{y}^{new} | \Theta)p(\Theta | \textbf{y}) d\Theta$$

If $\textbf{y}$ has missing values, then the missing $\textbf{y}$s can be estimated with the posterior predictive distribution\footnote{The predictive distribution was introduced by \citet{jeffreys61}.} as $\textbf{y}^{new}$ from within the model. For the linear regression example, the integral for prediction is

$$p(\textbf{y}^{new} | \textbf{y}) = \int p(\textbf{y}^{new} | \beta,\sigma^2)p(\beta,\sigma^2 | \textbf{y}) d\beta d\sigma^2$$

The posterior predictive distribution is easy to estimate

$$\textbf{y}^{new} \sim \mathcal{N}(\mu, \sigma^2)$$

where $\mu = \textbf{X}\beta$ is the conditional mean, and $\sigma^2$ is the residual variance.
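In practice this means drawing one $\textbf{y}^{new}$ per posterior draw of $(\beta, \sigma^2)$. A sketch (my own illustration; the posterior draws here are faked with narrow normals purely to stand in for MCMC output):

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in posterior samples for (beta, sigma^2) of the regression example
# (illustrative; in practice these come from the fitted model)
beta_draws = rng.normal([1.0, 2.0, -0.5], 0.05, size=(4000, 3))
sigma2_draws = rng.normal(1.0, 0.05, size=4000) ** 2

X_new = np.array([[1.0, 0.3, -1.2]])        # new design row (intercept + 2 vars)

# One y_new per posterior draw: y_new ~ N(X beta, sigma^2)
mu = beta_draws @ X_new.T                    # conditional means, shape (4000, 1)
y_new = rng.normal(mu[:, 0], np.sqrt(sigma2_draws))

print(y_new.mean())  # predictive mean near 1 + 2(0.3) - 0.5(-1.2) = 2.2
```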

2020-12-15 17:56:22

Hypothesis testing with Bayes factors is more robust than frequentist hypothesis testing, since the Bayesian form avoids model selection bias, evaluates evidence in favor of the null hypothesis, includes model uncertainty, and allows non-nested models to be compared (though the models must have the same dependent variable). Also, frequentist significance tests become biased in favor of rejecting the null hypothesis with sufficiently large sample sizes.

The Bayes factor for comparing two models may be approximated as the ratio of the marginal likelihood of the data in model 1 and model 2. Formally, the Bayes factor in this case is

$$B = \frac{p(\textbf{y}|\mathcal{M}_1)}{p(\textbf{y}|\mathcal{M}_2)} = \frac{\int p(\textbf{y}|\Theta_1,\mathcal{M}_1)p(\Theta_1|\mathcal{M}_1)d\Theta_1}{\int p(\textbf{y}|\Theta_2,\mathcal{M}_2)p(\Theta_2|\mathcal{M}_2)d\Theta_2}$$

where $p(\textbf{y}|\mathcal{M}_1)$ is the marginal likelihood of the data in model 1.

The Bayes factor, $B$, is the posterior odds in favor of the hypothesis divided by the prior odds in favor of the hypothesis, where the hypothesis is usually $\mathcal{M}_1 > \mathcal{M}_2$. Put another way,

2020-12-15 17:57:13
For example, when $B=2$, the data favor $\mathcal{M}_1$ over $\mathcal{M}_2$ with 2:1 odds.

In a non-hierarchical model, the marginal likelihood may easily be approximated with the Laplace-Metropolis Estimator for model $m$ as

$$p(\textbf{y}|m) = (2\pi)^{d_m/2}|\Sigma_m|^{1/2}p(\textbf{y}|\Theta_m,m)p(\Theta_m|m)$$

where $d$ is the number of parameters and $\Sigma$ is the inverse of the negative of the Hessian matrix of second derivatives. \citet{lewis97} introduce the Laplace-Metropolis method of approximating the marginal likelihood in MCMC, though it naturally works with Laplace Approximation as well. For a hierarchical model that involves both fixed and random effects, the Compound Laplace-Metropolis Estimator must be used.
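The Laplace estimator can be checked on a one-parameter example where the exact marginal likelihood is known (my own sketch, not from the original): a binomial likelihood with a flat Beta(1,1) prior, whose exact marginal likelihood is $1/(n+1)$, with the Hessian of the log posterior computed analytically at the mode.

```python
import numpy as np
from scipy import stats

# Binomial likelihood, flat Beta(1,1) prior: exact marginal likelihood 1/(n+1)
n, k = 50, 20
theta_hat = k / n                                    # posterior mode
log_like_hat = stats.binom.logpmf(k, n, theta_hat)   # log p(y | theta_hat)
log_prior_hat = 0.0                                  # log of the flat prior

# Sigma: inverse of the negative Hessian of the log posterior at the mode
hess = -k / theta_hat**2 - (n - k) / (1 - theta_hat)**2
Sigma = -1.0 / hess
d = 1                                                # number of parameters

log_ml = (d / 2) * np.log(2 * np.pi) + 0.5 * np.log(Sigma) \
    + log_like_hat + log_prior_hat
log_ml_exact = -np.log(n + 1.0)
print(log_ml, log_ml_exact)   # close: about -3.92 vs -3.93
```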

Gelman finds Bayes factors generally to be irrelevant, because they compute the relative probabilities of the models conditional on one of them being true. Gelman prefers approaches that measure the distance of the data to each of the approximate models \citep[p. 180]{gelman04}. However, \citet{kass95} explain that ``the logarithm of the marginal probability of the data may also be viewed as a predictive score. This is of interest, because it leads to an interpretation of the Bayes factor that does not depend on viewing one of the models as `true'.''

2020-12-15 17:57:59
In Bayesian inference, the most common method of assessing the goodness of fit of an estimated statistical model is a generalization of the frequentist Akaike Information Criterion (AIC). The Bayesian method, like AIC, is not a test of the model in the sense of hypothesis testing, though Bayesian inference has Bayes factors for such purposes. Instead, like AIC, Bayesian inference provides a model fit statistic that is to be used as a tool to refine the current model or select the better-fitting model of different methodologies.

To begin with, model fit can be summarized with deviance, which is defined as -2 times the log-likelihood \citep[p. 180]{gelman04}, such as

$$D(\textbf{y},\Theta) = -2\log[p(\textbf{y} | \Theta)]$$

Just as with the likelihood, $p(\textbf{y} | \Theta)$, or log-likelihood, the deviance exists at both the record- and model-level. Due to the development of \proglang{BUGS} software \citep{gilks94}, deviance is defined differently in Bayesian inference than frequentist inference. In frequentist inference, deviance is -2 times the log-likelihood ratio of a reduced model compared to a full model, whereas in Bayesian inference, deviance is simply -2 times the log-likelihood. In Bayesian inference, the lowest expected deviance has the highest posterior probability \citep[p. 181]{gelman04}.

A related way to measure model complexity is as half the posterior variance of the model-level deviance, known as pV \citep[p. 182]{gelman04}

$$\mathrm{pV} = \mathrm{var}(D) / 2$$

The effect of model fitting, pD or pV, can be thought of as the number of `unconstrained' parameters in the model, where a parameter counts as: 1 if it is estimated with no constraints or prior information; 0 if it is fully constrained or if all the information about the parameter comes from the prior distribution; or an intermediate value if both the data and the prior are informative \citep[p. 182]{gelman04}. Therefore, by including prior information, Bayesian inference is more efficient in terms of the effective number of parameters than frequentist inference. Hierarchical, mixed effects, or multilevel models are even more efficient regarding the effective number of parameters.
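The pV computation is simple given posterior draws (my own sketch with stand-in draws; in practice the draws come from MCMC output). Here only the mean varies across draws, so pV should land near one effective parameter:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(5.0, 2.0, size=100)

# Stand-in posterior draws for (mu, sigma) of a normal model (illustrative)
mu_draws = rng.normal(y.mean(), 2.0 / np.sqrt(len(y)), size=4000)
sig_draws = np.full(4000, y.std(ddof=1))

# Model-level deviance for each posterior draw: D = -2 log p(y | Theta)
D = np.array([-2.0 * stats.norm.logpdf(y, m, s).sum()
              for m, s in zip(mu_draws, sig_draws)])

pV = D.var() / 2.0  # half the posterior variance of the deviance
print(pV)           # near 1: only mu varies across these draws
```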

2020-12-16 19:00:47
$$
y = \alpha_{j} + \beta_{j} v + \boldsymbol{\gamma}_{j} \boldsymbol{F} + \boldsymbol{\delta}_{j} \boldsymbol{D}_{j} + \varepsilon
$$

where $j$ indexes regression models, $\boldsymbol{F}$ is the full set of free variables that will be included in every regression model, $\boldsymbol{D}_{j}$ is a vector of $k$ variables taken from the set $\boldsymbol{X}$ of doubtful variables, and $\varepsilon$ is the error term. While  $\boldsymbol{D}_{j}$ has conventionally been limited to no more than three doubtful variables per model \citep{LevineRenelt1992, Achen2005}, the particular choice of $k$, the number of doubtful variables to be included in each combination, is up to the researcher.

The above regression is estimated for each of the $M$ possible combinations of $\boldsymbol{D}_{j} \subset \boldsymbol{X}$. The estimated regression coefficients $\hat{\beta}_{j}$ on the focus variable $v$, along with the corresponding standard errors $\hat{\sigma}_{j}$, are collected and stored for use in later calculations. In the original formulation of extreme bounds analysis, the regressions were estimated by Ordinary Least Squares (OLS). In recent research, however, other types of regression models have also been used, such as ordered probit models \citep{Bjornskov2008, Hafner-Burton2005} or logistic models \citep{HegreSambanis2006, MoserSturm2011, Gassebner2013}.

2020-12-16 19:01:51
In order to determine whether a determinant is robust or fragile, Leamer's extreme bounds analysis focuses only on the extreme bounds of the regression coefficients \citep{Leamer1985}. For any focus variable $v$, the lower and upper extreme bounds are defined as the minimum and maximum values of $\hat{\beta}_{j} \pm \tau  \hat{\sigma}_{j}$ across the $M$ estimated regression models, where $\tau$ is the critical value for the requested confidence level. For the conventional 95-percent confidence level, $\tau$ will thus be equal to approximately 1.96. If the upper and lower extreme bounds have the same sign, the focus variable $v$ is said to be robust. Conversely, if the bounds have opposite signs, the variable is declared fragile.

The interval between the lower and upper extreme bound represents the set of values that are not statistically significantly distinguishable from the coefficient estimate $\hat{\beta}_{j}$. In other words, a simple t-test would fail to reject the null hypothesis that the true parameter $\beta_{j}$ equals any value between the extreme bounds. Intuitively, Leamer's version of extreme bounds analysis scans a large number of model specifications for the lowest and highest value that the $\beta_{j}$ parameter could plausibly take at the requested confidence level. It then labels variables robust and fragile based on whether these extreme bounds have the same or opposite signs, respectively.
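The procedure can be sketched end to end on simulated data (my own illustration; the data-generating process, six doubtful variables, and $k=3$ are assumptions): fit OLS over all subsets of doubtful variables, collect $\hat{\beta}_j \pm 1.96\hat{\sigma}_j$, and compare the signs of the extreme bounds.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)
n = 500
v = rng.normal(size=n)                         # focus variable
F = rng.normal(size=(n, 1))                    # free variable
X = rng.normal(size=(n, 6))                    # six doubtful variables
y = 1.0 + 0.5 * v + 0.3 * F[:, 0] + rng.normal(size=n)

tau = 1.96                                     # 95% critical value
bounds = []
for D in combinations(range(6), 3):            # all k = 3 subsets of X
    Z = np.column_stack([np.ones(n), v, F, X[:, D]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (n - Z.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])   # se of beta_hat on v
    bounds += [coef[1] - tau * se, coef[1] + tau * se]

lo, hi = min(bounds), max(bounds)
print(lo, hi, "robust" if lo * hi > 0 else "fragile")  # v should be robust
```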

2020-12-16 19:17:58
$$
\begin{align*}
    p_x &= \sqrt{\frac{2}{\pi \phi}} \frac{e^{(\phi\mu)^{-1}}}{x!}
    \left(
      \sqrt{2\phi \left( 1 + \frac{1}{2\phi\mu^2} \right)}
    \right)^{-(x - \frac{1}{2})} \\
    &\phantom{=} \times K_{x - \frac{1}{2}} \left( \sqrt{\frac{2}{\phi}\left(1
          + \frac{1}{2\phi\mu^2}\right)} \right),
\end{align*}
$$

2020-12-16 19:18:24
$$\begin{align*}
p_0 &= \exp\left\{
      \frac{1}{\phi\mu} \left(1 - \sqrt{1 + 2\phi\mu^2}\right)
    \right\} \\
    p_1 &= \frac{\mu}{\sqrt{1 + 2\phi\mu^2}}\, p_0 \\
    p_x &= \frac{2\phi\mu^2}{1 + 2\phi\mu^2} \left( 1 - \frac{3}{2x}
    \right) p_{x - 1} + \frac{\mu^2}{1 + 2\phi\mu^2} \frac{1}{x(x -
      1)}\, p_{x - 2}, \quad x = 2, 3, \dots.
\end{align*}
$$
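The recursion above is straightforward to implement and check (my own sketch; the parameter values $\mu = 2$, $\phi = 0.5$ are assumed for illustration). The probabilities should sum to one and reproduce the mean $\mu$:

```python
import numpy as np

def pig_pmf(mu, phi, xmax):
    """Poisson-inverse Gaussian probabilities p_0..p_xmax via the recursion."""
    r = 1.0 + 2.0 * phi * mu**2
    p = np.zeros(xmax + 1)
    p[0] = np.exp((1.0 - np.sqrt(r)) / (phi * mu))
    p[1] = mu / np.sqrt(r) * p[0]
    for x in range(2, xmax + 1):
        p[x] = (2.0 * phi * mu**2 / r) * (1.0 - 1.5 / x) * p[x - 1] \
             + (mu**2 / r) / (x * (x - 1)) * p[x - 2]
    return p

p = pig_pmf(mu=2.0, phi=0.5, xmax=400)
print(p.sum(), (np.arange(401) * p).sum())  # ≈ 1 and ≈ mu = 2
```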

2020-12-16 19:18:58
The first moment of the distribution is $\mu$. The second and third
central moments are, respectively,
$$
\begin{align*}
  \mu_2 &= \sigma^2 = \mu + \phi\mu^3 \\
  \mu_3 &= \mu + 3 \phi \mu^2 \sigma^2.
\end{align*}
$$
For the limiting case $\mu = \infty$, the underlying inverse Gaussian
has an inverse chi-squared distribution. The latter has no finite
strictly positive, integer moments and, consequently, neither does the
Poisson-inverse Gaussian. See \autoref{sec:app:discrete:pig} for the
formulas in this case.

2020-12-16 19:20:03
$$
\Gamma(\alpha; x) = \frac{1}{\Gamma(\alpha)}
  \int_0^x t^{\alpha - 1} e^{-t}\, dt, \quad \alpha > 0, x > 0,
$$

with

$$
\beta(a, b)
  = \int_0^1 t^{a - 1} (1 - t)^{b - 1}\, dt
  = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a + b)}.
$$

2020-12-16 19:21:03
$$
G(\alpha; x) = -\frac{x^\alpha e^{-x}}{\alpha}
  + \frac{1}{\alpha} G(\alpha + 1; x).
$$
This process can be repeated until $\alpha + k$ is a positive number,
in which case the right hand side can be evaluated with
\eqref{eq:gammainc:apos}. If $\alpha = 0, -1, -2, \dots$, this
calculation requires the value of
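The climbing recursion above can be verified numerically (my own sketch; here $G(\alpha; x)$ denotes the unnormalized upper tail $\int_x^\infty t^{\alpha - 1} e^{-t}\, dt$, with the positive-$\alpha$ base case taken from SciPy's regularized incomplete gamma):

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaincc, gamma

def G(alpha, x):
    """G(alpha; x) = int_x^inf t^(alpha-1) e^(-t) dt, alpha != 0, -1, -2, ..."""
    if alpha > 0:
        return gamma(alpha) * gammaincc(alpha, x)   # standard incomplete gamma
    # climb alpha upward with the recursion until it is positive
    return -x**alpha * np.exp(-x) / alpha + G(alpha + 1, x) / alpha

val = G(-1.5, 2.0)
check, _ = integrate.quad(lambda t: t**(-2.5) * np.exp(-t), 2.0, np.inf)
print(val, check)  # the recursion and direct quadrature agree
```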

2020-12-16 19:21:20
$$
F(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(a) \Gamma(b)}
  \sum_{k = 0}^\infty
  \frac{\Gamma(a + k) \Gamma(b + k)}{\Gamma(c + k)} \frac{z^k}{k!}
$$

2020-12-16 19:22:04
$$
\begin{align*}
  f(x)
  &= \frac{\gamma u^\tau (1 - u)^\alpha}{%
    (x - \mu) \beta (\alpha, \tau )},
    \quad u = \frac{v}{1 + v},
    \quad v = \left(\frac{x - \mu}{\theta} \right)^\gamma,
    \quad x > \mu \\
  F(x)
  &= \beta(\tau, \alpha; u) \\ \displaybreak[0]
  \E{X^k}
  &= \sum_{j = 0}^k \binom{k}{j} \mu^{k - j} \theta^j\,
    \frac{\Gamma(\tau+j/\gamma) \Gamma(\alpha-j/\gamma)}{%
    \Gamma(\alpha) \Gamma(\tau)},
    \quad \text{integer } 0 \leq k < \alpha\gamma \\
  \E{(X \wedge x)^k}
  &= \sum_{j = 0}^k \binom{k}{j} \mu^{k - j} \theta^j\,
    \frac{B(\tau+j/\gamma, \alpha-j/\gamma; u)}{%
    \Gamma(\alpha) \Gamma(\tau)} \\
  &\phantom{=} + x^k [1 - \beta(\tau, \alpha; u)],
    \quad \text{integer } k \geq 0,
    \quad \alpha - j/\gamma \neq -1, -2, \dots
\end{align*}
$$

2020-12-16 19:22:41
$$
\begin{align}
  f(x)
  &= \frac{\alpha \gamma u^\alpha (1 - u)}{(x - \mu)},
    \quad u = \frac{1}{1 + v},
    \quad v = \left(\frac{x - \mu}{\theta} \right)^\gamma,
    \quad x > \mu \\
  F(x)
  &= 1 - u^\alpha \\ \displaybreak[0]
  \E{X^k}
  &= \sum_{j = 0}^k \binom{k}{j} \mu^{k - j} \theta^j\,
    \frac{\Gamma(1+j/\gamma) \Gamma(\alpha-j/\gamma)}{%
    \Gamma(\alpha)},
    \quad \text{integer } 0 \leq k < \alpha\gamma \\
  \E{(X \wedge x)^k}
  &= \sum_{j = 0}^k \binom{k}{j} \mu^{k - j} \theta^j\,
    \frac{B(1+j/\gamma, \alpha-j/\gamma; 1-u)}{%
    \Gamma(\alpha)} \\
  &\phantom{=} + x^k u^\alpha,
    \quad \text{integer } k \geq 0,
    \quad \alpha - j/\gamma \neq -1, -2, \dots
\end{align}
$$

2020-12-16 19:24:59
$$
f_{\tilde{Y}^P}(y) =
  \begin{cases}
    0, & 0 \leq y \leq \alpha d \\
    \dfrac{1}{\alpha (1 + r)}\,
    \dfrac{f_X \left( \frac{y}{\alpha(1 + r)} \right)}{%
      1 - F_X \left( \frac{d}{1 + r} \right)},
      & \alpha d < y < \alpha u \\
    \dfrac{1 - F_X \left( \frac{u}{1 + r} \right)}{%
      1 - F_X \left( \frac{d}{1 + r} \right)},
      & y = \alpha u
  \end{cases}
$$


2020-12-16 19:26:27
\subsection{Form}
$$\textbf{y}_t \sim \mathcal{N}(\mu_t, \sigma^2_t), \quad t=1,\dots,T$$
$$\mu_t = \alpha + \sum^P_{p=1} \phi_p \textbf{y}_{t-p}, \quad t=1,\dots,T$$
$$\epsilon_t = \textbf{y}_t - \mu_t$$
$$\alpha \sim \mathcal{N}(0, 1000)$$
$$\phi_p \sim \mathcal{N}(0, 1000), \quad p=1,\dots,P$$
$$\sigma^2_t = \omega + \sum^Q_{q=1} \theta_q \epsilon^2_{t-q}, \quad t=2,\dots,T$$
$$\omega \sim \mathcal{HC}(25)$$
$$\theta_q \sim \mathcal{U}(0, 1), \quad q=1,\dots,Q$$
\subsection{Data}
\code{data(demonfx) \\
y <- as.vector(diff(log(as.matrix(demonfx[1:261,1])))) \\
T <- length(y) \\
L.P <- c(1,5,20) \#Autoregressive lags \\
L.Q <- c(1,2) \#Volatility lags \\
P <- length(L.P) \#Autoregressive order \\
Q <- length(L.Q) \#Volatility order \\
mon.names <- "LP" \\
parm.names <- as.parm.names(list(alpha=0, phi=rep(0,P), omega=0, \\
\hspace*{0.27 in} theta=rep(0,Q))) \\
pos.alpha <- grep("alpha", parm.names) \\
pos.phi <- grep("phi", parm.names) \\
pos.omega <- grep("omega", parm.names) \\
pos.theta <- grep("theta", parm.names) \\
PGF <- function(Data) \{ \\
\hspace*{0.27 in} alpha <- rnorm(1) \\
\hspace*{0.27 in} phi <- runif(Data$P,-1,1) \\
\hspace*{0.27 in} omega <- rhalfcauchy(1,5) \\
\hspace*{0.27 in} theta <- runif(Data$Q, 1e-10, 1-1e-5) \\
\hspace*{0.27 in} return(c(alpha, phi, omega, theta)) \\
\hspace*{0.27 in} \} \\
MyData <- list(L.P=L.P, L.Q=L.Q, PGF=PGF, P=P, Q=Q, T=T, mon.names=mon.names, \\
\hspace*{0.27 in} parm.names=parm.names, pos.alpha=pos.alpha, pos.phi=pos.phi, \\
\hspace*{0.27 in} pos.omega=pos.omega, pos.theta=pos.theta, y=y) \\
}

2020-12-16 19:27:13
$$\textbf{y}_t \sim \mathcal{N}(\mu_t, \sigma^2_t), \quad t=1,\dots,T$$
$$\mu_t = \alpha + \sum^P_{p=1} \phi_p \textbf{y}_{t-p} + \delta \sigma^2_{t-1}, \quad t=1,\dots,T$$
$$\epsilon_t = \textbf{y}_t - \mu_t$$
$$\alpha \sim \mathcal{N}(0, 1000)$$
$$\phi_p \sim \mathcal{N}(0, 1000), \quad p=1,\dots,P$$
$$\delta \sim \mathcal{N}(0, 1000)$$
$$\sigma^2_t = \omega + \sum^Q_{q=1} \theta_q \epsilon^2_{t-q}, \quad t=2,\dots,T$$
$$\omega \sim \mathcal{HC}(25)$$
$$\theta_q \sim \mathcal{U}(0, 1), \quad q=1,\dots,Q$$
\subsection{Form}
$$\textbf{y}_t \sim \mathcal{N}(\mu_t, \sigma^2_t), \quad t=1,\dots,T$$
$$\mu_t = \alpha + \sum^P_{p=1} \phi_p \textbf{y}_{t-p}, \quad t=1,\dots,T$$
$$\epsilon_t = \textbf{y}_t - \mu_t$$
$$\alpha \sim \mathcal{N}(0, 1000)$$
$$\phi_p \sim \mathcal{N}(0, 1000), \quad p=1,\dots,P$$
$$\sigma^2_t = \theta_1 + \theta_2 \epsilon^2_{t-1} + \theta_3 \sigma^2_{t-1}$$
$$\omega \sim \mathcal{HC}(25)$$
$$\theta_k = \frac{1}{1 + \exp(-\theta_k)}, \quad k=1,\dots,3$$
$$\theta_k \sim \mathcal{N}(0, 1000) \in [-10,10], \quad k=1,\dots,3$$

2020-12-16 19:27:49
$$\textbf{y}_t \sim \mathcal{N}(\mu_t, \sigma^2_t), \quad t=1,\dots,T$$
$$\mu_t = \alpha + \sum^P_{p=1} \phi_p \textbf{y}_{t-p} + \delta \sigma^2_{t-1}, \quad t=1,\dots,(T+1)$$
$$\epsilon_t = \textbf{y}_t - \mu_t$$
$$\alpha \sim \mathcal{N}(0, 1000)$$
$$\phi_p \sim \mathcal{N}(0, 1000), \quad p=1,\dots,P$$
$$\sigma^2_t = \omega + \theta_1 \epsilon^2_{t-1} + \theta_2 \sigma^2_{t-1}$$
$$\omega \sim \mathcal{HC}(25)$$
$$\theta_k \sim \mathcal{U}(0, 1), \quad k=1,\dots,2$$
