2014-06-16
Hi, I have a quick question about the details of running a model in JAGS and BUGS. Say I run a model with n.burnin=5000, n.iter=5000, and thin=2. Does this mean that the program will run 5,000 iterations and discard the results, and then run another 10,000 iterations, keeping only every second result?


If I save these simulations as a CODA object, are all 10,000 saved, or only the thinned 5,000? I'm just trying to understand which set of iterations is used to make the ACF plot.

Thank you
All replies
2014-6-16 12:36:27
Thinning to reduce autocorrelation: Rarely useful!


Borys Paulewicz, commenting on a previous post, brought to my attention a very recent article about thinning of MCMC chains: Link, W. A. & Eaton, M. J. (2011) On thinning of chains in MCMC. Methods in Ecology and Evolution. First published online 17 June 2011. doi: 10.1111/j.2041-210X.2011.00131.x The basic conclusion of the article is that thinning of chains is not usually appropriate when the goal is precision of estimates from an MCMC sample. (Thinning can be useful for other reasons, such as memory or time constraints in post-chain processing, but those are very different motivations than precision of estimation of the posterior distribution.)

Here's the idea: Consider an MCMC chain that is strongly autocorrelated. Autocorrelation produces clumpy samples that are unrepresentative, in the short run, of the true underlying posterior distribution. Therefore, if possible, we would like to get rid of autocorrelation so that the MCMC sample provides a more precise estimate of the posterior distribution. One way to decrease autocorrelation is to thin the sample, using only every nth step. If we keep 50,000 thinned steps with small autocorrelation, then we very probably have a more precise estimate of the posterior than 50,000 unthinned steps with high autocorrelation. But to get 50,000 kept steps in a thinned chain, we needed to generate n*50,000 steps. With such a long chain, the clumpy autocorrelation has probably all been averaged out anyway! In fact, Link and Eaton show that the longer (unthinned) chain usually yields better estimates of the true posterior than the shorter thinned chain, even for percentiles in the tail of the distribution, at least for the particular cases they consider.

The tricky part is knowing how long of a chain is long enough to smooth out short-run autocorrelation. The more severe the autocorrelation is, the longer the chain needs to be. I have not seen rules of thumb for how to translate an autocorrelation function into a recommended chain length. Link and Eaton suggest monitoring different independent chains and assaying whether the estimates produced by the different chains are suitably similar to each other.
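
For what it's worth, one off-the-shelf way to make that between-chain comparison (my own sketch, not code from the paper) is the Gelman-Rubin diagnostic in R's coda package, which contrasts between-chain and within-chain variability:

library(coda)
# Hypothetical illustration with two independent, well-mixed "chains";
# in practice these would come from your sampler (e.g. via coda.samples()).
set.seed(1)
chain1 <- mcmc(rnorm(5000))
chain2 <- mcmc(rnorm(5000))
gelman.diag(mcmc.list(chain1, chain2))  # point estimates near 1 suggest the chains agree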

For extreme autocorrelation, it's best not to rely on long-run averaging out, but instead to use other techniques that actually get rid of the autocorrelation. This usually involves reparameterization, as appropriate for the particular model.

Link and Eaton point out that the inefficiency of thinning has been known for years, but many practitioners have gone on using it anyway. My book followed those practitioners. It should be pointed out that thinning does not yield incorrect results (in the sense of being biased). Thinning merely produces correct results less efficiently (on average) than using the full chain from which the thinned chain was extracted.

There are at least a couple more mathematically advanced textbooks that you might turn to for additional advice. For example, Jackman says in his 2009 book, Bayesian Analysis for the Social Sciences, "High levels of autocorrelation in a MCMC algorithm are not fatal in and of themselves, but they will indicate that a very long run of the sampler may be required. Thinning is not a strategy for avoiding these long runs, but it is a strategy for dealing with the otherwise overwhelming amount of MCMC output. (p. 263)" Christensen et al. say in their 2011 book, Bayesian Ideas and Data Analysis, "Unless there is severe autocorrelation, e.g., high correlation with, say [lag]=30, we don't believe that thinning is worthwhile. (p.146)"

Unfortunately, there is no sure advice about how long a chain is needed. Longer is better. Perhaps if you're tempted to thin by n to reduce autocorrelation, just use a chain n times as long without thinning.
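
To see the trade-off numerically, here is a small self-contained sketch (mine, not Link and Eaton's) that uses an AR(1) series as a stand-in for an autocorrelated MCMC chain; coda's effectiveSize() converts the autocorrelation in a chain into an equivalent number of independent draws:

library(coda)
set.seed(1)
rho <- 0.9    # lag-1 autocorrelation of the toy "chain"
n   <- 50000
# AR(1) series with stationary variance 1, standing in for sampler output
x <- as.numeric(stats::filter(rnorm(n, sd = sqrt(1 - rho^2)),
                              rho, method = "recursive"))
full    <- mcmc(x)                      # all 50,000 draws
thinned <- mcmc(x[seq(1, n, by = 10)])  # keep every 10th draw (5,000 kept)
effectiveSize(full)     # roughly n*(1-rho)/(1+rho), about 2,600 here
effectiveSize(thinned)  # smaller: the discarded draws carried information

On this toy example the full chain has the larger effective sample size, which is Link and Eaton's point in miniature.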

2014-6-16 12:37:36
Short answer: the number of iterations includes the burn-in but is not affected by thinning.

Less short answer: if you were to run a BUGS model through R2WinBUGS or R2OpenBUGS (or view a summary of WinBUGS output) with the arguments you stated:

n.iter=5000, n.burnin=5000, n.thin=2

you would get an error message or no output: n.iter refers to the total number of iterations including the burn-in, so all of your iterations are burn-in and are thrown away (and hence never appear in the CODA output or in any ACF plot in WinBUGS).

Thinning is treated differently (in relation to n.iter). For example, if you set up your MCMC with any of the following arguments:

n.iter=6000, n.burnin=5000, n.thin=1
n.iter=6000, n.burnin=5000, n.thin=5
n.iter=6000, n.burnin=5000, n.thin=10

only 1000 iterations will be saved in each case, i.e. the iterations dropped by thinning are simply discarded (they never appear in the CODA output or in any ACF plot in WinBUGS).
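
For concreteness, here is a minimal (untested) sketch of such a call via R2OpenBUGS, where the model file, data, inits, and the monitored parameter "theta" are all placeholders:

library(R2OpenBUGS)
fit <- bugs(data = my.data, inits = my.inits,
            parameters.to.save = "theta",
            model.file = "model.txt", n.chains = 1,
            n.iter = 6000, n.burnin = 5000, n.thin = 10)
# Per the behaviour described above, 1000 samples per chain are kept:
# the burn-in is discarded and the thinned-out iterations are never stored.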

I'm not sure whether the same holds for JAGS.

2014-6-16 12:38:26
The other answer is correct about BUGS, but it does not apply to JAGS (at least not to rjags; R2jags might be different). I haven't used JAGS directly, but the author of rjags is also the creator of JAGS, so I would guess they use the same convention.

In rjags, the jags.model object keeps track of how many iterations the chain(s) have been run for.

Here is a small model in a file "tmpJags":

model {
    X ~ dnorm(Y, 1)  # likelihood: X is normal with mean Y and precision 1
    Y ~ dt(0, 1, 1)  # prior on Y: t with 1 df, i.e. a standard Cauchy
}
Then I run

library(rjags)  # provides jags.model() and coda.samples()

X <- 1
jm <- jags.model(file = "tmpJags", data = list(X = X),
                 n.chains = 4, n.adapt = 1000)
jm consists of 4 chains, each of which has been run for 1000 total iterations (here, the adaptation phase). Then I do

samps <- coda.samples(jm, "Y", n.iter = 1000, thin = 10)
Now all 4 of my chains have been run for a total of 2000 iterations each, and I have collected 100 samples from each chain, for 400 total samples saved.

To me, it makes sense to do it this way: for the purposes of monitoring chain convergence, you would rather think in terms of total iterations than iterations after thinning.
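
A quick way to check this bookkeeping from the coda object above (a hypothetical follow-on to the same session; niter(), nchain(), and autocorr.plot() are standard coda functions):

library(coda)
niter(samps)          # 100: 1000 sampling iterations / thin of 10, per chain
nchain(samps)         # 4 chains
autocorr.plot(samps)  # the ACF is computed from the kept (thinned) samples

This also speaks to the original question about the ACF plot: it is based on the stored, post-thinning samples.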

2015-4-28 14:53:23
Nice post, thanks!
