I recently came across a very interesting paper by Y. Yu and X. Meng[1] who present an interweaving strategy between different model parameterizations to improve mixing. It is well known that different model parameterizations can perform better than others under certain conditions. Papaspiliopoulos, Roberts and Sköld [2] present a general framework for how to parameterize hierarchical models and provide insights into the conditions under which centered and non-centered parameterizations outperform each other. One isn’t, however, restricted to reperameterizations of location parameters only, as outlined in the aforementioned paper, and so I decided to experiment with reparameterizations of the scale parameter in a simple hierarchical model with improper priors on the parameters.

## Centered Parameterization

Papaspiliopoulos gave a general definition of the centered parameterization to be when is independent of given

### Full Conditionals

## Non-Centered Parameterization

Papaspiliopoulos gave a general definition of the non-centered parameterization to be when and are a priori independent.

### Full Conditionals

## Interweaving Strategy

Generally when the CP works well, the NCP works poorly and vice versa. Yaming Yu and Xiao-Li Meng[1] present a way of combining both strategies by interweaving the Gibbs steps of both parameterizations at each iteration. The details can be read in their paper. I decided to test all three Gibbs samplers with the following R code:

#Generate Data lam2=0.5 lam=sqrt(lam2) sig2=1 n=1000 Xt=rnorm(n,0,sqrt(lam2*sig2)) Y=rnorm(n,Xt,sqrt(sig2)) nmc=2000 X=Xt #Centered Parameterization cp_lam2=rep(0,nmc) cp_X=matrix(0,nmc,n) for(i in 1:nmc){ inv_lam2=rgamma(1,(n)/2,rate=(t(X)%*%X)/(2*sig2)) lam2=1/inv_lam2 X=rnorm(n,(1/(1/sig2 + 1/(sig2*lam2)))*Y/sig2, sqrt(1/(1/sig2 + 1/(sig2*lam2)))) cp_lam2[i]=lam2 cp_X[i,]=X } mean_cp_X=apply(cp_X,2,mean) #Non-Centered Parameterization X=Xt ncp_lam2=rep(0,nmc) ncp_X=matrix(0,nmc,n) for(i in 1:nmc){ lam=rnorm(1,t(X)%*%Y/(t(X)%*%X), sqrt(sig2/(t(X)%*%X))) lam2=lam*lam; X=rnorm(n, (1/(1/sig2 + lam2/sig2))*lam*Y/sig2, sqrt(1/(1/sig2+lam2/sig2)) ) ncp_lam2[i]=lam2 ncp_X[i,]=X } mean_ncp_X=apply(ncp_X,2,mean) #Interweaving Strategy int_lam2=rep(0,nmc) int_X=matrix(0,nmc,n) for(i in 1:nmc){ X=rnorm(n,(1/(1/sig2 + 1/(sig2*lam2)))*Y/sig2, sqrt(1/(1/sig2 + 1/(sig2*lam2)))) inv_lam2=rgamma(1,(n)/2,rate=(t(X)%*%X)/(2*sig2)) half_lam2=1/inv_lam2 X=X/sqrt(half_lam2) #Transform to Xtilde lam=rnorm(1,t(X)%*%Y/(t(X)%*%X), sqrt(sig2/(t(X)%*%X))) lam2=lam*lam; int_lam2[i]=lam2 int_X[i,]=X } mean_cp_X=apply(cp_X,2,mean) #Remove Burnin cp_lam2=cp_lam2[-(1:1000)] ncp_lam2=ncp_lam2[-(1:1000)] int_lam2=int_lam2[-(1:1000)] #Plot Results par(mfrow=c(3,3)) acf(cp_lam2) plot(cp_lam2,type="l") plot(cp_lam2[1:nmc-1],cp_lam2[2:nmc]) acf(ncp_lam2) plot(ncp_lam2,type="l") plot(ncp_lam2[1:nmc-1],ncp_lam2[2:nmc]) acf(int_lam2) plot(int_lam2,type="l") plot(int_lam2[1:nmc-1],int_lam2[2:nmc])

## Results

## Discussion

As lambda gets small the centered parameterization becomes ever more autocorrelated and poorly mixing. When lambda becomes large the non-centered parameterization becomes ever more autocorrelated and poorly mixing. The interweaved Gibbs sampler exhibits great mixing in all cases.

[Bibtex]

```
@article{Yu11,
author = {Yu, Yaming and Meng, Xiao-Li},
citeulike-article-id = {10408757},
citeulike-linkout-0 = {http://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2011.203main},
citeulike-linkout-1 = {http://pubs.amstat.org/doi/abs/10.1198/jcgs.2011.203main},
citeulike-linkout-2 = {http://dx.doi.org/10.1198/jcgs.2011.203main},
doi = {10.1198/jcgs.2011.203main},
journal = {Journal of Computational and Graphical Statistics},
number = {3},
pages = {531--570},
posted-at = {2012-03-03 18:10:07},
priority = {2},
title = {{To Center or Not to Center: That Is Not the Question--An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency}},
url = {http://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2011.203main},
volume = {20},
year = {2011}
}
```

[Bibtex]

```
@article{Papaspiliopoulos07,
abstract = {{In this paper, we describe centering and noncentering methodology as complementary techniques for use in parametrization of broad classes of hierarchical models, with a view to the construction of effective MCMC algorithms for exploring posterior distributions from these models. We give a clear qualitative understanding as to when centering and noncentering work well, and introduce theory concerning the convergence time complexity of Gibbs samplers using centered and noncentered parametrizations. We give general recipes for the construction of noncentered parametrizations, including an auxiliary variable technique called the state-space expansion technique. We also describe partially noncentered methods, and demonstrate their use in constructing robust Gibbs sampler algorithms whose convergence properties are not overly sensitive to the data.}},
author = {Papaspiliopoulos, Omiros and Roberts, Gareth O. and Sk\"{o}ld, Martin},
citeulike-article-id = {8977350},
citeulike-linkout-0 = {http://www.jstor.org/stable/27645805},
journal = {Statistical Science},
number = {1},
pages = {59--73},
posted-at = {2011-03-10 18:55:50},
priority = {2},
publisher = {Institute of Mathematical Statistics},
title = {{A general framework for the parametrization of hierarchical models}},
url = {http://www.jstor.org/stable/27645805},
volume = {22},
year = {2007}
}
```

Yu Y. & Meng X.L. (2011). To Center or Not to Center: That Is Not the Question—An Ancillarity–Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency, Journal of Computational and Graphical Statistics, 20 (3) 531-570. DOI: 10.1198/jcgs.2011.203main

question: isn’t each interleaving step equivalent to two of the CP and NCP chains? if so, how does it affect the evaluation of the results?