Model Selection in Bayesian Linear Regression

Previously I wrote about performing polynomial regression and also about calculating marginal likelihoods. The data in the former and the calculations of the latter will be used here to exemplify model selection. Consider data generated by

y_{i}=b_{1}x_{i}+b_{3}x_{i}^{3} + \varepsilon_{i},

and suppose we wish to fit a polynomial of degree 3 to the data. There are then 4 regression coefficients, namely, the intercept and the three coefficients of the powers of x. This yields 2^4=16 possible models for the data (the null model with no terms is excluded below, leaving 15 to compare). Let b1=18 and b3=-0.5 so that the data looks like this:

[Figure: Model Selection Data]

R Code for Marginal Likelihood Calculation

The code to generate the data and calculate the log marginal likelihood for the different models is given below.
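For reference, the quantity computed is the closed-form log marginal likelihood of the conjugate normal-inverse-gamma model (the same expression derived in the earlier post on marginal likelihoods),

\log p(Y|\gamma) = -\tfrac{n}{2}\log(2\pi) + \tfrac{1}{2}\log\frac{|\Lambda_{0}|}{|\Lambda_{n}|} + a_{0}\log b_{0} - a_{n}\log b_{n} + \log\Gamma(a_{n}) - \log\Gamma(a_{0}),

where

\Lambda_{n}=\Lambda_{0}+X^{T}X,\quad \mu_{n}=\Lambda_{n}^{-1}(X^{T}Y+\Lambda_{0}\mu_{0}),\quad a_{n}=a_{0}+\tfrac{n}{2},\quad b_{n}=b_{0}+\tfrac{1}{2}\left(Y^{T}Y+\mu_{0}^{T}\Lambda_{0}\mu_{0}-\mu_{n}^{T}\Lambda_{n}\mu_{n}\right).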

rm(list=ls())

# Generate 200 covariates and responses from the true model
# y = 18x - 0.5x^3 + noise, with noise standard deviation 5
x=runif(200,-10,10)
a=c(18,0,-0.5,0)
Y=a[1]*x^1+a[2]*x^2+a[3]*x^3+a[4]
Y=Y+rnorm(length(Y),0,5)
plot(x,Y)

# Design matrix with columns x, x^2, x^3 and the intercept
p=4
X=cbind(x,x^2,x^3,1)

# Enumerate all 2^p subsets of the columns and drop the last row,
# which is the null model with no regressors
tf <- c(TRUE, FALSE)
models <- expand.grid(replicate(p,tf,simplify=FALSE))
names(models) <- NULL
models=as.matrix(models)
models=models[-dim(models)[1],]


# Normal-inverse-gamma prior: beta | sigma^2 ~ N(mu_0, sigma^2*solve(lambda_0))
# and sigma^2 ~ Inverse-Gamma(a_0, b_0)
a_0=100
b_0=0.5
mu_0=rep(0,p)
lambda_0=diag(p)


# Log marginal likelihood of the model whose included columns
# are flagged by the logical vector 'model'
lml <- function(model){
	n=length(Y)
	Y=as.matrix(Y)
	X=as.matrix(X[,model])
	mu_0=as.matrix(mu_0[model])
	lambda_0=as.matrix(lambda_0[model,model])
	XtX=t(X)%*%X
	# Posterior precision, MLE, and posterior mean of the coefficients
	lambda_n=lambda_0 + XtX
	BMLE=solve(XtX)%*%t(X)%*%Y
	mu_n=solve(lambda_n)%*%(t(X)%*%Y+lambda_0%*%mu_0)
	# Posterior parameters of the inverse-gamma on the error variance
	a_n=a_0 + 0.5*n
	b_n=b_0 + 0.5*(t(Y)%*%Y + t(mu_0)%*%lambda_0%*%mu_0 - t(mu_n)%*%lambda_n%*%mu_n)
	log_mar_lik <- -0.5*n*log(2*pi) + 0.5*log(det(lambda_0)) - 0.5*log(det(lambda_n)) + lgamma(a_n) - lgamma(a_0) + a_0*log(b_0) - a_n*log(b_n)
	return(log_mar_lik)
}



# Log marginal likelihood of every model, with indicator columns attached
lml.all=apply(models,1,lml)
results=cbind(lml.all, models)
# Sort in decreasing order, keeping the permutation of the row indices
order=sort(results[,1],index.return=TRUE,decreasing=TRUE)
results[order$ix,]

Model Selection Results

The models are listed in order of descending log marginal likelihood below:

            lml x x^2 x^3 c 
 [1,] -1342.261 1   0   1 0
 [2,] -1344.800 1   0   1 1
 [3,] -1348.514 1   1   1 0
 [4,] -1350.761 1   1   1 1
 [5,] -2182.616 0   0   1 0
 [6,] -2185.247 0   0   1 1
 [7,] -2188.961 0   1   1 0
 [8,] -2191.223 0   1   1 1
 [9,] -2394.062 1   0   0 0
[10,] -2396.100 1   0   0 1
[11,] -2398.886 1   1   0 0
[12,] -2401.119 1   1   0 1
[13,] -2482.800 0   1   0 0
[14,] -2482.810 0   0   0 1
[15,] -2484.837 0   1   0 1

The model with the highest log marginal likelihood is the one which includes x and x^3 only, for which the MLEs of the regression coefficients are 18.0305424 and -0.4987607, respectively. Compare this to how the data was generated.
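These estimates can be recomputed directly; a minimal sketch, reusing the X, Y, models and order objects defined above:

best=models[order$ix[1],]           # indicator vector of the top-ranked model
Xb=as.matrix(X[,best])              # its design matrix
solve(t(Xb)%*%Xb)%*%t(Xb)%*%Y       # MLE: (Xb'Xb)^{-1} Xb'Y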

One can go further and work these into posterior model probabilities. Exponentiating the log marginal likelihoods directly to get the marginal likelihood p(Y|\gamma) is problematic, because exp() of values around -1000 evaluates to zero in R. This is called underflow and occurs when the value of a computation is smaller than the smallest number the computer can store. For example, the minimum base-10 exponent of a double is -307 (DBL_MIN_10_EXP in C's float.h), so storing exp(-1000), roughly 10^{-434}, won't work. One can, however, simply add an arbitrary constant to the log marginal likelihoods to make exponentiation computationally feasible. This is equivalent to multiplying the marginal likelihoods by a common constant, which becomes irrelevant when the results are finally normalized. For example, one could subtract the largest value, -1342.261, from all the above log marginal likelihoods, take the exponential, multiply by the prior model probabilities and then normalize.
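To see both the underflow and the fix at the R console:

exp(-1342.261)              # 0: underflow
exp(-1342.261 + 1342.261)   # 1: shifting by the largest value keeps it representable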

\pi(\gamma|Y)\propto p(Y|\gamma)\pi(\gamma)

The normalizing constant is then
\sum_{\gamma \in \Gamma} p(Y|\gamma)\pi(\gamma)

Assuming a uniform prior on the model space, the posterior model probabilities can be computed with the following code.

# Shift by the largest log marginal likelihood before exponentiating
results[,1]=exp(results[,1]-results[order$ix[1],1])
# Normalize so the probabilities sum to one (the uniform prior cancels)
nconstant=sum(results[,1])
results[,1]=results[,1]/nconstant
postprob=results[order$ix,]
postprob

which yields the following posterior model probabilities:

          postprob x x^2 x^3 c
 [1,] 0.8723308907 1 0 1 0
 [2,] 0.1246029908 1 0 1 1
 [3,] 0.0027422657 1 1 1 0
 [4,] 0.0003238528 1 1 1 1
 [5,] 0.0000000000 0 0 1 0
 [6,] 0.0000000000 0 0 1 1
 [7,] 0.0000000000 0 1 1 0
 [8,] 0.0000000000 0 1 1 1
 [9,] 0.0000000000 1 0 0 0
[10,] 0.0000000000 1 0 0 1
[11,] 0.0000000000 1 1 0 0
[12,] 0.0000000000 1 1 0 1
[13,] 0.0000000000 0 0 0 1
[14,] 0.0000000000 0 1 0 0
[15,] 0.0000000000 0 1 0 1
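One quantity that falls straight out of this table is the posterior inclusion probability of each term, obtained by summing the posterior probabilities of the models that contain it. A small sketch, reusing the results and models objects above (with results[,1] already normalized):

# Posterior inclusion probability of each term
incl=colSums(results[,1]*models)
names(incl)=c("x","x^2","x^3","c")
incl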
