This post is based on an earlier (excellent) one by Darren Wilkinson here. Some material bears repeating, however, and I've tried to make this example a little more applicable to problems such as estimating (exponential) rates in epidemiological models.
Consider the problem of fitting a set of n=100 exponential random variables with mean \lambda=3 .
set.seed(1)
lambd <- 3
n <- 100
x <- rexp(n, lambd)
If we assume a Gamma( \alpha , \beta ) prior on the rate coefficient, the posterior distribution of \lambda is Gamma( \alpha+n , \beta+n \hat{x} ), as this is the conjugate prior for an exponential distribution. However, imagine that we don't know this, and fit instead using a standard random walk Metropolis sampler. The problem is that \lambda>0 , yet the random walk sampler can in principle generate negative proposals for \lambda . One way to deal with this is to simply reject negative proposals, keeping the old values. The following function takes the data x , an initial value of \lambda , shape and rate parameters of the prior distribution, \alpha and \beta respectively, the standard deviation to be used in the random walk proposal, \sigma , and the number of iterations to sample.
met <- function(x, lambda0, alph, bet, sigma, iters) {
lambdvec <- numeric(iters)
lambd <- lambda0
ll <- sum(dexp(x, lambd, log = TRUE))
lp <- dgamma(lambd, shape = alph, rate = bet, log = TRUE)
lpost <- ll + lp
for (i in 1:iters) {
lambds <- lambd + rnorm(1, sd = sigma)
if (lambds > 0) {
lls <- sum(dexp(x, lambds, log = TRUE))
lps <- dgamma(lambds, shape = alph, rate = bet, log = TRUE)
lposts <- lls + lps
A <- exp(lposts - lpost)
if (runif(1) < A) {
lambd <- lambds
ll <- lls
lp <- lps
lpost <- lposts
}
}
lambdvec[i] <- lambd
}
return(lambdvec)
}
Now we assume values for the prior distibution that are 'vague', and fit our data x using 1,000,000 iterations.
lambda0 <- 1
alph <- 0.01
bet <- 0.01
sigma <- 1
burnin <- 10000
niters <- 1e+06
mux <- mean(x)
out <- met(x, lambda0, alph, bet, sigma, burnin + niters)
If we plot the MCMC runs against the true posterior, we have a good fit.
hist(out[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "met", xlab = expression(lambda),
col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2,
lwd = 2)
Another way to ensure that negative proposals for \lambda are not generated is to sample log(\lambda) . However, a naive Metropolis sampler similar to the one above (but without the need to reject negative proposals) does not give the correct answer.
metl.incorrect <- function(x, lambda0, alph, bet, sigma, iters) {
llambdvec <- numeric(iters)
llambd <- log(lambda0)
ll <- sum(dexp(x, exp(llambd), log = TRUE))
lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE)
lpost <- ll + lp
for (i in 1:iters) {
llambds <- llambd + rnorm(1, sd = sigma)
lls <- sum(dexp(x, exp(llambds), log = TRUE))
lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE)
lposts <- lls + lps
A <- exp(lposts - lpost)
if (runif(1) < A) {
llambd <- llambds
ll <- lls
lp <- lps
lpost <- lposts
}
llambdvec[i] <- llambd
}
return(llambdvec)
}
The posterior from this distribution is close, but not the same, as the true posterior.
lsigma <- 0.1
outl.incorrect <- metl.incorrect(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.incorrect)[(burnin + 1):(burnin + niters)], 100, freq = FALSE,
main = "metl.incorrect", xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2,
lwd = 2)
In order to correctly sample from the posterior, we need to account for the change of variable. For this particular problem, this can be implemented by adding a term log(\lambda) to the (unnormalised) log posterior (see here for more information). If u is our original variable and v is our transformed variable, we have to multiply the posterior by a term \frac{du}{dv} , which as the following shows, is just u for a log transformation.
Here is the correct sampler, with the additional term (referred to as lj
for the log Jacobian) added in.
metl.correct <- function(x, lambda0, alph, bet, sigma, iters) {
llambdvec <- numeric(iters)
llambd <- log(lambda0)
ll <- sum(dexp(x, exp(llambd), log = TRUE))
lj <- llambd
lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE) + lj
lpost <- ll + lp
for (i in 1:iters) {
llambds <- llambd + rnorm(1, sd = sigma)
lls <- sum(dexp(x, exp(llambds), log = TRUE))
ljs <- llambds
lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE) +
ljs
lposts <- lls + lps
A <- exp(lposts - lpost)
if (runif(1) < A) {
llambd <- llambds
ll <- lls
lp <- lps
lpost <- lposts
}
llambdvec[i] <- llambd
}
return(llambdvec)
}
Now, we get the right answer using the reparameterisation.
outl.correct <- metl.correct(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.correct)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "metl.correct",
xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2,
lwd = 2)
In another post, I'll talk about Hastings ratios and asymmetric proposal distributions.