This post is based on an earlier (excellent) one by Darren Wilkinson here. Some material bears repeating, however, and I've tried to make this example a little more applicable to problems such as estimating (exponential) rates in epidemiological models.
Consider the problem of fitting a set of \( n=100 \) exponential random variables with mean \( \lambda=3 \).
set.seed(1)
lambd <- 3
n <- 100
x <- rexp(n, lambd)
If we assume a Gamma(\( \alpha \),\( \beta \)) prior on the rate coefficient, the posterior distribution of \( \lambda \) is Gamma(\( \alpha+n \),\( \beta+n \hat{x} \)), as this is the conjugate prior for an exponential distribution. However, imagine that we don't know this, and fit instead using a standard random walk Metropolis sampler. The problem is that \( \lambda>0 \), yet the random walk sampler can in principle generate negative proposals for \( \lambda \). One way to deal with this is to simply reject negative proposals, keeping the old values. The following function takes the data \( x \), an initial value of \( \lambda \), shape and rate parameters of the prior distribution, \( \alpha \) and \( \beta \) respectively, the standard deviation to be used in the random walk proposal, \( \sigma \), and the number of iterations to sample.
met <- function(x, lambda0, alph, bet, sigma, iters) {
lambdvec <- numeric(iters)
lambd <- lambda0
ll <- sum(dexp(x, lambd, log = TRUE))
lp <- dgamma(lambd, shape = alph, rate = bet, log = TRUE)
lpost <- ll + lp
for (i in 1:iters) {
lambds <- lambd + rnorm(1, sd = sigma)
if (lambds > 0) {
lls <- sum(dexp(x, lambds, log = TRUE))
lps <- dgamma(lambds, shape = alph, rate = bet, log = TRUE)
lposts <- lls + lps
A <- exp(lposts - lpost)
if (runif(1) < A) {
lambd <- lambds
ll <- lls
lp <- lps
lpost <- lposts
}
}
lambdvec[i] <- lambd
}
return(lambdvec)
}
Now we assume values for the prior distibution that are 'vague', and fit our data \( x \) using 1,000,000 iterations.
lambda0 <- 1
alph <- 0.01
bet <- 0.01
sigma <- 1
burnin <- 10000
niters <- 1e+06
mux <- mean(x)
out <- met(x, lambda0, alph, bet, sigma, burnin + niters)
If we plot the MCMC runs against the true posterior, we have a good fit.
hist(out[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "met", xlab = expression(lambda),
col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2,
lwd = 2)
Another way to ensure that negative proposals for \( \lambda \) are not generated is to sample \( log(\lambda) \). However, a naive Metropolis sampler similar to the one above (but without the need to reject negative proposals) does not give the correct answer.
metl.incorrect <- function(x, lambda0, alph, bet, sigma, iters) {
llambdvec <- numeric(iters)
llambd <- log(lambda0)
ll <- sum(dexp(x, exp(llambd), log = TRUE))
lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE)
lpost <- ll + lp
for (i in 1:iters) {
llambds <- llambd + rnorm(1, sd = sigma)
lls <- sum(dexp(x, exp(llambds), log = TRUE))
lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE)
lposts <- lls + lps
A <- exp(lposts - lpost)
if (runif(1) < A) {
llambd <- llambds
ll <- lls
lp <- lps
lpost <- lposts
}
llambdvec[i] <- llambd
}
return(llambdvec)
}
The posterior from this distribution is close, but not the same, as the true posterior.
lsigma <- 0.1
outl.incorrect <- metl.incorrect(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.incorrect)[(burnin + 1):(burnin + niters)], 100, freq = FALSE,
main = "metl.incorrect", xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2,
lwd = 2)
In order to correctly sample from the posterior, we need to account for the change of variable. For this particular problem, this can be implemented by adding a term \( log(\lambda) \) to the (unnormalised) log posterior (see here for more information). If \( u \) is our original variable and \( v \) is our transformed variable, we have to multiply the posterior by a term \( \frac{du}{dv} \), which as the following shows, is just \( u \) for a log transformation.
Here is the correct sampler, with the additional term (referred to as lj
for the log Jacobian) added in.
metl.correct <- function(x, lambda0, alph, bet, sigma, iters) {
llambdvec <- numeric(iters)
llambd <- log(lambda0)
ll <- sum(dexp(x, exp(llambd), log = TRUE))
lj <- llambd
lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE) + lj
lpost <- ll + lp
for (i in 1:iters) {
llambds <- llambd + rnorm(1, sd = sigma)
lls <- sum(dexp(x, exp(llambds), log = TRUE))
ljs <- llambds
lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE) +
ljs
lposts <- lls + lps
A <- exp(lposts - lpost)
if (runif(1) < A) {
llambd <- llambds
ll <- lls
lp <- lps
lpost <- lposts
}
llambdvec[i] <- llambd
}
return(llambdvec)
}
Now, we get the right answer using the reparameterisation.
outl.correct <- metl.correct(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.correct)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "metl.correct",
xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2,
lwd = 2)
In another post, I'll talk about Hastings ratios and asymmetric proposal distributions.