Monday, December 2, 2013

Metropolis sampling of positive values in R using rejection or transformation

This post is based on an earlier (excellent) one by Darren Wilkinson here. Some material bears repeating, however, and I've tried to make this example a little more applicable to problems such as estimating (exponential) rates in epidemiological models.

Consider the problem of fitting a set of \( n=100 \) exponential random variables with mean \( \lambda=3 \).

set.seed(1)
lambd <- 3
n <- 100
x <- rexp(n, lambd)

If we assume a Gamma(\( \alpha \),\( \beta \)) prior on the rate coefficient, the posterior distribution of \( \lambda \) is Gamma(\( \alpha+n \),\( \beta+n \hat{x} \)), as this is the conjugate prior for an exponential distribution. However, imagine that we don't know this, and fit instead using a standard random walk Metropolis sampler. The problem is that \( \lambda>0 \), yet the random walk sampler can in principle generate negative proposals for \( \lambda \). One way to deal with this is to simply reject negative proposals, keeping the old values. The following function takes the data \( x \), an initial value of \( \lambda \), shape and rate parameters of the prior distribution, \( \alpha \) and \( \beta \) respectively, the standard deviation to be used in the random walk proposal, \( \sigma \), and the number of iterations to sample.

met <- function(x, lambda0, alph, bet, sigma, iters) {
    lambdvec <- numeric(iters)
    lambd <- lambda0
    ll <- sum(dexp(x, lambd, log = TRUE))
    lp <- dgamma(lambd, shape = alph, rate = bet, log = TRUE)
    lpost <- ll + lp
    for (i in 1:iters) {
        lambds <- lambd + rnorm(1, sd = sigma)
        if (lambds > 0) {
            lls <- sum(dexp(x, lambds, log = TRUE))
            lps <- dgamma(lambds, shape = alph, rate = bet, log = TRUE)
            lposts <- lls + lps
            A <- exp(lposts - lpost)
            if (runif(1) < A) {
                lambd <- lambds
                ll <- lls
                lp <- lps
                lpost <- lposts
            }
        }
        lambdvec[i] <- lambd
    }
    return(lambdvec)
}

Now we assume values for the prior distibution that are 'vague', and fit our data \( x \) using 1,000,000 iterations.

lambda0 <- 1
alph <- 0.01
bet <- 0.01
sigma <- 1
burnin <- 10000
niters <- 1e+06
mux <- mean(x)
out <- met(x, lambda0, alph, bet, sigma, burnin + niters)

If we plot the MCMC runs against the true posterior, we have a good fit.

hist(out[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "met", xlab = expression(lambda), 
    col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

plot of chunk unnamed-chunk-4

Another way to ensure that negative proposals for \( \lambda \) are not generated is to sample \( log(\lambda) \). However, a naive Metropolis sampler similar to the one above (but without the need to reject negative proposals) does not give the correct answer.

metl.incorrect <- function(x, lambda0, alph, bet, sigma, iters) {
    llambdvec <- numeric(iters)
    llambd <- log(lambda0)
    ll <- sum(dexp(x, exp(llambd), log = TRUE))
    lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE)
    lpost <- ll + lp
    for (i in 1:iters) {
        llambds <- llambd + rnorm(1, sd = sigma)
        lls <- sum(dexp(x, exp(llambds), log = TRUE))
        lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE)
        lposts <- lls + lps
        A <- exp(lposts - lpost)
        if (runif(1) < A) {
            llambd <- llambds
            ll <- lls
            lp <- lps
            lpost <- lposts
        }
        llambdvec[i] <- llambd
    }
    return(llambdvec)
}

The posterior from this distribution is close, but not the same, as the true posterior.

lsigma <- 0.1
outl.incorrect <- metl.incorrect(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.incorrect)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, 
    main = "metl.incorrect", xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

plot of chunk unnamed-chunk-6

In order to correctly sample from the posterior, we need to account for the change of variable. For this particular problem, this can be implemented by adding a term \( log(\lambda) \) to the (unnormalised) log posterior (see here for more information). If \( u \) is our original variable and \( v \) is our transformed variable, we have to multiply the posterior by a term \( \frac{du}{dv} \), which as the following shows, is just \( u \) for a log transformation.

\begin{align} v & = log(u) \cr u & = e^v \cr \frac{du}{dv} & = e^v \cr & = u \end{align}

Here is the correct sampler, with the additional term (referred to as lj for the log Jacobian) added in.

metl.correct <- function(x, lambda0, alph, bet, sigma, iters) {
    llambdvec <- numeric(iters)
    llambd <- log(lambda0)
    ll <- sum(dexp(x, exp(llambd), log = TRUE))
    lj <- llambd
    lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE) + lj
    lpost <- ll + lp
    for (i in 1:iters) {
        llambds <- llambd + rnorm(1, sd = sigma)
        lls <- sum(dexp(x, exp(llambds), log = TRUE))
        ljs <- llambds
        lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE) + 
            ljs
        lposts <- lls + lps
        A <- exp(lposts - lpost)
        if (runif(1) < A) {
            llambd <- llambds
            ll <- lls
            lp <- lps
            lpost <- lposts
        }
        llambdvec[i] <- llambd
    }
    return(llambdvec)
}

Now, we get the right answer using the reparameterisation.

outl.correct <- metl.correct(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.correct)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "metl.correct", 
    xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

plot of chunk unnamed-chunk-8

In another post, I'll talk about Hastings ratios and asymmetric proposal distributions.