Processing math: 100%

Monday, December 2, 2013

Metropolis sampling of positive values in R using rejection or transformation

This post is based on an earlier (excellent) one by Darren Wilkinson here. Some material bears repeating, however, and I've tried to make this example a little more applicable to problems such as estimating (exponential) rates in epidemiological models.

Consider the problem of fitting a set of n=100 exponential random variables with mean \lambda=3 .

set.seed(1)
lambd <- 3
n <- 100
x <- rexp(n, lambd)

If we assume a Gamma( \alpha , \beta ) prior on the rate coefficient, the posterior distribution of \lambda is Gamma( \alpha+n , \beta+n \hat{x} ), as this is the conjugate prior for an exponential distribution. However, imagine that we don't know this, and fit instead using a standard random walk Metropolis sampler. The problem is that \lambda>0 , yet the random walk sampler can in principle generate negative proposals for \lambda . One way to deal with this is to simply reject negative proposals, keeping the old values. The following function takes the data x , an initial value of \lambda , shape and rate parameters of the prior distribution, \alpha and \beta respectively, the standard deviation to be used in the random walk proposal, \sigma , and the number of iterations to sample.

met <- function(x, lambda0, alph, bet, sigma, iters) {
    lambdvec <- numeric(iters)
    lambd <- lambda0
    ll <- sum(dexp(x, lambd, log = TRUE))
    lp <- dgamma(lambd, shape = alph, rate = bet, log = TRUE)
    lpost <- ll + lp
    for (i in 1:iters) {
        lambds <- lambd + rnorm(1, sd = sigma)
        if (lambds > 0) {
            lls <- sum(dexp(x, lambds, log = TRUE))
            lps <- dgamma(lambds, shape = alph, rate = bet, log = TRUE)
            lposts <- lls + lps
            A <- exp(lposts - lpost)
            if (runif(1) < A) {
                lambd <- lambds
                ll <- lls
                lp <- lps
                lpost <- lposts
            }
        }
        lambdvec[i] <- lambd
    }
    return(lambdvec)
}

Now we assume values for the prior distibution that are 'vague', and fit our data x using 1,000,000 iterations.

lambda0 <- 1
alph <- 0.01
bet <- 0.01
sigma <- 1
burnin <- 10000
niters <- 1e+06
mux <- mean(x)
out <- met(x, lambda0, alph, bet, sigma, burnin + niters)

If we plot the MCMC runs against the true posterior, we have a good fit.

hist(out[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "met", xlab = expression(lambda), 
    col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

plot of chunk unnamed-chunk-4

Another way to ensure that negative proposals for \lambda are not generated is to sample log(\lambda) . However, a naive Metropolis sampler similar to the one above (but without the need to reject negative proposals) does not give the correct answer.

metl.incorrect <- function(x, lambda0, alph, bet, sigma, iters) {
    llambdvec <- numeric(iters)
    llambd <- log(lambda0)
    ll <- sum(dexp(x, exp(llambd), log = TRUE))
    lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE)
    lpost <- ll + lp
    for (i in 1:iters) {
        llambds <- llambd + rnorm(1, sd = sigma)
        lls <- sum(dexp(x, exp(llambds), log = TRUE))
        lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE)
        lposts <- lls + lps
        A <- exp(lposts - lpost)
        if (runif(1) < A) {
            llambd <- llambds
            ll <- lls
            lp <- lps
            lpost <- lposts
        }
        llambdvec[i] <- llambd
    }
    return(llambdvec)
}

The posterior from this distribution is close, but not the same, as the true posterior.

lsigma <- 0.1
outl.incorrect <- metl.incorrect(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.incorrect)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, 
    main = "metl.incorrect", xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

plot of chunk unnamed-chunk-6

In order to correctly sample from the posterior, we need to account for the change of variable. For this particular problem, this can be implemented by adding a term log(\lambda) to the (unnormalised) log posterior (see here for more information). If u is our original variable and v is our transformed variable, we have to multiply the posterior by a term \frac{du}{dv} , which as the following shows, is just u for a log transformation.

\begin{align} v & = log(u) \cr u & = e^v \cr \frac{du}{dv} & = e^v \cr & = u \end{align}

Here is the correct sampler, with the additional term (referred to as lj for the log Jacobian) added in.

metl.correct <- function(x, lambda0, alph, bet, sigma, iters) {
    llambdvec <- numeric(iters)
    llambd <- log(lambda0)
    ll <- sum(dexp(x, exp(llambd), log = TRUE))
    lj <- llambd
    lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE) + lj
    lpost <- ll + lp
    for (i in 1:iters) {
        llambds <- llambd + rnorm(1, sd = sigma)
        lls <- sum(dexp(x, exp(llambds), log = TRUE))
        ljs <- llambds
        lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE) + 
            ljs
        lposts <- lls + lps
        A <- exp(lposts - lpost)
        if (runif(1) < A) {
            llambd <- llambds
            ll <- lls
            lp <- lps
            lpost <- lposts
        }
        llambdvec[i] <- llambd
    }
    return(llambdvec)
}

Now, we get the right answer using the reparameterisation.

outl.correct <- metl.correct(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.correct)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "metl.correct", 
    xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

plot of chunk unnamed-chunk-8

In another post, I'll talk about Hastings ratios and asymmetric proposal distributions.