Friday, March 7, 2014

MCMC, C and Julia

A while ago in his blog, Darren Wilkinson compared the performance of a simple Gibbs sampler algorithm implemented in different computer languages. He sounded quite excited about Scala, as it has a nice syntax, and the performance isn't bad either. That being said, the C implementation ran the fastest. In this post, I compare the C implementation with one in Julia.

The sampler

The idea was to construct a Gibbs sampler for the bivariate distribution:

\[ f(x,y) = k\,x^2 \exp\{-xy^2 - y^2 + 2y - 4x\}, \quad x > 0,\; y \in \mathbb{R} \]

with unknown normalising constant \( k \).

The full conditionals are:

\[ \begin{align} x|y & \sim Ga(3,y^2+4) \cr y|x & \sim N\left(\frac{1}{1+x},\frac{1}{2\left(1+x\right)}\right) \end{align} \]
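
These follow directly from the joint density: holding one variable fixed, the terms involving the other form a Gamma kernel in \( x \) and, after completing the square, a Gaussian kernel in \( y \):

\[ \begin{align} f(x|y) & \propto x^{2} \exp\{-(y^2+4)x\} \cr f(y|x) & \propto \exp\{-(1+x)y^2+2y\} \propto \exp\left\{-(1+x)\left(y-\frac{1}{1+x}\right)^{2}\right\} \end{align} \]

The first is the kernel of a \( Ga(3,y^2+4) \) density, and the second that of a normal with mean \( \frac{1}{1+x} \) and variance \( \frac{1}{2(1+x)} \).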

In C

Interfacing with C code is fairly straightforward in Julia. First, we need to write a C library that can be dynamically loaded.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

/* Arrays xa and ya are passed by reference */
void gibbs(int N, int thin, double **xa, double **ya)
{
  int i,j;
  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
  double x=0;
  double y=0;
  for (i=0;i<N;i++) {
    for (j=0;j<thin;j++) {
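      /* gsl_ran_gamma uses the shape/scale parameterisation (here shape 3, scale 1/(y^2+4));
         gsl_ran_gaussian returns a zero-mean draw with the given standard deviation */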
      x=gsl_ran_gamma(r,3.0,1.0/(y*y+4));
      y=1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(2*x+2));
    }
    (*xa)[i]=x; /* Note how the arrays are dereferenced */
    (*ya)[i]=y;
  }
  gsl_rng_free(r); /* release the generator allocated by gsl_rng_alloc */
}

The idea here is that we allocate vectors to store x and y in Julia and then pass them to the C function, which fills in the values. This is the reason for the ugliness above: the arrays are passed by reference (hence the double pointers in the function header) and must be dereferenced to fill in each value.

On my Linux box, I compile the above as follows.

gcc -O3 -fpic -c libgibbs.c -o libgibbs.o -I/usr/local/include
gcc -shared -o libgibbs.so libgibbs.o -L/usr/local/lib -lgsl -lgslcblas -lm

On my OSX box, where gsl is installed via Homebrew, I can skip the include and library directory flags.

This can then be loaded into Julia.

N=50000
thin=1000
xac=Array(Float64,N)
yac=Array(Float64,N)
@elapsed ccall( (:gibbs, "./libgibbs.so"), Void, (Cint,Cint,Ptr{Ptr{Cdouble}},Ptr{Ptr{Cdouble}}), N, thin, &xac, &yac)

This does well, and on my machine, it finishes in about 6.5 seconds.

In Julia

using Distributions

function gibbs(N::Int64,thin::Int64)
  xa=Array(Float64,N)
  ya=Array(Float64,N)
  x=0.0
  y=0.0
  for i in 1:N
    for j in 1:thin
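      # Distributions.jl parameterisations: Gamma(shape, scale), Normal(mean, sd)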
      x=rand(Gamma(3.0,1.0/(y*y+4.0)))
      y=rand(Normal(1.0/(x+1.0),1.0/sqrt(2.0*x+2.0)))
    end
    xa[i]=x
    ya[i]=y
  end
  return(xa,ya)
end

We can run this as follows.

@elapsed xaj,yaj=gibbs(50000,1000)

Oddly enough, this runs in less time than the C version, at about 4.3 seconds on my Mac. In addition, the code avoids all the memory management fiddliness of the C version.
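
As a quick sanity check that the two implementations are targeting the same distribution, the empirical moments of the two sets of draws should agree up to Monte Carlo error. This is just a sketch, assuming the xac/yac arrays from the C call and the xaj/yaj arrays from the Julia run are still in scope.

# compare moments of the C-backed and pure-Julia chains
println((mean(xac), mean(xaj)))
println((var(yac), var(yaj)))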


Monday, December 2, 2013

Metropolis sampling of positive values in R using rejection or transformation

This post is based on an earlier (excellent) one by Darren Wilkinson here. Some material bears repeating, however, and I've tried to make this example a little more applicable to problems such as estimating (exponential) rates in epidemiological models.

Consider the problem of fitting a set of \( n=100 \) exponential random variables generated with rate \( \lambda=3 \).

set.seed(1)
lambd <- 3
n <- 100
x <- rexp(n, lambd)

If we assume a Gamma(\( \alpha \),\( \beta \)) prior on the rate, the posterior distribution of \( \lambda \) is Gamma(\( \alpha+n \),\( \beta+n \bar{x} \)), since the Gamma is the conjugate prior for the exponential rate. However, imagine that we don't know this, and instead fit the data using a standard random walk Metropolis sampler. The problem is that \( \lambda>0 \), yet the random walk sampler can in principle generate negative proposals for \( \lambda \). One way to deal with this is simply to reject negative proposals, keeping the old value. The function met below takes the data \( x \), an initial value of \( \lambda \), the shape and rate parameters of the prior distribution (\( \alpha \) and \( \beta \) respectively), the standard deviation \( \sigma \) of the random walk proposal, and the number of iterations to sample.
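
For reference, the conjugate update behind this closed-form posterior takes only a couple of lines: multiplying the Gamma(\( \alpha \),\( \beta \)) prior density by the exponential likelihood gives

\[ \begin{align} p(\lambda|x) & \propto \lambda^{\alpha-1}e^{-\beta\lambda}\prod_{i=1}^{n}\lambda e^{-\lambda x_i} \cr & = \lambda^{\alpha+n-1}e^{-(\beta+n\bar{x})\lambda} \end{align} \]

which is the kernel of a Gamma(\( \alpha+n \),\( \beta+n\bar{x} \)) density; this is the curve used below to check the samplers.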

met <- function(x, lambda0, alph, bet, sigma, iters) {
    lambdvec <- numeric(iters)
    lambd <- lambda0
    ll <- sum(dexp(x, lambd, log = TRUE))
    lp <- dgamma(lambd, shape = alph, rate = bet, log = TRUE)
    lpost <- ll + lp
    for (i in 1:iters) {
        lambds <- lambd + rnorm(1, sd = sigma)
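        # proposals with lambds <= 0 have zero posterior density, so they are
        # always rejected and the current value of lambd is kept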
        if (lambds > 0) {
            lls <- sum(dexp(x, lambds, log = TRUE))
            lps <- dgamma(lambds, shape = alph, rate = bet, log = TRUE)
            lposts <- lls + lps
            A <- exp(lposts - lpost)
            if (runif(1) < A) {
                lambd <- lambds
                ll <- lls
                lp <- lps
                lpost <- lposts
            }
        }
        lambdvec[i] <- lambd
    }
    return(lambdvec)
}

Now we assume 'vague' values for the prior distribution, and fit our data \( x \) using 1,000,000 iterations after a burn-in of 10,000.

lambda0 <- 1
alph <- 0.01
bet <- 0.01
sigma <- 1
burnin <- 10000
niters <- 1e+06
mux <- mean(x)
out <- met(x, lambda0, alph, bet, sigma, burnin + niters)

If we plot the MCMC samples against the true posterior, we see a good fit.

hist(out[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "met", xlab = expression(lambda), 
    col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

[Figure: histogram of the sampled values of \( \lambda \) from met, with the true Gamma(\( \alpha+n \),\( \beta+n\bar{x} \)) posterior density overlaid]

Another way to ensure that negative proposals for \( \lambda \) are not generated is to sample \( \log(\lambda) \). However, a naive Metropolis sampler similar to the one above (but without the need to reject negative proposals) does not give the correct answer.

metl.incorrect <- function(x, lambda0, alph, bet, sigma, iters) {
    llambdvec <- numeric(iters)
    llambd <- log(lambda0)
    ll <- sum(dexp(x, exp(llambd), log = TRUE))
    lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE)
    lpost <- ll + lp
    for (i in 1:iters) {
        llambds <- llambd + rnorm(1, sd = sigma)
        lls <- sum(dexp(x, exp(llambds), log = TRUE))
        lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE)
        lposts <- lls + lps
        A <- exp(lposts - lpost)
        if (runif(1) < A) {
            llambd <- llambds
            ll <- lls
            lp <- lps
            lpost <- lposts
        }
        llambdvec[i] <- llambd
    }
    return(llambdvec)
}

The posterior from this sampler is close to, but not the same as, the true posterior.

lsigma <- 0.1
outl.incorrect <- metl.incorrect(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.incorrect)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, 
    main = "metl.incorrect", xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

[Figure: histogram of the (exponentiated) samples from metl.incorrect, with the true posterior density overlaid]

In order to sample correctly from the posterior, we need to account for the change of variable. For this particular problem, this can be implemented by adding a term \( \log(\lambda) \) to the (unnormalised) log posterior (see here for more information). If \( u \) is our original variable and \( v \) is our transformed variable, we have to multiply the posterior density by the Jacobian term \( \frac{du}{dv} \), which, as the following shows, is just \( u \) for a log transformation.

\[ \begin{align} v & = \log(u) \cr u & = e^v \cr \frac{du}{dv} & = e^v = u \end{align} \]
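
In log terms, the target for \( v=\log(\lambda) \) therefore picks up an extra \( \log(\lambda)=v \) (writing \( f \) for the exponential likelihood and \( p \) for the Gamma prior density):

\[ \log \pi_v(v) = \sum_{i=1}^{n}\log f(x_i|\lambda) + \log p(\lambda) + \log(\lambda), \qquad \lambda=e^{v} \]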

Here is the correct sampler, with the additional term (referred to as lj for the log Jacobian) added in.

metl.correct <- function(x, lambda0, alph, bet, sigma, iters) {
    llambdvec <- numeric(iters)
    llambd <- log(lambda0)
    ll <- sum(dexp(x, exp(llambd), log = TRUE))
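    # log-Jacobian of lambda = exp(llambd): contributes log(lambda) = llambd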
    lj <- llambd
    lp <- dgamma(exp(llambd), shape = alph, rate = bet, log = TRUE) + lj
    lpost <- ll + lp
    for (i in 1:iters) {
        llambds <- llambd + rnorm(1, sd = sigma)
        lls <- sum(dexp(x, exp(llambds), log = TRUE))
        ljs <- llambds
        lps <- dgamma(exp(llambds), shape = alph, rate = bet, log = TRUE) + 
            ljs
        lposts <- lls + lps
        A <- exp(lposts - lpost)
        if (runif(1) < A) {
            llambd <- llambds
            ll <- lls
            lp <- lps
            lpost <- lposts
        }
        llambdvec[i] <- llambd
    }
    return(llambdvec)
}

Now, we get the right answer using the reparameterisation.

outl.correct <- metl.correct(x, lambda0, alph, bet, lsigma, burnin + niters)
hist(exp(outl.correct)[(burnin + 1):(burnin + niters)], 100, freq = FALSE, main = "metl.correct", 
    xlab = expression(lambda), col = "gray")
curve(dgamma(x, shape = alph + n, rate = bet + n * mux), add = TRUE, col = 2, 
    lwd = 2)

[Figure: histogram of the (exponentiated) samples from metl.correct, with the true posterior density overlaid]

In another post, I'll talk about Hastings ratios and asymmetric proposal distributions.