While R is great for statistics, like many dynamic languages, it's terribly slow in loops. This can make dynamical models, like those commonly used in epidemiology, rather slow. However, thanks to the wonderful Rcpp package, such simulations can be sped up considerably. Let's take the 'standard' susceptible-infected-recovered (SIR) model. A deterministic version of this model is as follows.
\[\begin{align} \frac{dS(t)}{dt} & = -\beta S(t) I(t) \cr \frac{dI(t)}{dt} & = \beta S(t) I(t)- \gamma I(t) \cr \frac{dR(t)}{dt} & = \gamma I(t) \end{align}\]
Let's consider a stochastic version of the SIR model.
\[ \begin{align} {\rm Transition} & \quad {\rm Rate} \cr S \rightarrow S-1,\; I \rightarrow I+1 & \quad \beta S(t) I(t) \cr I \rightarrow I-1,\; R \rightarrow R+1 & \quad \gamma I(t) \end{align}\]
We can simulate this model using the Gillespie method, also known as the stochastic simulation algorithm, or SSA. In plain R, the stochastic model is as follows.
Here is the version written using Rcpp, inlining the C++ code.
Now we run both the vanilla R and Rcpp versions of the same model 100 times, with parameter values \( \beta \) = 0.1/10000 and \( \gamma \) = 0.05, initial conditions \( S(0) \) =9999, \( I(0) \)=1, \( R(0) \)=0, and a simulation time of 1000.
As you can see, the Rcpp version is considerably faster.
\[\begin{align} \frac{dS(t)}{dt} & = -\beta S(t) I(t) \cr \frac{dI(t)}{dt} & = \beta S(t) I(t)- \gamma I(t) \cr \frac{dR(t)}{dt} & = \gamma I(t) \end{align}\]
Let's consider a stochastic version of the SIR model.
\[ \begin{align} {\rm Transition} & \quad {\rm Rate} \cr S \rightarrow S-1,\; I \rightarrow I+1 & \quad \beta S(t) I(t) \cr I \rightarrow I-1,\; R \rightarrow R+1 & \quad \gamma I(t) \end{align}\]
We can simulate this model using the Gillespie method, also known as the stochastic simulation algorithm, or SSA. In plain R, the stochastic model is as follows.
sir <- function(beta, gamma, N, S0, I0, R0, tf) {
time <- 0
S <- S0
I <- I0
R <- R0
ta <- numeric(0)
Sa <- numeric(0)
Ia <- numeric(0)
Ra <- numeric(0)
while (time < tf) {
ta <- c(ta, time)
Sa <- c(Sa, S)
Ia <- c(Ia, I)
Ra <- c(Ra, R)
pf1 <- beta * S * I
pf2 <- gamma * I
pf <- pf1 + pf2
dt <- rexp(1, rate = pf)
time <- time + dt
if (time > tf) {
break
}
ru <- runif(1)
if (ru < (pf1/pf)) {
S <- S - 1
I <- I + 1
} else {
I <- I - 1
R <- R + 1
}
if (I == 0) {
break
}
}
results <- data.frame(time = ta, S = Sa, I = Ia, R = Ra)
# print(head(results))
return(results)
}
Here is the version written using Rcpp, inlining the C++ code.
library(Rcpp)
cppFunction('
List sirc(double beta, double gamma, double N, double S0, double I0, double R0, double tf){
double t = 0;
double S = S0;
double I = I0;
double R = R0;
std::vector<double> ta;
std::vector<double> Sa;
std::vector<double> Ia;
std::vector<double> Ra;
do{
ta.push_back(t);
Sa.push_back(S);
Ia.push_back(I);
Ra.push_back(R);
double pf1 = beta*S*I;
double pf2 = gamma*I;
double pf = pf1+pf2;
double dt = rexp(1,pf)[0];
t += dt;
double r = runif(1)[0];
if(r<pf1/pf){
S--;
I++;
}else{
I--;
R++;
}
if(I==0){break;}
} while (t<=tf && (I>0));
return List::create(_["time"] = ta, _["S"] = Sa, _["I"] = Ia, _["R"]=Ra);
}'
)
Now we run both the vanilla R and Rcpp versions of the same model 100 times, with parameter values \( \beta \) = 0.1/10000 and \( \gamma \) = 0.05, initial conditions \( S(0) \) =9999, \( I(0) \)=1, \( R(0) \)=0, and a simulation time of 1000.
N <- 100 # Run 100 times
set.seed(4)
sirsim <- sum(replicate(N, system.time(x <- sir(0.1/10000, 0.05, 10000, 9999,
1, 0, 1000))["elapsed"]), trim = 0.05)
set.seed(4)
sircsim <- sum(replicate(N, system.time(x <- sirc(0.1/10000, 0.05, 10000, 9999,
1, 0, 1000))["elapsed"]), trim = 0.05)
As you can see, the Rcpp version is considerably faster.
paste("Plain R:", sirsim)
## [1] "Plain R: 114.178"
paste("Rcpp version:", sircsim)
## [1] "Rcpp version: 0.175000000000028"
paste("R took", sirsim/sircsim, "times longer")
## [1] "R took 652.445714285608 times longer"
No comments:
Post a Comment