I often find the need to plot out probability distributions of parameters, for example posterior distributions. There are lots of ways to do this in Julia, and a clear 'winner' for plotting has yet to be established (although there's always room for implementations). Here, I outline some of the ways to generate probability histograms.
To generate the data, I first need to load Distributions
and DataFrames
.
using Distributions
using DataFrames;
There doesn't appear to be a function in the standard library to normalise histograms, so here is a function to calculate the total area under the curve for a histogram, using the tuple returned by the hist
function.
function auc(h::(Range{Float64},Array{Int64,1}))
freq=convert(Array{Float64,1},h[2])
e=convert(Array{Float64,1},h[1])
numbins=length(e)
deltax=e[2:numbins]-e[1:(numbins-1)]
sum(deltax.*freq)
end;
I now generate some random data from a $Gamma(3,1)$ distribution.
srand(1)
x=rand(Distributions.Gamma(3,1),10000)
xhist=hist(x,100)
nbins=length(xhist[1])
xauc=auc(xhist)
xvec=vec([z::Float64 for z in xhist[1]])
xdf=DataFrame(xmin=vec(xvec[1:(nbins-1)]),xmax=vec(xvec[2:nbins]),count=xhist[2])
xdf[:dens]=xdf[:count]/xauc;
Here's what the data look like.
xdf
I generate a dataframe containing points from the PDF, which I will use to compare the random values.
xp=DataFrame(x=linspace(0,15,1501),dens=pdf(Distributions.Gamma(3,1),linspace(0,15,1501)))
xp
PyPlot
As I use IJulia a lot, the most straightforward place to start is to use PyPlot, which is just an interface to Python's matplotlib. PyPlot has an option to normalise the histogram, so we pass the raw data, and PyPlot generates the histogram for us.
using PyPlot
PyPlot.plt.hist(x,nbins,normed="True")
PyPlot.plt.plot(xp[:x],xp[:dens])
Gadfly
Gadfly is one of the more polished plotting packages out there, and takes many of its cues from Hadley Wickham's excellent ggplot2 library. However, it seems that there isn't an option to normalise a histogram, so I use a barplot of the pre-computed densities instead. To get different colours for the histogram and the density plot, I use Themes.
using Gadfly
Gadfly.plot(layer(x=xdf[:xmin],y=xdf[:dens],Geom.bar,Theme(default_color=color("lightgray"))),
layer(x=xp[:x],y=xp[:dens],Geom.line,Theme(default_color=color("orange"))))
Winston
Winston is another nice plotting library, which is very easy to use. It can take the output of the hist
command, as well as easily add a curve on top.
using Winston
p = Winston.FramedPlot(title="", xlabel="x", ylabel="Density")
xhist2=(xhist[1],xhist[2]./xauc)
Winston.add(p, Winston.Histogram(xhist2...))
Winston.add(p, Winston.Curve(xp[:x],xp[:dens],color="orange"))
Plotly
Plotly is an online service that generates some nice plots; one will need an account and an API key to be able to use it, however (fill in yours below). To install, you need to run Pkg.clone("https://github.com/plotly/Plotly.jl")
. I also had to edit the URL, as I run my IJulia notebook over HTTPS, and Chrome doesn't like insecure content loaded from a secure site.
using Plotly
username=""
api_key=""
Plotly.signin(username, api_key)
data0 = ["x" => x, "type" => "histogramx", "histnorm" => "probability density"]
data1 = ["x" => xp["x"],"y" => xp["dens"], "type" => "scatter"]
layout = {
"title"=> "PDF",
"autosize"=> "false",
"width"=> 700,
"height"=> 700,
"titlefont"=>
{
"family"=> "Open Sans",
"size"=> 18,
"color"=> "rgb(84, 39, 143)"
},
"margin"=> {"l"=>160, "pad"=>0},
"xaxis"=> {
"title"=> "x"
},
"yaxis"=> {"title"=> "Density"},
"showlegend" => false
};
response = Plotly.plot([data0, data1],["layout"=>layout])
url = response["url"]
url = replace(url,"http","https")
s = string("<iframe height='750' id='igraph' scrolling='no' seamless='seamless' src='",
url,
"/700/700' width='750'></iframe>")
display("text/html", s);
Conclusions
There are some other packages for plotting in Julia, like Gaston and Vega. Gaston is an interface to Gnuplot. Out of the box, it supports basic histograms, but if you want to do anything more complex, like add a line to a histogram, one has to drop down to the mid-level interface. In addition, Gaston doesn't currently support inline images in IJulia, and I've had problems with it recently. Vega has potential, but isn't as fully featured as some of the other plotting packages and hasn't been updated recently.
Nice post, Thanks!
ReplyDeleteThe syntax for both histograms and tuples for arguments has changed as of Julia 0.5+, so the above code to generate the data doesn't work. Try the following:
ReplyDelete```
using Distributions
using DataFrames
function auc(h::Tuple{Range{Float64},Array{Int64,1}})
freq=convert(Array{Float64,1},h[2])
e=convert(Array{Float64,1},h[1])
numbins=length(e)
deltax=e[2:numbins]-e[1:(numbins-1)]
sum(deltax.*freq)
end;
function auc(h::Histogram)
freq=convert(Array{Float64,1},h.weights)
e=convert(Array{Float64,1},h.edges[1])
numbins=length(e)
deltax=e[2:numbins]-e[1:(numbins-1)]
sum(deltax.*freq)
end;
srand(1)
x=rand(Distributions.Gamma(3,1),10000)
xhist=hist(x,100)
nbins=length(xhist[1])
xauc=auc(xhist)
xvec=vec([z::Float64 for z in xhist[1]])
xdf=DataFrame(xmin=vec(xvec[1:(nbins-1)]),xmax=vec(xvec[2:nbins]),count=xhist[2])
xdf[:dens]=xdf[:count]/xauc
xhist2=fit(Histogram,x,nbins=100)
xauc2=auc(xhist2)
nbins2=length(xhist2.edges[1])
xvec2=vec([z::Float64 for z in xhist2.edges[1]])
xdf2=DataFrame(xmin=vec(xvec2[1:(nbins-1)]),xmax=vec(xvec2[2:nbins]),count=xhist2.weights)
xdf2[:dens]=xdf2[:count]/xauc2
```
Great post! The `hist` function has been removed in Juia 0.6:
Deletehttps://github.com/JuliaLang/julia/pull/16450
Nice post, thank you!
ReplyDelete