Stan vs PyMC3 (vs Edward)

Sachin Abeywardana
Towards Data Science
4 min read · Sep 28, 2017


The holy trinity when it comes to being Bayesian. I will share my experience with the first two packages and give my high-level opinion of the third (I haven’t used it in practice). And of course, then there are the mad men (old professors who are becoming irrelevant) who actually write their own Gibbs samplers.

[GIF: My reaction to people writing their own samplers]

Here’s my 30-second intro to all three. You specify the generative model for the data, feed in the data as observations, and it samples from the posterior of the model’s parameters for you. Magic!

Stan was the first probabilistic programming language that I used. If you come from a statistical background, it’s the one that will make the most sense. You can write things like mu ~ N(0, 1) almost directly. The documentation is absolutely amazing. Personally, I wouldn’t mind using the Stan reference manual as an intro to Bayesian learning, considering it shows you how to model data. The examples are quite extensive.
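For instance, here’s a minimal sketch of a Stan model run through PyStan, the Python interface current at the time of writing (the toy model and data are my own, not from the original post):

import pystan  # PyStan 2.x interface
import numpy as np

# A toy Stan model: the ~ notation reads like statistics written on paper.
model_code = """
data {
    int<lower=0> N;
    vector[N] y;
}
parameters {
    real mu;
}
model {
    mu ~ normal(0, 1);   // prior, i.e. mu ~ N(0, 1)
    y ~ normal(mu, 1);   // likelihood
}
"""

y = np.random.randn(50)                       # fake observations
sm = pystan.StanModel(model_code=model_code)  # compiles the model (slow the first time)
fit = sm.sampling(data={'N': len(y), 'y': y}, iter=1000, chains=4)
print(fit)                                    # posterior summary for mu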

PyMC3, on the other hand, was made with the Python user specifically in mind. Most of the data science community is migrating to Python these days, so that’s not really an issue at all. You can see a code example below. The syntax isn’t quite as nice as Stan’s, but it’s still workable. I really don’t like how you have to name each variable again as a string, but this is a side effect of using theano in the backend. The pm.sample part simply samples from the posterior. I love the fact that it isn’t fazed even when I have a discrete variable to sample, which Stan so far cannot do.

[Image: PyMC3 sample code]
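The original code image doesn’t survive in text, so here’s a minimal sketch in the same spirit (my own toy example; note the repeated string names, and the discrete variable that pm.sample handles without complaint):

import numpy as np
import pymc3 as pm

y = np.random.randn(100)  # fake observations

with pm.Model() as model:
    # The string 'mu' repeats the Python variable name -- the theano
    # side effect complained about above.
    mu = pm.Normal('mu', mu=0, sd=1)
    sigma = pm.HalfNormal('sigma', sd=1)
    # A discrete latent variable: PyMC3 samples it (via a compound step),
    # whereas Stan would force you to marginalise it out by hand.
    k = pm.DiscreteUniform('k', lower=0, upper=5)
    obs = pm.Normal('y', mu=mu + k, sd=sigma, observed=y)
    trace = pm.sample(1000)  # draws from the posterior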

As far as documentation goes, it’s not quite as extensive as Stan’s in my opinion, but the examples are really good. Combine that with Thomas Wiecki’s blog and you have a complete guide to data analysis with Python.

The reason PyMC3 is my go-to (Bayesian) tool comes down to one thing and one thing alone: the pm.variational.advi_minibatch function. Bayesian models really struggle when they have to deal with a reasonably large amount of data (~10,000+ data points). Variational inference is one way of doing approximate Bayesian inference. Both Stan and PyMC3 have this. But it is the extra step that PyMC3 has taken, expanding this to work on mini-batches of data, that’s made me a fan.
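The advi_minibatch API has shifted between PyMC3 releases, so here’s a sketch of the same idea using the pm.Minibatch interface from later 3.x versions (the toy model and numbers are my own):

import numpy as np
import pymc3 as pm

data = np.random.randn(50000)  # a "large" dataset

# Stream random minibatches of 500 points instead of the full 50k.
batch = pm.Minibatch(data, batch_size=500)

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    # total_size tells PyMC3 how to rescale the minibatch likelihood so
    # the ELBO is an unbiased estimate of the full-data bound.
    obs = pm.Normal('y', mu=mu, sd=1, observed=batch, total_size=len(data))
    # ADVI: fit a factorised Gaussian approximation to the posterior.
    approx = pm.fit(n=10000, method='advi')

trace = approx.sample(1000)  # draw samples from the fitted approximation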

The Maths (optional)

Getting just a bit into the maths, what variational inference does is maximise a lower bound on the log probability of the data, log p(y).

$$\log p(y) \;\geq\; \mathbb{E}_{q}\!\left[\log p(z_g) - \log q(z_g)\right] \;+\; \sum_{i=1}^{N} \mathbb{E}_{q}\!\left[\log p(y_i, z_i \mid z_g) - \log q(z_i)\right]$$

We try to maximise this lower bound by varying the hyper-parameters of the proposal distributions q(z_i) and q(z_g). Here z_i refers to the hidden (latent) variables that are local to the data instance y_i, whereas z_g are global hidden variables. Last I checked, PyMC3 could only handle cases where all hidden variables are global (I might be wrong here).

Essentially, what I feel PyMC3 hasn’t gone far enough with is letting me treat this as truly just an optimization problem. The second term can be approximated with

$$\sum_{i=1}^{N} \mathbb{E}_{q}\!\left[\log p(y_i, z_i \mid z_g) - \log q(z_i)\right] \;\approx\; \frac{N}{n} \sum_{i \in \mathcal{S}} \mathbb{E}_{q}\!\left[\log p(y_i, z_i \mid z_g) - \log q(z_i)\right]$$

where \mathcal{S} is a random minibatch, n is the minibatch size and N is the size of the entire dataset. This is the essence of what has been written in this paper by Matthew Hoffman. I want to specify the model/joint probability and let theano simply optimize the hyper-parameters of q(z_i) and q(z_g). This is where GPU acceleration would really come into play. Stan is really lagging behind in this area because it isn’t using theano/tensorflow as a backend.

Edward

I’ve kept quiet about Edward so far; I haven’t used it in practice. The main reason is that it just doesn’t have the documentation and examples needed to use it comfortably. It is true that I can feed PyMC3 or Stan models directly into Edward, but by the sound of it I’d need to write Edward-specific code to use the TensorFlow acceleration. I’ve got a feeling that Edward might be doing Stochastic Variational Inference, but it’s a shame that the documentation and examples aren’t up to scratch the way PyMC3’s and Stan’s are. However, I must say that Edward is showing the most promise when it comes to the future of Bayesian learning (due to a lot of work being done in Bayesian deep learning).

So, in conclusion, PyMC3 is the clear winner for me these days. It would be great if I didn’t have to be exposed to the theano framework every now and then, but otherwise it’s a really good tool. I would love to see Edward or PyMC3 move to a Keras or Torch backend, just because it would mean we could model (and debug) better. Happy modelling!

See here for my course on Machine Learning and Deep Learning (use code DEEPSCHOOL-MARCH for 85% off).
