Productive research with custom IPython extensions, part 2: BigQuery magic

Travis Kaufman
Towards Data Science
3 min readJul 17, 2018

--

After completing work on my custom IPython auto-loader extension for importing data science libraries, I tried to save myself even more time. I wanted to access data in my data warehouse directly by querying BigQuery from IPython. This blog post describes my approach to that problem. I’ve included an “appendix” at the bottom showing some example analysis using my extension with the BigQuery HackerNews dataset.

I should point out — however — that the BigQuery Python client actually has IPython support built-in. I built my extension without knowing this, but I still found the exercise useful as it demystified a lot of what was happening under the hood with IPython’s extensions.

I use Google’s BigQuery as my go-to data warehouse; it is insanely robust, very fast, easy to use, and super cost-effective. Most of the time I find myself querying BigQuery and loading the result into a dataframe using BigQuery’s python client library. Wouldn’t it be awesome if I could just type a SQL query into an IPython cell, and have the cell execute that SQL on BigQuery and store the result in a DataFrame? I’d want to be able to do something like this:

Turns out embedding SQL for BigQuery directly into notebook cells is possible in IPython via the use of cell magics. Cell magics are those commands that look like %%this which reads the contents of an entire cell and calls a registered magic function that two positional arguments, line and cell.The line argument contains the string of the line used to invoke the cell magic, e.g. the %%bq line. The cell argument is also a string that contains the rest of the contents of the cell, not including the line.
If we take the example above, the magic function bq would be called with "--name hn_daily_comment_counts" as the line argument, and "SELECT\n...\nday ASC;" — or the contents of the rest of the cell after the initial line — as the cell argument.

We can use the BigQuery python client along with the cell magic API to have the magic function query BigQuery and return the result.

As you can see above, I contain all of my magics (in this case just one) in a ResearchEnvMagics class. It’s important that this class inherits from IPython’s Magics class, as well as has the @magics_class decorator attached to it.

I also make use of IPython’s magic_arguments package in order to allow passing additional configuration parameters to my cell magic, similar to what you would use on the command line. This is a common pattern for IPython magics, hence they provide a built-in package for it, which is awesome!

The final step here is to register these magics with IPython via the extension load hook:

And that’s all there is to it! Now that the magic function is registered, it can be used in Jupyter notebooks to easily perform research and gain insights.

Of course, after building this BigQuery magic I found out by digging through the client library API docs one night that it already ships with IPython magics 😑. I wish that had been at all documented somewhere, but I enjoyed building it out nonetheless and learned a lot about cell magics in IPython. As a next step, I’d like to figure out how to add SQL syntax highlighting for the BigQuery cell magic.

I hope you found this useful, and if you use special custom IPython magics in your research environments, let me know in the comments! Would love to share information and tips. Check out below to see the extension in action on some example queries from BigQuery’s HackerNews dataset.

Simple example using BigQuery to display daily comment counts.
A more advanced example looking at comment “shocks” by taking the weekly moving average and dividing it by the weekly moving standard deviation.
Cleaning, annotating, and charting the data from the previous query. Note that I used scale() above while I was examining the data in order to de-mean it. Apparently, a post on dealing with alienation was a huge driver of comments on HackerNews around the 2008–2010 time. Very poignant, and probably deserves an article unto itself.

--

--

Software engineer specializing in UI / UX development. Proud New Yorker, lifelong learner. ⚡️Gryffindor ⚡️