BERT in Keras with TensorFlow Hub

Jacob Zweig
Towards Data Science
2 min read · Mar 21, 2019


At Strong Analytics, many of our projects involve using deep learning for natural language processing. In one recent project we worked to help kids explore freely online while staying safe from cyberbullying and online abuse; another involved predicting deductible expenses from calendar and email events.

A key component of any NLP project is the ability to rapidly test and iterate on new techniques. Keras offers a very quick way to prototype state-of-the-art deep learning models, and is therefore an important tool in our work.

In a previous post, we demonstrated how to integrate ELMo embeddings as a custom Keras layer to simplify model prototyping using TensorFlow Hub. BERT, a language model introduced by Google, uses transformers and pre-training to achieve state-of-the-art results on many language tasks. It has recently been added to TensorFlow Hub, which simplifies its integration into Keras models.

Deeply bidirectional unsupervised language representations with BERT

Let’s get building! First, we load the same IMDB data we used previously:
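Here is a minimal sketch of the loading step, assuming the aclImdb movie review dataset hosted by Stanford and pandas for wrangling; helper names like load_dataset are illustrative:

```python
# A minimal sketch: download the IMDB review dataset and read it into DataFrames.
import os
import pandas as pd
import tensorflow as tf

def load_directory_data(directory):
    # Read every review file in a directory into a DataFrame.
    data = {"sentence": []}
    for fname in os.listdir(directory):
        with open(os.path.join(directory, fname), encoding="utf-8") as f:
            data["sentence"].append(f.read())
    return pd.DataFrame.from_dict(data)

def load_dataset(directory):
    # Combine positive and negative reviews and add a polarity label.
    pos_df = load_directory_data(os.path.join(directory, "pos"))
    neg_df = load_directory_data(os.path.join(directory, "neg"))
    pos_df["polarity"] = 1
    neg_df["polarity"] = 0
    return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

dataset = tf.keras.utils.get_file(
    fname="aclImdb.tar.gz",
    origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz",
    extract=True)

train_df = load_dataset(os.path.join(os.path.dirname(dataset), "aclImdb", "train"))
test_df = load_dataset(os.path.join(os.path.dirname(dataset), "aclImdb", "test"))
```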

Next, we tokenize the data using the tf-hub model, which simplifies preprocessing:
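The sketch below assumes the bert_uncased_L-12_H-768_A-12 module on TensorFlow Hub (TF 1.x) and FullTokenizer from the bert-tensorflow package; the feature-conversion helper and variable names are illustrative:

```python
# A sketch of tokenization against the hub module (TF 1.x style).
# Assumes `pip install bert-tensorflow tensorflow-hub` and the DataFrames from the previous step.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from bert.tokenization import FullTokenizer

bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
max_seq_len = 128

def create_tokenizer_from_hub_module():
    # The hub module exposes its vocab file and casing via the "tokenization_info" signature.
    with tf.Graph().as_default():
        bert_module = hub.Module(bert_path)
        tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
        with tf.Session() as sess:
            vocab_file, do_lower_case = sess.run(
                [tokenization_info["vocab_file"], tokenization_info["do_lower_case"]])
    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)

def convert_sentence_to_features(sentence, tokenizer):
    # Truncate, add the [CLS]/[SEP] markers BERT expects, and zero-pad to max_seq_len.
    tokens = ["[CLS]"] + tokenizer.tokenize(sentence)[:max_seq_len - 2] + ["[SEP]"]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_mask = [1] * len(input_ids)
    segment_ids = [0] * len(input_ids)
    padding = [0] * (max_seq_len - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding

tokenizer = create_tokenizer_from_hub_module()

def convert_df(df):
    features = [convert_sentence_to_features(s, tokenizer) for s in df["sentence"]]
    ids, masks, segments = map(np.array, zip(*features))
    return ids, masks, segments, df["polarity"].values

train_input_ids, train_input_masks, train_segment_ids, train_labels = convert_df(train_df)
test_input_ids, test_input_masks, test_segment_ids, test_labels = convert_df(test_df)
```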

Next, we build a custom Keras layer that integrates BERT from TensorFlow Hub. The model is very large (110,302,011 parameters!), so we fine-tune only a subset of its layers.
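A minimal sketch of such a layer is shown below: it wraps the hub module, exposes only the last n_fine_tune_layers variables as trainable, and returns BERT's pooled [CLS] output. The selection logic here is illustrative.

```python
# A sketch of a Keras layer wrapping the BERT hub module (TF 1.x).
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import backend as K

bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

class BertLayer(tf.keras.layers.Layer):
    def __init__(self, n_fine_tune_layers=10, **kwargs):
        self.n_fine_tune_layers = n_fine_tune_layers
        self.trainable = True
        self.output_size = 768  # hidden size of BERT-Base
        super(BertLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.bert = hub.Module(bert_path, trainable=self.trainable,
                               name="{}_module".format(self.name))
        # Drop the pre-training ("cls") head and keep only the last few variables trainable.
        trainable_vars = [var for var in self.bert.variables if "/cls/" not in var.name]
        trainable_vars = trainable_vars[-self.n_fine_tune_layers:]
        for var in trainable_vars:
            self._trainable_weights.append(var)
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)
        super(BertLayer, self).build(input_shape)

    def call(self, inputs):
        input_ids, input_mask, segment_ids = [K.cast(x, dtype="int32") for x in inputs]
        bert_inputs = dict(input_ids=input_ids, input_mask=input_mask,
                           segment_ids=segment_ids)
        # Use the pooled [CLS] representation as the sentence embedding.
        return self.bert(inputs=bert_inputs, signature="tokens",
                         as_dict=True)["pooled_output"]

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_size)
```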

Now, we can easily build and train our model using the BERT layer:
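A sketch of the model assembly and training loop, reusing the feature arrays from the tokenization step; the classification head and hyperparameters are illustrative:

```python
# Three integer inputs feed the BertLayer, followed by a small classification head.
import tensorflow as tf

max_seq_len = 128

in_id = tf.keras.layers.Input(shape=(max_seq_len,), name="input_ids")
in_mask = tf.keras.layers.Input(shape=(max_seq_len,), name="input_masks")
in_segment = tf.keras.layers.Input(shape=(max_seq_len,), name="segment_ids")
bert_inputs = [in_id, in_mask, in_segment]

bert_output = BertLayer(n_fine_tune_layers=3)(bert_inputs)
dense = tf.keras.layers.Dense(256, activation="relu")(bert_output)
pred = tf.keras.layers.Dense(1, activation="sigmoid")(dense)

model = tf.keras.models.Model(inputs=bert_inputs, outputs=pred)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

# TF 1.x hub modules need their variables and tables initialized before training.
sess = tf.keras.backend.get_session()
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())

model.fit(
    [train_input_ids, train_input_masks, train_segment_ids], train_labels,
    validation_data=([test_input_ids, test_input_masks, test_segment_ids], test_labels),
    epochs=1,
    batch_size=32,
)
```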

Using a GPU for large models like BERT is advised!

Pretty easy! See the full notebook on GitHub and build cool stuff!

Want to work on challenging NLP, Machine Learning, and AI in a variety of industries with a team of top data scientists in Chicago? We’re hiring talented data scientists and engineers!

Learn more at strong.io and apply at careers.strong.io
