TensorFlow/Keras cheat sheet

All the Standard Patterns We Always Use Collected in One Place

Dmytro Nikolaiev (Dimid)
Towards Data Science



In this article, you will find a complete cheat sheet for building neural networks with TensorFlow/Keras. I prepared it while getting ready for the TensorFlow Developer Certificate exam, and it helped me a lot. Read the following article to learn more about my experience with the exam.

You can access this cheat sheet in the following GitHub repository.

Without further ado, let’s get down to work.

Cheat Sheet Structure

To keep the narrative consistent, we will first look at some typical neural network architectures built with the Sequential API and then consider an example of a non-sequential architecture built with the Functional API. After that, we will cover the main methods of the tf.keras.Model object: compile, fit (using callbacks), evaluate, save, and load.

We will also take a look at ImageDataGenerator for computer vision and at tokenizing/padding sentences for NLP tasks. In the repository, this code is located at the beginning of the code file, but here I decided to put it at the end: it is more specialized, yet still widely used.

Imports

So as not to be distracted by imports later, I will list them right away. I don't find it convenient to import each layer separately, so I access layers as tf.keras.layers.<LayerName> or layers.<LayerName>. This is personal taste, so your imports may be slightly different.
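A minimal version of these imports might look like this (the exact list depends on your tasks):

```python
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras import layers

# Utilities used in the computer vision and NLP sections below
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
```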

Some variables are assigned beforehand and are not listed in the gists below. Mostly, these are constants written in ALL CAPS, or helper variables and functions such as a path to some directory or dataset preparation steps. Please note that if you run this code sequentially, you will get errors: this is a cheat sheet, not a script or program.

Typical Neural Network Architectures with Sequential API

The three main neural network architectures are deep feed-forward (usually called just deep), convolutional, and recurrent networks: DNN, CNN, and RNN. Minimal sketches of all three follow the list below.

  • DNN is widely used for general-purpose tasks.
    Usually, it is a simple sequence of fully connected (dense) layers with some additions: dropout, batch normalization, skip connections, and so on.
  • CNN is used for image processing: image classification, object detection, semantic segmentation, and other computer vision tasks.
    Typically, a simple CNN architecture is a sequence of convolutional-pooling blocks followed by a small fully connected network.

  • RNN is good for sequential data processing: NLP tasks and time-series predictions.
    A typical RNN architecture for NLP is an embedding layer (pretrained or not) followed by a stack of bidirectional LSTM layers, since the whole text is visible to the model at once. An RNN for time-series prediction is typically one-directional, although bidirectional layers may improve quality here too; 1D convolutions and other techniques and tricks are also widely used.
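As a rough illustration, here are minimal Sequential sketches of the three architectures. The ALL-CAPS names (NUM_FEATURES, IMG_SIZE, NUM_CLASSES, VOCAB_SIZE, EMBEDDING_DIM) are the placeholder constants mentioned in the Imports section, and the layer sizes are illustrative, not prescriptive:

```python
# DNN: a simple stack of dense layers with dropout
dnn_model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(NUM_FEATURES,)),
    layers.Dropout(0.3),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # binary classification head
])

# CNN: convolutional-pooling blocks followed by a small dense network
cnn_model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation='relu',
                  input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])

# RNN for NLP: an embedding layer and bidirectional LSTMs
rnn_model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation='sigmoid'),
])
```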

Official documentation for Sequential API is here.

More Complex Neural Network Architectures with Functional API

The Functional API allows you to add non-sequential connections to your network and to specify multiple inputs and outputs. Below is a rather cumbersome but demonstrative description of a network that takes text, an image, and several numbers as input features.
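A sketch of such a three-input network might look as follows (again, the ALL-CAPS constants are placeholders and the branch sizes are illustrative):

```python
# Three inputs: a tokenized text sequence, an image, and numeric features
text_input = tf.keras.Input(shape=(MAX_LENGTH,), name='text')
image_input = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name='image')
numeric_input = tf.keras.Input(shape=(NUM_FEATURES,), name='numbers')

# Text branch: embedding + bidirectional LSTM
x_text = layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM)(text_input)
x_text = layers.Bidirectional(layers.LSTM(32))(x_text)

# Image branch: a conv-pool block flattened to a vector
x_img = layers.Conv2D(32, 3, activation='relu')(image_input)
x_img = layers.MaxPooling2D()(x_img)
x_img = layers.Flatten()(x_img)

# Numeric branch: a single dense layer
x_num = layers.Dense(16, activation='relu')(numeric_input)

# Merge the three branches and add a classification head
merged = layers.concatenate([x_text, x_img, x_num])
output = layers.Dense(1, activation='sigmoid')(merged)

model = tf.keras.Model(
    inputs=[text_input, image_input, numeric_input],
    outputs=output,
)
```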

Official documentation for Functional API is here.

When the model.summary() method does not display a network structure well, you can visualize your model to explore it more clearly. I have described different visualization techniques in this article (see the Visualize Neural Network Architecture section).

Visualization of the network above — output of the plot_model() function. Image by Author

Compile the model

Using the model.compile() method, we set three main parameters: the optimizer, the loss function, and the metrics to track. The standard choice of optimizer is Adam or RMSprop, and the standard metrics are accuracy for classification and MSE or MAE for regression.

The choice of the loss function strongly depends on the task, but these are the standard choices:

  • MSE, MAE, or Huber loss for regression;
  • binary cross-entropy loss for binary classification;
  • sparse categorical cross-entropy loss for multiclass classification when labels are integers (e.g. 0, 1, and 2 for 3-class classification);
  • categorical cross-entropy loss when labels are represented as one-hot encoded vectors (e.g. [1, 0, 0], [0, 1, 0], and [0, 0, 1] for 3-class classification).
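For example, compiling a binary classifier might look like this:

```python
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',  # or 'mse', 'sparse_categorical_crossentropy', ...
    metrics=['accuracy'],        # or ['mse'], ['mae'] for regression
)
```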

Official documentation for compile method is here.
The list of available optimizers is here.
The list of available losses is here.
The list of available metrics is here.

Train the model

Once the model is defined and compiled, we can finally start the learning process. model.fit() is the main method of the model: calling it makes the network learn from the data. You can train networks on different data sources, but the most typical are plain data in the form of arrays or tensors, an ImageDataGenerator for computer vision, padded sequences for NLP, and a tf.data.Dataset object. Validation data can be specified explicitly or by setting the validation_split parameter.

Special attention should be paid to callbacks, because they are specified at this step. These are functions executed at certain points during training. The most useful of them, in my opinion, are ModelCheckpoint (saves your model during training) and EarlyStopping (stops the training process when the loss stops improving).
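A typical fit() call with these two callbacks might look like this (X_train and y_train are placeholder training arrays, and best_model.h5 is a hypothetical filepath):

```python
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.h5',             # hypothetical filepath
    monitor='val_loss',
    save_best_only=True,
)
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,                  # epochs to wait before stopping
    restore_best_weights=True,
)

history = model.fit(
    X_train, y_train,            # arrays/tensors; generators and tf.data also work
    epochs=100,
    validation_split=0.2,        # or validation_data=(X_val, y_val)
    callbacks=[checkpoint, early_stopping],
)
```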

Official documentation for fit method is here.
The list of available callbacks is here.
Official documentation for ModelCheckpoint is here.
Official documentation for EarlyStopping is here.

Note that each of the fit() calls above is different, and each requires the model structure and compilation to be correctly specified. See the full code files in the repo for more details.

Explore learning curves

Don't forget to save your training results in a history variable to explore the learning curves; they can tell you a lot about how training went. The function below is part of the official TensorFlow Developer Professional Certificate repository.
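A minimal version of such a plotting helper might look like this (using the history variable returned by fit() above):

```python
def plot_graphs(history, string):
    # Plot a training metric together with its validation counterpart
    plt.plot(history.history[string])
    plt.plot(history.history['val_' + string])
    plt.xlabel('Epochs')
    plt.ylabel(string)
    plt.legend([string, 'val_' + string])
    plt.show()

plot_graphs(history, 'accuracy')
plot_graphs(history, 'loss')
```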

Also, remember that in Jupyter Notebook or Google Colab you can recover a training run using the special _ variable, which holds the value of the last evaluated expression. Explore the special_variable_example.ipynb notebook in the repo for more details.

Example with the special _ variable. Image by Author

Evaluate the model

The model.evaluate() method allows you to see your model's performance on previously unseen data; usually, this is a test set that was put aside at the beginning of your research.
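For example (X_test and y_test are placeholder test arrays):

```python
# Returns the loss followed by any metrics passed to compile()
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}')
```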

Official documentation for evaluate method is here.

Save and load the model

Saving the model for later loading and use is very important. I also provide code to save your model with the current date and time in its filename, so you can run multiple training processes and be sure that all your results will be saved.
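A sketch of this pattern, using Python's datetime to build the filename:

```python
from datetime import datetime

# Put the current date and time into the filename,
# so repeated runs never overwrite each other's results
timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
model.save(f'model_{timestamp}.h5')

# Later: restore the model, architecture and weights included
loaded_model = tf.keras.models.load_model(f'model_{timestamp}.h5')
```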

Official documentation for save method is here.
Official documentation for load_model function is here.

Preparing Data for Computer Vision and NLP tasks

At this point, the general part of the cheat sheet (about neural networks) has come to an end. Finally, let me provide some useful code snippets for ImageDataGenerator and text data preparation.

Using ImageDataGenerator

ImageDataGenerator will help you a lot in computer vision tasks: it will label your images automatically based on the directory structure and can perform in-memory data augmentation for you.
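A typical setup might look like this, assuming TRAIN_DIR is a placeholder path containing one subdirectory per class:

```python
train_datagen = ImageDataGenerator(
    rescale=1. / 255,        # normalize pixel values to [0, 1]
    rotation_range=20,       # in-memory augmentation options
    zoom_range=0.2,
    horizontal_flip=True,
)

# Labels are inferred from the subdirectory names inside TRAIN_DIR
train_generator = train_datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=32,
    class_mode='binary',     # or 'categorical' for multiclass
)

# The generator is passed to fit() instead of arrays:
# model.fit(train_generator, epochs=10)
```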

Official documentation for ImageDataGenerator is here.

Tokenizing and padding sentences for NLP tasks

Tokenizing and padding sentences are common practices in NLP. First, you turn sentences into sequences of integer word indices, and then you make sure that all these sequences have a fixed length before feeding them to the input of your model.
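A minimal example (VOCAB_SIZE and MAX_LENGTH are placeholder constants, and the sentences are a toy corpus):

```python
sentences = ['I love TensorFlow', 'Cheat sheets are useful']  # toy corpus

tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token='<OOV>')
tokenizer.fit_on_texts(sentences)

# Turn each sentence into a list of integer word indices
sequences = tokenizer.texts_to_sequences(sentences)

# Pad/truncate so every sequence has the same fixed length
padded = pad_sequences(sequences, maxlen=MAX_LENGTH,
                       padding='post', truncating='post')
```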

You can also use the TextVectorization layer as I did in this article (see Convert Text to Vectors Using Pretrained Word Embeddings section).

Text data preprocessing options using TensorFlow. Image by Author

Official documentation for Tokenizer is here.
Official documentation for pad_sequences method is here.

Conclusions

Of course, it is impossible to collect every potential TensorFlow/Keras pattern in one cheat sheet, but I think I have succeeded in collecting the main ones here in a convenient form.

Cheat sheets bring a lot of benefits when you use them, but even more when you create them. After reading this article, try to create a similar cheat sheet for yourself, correcting mistakes or inaccuracies and adjusting it to your particular tasks; this can be a good exercise. After that, don't forget to share your results and compare them with mine. Good luck!

Thank you for reading!

  • I hope these materials were useful to you. Follow me on Medium to get more articles like this.
  • If you have any questions or comments, I will be glad to get any feedback. Ask me in the comments, or connect via LinkedIn or Twitter.
  • To support me as a writer and to get access to thousands of other Medium articles, get Medium membership using my referral link (no extra charge for you).
