Getting Started

Comparing Keras and PyTorch syntaxes

Data loading, model definition and tensor manipulation

Adam Oudad
Towards Data Science
4 min read · Mar 2, 2021


Photo by cottonbro from Pexels

Keras and PyTorch are popular frameworks for building programs with deep learning. The former, Keras, is more precisely an abstraction layer on top of TensorFlow and offers the ability to prototype models quickly.
There are similar abstraction layers developed on top of PyTorch, such as PyTorch Ignite or PyTorch Lightning. They are not yet as mature as Keras, but are worth a try!

I found few resources or articles comparing code written in both Keras and PyTorch, so this article walks through such an example, to help understand the key differences in syntax and naming between the two frameworks.

This article is the first of a series. In the next article, I compare the frameworks on a practical example: sentiment classification.

Prepare the data

The first comparison is on how data is loaded and prepared. Loading data can be achieved in a very similar fashion in both frameworks, using the keras.utils.Sequence class in Keras and the torch.utils.data.Dataset class in PyTorch.
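In Keras, you would have something like the following minimal sketch, assuming in-memory arrays x and y (hypothetical names). A Sequence implements __len__ and __getitem__, where __getitem__ returns a whole batch.

import numpy as np
from tensorflow import keras

class MyDataset(keras.utils.Sequence):
    def __init__(self, x, y, batch_size=32):
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # Return one full batch of (inputs, targets)
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[batch], self.y[batch]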

And here is the same code in PyTorch.
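A minimal equivalent sketch, with the same hypothetical x and y. Note that __getitem__ returns a single sample; batching is delegated to a DataLoader.

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        # Number of individual samples
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (input, target) pair
        return self.x[idx], self.y[idx]

loader = DataLoader(MyDataset(x, y), batch_size=32, shuffle=True)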

Define and build models

In Keras, we can define and build a model at the same time. In the following example, we use the Sequential model (https://keras.io/api/models/sequential/) to build an LSTM network with an embedding layer and a single-neuron output.
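A minimal sketch, with hypothetical vocabulary and layer sizes:

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim, hidden_dim = 10000, 128, 64  # hypothetical sizes

model = keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),  # embedding layer
    layers.LSTM(hidden_dim),                  # LSTM layer
    layers.Dense(1, activation="sigmoid"),    # single-neuron output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])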

And here is the same architecture in PyTorch.
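A sketch of the same architecture, under the same hypothetical sizes:

import torch
from torch import nn

class Net(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x has shape (seq_len, batch_size)
        x = self.embedding(x)          # -> (seq_len, batch_size, embed_dim)
        _, (hidden, _) = self.lstm(x)  # hidden: (1, batch_size, hidden_dim)
        # Use the last hidden state as the sequence representation
        return torch.sigmoid(self.fc(hidden[-1]))

model = Net()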

The __init__ method instantiates the different modules of the network, while the actual computation is defined in the forward method. There is no need to “compile” the model as in the Keras example. Instead, as you will see when we train models, the model, metrics and optimizer are defined separately in PyTorch and called when needed in the training loop. So we only need to define the same loss criterion and the same optimizer as above.
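For instance, matching the Keras compile call above:

criterion = nn.BCELoss()                          # same loss as binary_crossentropy
optimizer = torch.optim.Adam(model.parameters())  # same optimizer as "adam"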

In most cases, default parameters in Keras will match the defaults in PyTorch, as is the case for the Adam optimizer and the BCE (binary cross-entropy) loss.

To summarize, here is a table comparing the two syntaxes.
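Restating the correspondences covered above:

                   Keras                          PyTorch
Data loading       keras.utils.Sequence           torch.utils.data.Dataset
Model definition   keras.Sequential([...])        subclass of torch.nn.Module
Loss               loss="binary_crossentropy"     criterion = nn.BCELoss()
Optimizer          optimizer="adam"               torch.optim.Adam(model.parameters())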

Comparison table of Keras and PyTorch syntaxes

Manipulate tensors

Both frameworks have their own syntactic specificities for manipulating tensors. Here we compare PyTorch with TensorFlow, since tensor manipulation in Keras happens through its TensorFlow backend.

Shape of tensors

PyTorch has .shape and .size(), which are equivalent ways to access the shape of a tensor.

import torch
t = torch.zeros((4, 3))
print(t.shape, t.size()) # Both equal to (4, 3)
t.shape[1], t.size(1) # Both equal to 3

TensorFlow has only .shape.

import tensorflow as tf
t = tf.zeros((4, 3))
print(t.shape)  # .size() is not available
print(t.shape[1])

Order of dimensions

Keras usually orders dimensions as (batch_size, seq_len, input_dim), whereas PyTorch prefers to order them by default as (seq_len, batch_size, input_dim). In PyTorch, recurrent networks like LSTM and GRU have a boolean parameter batch_first which, if set to True, makes them expect inputs of shape (batch_size, seq_len, input_dim) instead. However, modules like Transformer do not have such a parameter, in which case the input has to be adapted. To do so, you can swap dimensions in PyTorch using the .transpose method.

data = torch.Tensor(tensor_with_batch_first)
data = data.transpose(0, 1)  # Swap first and second dimensions (transpose is not in-place)

The order chosen by PyTorch is more natural from a parallel-computing viewpoint: a recurrent layer is applied at each step of the sequence to the whole batch in parallel, so we iterate over the seq_len dimension, which comes first. The order preferred by Keras is more natural in terms of model architecture: we would rather think of a single input sequence being fed to the model, and simply duplicate that operation across a batch.

Initialize vectors

PyTorch has a syntax very similar to NumPy's.

# Tensor of size (2, 4, 1) filled with ones
torch.ones(2, 4, 1)
# Identity matrix of size (3, 3)
torch.eye(3)

Good news! Both methods have TensorFlow counterparts that behave the same: tf.ones (which takes the shape as a single tuple) and tf.eye.
In addition, we have torch.full, which is the equivalent of numpy.full, for filling a tensor with a single value. TensorFlow has tf.fill.

# Fill a (2, 4) matrix with the value 3.14
torch.full((2, 4), fill_value=3.14)
tf.fill((2, 4), value=3.14)

Here is how to sample a matrix of random numbers.

# Sample a matrix of size (2, 3) from N(0, 1)
torch.randn(2, 3)
tf.random.normal(shape=[2, 3])
# Uniformly sample a (2, 5) matrix of integers in [10, 20)
torch.randint(low=10, high=20, size=(2, 5))
tf.random.uniform(shape=[2, 5], minval=10, maxval=20, dtype=tf.int64)

The seed of the random number generator can be set for reproducibility with

torch.random.manual_seed(0)
tf.random.set_seed(0)

Conclusion

While Keras and PyTorch have very similar data-loading logic, their syntaxes differ for most of the rest. PyTorch has a Pythonic syntax, while Keras is designed for writing short and concise programs without spending too much time on spelling out the building blocks. There are many more points of comparison, but I hope this article gives some insight into both frameworks. For the sake of completeness, I share some resources I found covering comparisons between Keras and PyTorch.

You finished part one! Part two of this series compares Keras and PyTorch on sentiment classification.

Originally published at https://adamoudad.github.io on March 2, 2021.
