Counting No. of Parameters in Deep Learning Models by Hand

5 simple examples to count parameters in FFNN, RNN and CNN models

Published in Towards Data Science, Jan 21, 2019

Why do we need to count the number of parameters in a deep learning model again? We don't. But when we need to reduce a model's file size or its inference time, knowing the number of parameters comes in handy, for instance to estimate how much compression techniques such as pruning or quantization will save. (See the video on Efficient Methods and Hardware for Deep Learning.)

Counting the number of trainable parameters of deep learning models is considered too trivial, because your code can already do this for you. But I’d like to keep my notes here for us to refer to once in a while. Here are the models that we’ll run through:

Feed-Forward Neural Network (FFNN)
Recurrent Neural Network (RNN)
Convolutional Neural Network (CNN)

Alongside each calculation, I will build the model with the Keras API, which makes prototyping easy and keeps the code clean. Let's import the relevant objects first:

from keras.layers import Input, Dense, SimpleRNN, LSTM, GRU, Conv2D
from keras.layers import Bidirectional
from keras.models import Model

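After building each model, call model.count_params() to verify the count. It returns the total number of parameters; in these examples every parameter is trainable, and model.summary() lists trainable and non-trainable parameters separately.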

1. FFNNs

i, input size
h, size of hidden layer
o, output size

For one hidden layer,

num_params
= connections between layers + biases in every layer
= (i×h + h×o) + (h+o)
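The same rule extends to any number of hidden layers: add i×h for every pair of consecutive layers, plus one bias per unit in every layer after the input. A minimal pure-Python sketch; `ffnn_params` is a hypothetical helper, not part of Keras:

```python
def ffnn_params(layer_sizes):
    """Count the weights and biases of a fully connected network.

    layer_sizes lists the units per layer, input first and output last,
    e.g. [i, h, o] for one hidden layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # connections between two consecutive layers
        total += n_out         # one bias per unit in the receiving layer
    return total


print(ffnn_params([3, 5, 2]))  # 32
```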

Example 1.1: Input size 3, hidden layer size 5, output size 2

Fig. 1.1: FFNN with input size 3, hidden layer size 5, output size 2. The graphics reflect the actual no. of units.

i = 3
h = 5
o = 2

num_params
= connections between layers + biases in every layer
= (3×5 + 5×2) + (5+2)
= 32

inputs = Input(shape=(3,))
dense = Dense(5)(inputs)
output = Dense(2)(dense)
model = Model(inputs, output)
print(model.count_params())  # 32
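To see where those 32 numbers live, here is a NumPy sketch of the same forward pass. It assumes the layout Keras uses for Dense layers (a kernel of shape (inputs, units) plus a bias vector of shape (units,)); the array names W1, b1, W2, b2 are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 5)), np.zeros(5)  # input -> hidden: 15 weights + 5 biases
W2, b2 = rng.standard_normal((5, 2)), np.zeros(2)  # hidden -> output: 10 weights + 2 biases

x = rng.standard_normal((4, 3))  # a batch of 4 input vectors
hidden = x @ W1 + b1             # Dense(5), linear activation as in the Keras code
y = hidden @ W2 + b2             # Dense(2)

n_params = sum(p.size for p in (W1, b1, W2, b2))
print(y.shape, n_params)  # (4, 2) 32
```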

Example 1.2: Input size 50, hidden layer sizes [100, 1, 100], output size 50

Fig. 1.2: FFNN with 3 hidden layers. The graphics do not reflect the actual no. of units.

i = 50
h = 100, 1, 100
o = 50

num_params
= connections between layers + biases in every layer
= (50×100 + 100×1 + 1×100 + 100×50) + (100+1+100+50)
= 10,451

inputs = Input(shape=(50,))
dense = Dense(100)(inputs)
dense = Dense(1)(dense)
dense = Dense(100)(dense)
output = Dense(50)(dense)
model = Model(inputs, output)
print(model.count_params())  # 10451
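Breaking the total down per layer (the same view as the "Param #" column of model.summary()) shows how cheap the 1-unit bottleneck is: the two layers around it cost only 101 and 200 parameters. `dense_layer_params` is a hypothetical helper:

```python
def dense_layer_params(layer_sizes):
    """Return the parameter count of each Dense layer, in order."""
    return [n_in * n_out + n_out  # weights + biases of one layer
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


per_layer = dense_layer_params([50, 100, 1, 100, 50])
print(per_layer)       # [5100, 101, 200, 5050]
print(sum(per_layer))  # 10451
```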

2. RNNs

g, no. of FFNNs in a unit (a simple RNN has 1, GRU has 3, LSTM has 4)
h, no. of hidden units
i, dimension/size of input

Since every FFNN has h(h+i) + h parameters, we have

num_params = g × [h(h+i) + h]
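The formula as a small Python function. `rnn_params` and the `GATES` table are hypothetical helpers; the GRU entry assumes the single-bias GRU formulation that the formula describes:

```python
# Number of FFNNs (gates) per recurrent unit, as listed above.
GATES = {"SimpleRNN": 1, "GRU": 3, "LSTM": 4}


def rnn_params(cell, h, i):
    """g x [h(h+i) + h]: each gate maps [previous hidden state, input] to h units."""
    g = GATES[cell]
    return g * (h * (h + i) + h)


print(rnn_params("LSTM", 2, 3))  # 48
```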

Example 2.1: LSTM with 2 hidden units and input dimension 3.

Fig. 2.1: An LSTM cell. Taken from here.

g = 4 (LSTM has 4 FFNNs)
h = 2
i = 3

num_params
= g × [h(h+i) + h]
= 4 × [2(2+3) + 2]
= 48

inputs = Input((None, 3))
lstm = LSTM(2)(inputs)
model = Model(inputs, lstm)
print(model.count_params())  # 48
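The g × [h(h+i) + h] structure shows up directly in how an LSTM's weights are stored: per gate, h×i input weights, h×h recurrent weights and h biases, with all four gates concatenated side by side. A NumPy sketch of one time step under that assumed layout (the names `kernel`, `recurrent_kernel` and `bias` mirror Keras's weight names; the gate order i, f, c, o is an assumption and does not affect the count):

```python
import numpy as np

h, i = 2, 3  # hidden units, input dimension
rng = np.random.default_rng(0)

# The 4 gate FFNNs stored side by side.
kernel = rng.standard_normal((i, 4 * h))            # input weights: 3 x 8 = 24
recurrent_kernel = rng.standard_normal((h, 4 * h))  # hidden-state weights: 2 x 8 = 16
bias = np.zeros(4 * h)                              # biases: 8


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev):
    """One LSTM time step; gate order i, f, c, o is assumed."""
    z = x @ kernel + h_prev @ recurrent_kernel + bias
    z_i, z_f, z_c, z_o = np.split(z, 4)  # one slice of size h per gate
    c = sigmoid(z_f) * c_prev + sigmoid(z_i) * np.tanh(z_c)
    h_new = sigmoid(z_o) * np.tanh(c)
    return h_new, c


h_t, c_t = lstm_step(rng.standard_normal(i), np.zeros(h), np.zeros(h))
print(h_t.shape, kernel.size + recurrent_kernel.size + bias.size)  # (2,) 48
```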

Example 2.2: Stacked Bidirectional GRU with 5 hidden units and input size 8 (whose outputs are concatenated) + LSTM with 50 hidden units

Fig. 2.2: A stacked RNN consisting of BiGRU and LSTM layers. The graphics do not reflect the actual no. of units.

Bidirectional GRU with 5 hidden units and input size 8

g = 3 (GRU has 3 FFNNs)
h = 5
i = 8

num_params_layer1
= 2 × g × [h(h+i) + h] (the factor of 2 accounts for the forward and backward directions)
= 2 × 3 × [5(5+8) + 5]
= 420

LSTM with 50 hidden units

g = 4 (LSTM has 4 FFNNs)
h = 50
i = 5+5 (outputs from bidirectional GRU concatenated; output size of GRU is 5, same as no. of hidden units)

num_params_layer2
= g × [h(h+i) + h]
= 4 × [50(50+10) + 50]
= 12,200

total_params = 420 + 12,200 = 12,620

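inputs = Input((None, 8))
# reset_after=False keeps the single-bias GRU counted above (newer Keras versions default to True)
layer1 = Bidirectional(GRU(5, return_sequences=True, reset_after=False))(inputs)
layer2 = LSTM(50)(layer1)
model = Model(inputs, layer2)
print(model.count_params())  # 12620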

In Keras, the merge_mode argument of Bidirectional defaults to 'concat', which is why the LSTM sees an input of size 5+5.
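The two-layer total can be checked with a variant of the formula that handles the bidirectional wrapper. One caveat about library versions: tf.keras in TensorFlow 2 and Keras 3 build GRU layers with reset_after=True by default, which gives every gate a second bias vector, so the BiGRU layer would have 2 × 3 × [5(5+8) + 2×5] = 450 parameters instead of 420. Passing reset_after=False to GRU recovers the 420 computed above. `rnn_layer_params` is a hypothetical helper:

```python
# Number of FFNNs (gates) per recurrent unit, as in the text.
GATES = {"SimpleRNN": 1, "GRU": 3, "LSTM": 4}


def rnn_layer_params(cell, h, i, bidirectional=False, extra_bias=False):
    """g x [h(h+i) + h] per direction; doubled when wrapped in Bidirectional.

    extra_bias=True gives every gate a second bias vector of size h,
    as a GRU built with reset_after=True has.
    """
    n_bias = 2 * h if extra_bias else h
    per_direction = GATES[cell] * (h * (h + i) + n_bias)
    return 2 * per_direction if bidirectional else per_direction


layer1 = rnn_layer_params("GRU", 5, 8, bidirectional=True)  # 420
layer2 = rnn_layer_params("LSTM", 50, 5 + 5)                 # input = concatenated BiGRU outputs
print(layer1, layer2, layer1 + layer2)  # 420 12200 12620

# The same BiGRU layer with two bias vectors per gate (reset_after=True):
print(rnn_layer_params("GRU", 5, 8, bidirectional=True, extra_bias=True))  # 450
```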

3. CNNs

For one layer,

i, no. of input maps (or channels)
f, filter size (the side length of a square f×f filter)
o, no. of output maps (or channels; this equals the number of filters used)

Each filter has a separate f×f kernel for every input map, so one filter holds i×f×f weights plus one bias.

num_params
= weights + biases
= [i × (f×f) × o] + o
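The formula as a one-line function, useful for checking the examples that follow. `conv2d_params` is a hypothetical helper and assumes a square kernel:

```python
def conv2d_params(i, f, o):
    """[i x (f x f) x o] weights + o biases for a square f x f kernel."""
    weights = i * (f * f) * o  # every filter has one f x f kernel per input map
    biases = o                 # one bias per filter, i.e. per output map
    return weights + biases


print(conv2d_params(1, 2, 3))  # 15
print(conv2d_params(3, 2, 1))  # 13
print(conv2d_params(2, 2, 3))  # 27
```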

Example 3.1: Greyscale image with 2×2 filter, output 3 channels

Fig. 3.1: Convolution of a greyscale image with 2×2 filter to output 3 channels. Here, there are 15 parameters — 12 weights and 3 biases.

i = 1 (greyscale has only 1 channel)
f = 2
o = 3

num_params
= [i × (f×f) × o] + o
= [1 × (2×2) × 3] + 3
= 15

inputs = Input((None, None, 1))
conv2d = Conv2D(kernel_size=2, filters=3)(inputs)
model = Model(inputs, conv2d)
print(model.count_params())  # 15
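The parameter count does not depend on the image size, which is why the Input shape above can leave height and width as None: the same 2×2 kernels slide over every position. A NumPy sketch of the greyscale case, assuming valid cross-correlation with stride 1 (Conv2D's default padding and strides); `conv_greyscale` is a hypothetical name:

```python
import numpy as np

rng = np.random.default_rng(0)
kernels = rng.standard_normal((3, 2, 2))  # 3 filters, each a single 2x2 kernel: 12 weights
biases = rng.standard_normal(3)           # one bias per filter: 3 biases


def conv_greyscale(image):
    """Slide every 2x2 filter over an H x W image -> (H-1) x (W-1) x 3 output."""
    H, W = image.shape
    out = np.empty((H - 1, W - 1, 3))
    for r in range(H - 1):
        for c in range(W - 1):
            window = image[r:r + 2, c:c + 2]
            # Broadcast the window against all 3 kernels, sum each 2x2 product
            out[r, c] = (kernels * window).sum(axis=(1, 2)) + biases
    return out


small = conv_greyscale(rng.standard_normal((5, 5)))
large = conv_greyscale(rng.standard_normal((8, 8)))
print(small.shape, large.shape, kernels.size + biases.size)  # (4, 4, 3) (7, 7, 3) 15
```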

Example 3.2: RGB image with 2×2 filter, output of 1 channel

The single filter has one 2×2 kernel for each of the 3 input feature maps. The three resulting convolutions are added element-wise, and one bias term is added to each element. This gives an output with 1 feature map.

Fig. 3.2: Convolution of an RGB image with 2×2 filter to output 1 channel. Here, there are 13 parameters — 12 weights and 1 bias.

i = 3 (RGB image has 3 channels)
f = 2
o = 1

num_params
= [i × (f×f) × o] + o
= [3 × (2×2) × 1] + 1
= 13

inputs = Input((None, None, 3))
conv2d = Conv2D(kernel_size=2, filters=1)(inputs)
model = Model(inputs, conv2d)
print(model.count_params())  # 13
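The "convolve each input map, add the results element-wise, then add one bias" description can be checked numerically. A NumPy sketch assuming valid cross-correlation with stride 1; `conv2d_single_filter` is a hypothetical name and the input values are random:

```python
import numpy as np


def conv2d_single_filter(image, kernel, bias):
    """Valid cross-correlation of an H x W x C image with one f x f x C filter."""
    H, W, _ = image.shape
    f = kernel.shape[0]
    out = np.empty((H - f + 1, W - f + 1))
    for r in range(H - f + 1):
        for c in range(W - f + 1):
            window = image[r:r + f, c:c + f, :]          # f x f x C patch
            out[r, c] = np.sum(window * kernel) + bias  # sum over all channels, then one bias
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((4, 4, 3))   # RGB-like input
kernel = rng.standard_normal((2, 2, 3))  # one 2x2 kernel per input channel: 12 weights
bias = 0.5                               # 1 bias

out = conv2d_single_filter(image, kernel, bias)

# Same result: convolve each channel separately, add the maps element-wise, add the bias once.
per_channel = [conv2d_single_filter(image[:, :, k:k + 1], kernel[:, :, k:k + 1], 0.0)
               for k in range(3)]
print(out.shape, np.allclose(out, sum(per_channel) + bias), kernel.size + 1)
# (3, 3) True 13
```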

Example 3.3: Image with 2 channels, with 2×2 filter, and output of 3 channels

There are 3 filters (purple, yellow, cyan), each with one 2×2 kernel for each of the 2 input feature maps. For each filter, the resulting convolutions are added element-wise, and that filter's bias term is added to each element. This gives an output with 3 feature maps.

Fig. 3.3: Convolution of a 2-channel image with 2×2 filter to output 3 channels. Here, there are 27 parameters: 24 weights and 3 biases.

i = 2
f = 2
o = 3

num_params
= [i × (f×f) × o] + o
= [2 × (2×2) × 3] + 3
= 27

inputs = Input((None, None, 2))
conv2d = Conv2D(kernel_size=2, filters=3)(inputs)
model = Model(inputs, conv2d)
print(model.count_params())  # 27
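The general case stores all filters in one 4-D array. A NumPy sketch assuming the kernel layout Keras uses for Conv2D, (f, f, input channels, filters), and valid cross-correlation with stride 1; `conv2d` here is a hypothetical function, not the Keras layer:

```python
import numpy as np


def conv2d(image, kernels, biases):
    """Valid cross-correlation with stride 1.

    image: H x W x i, kernels: f x f x i x o, biases: (o,).
    Returns an (H-f+1) x (W-f+1) x o array, one output map per filter.
    """
    H, W, _ = image.shape
    f, _, _, o = kernels.shape
    out = np.empty((H - f + 1, W - f + 1, o))
    for r in range(H - f + 1):
        for c in range(W - f + 1):
            window = image[r:r + f, c:c + f, :]  # f x f x i patch
            # Sum over both spatial axes and the input-channel axis -> one value per filter
            out[r, c, :] = np.tensordot(window, kernels, axes=3) + biases
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((5, 5, 2))       # 2-channel input
kernels = rng.standard_normal((2, 2, 2, 3))  # 3 filters, one 2x2 kernel per input map: 24 weights
biases = rng.standard_normal(3)              # 3 biases

out = conv2d(image, kernels, biases)
print(out.shape, kernels.size + biases.size)  # (4, 4, 3) 27
```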

That’s all for now! Do leave comments below if you have any feedback!

Written by Raimi Karim