Creating custom Loss functions using TensorFlow 2

Learning to write custom losses using wrapper functions and OOP in Python

Arjun Sarkar
Towards Data Science


Figure 1: Gradient descent algorithm in action (Source: Public Domain, https://commons.wikimedia.org/w/index.php?curid=521422)

A neural network learns to map a set of inputs to a set of outputs from training data. It does so by using some form of optimization algorithm such as gradient descent, stochastic gradient descent, AdaGrad, AdaDelta, or some more recent algorithms such as Adam, Nadam or RMSProp. The ‘gradient’ in gradient descent refers to the error gradient. After each iteration the network compares its predicted output to the real outputs and then calculates the ‘error’. Typically, with neural networks, we seek to minimize the error. As such, the objective function used to minimize the error is often referred to as a cost function or a loss function, and the value calculated by the ‘loss function’ is referred to simply as the ‘loss’. Typical loss functions used in various problems include:

a. Mean Squared Error

b. Mean Squared Logarithmic Error

c. Binary Crossentropy

d. Categorical Crossentropy

e. Sparse Categorical Crossentropy

In TensorFlow, these loss functions are already included, and we can call them as shown below.

1. Loss function as a string

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

or,

2. Loss function as an object

from tensorflow.keras.losses import mean_squared_error

model.compile(loss=mean_squared_error, optimizer='sgd')

The advantage of calling a loss function as an object is that we can pass parameters alongside the loss function, such as a threshold. For example, the built-in Huber loss class accepts a delta (threshold) parameter:

from tensorflow.keras.losses import Huber

model.compile(loss=Huber(delta=1.5), optimizer='sgd')

Creating a custom loss using a function:

To create a loss using a function, we first name the loss function, and it accepts two parameters: y_true (the true label/output) and y_pred (the predicted label/output).

def loss_function(y_true, y_pred):
    ***some calculation***
    return loss

Creating Root Mean Square Error loss (RMSE):

Loss function name: my_rmse

The aim is to return the root mean square error between the target (y_true) and the prediction (y_pred).

Formula of RMSE:

RMSE = √(mean((y_true − y_pred)²))

  • error: the difference between the true label and predicted label.
  • sqr_error: the square of the error.
  • mean_sqr_error: the mean of the square of the error
  • sqrt_mean_sqr_error: the square root of the mean of the square of the error (the root mean squared error).
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import backend as K

# defining the loss function
def my_rmse(y_true, y_pred):
    # difference between true label and predicted label
    error = y_true - y_pred
    # square of the error
    sqr_error = K.square(error)
    # mean of the square of the error
    mean_sqr_error = K.mean(sqr_error)
    # square root of the mean of the square of the error
    sqrt_mean_sqr_error = K.sqrt(mean_sqr_error)
    # return the error
    return sqrt_mean_sqr_error

# applying the loss function
model.compile(optimizer='sgd', loss=my_rmse)
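
To sanity-check my_rmse, here is a minimal sketch (with small hypothetical tensors) that compares its output against the same computation done directly in NumPy:

import numpy as np
import tensorflow as tf

# hypothetical values used only for this check
y_true = tf.constant([1.0, 2.0, 3.0, 4.0])
y_pred = tf.constant([1.5, 1.5, 3.5, 3.0])

# result from the custom loss defined above
print(my_rmse(y_true, y_pred).numpy())

# the same computation in plain NumPy, which should give a matching value (about 0.66)
print(np.sqrt(np.mean((y_true.numpy() - y_pred.numpy()) ** 2)))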

Creating Huber Loss:

Figure 2: Huber loss (green) and squared error loss (blue) as a function of y − f(x) (Source: By Qwertyus — Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=34836380)

Formula of Huber Loss:

Here,

δ is the threshold,

a is the error (we will calculate a as the difference between the label and the prediction)

So, when |a| ≤ δ, loss = 1/2*(a)²

and when |a| > δ, loss = δ(|a| − (1/2)*δ)

For example, with δ = 1, an error of 0.5 gives a loss of 0.125, while an error of 3 gives a loss of 1 × (3 − 0.5) = 2.5.

Code:

import tensorflow as tf

def my_huber_loss(y_true, y_pred):
    threshold = 1
    error = y_true - y_pred
    is_small_error = tf.abs(error) <= threshold
    small_error_loss = tf.square(error) / 2
    big_error_loss = threshold * (tf.abs(error) - (0.5 * threshold))
    return tf.where(is_small_error, small_error_loss, big_error_loss)

Explanation:

First, we define a function, my_huber_loss, which takes in y_true and y_pred.

Next, we set the threshold = 1.

Next, we calculate the error a = y_true − y_pred.

Next, we check whether the absolute value of the error is less than or equal to the threshold. is_small_error returns a boolean (True or False).

We know that when |a| ≤ δ, loss = 1/2*(a)², so we calculate small_error_loss as the square of the error divided by 2.

Otherwise, when |a| > δ, the loss is equal to δ(|a| − (1/2)*δ). We calculate this in big_error_loss.

Finally, in the return statement, we check whether is_small_error is true or false for each element; if it is true, the function returns the small_error_loss, otherwise it returns the big_error_loss. This is done using tf.where.
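
To see the element-wise selection in action, here is a quick sketch (hypothetical tensors, threshold fixed at 1 as in the code above) showing which branch tf.where picks for each element:

import tensorflow as tf

y_true = tf.constant([0.0, 0.0, 0.0])
y_pred = tf.constant([0.5, 1.0, 3.0])  # errors of -0.5, -1.0 and -3.0

# the first two errors are small (|a| <= 1) -> quadratic branch,
# the last one is big (|a| > 1) -> linear branch
print(my_huber_loss(y_true, y_pred).numpy())  # roughly [0.125, 0.5, 2.5]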

We can then compile the model using the code below,

model.compile(optimizer='sgd', loss=my_huber_loss)

In the previous code, the threshold is always 1.

But what if we want to tune the hyperparameter (threshold) and pass a new threshold value during compilation? Then we have to use function wrapping, that is, wrapping the loss function inside another external function. We need a wrapper function because a loss function can only accept y_true and y_pred by default, and we cannot add any other parameters to the original loss function.

Huber Loss using Wrapper Function

This is what the wrapper function code looks like:

import tensorflow as tf

# wrapper function which accepts the threshold parameter
def my_huber_loss_with_threshold(threshold):
    def my_huber_loss(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) <= threshold
        small_error_loss = tf.square(error) / 2
        big_error_loss = threshold * (tf.abs(error) - (0.5 * threshold))
        return tf.where(is_small_error, small_error_loss, big_error_loss)
    return my_huber_loss

In this case, the threshold value is not hardcoded. Rather we can pass the threshold value during model compilation.

model.compile(optimizer='sgd', loss=my_huber_loss_with_threshold(threshold=1.5))
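
Because the wrapper simply returns the inner loss function with the threshold captured in its closure, we can also call it directly on hypothetical tensors to inspect the values it produces:

import tensorflow as tf

# calling the wrapper once returns my_huber_loss with threshold=1.5 baked in
loss_fn = my_huber_loss_with_threshold(threshold=1.5)

y_true = tf.constant([0.0, 0.0])
y_pred = tf.constant([1.0, 3.0])  # errors of -1.0 (small) and -3.0 (big)

print(loss_fn(y_true, y_pred).numpy())  # roughly [0.5, 3.375]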

Huber Loss using Classes (OOP)

import tensorflow as tf
from tensorflow.keras.losses import Loss

class MyHuberLoss(Loss):  # inherit the parent class
    # class attribute
    threshold = 1

    # initialize instance attributes
    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold

    # compute loss
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) <= self.threshold
        small_error_loss = tf.square(error) / 2
        big_error_loss = self.threshold * (tf.abs(error) - (0.5 * self.threshold))
        return tf.where(is_small_error, small_error_loss, big_error_loss)

MyHuberLoss is the class name. After the class name, we inherit from the parent class Loss in tensorflow.keras.losses. So MyHuberLoss inherits from Loss. This allows us to use MyHuberLoss as a loss function.

__init__ initialises the object from the class.

call is the function that gets executed when the loss is computed, i.e. when the object is called on y_true and y_pred during training.

The __init__ function gets the threshold, and the call function gets the y_true and y_pred parameters that we saw previously. We also declare threshold as a class attribute, which allows us to give it an initial value.

Within the __init__ function, we assign the passed-in threshold to self.threshold.

In the call function, the threshold is then referenced as self.threshold.

Here is how we can use this loss function in model.compile.

model.compile(optimizer='sgd', loss=MyHuberLoss(threshold=1.9))
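
As a minimal end-to-end sketch (a toy one-neuron regression model and made-up data, chosen only for illustration), the class-based loss plugs into training exactly like a built-in loss:

import numpy as np
import tensorflow as tf

# toy data following y = 2x - 1 (illustrative only)
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]).reshape(-1, 1)

model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss=MyHuberLoss(threshold=1.9))
model.fit(xs, ys, epochs=500, verbose=0)

print(model.predict(np.array([[10.0]])))  # should move toward 19 as training converges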

Creating Contrastive Loss (used in Siamese Networks):

Siamese networks compare whether two images are similar or not. Contrastive loss is the loss function used in siamese networks.

Formula of contrastive loss:

Loss = Y_true * D² + (1 − Y_true) * max(margin − D, 0)²

In the formula above,

Y_true is the tensor of details about image similarities. They are one if the images are similar and they are zero if they’re not.

D is the tensor of Euclidean distances between the pairs of images.

Margin is a constant that we can use to enforce a minimum distance between them in order to consider them similar or different.

If Y_true =1, the first part of the equation becomes D², and the second part becomes zero. So, the D² term has more weight when Y_true is close to 1.

If Y_true = 0, then the first part of the equation becomes zero, and only the second part remains. This gives all the weight to the max term and none to the D squared term, so the max term dominates the calculation of the loss: it is large when a dissimilar pair is closer together than the margin, and zero once the pair is at least margin apart.

Contrastive Loss using Wrapper Function

from tensorflow.keras import backend as K

def contrastive_loss_with_margin(margin):
    def contrastive_loss(y_true, y_pred):
        # y_pred is the Euclidean distance D between the two embeddings
        square_pred = K.square(y_pred)
        margin_square = K.square(K.maximum(margin - y_pred, 0))
        return K.mean(y_true * square_pred + (1 - y_true) * margin_square)
    return contrastive_loss
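
As a quick sketch with hypothetical values (margin = 1, one similar and one dissimilar pair), we can evaluate the wrapped loss directly to see the two cases of the formula:

import tensorflow as tf

loss_fn = contrastive_loss_with_margin(margin=1.0)

# y_true: 1 for a similar pair, 0 for a dissimilar pair
y_true = tf.constant([[1.0], [0.0]])
# y_pred: Euclidean distances D produced by the siamese network
y_pred = tf.constant([[0.2], [0.3]])

# the similar pair contributes D^2 = 0.04,
# the dissimilar pair contributes max(margin - D, 0)^2 = 0.49
print(loss_fn(y_true, y_pred).numpy())  # mean of the two terms, about 0.265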

Conclusion:

Any loss function not available in TensorFlow can be created in a similar way, using plain functions, wrapper functions, or classes.
