
Derivative of Sigmoid and Cross-Entropy Functions

A step-by-step differentiation of the Sigmoid activation and cross-entropy loss function.

This article will go through the step-by-step differentiation of the Sigmoid and Cross-Entropy functions. Understanding the derivatives of these two functions is essential in machine learning when performing back-propagation during model training.

Photo by Saad Ahmad on Unsplash

Derivative of Sigmoid Function

The Sigmoid (or logistic) function is defined as:

$$g(x) = \frac{1}{1 + e^{-x}}$$

Figure 1: Sigmoid Function. Left: the Sigmoid equation; right: the plot of the equation (Source: Author).

where e is Euler’s number, a transcendental constant approximately equal to 2.718281828459. For any value of x, the Sigmoid function g(x) falls in the range (0, 1). As x decreases, g(x) approaches 0, whereas as x grows larger, g(x) tends to 1. For example:

  x      g(x)
-10      0.000045
 -2      0.119203
  0      0.500000
  2      0.880797
 10      0.999955

Some values of g(x) given values of x.
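As a quick sanity check, here is a minimal Python sketch that computes g(x) for a few values of x; the sigmoid helper is simply a transcription of the equation above:

```python
import math

def sigmoid(x):
    """Sigmoid/logistic function: g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

for x in [-10, -2, 0, 2, 10]:
    print(f"g({x:+d}) = {sigmoid(x):.6f}")
# g(-10) = 0.000045 ... g(0) = 0.500000 ... g(+10) = 0.999955
```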

From here, we will differentiate the Sigmoid function using two methods: the quotient rule and the chain rule of differentiation.

Derivative of Sigmoid Function using Quotient Rule

Step 1: Stating the Quotient Rule

$$\left(\frac{u}{v}\right)' = \frac{v\,u' - u\,v'}{v^2}$$

The quotient rule.

The quotient rule is read as "the derivative of a quotient is the denominator multiplied by the derivative of the numerator, minus the numerator multiplied by the derivative of the denominator, all divided by the square of the denominator."

Step 2: Apply the Quotient rule

From the Sigmoid function g(x) and the quotient rule, we take u = 1 (the numerator) and v = 1 + e⁻ˣ (the denominator), which gives u' = 0 and v' = −e⁻ˣ.

Two things to note:

  • The derivative of a constant is equal to zero. That is why u' = 0.
  • The differentiation of the exponential term (e⁻ˣ) in v is covered by the exponential rule of differentiation, shown below.
$$\frac{d}{dx}\,e^{f(x)} = f'(x)\,e^{f(x)}$$

The exponential rule. Applied to v = 1 + e⁻ˣ, it gives v' = −e⁻ˣ.

By the quotient and exponential rules of differentiation, we have

$$g'(x) = \frac{v\,u' - u\,v'}{v^2} = \frac{(1 + e^{-x}) \cdot 0 - 1 \cdot (-e^{-x})}{(1 + e^{-x})^2} = \frac{e^{-x}}{(1 + e^{-x})^2}$$

That is the derivative of the Sigmoid function, but we can simplify it further, as shown in the next step.

Step 3: Simplifying the derivative

In this step, we will use some algebra to simplify the derivative obtained in Step 2:

$$\begin{aligned} g'(x) &= \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1 + e^{-x} - 1}{(1 + e^{-x})^2} \\ &= \frac{1 + e^{-x}}{(1 + e^{-x})^2} - \frac{1}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} - \left(\frac{1}{1 + e^{-x}}\right)^2 \\ &= \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right) = g(x)\bigl(1 - g(x)\bigr) \end{aligned}$$

Note: In the second expression, we added 1 and subtracted 1 in the numerator, so we actually changed nothing.

That marks the end of the differentiation process using the quotient rule.
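To double-check the closed form g'(x) = g(x)(1 − g(x)), here is a small sketch comparing it against a central finite-difference estimate of the derivative; the helper names are my own:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # Closed form derived above: g'(x) = g(x) * (1 - g(x))
    g = sigmoid(x)
    return g * (1.0 - g)

def numerical_derivative(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in [-3.0, 0.0, 1.5]:
    print(f"x={x:+.1f}  closed={sigmoid_derivative(x):.8f}  "
          f"numerical={numerical_derivative(sigmoid, x):.8f}")
```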

Differentiating Sigmoid Function using Chain Rule

Step 1: The chain rule

$$\frac{d}{dx}\,f\bigl(h(x)\bigr) = f'\bigl(h(x)\bigr) \cdot h'(x)$$

The chain rule.

Step 2: Rewrite the Sigmoid function as a negative exponent

$$g(x) = \frac{1}{1 + e^{-x}} = \bigl(1 + e^{-x}\bigr)^{-1}$$

Step 3: Applying chain rule to Sigmoid function in Step 2

Let

$$h(x) = 1 + e^{-x} \qquad \text{and} \qquad f(h) = h^{-1}, \quad \text{so that } g(x) = f\bigl(h(x)\bigr)$$

Then, by the chain rule, we proceed as follows:

$$g'(x) = f'\bigl(h(x)\bigr) \cdot h'(x) = -\bigl(1 + e^{-x}\bigr)^{-2} \cdot \bigl(-e^{-x}\bigr) = \frac{e^{-x}}{(1 + e^{-x})^2}$$

At this point, you can simplify the expression using the same algebraic steps we took for the quotient rule (Step 3 above), again arriving at g'(x) = g(x)(1 − g(x)).
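If you have SymPy available, you can also let it differentiate the Sigmoid symbolically and confirm that both routes land on the same expression (a sketch, assuming the sympy package is installed):

```python
import sympy as sp

x = sp.symbols('x')
g = 1 / (1 + sp.exp(-x))

# Differentiate the Sigmoid symbolically
dg = sp.diff(g, x)

# Confirm the simplified closed form: g'(x) == g(x) * (1 - g(x))
print(sp.simplify(dg - g * (1 - g)) == 0)  # True
```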

Here is a plot of the Sigmoid function and its derivative:

Sigmoid function and its derivative (Source: Author).
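A plot like the one above can be reproduced with a few lines of matplotlib; this is a minimal sketch, with styling choices of my own:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 400)
g = 1 / (1 + np.exp(-x))  # Sigmoid
dg = g * (1 - g)          # Its derivative, g(x) * (1 - g(x))

plt.plot(x, g, label="sigmoid g(x)")
plt.plot(x, dg, label="derivative g'(x)")
plt.xlabel("x")
plt.legend()
plt.show()
```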

Derivative of Cross-Entropy Function

The Cross-Entropy loss function is a very important cost function used for classification problems. In this post, however, we will focus solely on differentiating the loss function. Nonetheless, you can read more about the Cross-Entropy loss function at the link given below.

Cross-Entropy Loss Function

The Cross-Entropy loss function is defined as:

$$L = -\sum_{i} t_i \log(p_i)$$

where tᵢ is the truth value and pᵢ is the predicted probability of the iᵗʰ class.
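For concreteness, here is a minimal NumPy sketch of this loss for a single sample with a one-hot truth vector; the function name and the eps guard are my own choices:

```python
import numpy as np

def cross_entropy(t, p, eps=1e-12):
    """Cross-entropy loss L = -sum_i t_i * log(p_i).

    t: one-hot truth vector; p: predicted class probabilities.
    eps guards against taking log(0).
    """
    return -np.sum(t * np.log(p + eps))

t = np.array([0, 1, 0])        # the true class is class 1
p = np.array([0.1, 0.7, 0.2])  # predicted probabilities
print(cross_entropy(t, p))     # -log(0.7) ≈ 0.3567
```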

For classification with two classes, we have the binary cross-entropy loss, which is defined as follows:

$$L = -\bigl(t \log(\hat{y}) + (1 - t)\log(1 - \hat{y})\bigr)$$

Binary cross-entropy loss function, where t is the truth value and yhat (ŷ) is the predicted probability.
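And here is the binary case as a sketch, again with an eps guard of my own to avoid log(0):

```python
import numpy as np

def binary_cross_entropy(t, y_hat, eps=1e-12):
    """Binary cross-entropy: L = -(t*log(yhat) + (1-t)*log(1-yhat))."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # guard against log(0)
    return -(t * np.log(y_hat) + (1 - t) * np.log(1 - y_hat))

print(binary_cross_entropy(1, 0.9))  # ≈ 0.1054: confident and correct
print(binary_cross_entropy(1, 0.1))  # ≈ 2.3026: confident but wrong
```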

Derivative of binary cross-entropy function

The truth label, t, in the binary loss is a known value, whereas yhat is a variable. This means that the function will be differentiated with respect to yhat, treating t as a constant. Let’s go ahead and work on the derivative now.

Step 1: Stating two rules we need to differentiate binary cross-entropy loss

To differentiate the binary cross-entropy loss, we need these two rules:

Rule 1 (the sum rule): $$(u + v)' = u' + v'$$

Rule 2 (the product rule): $$(u\,v)' = u\,v' + v\,u'$$

The product rule reads: "the derivative of a product of two functions is the first function multiplied by the derivative of the second, plus the second function multiplied by the derivative of the first function."

Step 2: Differentiating the function

We will use the product rule to work on the derivatives of the two terms separately; then, by Rule 1 we will combine the two derivatives.

Since we have two unknowns, t and yhat, we will actually work on partial derivatives (a partial derivative of a function of several variables is its derivative with respect to one of the variables, with the other variables regarded as constants). Differentiating each term with respect to ŷ, and using the fact that the derivative of log(y) is 1/y, we get

$$\frac{\partial}{\partial \hat{y}}\bigl(t \log(\hat{y})\bigr) = \frac{t}{\hat{y}} \qquad \text{and} \qquad \frac{\partial}{\partial \hat{y}}\bigl((1 - t)\log(1 - \hat{y})\bigr) = -\frac{1 - t}{1 - \hat{y}}$$

(the minus sign in the second term comes from the chain rule applied to 1 − ŷ).

And therefore, the derivative of the binary cross-entropy loss function becomes

$$\frac{\partial L}{\partial \hat{y}} = -\left(\frac{t}{\hat{y}} - \frac{1 - t}{1 - \hat{y}}\right) = -\frac{t}{\hat{y}} + \frac{1 - t}{1 - \hat{y}} = \frac{\hat{y} - t}{\hat{y}\,(1 - \hat{y})}$$
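As with the Sigmoid derivative, we can sanity-check this result against a finite-difference estimate; the helper names below are my own:

```python
import math

def bce(t, y_hat):
    # Binary cross-entropy loss for a single prediction
    return -(t * math.log(y_hat) + (1 - t) * math.log(1 - y_hat))

def bce_grad(t, y_hat):
    # Closed form derived above: dL/dyhat = (yhat - t) / (yhat * (1 - yhat))
    return (y_hat - t) / (y_hat * (1 - y_hat))

t, y_hat, h = 1.0, 0.7, 1e-6
numerical = (bce(t, y_hat + h) - bce(t, y_hat - h)) / (2 * h)
print(bce_grad(t, y_hat), numerical)  # both ≈ -1.428571
```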

That marks the end of the differentiation. Thanks for reading 🙂

Conclusion

In this article, we worked on the derivatives of the Sigmoid function and binary cross-entropy function. The former is used mainly in Machine Learning as an activation function, whereas the latter is often used as a cost function to evaluate models. The derivatives found here are especially fundamental during a network’s back-propagation process – an essential step during model training.


Please sign up for a Medium membership, at only $5 per month, to be able to read all my articles on Medium and those of other writers.

You can also subscribe to get my articles in your email inbox when I post.

Thank you for reading, and see you in the next one!

