This article goes through the step-by-step differentiation of the Sigmoid and Cross-Entropy functions. Understanding the derivatives of these two functions is essential in machine learning when performing back-propagation during model training.

Derivative of Sigmoid Function
The Sigmoid (logistic) function is defined as:

g(x) = 1 / (1 + e^(-x))
where e is Euler's number, a transcendental constant approximately equal to 2.718281828459. For any value of x, the Sigmoid function g(x) falls in the range (0, 1). As the value of x decreases, g(x) approaches 0, whereas as x grows bigger, g(x) tends to 1. For example,

g(-6) ≈ 0.0025 and g(6) ≈ 0.9975
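To get a feel for these numbers, here is a minimal Python sketch (the sigmoid helper below is just an illustrative name, not from the original article) that evaluates g(x) at a few points:

```python
import math

def sigmoid(x):
    """Sigmoid/logistic function g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Small x gives values near 0, large x gives values near 1
for x in (-6, 0, 6):
    print(f"g({x}) = {sigmoid(x):.4f}")
# g(-6) = 0.0025, g(0) = 0.5000, g(6) = 0.9975
```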
From here, we will differentiate the Sigmoid function using two methods: the quotient rule and the chain rule of differentiation.
Derivative of Sigmoid Function using Quotient Rule
Step 1: Stating the Quotient Rule

If g(x) = u(x) / v(x), then

g'(x) = [v · u' - u · v'] / v^2
The quotient rule is read as "the derivative of a quotient is the denominator multiplied by the derivative of the numerator, minus the numerator multiplied by the derivative of the denominator, all divided by the square of the denominator."
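As a quick sanity check of this rule, here is a small SymPy sketch (assuming SymPy is available; u and v below stand for arbitrary functions of x):

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)   # numerator
v = sp.Function('v')(x)   # denominator

lhs = sp.diff(u / v, x)                                # derivative of the quotient
rhs = (v * sp.diff(u, x) - u * sp.diff(v, x)) / v**2   # quotient rule as stated above
print(sp.simplify(lhs - rhs))                          # 0, i.e. both sides agree
```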
Step 2: Apply the Quotient rule
From the Sigmoid function g(x) and the quotient rule, with u = 1 and v = 1 + e^(-x), we have

g'(x) = [v · u' - u · v'] / v^2 = [(1 + e^(-x)) · (1)' - 1 · (1 + e^(-x))'] / (1 + e^(-x))^2
Two things to note:
- The derivative of a constant equals zero. That is why u' = 0.
- The differentiation of the exponential term e^(-x) is covered by the exponential rule of differentiation, which gives d/dx e^(-x) = -e^(-x).

By the quotient and exponential rules of differentiation, we have

g'(x) = [(1 + e^(-x)) · 0 - 1 · (-e^(-x))] / (1 + e^(-x))^2 = e^(-x) / (1 + e^(-x))^2
That is the derivative of the Sigmoid function, but we can simplify it further as shown in the next step.
Step 3: Simplifying the derivative
In this step, we will use some algebra to simplify the derivative we obtained in Step 2.

g'(x) = e^(-x) / (1 + e^(-x))^2
      = (1 + e^(-x) - 1) / (1 + e^(-x))^2
      = (1 + e^(-x)) / (1 + e^(-x))^2 - 1 / (1 + e^(-x))^2
      = 1 / (1 + e^(-x)) - 1 / (1 + e^(-x))^2
      = 1 / (1 + e^(-x)) · [1 - 1 / (1 + e^(-x))]
      = g(x) · (1 - g(x))
Note: In the second line above, we added 1 and subtracted 1 in the numerator, so we actually changed nothing.
That marks the end of the differentiation process using the quotient rule.
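Before moving on, here is a short numerical sketch (helper names are my own) that checks the result g'(x) = g(x) · (1 - g(x)) against a finite-difference approximation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # Closed-form derivative obtained above: g'(x) = g(x) * (1 - g(x))
    g = sigmoid(x)
    return g * (1.0 - g)

h = 1e-6  # small step for the finite-difference approximation
for x in (-2.0, 0.0, 2.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(f"x={x:+.1f}  closed-form={sigmoid_derivative(x):.6f}  numeric={numeric:.6f}")
```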
Differentiating Sigmoid Function using Chain Rule
Step 1: The chain rule

If g is a function of u, and u is a function of x, then

dg/dx = (dg/du) · (du/dx)
Step 2: Rewrite the Sigmoid function as a negative exponent

g(x) = 1 / (1 + e^(-x)) = (1 + e^(-x))^(-1)
Step 3: Applying the chain rule to the Sigmoid function in Step 2
Let,

u = 1 + e^(-x), so that g(u) = u^(-1)
Then, by the chain rule, we will proceed as follows,

dg/dx = (dg/du) · (du/dx)
      = -u^(-2) · (-e^(-x))
      = e^(-x) / u^2
      = e^(-x) / (1 + e^(-x))^2
At this point, you can proceed to simplify the equation using the same steps we took when we worked on the quotient rule (Step 3 above).
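As another check, a small SymPy sketch (again assuming SymPy is installed) confirms that differentiating the Sigmoid directly gives the same g(x) · (1 - g(x)) form:

```python
import sympy as sp

x = sp.symbols('x')
g = 1 / (1 + sp.exp(-x))          # the Sigmoid function

derivative = sp.diff(g, x)        # let SymPy differentiate it
print(sp.simplify(derivative - g * (1 - g)))  # 0, so g'(x) = g(x) * (1 - g(x))
```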
Here is a plot of the Sigmoid function and its derivative.

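If you want to reproduce such a plot yourself, here is a minimal matplotlib sketch (assuming numpy and matplotlib are available; the plotting choices are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 400)
g = 1 / (1 + np.exp(-x))          # Sigmoid
dg = g * (1 - g)                  # its derivative, g(x) * (1 - g(x))

plt.plot(x, g, label="Sigmoid g(x)")
plt.plot(x, dg, label="Derivative g'(x)")
plt.xlabel("x")
plt.legend()
plt.show()
```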
Derivative of Cross-Entropy Function
The Cross-Entropy loss function is a very important cost function used for classification problems. In this post, however, we will focus solely on differentiating the loss function. Nonetheless, you can read more about the Cross-Entropy loss function in the link given below.
The Cross-Entropy loss function is defined as:

L = -Σᵢ tᵢ · log(pᵢ)
where tᵢ is the truth value and pᵢ is the probability of the iᵗʰ class.
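To make the formula concrete, here is a small NumPy sketch (the numbers are made-up examples) computing the loss for one sample with three classes:

```python
import numpy as np

t = np.array([0.0, 1.0, 0.0])        # one-hot truth: the sample belongs to class 2
p = np.array([0.1, 0.7, 0.2])        # predicted class probabilities

loss = -np.sum(t * np.log(p))        # cross-entropy: -sum_i t_i * log(p_i)
print(loss)                          # ~0.357, i.e. -log(0.7)
```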
For classification with two classes, we have the binary cross-entropy loss, which is defined as follows:

L = -[t · log(ŷ) + (1 - t) · log(1 - ŷ)]

where t is the truth label and ŷ (yhat) is the predicted probability of the positive class.
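Similarly, a minimal sketch of the binary case (with arbitrary example values for t and ŷ):

```python
import math

def binary_cross_entropy(t, y_hat):
    # L = -[t * log(y_hat) + (1 - t) * log(1 - y_hat)]
    return -(t * math.log(y_hat) + (1 - t) * math.log(1 - y_hat))

print(binary_cross_entropy(1.0, 0.9))   # confident, correct prediction -> small loss (~0.105)
print(binary_cross_entropy(1.0, 0.1))   # confident, wrong prediction   -> large loss (~2.303)
```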
Derivative of binary cross-entropy function
The truth label, t, in the binary loss is a known value, whereas ŷ is a variable. This means that the function will be differentiated with respect to ŷ, treating t as a constant. Let's go ahead and work on the derivative now.
Step 1: Stating two rules we need to differentiate binary cross-entropy loss
To differentiate the binary cross-entropy loss, we need these two rules:

Rule 1 (sum rule): (f + g)' = f' + g'

Rule 2 (product rule): (f · g)' = f · g' + g · f'
and the product rule reads, "the derivative of a product of two functions is the first function multiplied by the derivative of the second, plus the second function multiplied by the derivative of the first function."
Step 2: Differentiating the function
We will use the product rule to work on the derivatives of the two terms separately; then, by Rule 1, we will combine the two derivatives.
Since we have two unknowns, t and ŷ, we will actually work on partial derivatives (a partial derivative of a function of several variables is its derivative with respect to one of the variables, with the other variables regarded as constant).

For the first term, using the product rule and d/dŷ log(ŷ) = 1/ŷ (log denotes the natural logarithm),

∂/∂ŷ [t · log(ŷ)] = t · (1/ŷ) + log(ŷ) · 0 = t / ŷ

For the second term, the chain rule gives d/dŷ log(1 - ŷ) = -1 / (1 - ŷ), so

∂/∂ŷ [(1 - t) · log(1 - ŷ)] = (1 - t) · (-1 / (1 - ŷ)) + log(1 - ŷ) · 0 = -(1 - t) / (1 - ŷ)
And therefore, the derivative of the binary cross-entropy loss function becomes

∂L/∂ŷ = -[t / ŷ - (1 - t) / (1 - ŷ)] = -t / ŷ + (1 - t) / (1 - ŷ)

which can also be written compactly as (ŷ - t) / [ŷ · (1 - ŷ)].
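As a final check, here is a short numerical sketch (helper names are mine) comparing this gradient with a finite-difference approximation:

```python
import math

def bce(t, y_hat):
    return -(t * math.log(y_hat) + (1 - t) * math.log(1 - y_hat))

def bce_grad(t, y_hat):
    # Closed-form partial derivative obtained above: -t/y_hat + (1 - t)/(1 - y_hat)
    return -t / y_hat + (1 - t) / (1 - y_hat)

t, y_hat, h = 1.0, 0.8, 1e-6
numeric = (bce(t, y_hat + h) - bce(t, y_hat - h)) / (2 * h)
print(bce_grad(t, y_hat), numeric)   # both ~ -1.25
```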
That marks the end of this article. Thanks for reading 🙂
Conclusion
In this article, we worked on the derivatives of the Sigmoid function and binary cross-entropy function. The former is used mainly in Machine Learning as an activation function, whereas the latter is often used as a cost function to evaluate models. The derivatives found here are especially fundamental during a network’s back-propagation process – an essential step during model training.
Please sign up for a Medium membership at only $5 per month to be able to read all my articles on Medium and those of other writers.
You can also subscribe to get my article into your email inbox when I post.
Thank you for reading, see you in the next!!!