Exploring the Softmax Function

Developing Intuition With the Wolfram Language

Arnoud Buzing
Towards Data Science

--

Photo by Kalen Emsley on Unsplash

In machine learning, classification problems are often solved with neural networks that return a probability for each class they are trained to recognize. A typical example is image classification: the input to the network is an image, and the output is a list of candidate labels, each with a probability.

The Wolfram Language (WL) comes with a large library of pre-trained neural networks including ones that solve classification problems. For example, the built-in system function ImageIdentify uses a pre-trained network that can recognize over 4,000 objects in images.

Image by the author using a photo by James Sutton on Unsplash

Side note: Because of the unique typesetting capabilities of the Wolfram notebook interface (such as mixing code with images), all code is shown with screen captures. A notebook with full code is included at the end of this story.
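In plain text, that lookup is a one-liner (the file name here is hypothetical; any cat photo will do):

img = Import["cat.jpg"]; (* load an image from disk *)
ImageIdentify[img] (* -> the most likely label, e.g. an entity for "domestic cat" *)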

You can also use the underlying neural network directly to access the probabilities for each of the 4,000+ possible objects. In this case "domestic cat" wins hands down, with a probability of almost 1. Other types of cat follow with lower probabilities, and the nonzero result for "shower curtain" is most likely an artifact of the image background. Summing all 4,000+ probabilities gives exactly 1.0.

Image by the author using a photo by James Sutton on Unsplash
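As a sketch of what this looks like in code, assuming the network is fetched by the model name used in the Wolfram Neural Net Repository:

net = NetModel["Wolfram ImageIdentify Net V1"]; (* the net behind ImageIdentify *)
probs = net[img, "Probabilities"]; (* association of label -> probability *)
TakeLargest[probs, 5] (* the five most likely objects *)
Total[probs] (* -> 1. *)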

When you examine the neural network in detail and look at its layers, you will notice that the final layer is a SoftmaxLayer. This layer type appears at the end of many classification networks because it converts a list of raw scores into a list of probabilities.

(image by author)
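One way to pull that layer out, assuming the model is a NetChain (negative indices count from the end):

NetExtract[net, -1] (* -> SoftmaxLayer[ ] *)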

SoftmaxLayer applies the softmax function, which takes a list of numbers as input and returns a normalized list of numbers as output. More specifically, each element of the input list is exponentiated and then divided by the sum of all the exponentiated elements.

(image by author)
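Written out as an ordinary WL function (a hypothetical helper named softmax, not a built-in), the definition is a one-liner, and it agrees with the built-in layer:

softmax[list_List] := Exp[list]/Total[Exp[list]]

softmax[{1., 2., 3.}] (* -> {0.0900306, 0.244728, 0.665241} *)
SoftmaxLayer[][{1., 2., 3.}] (* -> the same values *)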

It is clear from this definition that the output elements always sum to 1: each output element is a fraction, and the shared denominator is exactly the sum of all the numerators. What is less clear is how the shape of an arbitrary input list relates to the shape of its output list, because the softmax function is nonlinear.

To build intuition, I wrote a small WL function that visualizes softmax inputs and outputs. It simply draws two bar charts side by side: one for the input list and one for the output list.

understand[list_List] := Row[{
  BarChart[list],                 (* the raw input list *)
  Style[" \[Rule] ", 32],         (* a large arrow between the two charts *)
  BarChart[SoftmaxLayer[][list]]  (* the softmax of the input list *)
}]

Let’s start with a very simple input: three zeros. The output then has three equal elements as well, and because they must add up to 1, each is 1/3 ≈ 0.333.

(image by author)
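In code, the screen capture corresponds to a call like:

understand[{0, 0, 0}]
SoftmaxLayer[][{0., 0., 0.}] (* -> {0.333333, 0.333333, 0.333333} *)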

And this is true for any list where all elements are the same. For example, a four-element list of 7s will yield a result where all elements are 0.25:

(image by author)
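Or, with the hypothetical softmax helper defined earlier:

softmax[{7., 7., 7., 7.}] (* -> {0.25, 0.25, 0.25, 0.25} *)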

Things get more interesting when the input elements are not all equal. Let’s start with a list of linearly increasing elements. The output traces an exponential curve, scaled down so that its values sum to 1: exponentiating a linear sequence produces a geometric one.

(image by author)
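For example, with the understand function from above:

understand[Range[10]] (* consecutive output values differ by a constant factor of E *)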

Similarly, a list of linearly decreasing elements yields a decaying exponential:

(image by author)
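The mirrored case:

understand[Range[10, 1, -1]] (* the mirror image of the previous input *)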

A downward-opening parabola yields an output “curve” that looks like a normal distribution. In fact, up to normalization, it is exactly that: exponentiating -x² gives the Gaussian e^(-x²).

(image by author)
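One way to generate such an input, sampling -x² over a symmetric interval:

understand[Table[-x^2, {x, -3, 3, 0.25}]]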

An upward-opening parabola gives a much more extreme output, with the endpoint values dominating.

(image by author)
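And its flipped counterpart:

understand[Table[x^2, {x, -3, 3, 0.25}]]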

Finally, and mostly for fun: periodic inputs keep their periodicity, although the exponential rescaling distorts the wave shape:

(image by author)
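For example, a sampled sine wave:

understand[Table[Sin[x], {x, 0, 4 Pi, 0.2}]]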

Exploring these and other inputs in a notebook is very instructive. Understanding how the softmax function works helps you understand how neural networks arrive at their final classification probabilities. If you want to experiment yourself, download this notebook from the Wolfram Cloud. And if you’re completely new to WL, I recommend my recent post “Learning Wolfram: From Zero to Hero”.

--

I create awesome software at Wolfram Research, makers of Mathematica, Wolfram|Alpha, Wolfram Cloud, and many other products and services.