Hopfield Networks are useless. Here’s why you should learn them.

Filippo Galli
Towards Data Science
6 min read · Apr 16, 2019



Hopfield networks were invented in 1982 by J.J. Hopfield, and since then a number of neural network models with far better performance and robustness have been developed. To my knowledge, they are mostly introduced and mentioned in textbooks on the way to Boltzmann Machines and Deep Belief Networks, which are built upon Hopfield’s work.

Nevertheless, they offer a perspective on learning systems so different from the current state of deep learning that they are worth understanding, if only for that reason.

Let’s find out how they work.

A basic Hopfield Net and how it works

At its core a Hopfield Network is a model that can reconstruct data after being fed with corrupt versions of the same data.

We can describe it as a network of nodes — or units, or neurons — connected by links. Each unit has one of two states at any point in time, and we are going to assume these states can be +1 or -1. We can list the state of each unit at a given time in a vector V.

Links represent connections between units and they are symmetric. In other words, the link between node i and node j is identical whichever direction you travel in the graph: the number representing the strength of the connection is the same both ways. We can list these numbers in a matrix W.

Fig.1 — A Hopfield Network

If we have a network like the one in Fig. 1, we can describe it with:

$$V = \begin{bmatrix} V_a \\ V_b \\ V_c \\ V_d \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}, \qquad W = \begin{bmatrix} w_{aa} & w_{ab} & w_{ac} & w_{ad} \\ w_{ba} & w_{bb} & w_{bc} & w_{bd} \\ w_{ca} & w_{cb} & w_{cc} & w_{cd} \\ w_{da} & w_{db} & w_{dc} & w_{dd} \end{bmatrix}$$

A state vector and a weight matrix describe the graph at some point in time

How do we build W? We do it by first noticing that there is no connection between a node and itself, so we make this explicit by zeroing waa, wbb, wcc, wdd. We also know that links (or weights) are symmetric, so for instance wab = wba. So W is zero-diagonal and symmetric. The rest of the elements are filled in this way: wij = Vi · Vj (so, for instance, wab = Va · Vb). The result is the matrix W below. This turns out to be a smart choice for the connection weights, as it follows the Hebbian rule we will see again later.

$$W = \begin{bmatrix} 0 & -1 & -1 & 1 \\ -1 & 0 & 1 & -1 \\ -1 & 1 & 0 & -1 \\ 1 & -1 & -1 & 0 \end{bmatrix}$$

The weight matrix fully describes the network connections
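In code, this construction takes two lines. Here is a minimal NumPy sketch (the variable names are mine), using the V from above:

```python
import numpy as np

# The pattern we want the network to store: V = [Va, Vb, Vc, Vd].
V = np.array([1, -1, -1, 1])

# Hebbian weights: w_ij = Vi * Vj, then zero the diagonal
# because a node has no connection to itself.
W = np.outer(V, V)
np.fill_diagonal(W, 0)

print(W)
# [[ 0 -1 -1  1]
#  [-1  0  1 -1]
#  [-1  1  0 -1]
#  [ 1 -1 -1  0]]
```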

Now that we built up the weight matrix we need to define a rule to determine the states of each node. Since we have binary states we can use this law:

$$V_i \leftarrow f(x_i), \qquad f(x) = \begin{cases} +1 & \text{if } x \geq 0 \\ -1 & \text{otherwise} \end{cases}$$

State update rule

By using the weight matrix defined above we have, similarly:

$$x_i = \sum_{j \neq i} w_{ij} V_j$$
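Translated into code, the rule is tiny. A minimal Python sketch (the function names are mine; note that mapping x = 0 to +1 is a tie-breaking choice, and implementations differ on it):

```python
def f(x):
    # Threshold activation: +1 for non-negative input, -1 otherwise.
    return 1 if x >= 0 else -1

def update_node(W, V, i):
    # x_i is the weighted sum of the other nodes' states
    # (w_ii = 0 anyway, so excluding j == i is just explicit).
    x_i = sum(W[i][j] * V[j] for j in range(len(V)) if j != i)
    return f(x_i)
```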

Now we can get an intuition of how Hopfield Networks actually work.

Say we have a corrupt version of V, and we will call it V’ = [1 -1 -1 -1], such that the last bit was reversed. We can initialize the network states with V’.

Network initialized with corrupted states: d is now -1

We can proceed to update the network states with the rules we established before. For Va we will have:

Va = f(wab · Vb + wac · Vc + wad · Vd) = +1

Note that wab · Vb and wac · Vc both contribute positively to xa, despite Vb and Vc being -1, pushing Va to be +1, while the corrupted bit contributes in the opposite direction. In other words, by building W as before, we force nodes to settle in opposite states when they are supposed to be of opposite sign, and to settle in the same state when they are supposed to be equal. Opposite states repel each other, while equal states attract. This is one formulation of the Hebbian rule.

Hebbian Rule: Neurons that fire together, wire together

Similarly:

Vb = f(wba · Va + wbc · Vc + wbd · Vd) = -1
Vc = f(wca · Va + wcb · Vb + wcd · Vd) = -1

Now the corrupted bit:

Vd = f(wda · Va + wdb · Vb + wdc · Vc) = +1

And Va = +1 attracts node d toward positive through wda · Va > 0, while Vb = Vc = -1 repel node d by giving an overall positive contribution wdb · Vb + wdc · Vc > 0.

Nodes that were originally the same are driven to be the same; nodes that were originally of opposite sign are driven to be opposite.

The original V is magically recovered!
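Putting the pieces together, here is a self-contained sketch of the whole recovery procedure. It sweeps over the nodes asynchronously until a full pass changes nothing; this is one possible update schedule, not the only one:

```python
import numpy as np

V = np.array([1, -1, -1, 1])          # the pattern stored in W
W = np.outer(V, V)
np.fill_diagonal(W, 0)

state = np.array([1, -1, -1, -1])     # corrupted V': last bit flipped

# Sweep over the nodes, updating each in place, until a full pass
# changes nothing (the network has reached a stable configuration).
changed = True
while changed:
    changed = False
    for i in range(len(state)):
        new_state = 1 if W[i] @ state >= 0 else -1
        if new_state != state[i]:
            state[i] = new_state
            changed = True

print(state)   # [ 1 -1 -1  1] -> the original V is recovered
```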

But why does it work?

So far we saw that once we completely define the network (its W) from a state vector V that we want to recover after corruption, we can do so just by updating the network states.

In other words, after initializing the network states with V’ we let the network evolve with the laws we defined before, and it will converge to the states we wanted in the first place. Not only that, but it will remain there no matter how many updates we keep on doing.

Let’s find out why.

We can define a function that depends on the states of the graph and the W matrix. We will call this function the Energy function associated with the network states and denote it with:

$$E = -\frac{1}{2} \sum_i \sum_j w_{ij} V_i V_j$$

Energy function in a Hopfield Network
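As a quick sketch, the energy is a single quadratic form in NumPy (the helper name is mine):

```python
import numpy as np

def energy(W, state):
    # E = -1/2 * sum over i and j of w_ij * V_i * V_j.
    # The zero diagonal of W means the i == j terms contribute nothing.
    return -0.5 * state @ W @ state
```

Note the minus sign: configurations that agree with the weights score low, so "good" states sit at the bottom of the energy landscape.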

Result #1

If a node Vi changes its state from +1 to -1 or vice versa, we will have that:

$$\Delta E_i = -\,\Delta V_i \sum_{j \neq i} w_{ij} V_j = -\,\Delta V_i \, x_i$$

Now:
If Vi changed from -1 to +1, then ΔVi = +2.
By the update rule, that flip only happens when xi is positive.
And in turn, the energy delta ΔEi = -ΔVi · xi has to be negative.
The same holds for a flip from +1 to -1: then ΔVi = -2 and xi must be negative, so ΔEi is again negative.

This means that if we update the network according to our rules, the energy function can never increase: every state flip strictly decreases it, so the network keeps descending toward a lowest point.
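We can verify this on our running example: flipping node d from -1 back to +1 has ΔVd = +2 and xd = 3, so the energy should drop by ΔEd = -2 · 3 = -6. A quick check, with the setup restated so the snippet runs on its own:

```python
import numpy as np

V = np.array([1, -1, -1, 1])              # stored pattern
W = np.outer(V, V)
np.fill_diagonal(W, 0)

def energy(state):
    # E = -1/2 * sum over i and j of w_ij * V_i * V_j
    return -0.5 * state @ W @ state

corrupt = np.array([1, -1, -1, -1])       # V' with node d flipped
print(energy(corrupt))                    # -0.0
print(energy(V))                          # -6.0: flipping d back lowered E by 6
```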

Result #2

But is there a lowest point or will the energy keep decreasing to negative infinity?

In other words, we are trying to figure out whether the energy delta can be zero.
To have ΔEi = 0 we need, for instance, ΔVi = 0, which is true only when Vi’(k-1) = Vi’(k), with Vi’(k-1) being the node state before the update and Vi’(k) the state after the update.

Let’s suppose we have Vi’(k-1) = +1; then Vi’(k) = +1 requires xi(k) > 0. Which is:

$$x_i^{(k)} = \sum_{j \neq i} w_{ij}\, V_j'^{(k-1)} = \sum_{j \neq i} V_i V_j\, V_j'^{(k-1)}$$

But when Vj’(k-1) = Vj for every j, each term becomes Vi · (Vj)² = Vi = +1, so xi(k) = N - 1 > 0: it is always positive!

In one shot, we showed that when the states assume the original (uncorrupted) values, the Energy function will not change anymore. In other words, ΔVi = 0 and no node will update to a different value: the configuration is said to be stable.

So we proved the Energy keeps decreasing until the network reaches a stable configuration of the node states. Better still, that stable configuration corresponds to the restored state vector, a local minimum of the energy function.

There are a number of implementation details that were skipped over, but a basic, working Hopfield Network is in the Jupyter Notebook I prepared here.


I am a PhD student in Data Science at Scuola Normale Superiore, Pisa, Italy. I interned at NASA JPL and CERN.