Tflearn: Solving XOR with a 2x2x1 feed forward neural network in Tensorflow

Published in Towards Data Science, Jul 22, 2017

A simple guide on how to train a 2x2x1 feed forward neural network to solve the XOR problem using only 12 lines of code in Python with tflearn, a deep learning library built on top of Tensorflow.

The goal is to train a network that receives two boolean inputs and returns True only when one input is True and the other is False.

Packages

from tflearn import DNN
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

Input Data

We define our input data X and expected results Y as a list of lists.
Since neural networks essentially deal only with numerical values, we transform our booleans into numbers so that True=1 and False=0.

X = [[0,0], [0,1], [1,0], [1,1]]
Y = [[0], [1], [1], [0]]
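If you would rather not type the truth table by hand, the same X and Y can be derived from Python's built-in `^` (exclusive or) operator, with `int()` doing the True=1 / False=0 encoding. This is a small sketch; the helper name `xor_dataset` is hypothetical and not part of tflearn.

```python
from itertools import product


def xor_dataset():
    """Build XOR inputs and targets as lists of lists (hypothetical helper)."""
    X, Y = [], []
    for a, b in product([False, True], repeat=2):
        # Encode booleans as numbers: True -> 1, False -> 0
        X.append([int(a), int(b)])
        # a ^ b is True exactly when one input is True and the other is False
        Y.append([int(a ^ b)])
    return X, Y


X, Y = xor_dataset()
print(X)  # [[0, 0], [0, 1], [1, 0], [1, 1]]
print(Y)  # [[0], [1], [1], [0]]
```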

The Model

We define the input, hidden and output layers.
The syntax couldn't be simpler: use input_data() for the input layer and fully_connected() for subsequent layers.

input_layer = input_data(shape=[None, 2])
hidden_layer = fully_connected(input_layer, 2, activation='tanh')
output_layer = fully_connected(hidden_layer, 1, activation='tanh')

Why is the input of shape [None, 2]? The network is fed multiple training examples at once. Since we use two features per training example and there are four examples, our data has shape [4, 2]. But often we want to define the network so that it can receive any number of training examples. Using None for the first dimension allows any number of examples, so the input shape is defined as [None, number_of_features, …].
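To see why a None batch dimension is convenient, here is a minimal pure-Python sketch of what a fully connected layer computes, activation(input × W + b), applied row by row. It illustrates the idea only and is not tflearn's implementation; the function name `dense` and the placeholder weights are assumptions. Because it loops over rows, the same function accepts a batch of four examples or a single one.

```python
import math


def dense(batch, W, b, activation=math.tanh):
    """Apply activation(x @ W + b) to every row x of a batch.

    batch: list of rows, each with len(W) features (any number of rows)
    W:     weight matrix, len(W) rows by len(b) columns
    b:     bias vector, one entry per output unit
    """
    out = []
    for x in batch:
        row = []
        for j in range(len(b)):
            z = sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            row.append(activation(z))
        out.append(row)
    return out


# Placeholder weights for a 2-input, 2-unit layer (illustrative values only)
W = [[0.5, -0.5], [0.5, -0.5]]
b = [0.0, 0.0]

full_batch = dense([[0, 0], [0, 1], [1, 0], [1, 1]], W, b)  # shape [4, 2]
single = dense([[1, 0]], W, b)                               # shape [1, 2]
print(len(full_batch), len(full_batch[0]))  # 4 2
print(len(single), len(single[0]))          # 1 2
```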

regressor = regression(output_layer, optimizer='sgd', loss='binary_crossentropy', learning_rate=5)
model = DNN(regressor)

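In the code block above we define the regressor that will perform backpropagation and train our network. We use Stochastic Gradient Descent with a learning rate of 5 as the optimisation method and Binary Crossentropy as the loss function. (Naming the result regressor rather than regression avoids shadowing the imported regression() function.)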
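One detail worth checking: tflearn's binary_crossentropy loss appears to be built on TensorFlow's `tf.nn.sigmoid_cross_entropy_with_logits`, which treats the network's raw outputs as logits and applies a sigmoid inside the loss. Treat this as an assumption to verify against the tflearn objectives documentation. The sketch below computes that loss in plain Python using the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)). A logit above 0 corresponds to a sigmoid probability above 0.5, which is why thresholding the outputs at 0 makes sense.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_cross_entropy_with_logits(logit, label):
    """Stable form of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]."""
    z, y = logit, label
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def mean_loss(logits, labels):
    """Average the per-example loss over a batch."""
    losses = [sigmoid_cross_entropy_with_logits(z, y) for z, y in zip(logits, labels)]
    return sum(losses) / len(losses)


# tanh keeps outputs inside (-1, 1), so sigmoid(output) stays inside roughly
# (0.27, 0.73) and the loss cannot drop below log(1 + e^-1).
best_tanh_outputs = [-1.0, 1.0, 1.0, -1.0]  # ideal XOR outputs at the tanh limits
print(round(mean_loss(best_tanh_outputs, [0, 1, 1, 0]), 5))  # 0.31326
print(sigmoid(0.0))  # 0.5: a logit of 0 is the decision boundary
```

If the assumption holds, it would also explain why the training loss reported below levels off at about 0.31 instead of approaching 0.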

Finally, we define our (hardly) deep neural network in tflearn using, simply, DNN().

Next, we need to train the model. During this process, the regressor tries to minimise the loss function. The end result of training is simply a set of weights (and biases) connecting the layer nodes.

Training

model.fit(X, Y, n_epoch=5000, show_metric=True)

When we run model.fit(), Tensorflow feeds the input data to the network 5000 times (n_epoch=5000) and adjusts the weights to fit the model.
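To make training less of a black box, here is a from-scratch sketch of backpropagation and gradient descent for the same 2x2x1 tanh network in plain Python. It is not what tflearn runs internally. It uses full-batch gradient descent, a learning rate of 0.5, a fixed random seed and uniform initial weights, all chosen for illustration, and it assumes the loss is sigmoid cross-entropy on the raw tanh output (an assumption about how tflearn's binary_crossentropy works). The function name `train_xor` is hypothetical.

```python
import math
import random

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(n_epoch=5000, lr=0.5, seed=0):
    """Full-batch gradient descent on a 2-2-1 tanh network (illustrative sketch)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # W1[i][j]: input i -> hidden j
    b1 = [0.0, 0.0]
    W2 = [rng.uniform(-1, 1) for _ in range(2)]  # W2[j]: hidden j -> output
    b2 = 0.0
    losses = []
    for _ in range(n_epoch):
        gW1 = [[0.0, 0.0], [0.0, 0.0]]
        gb1 = [0.0, 0.0]
        gW2 = [0.0, 0.0]
        gb2 = 0.0
        loss = 0.0
        for x, y in zip(X, Y):
            # Forward pass
            h = [math.tanh(x[0] * W1[0][j] + x[1] * W1[1][j] + b1[j]) for j in range(2)]
            o = math.tanh(h[0] * W2[0] + h[1] * W2[1] + b2)
            p = sigmoid(o)  # the tanh output is treated as a logit
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Backward pass (chain rule); gradients are summed, then averaged below
            d_o = (p - y) * (1 - o * o)  # dLoss / d(output pre-activation)
            for j in range(2):
                gW2[j] += d_o * h[j]
                d_h = d_o * W2[j] * (1 - h[j] * h[j])  # dLoss / d(hidden pre-activation)
                gb1[j] += d_h
                for i in range(2):
                    gW1[i][j] += d_h * x[i]
            gb2 += d_o
        n = len(X)
        losses.append(loss / n)
        # Gradient descent update with the batch-averaged gradients
        for j in range(2):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(2):
                W1[i][j] -= lr * gW1[i][j] / n
        b2 -= lr * gb2 / n
    return (W1, b1, W2, b2), losses


params, losses = train_xor(seed=0)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Changing `seed` changes the random initial weights, which is a quick way to see how much the outcome depends on initialisation.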

If your output looks something like this (aim for small loss and high accuracy),

>>> Training Step: 5048  | total loss: 0.31394 | time: 0.002s
| SGD | epoch: 5048 | loss: 0.31394 - binary_acc: 0.9994 -- iter: 4/4

your model has reached a binary accuracy of 0.9994 (about 99.9%), meaning that it successfully learned to solve the problem.
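The binary_acc figure is, roughly, the fraction of examples whose thresholded output matches the target. Here is a minimal sketch, assuming outputs are thresholded at 0 (as in the prediction code later in this article) and using made-up output values for illustration. With only four examples, the accuracy of a single step can only be 0, 0.25, 0.5, 0.75 or 1, so a displayed value like 0.9994 is presumably smoothed across training steps.

```python
def binary_accuracy(outputs, targets, threshold=0.0):
    """Fraction of examples where (output > threshold) matches the 0/1 target."""
    correct = sum(int(o > threshold) == t for o, t in zip(outputs, targets))
    return correct / len(targets)


# Hypothetical tanh outputs for the four XOR inputs
print(binary_accuracy([-0.97, 0.95, 0.96, -0.93], [0, 1, 1, 0]))  # 1.0
print(binary_accuracy([0.2, 0.95, 0.96, -0.93], [0, 1, 1, 0]))    # 0.75
```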

Note that training will not always yield the same results, and the network might even fail to learn the problem. This is because the network weights are initialised randomly every time. Neural networks usually need a lot of training data for backpropagation to work reliably; with only four training examples, our result depends heavily on how the weights are initialised.

Prediction

To check whether our model really works, let's predict all possible input combinations and convert the outputs to booleans using a simple list comprehension:

[i[0] > 0 for i in model.predict(X)]
>>> [False, True, True, False]

Great! Our model works.

But what logic did the model use to solve the XOR problem? Let’s check under the hood.

Weight Analysis

Unlike AND and OR, XOR's outputs are not linearly separable: no single straight line separates the inputs that map to True from those that map to False.
Therefore, we need a hidden layer to solve it. It turns out that each node in the hidden layer represents one of the simpler, linearly separable logical operations (AND, OR, NAND, …), and the output layer acts as another logical operation fed by the outputs of the hidden layer.
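A quick way to see the difference is to search for a single linear threshold unit, output = (w1*x1 + w2*x2 + b > 0), that reproduces each truth table. The grid of candidate values below is an arbitrary choice for illustration. Coming up empty on a grid is not a proof by itself, but for XOR no such unit exists for any weights, which is exactly what "not linearly separable" means.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TABLES = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "NAND": [1, 1, 1, 0],
    "XOR": [0, 1, 1, 0],
}
# Candidate weights and biases: -2.0, -1.5, ..., 2.0 (arbitrary grid)
GRID = [k / 2 for k in range(-4, 5)]


def find_linear_unit(targets):
    """Return (w1, w2, b) with step(w1*x1 + w2*x2 + b) == targets, or None."""
    for w1, w2, b in product(GRID, repeat=3):
        outputs = [int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in INPUTS]
        if outputs == targets:
            return (w1, w2, b)
    return None


for name, targets in TABLES.items():
    print(name, find_linear_unit(targets))  # only XOR prints None
```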

If we were limited to using only simple logical operations, we could define XOR as

XOR(X1, X2) = AND(OR(X1, X2), NAND(X1, X2))
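Since there are only four input combinations, the identity is easy to check exhaustively. A short Python sketch using plain booleans (the function names are just for illustration):

```python
from itertools import product


def AND(a, b):
    return a and b


def OR(a, b):
    return a or b


def NAND(a, b):
    return not (a and b)


def XOR(a, b):
    return a != b


for x1, x2 in product([False, True], repeat=2):
    composed = AND(OR(x1, x2), NAND(x1, x2))
    assert composed == XOR(x1, x2)  # the composition matches XOR on every input
    print(x1, x2, composed)
```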

To understand what logic our network uses to come up with results, we need to analyse its weights (and biases).
We do that with model.get_weights(layer.W) to get the weight matrix and model.get_weights(layer.b) to get the bias vector.

print(model.get_weights(hidden_layer.W), model.get_weights(hidden_layer.b))
print(model.get_weights(output_layer.W), model.get_weights(output_layer.b))

>>> [[ 3.86708593 -3.11288071] [ 3.87053323 -3.1126008 ]]
    [-1.82562542  4.58438063]
>>> [[ 5.19325304]
    [-4.87336922]]

The image below shows which node each weight belongs to (numbers are rounded for simplicity).

X1 and X2 are our input nodes. a1 and a2 are nodes in our hidden layer and O is the output node. B1 and B2 are biases.

What does this tell us? Well, not much yet. But by calculating the node activations for individual inputs, we can see how a particular node behaves.
We’re using the formula (× stands for matrix multiplication):

activation = tanh(input × weights + biases)

Note that we're using tanh(), so activations lie in the range (-1, 1): an activation close to 1 reads as True and one close to -1 reads as False.
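Applying that formula to the hidden-layer weights and biases printed earlier gives the hidden activations directly. The sketch below hard-codes those printed values (your own trained weights will differ) and uses plain Python instead of a matrix library; it covers the hidden layer only. The layout assumption is that W[i][j] connects input i to hidden node j, so column 0 belongs to a1 and column 1 to a2.

```python
import math
from itertools import product

# Hidden-layer weights and biases as printed by model.get_weights() above.
# Assumed layout: W[i][j] connects input i to hidden node j (a1 = column 0, a2 = column 1).
W = [[3.86708593, -3.11288071],
     [3.87053323, -3.1126008]]
b = [-1.82562542, 4.58438063]


def hidden_activations(x1, x2):
    """activation = tanh(input x weights + biases) for the two hidden nodes."""
    return [math.tanh(x1 * W[0][j] + x2 * W[1][j] + b[j]) for j in range(2)]


for x1, x2 in product([0, 1], repeat=2):
    a1, a2 = hidden_activations(x1, x2)
    print(x1, x2, round(a1, 2), round(a2, 2))
```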

Rounded node activations for each input combination in the trained XOR network

a1 is True (close to 1) whenever at least one input is 1, so node a1 represents the OR operation.
a2 is True in every case except when both inputs are True, so node a2 represents the NAND operation.
The output node is True only when both a1 and a2 are True.

The output node can be rewritten as:

O(X1, X2) = AND(a1(X1, X2), a2(X1, X2)) = AND(OR(X1, X2), NAND(X1, X2))

The trained network is therefore an AND operation of OR(X1, X2) and NAND(X1, X2).

Note that results will vary due to random weight initialisation, meaning that your weights will likely be different every time you train the model.

Full source can be found here: