
Deep learning is a branch of machine learning in which you feed a machine data and answers, and the machine figures out the rules by which the answers are derived. The answers are the labels that the data represents. For example, for data about house prices, the label is the price and the data is the various aspects of a house that affect the price. Another example is image data about cats and dogs, where the label is whether the animal is a cat or a dog.
Defining key terminologies
Artificial neural networks, or ANNs, are the building blocks of deep learning. ANNs were first introduced in 1943, and their application has recently taken off thanks to the availability of vast amounts of data, a massive increase in computing power, and a lot of attention and funding directed towards their progress.
Parts of a neural network
The image below is a basic representation of a neural network.

Neuron – An artificial neuron is a unit or node which has one or more inputs and one output. Each input has an associated weight which can be modified during training. The smiley circles above each represent a neuron.
Layer – This refers to a collection of neurons operating together at a specific depth in a neural network. In the figure above, each of the columns represents a layer and the network has 3 layers; the first layer (input layer) has 3 neurons, the second layer (hidden layer) has 2 neurons, and the third layer (output layer) has 1 neuron.
Dense Layer – This refers to a fully connected layer, where every neuron in the layer is connected to every neuron in the previous layer. Dense layers are the most common type of layer.
Deep neural network (DNN) – This is when a neural network contains a deep stack of hidden layers (several of the middle columns). Deep learning is therefore a field that studies models containing deep stacks of computations.
TensorFlow – This is an open-source platform that contains many of the common algorithms needed for machine learning. With it, you can create and use machine learning models without having to learn all the underlying math and logic going on behind the scenes.
Keras – This is a simple and flexible high-level deep learning API for building, training, evaluating, and running neural networks. TensorFlow comes bundled with its own Keras implementation called tf.keras.
Loss Function – This measures the network’s output error by comparing the desired output (y) to the network’s actual output (ŷ) and returns the error. Examples of loss functions are mean squared error and cross-entropy.
Optimizer – This is a technique for modifying the attributes of the neural network, such as the weights and the learning rate, so as to reduce the loss. Examples are stochastic gradient descent and Adam.
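To make the neuron and loss function definitions above more concrete, here is a minimal sketch in plain Python. The function names, the example weights, and the bias term are assumptions chosen for illustration; Keras handles all of this internally, and a real neuron usually also applies an activation function, which is left out here.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation function)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between desired and actual outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# A neuron with three inputs, each with its own (made-up) weight
output = neuron_output([1.0, 2.0, 3.0], weights=[0.5, -1.0, 2.0], bias=0.1)
print(output)  # roughly 4.6: 0.5*1 - 1.0*2 + 2.0*3 + 0.1

# Desired outputs (y) versus the network's outputs (y-hat)
loss = mean_squared_error([3.0, 5.0], [2.0, 7.0])
print(loss)  # 2.5: (1**2 + 2**2) / 2
```

Training amounts to changing the weights and the bias so that the value returned by the loss function gets smaller.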
Training a model
This is where a machine uses a set of algorithms to learn from a given set of data and identify the underlying patterns in it. Let us take a simple example of data with 2 columns, x and y, and 7 rows, where the relationship between x and y is y = 2x + 1. Here x is the input data and y holds the labels.
First, the neural network makes a wild guess, perhaps y = 9x + 9, and compares its guessed outputs to the actual values of y by calculating the loss function (such as mean squared error). The optimizer (such as stochastic gradient descent) then adjusts the guess to reduce the loss, maybe coming up with something like y = 4x + 4. This repeats for the given number of epochs (repetitions of the above process). We will implement this in the code below.
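Before turning to Keras, the guess, measure, and adjust loop described above can be sketched from scratch in plain Python. This is only an illustration of the idea, not what TensorFlow does internally: the starting guess of y = 9x + 9 comes from the text, while the learning rate of 0.05 and the use of all rows in every step are assumptions made for this sketch.

```python
# Training data that follows y = 2x + 1
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]


def mean_squared_error(w, b):
    """Loss of the current guess y = w*x + b over the whole dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


w, b = 9.0, 9.0        # the "wild guess" y = 9x + 9
learning_rate = 0.05   # assumed step size for this sketch
epochs = 500

initial_loss = mean_squared_error(w, b)
for epoch in range(epochs):
    # How far off is the current guess on every row?
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    # Gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)
    # Move the guess a small step in the direction that reduces the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
final_loss = mean_squared_error(w, b)

print(f"learned: y = {w:.4f}x + {b:.4f}")
print(f"loss went from {initial_loss:.2f} to {final_loss:.2e}")
```

Each pass through the loop corresponds to one epoch: the loss tells us how bad the current guess is, and the update step plays the role of the optimizer.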
A simple implementation of a neural network
The data
The table below represents the data described above. We have two columns, x and y, and for any given row the value of y is 2x + 1.

We will feed our neural network with the data and have it determine the relationship between the 2 sets. x is the input data (or input feature) and y holds the labels. So here we are feeding the neural network with the data (x) and the answers (y), and its task is to identify the rule (formula) that maps x to y, so that it can predict the value of y for a value of x that it has not previously seen.
Install the packages
Since Keras is bundled with TensorFlow 2.0 and above, we only need to install TensorFlow. If you have a Python environment on your machine, you can install the CPU-only version using the line below. Omitting the -cpu suffix installs the standard tensorflow package, which includes GPU support.
pip install tensorflow-cpu
Follow this tutorial to install Tensorflow and Keras on anaconda.
Another option that is great for getting started is to write and run your code on Google colab, which is a free browser-based Jupyter notebook that runs on the cloud and requires no setup as most packages, including Tensorflow, are pre-installed.
Import the libraries
Next is to import the libraries that are needed for the project.
import tensorflow as tf
import numpy as np
from tensorflow import keras
Define the model
To define our model, we call the Sequential constructor. Sequential groups a linear stack of layers and specifies what each layer looks like. You can have multiple dense layers as a list inside one Sequential constructor.
model = tf.keras.Sequential([
    keras.layers.Dense(units=1, input_shape=[1])
])
The code above constructs a very simple neural network that has only one dense layer with only one neuron. Since this is the first layer, we have to specify the shape of the input data. In this case, for every instance (row), x is a single value, hence input_shape=[1].
Including an input shape in this step is important because all layers in Keras need to know the shape of their inputs to create their weights. If you omit input_shape, the model is created without weights, and they are initialized only when the model first sees some input data. Once the weights exist, the model is built, and you can call the model.get_weights() and model.summary() methods to display its contents.
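The link between input shape and weights can be illustrated with a little arithmetic: a dense layer needs one weight per connection to the previous layer, plus one bias per neuron. The helper below is a hypothetical function written for this sketch, not part of Keras, although the totals should match the parameter counts that model.summary() reports for dense layers.

```python
def dense_param_count(n_inputs, units):
    """Weights (one per input per neuron) plus biases (one per neuron)."""
    return n_inputs * units + units


# Our model: a single neuron fed by a single input value
print(dense_param_count(n_inputs=1, units=1))  # 2 (one weight, one bias)

# The example network from earlier: 3 inputs -> 2 hidden neurons -> 1 output
hidden = dense_param_count(n_inputs=3, units=2)  # 3*2 weights + 2 biases
output = dense_param_count(n_inputs=2, units=1)  # 2*1 weights + 1 bias
total = hidden + output
print(total)  # 11
```

Without knowing n_inputs, the layer cannot tell how many weights to create, which is why the first layer needs an input shape.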
Compile the model
The next step is to compile the neural network. Here we provide two arguments: a loss function and an optimizer. The loss function measures how well the model predicts with its current set of weights, while the optimizer modifies the weights to reduce the loss. 'sgd' below stands for stochastic gradient descent.
model.compile(optimizer='sgd', loss='mean_squared_error')
This line basically defines the type of mathematical computations that will take place in the back-end when training the model in the step below.
Training the model on the data
This is where the model learns the relationship between X and Y.
x = np.array([-1.0,0.0,1.0,2.0,3.0,4.0,5.0],dtype=float)
y = np.array([-1.0,1.0,3.0,5.0,7.0,9.0,11.0],dtype=float)
model.fit(x,y,epochs=500)
We call model.fit(x, y, epochs=500) to train the model. Check out this Keras documentation for other optional parameters. We provide x and y as NumPy arrays, as the model expects. x is the input data and y holds the labels. The epochs argument sets the number of passes over the training data. The process of guessing, measuring the loss, and using the optimizer to make the next best guess happens in this step. Below is a screenshot of the output displayed when training the model in a Jupyter notebook. Note how the loss decreases as the epochs approach 500/500.

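If you would rather inspect the loss values in code than read them off the training log, fit() returns a History object whose history dictionary stores the loss for every epoch. The sketch below is a self-contained variant of the steps above with a few assumptions of its own: the data is reshaped to (rows, 1) so the input shape is explicit, the input shape is declared with tf.keras.Input instead of input_shape, and verbose=0 is used to silence the per-epoch output.

```python
import numpy as np
import tensorflow as tf

# Same data as above, reshaped to (rows, features)
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0], dtype=float).reshape(-1, 1)
y = 2 * x + 1

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),       # one input value per row
    tf.keras.layers.Dense(units=1),   # a single neuron
])
model.compile(optimizer="sgd", loss="mean_squared_error")

# verbose=0 hides the progress log; the losses are still recorded
history = model.fit(x, y, epochs=500, verbose=0)
losses = history.history["loss"]

print(f"epochs recorded: {len(losses)}")
print(f"loss after the first epoch: {losses[0]:.4f}")
print(f"loss after the last epoch:  {losses[-1]:.6f}")
```

The exact numbers vary from run to run because the starting weights are random, but the last loss should be far smaller than the first.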
Making predictions
Now that the model has been trained on the data we provided, we can call the model.predict() method to predict y for a given value of x. The value is passed as an array because the method expects an array of inputs rather than a single number. A NumPy array is used here because recent Keras versions may not accept a plain Python list.
print(model.predict(np.array([10.0])))
###Results
[[21.006134]]
We know from the y = 2x + 1 relationship that the answer should be 21, but here we see that the result is slightly different from the expected value. This is because the model was trained on a small dataset with only 7 rows, and because the loss was not exactly 0 when training stopped. Note that your value may differ because the model starts from random initial weights.
Summary of the model
Each neuron in a model learns a weight for each of its inputs and a bias. Our model has a single neuron with a single input, so it learns one weight and one bias in the relationship y = Wx + b, where W is the weight and b is the bias. With TensorFlow, these are the values that the model learns, and you can view them by calling model.get_weights().
model.get_weights()
###Results
[array([[2.000743]], dtype=float32), array([0.9972024], dtype=float32)]
This returns two arrays, the weight and the bias, such that the learned relationship is y = 2.000743x + 0.9972024, which is very close to the true relationship y = 2x + 1.
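Because the whole model boils down to these two numbers, the learned line can also be evaluated by hand. The sketch below hardcodes the weight and bias printed above purely for illustration (your own run will produce slightly different values), and predict_by_hand is a hypothetical helper name, not a Keras function.

```python
# Values copied from the get_weights() output shown above
W = 2.000743    # learned weight
b = 0.9972024   # learned bias


def predict_by_hand(x):
    """Apply the learned relationship y = W*x + b without calling Keras."""
    return W * x + b


# An input the model never saw during training
x_new = 7.0
print(predict_by_hand(x_new))  # close to the true value 2*7 + 1 = 15
```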
Wrap-up
That was a basic introduction to deep learning, Tensorflow, and Keras where we used a neural network to determine a linear relationship between two sets of values. Although this may seem like a very simple problem, the concepts and workflow we learned are essential for tackling more interesting and complex problems in deep learning. Check out the references below that I found very useful for beginners starting in the field.
References
Neural Networks and Deep Learning by Andrew Ng on YouTube.
MIT Introduction to Deep Learning 6.S191
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: 2nd Edition