Neural Networks in Swift

Creating a neural network framework in Swift from scratch

Eugene Stsefankou
Towards Data Science


Photo by Andy Kelly on Unsplash

Introduction

We live surrounded by great technologies: self-driving cars, voice assistants, image analyzers. Most of them use neural networks as the basis of their algorithms. We’re going to build a basic neural network from scratch using Swift.

Some theory

Before we can develop an application, we must learn some theory of neural networks. Neural networks are made up of input, output and hidden layers. Simple fully connected layers are composed of neurons, and synapses connect the neurons to each other. Each synapse has its own weight value. When a neural network is generated, its weights are filled with random values, which are then adjusted during training (in the animation below, red indicates a high weight and blue indicates a low weight).

Training neurons. Image by Yauheni Stsefankou

The training process is divided into several epochs. In each epoch, we process every data sample in the dataset. We fill the input layer with the sample’s input. Each layer then processes its input through the activation function and sends the result to the next layer through synapses. When a value passes through a synapse, it is multiplied by the synapse’s weight. To calculate a neuron’s value, the neuron sums up all its incoming values and adds the bias.

Forward propagation. Image by Yauheni Stsefankou
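In other words, a single neuron with inputs x1 … xn, weights w1 … wn and bias b computes:

value = activation(w1·x1 + w2·x2 + … + wn·xn + b)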

Here is an example of a neural network. It takes an unsigned 4-bit number and returns whether the number is even. Note that all binary representations of even numbers end in 0, so the prediction depends only on the last bit (the fourth neuron of the input layer). This is why I’ve highlighted the synapses connected to the last neuron in the input layer. The other synapses from the input layer carry no useful information, and their weights approach zero during training.

Neural network example. Image by Yauheni Stsefankou

A short overview

Our neural network framework will not be fast, because it lacks low-level optimizations and GPU support. Its only optimization is multithreading via DispatchQueue’s concurrentPerform instead of plain loops.

Concurrent performing vs For loop. Image by Yauheni Stsefankou

This implementation consists of a Dense layer, a sigmoid activation function, and training and prediction methods.

Let’s start!

Dataset structure

We have to implement a Dataset structure that contains all the samples we want to train on.

Here is a Dataset model.

Dataset model. Image by Yauheni Stsefankou

Note that the dataset consists of data samples. Let’s implement the structure in Swift.
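Here is a minimal sketch of such a structure (the exact names may differ from the repository; Foundation is imported once here for the exp, DispatchQueue and JSON APIs used throughout):

```swift
import Foundation

// A dataset is simply a collection of training samples.
struct Dataset {
    var samples: [DataSample]
}
```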

Our second observation is that data samples include input and output data.
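For example (DataSample is an assumed name; DataPiece is defined just below):

```swift
// Each sample pairs input data with its expected output.
struct DataSample {
    var input: DataPiece
    var output: DataPiece
}
```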

And each data piece is composed of a body and, for convenience, its size.

The data size must have a width; height and depth are optional (depending on the data size type). That is why we provide initializers for all data size types.

A data piece can be 1D, 2D or 3D.
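Putting these observations together, one possible shape in Swift is (type and property names are assumptions):

```swift
// A data piece can be 1D, 2D or 3D.
enum DataSizeType: Int {
    case oneD = 1
    case twoD
    case threeD
}

// Width is mandatory; height and depth are optional, so there is one
// initializer per data size type.
struct DataSize {
    var type: DataSizeType
    var width: Int
    var height: Int?
    var depth: Int?

    init(width: Int) {
        type = .oneD
        self.width = width
    }

    init(width: Int, height: Int) {
        type = .twoD
        self.width = width
        self.height = height
    }

    init(width: Int, height: Int, depth: Int) {
        type = .threeD
        self.width = width
        self.height = height
        self.depth = depth
    }
}

// A data piece stores a flat body of values together with its size.
struct DataPiece {
    var size: DataSize
    var body: [Float]
}
```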

NN model class

We have implemented the dataset system for our neural network. Now we need to create the neural network class itself. It includes the layers, the learning rate, the number of epochs per training run, and the number of samples in each training batch.
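A sketch of the class (the default values are illustrative assumptions; the Layer class comes next):

```swift
class NeuralNetwork {
    var layers: [Layer] = []
    var learningRate: Float = 0.5
    var epochs: Int = 30
    var batchSize: Int = 16
}
```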

Now we need to implement the Layer class. Each layer includes several neurons (except for convolutional layers) and an activation function for them. In addition, we will keep the output cache for backpropagation.
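One possible shape (the ActivationFunction protocol is defined a little later):

```swift
class Layer {
    var neurons: [Neuron] = []
    var function: ActivationFunction
    var output: DataPiece?   // cached output, reused during backpropagation

    init(function: ActivationFunction) {
        self.function = function
    }
}
```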

Each neuron stores a weight for each synapse coming into it, plus a bias. In addition, we keep a cache of weight changes in order to support batched gradient descent. There are three gradient descent types: batch, stochastic and mini-batch. With batch and mini-batch gradient descent, the weights are updated only after several propagations. We also keep the delta for backpropagation.
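A minimal sketch:

```swift
struct Neuron {
    var weights: [Float]
    var weightsDelta: [Float]  // accumulated changes, applied once per batch
    var bias: Float
    var delta: Float           // error term filled in by backpropagation
}
```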

Neural networks use many activation functions, such as sigmoid and ReLU. We define a common protocol for them. Each activation contains a transfer function (for forward propagation) and a derivative function (for backpropagation). We also store an identifier (rawValue) for enumerating the functions.
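A sketch of the protocol (signatures are assumptions):

```swift
protocol ActivationFunction {
    var rawValue: Int { get }
    func transfer(input: Float) -> Float
    func derivative(output: Float) -> Float
}
```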

Now it is possible to create activation function structures that conform to this protocol. Let’s make a sigmoid function.

Sigmoid function

σ(x) = 1 / (1 + e^(−x)) (equation from Wikipedia)

I think you’ve seen the graph of the sigmoid function. Here it is:

The graph of the sigmoid function. Image from Wikipedia by Qef

The sigmoid function is suitable when we need a probability at the output, such as the probability that a number is even.
The sigmoid derivative is the product of the output and the difference between 1 and the output: σ′ = out · (1 − out).

Our sigmoid structure contains its identifier (rawValue) and the transfer and derivative functions.
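A sketch of the structure:

```swift
struct Sigmoid: ActivationFunction {
    var rawValue: Int = 0

    // σ(x) = 1 / (1 + e^(-x))
    func transfer(input: Float) -> Float {
        return 1.0 / (1.0 + exp(-input))
    }

    // σ'(x) = out * (1 - out), computed from the cached output
    func derivative(output: Float) -> Float {
        return output * (1.0 - output)
    }
}
```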

In addition, we need to create an enum for our functions and a method to get them from the identifiers.
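One way to do it (names are assumptions):

```swift
enum ActivationFunctionRaw: Int {
    case sigmoid = 0
}

// Rebuild an activation function from its stored identifier.
func getActivationFunction(rawValue: Int) -> ActivationFunction {
    guard let raw = ActivationFunctionRaw(rawValue: rawValue) else {
        fatalError("Unknown activation function identifier \(rawValue)")
    }
    switch raw {
    case .sigmoid:
        return Sigmoid()
    }
}
```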

Layer functions

Each type of layer should have individual propagations. Here they are:
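In the base Layer class these can be overridable stubs; the signatures here are my assumptions, and the repository may differ:

```swift
// Inside the Layer class — each subclass overrides these.
func forward(input: DataPiece) -> DataPiece {
    fatalError("Subclasses must override forward(input:)")
}

func backward(input: DataPiece, previous: Layer?) -> DataPiece {
    fatalError("Subclasses must override backward(input:previous:)")
}

func deltaWeights(input: DataPiece, learningRate: Float) -> DataPiece {
    fatalError("Subclasses must override deltaWeights(input:learningRate:)")
}

func updateWeights() {
    fatalError("Subclasses must override updateWeights()")
}
```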

Training algorithm

The first thing we need to do is divide a dataset into batches. We can do this by shuffling data samples and then dividing them into groups with a size equal to the preferred batch size.

Dividing into batches. Image by Yauheni Stsefankou

Then, for each sample in a batch, the algorithm performs forward and backward propagation. After all the samples in a batch have been used, the algorithm updates the weights of all the layers.
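A sketch of the training loop, using the network-level propagation helpers defined in the next section:

```swift
extension NeuralNetwork {
    func train(set: Dataset) {
        for epoch in 1...epochs {
            // Shuffle the samples, then split them into groups of batchSize.
            let samples = set.samples.shuffled()
            let batches = stride(from: 0, to: samples.count, by: batchSize).map {
                Array(samples[$0 ..< min($0 + batchSize, samples.count)])
            }
            for batch in batches {
                for sample in batch {
                    _ = forward(input: sample.input)
                    backward(expected: sample.output)
                    deltaWeights(input: sample.input)
                }
                // Apply the accumulated deltas once per batch.
                for layer in layers {
                    layer.updateWeights()
                }
            }
            print("Epoch \(epoch) finished")
        }
    }
}
```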

Executing propagations

Forward propagation sends the output of neurons through synapses to the next layer.
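At the network level this is a simple chain:

```swift
extension NeuralNetwork {
    // Pass the sample through every layer in order; each layer caches
    // its own output along the way.
    func forward(input: DataPiece) -> DataPiece {
        var input = input
        for layer in layers {
            input = layer.forward(input: input)
        }
        return input
    }
}
```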

Backpropagation spreads the prediction error toward the input layer, using the cache from forward propagation. For the first backward step, the sample’s expected value is used as input.
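A sketch of the backward pass:

```swift
extension NeuralNetwork {
    // Walk the layers in reverse; the expected output seeds the first step.
    func backward(expected: DataPiece) {
        var input = expected
        var previous: Layer? = nil
        for layer in layers.reversed() {
            input = layer.backward(input: input, previous: previous)
            previous = layer
        }
    }
}
```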

The delta weights method calculates the change in weights and stores them in the cache. It works like a forward propagation.
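And the matching sketch, shaped like the forward pass:

```swift
extension NeuralNetwork {
    // Accumulate weight changes layer by layer, like a forward pass.
    func deltaWeights(input: DataPiece) {
        var input = input
        for layer in layers {
            input = layer.deltaWeights(input: input, learningRate: learningRate)
        }
    }
}
```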

Dense layer

The first Layer subclass in our neural network is the Dense layer, which we discussed in the Some theory section. It is the simplest layer of a neural network.
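A sketch of the class and its initializer (random initial weights and zero biases are assumptions; the repository’s initialization may differ):

```swift
class Dense: Layer {
    init(inputSize: Int, neuronsCount: Int, function: ActivationFunction) {
        super.init(function: function)
        output = DataPiece(size: .init(width: neuronsCount),
                           body: Array(repeating: 0, count: neuronsCount))
        // Every neuron gets one weight per input plus a bias.
        for _ in 0 ..< neuronsCount {
            neurons.append(Neuron(
                weights: (0 ..< inputSize).map { _ in Float.random(in: -1 ... 1) },
                weightsDelta: Array(repeating: 0, count: inputSize),
                bias: 0,
                delta: 0
            ))
        }
    }
}
```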

OK. We have initialized a Dense layer, and now we need to write its propagation functions. Instead of loops, we use concurrent performing from DispatchQueue. Forward propagation generates the output of each neuron by multiplying the neuron’s weights by the inputs passed to it and adding the bias to the result. The last operation on the result is applying the activation’s transfer function.
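One possible implementation; the buffer pointer keeps the concurrent writes to distinct indices safe:

```swift
// Inside the Dense class.
override func forward(input: DataPiece) -> DataPiece {
    var result = [Float](repeating: 0, count: neurons.count)
    result.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: neurons.count) { i in
            // Weighted sum of the inputs plus the bias...
            var sum = neurons[i].bias
            for j in 0 ..< input.body.count {
                sum += neurons[i].weights[j] * input.body[j]
            }
            // ...passed through the activation's transfer function.
            buffer[i] = function.transfer(input: sum)
        }
    }
    output = DataPiece(size: .init(width: neurons.count), body: result)
    return output!
}
```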

Backward propagation uses the output of forward propagation to spread the error across all neurons, using the activation’s derivative function. Multithreading is not worthwhile here, so we don’t use it.
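A sketch, following the standard dense-layer backpropagation rule:

```swift
// Inside the Dense class. `previous` is the layer handled before this one
// in the backward pass, i.e. the next layer in forward order.
override func backward(input: DataPiece, previous: Layer?) -> DataPiece {
    for i in 0 ..< neurons.count {
        var error: Float = 0
        if let previous = previous {
            // Hidden layer: weighted sum of the following layer's deltas.
            for neuron in previous.neurons {
                error += neuron.weights[i] * neuron.delta
            }
        } else {
            // Output layer: difference between expected and cached output.
            error = input.body[i] - output!.body[i]
        }
        neurons[i].delta = error * function.derivative(output: output!.body[i])
    }
    return input
}
```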

The delta weights method uses concurrent performing. When building a neural network model, you should find a sweet spot for the learning rate. The bias is also updated during this method’s execution.
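A possible implementation:

```swift
// Inside the Dense class.
override func deltaWeights(input: DataPiece, learningRate: Float) -> DataPiece {
    neurons.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: buffer.count) { i in
            // Accumulate a change for every incoming weight...
            for j in 0 ..< input.body.count {
                buffer[i].weightsDelta[j] += learningRate * buffer[i].delta * input.body[j]
            }
            // ...and update the bias right away.
            buffer[i].bias += learningRate * buffer[i].delta
        }
    }
    // Pass the cached output on, just like forward propagation does.
    return output!
}
```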

After all the items in a batch have been used, the neural network updates the neurons’ weights from the propagation cache. And, of course, it pays to use multithreaded concurrent performing here.
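A sketch:

```swift
// Inside the Dense class.
override func updateWeights() {
    neurons.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: buffer.count) { i in
            // Fold the accumulated deltas into the weights, then reset them.
            for j in 0 ..< buffer[i].weightsDelta.count {
                buffer[i].weights[j] += buffer[i].weightsDelta[j]
                buffer[i].weightsDelta[j] = 0
            }
        }
    }
}
```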

The second important function of a neural network is prediction. To predict a result, the function simply returns the result of forward propagation.
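```swift
extension NeuralNetwork {
    // Prediction is just a forward pass through the trained network.
    func predict(input: DataPiece) -> DataPiece {
        return forward(input: input)
    }
}
```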

Using the framework to implement a neural network model

Hooray! We have implemented the foundation of our neural network framework, sigmoid activation and a Dense layer for it. Now we can check this with a simple example.

The task of this example is to classify a number by parity. This is an unusual task for neural networks, but easy to understand.

To initialize a new neural network model, we need to put this in code:
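```swift
let network = NeuralNetwork()
```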

After that, we can change some parameters of our model (batch size, number of epochs and learning rate):
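For example (the values are illustrative):

```swift
network.batchSize = 4
network.epochs = 30
network.learningRate = 0.5
```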

The most important part when implementing a neural network model is setting up the layer structure. Here is the layer structure example:
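A possible structure for the parity task: four input bits feeding a small hidden layer and a single output neuron (the hidden layer size is an assumption):

```swift
network.layers = [
    Dense(inputSize: 4, neuronsCount: 4, function: Sigmoid()),
    Dense(inputSize: 4, neuronsCount: 1, function: Sigmoid())
]
```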

After that, our model is ready for training. It all starts with feeding a dataset into a model.
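A sketch that builds the dataset (the sample layout matches the description below) and starts training:

```swift
// Binary representations of 1...15, labelled 1.0 for even and 0.0 for odd.
var samples: [DataSample] = []
for number in 1 ... 15 {
    let bits = (0 ..< 4).map { Float((number >> (3 - $0)) & 1) }
    let isEven: Float = number % 2 == 0 ? 1.0 : 0.0
    samples.append(DataSample(
        input: DataPiece(size: .init(width: 4), body: bits),
        output: DataPiece(size: .init(width: 1), body: [isEven])
    ))
}
network.train(set: Dataset(samples: samples))
```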

We feed the input number in binary form and get the probability that it is even. The first sample from our dataset is [0.0, 0.0, 0.0, 1.0], and its expected value is 0.0. This means that the binary representation of the input number is 0001, which is 1 in decimal, an odd number. Therefore, we expect 0.0 (zero probability that the number is even).

After training the neural network, you can start using it for predictions. You can get a prediction by feeding data sample to model.
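For example, asking about 1111 (15):

```swift
let result = network.predict(input: DataPiece(size: .init(width: 4),
                                              body: [1.0, 1.0, 1.0, 1.0]))
print(result.body[0]) // probability that 1111 is even
```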

The result of training with 30 epochs. Image by Yauheni Stsefankou

The result is 0.45, which means that the answer is 0 (odd) with 55% probability or 1 (even) with 45% probability.

The result of training with 1000 epochs. Image by Yauheni Stsefankou

When we change the number of epochs to 1000, training takes about thirty times longer, but the predicted probability that 1111 is even drops to 3.8%. This means that our neural network model has become more accurate.

Model saving

To save the model, we need to prepare all the structures and classes: they must conform to the Codable protocol for JSON encoding. For almost all classes and structures, it is enough to add Codable to the declaration:
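For example, the Neuron structure from earlier becomes:

```swift
struct Neuron: Codable {
    var weights: [Float]
    var weightsDelta: [Float]
    var bias: Float
    var delta: Float
}
```

DataPiece, DataSize and DataSizeType get the same one-word change.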

We implemented the layer as a class, not a structure, and the class requires explicit encoding and decoding methods to conform to the Codable protocol. These methods require coding keys for all the stored variables.
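A sketch of the conformance; the activation function is stored by its identifier and restored through getActivationFunction:

```swift
class Layer: Codable {
    // ... the properties and stubs from before ...

    private enum CodingKeys: String, CodingKey {
        case neurons
        case function
        case output
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(neurons, forKey: .neurons)
        try container.encode(function.rawValue, forKey: .function)
        try container.encodeIfPresent(output, forKey: .output)
    }

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        neurons = try container.decode([Neuron].self, forKey: .neurons)
        function = getActivationFunction(rawValue: try container.decode(Int.self, forKey: .function))
        output = try container.decodeIfPresent(DataPiece.self, forKey: .output)
    }
}
```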

After adding it, we have to add the initializer from the decoder to our Dense layer.
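Dense adds no stored properties of its own, so it can simply delegate:

```swift
// Inside the Dense class.
required init(from decoder: Decoder) throws {
    try super.init(from: decoder)
}
```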

We need to write a wrapper for the layer, because the Layer class is not final and can have subclasses. The wrapper encodes not only the layer itself but also its concrete type.
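One way to write it (only Dense is listed here; every new subclass would get its own case):

```swift
struct LayerWrapper: Codable {
    let layer: Layer

    private enum CodingKeys: String, CodingKey {
        case base
        case payload
    }

    private enum Base: Int, Codable {
        case dense
    }

    init(_ layer: Layer) {
        self.layer = layer
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        // Store the concrete type next to the layer itself.
        switch layer {
        case is Dense:
            try container.encode(Base.dense, forKey: .base)
        default:
            fatalError("Unsupported layer type")
        }
        try container.encode(layer, forKey: .payload)
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Read the type first, then decode the matching subclass.
        switch try container.decode(Base.self, forKey: .base) {
        case .dense:
            layer = try container.decode(Dense.self, forKey: .payload)
        }
    }
}
```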

After writing the wrapper, we need to replace the neural network model’s encoding and decoding methods with our own. Rather than encoding and decoding the layers directly, we encode their wrappers, because we need to store each layer’s type.
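A sketch of the replacement (init() is declared explicitly because defining init(from:) removes the implicit initializer):

```swift
class NeuralNetwork: Codable {
    // ... the properties and methods from before ...

    private enum CodingKeys: String, CodingKey {
        case layers
        case learningRate
        case epochs
        case batchSize
    }

    init() {}

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        // Encode wrappers so each layer's concrete type survives the round trip.
        try container.encode(layers.map(LayerWrapper.init), forKey: .layers)
        try container.encode(learningRate, forKey: .learningRate)
        try container.encode(epochs, forKey: .epochs)
        try container.encode(batchSize, forKey: .batchSize)
    }

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        layers = try container.decode([LayerWrapper].self, forKey: .layers).map { $0.layer }
        learningRate = try container.decode(Float.self, forKey: .learningRate)
        epochs = try container.decode(Int.self, forKey: .epochs)
        batchSize = try container.decode(Int.self, forKey: .batchSize)
    }
}
```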

Now we can write a method to save the model. First, we have to initialize JSON encoder and use it to encode the model. We then get the URL of the current application directory and write the encoded data there.
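A minimal sketch, writing next to the current working directory:

```swift
extension NeuralNetwork {
    func saveModel(fileName: String) {
        let encoder = JSONEncoder()
        guard let encoded = try? encoder.encode(self) else {
            print("Unable to encode the model.")
            return
        }
        let url = URL(fileURLWithPath: FileManager.default.currentDirectoryPath)
            .appendingPathComponent(fileName)
        try? encoded.write(to: url)
    }
}
```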

Reading a model from a file is no more difficult than saving it. Now, instead of initializing the JSON encoder, we are initializing the JSON decoder. Then we get the data from the file and decode it with our decoder. And the last thing we need to do is initialize the model from the decoded model variables.
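A matching sketch for loading:

```swift
extension NeuralNetwork {
    convenience init?(fromFile fileName: String) {
        let url = URL(fileURLWithPath: FileManager.default.currentDirectoryPath)
            .appendingPathComponent(fileName)
        guard let data = try? Data(contentsOf: url),
              let decoded = try? JSONDecoder().decode(NeuralNetwork.self, from: data) else {
            return nil
        }
        // Initialize the model from the decoded model's variables.
        self.init()
        layers = decoded.layers
        learningRate = decoded.learningRate
        epochs = decoded.epochs
        batchSize = decoded.batchSize
    }
}
```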

Conclusion

We have created a basic neural network framework, and now we can add any type of function or layer to it. Here is a link to a git repository with all the source code and some extra features (Dropout, Flatten and Convolutional 2D layers, ReLU activation): https://github.com/stefjen07/NeuralNetwork.

Thanks for reading. If you have any tips or have found mistakes in this article, please leave your feedback in the comments. I am always ready to improve my readers’ experience.
