Evolutionary Computation Course
Hello and welcome back to this full course on Evolutionary Computation! In this post we will wrap up Unit 3 with the much-anticipated application of evolving the weights of a Neural Network for Time Series Analysis!
To get the most out of this post, you should be familiar with the foundations of how neural networks work, have a thorough overview of evolutionary computation, and know how genetic algorithms work. If this is your first time seeing this series, please read these two articles I’ve previously written so that you can best understand how I am developing the algorithm I am going to detail shortly:
Table of Contents
- Time Series Problem
- Review of Neural Networks
- Design of the Neural Network
- Implementing our Neural Network
- Training, Validation, and Testing Data Sets
- The Genetic Algorithm
- Evaluating our Algorithm
- Comparison with a Neural Network Trained through Back-Propagation
- Final Code and Video Implementation
- Conclusion
Time Series Problem
A time series is any series of data points that are ordered in time, and time series analysis is the study of such data. It is extremely applicable in many settings, from economic to weather projections and forecasting. The problem we will be analyzing is the Sunspot Time Series dataset. Sunspots are darker regions on the sun’s surface caused by its magnetic activity, and their numbers rise and fall in cycles. The monthly mean number of sunspots has been recorded since 1749. Our problem is to see how well we can model this time series and predict future monthly mean values. Here is a view of the time series data set over its entire lifespan:

You can find the data here on this Kaggle site:
Review of Neural Networks
We will be attempting to solve this problem with a Feed Forward Neural Network. Artificial neural networks were inspired by the biological neural networks that make up the brain. These models can perform classification, regression, and other tasks. Neural Networks have become extremely common in Machine Learning and Artificial Intelligence due to their success on many problem types. Because we will be dealing heavily with neural networks, I am going to give a brief overview of how they work. If you already have experience working with neural networks, feel free to skip this section as it will mostly be review.
The architecture of a neural network is defined by three components: the number of hidden layers, the number of nodes per layer, and the activation function per layer. The weights between each pair of adjacent layers are represented by a matrix. In the elementary example below, we have three input nodes (meaning three variables), four nodes in the first hidden layer, five nodes in the second hidden layer, and one output.

We represent each connection between two neurons by a weight, and the weights between two layers combine to form a matrix. As such, our input is an [N, 3] matrix, N being the number of observations we want to pass and 3 being the number of variables. A forward pass through our network simply multiplies the matrices together, applies the activation function at each layer, and returns the result. One critical thing not shown in the diagram is the bias vector at each layer. The bias is a [1, M] vector, where M is the number of nodes in that layer, that is added to each observation after the matrix multiplication and before the activation function is applied.
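To make the shapes concrete, here is a minimal NumPy sketch of a forward pass through the 3-4-5-1 example network. The random weights and the helper names `relu` and `forward` are illustrative assumptions; following this post’s choice, ReLU is applied in every layer, and the bias is added before the activation.

```python
import numpy as np

def relu(z):
    # ReLU activation: element-wise max(0, z)
    return np.maximum(0.0, z)

def forward(X, weights, biases):
    """For each layer: multiply by the weight matrix, add the [1, M]
    bias vector, then apply the activation function."""
    a = X
    for W, b in zip(weights, biases):
        a = relu(a @ W + b)
    return a

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 5, 1]  # 3 inputs, hidden layers of 4 and 5 nodes, 1 output
# Illustrative random weights: shapes [3, 4], [4, 5], [5, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=(1, n)) for n in layer_sizes[1:]]

X = rng.normal(size=(10, 3))  # N = 10 observations, 3 variables
out = forward(X, weights, biases)  # shape (10, 1): one prediction per observation
```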
Neural Networks are commonly trained through backpropagation, where the gradient of the error (typically MSE for regression or Cross-Entropy for classification) is propagated backward from the output layer through the hidden layers, and the weights are adjusted to reduce the error. However, in our application, instead of using backpropagation we will use a genetic algorithm to train the weights.
Design of the Neural Network
How can we use a Neural Network to predict a time series? In a one-dimensional time series problem, you only have a single variable and a time index, so how can you predict that variable from time alone? The most common approach for time series problems with neural networks is what is known as a recurrent neural network. A recurrent neural network is a network whose output (or hidden state) at one time step is fed back as an input at the next time step. However, a simpler alternative is to make the input layer a ‘window’ that spans some number of values preceding the current one.

Take the figure above as an example: it shows a subset of the time series, and our goal is to predict the values in the black boxes. For the first row, we want to predict the value 110.5, which we can do by feeding 141.7, 139.2, and 158 as the input to our neural network. Note that in this example, our ‘window’ size is three. We use the values preceding the current one (not including it) as input variables for prediction. The best window size is problem dependent and usually has to be found by testing, so for our problem we will test various window sizes. To keep things simple, our neural network will have three hidden layers, each with 5 hidden nodes and the ReLU activation function. However, the number of inputs will change depending on the ‘window’ size.
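The windowing step can be sketched in a few lines of NumPy. The helper name `make_windows` is hypothetical, the sketch assumes NumPy 1.20 or later for `sliding_window_view`, and it assumes the figure’s values 141.7, 139.2, 158 are listed in time order.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, window):
    """Build (X, y) pairs from a 1-D series: row i of X holds the
    `window` values immediately before target y[i] (target excluded)."""
    series = np.asarray(series, dtype=float)
    # Every run of `window` consecutive values; the last run has no
    # following target value, so it is dropped.
    X = sliding_window_view(series, window)[:-1].copy()
    y = series[window:]
    return X, y

# The first row from the figure, with a window size of three
X, y = make_windows([141.7, 139.2, 158.0, 110.5], window=3)
# X holds the single window [141.7, 139.2, 158.0]; y holds [110.5]
```

Each row of X becomes one input observation, so a window size of w gives a network with w input nodes.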
Implementing our Neural Network
Because we will be evolving the weights of a neural network, we need to implement one from scratch, as directly manipulating the weights would require a lot of maneuvering in common Python libraries. Our implementation will accommodate an arbitrary number of layers and nodes per layer, but will use only the ReLU activation function for each layer. I would try to explain the code in detail, but it deals with the fine details of matrix multiplication for neural networks, which can get intense quickly; if you don’t understand exactly how it works, don’t worry, as it is not the focus of this post.
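As a rough illustration, here is a minimal sketch of what an `EvolvableNetwork` class could look like. It assumes the weights and biases are stored as plain lists of NumPy arrays (so the genetic algorithm can read and overwrite them directly) and that the constructor takes a list of layer sizes plus the `initialize` flag used later in this post. The `predict` method name, the `rng` argument, and the He-style initialization scale are assumptions, not the author’s code.

```python
import numpy as np

class EvolvableNetwork:
    """Feed-forward network with ReLU in every layer whose weight and
    bias matrices are plain NumPy arrays (a hypothetical sketch)."""

    def __init__(self, layer_sizes, initialize=True, rng=None):
        # layer_sizes, e.g. [window, 5, 5, 5, 1]
        self.layer_sizes = list(layer_sizes)
        self.weights, self.biases = [], []
        if initialize:
            rng = np.random.default_rng() if rng is None else rng
            for n_in, n_out in zip(self.layer_sizes[:-1], self.layer_sizes[1:]):
                # Assumed He-style scale, a common choice for ReLU layers
                self.weights.append(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)))
                self.biases.append(np.zeros((1, n_out)))
        # With initialize=False the lists stay empty so that crossover
        # can fill them in from the parents' matrices.

    def predict(self, X):
        a = np.asarray(X, dtype=float)
        for W, b in zip(self.weights, self.biases):
            a = np.maximum(0.0, a @ W + b)  # ReLU in every layer
        return a
```

For a window size of 5, `EvolvableNetwork([5, 5, 5, 5, 1])` would give the three hidden layers of 5 nodes described above.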
Training, Validation, and Testing Data Sets
One of the most common pitfalls for machine learning models is over-fitting. This occurs when a model has so many tunable parameters that it starts to memorize its training data. As a result, it performs worse when predicting values it has never seen before than on the values it was trained on. Below is an example of over-fitting, where we artificially minimize the error of the residuals by simply increasing the degree of the polynomial; however, the resulting curve has far too many parameters for the data and does not represent the actual trend, so it is extremely inaccurate for data it has not seen before.
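The polynomial example can be reproduced in spirit with `np.polyfit`. This sketch uses made-up noisy samples of the line y = 2x + 1 (not the sunspot data): the degree-9 polynomial passes through all ten training points, so its training error drops to essentially zero, while its error on the unseen in-between points is typically larger than the straight line’s.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative data: noisy samples of the straight line y = 2x + 1
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + 1 + rng.normal(0.0, 0.2, size=x_train.size)
# Unseen points lying between the training points
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + 1 + rng.normal(0.0, 0.2, size=x_test.size)

results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```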

A central goal of Machine Learning and Data Science is to create simple yet powerful models that generalize, predicting new values accurately. There are many ways to prevent over-fitting in neural networks, from dropout layers to cross-validation; however, one of the simplest and most common is early stopping. To accomplish this, we split our dataset into two main parts, the training and testing data sets. We train our model on the training set and monitor its error on the testing set; once we start to see over-fitting, meaning the error on the testing set starts to increase, we stop the algorithm. As we can see below, the model starts to overfit: the training error keeps decreasing, but the testing error stagnates and starts to slightly increase. To prevent over-fitting, we stop training once the testing error has kept increasing for a few iterations.

However, there is a critical problem with this: we stop training based on how our model performs on the testing dataset, so the testing set has influenced training, and in effect we have trained on both the training and testing data. This is exactly what we wanted to avoid in the first place. To get around this, we introduce a validation dataset, which replaces the testing dataset in early stopping. We now split our dataset into three parts: the training, validation, and testing sets. We train our model on the training set, then perform early stopping, parameter tuning, and model comparison using the validation set. After, and only after, we’ve chosen our final model, along with its final architecture, do we evaluate it on the testing dataset. By doing this, we ensure that our model has never seen the values in the testing dataset, which gives an accurate evaluation of our model.
These are important concepts that we will come back to when we implement our genetic algorithm for our neural network to prevent overfitting.
The Genetic Algorithm
Now it’s time to discuss the exact implementation details of our evolutionary algorithm. As discussed in the previous posts, each individual is made up of its genotype and phenotype. The genotype represents the actual genetic code of the individual and the phenotype represents the individual in the environment. In our problem, the genotype is made up of the weight and bias matrices, where each matrix is a gene in the genome. For the phenotype, the weight and bias matrices compiled together form a neural network in the environment. Now that we know how we will encode our chromosome, it is time to discuss the reproduction operators. For selection, we will use roulette wheel selection, which builds a cumulative distribution in which each individual’s probability of being chosen is proportional to its fitness value:
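A minimal sketch of roulette wheel selection, assuming the fitness values are non-negative (as they are after the scaling described later). The function name `roulette_wheel_select` is hypothetical.

```python
import numpy as np

def roulette_wheel_select(fitnesses, rng):
    """Return the index of one individual, chosen with probability
    proportional to its non-negative fitness."""
    fitnesses = np.asarray(fitnesses, dtype=float)
    probs = fitnesses / fitnesses.sum()
    cumulative = np.cumsum(probs)  # cumulative distribution over individuals
    r = rng.random()               # uniform draw in [0, 1)
    idx = int(np.searchsorted(cumulative, r, side="right"))
    return min(idx, len(fitnesses) - 1)  # guard against floating-point round-off
```

An individual with three times the fitness of another occupies three times as much of the wheel, so it is picked about three times as often.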
For crossover, we will implement the averaging technique, which takes a linear combination of the parent values. For our problem, the offspring’s weight and bias matrices will just be linear combinations of the parents’ matrices. To do this, we first instantiate a new EvolvableNetwork, except this time with initialize set to False, because we do not want the weight and bias matrices to be initialized to random values, as we will create them from the parents:
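Here is a minimal sketch of averaging crossover. To keep it self-contained, a genome is represented as a plain list of NumPy arrays (weight and bias matrices in order), which is an assumption standing in for the EvolvableNetwork object; the function name `average_crossover` is hypothetical.

```python
import numpy as np

def average_crossover(parent_a, parent_b, alpha):
    """Averaging (arithmetic) crossover: each child matrix is
    alpha * A + (1 - alpha) * B, computed gene by gene."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]

# Example genome: one 2x2 weight matrix and one 1x2 bias vector per parent
parent_a = [np.zeros((2, 2)), np.zeros((1, 2))]
parent_b = [np.ones((2, 2)), np.ones((1, 2))]
child = average_crossover(parent_a, parent_b, alpha=0.25)  # every entry is 0.75
```

With alpha between 0 and 1, every child entry lies between the two parents’ values, so crossover alone explores the region between the parents.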
For mutation, we will simply add a small random value to every entry of every weight and bias matrix:
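A minimal sketch of this mutation, using the same list-of-arrays genome. Drawing the noise uniformly from [-max_mutation, max_mutation) is an assumption; the post only states that a max mutation value (0.1 in the experiments) is used.

```python
import numpy as np

def mutate(genome, max_mutation, rng):
    """Return a new genome with a small uniform random value in
    [-max_mutation, max_mutation) added to every entry of every matrix."""
    return [m + rng.uniform(-max_mutation, max_mutation, size=m.shape) for m in genome]
```

Returning new arrays instead of modifying in place keeps the unmutated original available, which matters when parents and children are later compared.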
Unlike our previous Genetic Algorithm implementations, this particular implementation will not have hyperparameters for the probabilities of mutation, crossover, or elitism; getting rid of these parameters reduces the complexity of tuning our algorithm. Instead, each pair of parents will create four children, the children will be pooled with their parents, and the individual with the best fitness will be chosen to survive. All four offspring will be created through crossover with different coefficient values, and then the last two crossover offspring will be mutated by different random values. Because the best individual among the parents and offspring always survives, the survivor is always at least as fit as both of its parents, and we reduce the need for tuning since the set of offspring contains both crossover-only and mutated individuals. Here is what our reproduce function will look like for two parents:
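A self-contained sketch of this reproduction scheme, again treating a genome as a list of NumPy arrays. The function name, its arguments, and drawing the four crossover coefficients uniformly from [0, 1) are assumptions; `fitness` is any function where higher is better, such as the scaled fitness.

```python
import numpy as np

def reproduce(parent_a, parent_b, fitness, max_mutation, rng):
    """Create four children by averaging crossover (each with its own
    coefficient), mutate the last two, and return the fittest member
    of the pool made of both parents and all four children."""
    def crossover(alpha):
        return [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]

    def mutate(genome):
        return [m + rng.uniform(-max_mutation, max_mutation, size=m.shape) for m in genome]

    alphas = rng.random(4)  # assumption: coefficients drawn uniformly from [0, 1)
    children = [crossover(alpha) for alpha in alphas]
    children[2] = mutate(children[2])  # the last two crossover children
    children[3] = mutate(children[3])  # are also mutated

    pool = [parent_a, parent_b] + children
    return max(pool, key=fitness)  # survivor is at least as fit as both parents
```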
Now it is time for our fitness function, which computes the Mean Squared Error (MSE) between the predicted and actual values of our time series:
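In symbols, MSE is the average of (actual minus predicted) squared over all N observations. A minimal sketch follows; the function name `mse` is hypothetical. Flattening both inputs matters because a network returns an [N, 1] column, and subtracting it from an [N] target vector would otherwise broadcast to an [N, N] matrix and silently give the wrong answer.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float).ravel()  # flatten [N, 1] to [N]
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
```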

Because we want to minimize the MSE, but roulette wheel selection favors individuals with larger fitness values, we need to scale our fitness values so that smaller errors yield larger values and larger errors yield smaller values, and then maximize that scaled fitness. We can do this through the following function:
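One simple function with the shape described in this section is 1 / (1 + error). This is an assumed choice for illustration, not necessarily the exact function used in the post: it maps an error of 0 to 1, shrinks toward 0 as the error grows, and always stays positive, which roulette wheel selection requires.

```python
def scaled_fitness(mse_value):
    """Map an error in [0, inf) to a fitness in (0, 1]:
    zero error gives 1, and larger errors give smaller values."""
    return 1.0 / (1.0 + mse_value)
```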
For a visual representation of how this works, see the graph below, where the x-axis is the original fitness value and the y-axis is the scaled fitness value. After scaling, errors near 0 map to values near 1, and we then perform standard maximization on the scaled values.

Next, we need to split our data into the training, validation, and testing sets. We will train our algorithms on the training dataset and use the validation set to compare our models and to prevent over-fitting through early stopping. Lastly, after we’ve chosen our final model, we will evaluate its error on the test dataset. We will pass our training and validation data to our evolutionary algorithm. As stated before, we will test different window sizes, so we loop over every window size, create the data, and test our algorithm:
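The splitting step can be sketched as follows. The 60/20/20 proportions and the function name `chronological_split` are assumptions (the post does not state its proportions). The split is done in time order without shuffling, so the validation and testing data always come after the training data; each segment can then be turned into windowed (X, y) pairs for every window size in the loop.

```python
import numpy as np

def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a 1-D series into consecutive training, validation, and
    testing segments; no shuffling, so the test data stays in the future."""
    series = np.asarray(series, dtype=float)
    n_train = int(len(series) * train_frac)
    n_val = int(len(series) * val_frac)
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]  # whatever remains is the test set
    return train, val, test
```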
Now that we’ve defined all our auxiliary functions, it is time to define the body of our evolutionary algorithm. The algorithm trains on the training data and stops early if the population’s mean validation error increases for three consecutive generations. After the run ends, either by completing all generations or by stopping early, the validation data is used again to select the best model from the current generation.
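The early-stopping rule can be isolated into a small helper. This is a sketch; the name `should_stop_early` and the idea of keeping a per-generation list of mean validation errors are assumptions. The error values in the demo loop are made up for illustration and are not results from this post.

```python
def should_stop_early(mean_val_errors, patience=3):
    """Return True once the population's mean validation error has
    increased for `patience` consecutive generations.

    mean_val_errors: one mean validation MSE per generation, oldest first.
    """
    if len(mean_val_errors) < patience + 1:
        return False  # not enough generations to see `patience` increases
    recent = mean_val_errors[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Illustrative, made-up error values (not results from this post)
history = []
for error in [10.0, 8.0, 9.0, 11.0, 12.0, 13.0]:
    history.append(error)
    if should_stop_early(history):
        break
# Stops after the 5th generation: 8 -> 9 -> 11 -> 12 is three straight increases
```

In the full algorithm, this check would run once per generation after computing the population’s mean validation error.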
For evaluation, we will find the model with the best validation score for each window size, pick the best of these, and then recreate the data for that window size to evaluate its test MSE.
Evaluating our Algorithm
FINALLY, it is time to test our genetic neural network. Running the code given thus far at each window size from 3 to 10 for 200 generations, with a population size of 100 individuals, a max mutation value of 0.1, and a network architecture of [5,5,5], gives the following results:

From the results of the best model for each window size, the model for window size 5 has the smallest MSE on the validation data set, so we choose it as our final model. Evaluating the final model on the Test Data Set gives an error of 616.378. The mean validation error hovered around 618 with a standard deviation of 36. For a visual, here is the prediction over the entire dataset using the best model with a window size of 5:

Comparison with a Neural Network Trained through Back-Propagation
Now that we’ve evaluated our algorithm, it is time to test it against a neural network trained through back-propagation to see if all this trouble was even worth it. After all, if a neural network trained through backpropagation can outperform our genetic algorithm, why spend the time evolving the weights? Because most people have scikit-learn installed, I will be using the neural network implementation in scikit-learn, called MLPRegressor. As with our genetic algorithm, I will test each neural network at different window sizes and choose the final model based on the best validation dataset error. To ensure a fair comparison, the neural network trained through backpropagation will use the same training, validation, and testing data sets. Unfortunately, the early-stopping mechanism in MLPRegressor carves its validation data out of the given training data instead of accepting a user-supplied validation set. Here is what it looks like to train an MLPRegressor in scikit-learn:
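A minimal sketch of training an MLPRegressor with the same [5,5,5] ReLU architecture. The data here is a synthetic placeholder standing in for the real windowed training set, and the values of `validation_fraction`, `max_iter`, and `random_state` are assumptions rather than the settings used for the results. With `early_stopping=True`, scikit-learn holds out `validation_fraction` of the training data internally, which is the limitation described above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic placeholder data: 200 windows of size 3 (replace with real windows)
X_train = rng.uniform(0.0, 200.0, size=(200, 3))
y_train = X_train.mean(axis=1)

model = MLPRegressor(
    hidden_layer_sizes=(5, 5, 5),  # same hidden layers as the evolved network
    activation="relu",
    early_stopping=True,           # validation data is split off X_train internally
    validation_fraction=0.1,       # assumed value (also the library default)
    max_iter=2000,
    random_state=0,
)
model.fit(X_train, y_train)
predictions = model.predict(X_train[:5])  # one prediction per window
```

Our own validation set would then be windowed the same way and scored with `model.predict` to compare window sizes.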
As a result, our own validation data was used only to compare the models. Here are the results:

From the results we can see that the window size of 3 yielded the smallest validation error, which differs from the genetic algorithm’s best window size of 5; however, the validation score for the window size of 3 was comparable to that of size 5 for the genetic algorithm. To compare the two frameworks, we can look at the mean and standard deviation of the validation errors. The genetic algorithm’s mean validation error was better by 4.65% and its standard deviation was smaller by 25%. In addition, its Test MSE was better by 4.18%. Therefore, we can conclude that our genetic algorithm outperformed backpropagation for training neural networks on this particular problem. Below is the prediction of our entire time series dataset for the neural network trained through backpropagation.

Final Code and Video Implementation
There’s been a lot of code flying around in this post, so here is a link to my GitHub, where the entire script is intact and is exactly what I used here. Just run it and watch the magic happen!
For anyone who wants to understand the code in more depth, I’m going to make a video walkthrough of the implementation shortly and post it here. (If you see this and the URL is not posted yet, check back in a week or so.)
Conclusion
In this post we applied the knowledge we’ve learned over the past couple of articles to evolve the weights of a neural network for Time Series Analysis. Splitting our data into training, validation, and testing sets allowed us to accurately train, compare, and evaluate our models. The results showed that the genetic algorithm outperformed back-propagation for training the neural network on the given time series problem, which is a promising result.
In the realm of Computational Intelligence, applying genetic algorithms to neural networks is actually a sub-field known as Neuro-Evolution. Neuro-evolution comes in many different shapes and sizes; in its broader form, not only are the weights evolved, but also the number of layers, the number of nodes, the activation functions, and the connections between neurons. Neuro-evolution is commonly applied to reinforcement-type problems where the fitness function is non-differentiable, so backpropagation cannot be applied, or where the neural network is small, as in this example. Unfortunately, Neuro-evolution is generally not competitive for training Deep Neural Networks, as these networks have an enormous number of weight parameters, which makes crossover and mutation in standard genetic algorithms extremely time expensive: applying either operator to an n-by-m weight matrix takes O(nm) time. However, Neuro-evolution is a growing field of interest, not for evolving the weights of Deep Neural Networks, but for evolving their architecture and hyperparameters, which are then trained using standard numerical methods such as back-propagation. Examples include evolving the number of nodes per layer, the layer activation functions, the inclusion of drop-out or convolution layers, and back-propagation hyperparameters such as momentum and learning rate. This falls under Automated Machine Learning (AutoML), which removes the need for a data scientist to ‘tune’ the algorithm by instead using Genetic Algorithms to find the best set of hyperparameters and architecture for the given neural network.
In conclusion, this post wraps up Unit 3) Genetic Algorithms. Even though this unit ran a little long at four posts, I hope it has been extremely beneficial to your understanding of how genetic algorithms work and how they can be used in practice. In the next post we will start Unit 4) Genetic Programming, which will be short and will tackle the same time series problem from this post!