Evolutionary Computation Course
Hello and welcome back to this full course on Evolutionary Computation! In this post we will start and finish Unit 4, Genetic Programming. In the previous post we finished up Unit 3, Genetic Algorithms, by applying an algorithm to evolve the weights of a neural network for time series analysis. I highly suggest reading that post first, as we will be trying to solve the same problem using Genetic Programming:
Unit 3 Application) Evolving Neural Network for Time Series Analysis
In this post we will cover a brief overview of Genetic Programming and how it is different from standard genetic algorithms in terms of representation of the chromosome.
Table of Contents
- Difference of Representation
- Santa Fe Ant Trail
- Setting up the Problem
- Crossover Operators
- Mutation Operators
- Application – Time Series Analysis
Difference of Representation
Genetic programming is actually a subset of genetic algorithms; the main difference between the two is the representation of the chromosome. Standard genetic algorithms deal with optimization problems where the phenotype is a point or vector, whereas in genetic programming the phenotype is a tree built from a grammar.
If you did not know, all programming languages are built off a Backus-Naur Form (BNF) grammar, which, if you haven't taken any automata theory classes, simply means that each language has a specific set of rules for structuring instructions, rules the compiler uses to determine whether or not the code can run. Think of it like standard English: the English grammar tells you how to put words together and structure sentences. For example, in "Tom is …" we have "noun verb …", and the English grammar tells us that what follows the verb 'is' cannot be another verb, but must be an adjective, noun, or adverb. In programming, our IDEs flag a 'syntax error' when we forget a keyword because they know what needs to come before or after based on this language-specific BNF grammar. Because programs have this structure, the mix of genetic algorithms and grammars gave birth to genetic programming, which first appeared on the scene for evolving sets of instructions, called programs.
In addition to the difference of phenotype between genetic programming and standard genetic algorithms, each individual has the ability to grow or shrink its genotype by adding or removing terminals and instructions. As stated before, genetic programming is tree based, meaning it needs its own grammar grounded in computational theory: a terminal set, a function set, and transition rules. As a simple example, below we have the XOR truth table with inputs and target outputs.

Given the grammar, that is, the operators and terminals, we want to evolve a tree-based set of instructions to solve this problem. Below we have a possible solution to this problem using a tree-based structure evolved through genetic programming.

Note how similar the tree structure is to a decision tree; in fact, one of the many applications of genetic programming is to evolve decision and behavioral trees for classification or game playing.
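To make the representation concrete, here is a minimal sketch (my own illustration, not from any library) of how such a tree could be encoded and evaluated in Python using nested tuples, where strings are terminals and tuples are function nodes:

```python
# Boolean function set for the XOR grammar.
FUNCTIONS = {
    "AND": lambda a, b: a and b,
    "OR":  lambda a, b: a or b,
    "NOT": lambda a: not a,
}

# A candidate XOR solution: OR(AND(x1, NOT(x2)), AND(NOT(x1), x2)).
xor_tree = ("OR",
            ("AND", "x1", ("NOT", "x2")),
            ("AND", ("NOT", "x1"), "x2"))

def evaluate(node, inputs):
    """Recursively evaluate a tree: terminals look up their input value,
    function nodes evaluate their children first, then apply the function."""
    if isinstance(node, str):          # terminal node
        return inputs[node]
    func, *children = node             # function node
    args = [evaluate(child, inputs) for child in children]
    return FUNCTIONS[func](*args)

# Check the tree against the XOR truth table.
for x1 in (False, True):
    for x2 in (False, True):
        print(x1, x2, evaluate(xor_tree, {"x1": x1, "x2": x2}))
```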
Santa Fe Ant Trail
One of the first and most well-known examples of genetic programming comes from the field of robotics: the Santa Fe Ant Trail. Given an environment with food placed along a trail around a map, as can be seen below, the goal is to maximize the amount of food collected with a limited amount of energy (steps taken). The problem was solved using genetic programming by evolving a set of instructions for the ant to take.

The genetic programming model was extremely successful at evolving a tree of if/else rules for taking directional steps based on certain sensory inputs.
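To give a flavor of what such an evolved program looks like, here is an illustrative sketch using the classic primitive set from Koza's formulation (IF_FOOD_AHEAD, PROGN2, MOVE, LEFT, RIGHT); the program and the DummyAnt class are hand-written stand-ins so the interpreter can run, not an actual evolved solution:

```python
# IF_FOOD_AHEAD branches on the ant's food sensor, PROGN2 runs two
# subtrees in sequence; MOVE / LEFT / RIGHT are terminal actions.
ant_program = ("IF_FOOD_AHEAD",
               "MOVE",
               ("PROGN2", "RIGHT", ("IF_FOOD_AHEAD", "MOVE", "LEFT")))

def run(node, ant):
    """Interpret one traversal of the tree against an ant object exposing
    food_ahead(), move(), left(), and right()."""
    if node == "MOVE":
        ant.move()
    elif node == "LEFT":
        ant.left()
    elif node == "RIGHT":
        ant.right()
    elif node[0] == "IF_FOOD_AHEAD":
        run(node[1] if ant.food_ahead() else node[2], ant)
    elif node[0] == "PROGN2":
        run(node[1], ant)
        run(node[2], ant)

class DummyAnt:
    """Minimal stand-in environment so the interpreter can be exercised."""
    def food_ahead(self): return False
    def move(self):  print("move")
    def left(self):  print("turn left")
    def right(self): print("turn right")

run(ant_program, DummyAnt())
```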
Setting up the Problem
In all of evolutionary computation, there are two main preliminaries that need to be taken care of, independent of the algorithm chosen:
- Creation of Initial Population
- Designing Fitness Function
Unlike in standard genetic algorithms, genetic programming cannot create the initial population uniformly at random from the domain. Instead, it needs to follow the problem-dependent grammar structure. To do this, we first need to define our BNF grammar for the problem. Then we need to set a min and max depth for our trees, where depth denotes how many layers a tree has. The min and max only constrain the initial population, meaning that all first-generation individuals will be between min and max depth while later individuals can shrink or grow.
To create the tree for each individual, we randomly select a function element from the grammar and branch out by randomly choosing terminal or function elements. As in genetic algorithms for constrained problems, randomly generating individuals this way may produce some unwanted individuals, namely those below the min or above the max depth. We simply discard these individuals and keep sampling until we get an individual within the range, as sketched below.
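Here is a minimal sketch of that initialization, reusing the XOR function and terminal sets from earlier; the 0.3 branching probability is an arbitrary choice of mine, not a standard value:

```python
import random

FUNCTION_SET = {"AND": 2, "OR": 2, "NOT": 1}   # function name -> arity
TERMINAL_SET = ["x1", "x2"]

def random_tree(depth, max_depth):
    """Grow a random tree: below max_depth, usually branch with a function
    node; at max_depth, force a terminal so the tree stops growing."""
    if depth >= max_depth or (depth > 0 and random.random() < 0.3):
        return random.choice(TERMINAL_SET)
    func = random.choice(list(FUNCTION_SET))
    arity = FUNCTION_SET[func]
    return (func, *(random_tree(depth + 1, max_depth) for _ in range(arity)))

def tree_depth(node):
    """A lone terminal has depth 0; function nodes add one layer."""
    if isinstance(node, str):
        return 0
    return 1 + max(tree_depth(child) for child in node[1:])

def random_individual(min_depth, max_depth):
    """Rejection sampling: resample until the tree lands inside [min, max]."""
    while True:
        tree = random_tree(0, max_depth)
        if min_depth <= tree_depth(tree) <= max_depth:
            return tree
```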
Depending upon the problem at hand, the fitness function can be determined in a range of ways. In robotics, there might be a set of test cases used to evaluate how well the agent performed, while for behavioral trees in game AI you might use coevolution. As with standard behavioral and decision trees, we want to reduce model complexity so that the model generalizes well to new inputs, therefore we need a way to penalize overly complex models. In genetic programming, we can discourage complex models by penalizing bloat, which refers to a tree adding more nodes and depth while only marginally improving its fitness.
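As a sketch, one simple way to penalize bloat is to add a size-dependent term to the raw error, which is essentially what gplearn's parsimony_coefficient does during selection; the coefficient value here is illustrative:

```python
def tree_size(node):
    """Count every node in the tree (functions and terminals)."""
    if isinstance(node, str):
        return 1
    return 1 + sum(tree_size(child) for child in node[1:])

def penalized_fitness(error, tree, parsimony_coefficient=0.01):
    """Lower is better: raw error plus a penalty that grows with tree size,
    so a bloated tree must earn its extra nodes with a real error reduction."""
    return error + parsimony_coefficient * tree_size(tree)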
Crossover Operators
Crossover operators in genetic programming are quite intuitive. As we can see from the first picture below, given two parents, we randomly select a point in each parent and swap the subtrees at those points to create an offspring. This can be done to produce one offspring or, more commonly, two, where each offspring keeps the rest of one parent and receives the swapped-in subtree of the other.

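Here is a minimal sketch of that subtree swap on the nested-tuple representation used earlier; producing two offspring is just a matter of calling it a second time with the parents swapped:

```python
import random

def positions(node, path=()):
    """Yield a path (tuple of child indices) for every node in the tree."""
    yield path
    if not isinstance(node, str):
        for i, child in enumerate(node[1:], start=1):
            yield from positions(child, path + (i,))

def get_subtree(node, path):
    """Walk down the tree following a path of child indices."""
    for i in path:
        node = node[i]
    return node

def replace_subtree(node, path, new):
    """Rebuild the tree with the subtree at `path` swapped for `new`."""
    if not path:
        return new
    i, rest = path[0], path[1:]
    children = list(node)
    children[i] = replace_subtree(node[i], rest, new)
    return tuple(children)

def subtree_crossover(parent_a, parent_b):
    """Pick a random point in each parent and graft b's subtree into a."""
    point_a = random.choice(list(positions(parent_a)))
    point_b = random.choice(list(positions(parent_b)))
    donor = get_subtree(parent_b, point_b)
    return replace_subtree(parent_a, point_a, donor)
```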
In the early stages of evolutionary computation, crossover in genetic programming was usually never performed due to the Permutation Problem. The permutation problem is like the selective mating issue discussed in Unit 3 Part 2, Advanced Concepts: two parents might have similar fitness values, but their architectures are so different that crossing them over leads to extremely poor offspring. This is why, in the early years of evolutionary programming, only mutation was performed. However, as the field has matured, solutions to these problems have arisen. One of the most common is to only cross over individuals who are similar; but then the problem becomes determining who is similar, since you can't use Euclidean distance on trees. One way is to implement genetic markers, where individuals who share similar markers or subtrees are allowed to reproduce.
Mutation Operators
Like crossover, mutation for genetic programming is extremely intuitive. There are many different ways to perform mutation, as we can see below. In the top left we have our original tree. Our first mutation is function node switching, where we switch a random function node to another viable function. Next, we have terminal node switching, where we switch a random terminal node with another viable terminal. Next, we have swapping the arguments of a function node. Truncation shrinks a tree by randomly deleting a single function node. We can also add random Gaussian noise to existing numeric values. Lastly, we can grow our trees by randomly introducing a single new function node.

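As a sketch, here are the first two of these mutations on the nested-tuple representation, reusing FUNCTION_SET, TERMINAL_SET, and the positions / get_subtree / replace_subtree helpers from the sketches above:

```python
import random

def function_node_switch(tree):
    """Swap one random function node for another function of the same arity."""
    func_points = [p for p in positions(tree)
                   if not isinstance(get_subtree(tree, p), str)]
    if not func_points:                       # tree is a bare terminal
        return tree
    path = random.choice(func_points)
    node = get_subtree(tree, path)
    arity = len(node) - 1
    candidates = [f for f, a in FUNCTION_SET.items()
                  if a == arity and f != node[0]]
    if not candidates:                        # no other function fits here
        return tree
    return replace_subtree(tree, path, (random.choice(candidates), *node[1:]))

def terminal_node_switch(tree):
    """Swap one random terminal node for a (possibly different) terminal."""
    term_points = [p for p in positions(tree)
                   if isinstance(get_subtree(tree, p), str)]
    path = random.choice(term_points)
    return replace_subtree(tree, path, random.choice(TERMINAL_SET))
```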
Application – Time Series Analysis
Because the phenotype of a genetic programming model is tree based, it fits perfectly for evolving the behavioral or decision tree in many classification or game AI problems. However, it can also be used for regression. In our application we will tackle the same time series problem given in the previous post. To avoid repeating the problem statement and dynamics, please read my previous post:
Unit 3 Application) Evolving Neural Network for Time Series Analysis
Our problem setup will be the same as before: loop over all possible window sizes, use the validation data to pick the best model for each window size, and take the best of those as the final model to be evaluated on the test set. Since programming a genetic programming model from scratch requires a lot of extra preliminaries, such as automata theory, I will not be implementing the algorithm from scratch. Instead, I will be using gplearn, a free Python library designed specifically for genetic programming, covering both classification and regression.
Here is how we would import and run the algorithm. There are many other hyperparameters we could tune as well, but to keep things simple I've limited it to the following:
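(A minimal sketch of the setup: train_series and val_series, the splits from the previous post, are assumed to be defined already, and the hyperparameter values here are illustrative rather than the exact ones from my script.)

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor

def make_windows(series, window_size):
    """Slide a window over a 1-D series: each row of X holds the previous
    `window_size` values and y holds the value that follows them."""
    X = np.array([series[i:i + window_size]
                  for i in range(len(series) - window_size)])
    y = np.array(series[window_size:])
    return X, y

window_size = 8
X_train, y_train = make_windows(train_series, window_size)
X_val, y_val = make_windows(val_series, window_size)

best_model, best_mse = None, np.inf
for run in range(4):  # restarts: the initial population matters a lot
    model = SymbolicRegressor(
        population_size=1000,         # programs per generation
        generations=100,              # hard cap to limit overfitting
        function_set=('add', 'sub', 'mul', 'div'),
        metric='mse',                 # fitness = mean squared error
        parsimony_coefficient=0.001,  # penalize bloated trees
        init_depth=(2, 6),            # min/max depth of initial trees
        random_state=run,
    )
    model.fit(X_train, y_train)
    mse = np.mean((model.predict(X_val) - y_val) ** 2)
    if mse < best_mse:                # keep the best run on validation data
        best_model, best_mse = model, mse

print(best_model._program)            # the evolved expression tree
```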
I noticed the success of the algorithm was extremely dependent upon the initial population, so I ran the algorithm four times per window size and kept the best model from those runs. Unfortunately, gplearn does not have an early-stopping mechanism, so the validation data was only used to determine the final model; however, to simulate early stopping, the maximum number of generations was limited to 100 to prevent overfitting the training data. After running the script, here are the final results:

From the results we can see that the best model had a window size of 8. For comparison, we will use the results from the previous post, which trained a neural network with a genetic algorithm and with backpropagation on the same problem; please see the previous post linked at the top for more info. The best window sizes for the genetic algorithm and standard backpropagation were 5 and 3, very different from the 8 found by genetic programming. In terms of validation error of the best model per window size, the mean MSE is larger than that of both the genetic algorithm and backpropagation; however, the standard deviation is much lower: 57% smaller than backpropagation and 42% smaller than the genetic algorithm. On the test set, the MSE was again larger than both the genetic algorithm and backpropagation; it is still adequate, only 10% worse than the genetic algorithm, just not preferable over the previous two methods.
For visuals, here is the overall prediction of the entire dataset using the best model with window size of 8:

Here is the best model’s actual regression tree:

Final Code
The code used for this post is relatively short compared to the rest of the posts; however, here is my GitHub page for the full script and a picture of the final regression tree:
Conclusion
This concludes the entire unit on genetic programming! It was definitely the shortest of the series, as developing the algorithm from scratch would require knowledge of automata theory, so I used an outside library instead.
The main difference between standard genetic algorithms and genetic programming is the representation of the chromosome, both phenotype and genotype. The phenotype of a genetic programming model is a tree-based graph whose genome can shrink or grow by adding or removing terminal nodes and functions.
For the application, we covered the exact same time series problem as in the previous post, except using the gplearn Python library to evolve the underlying equation of the dataset. The results showed that the algorithm was successful in predicting the time series dataset; however, after comparing the results with the two methods from the previous post, it is not preferable over them. Stay tuned for the next post where we will start Unit 5) Evolutionary Programming!