
Looking to Nature for Optimization

Using what the universe has always known

Photo of Annealing by Francisco Fernandes on Unsplash

Thoughts and Theory

Optimization problems are ubiquitous in modern computing and economics. They seek the "best" solution in a complex space with many possibilities. Nature is incredible at this! For as long as things have existed, nature has been finding ways to do things in the best way possible. Let’s see how we can use techniques found in nature to solve modern problems.

To formalize an optimization problem, we consider a solution space: the set of possible ways to solve the problem. Each solution has a value K that we are seeking to maximize. This becomes incredibly challenging when the solution space is so large that it would take too long to check every solution.

A common example of optimization is the Travelling Salesman Problem. In the most basic version, you have a number of random points on a map, and you must find the route connecting all of them that has the smallest overall distance. In this slight variation, we are seeking to minimize K (or, equivalently, to maximize -K).
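To make this concrete, here is a minimal Python sketch (my own illustration; the helper name tour_length is not from any library) that scores a candidate route by its total distance. Minimizing this distance is the same as maximizing K = -distance.

import math
import random

# Hypothetical example: a few random cities in the unit square.
random.seed(0)
cities = [(random.random(), random.random()) for _ in range(8)]

def tour_length(route, cities):
    # Total distance of visiting the cities in 'route' order and returning to the start.
    total = 0.0
    for i in range(len(route)):
        x1, y1 = cities[route[i]]
        x2, y2 = cities[route[(i + 1) % len(route)]]   # wrap around to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

route = list(range(len(cities)))       # one candidate solution: visit the cities in index order
print(tour_length(route, cities))      # the quantity we want to minimize (K = -distance)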

Let’s see a simple solution space to help us visualize!

Figure 1: Example Solution Space, made in Desmos

Figure 1 shows different possible solutions to our problem. I didn’t include a scale or units because they are arbitrary in this example. Each point represents a different configuration, and points near a solution represent a neighborhood of similar solutions. This is done so that moving from a point to the point next to it only requires a minimal change in the solution. For the Travelling Salesman Problem, two solutions are neighbors if they are identical except that two adjacent "stops" on the route have their order switched. We want neighboring solutions to have only slight differences so that their K’s are similar.
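As a quick, hypothetical illustration of this neighborhood idea in Python (not code from the article), switching two adjacent stops in a route produces a neighboring solution:

def adjacent_swap_neighbor(route, i):
    # Return a neighboring route: identical except that stops i and i+1 are switched.
    neighbor = list(route)
    neighbor[i], neighbor[i + 1] = neighbor[i + 1], neighbor[i]
    return neighbor

# The neighbor of the route [0, 1, 2, 3] obtained by switching stops 1 and 2:
print(adjacent_swap_neighbor([0, 1, 2, 3], 1))   # -> [0, 2, 1, 3]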

On the y-axis of Figure 1, we have K, the variable that we are trying to maximize. A quick glance at the graph reveals that point B is clearly the best answer. There is also a local maximum at point D that we will have to consider. Remember, this is heavily simplified! Many optimization problems will have trillions of solutions, and the solution space will have more than one dimension.

Now that we have a solution space, let’s see how two different algorithms would handle this problem.


Hill Climbing

Our first method is not one found in nature. However, this is by far the easiest algorithm to understand, and will serve as a starting point for tackling this issue. I will present it using pseudocode, as well as a written explanation. This code will be called HillClimb for short.

point = RandomPoint
while True
   newpoint = Blank
   for p in point.Neighborhood
      # keep the best neighbor that improves on the current point
      if point.k < p.k and (newpoint == Blank or newpoint.k < p.k)
         newpoint = p
   if newpoint != Blank
      point = newpoint
   else
      break

To start HillClimb, pick a random solution S. Then, check the K of every point in S‘s neighborhood (the neighborhood can be defined in many different ways depending on the problem). If any point in S‘s neighborhood has a higher K than S, make that point your new S. If multiple points have a higher K than S, pick the one with the highest K; if there is a tie, pick one of the tied points at random. Continue doing this until the neighborhood of S has no points with a higher K than S itself.
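Here is a minimal runnable sketch of HillClimb in Python. It is my own illustrative translation of the pseudocode above, for a toy solution space that is just the integers 0 to n-1, where the neighborhood of a point is the adjacent indices:

import math
import random

def hill_climb(K, n):
    # Greedy HillClimb over the solutions 0..n-1; neighbors are the adjacent indices.
    point = random.randrange(n)                        # start at a random solution
    while True:
        neighbors = [p for p in (point - 1, point + 1) if 0 <= p < n]
        best = max(neighbors, key=K)                   # best neighboring solution
        if K(best) > K(point):
            point = best                               # climb uphill
        else:
            return point                               # no neighbor is better: a (local) maximum

# A bumpy curve with several local maxima, loosely like Figure 1.
K = lambda x: math.sin(x / 5.0) + x / 50.0
print(hill_climb(K, 100), "is a maximum found by HillClimb")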

HillClimb has the advantage of being incredibly straightforward. It also works well for solution spaces with only one maximum. However, it has no method of handling spaces with many different maxima, such as the two maxima at B and D in Figure 1.

Let’s see how HillClimb would handle different points on Figure 1.

Say we happen to start with the best solution: point B. HillClimb would check both points next to it and see that neither point has a higher K and quit. We found the correct answer by pure luck!

But what if HillClimb started on point D? Neither point next to D has a higher K, so it would stop and say that D is the best solution, which is clearly not true based on the solution space. This illustrates the problem with HillClimb: it has no way to handle local maxima. Instead, HillClimb must rely on luck to start at a random point that will lead it to the true maximum, such as point A.

Based on this presentation of HillClimb, what "best" solution would it give if it started with point C in Figure 1? The answer is given at the end of this page!

How much of an issue this is depends on the nature of the problem. If it is not absolutely crucial to find the highest K, but instead just find a solution with a relatively high K, HillClimb can work. This algorithm can also be run repeatedly to increase the likelihood of finding a better solution since it is so dependent on the starting point. HillClimb really struggles with solution spaces that are very "jagged," like in Figure 2 below. This is because it will usually find a "best" solution very quickly and not search for a better one due to the abundance of local maxima.
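One common way to exploit this restart idea is a random-restart loop that keeps the best result over many independent runs. A minimal sketch, assuming the hypothetical hill_climb function and K from the earlier sketch are already defined:

# Random restarts: run HillClimb many times and keep the best local maximum found.
best = max((hill_climb(K, 100) for _ in range(50)), key=K)
print(best, "is the best solution found over 50 restarts")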

Figure 2: A jagged Solution Space

HillClimb is nice, but it definitely has some problems: it cannot handle local maxima, and it is heavily reliant on the initial starting point. How can we fix this? A possible method would be to introduce some randomness into our algorithm. What if, sometimes, our path did not take us to a better point in the neighborhood, but to one with a lower K? We need to be careful with this; we can’t just hop randomly from point to point and pick one. Our new algorithm should have the ability to be random, but still end on a reasonable answer. Let’s see how this could work!


Simulated Annealing

We can look to a natural process for a way to modify HillClimb. Annealing is a process that occurs in metals as they cool from a heated state. When heated, the atoms in the metal are free to move and will seek out more energy-efficient arrangements as the metal cools. A high temperature (T) means that the atoms are more likely to move around, temporarily becoming less energy efficient in order to possibly find better arrangements. As T lowers, the atoms become less likely to make this gamble. The final state is more energy efficient than the initial heated state. Nature has a beautiful method to optimize the arrangement of atoms within the metal.

How can we add this to HillClimb? Let’s have a variable, T, that indicates how likely our algorithm is to pick a neighboring solution with a lower K. This choice should also depend on the difference in K between our current point and the new one. In other words, our algorithm needs to be less likely to pick a worse point if it is much worse than what we have now. I’ll show what this could look like with pseudocode below for the algorithm Anneal. Don’t worry, a text explanation will follow!

point = RandomPoint
T = StartingTemperature
delT = ChangeInTemperature
while True
   newpoint = Blank
   T = T - delT
   for p in point.Neighborhood
      if point.k > p.k
         # sometimes accept a worse neighbor, with probability exp((p.k - point.k) / T)
         if random[0,1] < exp((p.k - point.k) / T)
            newpoint = p
            break
      elseif point.k < p.k and (newpoint == Blank or newpoint.k < p.k)
         newpoint = p
   if newpoint != Blank
      point = newpoint
   else
      break

Let’s walk through the pseudocode for Anneal very slowly. The first thing you should notice is the two new variables at the beginning: T, the starting temperature, and delT, which tells us how much to decrease T by in each step. The other major difference occurs while looping over points in the neighborhood of our current point S. If a point we’re checking has a lower K than S, we run another check. Pick a random number between 0 and 1, and then check to see if it is below the following expression:

exp((p.k - point.k) / T)

I will refer to this as prob. Remember that prob refers to the probability that we choose to go to a solution that is less favorable (lower K). If we pick a less favorable point, we stop searching for new points and move to that one, then start again. The rest of Anneal works the same as HillClimb. Eventually, there won’t be points with higher K nearby and prob will be so low that it will not move to points with lower K.

prob will always be less than 1, since p.k < point.k, which makes the exponent negative. A large difference between p.k and point.k leads to a more negative exponent, making prob smaller. A large T leads to a less negative exponent, which makes prob larger. As T becomes smaller (but remains positive), the exponent values will tend to be more and more negative, making prob smaller.
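To make those tendencies concrete, here is a tiny sketch (my own illustration) that prints prob for a few temperatures and a few drops in K:

import math

# prob = exp((p.k - point.k) / T) for a worse neighbor, i.e. p.k - point.k < 0.
for T in (10.0, 1.0, 0.1):
    for drop in (0.5, 2.0):                      # how much lower the neighbor's K is
        prob = math.exp(-drop / T)
        print(f"T = {T}, drop in K = {drop}: prob = {prob:.3f}")

Larger drops and smaller temperatures both shrink prob, exactly as described above.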

This is complicated, so I’d encourage you to take time and make sure you understand this algorithm. There is an example below.

We have a lot of control over how Anneal tends to act. There is a lot of randomness here, but we have some room to adjust it. Mainly, we can alter how T changes. Anneal decreases T by delT every iteration, and we can adjust both the starting size of T and delT. This will impact how long Anneal runs for, as well as how free it is to move to less favorable solutions. Another possibility is to have T decrease exponentially instead of linearly; any schedule works as long as T is decreasing. Remember that T must always remain positive. I didn’t include it in the pseudocode for Anneal, but there should be a check that keeps T positive if it gets too small.
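Here is a minimal runnable sketch of Anneal in Python, again my own illustrative translation of the pseudocode, using the same toy solution space as the HillClimb sketch and a linear cooling schedule with a small positive floor on T (that floor is the extra check mentioned above):

import math
import random

def anneal(K, n, T=5.0, delT=0.05):
    # Simulated annealing over the solutions 0..n-1; neighbors are the adjacent indices.
    point = random.randrange(n)
    while True:
        T = max(T - delT, 1e-6)                  # cool down, but keep T positive
        newpoint = None
        for p in (point - 1, point + 1):
            if not (0 <= p < n):
                continue
            if K(p) < K(point):
                # sometimes accept a worse neighbor, with probability exp((K(p) - K(point)) / T)
                if random.random() < math.exp((K(p) - K(point)) / T):
                    newpoint = p
                    break
            elif K(p) > K(point) and (newpoint is None or K(newpoint) < K(p)):
                newpoint = p
        if newpoint is None:
            return point                         # nothing accepted: settle here
        point = newpoint

# Same bumpy curve as before; Anneal is more likely than HillClimb to escape local maxima.
K = lambda x: math.sin(x / 5.0) + x / 50.0
print(anneal(K, 100), "is the point where Anneal settled")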

Let’s reexamine Figure 1, which I’ve copied here so you don’t have to scroll back and forth so much.

A copy of Figure 1

First we look at what happens when we start at point B. Early on in Anneal, prob is very high. We will likely move away from B, then, as prob decreases, move back to B. However, there is always the possibility that we drift too close to D, and once prob gets very low, Anneal will get stuck in that area and end on D. This is the tradeoff of Anneal. While we are more likely to get a final point with a higher K, we are also more likely to skip over points with really high values of K. In a sense, Anneal centers our final K’s relative to HillClimb: we get fewer K’s that are very low or very high. Interestingly enough, our algorithm with more randomness, Anneal, results in less randomness in the final points.

Starting at A will usually result in a final answer of B, but there is still always the chance that it shifts too far to the right and ends at D.

Let’s look at C, and this will reveal a flaw in the Anneal pseudocode I presented. Anneal will always shift to the right when at C, no matter what prob is. This is because Anneal only checks against prob if the K of a neighboring point is lower than our current point’s. If all neighboring points have a higher K, then Anneal just picks the highest one to move to. This also means that starting at point D can never result in ending at B.

See if you can write some pseudocode that solves this problem! Here’s a brief description of a potential solution. If we generate a random number that is less than prob, always move to the neighboring point that has a lower K than the other neighboring point. If it is a tie, pick a random one. There! Now our algorithm won’t get stuck on C and might move to B if it starts at D. A possible sketch of this fix is shown below.
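Here is one possible sketch of that patched step in Python. This is just one literal reading of the description above, and treating prob as 1 when the lower neighbor is not actually below the current point is my own assumption:

import math
import random

def patched_step(point, K, T, n):
    # One step of a patched Anneal: sometimes deliberately step toward the lower neighbor.
    neighbors = [p for p in (point - 1, point + 1) if 0 <= p < n]
    random.shuffle(neighbors)                          # break ties between neighbors randomly
    lower = min(neighbors, key=K)                      # the neighbor with the lower K of the two
    diff = K(lower) - K(point)
    prob = 1.0 if diff >= 0 else math.exp(diff / T)    # the cap at 1 is my own assumption
    if random.random() < prob:
        return lower                                   # exploratory move, even out of a valley like C
    better = [p for p in neighbors if K(p) > K(point)]
    return max(better, key=K) if better else point     # otherwise the usual greedy uphill move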


There are many other approaches to solving this problem, and a lot of them take their inspiration from natural processes as well! For example, Quantum Annealing uses the prob function shown here:

prob for Quantum Annealing

This function also takes into account the width of the maximum separating these two points. Quantum Annealing is more likely to move through local maxima that are very tall, but also very thin. There are other variations of this algorithm as well!

My personal favorite is the set of Genetic Algorithms. These algorithms take inspiration from the natural evolution of species. Instead of starting with one random solution, this process starts with a large number of random solutions, called the first generation. Each solution has a K, which we will think of as its "fitness." There is a probability function, reproduce, that increases with K and decreases with time. For each "step," check every solution in the current generation against reproduce, so solutions that are more "fit" have a higher chance of reproducing. Randomly pair together the solutions that pass. For each pair, combine the two solutions somehow to produce an offspring that has traits from both parents. Also, check each offspring against a mutation probability and, if it passes, make a random change to it. Now repeat the process with this new generation of offspring!
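Here is a heavily simplified sketch of that loop in Python. It is my own illustration: the solutions are bit strings whose fitness K is just the number of ones, and the particular crossover and mutation rules are assumptions rather than a prescribed method:

import random

GENES, POP, GENERATIONS = 20, 30, 40

def K(solution):
    # Fitness: for this toy problem, simply count the ones in the bit string.
    return sum(solution)

def crossover(a, b):
    # Combine two parents: the first half of one and the second half of the other.
    cut = len(a) // 2
    return a[:cut] + b[cut:]

def mutate(solution, rate=0.05):
    # Flip each bit with a small probability.
    return [1 - bit if random.random() < rate else bit for bit in solution]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    # Fitter solutions are more likely to be chosen as parents.
    parents = random.choices(population, weights=[K(s) + 1 for s in population], k=POP)
    population = [mutate(crossover(parents[i], parents[(i + 1) % POP])) for i in range(POP)]

print("best fitness in the final generation:", max(K(s) for s in population))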

The Genetic Algorithm has a lot of variety to it. While it can be very effective, it has a lot of steps and can take much longer than Hill Climbing or Annealing. There is also the question of how to combine solutions: some problems have obvious combinations, while others are much more ambiguous.

Some other fun approaches include Ant Colony Optimization, Particle Swarming, and many more! Have fun exploring them!


Thanks for reading! Leave a comment if you have any thoughts or questions about this article.

I hope you learned something! If you like my work, then consider signing up to become a Medium member using this link to support me! You can also follow me for more stories like this! I publish weekly about math and science.

The solution to the HillClimb problem is point D. Even though both points adjacent to C are higher than C, HillClimb will always move to the right, since that neighbor has the higher K. The algorithm will eventually settle on D.

