Humans like to mimic many things in nature.
We mimic frogs in swimming. We mimic birds by installing wings on planes to provide lift. We mimic the crane/ snake/ mantis in martial arts. We mimic termites to build structures with efficient temperature control (see Eastgate Centre).
This extends to math algorithms as well, where you would have heard of Artificial Bee Colony, Ant Colony Optimization, Cuckoo Search, and Firefly Algorithm. I’ve also previously talked about Evolutionary Algorithm, which works following natural selection.
Today, I will talk about PSO – Particle Swarm Optimization. At the end of this article, you will have the code which enables you to implement the solution as well as generate a gif to visualize the search process.
Use-Case
Searching for an optimal solution in high-dimensional space is difficult. Students picking up ML would probably have heard of the term ‘curse of dimensionality’ within their first week.
High-dimensional space is not just an abstract mathematical concept. Consider a supply chain problem. A company has to decide where to locate its production factory, warehouse, distribution centers, and retail stores. For simplicity, let’s just assume there is only one of each. This already makes the solution we are searching for 8-dimensional – (_x_₁, _y_₁, _x_₂, _y_₂, _x_₃, _y_₃, _x_₄, _y_₄).
The absolute location (_x_₁, _y_₁, _x_₂, _y_₂, _x_₃, _y_₃, _x_₄, _y_₄) influences the relative locations between each facility. Both the absolute as well as relative locations would influence the operational costs as well as expected revenue, and hence the profits. While not perfect, we can approximately say that the search space and impact on the objective function is continuous.
Formulation of Problem
In reality, the underlying function which maps the inputs (candidate solutions) to output (objective) is a black box and cannot be represented mathematically.
If we could, an analytical solution can be obtained directly. However, in dealing with a black box, we will perform sampling. A naïve approach would then be to perform a grid search. At the end of this article, you will have the tools to do much better, and more importantly, understand why it works.
Let’s translate the supply-chain problem and black box into the following mathematical equation (to have something to work with). While at it, we will import all the required libraries.
import math
import numpy as np
import random
import matplotlib.pyplot as plt
from tqdm import tqdm
def blackbox(x1, y1, x2, y2, x3, y3, x4, y4):
return math.sqrt((x1+2)**2 + (y1-3)**2) *
math.sin(
math.atan2(2*(y2-4), 3*(x2+1)) * (
math.cos(x3-x1) + math.sin(y3-y1)
+ math.cos(3*(y4+3))
+ math.sin(2*(x4-2))
)
)
Notice how everything is deliberately intertwined. This is because if a simple function where components are added (for example, below) had been used, a grid search can be performed with order O(n²) instead of O(n⁸). This would also not be consistent with reality, because we cannot naively solve for an optimal warehouse location independent of where the production facility and retail stores are located.
def blackbox_addition(x1, y1, x2, y2, x3, y3, x4, y4):
return np.sin(x1) * np.cos(y1)
+ np.cos(x2) * np.sin(y2)
+ np.sin(x3) * np.sin(y3)
+ np.cos(x4) * np.cos(y4)
The objective of the problem will be to find (_x_₁, _y_₁, _x_₂, _y_₂, _x_₃, _y_₃, _x_₄, _y_₄), such that the output given by blackbox
is as high as possible, without knowing the underlying equation. To make the problem realistic, we add noise to the sampling process. Of course, we could simply repeat the measurements at each point and take the average, but a robust search algorithm ought to perform well even without doing so.
class Task:
def __init__(self):
pass
def score(self, x_arr, with_noise=True):
if with_noise:
return blackbox(*x_arr) + 0.1*np.random.randn()
else:
return blackbox(*x_arr)
Formulation of Solution
Let’s start with the basic building block of PSO – a particle. A particle is a candidate solution vector, and carries its personal best and a set of vectors which determines the extent of personal influence and social influence. On each iteration, each particle i explores a new solution based on its velocity:
This velocity is governed by its inertial, along with an inclination towards both its personal best and the population’s global best solution. [1]
w is the propensity for the particle to continue along its previous trajectory, while _ϕ_₁ and _ϕ_₂ can be seen as the learning rate towards the personal and global best. Having inertia is a good idea, because if we are already heading in a supposedly good direction previously, it makes sense to keep up at it. In fact, this concept of momentum is also present in Adam optimizer used in updating the weights of neural networks during gradient descent.
The personal best, and even global best, are based on what had been found thus far. Chances are, some aspects (ie. dimensions) of those are good, at least relatively speaking, but not all. However, we do not know which dimensions should be retained, and therefore simply try various combinations. In the same spirit of Evolutionary Algorithms, what matters is the best particle and not the average population, hence there is only upside potential when it comes to exploration.
To put this concept into mathematical equations, we have vectors _U_₁ and _U_₂ randomly drawn from a uniform distribution, multiplied by the respective learning rates _ϕ_₁ and _ϕ_₂. Each particle has an affinity towards its personal best and global best, to varying degrees along the different dimensions. Collectively, we can write the following for a Particle
class.
class Particle:
def __init__(self, idx, n_dim, param_limits=(-1,1)):
self.idx = idx
self.low = param_limits[0]
self.high = param_limits[1]
self.pos = np.random.uniform(
low=self.low, high=self.high, size=(n_dim,)
)
self.w = np.random.uniform(0.1, 0.5)
self.r1 = np.random.uniform(0, 1, size=(n_dim,))
self.r2 = np.random.uniform(0, 1, size=(n_dim,))
self.vel = np.random.normal(size=n_dim)
self.pbest = np.zeros(n_dim,)
self.pbest_fitness = -1e8
def update(self, gbest, task):
self.vel = self.w * self.vel
+ self.r1 * (self.pbest - self.pos)
+ self.r2 * (gbest - self.pos)
self.vel = np.clip(self.vel, 0.2*self.low, 0.2*self.high)
self.pos = np.clip(self.pos + self.vel, self.low, self.high)
fitness = task.score(self.pos)
if fitness > self.pbest_fitness:
self.pbest_fitness = fitness
self.pbest = deepcopy(self.pos)
return self.pos
Notice that I had blended _ϕ_₁_U_₁ and _ϕ_₂_U_₂ into _r_₁ and _r_₂, respectively, for (very slight) computational savings. Essentially, I had prescribed the learning rate to be of order 1, to save time for this simple problem. You could also use random.uniform(0,0.1)
or some other range as you deem fit.
Moving on, a swarm comprises a large number of particles. Each particle is created independently, and has its own unique affinity towards different dimensions of the personal and global best. In each iteration, all particles search a nearby solution as determined by its ‘velocity’, and keep track of its velocity as shown at the beginning of this section.
class Swarm:
def __init__(self, n_dim, n_population, param_limits):
self.n_dim = n_dim
self.particles = [Particle(i+1, n_dim, param_limits) for i in range(n_population)]
self.reset()
def reset(self):
self.gbest = np.zeros(self.n_dim,)
self.gbest_fitness = -1e8
self.history = {}
def update_all_particles(self, task):
for particle_Obj in self.particles:
particle_Obj.update(self.gbest, task)
def solve(self, task, num_generation):
self.reset()
for t in tqdm(range(num_generation)):
self.update_all_particles(task)
particles_pbest_fitness = [task.score(p.pos) for p in self.particles]
self.history[t] = dict([(p, deepcopy(p.pos)) for p in self.particles])
if np.max(particles_pbest_fitness) > self.gbest_fitness:
self.gbest_fitness = np.max(particles_pbest_fitness)
self.gbest_index = np.argmax(particles_pbest_fitness)
self.gbest_particle = self.particles[self.gbest_index]
self.gbest = self.gbest_particle.pos
return self.gbest
The implementation can be completed as follows. Here, the population comprises 2000 particles as an illustration. Each particle has a noisy measurement, but noise is removed from the final solution to get an accurate measure of the algorithm’s performance. Notice that gbest
need not be updated in every iteration. This is because there is no guarantee that the solution improves each time.
position_min, position_max = -5, 5
swarm = Swarm(
n_dim=8, n_population=2000, param_limits=(position_min, position_max)
)
solution = swarm.solve(task, num_iteration=200)
print(solution)
print(task.score(solution, with_noise=False))
Results
A solution is found in just a couple of seconds. For fairness, I repeated the experiment ten times, deleting swarm
and reinitializing everything in each of the loop. The results inevitably vary, given that each particle had been initialized randomly with emphasis on different components.
Without knowledge of PSO or another decent search algorithm, your alternative is to do a brute force grid search. Let’s see how this approach compares with PSO.
best_score = -np.inf
solution = None
intervals = 11
for x1 in tqdm(np.linspace(position_min, position_max, intervals)):
for y1 in np.linspace(position_min, position_max, intervals):
for x2 in np.linspace(position_min, position_max, intervals):
for y2 in np.linspace(position_min, position_max, intervals):
for x3 in np.linspace(position_min, position_max, intervals):
for y3 in np.linspace(position_min, position_max, intervals):
for x4 in np.linspace(position_min, position_max, intervals):
for y4 in np.linspace(position_min, position_max, intervals):
sample = task.score(np.array([x1,y1,x2,y2,x3,y3,x4,y4]))
if sample > best_score:
best_score = sample
solution = np.array([x1,y1,x2,y2,x3,y3,x4,y4])
print("solution: ", solution)
print(task.score(solution, with_noise=False))
Even with an extremely coarse search of just 11 points per dimension (ie. splitting into tenths), it requires 10⁸ computations of the black box function. On my computer, it takes more than 20 minutes. Note that this is under ‘best performance’ mode, and nothing else is running in the background; not even music (I went for a coffee break while waiting). In real life, each computation could involve a simulation which takes much longer.
Using just a tiny fraction of the computational budget, PSO solved the problem (median 10.61, max 10.63) just as well as the brute force grid search, which settled on 10.61. It is important to note that we are only dealing with 8 dimensions here. If we are dealing with the likes of 1000 dimensions, grid search is simply not feasible, and the benefits of PSO will be amplified.
Visualization
Let’s take a look at how the particles are distributed through the iterations. Although it is impossible to visualize things in 8 dimensions, we can focus on two dimensions each time, while neglecting the other components.
In the plot below, the heatmap is the objective value as we vary only two dimensions while keeping all 6 others fixed at the value of swarm.gbest_particle
for that particular iteration. Therefore, each snapshot truly applies to only one single particle (and there is no practical way to present 2000 heatmaps concurrently). It gives some idea nonetheless, and is better than an empty background.
x_range = np.linspace(-5, 5, 100)
y_range = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x_range, y_range)
vectorized_blackbox = np.vectorize(blackbox)
for t in [1,2,3,4,5,6,8,10,12,15,20,30,50,100,200]:
data = np.array(list(swarm.history[t-1].values()))
ref = data[swarm.gbest_particle.idx-1,:]
heatmap = {
1: vectorized_blackbox(X, Y, ref[2], ref[3], ref[4], ref[5], ref[6], ref[7]),
2: vectorized_blackbox(ref[0], ref[1], X, Y, ref[4], ref[5], ref[6], ref[7]),
3: vectorized_blackbox(ref[0], ref[1], ref[2], ref[3], X, Y, ref[6], ref[7]),
4: vectorized_blackbox(ref[0], ref[1], ref[2], ref[3], ref[4], ref[5], X, Y)
}
fig, axs = plt.subplots(2, 2, figsize=(10, 10))
for i in range(2):
for j in range(2):
c = axs[i,j].contourf(X, Y, heatmap[2*i+j+1], vmin=0, vmax=12, levels=50, alpha=0.5)
axs[i,j].scatter(data[:, 2*i+j], data[:, 2*i+j+1], s=2, alpha=0.4)
axs[i,j].scatter(data[swarm.gbest_particle.idx-1, 2*i+j], data[swarm.gbest_particle.idx-1, 2*i+j+1], c='w', s=5, alpha=1)
axs[i,j].set_xlabel("x_%d"%(2*i+j+1))
axs[i,j].set_ylabel("y_%d"%(2*i+j+1))
axs[i,j].set_xlim([-5, 5])
axs[i,j].set_ylim([-5, 5])
fig.colorbar(c, ax=axs)
plt.suptitle("Distribution of particles at iteration %d"%t)
plt.savefig("plot_%03d.png"%t)
Using the imageio
library to create gif as shared previously, we can put all the figures together to form the following gif.
The white dot represents the position of swarm.gbest_particle
, while the other 1999 particles are represented by the smaller blue dots.
It might be tempting to jump to conclusions and claim that the PSO agent is "stupid" and could simply have moved slightly towards a brighter region on the map. Such an argument would be flawed. The heatmap which we see is only a slice of the multi-dimensional solution space, and as seen in the gif, can vary substantially when the other dimensions take different values. It is not a matter of just greedily optimizing component dimensions piecewise. Had different combinations of the representation been taken, say for example, _x_₁ with _y_₃, or _x_₄ with _y_₂, the story would be different. There is no one-size-fits-all technique to make everyone happy simultaneously.
Ultimately, the results speak for itself, and the fact that PSO solved the problem within seconds is evident of its usefulness.
Conclusion
In this article, we saw how PSO can be implemented to search for an optimal solution to an unknown black box in a efficient manner, along with an appreciation of what goes on behind the scenes to make this possible.
I envision continuing with a series of these nature-inspired algorithms. After all, nature is beautiful. Math is beautiful. Nature in Math? It has to be the best.
In my next article, I will explain the implementation of ABC (Artificial Bee Colony), as well as compare it against PSO to look at the types of problems in which ABC fares better. Stay tuned.
References
[1] A. E. Eiben and J. E. Smith, Introduction to evolutionary computing (2015), Springer-Verlag Berlin Heidelberg