Remember that feeling after watching movies like EuroTrip, where the characters whisk through picturesque European cities on an adventure of a lifetime? It’s captivating. Yet, reality promptly reminds us: that orchestrating a journey across numerous destinations is no simple task. But here’s the exciting twist – armed with programming expertise and a grasp of genetic algorithms, I embarked on developing a solution. Imagine being able to optimize complex routes spanning dozens of locations with precision. This is where the world of Data Science intersects with the art of adventure planning. In this article, I unveil an algorithmic script that elegantly tackles the intricate Traveling Salesman Problem (TSP), promising to aid travel planning and enhance our understanding of optimization in data science.
Reading this article will provide you with a clear understanding of how the synergy between Python, Google Maps API, and genetic algorithms unlock data-driven solutions for non-trivial tasks.
Understanding the Traveling Salesman Problem
Setting out on a journey often ignites a sense of adventure, but as we contemplate the intricacies of travel, the excitement can be accompanied by logistical challenges. One such challenge that has captured the attention of mathematicians, computer scientists, and logistics experts for decades is the Traveling Salesman Problem (TSP). At its core, the TSP poses a seemingly straightforward question: Given a list of cities and the distances between them, what is the shortest possible route that allows a salesman to visit each city exactly once and return to the starting point? While the problem’s statement is concise, its implications extend far beyond its surface simplicity.
In the world of optimization and logistics, the TSP is more than a theoretical curiosity; it holds immense practical significance. Consider delivery services, where minimizing travel distances translates directly to reduced fuel costs and faster service.
Underneath this seemingly straightforward problem statement resides a profound level of complexity. The TSP’s combinatorial nature arises from the exponential growth in potential solutions as the number of cities increases. The quantity of possible routes swiftly skyrockets beyond any computing feasibility, rendering traditional brute-force methods impractical for larger instances. The number of possible routes is equal to
where n represents the number of cities – a factorial explosion that quickly becomes overwhelming. With just 50 cities, the number of possible routes equals 3*10⁶², which is just about the number of atoms in the Milky Way.
The TSP stands as a quintessential example of the intriguing intersection between mathematics, computer science, and real-world logistical challenges. As the city count escalates, unveiling the shortest path demands innovative strategies that transcend conventional computational approaches.
The quest for efficient solutions to the TSP has driven researchers to explore a variety of methodologies. Among them are genetic algorithms, a class of optimization techniques inspired by the process of natural selection. Genetic algorithms excel at navigating complex solution spaces, making them a natural fit for tackling problems like the TSP, where brute-force methods quickly become infeasible as the number of cities grows.
The purpose of this article is to navigate the union of these two domains – the Traveling Salesman Problem and genetic algorithms. Specifically, we dive into a practical application: a Python script designed to exploit the power of genetic algorithms for solving the TSP. Our exploration will highlight how this algorithmic fusion has the potential to improve travel planning, logistics, and optimization challenges across industries. As we understand the inner workings of our genetic algorithm-based solution, the world of data science and algorithmic innovation will converge, promising new insights and efficient pathways through even the most labyrinthine of routes.
Introduction to Genetic Algorithms
At its core, a genetic algorithm (GA) is a heuristic search technique inspired by the elegant process of natural selection and evolution.
The inspiration behind genetic algorithms harks back to Charles Darwin’s theory of evolution. GAs mimic the process of natural selection by iteratively evolving a population of potential solutions. In this digital melting pot, solutions that exhibit favorable traits survive and procreate, passing on their "genes" to the next generation. This generational evolution continues until an optimal or near-optimal solution is achieved.
Genetic algorithms are characterized by four fundamental components:
- Selection: Just as in nature, selection mechanisms identify and favor solutions with higher fitness, mirroring the concept of "survival of the fittest."
- Crossover: Solutions, or "chromosomes," exchange genetic material to create offspring with a blend of their parent’s traits.
- Mutation: To introduce diversity and prevent premature convergence to suboptimal solutions, genetic algorithms incorporate a mutation operator. This operation randomly alters certain elements of a solution, similar to genetic mutations in nature.
- Fitness Evaluation: It is the determination of each solution’s fitness, which quantifies how well it performs the task at hand. The fitness function guides the selection process by assigning a higher probability of reproduction to superior solutions.
Genetic algorithms exhibit remarkable versatility when it comes to tackling optimization problems. Their ability to explore solution spaces in a non-linear, multidimensional manner makes them well-suited for complex, real-world challenges. Whether it’s optimizing complex engineering designs, fine-tuning neural network parameters, or, as we’ll soon see, solving the TSP, genetic algorithms excel in scenarios where traditional algorithms fail.
Applying Genetic Algorithms to the Traveling Salesman Problem
Now, let’s delve into the application of Genetic Algorithms (GAs) to solve the Traveling Salesman Problem (TSP).
At its core, GAs approach the TSP by considering each potential route as an individual within a population. This population of routes evolves over generations, with each route representing a unique itinerary for the traveling salesman.
To facilitate this genetic evolution, we represent each route as a chromosome – a sequence of cities defining the order of visitation. For example:
The fundamental task is to discover the optimal chromosome, the sequence that minimizes the total travel distance. The fitness of each chromosome is quantified by evaluating the total distance it covers when visiting cities in the order specified. Lower distance equates to higher fitness, mirroring the goal of finding the shortest path.
Implementation in Python
Now, let’s follow the step-by-step high-level implementation of the Python script designed to tackle the TSP. The complete code is available in my GitHub repository.
Getting the data
The first step consists of choosing the destinations. For this example, I chose to pick the 50 most visited cities in Europe. Once defined the destinations, I needed the travel distance and times between each couple of cities. For this kind of query, Google Maps API represents the state-of-the-art solution. After setting up an account here, you can request your personal API key, needed to authenticate you.
The requests to the Google Maps API are sent in this way:
Initialization
The process begins by generating an initial population of routes. These routes are typically created randomly or through a heuristic method.
Fitness Evaluation and Selection
In each step, after generating offspring and mutating some routes, the total distance is calculated for each route to evaluate their fitness. This step ensures that the algorithm maintains its focus on selecting the shortest paths.
In the spirit of natural selection, routes are chosen for reproduction based on their fitness. Routes with shorter total distances – those closer to the optimal solution – are more likely to be selected, allowing individuals with advantageous traits to be more likely to reproduce.
Crossover and Mutation
For the particular features of this problem, the classical crossover operation is not performed. I opted for two kinds of mutation:
- Single-point mutations: To maintain diversity and introduce novel solutions, the algorithm introduces small, random changes to selected routes. This emulates genetic mutations, introducing slight variations.
- "Crossover-mutations": Mutates a solution by slicing a random subset of its genome and appending it to another position. To recall biological terms, it is a sort of asexual reproduction.
Iteration
The steps above are repeated for a set number of generations, allowing the population to evolve over time. Each iteration brings the algorithm closer to an optimal or near-optimal solution.
The algorithm continues iterating until a termination criterion is met. In this case, the termination criterion consists of the reaching of a predetermined number of generations.
Results and Conclusion
In this exploration, I employed a GA with a population size of 200 individuals and ran it for 1000 generations to tackle the TSP with 50 cities. The journey from the initial generation to the final outcome reveals the remarkable efficiency of the GA-based approach.
At the outset, in generation zero, the first solution emerged with a fitness of 70,755 kilometers:
('Sofia, Bulgaria', 'Nice, France', ..., 'Naples, Italy', 'Luxembourg City, Luxembourg')
This initial solution, as expected, represented a random arrangement of cities, signifying the algorithm’s starting point. However, as the GA traversed through successive generations, we observed a remarkable transformation in the quality of solutions.
After 1000 generations, the GA found its routes. The endpoint was a solution with a fitness of 21,345 kilometers – a significant reduction in travel distance compared to the initial random solution. This remarkable improvement of nearly 49,410 kilometers underscores the GA’s effectiveness in optimizing complex routes like the TSP.
I performed 4 trials changing the population size. The overall better results are obtained with the larger population, but the computation time was obviously longer. We can see how, for each trial, the fitness value rapidly decreases over the first iterations, and settles to a plateau value later. This is typical behavior of a converging algorithm.
While the TSP remains an NP-hard problem, meaning that finding the absolute optimal solution can be computationally challenging for larger instances, the GA’s ability to approach near-optimal solutions proves invaluable in practical applications. This accomplishment opens doors to more efficient travel planning, streamlined logistics, and enhanced optimization across diverse industries. This experiment highlights the symbiotic relationship between data science and innovative algorithms. It underscores how evolutionary computation, inspired by nature’s selection mechanisms, can elegantly address intricate problems in the real world.
If you liked this story, consider following me to be notified of my upcoming projects and articles!
Here are some of my past projects:
Building a Deep Neural Network from Scratch using Numpy
Building a Convolutional Neural Network from Scratch using Numpy
Choose the Right Optimization Algorithm for your Neural Network
References
[1]: Tri-Objective Optimal PMU Placement Including Accurate State Estimation: The Case of Distribution Systems [2]: Analyzing the Performance of Mutation Operators to Solve the Travelling Salesman Problem [3]: Probabilistic model with evolutionary optimization for cognitive diagnosis [4]: Simulated Binary Crossover for Continuous Search Space [5]: A new mutation operator for real coded genetic algorithms [6]: Computing the optimal road trip across the U.S.