![Instead of benchmarking optimization and machine learning algorithms against each other, we should consider how they can strengthen each other [Photo by Wedding Dreamz on Unsplash]](https://towardsdatascience.com/wp-content/uploads/2023/12/0zj0ikUUckUyLtVwF-scaled.jpg)
Although most of us don’t see it, optimization algorithms (OAs) are at work everywhere. They plan shelf stocking for our grocery stores, create airport schedules, and give us the shortest route to our holiday destination. Exact algorithms in particular do very well at exploiting known structures – e.g., convex structures – finding solutions even in massive decision spaces with many constraints. Over the past decades, the combination of hardware and algorithmic improvements yielded massive speed-ups, on the order of millions. A planning task that might have taken a computer months to complete in the 1990s could take just a second today.
Similarly, machine learning (ML) has taken an incredible flight in the last decade or so. MuZero showed the capability to learn superhuman game-playing policies without knowing the games’ rules, Graph Neural Networks learn complex relations imperceptible to the human eye, and Transformers gave rise to ChatGPT and its competitors. The commonality is that these algorithms are all able to detect patterns in their environment, be it text databases or video games. Novel and highly complicated architectures are introduced on a regular basis, often solving new problems and offering unparalleled performance. Despite all successes and breakthroughs, for many real-world problems, end-to-end ML struggles to achieve competitive results. Tailored OAs often still beat ML, but may require substantial computational time.
There is no need for the two approaches to compete, though. Interestingly, optimization algorithms excel at exploiting patterns, whereas machine learning shines at detecting them. Instead of benchmarking the two against each other to see which one comes out on top, wouldn’t it make sense to marry the two complementary halves?
Merging optimization and machine learning often boils down to using statistical learning to improve optimization routines in one form or another. This way, we can speed up the search by exploiting patterns that we learned. The development of such integrated solutions has become an emerging research field in recent years, with many exciting possibilities ahead.
How can we integrate machine learning and optimization algorithms?
We established that OA is good at exploiting structures and ML is good at detecting them, so there is a natural synergy. Still, what concretely would constitute a marriage between OA and ML? Broadly, we might distinguish the following four categories:
I. OA provides input to ML. OA may offer a heuristic solution that can further be improved using ML, or it might perform a computationally intense preprocessing step in the algorithmic pipeline.
II. ML provides input to OA. For instance, ML may suggest a starting solution for a warm start (see the sketch following this list), or learn problem structures for OA to exploit.
III. ML is used to accelerate OA. ML may iteratively detect structures that aid OA solvers in finding solutions more quickly.
IV. OA solves subroutines in ML. Routines such as tree search or action space evaluation may be efficiently performed using OA.
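To make pattern II tangible, here is a minimal sketch in Python. A stand-in “predictor” (the hypothetical `ml_warm_start` below, representing any trained regression model) proposes a starting point that warm-starts a local optimizer on a toy non-convex objective; it is an illustration of the idea, not a specific production setup.

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Toy non-convex objective, standing in for a hard planning problem.
    return np.sum((x - 3.0) ** 2) + np.sum(np.sin(5.0 * x))

def ml_warm_start(instance_features):
    # Hypothetical trained model mapping instance features to a promising
    # starting solution; here it simply returns a fixed educated guess.
    return 3.0 * np.ones_like(instance_features)

features = np.random.default_rng(0).normal(size=5)
cold = minimize(objective, x0=np.zeros(5))              # naive cold start
warm = minimize(objective, x0=ml_warm_start(features))  # ML-informed start
print(cold.nfev, warm.nfev)  # the warm start typically needs fewer evaluations
```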
Let’s add some practical interpretation. In part due to the increasing logging and utilization of data, algorithmic pipelines tend to get increasingly complex. They are often composed of multiple tasks, some more suitable for OA, others for ML. Thus, it has become very natural for one paradigm to provide input to the other.
Furthermore, companies often repeatedly solve variants of a specific problem. For example, a transport company might solve a vehicle routing problem daily, dealing with variable yet comparable instances in terms of customers, time windows and load sizes. If general-purpose solvers took advantage of these similarities, they could run much more efficiently.
Finally, since ML as an end-to-end paradigm is often not yet competitive with OA, it often helps to solve some of its subroutines with optimization. Typically, this speeds up the ML algorithms and enhances their competitiveness.
In general, combining OA and ML makes sense when:
- OA (or human expertise) is too slow to offer solutions within reasonable time
- There is room to improve upon heuristic solutions, or it is unknown how good (or bad) they actually are
- Good solutions still need to be identified
- There is need for a fast approximative solution
- The algorithmic pipeline involves elements of both pattern detection and pattern exploitation
We proceed to contextualize optimization algorithms and machine learning, before offering some illustrative examples.
Optimization algorithms
A brief introduction to optimization algorithms is in order. To align with the ML integration, we crudely categorize them into exact algorithms (optimal solutions but slow) and heuristics (suboptimal solutions but faster).
Exact algorithms
Exact algorithms are powerful techniques employed in optimization problems, capable of finding provably optimal solutions within a feasible solution space. Commonly, problems are formulated as mathematical programs (e.g., linear or quadratic programs), which can be solved using optimization software such as CPLEX, SCIP or Gurobi. These solvers systematically explore the solution space and guarantee that the best solution is identified.
Although powerful out of the box, there is a limit to the solution spaces that can be handled. For very large problems, we often need to put substantial effort into designing decompositions and branching, pricing, and/or cutting schemes to reduce search complexity and efficiently navigate the solution space, while preserving optimality guarantees.
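On the smallest scale, the paradigm looks as follows: a toy linear program solved to proven optimality with SciPy’s `linprog`, a free stand-in for industrial solvers such as CPLEX or Gurobi.

```python
# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0.
from scipy.optimize import linprog

# linprog minimizes by convention, so we negate the objective to maximize.
result = linprog(c=[-3, -2],
                 A_ub=[[1, 1], [1, 3]],
                 b_ub=[4, 6],
                 bounds=[(0, None), (0, None)])
print(result.x, -result.fun)  # provably optimal plan and objective value
```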
Heuristics
Heuristics offer an alternative problem-solving approach, sacrificing optimality guarantees for computational efficiency. Heuristics are particularly useful for large-scale or complex problems, for which finding optimal solutions is impractical within a reasonable timeframe.
Basic heuristics provide quick, rule-of-thumb solutions. They typically yield suboptimal results, but often return decent solutions in limited computational time. Examples include nearest-neighbor insertions or 2-opt swaps.
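To illustrate, a nearest-neighbor construction for the travelling salesman problem fits in a handful of lines; the random instance below is purely illustrative.

```python
import numpy as np

def nearest_neighbor_tour(dist, start=0):
    # Greedily visit the closest unvisited city until the tour is complete.
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[tour[-1]][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

points = np.random.default_rng(42).random((8, 2))
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
print(nearest_neighbor_tour(dist))
```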
Metaheuristics employ higher-level strategies that guide the search for solutions, often incorporating elements of randomness and adaptability. Examples include genetic algorithms, simulated annealing, and particle swarm optimization. Metaheuristics are more intense in terms of tuning effort and runtime, but tend to offer superior results compared to basic heuristics.
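Below is a minimal simulated-annealing sketch for the same problem, using 2-opt segment reversals as the neighborhood; the temperature schedule and other parameters are illustrative defaults, not tuned values.

```python
import math
import random
import numpy as np

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def simulated_annealing(dist, temp=1.0, cooling=0.995, iters=5000):
    n = len(dist)
    tour = list(range(n))
    best, best_len = tour[:], tour_length(tour, dist)
    cur_len = best_len
    for _ in range(iters):
        i, j = sorted(random.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # 2-opt reversal
        cand_len = tour_length(cand, dist)
        # Always accept improvements; accept deteriorations with a
        # probability that shrinks as the temperature cools down.
        if cand_len < cur_len or random.random() < math.exp((cur_len - cand_len) / temp):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        temp *= cooling
    return best, best_len

points = np.random.default_rng(7).random((10, 2))
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
print(simulated_annealing(dist))
```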
Machine learning
We discuss two paradigms of machine learning: demonstration learning and experience learning. Although strongly related to existing classifications in terms of (un)supervised and reinforcement learning, we here aim to connect more closely to the optimization context.
Demonstration learning
Demonstration learning minimizes the distance between a provided expert policy and the predicted policy (a supervised setting). It can be seen as a teacher-student model, in which the student (the ML model) aims to mimic the teacher (e.g., an exact algorithm, a tailored heuristic, or human-crafted solutions). Note that this form of learning therefore requires theoretical or empirical knowledge in some form.
The difference between the solution predicted by the ML model and the known expert solution allows us to compute a loss, which the algorithm subsequently aims to minimize. If an ML algorithm is shown many examples of high-quality solutions, we hope it can ultimately replicate those.
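A minimal sketch of such a training step in PyTorch, with random tensors standing in for logged (state, expert action) pairs that would come from, say, an exact solver:

```python
import torch
import torch.nn as nn

n_features, n_actions = 16, 8
policy = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-ins for logged (state, expert action) pairs from an expert policy.
states = torch.randn(256, n_features)
expert_actions = torch.randint(0, n_actions, (256,))

for epoch in range(10):
    logits = policy(states)
    loss = loss_fn(logits, expert_actions)  # distance to the expert policy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```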
The downside of demonstration learning is that it cannot beat the expert policy. If the expert policy is suboptimal, so is the predicted policy. Therefore, demonstration learning only makes sense if acquiring the original solutions takes too long, and one strives to get the same solutions in a fraction of the time. However, if generating example solutions takes prohibitively long, the idea fails as well. This puts demonstration learning in a bit of a precarious position.
Demonstration learning is a form of imitation learning. We try to reproduce the same decision the expert offers, but faster.
![In demonstration learning, the ML algorithm aims to mimic expert solutions as closely as possible [Photo by Andre Mouton on Unsplash]](https://towardsdatascience.com/wp-content/uploads/2023/12/0qlxjeB_YbiyNMxk7-scaled.jpg)
Experience learning
In contrast to demonstration learning, experience learning utilizes observations yielded by deployed policies, with the aim of improving them. This form of learning is typically associated with reinforcement learning (RL), which iteratively executes and improves policies based on observed rewards gained from its environment.
Compared to demonstration learning, experience learning requires no prior knowledge of solution quality or structure. The downside is that – in its purest form – it leverages no information about the environment, interacting with it strictly through states, actions and rewards. Common problems associated with experience learning are getting stuck in local optima, the need to define rewards appropriately, and the need to explore the solution space extensively.
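A minimal experience-learning sketch: tabular Q-learning on a toy chain environment, in which the agent observes only states, actions and rewards. The environment and hyperparameters are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 5, 2  # move left (0) or right (1) along a chain
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0  # goal at the right end
    return nxt, reward

for episode in range(500):
    state = 0
    for _ in range(20):
        # Epsilon-greedy: mostly exploit the current policy, sometimes explore.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        nxt, reward = step(state, action)
        # Temporal-difference update based on the observed feedback signal.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[state, action])
        state = nxt
```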
Experience learning utilizes feedback signals from its environment in order to improve its decision-making policies over time.
![In experience learning, an ML algorithm aims to improve its decision-making policies through interactions with its environment [Photo by Alex Kondratiev on Unsplash]](https://towardsdatascience.com/wp-content/uploads/2023/12/0DptDhDWxihgDTrP1-scaled.jpg)
Note that both forms of learning have substantial drawbacks, which in part explains why end-to-end machine learning still struggles on real-world problems. Thus, the integration of OA and ML is indeed mutually beneficial.
Of course it is possible – or perhaps even preferable – to apply both forms of learning in a single pipeline. An example would be to (i) generate an initial solution through demonstration learning, (ii) improve with an optimization solver, and (iii) use experience learning to guide the search for the optimal solution.
Examples of ML-OA integration
Time for some concrete examples. Below follow three representative examples based on academic studies.
I. Learning to branch
Perhaps the best-studied example that integrates OA and ML is learning to branch, or more specifically, deploying ML to guide branch-and-bound algorithms. For those unfamiliar with the concept: branch-and-bound solves integer problems by systematically exploring a very large search tree, pruning provably suboptimal or unpromising branches based on established bounds. If the bound obtained from a subproblem’s relaxation is already worse than the best integer solution found so far, the entire branch can safely be pruned.
Although branch-and-bound is a renowned algorithm for solving problems with huge solution spaces to optimality, under the hood some fairly rudimentary heuristic choices are made:
- Variable selection: Which variable to branch on next. A common decision rule is to simply branch on the most ‘fractional’ (closest to 0.5) variable, as sketched after this list.
- Node selection: Which of the currently open nodes to process first. Extremes are to always explore the most promising node (best-first) or to always fully explore a node’s subtree (depth-first).
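To make the variable-selection rule above concrete, here is a minimal sketch; the example relaxation values are made up.

```python
import numpy as np

def most_fractional_variable(lp_solution):
    # Fractional part of each variable in the LP-relaxation solution.
    frac = lp_solution - np.floor(lp_solution)
    scores = np.abs(frac - 0.5)  # smaller distance to 0.5 = "more fractional"
    scores[np.isclose(frac, 0.0) | np.isclose(frac, 1.0)] = np.inf  # skip integers
    return int(np.argmin(scores))

print(most_fractional_variable(np.array([2.0, 3.7, 1.45, 5.0])))  # -> index 2
```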
![Example of branch-and-bound procedure. The choices which node to explore next and which variable to branch on are typically based on heuristic rules [image from WikiMedia]](https://towardsdatascience.com/wp-content/uploads/2023/12/0GaK-2gCA8vvHt7cm.png)
Truth be told, this simplification does not do justice to the more sophisticated heuristics developed over the years. Nonetheless, despite the guarantee to ultimately find the optimal solution, the actual search procedure is not as intelligent as one might expect it to be, which is the price paid for solvers being general-purpose.
Given that the tree search is sequential and highly dynamic, it seems sensible to replace static heuristic rules with dynamic learning-based ones. For instance, dynamic features of nodes, branches and trees could be designed to predict suitable variables and nodes, with their values being learned by ML.
A number of solutions have been proposed in recent years, ranging from offline demonstration learning to online experience learning. An intuitive example is that of upper confidence bounds, balancing exploration and exploitation based on a node’s perceived value and the number of times it has been visited.
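As a sketch of the idea (a generic UCB1-style score, not the exact rule from the cited papers), node selection could look as follows:

```python
import math

def ucb_score(total_value, visits, total_visits, c=1.4):
    if visits == 0:
        return float("inf")  # always try unvisited nodes first
    exploitation = total_value / visits  # node's perceived value
    exploration = c * math.sqrt(math.log(total_visits) / visits)
    return exploitation + exploration

def select_node(open_nodes):
    # open_nodes: list of (cumulative value, visit count) per open node.
    total = sum(visits for _, visits in open_nodes)
    scores = [ucb_score(value, visits, total) for value, visits in open_nodes]
    return scores.index(max(scores))

print(select_node([(3.0, 4), (1.0, 1), (0.0, 0)]))  # -> the unvisited node
```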
The example of learning to branch illustrates that ML can be deployed in the context of exact algorithms, potentially speeding up search procedures without compromising optimality guarantees.
Example based on Lodi & Zarpellon (2017) and Balcan et al. (2018)
II. Learning routing with GNNs
Instances for routing problems are commonly represented by graphs. Customer locations and travel distances are obvious properties reflected in such graphs, yet nodes and arcs can have all kinds of labels, such as time windows and traffic densities. Graph Neural Networks (GNNs) are a powerful technique to embed and meaningfully represent such graphs in the problem context.
![The vehicle routing problem is a classical combinatorial optimization problem, exploding exponentially in size with the number of nodes [image from WikiMedia]](https://towardsdatascience.com/wp-content/uploads/2023/12/0s2BOZyKfmikGkKhs.png)
Traditionally, routes are generated using exact solvers (e.g., the Concorde TSP Solver) or metaheuristics (e.g., Hybrid Genetic Search). However, these solvers require fairly long runtimes, with exact algorithms failing to offer solutions at all for large instances.
Instead of running optimization algorithms for hours on end every time we encounter a new problem instance, we might learn to replicate them by detecting the patterns in their solutions. This example is a form of demonstration learning, trying to minimize the performance gap between the computationally intensive optimal solutions and the ML-generated solutions. Some quality loss is incurred in the process, but solutions can be generated in a much friendlier timeframe.
By training GNNs on example solutions, it is possible to estimate the probability that each edge will be included in a high-quality solution. Subsequently, a sparse heatmap is constructed that retains only the most promising edges. Guided by this heatmap, a dynamic programming subroutine is run to construct the routes step by step. If a route is dominated by others, it can be pruned.
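A strongly simplified sketch of that pipeline: a heatmap of edge probabilities (random here, standing in for a trained GNN’s output) is sparsified to the top-k outgoing edges per node, after which a greedy constructor, a mere stand-in for the paper’s dynamic programming routine, extends the route along the most promising remaining edge.

```python
import numpy as np

def sparsify(heatmap, k=3):
    # Retain only the k highest-probability outgoing edges per node.
    sparse = np.zeros_like(heatmap)
    for i, row in enumerate(heatmap):
        keep = np.argsort(row)[-k:]
        sparse[i, keep] = row[keep]
    return sparse

def construct_route(heatmap, start=0):
    n = len(heatmap)
    route, visited = [start], {start}
    while len(route) < n:
        probs = heatmap[route[-1]].copy()
        probs[list(visited)] = -1.0  # never revisit a node
        nxt = int(np.argmax(probs))
        route.append(nxt)
        visited.add(nxt)
    return route

heatmap = sparsify(np.random.default_rng(1).random((6, 6)))
print(construct_route(heatmap))
```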
Example based on Kool et al. (2019) and Kool et al. (2022)
III. Mathematical programming for large action spaces
In this final case, mathematical programming is deployed as a subroutine in reinforcement learning, with the purpose of handling large action spaces.
Combinatorial optimization problems have the tendency to very quickly explode in size; e.g., the number of sequences in which to visit a set of nodes is the factorial of the number of nodes. They often require handling many constraints, such as the capacity and maximum driving time of a truck or time windows at the customer nodes. Enumerating all actions and running feasibility checks becomes cumbersome very quickly.
If the problem structure allows, the one-step decision problem can be formulated as a linear program. Such programs can be solved very efficiently, scaling to action spaces vastly larger than what vanilla RL algorithms – which rely on explicit enumeration – can handle.
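A minimal sketch, assuming a knapsack-style action space: the best feasible combinatorial action is found by solving a small integer linear program with SciPy (standing in for an industrial solver) instead of enumerating all item subsets. The value estimates `q` stand in for learned Q-values.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

q = np.array([4.0, 2.5, 7.0, 1.0, 3.0])        # learned value estimate per item
weights = np.array([3.0, 1.0, 5.0, 2.0, 2.0])  # resource use per item
capacity = 7.0

res = milp(c=-q,  # milp minimizes, so negate to maximize total value
           constraints=LinearConstraint(weights[None, :], ub=capacity),
           integrality=np.ones_like(q),  # binary decision per item
           bounds=Bounds(0, 1))
print(res.x, -res.fun)  # best feasible action and its estimated value
```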
![Linear programming can efficiently explore large action spaces with constraints [image via WikiMedia]](https://towardsdatascience.com/wp-content/uploads/2021/01/1MG5W_d5S7chRURlmW6Ak9g.png)
The integration between OA and ML does not end there. Next, consider the approximation of Q-values. In many modern RL applications, this is done through a Deep Q-network, which by definition performs a nonlinear transformation of its input. However, ReLU activation functions can be embedded in the objective function as piecewise linear functions, supported by a set of additional constraints.
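Concretely, for a pre-activation x with known bounds L ≤ x ≤ U (with L < 0 < U), the standard big-M encoding of y = ReLU(x) introduces one binary variable z:

$$y \geq x, \qquad y \geq 0, \qquad y \leq x - L(1 - z), \qquad y \leq Uz, \qquad z \in \{0, 1\}.$$

If z = 1, the constraints force y = x (the active case); if z = 0, they force y = 0.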
Another OA layer? Instead of searching the full action space, we may impose domain restrictions and local branching to control the action space, exploring only the neighborhood in which we suspect (based on Q-values) the best action resides.
Example based on Van Heeswijk & La Poutré (2020) and Akkerman et al. (2023)
As you see, the opportunities for integrating machine learning and optimization algorithms are vast and diverse. It has been an active research area for years, with lots of potential yet to be unlocked.
TL;DR
- Both machine learning and optimization algorithms have flaws in terms of computational times, performance guarantees, knowledge utilization, etc. Given the nature of both paradigms, they may naturally complement each other.
- Machine learning specializes in pattern detection, optimization algorithms specialize in pattern exploitation. They can be combined to build powerful algorithmic pipelines.
- In practice, companies often repeatedly solve a single problem with relatively limited variance. Extracting patterns from solution data (i.e., statistical learning) can boost existing exact methods or heuristics.
- Machine learning can be used in the form of demonstration learning (approximating solutions of optimization algorithms in a fraction of the time) or experience learning (interacting with the environment to iteratively enhance policies).
You may also like the following articles:
Using Linear Programming to Boost Your Reinforcement Learning Algorithms
Five Ways To Handle Large Action Spaces in Reinforcement Learning
For those interested in machine learning for combinatorial optimization, I can warmly recommend the keynote speech by Prof. Andrea Lodi.
References
Akkerman, F., Luy, J., Van Heeswijk, W.J.A., & Schiffer, M. (2023). Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction. arXiv preprint arXiv:2305.19891.
Balcan, M. F., Dick, T., Sandholm, T., & Vitercik, E. (2018, July). Learning to branch. In International Conference on Machine Learning (pp. 344–353). PMLR.
Khalil, E., Le Bodic, P., Song, L., Nemhauser, G., & Dilkina, B. (2016, February). Learning to branch in mixed integer programming. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 30, No. 1).
Kool, W., Van Hoof, H., & Welling, M. (2019). Attention, learn to solve routing problems!. International Conference on Learning Representations 2019.
Kool, W., Van Hoof, H., Gromicho, J., & Welling, M. (2022, June). Deep policy dynamic programming for vehicle routing problems. In International conference on integration of constraint programming, artificial intelligence, and operations research (pp. 190–213). Cham: Springer International Publishing.
Lodi, A., & Zarpellon, G. (2017). On learning and branching: a survey. Top, 25, 207–236.
Parmentier, A., & T’kindt, V. (2023). Structured learning based heuristics to solve the single machine scheduling problem with release times and sum of completion times. European Journal of Operational Research, 305(3), 1032–1041.
Santana, Í., Lodi, A., & Vidal, T. (2023, May). Neural Networks for Local Search and Crossover in Vehicle Routing: A Possible Overkill?. In International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research (pp. 184–199). Cham: Springer Nature Switzerland.
Van Heeswijk, W.J.A., & La Poutré, H.L. (2020, December). Deep reinforcement learning in linear discrete action spaces. In 2020 Winter Simulation Conference (WSC) (pp. 1063–1074). IEEE.
Vazacopoulos, A. (2020). Combining Machine Learning and mathematical optimization integration using Python. Via LinkedIn.