The Graph Coloring Problem: Exact and Heuristic Solutions

Exploring the classical discrete optimization problem through custom constructive heuristics and integer programming in Python

Bruno Scalia C. F. Leite
Towards Data Science

--

Graph coloring heuristic solution for 32 nodes instance. (Image by the author).

Graph coloring theory has a central position in discrete mathematics. It appears in many places with seemingly no or little connection to coloring. It deals with the fundamental problem of partitioning a set of objects into classes, according to certain rules (Jensen & Toft, 1995).

We can summarize the problem as follows: given an undirected graph G(V, E), assign colors to each node (vertex) such that no two adjacent nodes share the same color and the number of colors used is minimized. Despite the concise and clear statement, this problem is notorious for its computational complexity, falling into the category of NP-hard problems.

Due to its combinatorial complexity, the exact approach of Integer Linear Programming (ILP), might be unable to solve large instances, even when using the best available solvers. In such situations, heuristics and metaheuristics can be interesting alternatives. Although not able to prove optimality, these methods can provide fast and good-quality solutions.

In this article, we will solve the Graph Coloring Problem using the constructive heuristic DSatur (Brélaz, 1979) and an integer linear programming model using pyomo (Bynum et al., 2021) with the solver HiGHS. As for other stories, you can find the complete code in my code repository.

In case you are not yet familiar with Linear Programming, I recommend reading my previous article for foundational knowledge before proceeding.

Now let us put some hands-on to creating amazing solutions such as this.

Graph coloring heuristic solution for 32 nodes instance. (Animation by the author).

Constructive heuristic — DSatur

Let us recall our problem definition:

  • Consider an undirected graph G(V, E).
  • Assign colors to each node such that no two adjacent nodes share the same color.
  • Minimize the number of colors used.

A naive constructive solution can be successively choosing an empty node and assigning a color to it ensuring that this color was not yet used in its neighbors. That might not be the best strategy possible, but surely can provide an upper bound for the necessary number of colors. However, the DSatur (Brélaz, 1979) algorithm includes new elements to choose the next node and improve this heuristic.

  • Degree: Number of edges connected to a given node.
  • Saturation: Number of different colors used in neighbors.

The algorithm for DSatur starts from a queue of uncolored nodes and iteratively chooses a node with a maximal saturation degree, removes it from the queue, and assigns a color to it. If there is an equality in saturation, Brélaz (1979) suggests choosing any of maximal saturation. We will choose a slightly different approach by breaking ties by choosing the node with the maximal overall degree. The color assigned to a given node should be the one with the lowest index available. A pseudocode is presented below. Consider N the set of nodes, Q the queue of uncolored nodes referencing the same memory locations of N itself, and C the set of used colors.

dsatur(N)
Q = [&n for n in N]
C = {}
while |Q| > 0:
sort(Q) // descending order by saturation and degree
n = Q.pop(0)
assign_color(n, C)

The function assign_color should verify the lowest index available color and, if none from the current set is feasible, include a new alternative and increment the set.

Let us then put that into Python code. Starting with package imports. Our heuristic will be written in pure Python with some type-hint imports only.

from typing import List, Tuple

Now, let us define our basic modeling elements: Colors and Nodes. The class Color is defined to be a mutable placeholder for the corresponding indexes of their instances. That can be memory efficient in Python as multiple variables can reference the same memory location. The method add_node should be called every time a new node is colored with a given color.

class Color:

index: int
n_nodes: int

def __init__(self, index) -> None:
self.index = index
self.n_nodes = 0

def __repr__(self):
return f"C{self.index}"

def add_node(self):
self.n_nodes = self.n_nodes + 1

Now the class Node. Each instance of Node has an attribute neighbors which is a list of nodes (pointers). So, every time a node is modified, we do not need to change any information in its neighbors. We can add a neighbor using the method add_neighbor, set its color using the method set_color, and access properties that depend on their neighbors' current states neighbor_colors, saturation, and degree.

class Node:

neighbors: List['Node']
index: int
color: Color

def __init__(self, index):
self.index = index
self.neighbors = []
self.color = None

def __repr__(self) -> str:
return f"N{self.index}|{self.color}"

def add_neighbor(self, node: 'Node'):
if node not in self.neighbors:
self.neighbors.append(node)

def set_color(self, color: Color):
self.color = color
color.add_node()

@property
def neighbor_colors(self):
return [n.color for n in self.neighbors if n.color is not None]

@property
def saturation(self):
return len(set((n.color for n in self.neighbors if n.color is not None)))

@property
def degree(self):
return len(self.neighbors)

Now into the DSatur algorithm. To implement it, a class of the same name will be created. The creation of a new DSatur instance takes as arguments the number of nodes n_nodesand edges of the graph. It then instantiates nodes and sets their neighbors based on edges.

class DSatur:

N: List[Node]
C: List[Color]
history: List[Node]

def __init__(self, nodes: List[int], edges: List[Tuple[int, int]]):
N = [Node(i) for i in nodes]
for e in edges:
i, j = e
N[i].add_neighbor(N[j])
N[j].add_neighbor(N[i])
self.N = N
self.C = []
self.history = []

def find_next_color(self, node: Node) -> Color:
next_color = None
for c in self.C:
if c not in node.neighbor_colors:
next_color = c
break
if next_color is None:
next_color = Color(len(self.C) + 1)
self.C.append(next_color)
return next_color

def solve(self, save_history=False):
Q = [n for n in self.N] # Pool of uncolored nodes
while len(Q) > 0:
Q.sort(key=lambda x: (x.saturation, x.degree), reverse=True)
n: Node = Q.pop(0)
next_color = self.find_next_color(n)
n.set_color(next_color)
if save_history:
self.history.append(n)
self.C.sort(key=lambda x: x.n_nodes, reverse=True)

@property
def cost(self):
return len(self.C)

Now let us use it to solve some challenging problems! You can find several instances in the OR Library. They are denoted as “gcolX” in there. We can extract the number of nodes and edges from one of those files using the following code.

# Open and read file
with open($YOUR_FILE_PATH, mode="r") as file:
lines = file.readlines()
header = lines[0].strip().split()
n_nodes = int(header[2])
edges = []
node_set = set()
for line in lines[1:]:
if line.startswith('e'):
_, i, j = line.strip().split()
edges.append((int(i), int(j)))
node_set.add(int(i))
node_set.add(int(j))
nodes = sorted(node_set)
assert len(nodes) == n_nodes, "Wrong number of nodes specified"

Then we just parse these to a new DSatur instance and solve the problem (or at least get a good-quality approximation).

dsatur = DSatur(nodes, edges)
dsatur.solve()

The code to create that interesting visualization of results from the introduction might be too verbose to include here, but you can find it in my code repository. It can convey a general idea of how hard it gets when handling several nodes…

Graph coloring heuristic solution for 100 nodes instance. (Animation by the author).

Now let us see how we could handle it by an exact approach.

Integer Linear Programming

To refine our solutions obtained using heuristics and try to prove the optimality of solutions, let us formulate the Graph Coloring Problem as an ILP model. Remember it might be unable to handle large instances though. The model presented in this section and other exact algorithms are presented in Chapter 3 of Lewis (2021).

Let us define the Sets considered in this approach:

  • C: Colors
  • N: Nodes (or vertices)
  • E: Edges

A first question already comes up “How many colors should we consider?”. In the worst-case scenario, all nodes are connected, so one should consider C of the same size as N. However, this approach could make our solutions even harder by unnecessarily increasing the number of decision variables and worsening the linear relaxation of the model. A good alternative is to use a heuristic method (such as DSatur) to obtain an upper bound for the number of colors.

In this problem, we have two groups of decision variables:

  • x_{i, c}: Are binary variables indicating that node i is colored in c
  • y_{c}: Are binary variables indicating that color c is used

We must also create constraints to ensure that:

  • Every node must be colored
  • If any node from one edge has a color, ensure that the color is being used
  • At most 1 node of each edge can be colored with a given color
  • Break symmetry (this is not required, but might help)

Finally, our objective should minimize the total number of colors used, which is the sum of y. Summarizing we have the following equations.

Graph coloring integer programming model. (Image by the author).

Without further ado, let us import pyomo for the Integer Programming model.

import pyomo.environ as pyo

There are two approaches for modeling a problem in pyomo: Abstract and Concrete models. In the first approach, the algebraic expressions of the problem are defined before some data values are supplied, whereas, in the second, the model instance is created immediately as its elements are defined. You can find more about these approaches in the library documentation or in the book by Bynum et al. (2021). Throughout this article, we will adopt the Concrete model formulation.

model = pyo.ConcreteModel()

Then, let us instantiate our Sets. Parsing the iterables directly from dsatur attributes N and C is valid, so one could use them in the initialize keyword arguments. Alternatively, I will pass the original nodes and edges from our input data and create a range from the DSatur as our initialization for colors.

model.C = pyo.Set(initialize=range(len(dsatur.C)))  # Colors
model.N = pyo.Set(initialize=nodes) # Nodes
model.E = pyo.Set(initialize=edges]) # Edges

Next, we instantiate our decision variables.

model.x = pyo.Var(model.N, model.C, within=pyo.Binary)
model.y = pyo.Var(model.C, within=pyo.Binary)

And then our constraints.

# Fill every node with some color
def fill_cstr(model, i):
return sum(model.x[i, :]) == 1


# Do not repeat colors on edges and indicator that color is used
def edge_cstr(model, i, j, c):
return model.x[i, c] + model.x[j, c] <= model.y[c]


# Break symmetry by setting a preference order (not required)
def break_symmetry(model, c):
if model.C.first() == c:
return 0 <= model.y[c]
else:
c_prev = model.C.prev(c)
return model.y[c] <= model.y[c_prev]


model.fill_cstr = pyo.Constraint(model.N, rule=fill_cstr)
model.edge_cstr = pyo.Constraint(model.E, model.C, rule=edge_cstr)
model.break_symmetry = pyo.Constraint(model.C, rule=break_symmetry)

You can try and include other symmetry-breaking constraints and see what works better with your available solver. In some experiments I performed, including a preference order using total nodes assigned to a given color has been worse than ignoring it. Possibly due to the solver’s native symmetry-breaking strategies.

# Break symmetry by setting a preference order with total assignments
def break_symmetry_agg(model, c):
if model.C.first() == c:
return 0 <= sum(model.x[:, c])
else:
c_prev = model.C.prev(c)
return sum(model.x[:, c]) <= sum(model.x[:, c_prev])

At last, the objective…

# Total number of colors used
def obj(model):
return sum(model.y[:])


model.obj = pyo.Objective(rule=obj)

And our model is ready to be solved! To do that I’m using the HiGHS persistent solver, which is available in pyomo in case highspy is also installed in your Python environment.

solver = pyo.SolverFactory("appsi_highs")
solver.highs_options["time_limit"] = 120
res = solver.solve(model)
print(res)

For large instances, our solver might have a hard time trying to improve heuristic solutions. However, for a 32-node instance available in the code repository, the solver was able to reduce the number of used colors from 9 to 8. Worth mentioning that it took 24 seconds to complete execution whereas the DSatur algorithm for the same instance took only 6 milliseconds (using pure Python, which is an interpreted language).

Graph coloring integer programming solution for 32 nodes instance. (Image by the author).

Further reading

Although DSatur is an intuitive heuristic that provides fast good quality solutions, others might lead to better results, especially on complex instances. One of the most famous metaheuristics for the Graph Coloring Problem is Tabucol (Hertz & Werra, 1987). The authors propose a method that starts from an initial solution with k colors and possibly edges connecting nodes of the same color. Then, local moves changing the color of a given node are performed in a sense that minimizes violations, using a Tabu List to escape local optima.

More efficient exact methods for the Graph Coloring Problem than the one herein presented rely on Column Generation, using an algorithm denoted as Branch & Price. The reader interested in an introduction to the subject might find a concise overview in my other Medium story.

Conclusions

In this article, two approaches for solving the Graph Coloring Problem were presented: The constructive heuristic DSatur (Brélaz, 1979) and an Integer Linear Programming (ILP) model. The heuristic was able to provide fast good quality solutions for a moderate size instance, of which the solution was further improved using the ILP model. The code implemented is available for further use in a public repository.

References

Brélaz, D., 1979. New methods to color the vertices of a graph. Communications of the ACM, 22(4), 251–256.

Bynum, M. L. et al., 2021. Pyomo-optimization modeling in Python. Springer.

Hertz, A., & Werra, D. D., 1987. Using tabu search techniques for graph coloring. Computing, 39(4), 345–351.

Jensen, T. R., & Toft, B., 1995. Graph coloring problems. John Wiley & Sons.

Lewis, R.M.R., 2021. Advanced Techniques for Graph Colouring. In: Guide to Graph Colouring. Texts in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-81054-2_4

--

--

Chemical Engineer, Researcher, Optimization Enthusiast, and Data Scientist passionate about describing phenomena using mathematical models.