Let’s start with a possible real life challenge and I want to convince you that network theory and the mathematical tools that were developed within this field can solve this for you.
Imagine you work in the administration of a school for several years now. To make the exam schedules was painful but manageable with paper and pen up to now. But the school was growing really fast and the whole schedule arranging became overwhelming. There are so many students enrolled and you want that they all can attend their exams before they go off on holidays. How terrible it would be if students have to come back after holidays to take their exams. How are you going to solve this?
I claim that by the end of this article you can turn data from the wild which for example can be lists of students and the classes they attend and build an exam schedule where no students have the uneasy situation of colliding exam dates. The coloring problem is not just useful for exam schedules but many different situations. Maybe you come up with another idea, I would be glad to hear that from you in the comment section!

Before we start with this short tutorial, I want to share with you the learning goals for this article that you can check off while working along.
Table of contents
- Networks in python with library Networkx
- Graph coloring problem, what is it and how do you solve it?
- Solve practical example
Networks in Python
I assume you are familiar with network theory or also called graph theory. But if not let’s recap the most fundamental blocks in networks while getting to know the functionality of networkx.
import networkx as nx
If you don’t have networkx on your computer or in your virtual environment use
pip install networkx
or
conda install networkx
Let’s create a network with this library and call it network.
network = nx.Graph()
A network is made up from nodes and edges which are the connection between the nodes. Let’s add three nodes and two edges to our network.
To multiple nodes at once, we can provide a list of node names. In this case the nodes are called 1,2 and 3..
network.add_nodes_from([1,2,3])
print(f"This network has now {network.number_of_nodes()} nodes.")
This network has now 3 nodes ..
To add an edge between two nodes, name the first and the second node that you want to connect with each other.
network.add_edge(1,2)
network.add_edge(1,3)
We are also able to plot this network to get a visual understanding too.
nx.draw_networkx(network, with_labels=True)

So to summarize this paragraph, we know what a network consists of and how it can be built and visualized with networkx. If I wanted to describe this graph, I would say, this graph G consists of three nodes called 1, 2 and 3. In which node 2 is adjacent to node 1 and node 3 is adjacent to node 1. Node 1 is adjacent to node 2 and 3. Okay, enough for now. Let’s continue with the coloring problem.
Graph Coloring Problem
The Graph Coloring Problem is defined as:
Given a graph G and k colors, assign a color to each node so that adjacent nodes get different colors.
In this sense, a color is another word for category. Let’s look at our example from before and add two or three nodes and assign different colors to them.
network.add_nodes_from([4,5,6,7])
network.add_edge(1,4)
network.add_edge(1,5)
network.add_edge(2,4)
network.add_edge(3,5)
network.add_edge(5,6)
network.add_edge(7,3)
network.add_edge(7,6)
Okay let’s color this graph manually to solve the coloring problem.
color_list = ["gold", "violet", "violet", "violet",
"limegreen", "limegreen", "darkorange"]
nx.draw_networkx(network,node_color=color_list, with_labels=True)

Okay this is not what we want. Node 5 and 6 have the same color but are adjacent, also node 4 and 2 are adjacent but share the same color. We have to fix that.
color_list = ["gold", "red", "violet", "pink", "limegreen",
"violet", "darkorange"]

Okay this looks like a possible solution to the graph coloring problem. But you might ask yourself, how do I know how many colors I am gonna need? We now look at the toy example of making an exam schedule where this becomes clearer.
Practical Example
In this practical example, we try to find an optimal solution for the exam schedule of a single semester.
I created an artificial dataset that consists of 250 students which attend our fictive school that offers 5 majors, that also can be attended as minors. Each student is allowed to register for 5 classes, if she or he enrolled for a combination of a minor and a major (which is done by 10% of our students) then she or he takes 3 classes from the major and 2 from the minor. Otherwise if they study only a major, they choose all 5 lectures from their main subject. Each subject offers between 6 to 9 courses (classes).
I uploaded this dataset on kaggle with the following link. We download it and use pandas to read in the csv. The csv has the following structure. The rows show the students, and the column 1 shows the major, column 2 shows the minor if she or he has one. Column 3 to 42 are the different subjects.
import pandas as pd
student_data = pd.read_csv("synthetic_school_enrollment_data.csv")

Up to now, I didn’t tell you how we are gonna prevent students from having colliding exam dates. But now we have everything prepared to tackle this.
As I mentioned before, in the coloring problem, we want to prevent adjacent (neighboring) nodes from having the same color. In our example we want to avoid that students have to write two exams at the same time.
These two problems sound similar!
Thus we can come up with the idea, that we want to model the courses as nodes and exam dates as colors, and the nodes are connected if they share participating students. Therefore our exam scheduling is solved, when no neighboring courses/nodes have the same date/color. Okay, so let’s create a network that has our 40 courses as nodes and makes them connected if the share participating students.
Create a list of our 40 courses.
courses = list(student_data.columns)[2:]
Create a network object with networkx and add a node for each of the 40 courses
class_network = nx.Graph()
class_network.add_nodes_from(courses)
Let’s add edges to connect the nodes. An edge is drawn between two classes if class A shares at least one student with class B.
Each student attends 5 courses. I want to pack them into a list, so that I can later make edges between all the possible combinations in this list, since this one student cannot attend any of these 5 exams at the same time. Therefore I loop over the students and make a list for each them.
without_subj = student_data.drop(['Major', 'Minor'], axis=1) # We don't need major and minor for the moment
without_subj = without_subj.T # transpose
list_of_overlaps = []
for student in name_list:
list_of_overlaps.append(list(without_subj.loc[without_subj[student]].index))
The next step uses a library (itertools) with a cool function called combinations. First argument is the list from which you want to have combinations, and the second argument says of how many elements a combination is composed. I provided a little example, to make you familiar with the functionality.
import itertools
for pair in itertools.combinations([1,2,3],2):
print(pair)
(1, 2)
(1, 3)
(2, 3)
We thus loop over the list of overlaps that were created for each student and then we combine every course with every other course of this list. This enables us to take the pairs and form edges between them.
for sublist in list_of_overlaps:
for pair in itertools.combinations(sublist, 2):
class_network.add_edge(pair[0], pair[1])
This process resulted in 259 connections between classes.
n_edges_total = len(list(class_network.edges))
print(n_edges_total)
259
The formula that describes how many connections or edges are possible for one single graph is.

n_nodes = len(list(class_network.nodes))
n_edges_possible = (n_nodes*(n_nodes-1))/2
There are 780 possible edgees in this graph, our particular graph from the school example has 259, thus 33% of the possible edges are realised.
We can have a look at our school network that shows the courses and shared students.
fig = plt.figure(figsize=(12,12))
nx.draw_networkx(class_network, with_labels=True)

Okay let’s build our algorithm for the coloring problem. To start off, I want to say that this is a NP-complete-problem, meaning the solution can only be found with brute-force algorithms. So basically, what we do is:
- Order the nodes randomly
- Order the colors (if colors should represent dates, start with your first date at the top)
- Process the nodes one at the time, assign the first legal color from the list to our node
Since it is a NP-complete-problem we cannot get better than this greedy algorithm.
A greedy algorithm is a simple, intuitive algorithm that is used in optimization problems. The algorithm makes the optimal choice at each step as it attempts to find the overall optimal way to solve the entire problem.
And different orderings (step 1) can give different results for the same graph.
To demonstrate that, we will run the algorithm a couple of times to get an example.
If you are interested, this video gives explanation of the mathematical fundamentals. One thing I want think is important to say, is that the algorithm uses max n+1 colors at max, where n is the number of the highest degree in our network. The degree of a node v_i says to how many nodes a particular node v_i is connected to.
We can find out how many degrees our network has by calling network.degree and this can be converted into a dictionary and we take the max.
max(dict(class_network.degree).values())
Which is in our network 23. But remember, this is the upper-ceiling, maybe we can get better than this. Is 24 (23+1) actually a good number at all? We have to look at how many courses we offer, to see if we could consolidate the exam dates into fewer ones.
len(courses)
Oh we have 40 classes! So at least 16 dates can be saved, this is encouraging. Maybe fewer dates are possible? To be on the save side, I will prepare 24 different colors (from matplotlib), which is the maximal number of colors we will need.
colors = [
"lightcoral", "gray", "lightgray", "firebrick", "red", "chocolate", "darkorange", "moccasin", "gold", "yellow", "darkolivegreen", "chartreuse", "forestgreen", "lime", "mediumaquamarine", "turquoise", "teal", "cadetblue", "dogerblue", "blue", "slateblue", "blueviolet", "magenta", "lightsteelblue"]
We need also 24 possible exam dates to which we can assign the exams and a dictionary that translates from colors to datetime objects.
from datetime import datetime
dates = []
calendar = {}
for i in list(range(14,20)):
for j in list(range(10,18,2)):
date = datetime(2021, 6, i, j, 0)
dates.append(date)
calendar[date] = []
Our translating dictionary:
from_color_to_date = {col: dates[i] for i, col in enumerate(colors)}
And now we can write our greedy algorithm…
def greedy_coloring_algorithm(network, colors):
nodes = list(network.nodes())
random.shuffle(nodes) # step 1 random ordering
for node in nodes:
dict_neighbors = dict(network[node])
# gives names of nodes that are neighbors
nodes_neighbors = list(dict_neighbors.keys())
forbidden_colors = []
for neighbor in nodes_neighbors:
example.nodes.data()[1]
len(example.nodes.data()[1].keys())
if len(network.nodes.data()[neighbor].keys()) == 0:
# if the neighbor has no color, proceed
continue
else:
# if the neighbor has a color,
# this color is forbidden
example.nodes.data()[1]['color']
forbidden_color = network.nodes.data()[neighbor]
forbidden_color = forbidden_color['color']
forbidden_colors.append(forbidden_color)
# assign the first color
# that is not forbidden
for color in colors:
# step 2: start everytime at the top of the colors,
# so that the smallest number of colors is used
if color in forbidden_colors:
continue
else:
# step 3: color one node at the time
network.nodes[node]['color'] = color
break
run the algorithm
greedy_coloring_algorithm(class_network, colors)
It’s time to look at the results. Let me grab the color values of each node and pass it into a list called colors_node.
colors_nodes = [data['color'] for v, data in class_network.nodes(data=True)]
nx.draw(class_network, node_color=colors_nodes, with_labels=True)

This graph looks promising. It’s a little bit messy. But we can also look at the colors list to see how many colors or dates we finally need.
len(set(colors_nodes))
10! We only need 10 dates for our 40 exams. Who would have guessed that? For me that is amazing. Such a powerful tool at hand.
But as mentioned before, this algorithm does not reliably generate 10 categories, it will for sure generate 24 or less, but how many there will be, depends on the ordering in which we color the nodes.
Let’s make a short comparison with different orderings and see what we get, if we can get better than 10.
number = []
for i in list(range(0,50)):
greedy_coloring_algorithm(class_network, colors)
colors_nodes = [data['color'] for v, data in class_network.nodes(data=True)]
num_col = len(set(colors_nodes))
number.append(num_col)
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
Okay it seems that this network is an easy to solve regarding the coloring problem. We get the same result everytime.
Our final goal is to provide the principal of our school with a exam schedule. We assume the exam lasts 1.5h and the students get 30-minute break.
for v, data in class_network.nodes(data=True):
calendar[from_color_to_date[data['color']]].append(v)
max_number_exams_sync = len(max(list(calendar.values()),key=len))
rooms = ["Room "+str(i) for i in list(range(max_number_exams_sync))]
pd.DataFrame.from_dict(calendar, orient='index', columns=rooms)

Here we go! We can proudly give the exam schedule to the schoolmaster who will be surprised that we can finish all exams after just 2.5 days.
Conclusion
I claimed that you will be able to use some data and the graph coloring algorithm to automate difficult tasks for you. You have seen how you can create graphs with networkx as well as how to apply such a graph coloring algorithm in Python. I walked you through this rather theoretical algorithm with a nice application and if you work in a school that wants to prevent exam dates to collide, you have seen a possible solution. But if not I hope you still can transfer this knowledge to your domain. Let me know if it helped you finding a solution in one of the challenges you are facing.