
1. Introduction
We use optimization in our everyday lives, a lot. Sometimes we do it without even thinking about it.
Let’s say you are buying groceries, and you need to buy razor blades. Of course, if you buy 10000 razor blades you won’t have any money left in your bank. Nonetheless, if you buy 4 razor blades, you will need to buy razor blades again in 4 days (assuming that you shave daily) and you will spend more money on gas, which is not smart. So what we usually do is mentally solving an optimization problem.

We buy an amount that is something in between:
- Large enough to have a good amount of razor blades and be ok for a week or 2
- Small enough to still have money left in the bank account 🙂

Now, is it the real optimal value? Probably not. 🙂
You are in that supermarket, with that specific price, that specific number of available pack of razor blades. So you are exploring only a small domain of choices.
Moreover, you are not using any mathematical model. This means that, even if this model is actually optimal, you are not able to prove it. Which is not great.
Of course we are talking about buying groceries, and I believe you if you tell me that, at least when you are buying groceries, you don’t want to write any code or do any mathematical computations. Nonetheless, there might be situations where Linear Optimization can become handy.
2. Theoretical Background
In this article we will talk about Binary Linear Optimization. Let’s define the problem properly:
- Binary: it means that the questions we are trying to answer are not like "how many razor blades should I buy?", but more like "should I act this strategy or not?". This implies that the answer will not be a integer number between 0 and infinity, but the answer could be 0 or 1.
- Linear: it means that the solution is a linear combination of all the variables that we have and a cost/score vector.

For example, in this equation the cost (loss) function is the sum of the x_i values, which are again 1 and 0, and a cost vector which is made by c_i values, with i between 1 and N (in this case, we have N binary decisions to make).
3. Optimization: it means that we want to find the minimum or maximum of the loss function that we expressed above. In other words, the optimal solution that we want to finds is something like this (for a minimization problem):

Or something like this for a maximization problem:

You may have noticed that we have defined, in both the minimization and maximization problem, a X set. But what is it?
Well, imagine if I tell you that you have infinite money in your bank account. How many razor blades would you buy? I guess you would buy all the razor blades of the store, because you won’t have any problem of moving them (as you have infinite money to buy all the cars of the world), and you won’t have any problem of storing them (because you have infinite money to buy all the houses of the world).
Now, imagine that you are an alien and you don’t need to shave at all. As you are an alien, the solution of your problem will be to spend 0 money and buy 0 razor blades.
Now, of course, this is simply ridiculous. Because, of course, nobody has infinite money and everybody, at a certain point of their life, will need to shave. This is formally called a constraint of your problem. The constraints are the things that make all your optimization model tricky. Without any constraint, the answer is always be 0,+infinite or -infinite.
Now, why do we have to talk about binary linear optimization? Why does it have to be binary?
Well, it kind of works like this:

A linear problem is very easy, when it’s continuous. In fact, it exists a very well known algorithm to solve this kind of problems, and it is named "simplex algorithm". If the domain is continuous it is again relatively easy to solve it if the Loss function is convex. The problem is very hard if the loss function is not even convex.
As you see from the picture we define "hard" the problem when the problem is integer. Nonetheless, this doesn’t have to scare us! In practice, again using the simplex algorithm, we are pretty good at solving this kind of problem when the function is at least convex.
Bad News: Our problem is technically "hard", but as we will see, it will be solved in fractions of seconds. Nonetheless, it is important to consider that, when the number of decision variables and constraints increase we may not be able to have a verified optimal solution.
Good news: It’s not the case . 🙂 Our solution will be in fact optimal, as it is supposed to be for standard Linear Integer Optimization Problems
3. Hands On Example
As I promised, there will be an Hand On Example. I took a very famous problem, that is the Fantasy Soccer one. I used a different dataset and did things differently from the other blog posts that you will find online. However, a good alternative to the things you will see in this post can be found in this other (very good, in my opinion) article.
The dataset that we will use is public, it can be downloaded and used by everyone, and can be found here.
Let’s describe the problem by looking at it. First of all, this is what you will need:
Nothing more than some very well known libraries (numpy, pandas, matplotlib, seaborn,…) and Pulp, which is the library we will use for the proper optimization part.
You can install it, as always, like this:
pip install pulp
No big surprises so far 🙂 Let’s give a look at the dataset:
We have different Players with their cost, position, points, and a bunch of other features.
If you play fantasy soccer you have limited budget. That means that the sum of the costs of all the players can’t exceed a certain value (e.g. 100,1000,2000). Nonetheless, of course, you want your final team to have the highest number of points. This means, again, that you want the sum of the points of all your players to be the highest as possible. Moreover, you have to obey the following rules (listed in the official Fantasy Premier League page):
- 5 Defenders
- 2 Goal Keepers
- 5 Midfielders
- 3 Strikers
Let’s try to convert this into a mathematical model:
- Our variables. We have 628 rows. This means that we have 628 binary variables (1/0). We will have 1 if we select the player, 0 if we don’t select the player. The final result is thus the vector x made by 628 entries:

2. Limited Budget. Each player has a cost. This means that we will have a list of costs, let’s call it vector c. Given a budget B, the budget constraint is the following.

2. Role Constraints. We need to pick exactly 5 defenders, 2 goal keepers, 3 midfielders and 3 strikers. Let’s say that we have the following r vector:

Ok. This part is a little tricky. We define 4 different subsets of the players (X_1,X_2,X_3,X_4). The first subset is the subset of the defenders, the second one is the one of the goal keepers, the third one is the one of the midfielders and the fourth one is the strikers. Then we can say:

If you don’t work with math, or you are more a software engineer than a data scientist, this may be tricky, but the idea is very simple. We are just imposing that we have exactly 5 defenders, 2 goal keepers,… .
- Score Function. In our case, each player has a number of points. Of course, we want to maximize it. Let’s call this vector of points p. Thus, the loss (score) function is the following:

And our optimal solution is:

Where X is the domain that obeys all the constraints we discussed about.
Everything is now rigorously defined. Let’s start with the coding part.
First of all, let’s create our dataset of interest:
And plot the distribution of the roles of the player:
Now, let’s define the role constraints.
We have stored them into a dictionary. We will now define our model in a parametric way. It means that we will get better squads as we will allow the budget of our team to increase. 🙂
With this function we will print the actual optimal team given a certain budget:
Let’s explore three scenarios:
- Low Budget. B=80
- Middle Budget. B=100
- High Budget. B=3000
Mathematically, the domain we are exploring gets bigger. This means that we should find better solution (largest sum of points) and consequently better players. Let’s see if it’s the case.
Now, there are a bunch of properties that are printed, but there is one which is the most important:
(B=80) Objective value: 1676
(B=100) Objective value: 2007
(B=3000) Objective value: 2266
So we are right indeed. The number of points of our team increases as the budget becomes bigger. Nice 🙂
Let’s print the results:
Best team with B=80:
Best team with B=100:
Best team with B=3000:
And it’s funny how I know almost all the players when B=3000, some of the players when B=100 and almost none of the players when B=80 🙂
Now of course we can add multiple constraints. Consider this approach for the bench as well. Add some smarts considerations about the points, the probability of injuries, the probability of the player being sold,….
Nonetheless, it is still fascinating to see that few lines of code (I don’t know, maybe 20?) can solve, a very challenging problems with hundreds of variables. I don’t believe that this could be a super intelligence (actually I’m sure it is not), but it certainly is theoretically interesting and practically insightful for all the Fantasy Soccer Manager out there 🙂
4. Conclusions
If you liked the article and you want to know more about Machine Learning, or you just want to ask me something you can:
A. Follow me on Linkedin, where I publish all my stories B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me to receive all the corrections or doubts you may have. C. Become a referred member, so you won’t have any "maximum number of stories for the month" and you can read whatever I (and thousands of other Machine Learning and Data Science top writers) write about the newest technology available.