Optimization is at the heart of almost all machine learning and statistical techniques used in data science. We discuss the core optimization frameworks behind the most popular machine learning/statistical modeling methods.

Disclaimer: Many equations and optimization formalizations in this article are sourced from the study material of Georgia Tech’s Online Masters in Analytics (OMSA) program. I am proud to pursue this excellent online MS program. You can also check the details here.
Introduction
Often, newcomers in data science (DS) and machine learning (ML) are advised to learn all they can about statistics and linear algebra. The utility of a strong foundation in those two subjects is beyond debate for a successful career in DS/ML. However, the topic of optimization, although less celebrated, is equally important to any serious practitioner of data science and analytics.
To put it mildly, without a good understanding of this topic, our modern worldview of a data-driven culture and life remains incomplete.
This is because optimization is at the heart of every major business, social, economic, and, dare I say, personal decision taken by an individual, a collective body of people, or intelligent machines and software agents.

Everyday examples
You are optimizing variables and basing your personal decisions on the results all day long, most of the time without even recognizing the process consciously –
- Scheduling the order in which you will answer your emails
- Switching to a new route back home to minimize traffic woes
- Trying to squeeze out the optimum gap between lunch and an afternoon meeting to have a quiet walk around the office campus
Sounds familiar? Read about an interesting illustration here…
Each of these seemingly personal decisions can be modeled precisely using cold, hard mathematics to show that our brain is an amazing optimizer solving these computationally complex problems all day!
But moving away from the personal realm, we now come to the question about data science…
How important is optimization to data science?
Extremely important.
Basic science, business organizations, and engineering enterprises have been using optimization techniques and methods for a long time. In this sense, almost every engineering product is a compact physical (or virtual) form of a solution to an optimization problem. Engineers are specifically trained to work under resource constraints and to produce ‘good enough’ solutions from incomplete or noisy data or input. Essentially, they solve optimization problems every day with computers, semiconductor ICs, furnaces, or combustion engines.
The same goes for business organizations. Virtually every business decision is taken with the aim of maximizing some form of gain (e.g. profit margin or IP leadership) under constraints of time, budget, space, and legal and ethical boundaries. These are all optimization problems of one form or another.
Today, almost every business and technology is being affected by the paradigm change brought about by the advent of data science and machine learning. This, however, does not change the fact that the fundamental natural and human resources remain finite. There are still only 24 hours in a day. And legal and ethical bounds are not going away anytime soon.
Advanced techniques of artificial intelligence or machine learning may be able to guide businesses toward a better solution at a faster clip, but they must confront and solve the same (or more complex) optimization problems as before. A deluge of new data will aid this process, but the expectations will also grow as time goes by.
As an extremely simplified example, if in the past an engineering team had access to 1 GB of data and could produce an optimal solution at a cost of 10 dollars, they will be expected to reduce that cost to 7 dollars if they are given a ‘richer’ data set of 10 GB. Otherwise, what’s the point of Big Data?

Therefore, it is crucial for a data science/machine learning practitioner to have a sound knowledge of the theoretical underpinnings of the optimization frameworks used in common statistical/machine learning algorithms. It helps you understand –
- how to use the data effectively,
- how to estimate the computational load for processing a large data set,
- how to avoid local minima and search for a good solution in a complex multi-dimensional space.
In my article on an essential mathematics background for data science, I discussed the role of optimization and some online courses you can take to get a good grip on this topic. Read it here.
Basic elements of optimization
There are three basic elements of any optimization problem –
- Variables: These are the free parameters which the algorithm can tune
- Constraints: These are the boundaries within which the parameters (or some combination thereof) must fall
- Objective function: This is the goal towards which the algorithm drives the solution. For machine learning, this often amounts to minimizing some error measure or maximizing some utility function.
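To make the three elements concrete, here is a minimal sketch in Python. The problem itself is made up for illustration: the target point (3, 2), the budget of 4, and the function names are all assumptions, and a plain grid search stands in for a real solver.

```python
# Variables: x and y, the free parameters the search is allowed to tune.
# Constraints: x >= 0, y >= 0, and x + y <= 4 (a hypothetical budget).
# Objective: squared distance to the hypothetical target point (3, 2).

def objective(x, y):
    return (x - 3.0) ** 2 + (y - 2.0) ** 2

def is_feasible(x, y):
    # Small tolerance so floating-point round-off does not reject boundary points.
    return x >= 0.0 and y >= 0.0 and x + y <= 4.0 + 1e-9

def grid_search(step_count=40, upper=4.0):
    """Try every grid point and keep the feasible one with the lowest objective."""
    best_point, best_value = None, float("inf")
    for i in range(step_count + 1):
        for j in range(step_count + 1):
            x = upper * i / step_count
            y = upper * j / step_count
            if not is_feasible(x, y):
                continue
            value = objective(x, y)
            if value < best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value

best_point, best_value = grid_search()
print(best_point, best_value)
```

The unconstrained minimum (3, 2) violates the budget, so the search settles on the closest feasible point on the boundary x + y = 4. Real solvers replace the brute-force loop with something much smarter, but the three ingredients stay the same.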
The rest of the article focuses on some basic, widely used statistical models and ML algorithms and shows the optimization framework at their heart in terms of the above-mentioned elements.
Simple Linear Regression
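As a minimal sketch (not necessarily the exact formulation used in the OMSA material), simple linear regression can be read as an unconstrained optimization problem: the variables are the intercept and the slope, and the objective is the sum of squared errors. The data below is made up, and the function names are my own.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Setting the gradient of the squared-error objective to zero gives
    # slope = S_xy / S_xx and intercept = mean_y - slope * mean_x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """The objective function: total squared gap between data and fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, roughly y = 1 + 2x plus a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope, sum_squared_errors(xs, ys, intercept, slope))
```

Because this objective is a convex quadratic in the two variables, the minimizer has a closed form and no iterative search is needed.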

Also, note the following distinction in regression, from two points of view,

Regularized Linear Regression
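Regularization keeps the same squared-error objective but adds a penalty on the size of the coefficients (squared for ridge, absolute value for lasso). The sketch below assumes a single feature and centered data so the intercept can be dropped; the penalty values and the data are made up for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Minimize sum((y - a*x)^2) + penalty * a^2 over the single slope a."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    return s_xy / (s_xx + penalty)

def lasso_slope(xs, ys, penalty):
    """Minimize sum((y - a*x)^2) + penalty * |a| via soft-thresholding."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    shrunk = max(abs(s_xy) - penalty / 2.0, 0.0)
    sign = 1.0 if s_xy >= 0 else -1.0
    return sign * shrunk / s_xx

# Made-up, centered data (both columns have zero mean).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-3.9, -2.1, 0.2, 1.8, 4.0]
for penalty in (0.0, 10.0, 40.0):
    print(penalty, ridge_slope(xs, ys, penalty), lasso_slope(xs, ys, penalty))
```

With a zero penalty both reduce to ordinary least squares. As the penalty grows, ridge shrinks the slope smoothly toward zero, while lasso sets it to exactly zero once the penalty is large enough, which is why lasso is often used for variable selection.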

Logistic Regression
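For logistic regression, the objective is usually the log-likelihood of the observed labels (equivalently, the log loss to be minimized), and there is no closed-form solution, so an iterative method is needed. The sketch below uses plain gradient descent on made-up one-dimensional data; the learning rate and step count are assumptions chosen for this toy problem.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(xs, ys, intercept, slope):
    """The objective: average negative log-likelihood of the labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(intercept + slope * x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def fit_logistic_regression(xs, ys, learning_rate=0.1, steps=5000):
    """Gradient descent on the mean log loss, starting from zero coefficients."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(intercept + slope * x) - y for x, y in zip(xs, ys)]
        grad_intercept = sum(errors) / n
        grad_slope = sum(e * x for e, x in zip(errors, xs)) / n
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope

# Made-up data: larger x makes the label 1 more likely, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
initial_loss = mean_log_loss(xs, ys, 0.0, 0.0)
intercept, slope = fit_logistic_regression(xs, ys)
final_loss = mean_log_loss(xs, ys, intercept, slope)
print(intercept, slope, initial_loss, final_loss)
```

The log loss is convex in the coefficients, so gradient descent with a small enough step size heads toward the global minimum rather than getting stuck in a local one.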

Support Vector Machine
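In its textbook form, a linear support vector machine is a constrained optimization problem over the weight vector w and the offset b, with labels y_i in {-1, +1}. The notation below is the standard one and is an assumption on my part, not necessarily the notation of the OMSA material.

```latex
% Hard margin (data assumed linearly separable): maximize the margin
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1,\dots,n

% Soft margin: classification violations are allowed and traded off by \lambda > 0
\min_{w,\,b} \; \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(w^\top x_i + b)\bigr) + \lambda \lVert w \rVert^2
```

The first term of the soft-margin objective penalizes points on the wrong side of the margin, and the second term rewards a wide margin. The parameter \lambda sets the balance between the two.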

Time Series Analysis – Exponential Smoothing
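In simple exponential smoothing, the variable to optimize is the smoothing constant alpha (between 0 and 1), and a common objective is the sum of squared one-step-ahead forecast errors. The sketch below assumes the level is initialized to the first observation and uses a coarse grid search over alpha; both choices, and the two series, are made up for illustration.

```python
def one_step_sse(series, alpha):
    """Sum of squared one-step-ahead errors for a given smoothing constant."""
    level = series[0]  # assumption: initialize the level with the first value
    sse = 0.0
    for x in series[1:]:
        sse += (x - level) ** 2                  # the forecast is the previous level
        level = alpha * x + (1 - alpha) * level  # smoothing update
    return sse

def best_alpha(series, grid_size=100):
    """Grid search over alpha in (0, 1] for the smallest squared error."""
    candidates = [k / grid_size for k in range(1, grid_size + 1)]
    return min(candidates, key=lambda a: one_step_sse(series, a))

trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noisy_flat = [10.0, 12.0, 8.0, 11.0, 9.0, 10.0, 12.0, 8.0]
print(best_alpha(trending), best_alpha(noisy_flat))
```

For the steadily trending series the best choice is to follow the latest observation as closely as possible (alpha = 1), while for the noisy but flat series a smaller alpha that averages out the noise gives a lower error.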

Time Series Analysis – ARIMA
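A common textbook way to write an ARIMA(p, d, q) model and its fitting problem is sketched below. The symbols are assumptions of this sketch, and the least-squares objective shown is only one option; statistical software often maximizes a likelihood instead.

```latex
% d-th order differencing of the observed series x_t (B is the backshift operator)
w_t = (1 - B)^d x_t

% ARMA(p, q) model on the differenced series, with errors \varepsilon_t
w_t = c + \sum_{i=1}^{p} \phi_i\, w_{t-i} + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} + \varepsilon_t

% Fit: choose the coefficients that minimize the squared one-step-ahead errors
\min_{c,\,\phi,\,\theta} \; \sum_{t} \hat{\varepsilon}_t^{\,2},
\qquad \hat{\varepsilon}_t = w_t - \hat{w}_t
```

Here the variables are the constant c and the coefficients \phi_i and \theta_j, while the orders p, d, and q are typically chosen in an outer step, for example by comparing candidate models.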

K-means Clustering
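K-means can be viewed as minimizing the total squared distance between each point and the center of the cluster it is assigned to. The variables are the cluster centers and the assignments. The sketch below uses the classic alternating heuristic (Lloyd's algorithm) with made-up points and hand-picked initial centers, which is an assumption: real implementations usually pick initial centers randomly and restart several times.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def assign(points, centers):
    """Assignment step: give each point the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points]

def kmeans(points, initial_centers, max_iterations=100):
    centers = list(initial_centers)
    for _ in range(max_iterations):
        labels = assign(points, centers)
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            # Update step: move each center to the mean of its members.
            new_centers.append(mean_point(members) if members else old_center)
        if new_centers == centers:  # nothing moved, so the heuristic has converged
            break
        centers = new_centers
    return centers, assign(points, centers)

def within_cluster_ss(points, centers, labels):
    """The objective: total squared distance of points to their assigned center."""
    return sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, initial_centers=[points[0], points[3]])
print(centers, labels, within_cluster_ss(points, centers, labels))
```

Neither step can increase the objective, so the procedure always settles down, but only to a local minimum. This is one of the places where the earlier point about avoiding poor local solutions becomes very practical.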

Deep Learning/Neural Network
Most neural networks are trained by optimizing the weights of the connections between neurons, with back-propagation supplying the gradients of the loss with respect to those weights. Advanced optimization methods are employed to improve the chances of converging to a good solution.
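As a toy sketch of this idea, the code below trains a tiny network (one input, two tanh hidden units, one linear output) with plain gradient descent, computing the gradients by hand with the chain rule. The architecture, starting weights, learning rate, and the sine-curve data are all assumptions made for illustration; real networks rely on automatic differentiation and more advanced optimizers.

```python
import math

def forward(params, x):
    """Return the hidden activations and the network output for input x."""
    hidden = [math.tanh(w * x + b) for w, b in zip(params["w"], params["b"])]
    output = sum(v * h for v, h in zip(params["v"], hidden)) + params["c"]
    return hidden, output

def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)

def backprop_step(params, data, learning_rate):
    """One gradient-descent update; gradients come from the chain rule."""
    units = len(params["w"])
    grad = {"w": [0.0] * units, "b": [0.0] * units, "v": [0.0] * units, "c": 0.0}
    for x, y in data:
        hidden, output = forward(params, x)
        d_output = 2.0 * (output - y) / len(data)   # d(loss)/d(output)
        grad["c"] += d_output
        for k in range(units):
            grad["v"][k] += d_output * hidden[k]
            # Propagate back through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
            d_pre = d_output * params["v"][k] * (1.0 - hidden[k] ** 2)
            grad["w"][k] += d_pre * x
            grad["b"][k] += d_pre
    return {
        "w": [w - learning_rate * g for w, g in zip(params["w"], grad["w"])],
        "b": [b - learning_rate * g for b, g in zip(params["b"], grad["b"])],
        "v": [v - learning_rate * g for v, g in zip(params["v"], grad["v"])],
        "c": params["c"] - learning_rate * grad["c"],
    }

data = [(x, math.sin(x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
params = {"w": [0.5, -0.3], "b": [0.0, 0.1], "v": [0.3, 0.2], "c": 0.0}
initial_loss = mean_squared_error(params, data)
for _ in range(2000):
    params = backprop_step(params, data, learning_rate=0.05)
final_loss = mean_squared_error(params, data)
print(initial_loss, final_loss)
```

Unlike the regression examples, this objective is not convex in the weights, so the result depends on the starting point and the step size.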

Reinforcement Learning (RL)
RL is at the heart of many modern AI agents and systems. If you have heard about the famous AlphaGo program from Google, which defeated the best human champion at the ancient board game Go, you can be assured that some really advanced optimization technique was behind all that ‘machine intelligence’.
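At a much smaller scale, the optimization behind RL can be sketched as finding action values that maximize the expected discounted reward. The example below is a hypothetical five-state corridor with a reward at the right end, solved with Q-value iteration, a model-based relative of Q-learning that sweeps over all state-action pairs instead of learning from sampled experience. The environment, the discount factor, and the names are assumptions for illustration.

```python
GAMMA = 0.9            # discount factor (assumed)
STATES = range(5)      # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = (-1, +1)     # move left or move right

def step(state, action):
    """Deterministic toy environment: reward 1 for reaching the goal cell."""
    next_state = min(max(state + action, 0), 4)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

def q_value_iteration(sweeps=50):
    """Repeatedly apply the Bellman optimality update to every (state, action)."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(4):  # the terminal state needs no update
            for a in ACTIONS:
                next_state, reward = step(s, a)
                if next_state == 4:
                    future = 0.0
                else:
                    future = max(q[(next_state, b)] for b in ACTIONS)
                q[(s, a)] = reward + GAMMA * future
    return q

q = q_value_iteration()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(4)}
print(policy)
```

The greedy policy read off from the converged values moves right in every cell, which is the reward-maximizing behavior in this corridor.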
A little detour – optimization utilizing machine learning
There can be exciting optimization problems that use machine learning as the front end to create a model or objective function that can be evaluated much faster than with other approaches. This, of course, differs from the main discussion point of this article, but it nonetheless shows the intricate interplay that is possible between optimization and machine learning in general.
As an illustration, the update formula (e.g. in gradient descent) in an optimization framework may use a neural net in place of complicated functions.
One application of this approach is to replace time-consuming simulation models with machine-learning mapping functions in an optimization loop, where thousands of input variables are fed into the simulation model and we are looking for the set of parameters that gives the best simulated output. The idea is illustrated below.
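A heavily simplified sketch of such a loop follows. Everything in it is hypothetical: `expensive_simulation` is a stand-in for a slow simulator, there is a single input variable instead of thousands, and a piecewise-linear interpolator plays the role of the trained machine-learning model.

```python
call_log = {"count": 0}

def expensive_simulation(x):
    """Hypothetical stand-in for a slow simulator; it also counts its calls."""
    call_log["count"] += 1
    return (x - 2.2) ** 2

def make_surrogate(samples):
    """Build a cheap model of the simulator from (input, output) samples."""
    ordered = sorted(samples)
    def surrogate(x):
        for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
            if x0 <= x <= x1:
                weight = (x - x0) / (x1 - x0)
                return (1.0 - weight) * y0 + weight * y1
        raise ValueError("x is outside the sampled range")
    return surrogate

# 1. Run the slow simulation at a handful of design points only.
design_points = [0.5 * k for k in range(11)]   # 0.0, 0.5, ..., 5.0
samples = [(x, expensive_simulation(x)) for x in design_points]

# 2. Optimize the cheap surrogate on a dense grid instead of the simulator.
surrogate = make_surrogate(samples)
grid = [5.0 * i / 500 for i in range(501)]
best_x = min(grid, key=surrogate)

# 3. Confirm the candidate with one extra call to the real simulation.
confirmed_value = expensive_simulation(best_x)
print(best_x, confirmed_value, call_log["count"])
```

The search evaluates the surrogate 501 times but the simulator only 12 times. The price is accuracy: the surrogate points to the best sampled neighborhood (x = 2.0 here) rather than the simulator's true optimum at 2.2, so in practice the loop is usually repeated with new samples around the current best candidate.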

Summary and Other Methods
In this article, we discussed the general role of optimization in modern business and engineering enterprises and why having knowledge about it is becoming critical for data science.
We showed the basic optimization model that resides at the heart of some widely popular statistical techniques and machine learning algorithms.
With a simple search, you will find similar optimization frameworks at the heart of other popular ML methods such as,
- Expectation maximization
- Deep learning/Neural network (how gradient descent works)
- Genetic algorithms
- Simulated Annealing
If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.com. Also, you can check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources. If you are, like me, passionate about machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.
Tirthajyoti Sarkar – Sr. Principal Engineer – Semiconductor design, AI, Machine Learning – ON…