In this article, we illustrate how the goal of optimizing a business metric can be integrated with a machine learning pipeline. Specifically, we walk through a case illustration of how a simple optimization loop can be wrapped around a core ML algorithm to guide it toward a specific business objective.
Introduction

Machine learning (ML) is in serious demand. This branch of AI is upending traditional and modern business practices and bringing once-in-a-lifetime transformative changes.
However, scholarly discussions and online articles about ML algorithms, tools, and techniques often focus exclusively on their implementation, performance, deployment, and scalability.
The business side of ML is also discussed at length, but often those discussions are somewhat distant from the core algorithm or the ML pipeline. It is not easy to find a simple illustration of an ML pipeline that integrates the dual goals of achieving decent ML performance and satisfying a fairly common, intuitive business objective.
Yet large corporations constantly do this very integration in their operations – applying ML and tuning the pipeline to propel it toward satisfying the overall business objective (or a subset thereof).
For young data science and ML practitioners, however, it is not obvious how to demonstrate this idea with a simple enough example – perhaps just a few lines of code.
In this article, we show one such example.

The machine learning problem: High accuracy
For the case illustration, we will imagine a very simple scenario – a brand new startup company generates revenue by offering ML-based predictive analytics.
In particular, they receive a data stream and predict a binary class as output – ‘Yes’ or ‘No’. At the core of their service, they use a decision-tree-based classification model with an ensemble technique – an AdaBoost classifier.
The higher the accuracy of their predictions, the higher the revenue.
Of course, one can make a random prediction (without any model underneath) and still get about 50% of the predictions right. So, they only get paid for accuracy above a certain threshold, which can be as low as 50%. For example, if their prediction accuracy is 75% for some job, then they get paid a revenue proportional to 75% – 50% = 25%.
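To make that payout rule concrete, here is a minimal sketch of how it could be coded; the 50% threshold and the payment rate per accuracy point are made-up numbers for illustration, not figures from the original demo.

```python
def revenue(accuracy, threshold=0.5, rate_per_point=1000.0):
    """Hypothetical payout rule: revenue is proportional to the accuracy
    earned above the break-even threshold (random guessing)."""
    return max(0.0, accuracy - threshold) * rate_per_point

# Example: 75% accuracy with a 50% threshold pays for a 25% margin
print(revenue(0.75))  # 250.0
```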
But how do they achieve higher accuracy?
An easy and intuitive answer is by tuning the hyperparameters of the algorithm. There are quite a few hyperparameters in an algorithm like AdaBoost – mostly related to the underlying base estimator. For a decision tree, these can be the minimum number of samples per leaf, the maximum tree depth, the splitting criterion (e.g. Gini index), etc. However, to keep this case illustration simple, we choose the most intuitive hyperparameter – the number of tree estimators used in the boosting ensemble.
At their core, ensemble techniques like boosting work by keeping the base estimator relatively simple and of low accuracy (slightly above 50% is fine). They achieve robust generalization power by training a large number of such simple base estimators sequentially, shifting the focus at each iteration toward the examples the previous estimators got wrong, and combining all the predictions with a weighted vote.

It should therefore be no surprise that a higher number of base estimators may lead to greater generalization power – higher accuracy on the customer data (the true, unseen test set).
Previously, we established the simple scenario that revenue is directly proportional to the prediction accuracy (above a threshold) on the customer-supplied data.
So, it seems that the strategy to maximize ML model accuracy, and thereby the company's revenue, is to keep the individual base estimators really simple – a maximum tree depth of 2 or 3, say – and to employ a large number of them.
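For reference, a minimal sketch of such a model in scikit-learn might look like the following; the synthetic dataset and the specific parameter values are illustrative assumptions, not necessarily those used in the demo repo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the customer stream
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Shallow decision trees as weak base estimators; n_estimators is the
# hyperparameter we will later tune against the business objective
# (older scikit-learn versions call the first argument base_estimator)
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2),
    n_estimators=50,
)
model.fit(X_train, y_train)
print("Validation accuracy:", model.score(X_val, y_val))
```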
It seems like a straightforward strategy. But it may not be the optimal one.
Revenue is not profit. In all likelihood, the young startup company would like to maximize profit, not just revenue, because profit demonstrates long-term viability and helps attract more investment and more customers.
Let’s drill down to the profit aspect a bit.
The business objective: Maximize profit
Profit is king (in most business situations, anyway) and a very good indicator of the economic value added for most types of business. Nothing beats earning a decent revenue at a low operating cost.
We understand the relationship between the ML model and the revenue. But, how is the operating cost related to the model?
In a real-life scenario, it can be pretty complicated. But for the sake of case illustration, we can simply assume that the cost is proportional to the total compute time for model fitting and prediction.
This is not very hard to imagine – the case would be similar if the young startup firm rented some kind of cloud service, e.g. an AWS EC2 instance, to host and run their ML algorithms, billed based on total compute time.
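A crude way to proxy that operating cost, continuing the sketch above, is simply to time the fit and predict calls; the dollar rate per second is, again, a made-up number.

```python
import time

def compute_cost(model, X_train, y_train, X_val, cost_per_second=0.05):
    """Hypothetical operating cost: proportional to the wall-clock time
    spent on model fitting and prediction."""
    start = time.time()
    model.fit(X_train, y_train)
    model.predict(X_val)
    elapsed = time.time() - start
    return cost_per_second * elapsed, elapsed
```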
Now, do you remember the hyperparameter of interest – the number of base estimators for the boosting algorithm? The bad news is that the bigger this number, the larger the computational load and the longer the compute time for model fitting and prediction.
Thus, we have identified the key relationships between a single hyperparameter of the ML algorithm and two business metrics – revenue and cost.
And what is profit? It is the good old definition,
Profit = Revenue – Cost
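Putting the hypothetical helpers from the earlier sketches together, the profit bookkeeping is just a few lines:

```python
# Reuses model, the train/validation split, revenue() and compute_cost()
# from the earlier illustrative sketches
cost, _ = compute_cost(model, X_train, y_train, X_val)
accuracy = model.score(X_val, y_val)
profit = revenue(accuracy) - cost  # Profit = Revenue - Cost
```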
The optimization problem: How to choose the ML algorithm to maximize profit?
This is somewhat different from traditional discussions of ML algorithm choice, isn’t it? You may have done the following many times,
- Bias-variance trade-off analysis
- Grid-search on hyperparameters to determine the best accuracy
- Debating the correct metric for the ML performance measure – accuracy/precision/recall? F1 score? ROC curve and AUC?
- Brainstorming the data acquisition and annotation strategy – does a particular piece of feature engineering make sense? Does it make sense to pay for a few more annotations to increase the training set size?
And, all of these are still critically important.
But, from a business perspective, it could well be the case that the only thing you will be judged on is how much profit your ML algorithm can generate. If it is a large positive number, higher management most likely won’t grill you on the algorithmic details. If it is negative, all hell may break loose!
So, we must perform a balancing act – maximizing the prediction accuracy (and hence the revenue) while keeping the compute cost (and hence the operating expense) in check.

We constructed an extremely simplistic scenario, but at least it shows that there is a fair chance that an algorithmic choice can be strongly coupled to a key business metric.
And what do good engineers do when they have a model parameter that impacts two outputs (accuracy and cost, in our case) simultaneously?
They optimize.
They try to find the optimum setting of that model parameter which will maximize the business metric – profit in this case.
Let’s see how this can be done through a simple code demo.
Demo: Business-centered ML optimization
Code is, in fact, tedious to follow and can be distracting. Ideas and pictures are much better 🙂
Therefore, I’ll let you fork and copy the code for this demo from the GitHub repo here. But the main idea is as follows,
The hyperparameter of interest – the number of decision tree estimators – has been coded as an argument to an objective function, which an optimizer algorithm can minimize. The objective function value is computed by taking into account both the boosting algorithm’s accuracy on the validation set and a simple cost term proportional to the time taken for model fitting and prediction.
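Under the assumptions of the earlier sketches, such an objective might look roughly like this; the weights are arbitrary illustrative values, and the actual code in the repo may differ in detail. Returning the negated profit proxy lets a minimizer do the maximization.

```python
import time
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def objective(n_estimators, X_train, y_train, X_val, y_val,
              w_accuracy=1.0, w_time=0.1):
    """Profit-like objective for a given number of base estimators.
    Returns the negated value so that a minimizer maximizes profit."""
    model = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=2),
        n_estimators=int(n_estimators),
    )
    # Time the model fitting and prediction (the cost proxy)
    start = time.time()
    model.fit(X_train, y_train)
    model.predict(X_val)
    elapsed = time.time() - start

    accuracy = model.score(X_val, y_val)
    profit_proxy = w_accuracy * accuracy - w_time * elapsed
    return -profit_proxy
```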
Here is how the training and validation set accuracy varies with the number of decision tree estimators.

And here is the computation time (model fit and prediction),

Clearly, the accuracy starts at a low value for a small number of estimators but saturates after that number reaches a certain level. On the other hand, the computational load keeps on increasing.
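For reference, curves like these could be generated with a simple sweep along the following lines – a sketch that reuses the synthetic train/validation split from the earlier snippet; the grid of estimator counts is arbitrary.

```python
import time
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Reuses X_train, X_val, y_train, y_val from the earlier synthetic-data sketch
n_estimators_grid = np.arange(2, 101, 2)
val_scores, fit_times = [], []

for n in n_estimators_grid:
    model = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=2),
        n_estimators=int(n),
    )
    start = time.time()
    model.fit(X_train, y_train)
    model.predict(X_val)
    fit_times.append(time.time() - start)        # compute-time curve
    val_scores.append(model.score(X_val, y_val))  # accuracy curve
```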
Therefore, it does not make sense to keep increasing the number of estimators, as the marginal rate of return (in terms of improved accuracy) peaks at a certain level and then declines.
Ah… there it is… the famous Marginal Rate of Return, which is so near and dear to business folks.
Machine learning scientists, engineers, and business development teams finally have a common concept to ponder over, a common metric to plot and make decisions with.
To illustrate this behavior more clearly, we make up an objective function that integrates the accuracy and the computing cost into a single scalar output, by combining the validation set accuracy and the compute time in a linear function with suitable weights. The MRR behavior can be seen clearly if we plot this objective function,


It is easy to see that the coefficient of the accuracy term is positive in this function, whereas the coefficient of the compute-time term is negative, reflecting the true nature of the objective – cost subtracted from revenue.
In the GitHub repo, I also show how to solve the optimization by calling a function from the SciPy package. For this particular example, the objective function is so simple that a plot of it is enough to show that the optimum number of trees to include in the ML algorithm is around 10 or 11, so using a SciPy routine is not strictly needed.
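Still, for completeness, such a call might look roughly like this; which SciPy routine the repo actually uses is an assumption on my part, and since the objective is noisy (timing-based) and effectively integer-valued, a plain grid sweep works just as well here.

```python
from scipy.optimize import minimize_scalar

# Bounded scalar search over the number of estimators; the objective is the
# negated profit proxy sketched earlier, so minimizing it maximizes profit
result = minimize_scalar(
    lambda n: objective(n, X_train, y_train, X_val, y_val),
    bounds=(2, 100),
    method="bounded",
)
print("Optimum number of estimators:", int(round(result.x)))
```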
But the same idea can be extended to a more complicated objective function that encompasses many ML hyperparameters. Then, the full power of optimization algorithms can be brought to bear on the business-centric optimization.
See this article for a discussion of SciPy optimization algorithms,
Optimization with SciPy and application ideas to machine learning
Summary
In this article, we talked about business-centric optimization. A simple demo was discussed to illustrate the idea clearly – that often, the choice of ML algorithms and their settings needs to be guided by the overarching goal of optimizing a business metric. In this endeavor, wrapping the core ML algorithm with an optimization loop can be extremely handy for quick decision-making.
In fact, in one of my previous articles, I talked about how this idea does not need to be restricted to a single type of machine learning, or even a single type of analytics process, but can span a variety of quantitative disciplines – ML models, statistical estimation, stochastic simulation, etc. – all feeding into a common optimization engine.

If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.com. Also, you can check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources. If you are, like me, passionate about machine learning/Data Science, please feel free to add me on LinkedIn or follow me on Twitter.