Improving your Machine Learning Model Performance is sometimes futile. Here’s why.

Improving model performance doesn’t guarantee business growth.

Published in

Towards Data Science

18 min read · Dec 1, 2020

An increase in model performance doesn’t always mean business growth. Monitoring AI model metrics and correlating them with business KPIs helps bridge the gap between performance analysis and business growth, so that the whole enterprise works more efficiently towards a set objective. Coordinating the other departments with the data analytics department aligns the technical, data-analytical goals with the business objectives, ensuring progress, harmony and no cross-purpose work. It is essential to view every improvement in the machine learning pipeline through the lens of KPIs. Doing so quantifies which factors affect business growth and makes the data scientist or engineer aware of how to tweak the model for optimum business growth.

Machine Learning is a great analytical tool for optimising and scaling a business: it draws critical data-driven insights and enables better decision making. Effective use of data-analytical models enables businesses to leverage data for optimised and rapid growth.

Model performance is a measure of how well a Machine Learning model is doing its job. But is it all about the accuracy of the model’s predictions? The answer is no. Model performance is an assessment of the model’s ability to perform a task accurately not only on training data but also on live data at runtime, when the model is actually deployed through a website or an app. It is necessary to evaluate performance in order to spot problems such as drift, bias and increased data inconsistency. Detection is followed by mitigation: the model is debugged based on its behaviour so that the deployed model makes accurate predictions at the user’s end and is resilient to data fluctuations.

AI model metrics are measured and evaluated according to the type of model (linear regression, binary classification, etc.) and collected in a statistical table that lists all the metrics and becomes the basis for judging model performance. These are the main metrics used to assess a model’s performance, described here for binary classification:

1. Accuracy: It measures how often the model’s predictions are correct. It is the ratio of correct predictions to total predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN), where:

TP = True Positive: a case where the model correctly predicts the result as positive.

TN = True Negative: a case where the model correctly predicts the result as negative.

FP = False Positive: a case where the model incorrectly predicts the result as positive.

FN = False Negative: a case where the model incorrectly predicts the result as negative.

Confusion Matrix Representation. Image by author

2. Confusion matrix: It is a tabular representation of the predicted values against the actual values of the dataset. The matrix is meant to provide a better understanding and a clear visualisation of the model’s results, to avoid any ‘confusion’.

3. Precision and Recall:

§ Precision: It tells us how many of the model’s positive predictions are actually correct. It is the ratio of true positives to all predicted positives: TP / (TP + FP).

§ Recall: It is the share of actual positive points that the model predicted as positive: TP / (TP + FN). The actual positives include the false negatives, so recall communicates how many positive cases the model misses.

§ The Precision-Recall (PR) curve: This curve plots precision against recall across a range of cut-off values. The cut-off (threshold) is set according to the particular model: e.g. for a disease prediction model, a threshold value will be chosen to distinguish a prediction of Disease A from Disease B. The curve shows how sensitive and how precise the model is in predicting the positive points. A large area under the curve reflects high precision and high recall, which is favourable: the model’s positive predictions are mostly correct (few false positives) and it misses few actual positives (few false negatives).

4. F score: It is the harmonic mean of precision and recall, combining the two into a single number for a better understanding of the model’s accuracy. The general form is F_β = (1 + β²) · Precision · Recall / (β² · Precision + Recall). Common choices are F0.5, F1 and F2, depending on the weight given to recall relative to precision: β < 1 favours precision and β > 1 favours recall.

5. Logarithmic Loss: This measure works on the predicted probabilities rather than the predicted labels. It penalises the model when the probability assigned to the true class is low, and penalises confident wrong predictions most heavily. Lower log loss generally goes hand in hand with higher accuracy.

6. Specificity: It is the share of actual negative points that the model predicted as negative: TN / (TN + FP). It is similar to recall but for negative predictions: it communicates how many negative cases the model wrongly flags as positive.

7. ROC curve & AUC: The receiver operating characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different thresholds. It helps in visualising a binary classification model by showing how well the model can distinguish between the classes, e.g. whether a patient has Disease A or not. AUC is the area under this curve: the higher the area, the better the model’s ability to classify.
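To make the definitions above concrete, here is a minimal sketch that computes them by hand for a binary classifier. The function names and the toy labels and probabilities are my own, invented for illustration, and the 0.5 cut-off is an assumption. In practice a library such as scikit-learn would normally do this work.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Accuracy, precision, recall, specificity and F-beta from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_beta": f_beta,
    }


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed cut-off of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(classification_metrics(tp, tn, fp, fn))
print("log loss:", round(log_loss(y_true, y_prob), 4))
print("AUC:", roc_auc(y_true, y_prob))
```

Changing `beta` to 0.5 or 2 gives the F0.5 and F2 variants, and moving the cut-off trades precision against recall, which is exactly what the PR and ROC curves trace out.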

By improving these metrics, e.g. increasing AUC, minimising log loss, or improving recall and specificity by decreasing FN and FP, the model’s performance is improved. In academia an improvement in model performance looks very promising and even revolutionary, but that isn’t the case in the business world. Spoiler alert: the business world doesn’t care about high accuracy, low drift, a larger area under the ROC curve or more explainable models nearly as much as it cares about its Key Performance Indicators (KPIs). KPIs are quantified measurements of the factors affecting a company’s objectives. They embody a focus on the strategic and operational improvement of a business objective or target. There are several KPIs, grouped by area such as revenue growth, profitability, strategic planning, sales and marketing. They give a detailed look at how well a particular department, or the whole organisation, is performing against its set target, creating an analytical basis for optimised decision making.

In order to understand KPIs better, let's delve a little deeper.

There are mainly two types of KPIs:

1. Leading indicator: This indicator anticipates future performance in terms of existing metrics, e.g. growth in the sales pipeline or the rate of increase in subscriptions to a particular website. Keeping track of these indicators enables a company to chart a path before actually getting there and helps the company stay relevant in the future. It’s a case of ‘prevention is better than cure’: companies prepare themselves for the future by monitoring and acting on early signs of an effect that might follow.

2. Lagging indicator: This indicator enables the company to adjust existing strategies for business growth by analysing past activities and drafting plans based on them, e.g. customer satisfaction or the costs associated with a product. These indicators are grounded in more concrete evidence drawn from historical data. Since they analyse past performance, most of the strategies they prompt, unlike those based on leading indicators, are devised as damage control, only after the repercussions are witnessed.

Establishing KPIs

A business recognises and gauges its objectives in terms of a few specific metrics (the KPIs) in order to organise and streamline its workflow around the objectives that are critical for business performance. The metrics chosen may apply exclusively to a particular business, depending on its type, strategy and business model. Even within a single enterprise, different departments assign different KPIs based on their specific goals for sales, finance, branding, customer grievances, marketing, etc. Many metrics can be associated with a business goal, but only those that are integral and critical to the sought objective are designated as KPIs. In e-commerce, for example, the candidate metrics include cost per product, site traffic, wish-listed items, product views, items wish-listed but not bought, and pageviews per visit. There is a plethora of metrics that could be monitored, but only a select few have a substantial impact on sales and growth and are therefore worth monitoring. Wish-listed items, product views and items wish-listed but not bought are only metrics, not KPIs. All KPIs are metrics but not all metrics are KPIs. Monitoring only the significant and relevant ones saves time and resources.

Here are some examples of KPIs of different business enterprises:

Gaming app/website business:

Retention Rate: This metric is the percentage of players who remain active over a period of time. It is important for revenue generation and is therefore useful in assessing the app’s performance.

Cost Per Install (CPI): It is the cost of one installation of the app. It signifies the price of acquiring a new user.

e-Commerce business:

Pageviews per visit: The average number of pages a user views during a single site visit. A high value can indicate an unsatisfactory user experience, because the user had to dig through many pages to reach what they wanted.

Returning customer orders: It measures the orders placed by existing customers, which is essential to track for brand value and growth.

Digital Media/Publishing business:

Ad clicks per visit: It measures the average number of ad clicks made by a user during a single visit. Revenue generation is directly affected by this metric, and improving the website to yield more ad clicks leads straight to increased revenues.

Average viewing time: This metric is the average time a user spends on a particular post. This metric reports on customer engagement, habits and choices.
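The KPIs listed above are simple ratios, so a short sketch may help pin them down. The exact definitions vary between companies; the ones below (for instance, retention as the share of a starting cohort still active at the end of the period) are my assumptions, and every number is hypothetical.

```python
def retention_rate(active_at_end, active_at_start):
    """Percentage of the starting cohort still active at the end of the period (assumed definition)."""
    return 100.0 * active_at_end / active_at_start


def cost_per_install(acquisition_spend, installs):
    """Average spend needed to obtain one installation of the app."""
    return acquisition_spend / installs


def pageviews_per_visit(pageviews, visits):
    """Average number of pages viewed in a single site visit."""
    return pageviews / visits


def ad_clicks_per_visit(ad_clicks, visits):
    """Average number of ad clicks in a single site visit."""
    return ad_clicks / visits


def average_viewing_time(total_seconds_viewed, post_views):
    """Average time, in seconds, a user spends on a post."""
    return total_seconds_viewed / post_views


# Hypothetical monthly figures, invented for illustration.
print("retention rate (%):", retention_rate(active_at_end=1200, active_at_start=4000))
print("cost per install:", cost_per_install(acquisition_spend=5000, installs=2500))
print("pageviews per visit:", pageviews_per_visit(pageviews=18000, visits=4000))
print("ad clicks per visit:", ad_clicks_per_visit(ad_clicks=600, visits=4000))
print("average viewing time (s):", average_viewing_time(total_seconds_viewed=540000, post_views=6000))
```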

Correlating KPI and Machine Learning model performance

Data Science has become an integral part of many business models and is capable of scaling almost any product or service a business offers. According to the 2017 Big Data Analytics Market Study, there was a 37% increase in big data adoption among businesses in the short span of two years from 2015 to 2017, and adoption has been increasing every year.

Any machine learning model, once deployed in a business, becomes a product or service. This product or service interacts directly with the user, and a company’s business depends on its users’ experience. There are also ML models that work internally (not directly with the user as a product or service) to improve the user’s experience or create brand awareness in support of the company’s main objectives. Improving model performance doesn’t always support the business objective. It seems intuitive that improving the model on its performance metrics will yield a favourable outcome, but data scientists find that this is not always the case. The differentiator is the business KPIs. Oftentimes optimising the model has no effect on the KPIs, which means the improvements are of no use to the business stakeholders; in fact, they would be viewed as a waste of resources and time, which is equivalent to money in the harsh business world. KPIs stand like a wall in front of the model: to reach the business objective, a model improvement must move the KPIs. An example illustrates this issue:

A digital publishing business has a website which publishes articles and generates revenue through advertising and subscription fees for premium users. The data analytics team comes up with a six-month project to provide more accurate article suggestions and retain users’ attention, which requires improving recall and lowering the false positive rate and logarithmic loss. After six months of arduous work, the model reaches over 95% accuracy in classifying user data to produce precise suggestions. A month after the new model is deployed, the business analysts observe a decline in ad revenue. A team set up to find the problem and devise a solution observes a substantial drop in ad clicks per visit and pageviews per visit, which, after critical analysis, turns out to be the aftermath of the improved model performance.

Since the model now provides far more accurate suggestions, readers no longer have to scour multiple articles to find the one that interests them most. Earlier, even though users were receiving fairly accurate suggestions, they still read several articles in search of ‘the most suitable one’, which kept them engaged and satisfied and, meanwhile, gave them more occasions to click on the ads that popped up. Now they get what they want from the get-go and don’t look any further, which prompts fewer ad clicks and less time spent per visit. This change in the suggestion model hurt business revenues. The time, resources and money lost inflict huge penalties on a business, with long-term effects that are sometimes hard to recover from.
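The mechanism in this example can be put into rough numbers. The sketch below assumes a deliberately simple multiplicative model in which ad clicks per visit is pageviews per visit times a fixed click rate per pageview. Both the model and every figure in it are hypothetical, chosen only to show the direction of the effect.

```python
def ad_revenue(visits, pageviews_per_visit, clicks_per_pageview, revenue_per_click):
    """Ad revenue under an assumed multiplicative model (a simplification, not the article's data)."""
    ad_clicks_per_visit = pageviews_per_visit * clicks_per_pageview
    return visits * ad_clicks_per_visit * revenue_per_click


# Hypothetical figures: same traffic and click rate, but readers find their
# article sooner with the improved suggestions, so they view fewer pages.
before = ad_revenue(visits=100_000, pageviews_per_visit=5.0,
                    clicks_per_pageview=0.03, revenue_per_click=0.40)
after = ad_revenue(visits=100_000, pageviews_per_visit=2.0,
                   clicks_per_pageview=0.03, revenue_per_click=0.40)

change = (after - before) / before
print(f"before: {before:.0f}, after: {after:.0f}, change: {change:.0%}")
```

Under these assumptions the model metric improved while the KPI it feeds fell by more than half, which is the gap the rest of this article is about.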

Model of Integrating Business Enterprise. Image by Author

A business enterprise is analogous to a body. A body channels electrochemical signals and energy to its different faculties; an enterprise channels information, core objectives, resources and data to its departments, and in both cases proper functioning of the parts supports the well-being of the whole. In order to prevent data scientists and engineers from focusing on objectives that do little for business performance, monitoring model metrics and correlating them with business KPIs is a must. Coordinating the other departments with the data analytics department aligns the technical, data-analytical goals with the business objectives, ensuring progress, harmony and no cross-purpose work. This coordination is achieved by meticulous monitoring of model metrics with respect to the business KPIs, which enables prioritised objective setting, strategy building, re-evaluation and re-determination of metrics over time, cascading of KPIs, etc.

These are the steps for aligning KPIs with AI model metrics and correcting errors:

Steps of Aligning KPIs with AI Model Metrics. Image by Author

1. Monitor: Information is collected on the identified business KPIs and AI model metrics individually.

2. Identify: These data values are then identified and classified under labels such as sales, marketing, HR, support, customer grievances, social media branding, etc.

3. Evaluate: After classification, the KPIs and model metrics are evaluated so that they can be linked with each other. Linking the two requires rigorous evaluation of the effects of model metrics on KPIs based on historical data.

4. Correlate: After evaluation, the model’s metrics are correlated with the KPIs. This correlation is visualised using a scatterplot or line graph. Changes in accuracy and data inconsistency can prompt serious fluctuations in KPIs, which are spotted quickly once the correlation is visualised.

5. Debug: This step involves tracing a KPI error (e.g. a drop in sales or revenue per customer, or low site traffic) back to the AI model metrics that could have caused the deviation, e.g. data drift, falling accuracy, bias or a high rate of false predictions. After tracing the cause, the model is fixed by re-scoring or re-evaluating it.
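The Correlate step above can be sketched in a few lines. This is a minimal illustration, assuming the Monitor step has already produced one value per week for each model metric and each KPI. The series, the names and the 0.8 flagging threshold are all hypothetical, and a strong correlation is only a pointer for the Debug step, not proof of cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (spread_x * spread_y)


def correlate(model_metrics, kpis):
    """Correlate every model metric with every KPI."""
    return {
        (metric, kpi): pearson(metric_values, kpi_values)
        for metric, metric_values in model_metrics.items()
        for kpi, kpi_values in kpis.items()
    }


# Hypothetical weekly observations collected in the Monitor step.
weekly_model_metrics = {"recall": [0.70, 0.74, 0.79, 0.85, 0.90, 0.95]}
weekly_kpis = {
    "ad_clicks_per_visit": [0.30, 0.29, 0.27, 0.24, 0.21, 0.18],
    "pageviews_per_visit": [4.8, 4.6, 4.1, 3.5, 2.9, 2.4],
}

correlations = correlate(weekly_model_metrics, weekly_kpis)

# Assumed threshold: flag strongly correlated pairs for the Debug step.
flagged = {pair: r for pair, r in correlations.items() if abs(r) >= 0.8}
for (metric, kpi), r in sorted(flagged.items()):
    print(f"{metric} vs {kpi}: r = {r:.2f}")
```

Here both KPIs fall as recall rises, so both pairs are flagged with a strongly negative coefficient, mirroring the publishing example.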

A business enterprise keeps its scorecard balanced by attending to certain factors when monitoring KPIs and AI model metrics. Several nuances make monitoring KPIs alongside model metrics effective.

Nuances of monitoring KPIs and model metrics:

1. Establishing a link for tracking: Critical business objectives are expressed through KPIs. Monitoring the KPIs and correlating them with model metrics establishes a link which helps engineers and business managers reach their milestones. For example, a gaming app team makes its gaming software faster and more responsive. After some time, the retention rate declines sharply, and the cause is discovered only months later, after the data analytics team steps in. As the software became faster, Cost per Install also increased, but the business team is unaware of these facts because of a lack of coordination, proper monitoring and correlation. Establishing a link between AI model metrics and the KPIs helps bridge the gap between the work of data analytics and its effects on the business, integrating the whole enterprise to work more efficiently towards a set objective.

2. Priority check: Suppose an e-commerce website has low returning customer orders and high pageviews per visit (PPV). The team could either improve the suggestions of the deployed machine learning model to decrease pageviews per visit, or devise strategies to engage existing customers whilst attracting new ones by improving social media interaction, providing discounts and specialised offers, reducing customer grievances, etc. The course of action is decided by the priority ascribed to each KPI with respect to the model metrics. A priority chart for business growth can be developed only after careful examination of the knock-on effects of the different metrics and KPIs. Here, increasing returning customer orders is the more reasonable top priority, as it has more leverage over business growth than PPV. Data inconsistency can also drive faulty results in the AI system, which affects the KPIs too. Similarly, bias in the model compromises user confidence in the business and its brand value. These fluctuations are to be dealt with on a priority basis, so that the highest-priority metric receives the most resources, money and time. A priority check lets the business manager decide how many resources to spend when a problem arrives.

3. Correlating and cascading for a more focused strategy: Changes in different model metrics prompt changes in KPIs, and several KPIs can point to the same underlying effect. E.g. if revenue per visitor and returning customer orders are both low, that may signify a single issue. Once these KPIs are correlated and cascaded within the system, the monitoring burden is reduced, as only the specific KPIs that bring about distinct effects need to be monitored. It then becomes easier to spot changes in the model performance metrics in order to track causes or fix the fluctuations. Monitoring all metrics means the system isn’t really monitoring anything. Measuring key events and KPIs whilst making changes to the AI models helps establish the cause-effect relationships needed for strategic evaluation of business performance.

4. Creating balance: In the example of the digital publishing business, making the suggestions highly accurate produced counter-intuitive effects and decreased revenues, partly because the system was already providing fairly good suggestions and further improvement was unwarranted. It is therefore essential to balance the urge to improve model metrics against the actual need for it. An improvement in AI model performance must be backed by its correlation with one of the KPIs, which helps keep the larger scheme of business growth in view. It is important to address the underlying issues for any permanent improvement, otherwise quick fixes exacerbate the problem: e.g. to increase revenues, minimising costs might seem like a good solution, but doing so at the expense of quality will hurt the business in the long run. The aim is a proper, calculated balance within the enterprise, where an improvement doesn’t tailspin into a business failure. Monitoring KPIs alongside model metrics helps create this balance by showing which performance improvements to focus on and which to let go.

5. Fast error detection and debugging: Spotting errors and filtering and sorting their potential causes is a tedious but necessary part of debugging the system. Often inconsistency in the data or drift in predictions drives changes in the KPIs. A correlated system, with proper attention to the model’s explainability, points directly to the possible causes, which helps nip the problem in the bud and stops it from spreading to other departments. Re-scoring the model is the final step in eliminating the problem.

6. Alerts enabled: The correlated information records past surges and errors in the model’s predictions along with their effects on the business in terms of KPIs. A threshold can therefore be set for data fluctuations, bias, missing values, mislabelled values or outliers, enabling a warning before any damage is done. Every objective attained acts as a new starting point, built upon the existing data and errors, for future evaluation.
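The threshold-based alerting described in the last point could look something like the sketch below. The `AlertRule` structure, the baselines and the tolerances are hypothetical; in a real system the baselines would come from the recorded history and the tolerances from the observed effect of past fluctuations on the KPIs.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str                   # metric or KPI being watched
    baseline: float             # expected value, assumed non-zero
    max_relative_change: float  # e.g. 0.10 alerts on a move of more than 10%


def check_alerts(rules, current_values):
    """Return a warning message for every rule whose current value breaches its tolerance."""
    alerts = []
    for rule in rules:
        value = current_values.get(rule.name)
        if value is None:
            alerts.append(f"{rule.name}: no value reported")
            continue
        change = abs(value - rule.baseline) / abs(rule.baseline)
        if change > rule.max_relative_change:
            alerts.append(
                f"{rule.name}: {value} deviates {change:.0%} from baseline {rule.baseline}"
            )
    return alerts


# Hypothetical rules and readings.
rules = [
    AlertRule("ad_clicks_per_visit", baseline=0.30, max_relative_change=0.10),
    AlertRule("accuracy", baseline=0.90, max_relative_change=0.05),
]
current = {"ad_clicks_per_visit": 0.24, "accuracy": 0.91}

for message in check_alerts(rules, current):
    print(message)
```

Only the KPI breaches its tolerance here: accuracy looks healthy while ad clicks per visit has dropped by a fifth, which is the kind of mismatch an alert should surface early.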

Importance of Correlating KPI with Model Performance Metrics

Visualising a correlation graph of model metrics and KPIs can reveal useful insights and behaviours that would otherwise go unnoticed amidst the huge stockpile of data. The correlations between the KPIs and the AI model can reveal drift in runtime accuracy with respect to a particular KPI, and these observations can then be analysed to highlight the factors that affect business success the most and the least. This information can be taken further by running tests and improving the identified factors for better business performance. Correlation also makes it possible to spot dependencies between KPIs and model performance factors, as well as missing data, outliers and clusters, after which debugging errors within the model becomes easier for data engineers. To understand this better, let’s look at the graph for the digital publishing site mentioned earlier.

Correlation graph of KPI with Model Metrics. Image by Author

Once different KPIs are cascaded or correlated with model metrics based on priority, cause-effect and link, tools can be developed for careful monitoring of these metrics. In this graph, the blue line shows ad clicks rising with viewing time, which is favourable for the KPI as it promotes business revenue. The orange line, however, shows fewer ad clicks as average viewing time increases, which amounts to lost revenue and is therefore unfavourable. A graphical view of metrics and KPIs helps in visualising errors and tracing them back to their roots. Similar graphs of various model metrics and KPIs can be drawn to look for possible reasons for the drop in ad clicks. Correlation helps indicate the severity as well as the potential causes of anomalies. This graph shows that increasing model performance by improving suggestions can have detrimental effects on KPIs, and that these effects can be monitored and thus corrected. We can identify the bad transactions that give skewed results due to data inconsistency and re-evaluate the model accordingly to eliminate erratic results.

AI models don’t draw data from a reservoir with a constant influx: the data can change subtly or erratically. Even a perfectly built model will need re-scoring or remodelling after several years in production because of rapidly changing user demands or catastrophes like a pandemic. Data inconsistency affects the business, in terms of KPIs, through the poor performance of AI models. Monitoring becomes even more necessary at a time when a pandemic has already changed customers’ shopping preferences, reading and entertainment choices, induced market volatility, increased lending risks, etc. The best model is the one that adapts to these changes, which is only possible with careful monitoring, correlation and evaluation. Monitoring helps teams react early, before any major damage has been done. Tools such as the Splunk IT Service Intelligence (ITSI) solution and IBM Watson OpenScale help examine the properties of the model alongside the relevant business KPIs to interpret errors.
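One simple way to notice that the incoming data has shifted is the Population Stability Index (PSI), which compares how a feature or a model score is distributed now with how it was distributed at training time. PSI is my choice of illustration here, not something the tools above are known to use; the bin count, the synthetic scores and the 0.2 alert level (a commonly quoted rule of thumb) are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample and a current sample, using equal-width bins on the baseline range."""
    low, high = min(expected), max(expected)
    width = (high - low) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            index = int((v - low) / width)
            index = min(max(index, 0), bins - 1)  # clamp values outside the baseline range
            counts[index] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic scores: the current data is the baseline shifted upwards.
baseline_scores = [i / 100 for i in range(100)]
current_scores = [min(s + 0.3, 1.0) for s in baseline_scores]

psi = population_stability_index(baseline_scores, current_scores)
ALERT_LEVEL = 0.2  # assumed rule-of-thumb threshold
print(f"PSI = {psi:.2f}, drift suspected: {psi > ALERT_LEVEL}")
```

A PSI near zero means the two samples are distributed alike. A value well above the alert level, as with these shifted scores, is a prompt to check whether the KPIs have moved too and whether the model needs re-scoring.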

AI for AI

The idea of AI for AI is to employ an ML model to identify, evaluate, monitor, correlate and debug the model performance factors and the business KPIs. Such a model could predict how the business metrics are likely to change when the AI model metrics are changed, without having to test the change in production, saving resources, time and money. This is a project for the future which could eliminate various lapses and errors caused by the unpredictability of the business outcome when AI model metrics are tweaked.

A specialised AI system could be set up to link KPIs accurately with possible model errors (drift, bias, inaccuracy, missing data, outliers or data inconsistency) based on careful analysis of historical data trends. It would provide summary statistics for every AI model, identifying current errors, predicting future ones and tracing them back to the source of the problem, which is otherwise a tedious task. Error correction requires re-scoring the model, which costs money, time and resources; much of that cost could be avoided by deploying an AI model to predict the effects of AI model metrics on KPIs. However, AI for AI requires access to a colossal amount of data, a skilled workforce and computing power to identify and correlate trends properly. It will be revolutionary when it becomes a reality.
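As a toy stand-in for the idea, the sketch below fits a straight line to a hypothetical history of one model metric against one KPI and uses it to forecast the KPI for a proposed metric value before anything is deployed. A real AI-for-AI system would need far more data, many more features and a much richer model; the single-variable regression and every number here are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the model metric never varied, so no trend can be fitted")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_kpi(slope, intercept, metric_value):
    """Forecast the KPI for a given model metric value using the fitted line."""
    return intercept + slope * metric_value


# Hypothetical history: recall of past model versions and the
# ad clicks per visit observed while each version was live.
history_recall = [0.70, 0.74, 0.79, 0.85, 0.90]
history_ad_clicks = [0.30, 0.29, 0.27, 0.24, 0.21]

slope, intercept = fit_line(history_recall, history_ad_clicks)

proposed_recall = 0.95  # the improvement the team is considering
forecast = predict_kpi(slope, intercept, proposed_recall)
print(f"forecast ad clicks per visit at recall {proposed_recall}: {forecast:.3f}")
```

With this history the forecast falls below every value seen so far, a warning to weigh before committing to the improvement. Note that the forecast is an extrapolation from a correlation and says nothing certain about cause.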

Correlation between model metrics at runtime and business KPIs enables:

A check on the progress of the business objective with respect to the analytical models.
Easier problem resolution, by comparing performance growth with analytical improvements.
Identification of runtime errors and their impact on business performance.
Tracking of efficiency and behaviours.
Proper resource utilisation, saving time by focusing on essential qualitative features which optimize business success.
Alerts and recommendations based on the impact of errors (inaccuracy, inconsistency, drift, etc.) on the business KPIs.

Conclusion

KPIs are not disjoint from real-world business metrics. The real-world metrics are analysed on the basis of their role in improving business performance, which is assessed and quantified in terms of certain indicators like revenue growth, working capital, HR support, marketing goals etc.

AI models employed in a business enterprise are weighed against the KPIs. AI models serve the enterprise productively only when they augment business growth and success, which is measured using the KPIs associated with that particular business. The enterprise works much more efficiently when data analytics and the other departments are seamlessly integrated towards a particular goal. Monitoring and correlating help bridge the gap between performance analysis and business growth. Thus, it is essential to view every improvement in the machine learning pipeline through the lens of KPIs: this helps quantify which factors affect business growth and makes the data scientist or engineer aware of how to tweak the model for optimum business growth.

AI for AI is a model of the future, but once adopted it would change how business enterprises function and operate. There are also applications where optimising business success is not the primary goal of data analytics, such as healthcare, drug delivery, or modelling physical or biological processes to analyse them or make predictions. So there are vast fields where we apply AI and other data science techniques without having to worry about KPIs.