With the proliferation of machine learning environments suited to applications outside the purview of data scientists and ML practitioners, modern digital marketing analysts now find themselves asked to speak to potential implementations of newfound technology. They are rarely asked to speak with full authority on the matter, but client-facing situations often dictate discussion of what ground could hypothetically be broken in predictive analytics. To provide value to clients, digital marketing analysts use their own expertise, vendor recommendations, and internal teams’ experience to provide input into the machine learning implementations behind many organizations’ promotional endeavours.
Today’s digital marketing analysts not only need to be familiar with fundamental machine learning concepts, but they must also be aware of the responsibilities of implementing models in pursuit of clients’ marketing initiatives. These responsibilities include a thorough consideration of model risk and its management, defined by the Office of the Superintendent of Financial Institutions as:
Model risk – The risk of adverse financial (e.g., capital, losses, revenue) and reputational consequences arising from the design, development, implementation and/or use of a model. It can originate from, among other things, inappropriate specification; incorrect parameter estimates; flawed hypotheses and/or assumptions; mathematical computation errors; inaccurate, inappropriate or incomplete data; inappropriate, improper or unintended usage; and inadequate monitoring and/or controls.¹
Model risk inherently carries the assumption that the risk must be mitigated, and for the purposes of this text, we focus on validation as the first form of mitigation in digital marketing. For this, we advise following the definition of model validation as prescribed by the Office of the Comptroller of the Currency (U.S. Department of the Treasury):
Model validation: "The set of processes and activities intended to verify that models are performing as expected, in line with their design objectives and business uses." It also identifies "potential limitations and assumptions, and assesses their possible impact."²
This document aims to act as a guide for digital marketing analysts in navigating the complex discussion of vendor model validation with clients and other stakeholders. As many industry practitioners can attest, vendors are typically the holders of the data that clients build their marketing strategy around. Even where internal data science teams are competent enough to build custom client solutions, vendors hold significant sway in model-building discussions because they own the data that the models are constructed upon.
By the end of this text, the reader will have a basic understanding of model validation techniques that can manage clients’ expectations of digital marketing initiatives primarily dependent on proprietary vendor models. We first analyze the limitations digital marketing analysts face when looking to validate a vendor model, followed by a list of viable solutions for readers without an extensive background in ML and data science. The text concludes by exploring a modular framework for validation, with additional discussion of future trends for model validation moving into 2021 and beyond.
Key Considerations
When beginning work on validating an external model, there are a handful of complexities that arise from not having direct access to the step-by-step procedures employed by the third party that generated the model. In particular, digital marketing analysts should be wary of limitations in their approach, and ask the following questions:
- Do I have the ability to contact the model creator?
- a) Is the developer still with the organization that generated the model?
- b) If the answer to a) is no, who is responsible for the maintenance of the current model?
- Do I have the ability to review the code that constitutes the model?
- a) Is the code available for public review?
- b) Is the code open-source?
- c) Is the code something my organization can purchase from the third party?
- Are there any gaps in any of the model records?
- a) Is the documentation provided with the model sufficient and transparent?
- Is there a comprehensive record of data sets used to generate the model?
- a) What inputs and outputs does the model take into consideration?
- b) What are the transformations the model undertakes to come to a predictive conclusion?
From a digital marketing analyst’s perspective, the average practitioner may not have the skills a data scientist would have to navigate the discussion surrounding model building and maintenance. That being said, each of the above questions can be probed in ways that ultimately help inform next steps. Potential approaches to each question are outlined below:
- Do I have the ability to contact the model creator? – The analyst ought to trace the model to a particular individual, if possible. With models created by third parties, it is essential to understand the process and the minute details that go into model construction and maintenance. In the digital marketing industry, models are often delivered in near real time through an in-browser platform. Platforms such as Google Search Ads 360, Google Ads, and Bing Ads provide predictive results in a matter of seconds, with no transparency into the code or process behind them. There are organizational support teams that analysts can leverage to get in touch with the engineering groups building these models in order to obtain further transparency into the predicted results.
- Do I have the ability to review the code that constitutes the model? – In this scenario, the analyst may be required to interpret the vendor model at a basic level or conduct additional research to expand their own knowledge. This is the one step that may require the aid of the data science team within the analyst’s organization to make sense of the model’s structure and construction.
- Are there any gaps in any of the model records? – In the case of custom-built solutions from third parties, documentation for the model ought to be fully provided by the vendor to ensure transparency. This is often very difficult to obtain, as vendors seek to protect their ML solutions from competitors and other practitioners. A recent study conducted by McKinsey & Company concluded that nearly 76% of all vendor models delivered to clients had incomplete or poor-quality model documentation.³ In the absence of complete documentation provided directly, the analyst can reference online documentation that is often found on support forums and discussion boards on the vendor’s website. A good example of documentation and discussion in this format can be found in the Google Support section pertaining to forecasting.⁴ If no documentation at all is provided, it will be necessary to contact the model developer, as mentioned previously, to gain insight into its creation and maintenance.
- Is there a comprehensive record of data sets used to generate the model? – The analyst in this scenario will have to focus on determining the inputs and outputs of the model accurately. Given that most third party solutions require strategy guidance from internal teams to accomplish a particular task, the analyst ought to be familiar here with at least the inputs required for the model. If this is unknown, the respective project managers overseeing the vendor’s involvement with model creation should be familiar with the inputs and expected outputs of the model. These inputs will then act as primary parameters for linking model transformations to expected outputs. In addition to inputs and outputs, the vendor or project managers overseeing the model will be able to clarify precisely what data sets are used for training the model, as well as any external data ingestion techniques employed.
Without a comprehensive understanding of the questions discussed in this section, it is difficult for a digital marketing analyst to understand which tools, people, and data sets are part of model creation and validation. A large part of this upfront work is administrative in nature, but it is essential to providing answers and transparency to clients when entrusting vendors with model recommendations. These initial steps also serve as an internal audit for the analyst and the internal teams putting the client in touch with the vendor. If the vendor cannot answer many of these questions in a thorough manner, it is likely their model is not representative of the reality the client is seeking to gain insight into.
Available Solutions
In the process of validating a model created to accomplish a marketing goal, the analyst coordinating the project has a handful of options for approaching the task at hand. The toolset available to digital marketing analysts comprises both "soft" and "hard" solutions. In the following section, we discuss the implications and deployment of each solution for validating vendor models.

‘Soft’ Forms of Validation
We subdivide validation techniques into "soft" and "hard" categories because many industry practitioners are unable to genuinely acquire the model details required for conducting a proper audit. In particular, vendor solutions are significantly more tightly controlled than in-house solutions, making the ML backend very difficult to obtain in a transparent manner. Soft forms of validation inherently face limitations, as they rely on periodic model monitoring and review of conceptual soundness, rigorous assessment of documentation on model customization, developmental evidence, and the general application of the vendor solution to the clients’ marketing initiatives.⁵
- Conceptual Soundness: The analyst here is tasked with determining the fit of the model to the data set and to measured reality. From a conceptual standpoint, this begins with asking whether the model is theoretically suited to answer the client’s initiatives. The analyst will also call into question the design of the model and its intended long- and short-term viability. More specifically, model fit techniques (R-squared, mean squared error, etc.) are employed here for standard testing. When more direct techniques are not employed and computational intensity is required, analysts are advised to apply sensitivity analysis⁶ to the model to validate the ML backend. The digital marketing analyst is also advised to contrast results with industry benchmarks and to calculate the drift of the model against historical reality for a similar time frame. Model outputs should be compared in detail to industry and peer results, taking into account multiple states of reality that test the model’s stability over longer time frames. A minimal sketch of these checks appears after the hypotheses below.
- Assessment of Model Documentation: This was briefly explored in the previous section, yet the premise remains the same for the analyst conducting the validation exercise. In an ideal scenario, the documentation ought to give clients the ability to replicate the design of the vendor model in its entirety. This is likely the least reliable method of soft validation available, as the likelihood of a correct assessment (and, in turn, replication) is exceptionally low in an ML/AI application where the number of dimensions and variables involved is subject to developer settings. Not only are there limitations on the initial setup that the documentation cannot aid with, but outputs generated from starting values may differ from one computational cycle to the next.
- Developmental Evidence and Application: The last approach of soft validation an analyst can apply to their audit is a logical check of the expected outcomes prescribed by the vendor against the actual results originating from the implementation of their model. Has the model moved the needle in the direction that the vendor predicted it would? This approach is of course easier in theory than in practice, yet the premise marries into the audit of conceptual soundness at a simpler level: we are looking to reject Hₒ.
Hₒ: The vendor model does not provide incremental marketing benefit upon deployment.
Hₐ: The vendor model does provide incremental marketing benefit upon deployment.
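To make these soft checks concrete, the following is a minimal sketch of how an analyst might compute standard fit metrics, estimate drift against a benchmark, and test Hₒ. All of the figures, the industry benchmark, and the 5% significance level are hypothetical assumptions for illustration, not vendor specifications.

```python
# A minimal sketch of the soft-validation checks above, using hypothetical
# campaign data. The figures, the industry benchmark, and the 5% significance
# level are illustrative assumptions, not vendor specifications.
import numpy as np
from scipy import stats

# Hypothetical daily conversions: vendor model predictions vs. observed results
predicted = np.array([120, 135, 128, 140, 150, 145, 160], dtype=float)
observed = np.array([118, 130, 133, 137, 155, 139, 151], dtype=float)

# Conceptual soundness: standard model fit metrics
mse = np.mean((observed - predicted) ** 2)
ss_res = np.sum((observed - predicted) ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Drift against an assumed industry benchmark for the same period
benchmark = np.array([115, 125, 130, 135, 148, 142, 150], dtype=float)
drift = np.mean(np.abs(observed - benchmark) / benchmark)

# Developmental evidence: test Ho (no incremental benefit) by comparing
# observed conversions before and after the model was deployed
pre_deployment = np.array([101, 98, 110, 105, 99, 108, 103], dtype=float)
t_stat, p_value = stats.ttest_ind(observed, pre_deployment)

print(f"MSE: {mse:.2f}, R-squared: {r_squared:.3f}, benchmark drift: {drift:.1%}")
print(f"Incremental-benefit t-test p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject Ho: the model appears to provide incremental benefit.")
```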
‘Hard’ Forms of Validation
The forms of validation covered here are practically oriented techniques for the digital marketing analyst to use on a daily basis when some (or all) aspects from the Key Considerations phase are known. From a model replication perspective, these techniques aim to bring the analyst closer to the vendor-produced model.
It ought to be noted that model replication is not always about generating a 1:1 identical model to the original; there may be discoveries along the way that lead to a model reaching the same, or a better, conclusion than the original vendor-produced model under audit. Models that are replicated and come to the same conclusion as the original model using different methods, inputs, and assumptions are known as challenger models. If the challenger model supersedes the performance of the existing model under audit (known as the champion model), then the champion model is called into question by the challenger model and may potentially be replaced.⁷
Assumption Validation: This process is similar to the idea of Conceptual Soundness discussed previously, with the addition of quantifying the assumptions made in the model itself. The analyst in this situation would verify the definitions of the assumptions outlined throughout the model. Once identified, each assumption about its interpretation of reality would be quantified either numerically (discrete or continuous) or categorically (high, medium, low, etc.).
Take, for example, a vendor’s recommendation to use a linear regression of the following form on an advertising data set for search engine keywords with three variables: each keyword’s clicks, average cost per click, and clickthrough rate:
f(x)=β₀+β₁x
Upon delivery of the data set, the vendor comes back and provides a model that correctly predicts user clicks to the site based only on clickthrough rate, at a 95% confidence level, of the form:
f(x)=7.21+0.0901x
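Below is a minimal sketch of how an analyst might sanity-check this delivered model by refitting the same functional form. The keyword data is simulated so the example is self-contained, and the column meanings (clickthrough rate in percent, clicks) are hypothetical assumptions; the point is simply to see whether a refit recovers coefficients close to the vendor’s 7.21 and 0.0901.

```python
# A minimal sketch, assuming a hypothetical keyword export with clickthrough
# rate and clicks. The data is simulated here; in practice the analyst would
# load the vendor-supplied data set instead.
import numpy as np

rng = np.random.default_rng(42)
ctr = rng.uniform(0.5, 10.0, size=200)                        # clickthrough rate (%)
clicks = 7.21 + 0.0901 * ctr + rng.normal(0, 0.5, size=200)   # simulated clicks

# Ordinary least squares fit of f(x) = b0 + b1 * x
X = np.column_stack([np.ones_like(ctr), ctr])
beta, *_ = np.linalg.lstsq(X, clicks, rcond=None)

print(f"Refit model:  f(x) = {beta[0]:.2f} + {beta[1]:.4f}x")
print("Vendor model: f(x) = 7.21 + 0.0901x")
```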
Least Squares Regression (gradient descent): Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.⁸ The quantity being minimized here is the squared error between the regression line’s predictions and the outputs observed in the search marketing data set, with the intercept (β₀) and slope (β₁) updated at each iteration.
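As an illustration, the following sketch implements this idea for the simple regression above, using an assumed learning rate and iteration count; it is not the vendor’s actual optimization routine.

```python
# Gradient descent for the least-squares fit: update the intercept (b0) and
# slope (b1) in the direction of the negative gradient of the mean squared
# error. The learning rate and iteration count are illustrative choices.
import numpy as np

def gradient_descent(x, y, learning_rate=0.01, iterations=5000):
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        residuals = (b0 + b1 * x) - y
        grad_b0 = (2 / n) * residuals.sum()        # d(MSE)/d(b0)
        grad_b1 = (2 / n) * (residuals * x).sum()  # d(MSE)/d(b1)
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

# Reusing the simulated keyword data from the previous sketch
rng = np.random.default_rng(42)
ctr = rng.uniform(0.5, 10.0, size=200)
clicks = 7.21 + 0.0901 * ctr + rng.normal(0, 0.5, size=200)

b0, b1 = gradient_descent(ctr, clicks)
print(f"Gradient descent fit: f(x) = {b0:.2f} + {b1:.4f}x")
```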
Regularization: Regularization methods work by penalizing the coefficients of features that take on extremely large values, thereby trying to reduce the error. Not only does this improve the error rate, it also reduces model complexity.⁹ In our example, we use the output values we already know from our initial data set to fit the model with penalty terms applied to the coefficients it tests iteratively.
Suppose in this instance that our validation exercise is attempting to model the outputs we have in our data sets using all variables given.
f(xᵢ)=7.25+0.0991x₁+9.21x₂
If our generated output is well off the mark in comparison to our known output, the regularization exercise penalizes the offending coefficients and reduces the model to the following form:
f(xᵢ)=7.25+0.0991x₁+9.21x₂   (the 9.21x₂ term is penalized toward zero)
f(xᵢ)=7.25+0.0991x₁
The process is repeated until the error in the model is reduced and extremely large values for the coefficients are eliminated altogether.
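The sketch below illustrates the same idea with L1 (lasso) regularization, one common way to penalize coefficients: the coefficient of an uninformative second variable is driven to zero while the informative one survives. The simulated data and the alpha penalty strength are illustrative assumptions, not the exact coefficients from the example above.

```python
# A sketch of L1 (lasso) regularization: a penalty on coefficient size drives
# the coefficient of an uninformative variable to zero. The data and the alpha
# penalty strength are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
ctr = rng.uniform(0.5, 10.0, size=500)       # x1: clickthrough rate (%)
noise_var = rng.normal(0.0, 1.0, size=500)   # x2: a variable with no real signal
clicks = 7.25 + 0.0991 * ctr + rng.normal(0, 0.3, size=500)

X = np.column_stack([ctr, noise_var])

unpenalized = LinearRegression().fit(X, clicks)
penalized = Lasso(alpha=0.1).fit(X, clicks)

print("Unpenalized coefficients:", unpenalized.coef_)
print("Lasso coefficients:      ", penalized.coef_)  # x2 is shrunk to zero
```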
Parallel Model Testing: In order to validate a model via our challenger vs. champion framework, a digital marketing analyst may find it useful to utilize the vendor documentation to attempt a full-scale replication of the model provided. This replication would have identical model structure, identical model inputs, and identical model assumptions. With this method, there are ultimately two scenarios under study:
- Hₒ: With complete replication, results of M_vendor ≠ M_replicated
- Hₐ: With complete replication, results of M_vendor = M_replicated
After multiple cycles of computation, the results of each hypothesis depict two separate conclusions:
- Hₒ: With complete replication, results of M_vendor ≠ M_replicated. The models are unable to come to the same conclusion when computing the same data set; because the implementation is held constant, this indicates computational variance, i.e., a noticeable difference in the processing of the code underlying each model.
- Hₐ: With complete replication, results of M_vendor = M_replicated. The models come to the same conclusion when computing the same data set; with the implementation held constant, this indicates no computational variance. More importantly, the replicated model reliably reproduces the same bias and errors found in the vendor model for a high percentage of results.
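A minimal sketch of this comparison is shown below: both models score the same data set, and the analyst measures how closely the outputs agree. The two model functions, the simulated inputs, and the tolerance are all placeholders for illustration.

```python
# A sketch of parallel model testing: score the same data set with the vendor
# model and the replicated model, then measure how closely the outputs agree.
# Both model functions, the data, and the tolerance are placeholders.
import numpy as np

def vendor_model(ctr):
    return 7.21 + 0.0901 * ctr    # as delivered by the vendor

def replicated_model(ctr):
    return 7.19 + 0.0905 * ctr    # rebuilt from the vendor documentation

rng = np.random.default_rng(7)
ctr = rng.uniform(0.5, 10.0, size=1000)

vendor_out = vendor_model(ctr)
replica_out = replicated_model(ctr)

tolerance = 0.05  # acceptable absolute difference per prediction (assumed)
agreement = np.mean(np.abs(vendor_out - replica_out) <= tolerance)
max_gap = np.max(np.abs(vendor_out - replica_out))

print(f"Share of predictions within tolerance: {agreement:.1%}")
print(f"Largest observed gap: {max_gap:.4f}")
# High agreement supports Ha (no computational variance); persistent gaps
# support Ho and point to differences in the underlying implementations.
```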
Challenger Model Testing: Similar to the aforementioned Parallel Model Testing, an analyst should also consider comparing a challenger model against the vendor model. Rather than running a duplicate of what was provided through documentation, the analyst uses the documentation as a guideline for building a similar model, with the differences originating in model configuration and assumptions. Instead of importing the exact same parameters as the vendor model, the structure, model inputs, and model assumptions of the challenger model are varied. This in turn allows the analyst to compare the similar model’s results to the vendor-provided one, going beyond the scope of computational verification.
The analyst may find that variables of prediction in the challenger model are better processed, potentially generating a more accurate, consistent, or stable model. In addition, the digital marketing analyst ought to coordinate with internal data science teams to generate multiple iterations of the challenger model, each with slight alterations to determine the best performing version. These can be compared individually against the champion model provided by the vendor to benchmark and provide empirical evidence of a challenger model’s superiority. Before full deployment against the vendor-specific model, the challenger model itself needs to be fully validated against test data sets.
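The sketch below shows one way such a benchmark might look: the champion (vendor) model and a challenger are scored against held-out results and compared on mean squared error. The models, the simulated holdout data, and the choice of metric are illustrative assumptions.

```python
# A sketch of challenger vs. champion benchmarking on held-out data. The two
# models, the simulated data, and the use of MSE as the comparison metric are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
ctr = rng.uniform(0.5, 10.0, size=300)   # clickthrough rate (%)
cpc = rng.uniform(0.2, 3.0, size=300)    # average cost per click
clicks = 7.0 + 0.09 * ctr + 0.8 * cpc + rng.normal(0, 0.4, size=300)  # holdout truth

def champion(ctr, cpc):
    return 7.21 + 0.0901 * ctr               # vendor model: CTR only

def challenger(ctr, cpc):
    return 7.05 + 0.088 * ctr + 0.75 * cpc   # varied structure and inputs

def mse(pred, actual):
    return np.mean((pred - actual) ** 2)

champion_mse = mse(champion(ctr, cpc), clicks)
challenger_mse = mse(challenger(ctr, cpc), clicks)

print(f"Champion MSE:   {champion_mse:.4f}")
print(f"Challenger MSE: {challenger_mse:.4f}")
if challenger_mse < champion_mse:
    print("Challenger outperforms the champion on this holdout set.")
```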

It may also occur to the analyst at this point that if the longevity of the challenger model cannot be proven in a reasonable amount of time, a consolidated form of the challenger model and the vendor model is to be considered as a viable hybrid for future implementation. This is often the case when the vendor model is operating on a large historical data set and has proven its stability over time, yet the challenger model cannot do the same over longer time frames despite outperforming in the short run. This process can be a form of model refreshment, a topic which goes beyond the scope of this document, but can aid in discussions with clients in demonstrating added value in the process.
Framework and Study Design
Having discussed some potential solutions to the validation problem, the analyst can devise the project and provide a conceptual framework for implementation. This is typically a multi-step process, involving internal and vendor cooperation. A proposed approach for efficient model validation requires the support of developers, business managers, data architects (particularly those involved with data ingestion and ETL), and of course, vendor teams selling the model to the client.
Determine Initial Model Validation Requirements
This first step aims to provide clarity into why model validation is being called into action to begin with and how it impacts other organizational entities involved in validation activities.
Model Risk Management Policies: Is the request coming from a client that has had a bad experience with the vendor in the past? Is the organization that holds the client under a retainer contract looking to provide added value? Do the model’s results look questionable at first glance, and does historical data invalidate the vendor’s predictions? Has the vendor shown an aversion and low tolerance to model error? These are all questions the analyst ought to ask to define the purview of validation activities and the scope within which the validation can occur from a business perspective.
Ownership and Governance Framework: The analyst will likely be coordinating with the organization’s CIO to determine the impact of model validation. The analyst will need to determine how validation activities influence the aforementioned risk policies in conjunction with compliance and security, information quality, architecture, and ultimately integration.
Legal and Regulatory Considerations: As validation of a vendor model may involve requests for access to techniques acting as intellectual property of the vendor, the analyst will need to ensure that legal and regulatory expectations are managed and are in full compliance throughout the entirety of the validation exercise.
Determine Validation Methodology
This step in the framework can be thought of as the planning phase during which the analyst can provide input on many of the aforementioned key considerations discussed earlier.
Conceptual Soundness Considerations
- Determine key data sets and ancillary information sources that will be ingested into the model.
- Determine the use of the model, both in the long and short term.
- Review the design of the model: its variables, dimensions, and other computational parameters.
Output Assessment of the Vendor Model
- Determine if data quality is preserved in the vendor model.
- Conduct a thorough assessment of the vendor model’s performance.
- Prepare sensitivity analysis of dependent and independent variables (a minimal sketch follows below).
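Below is a minimal sketch of a one-at-a-time sensitivity analysis: each independent variable is perturbed by a fixed percentage and the change in the model output is recorded. The model function, baseline values, and the 10% perturbation are hypothetical assumptions.

```python
# A sketch of one-at-a-time sensitivity analysis: bump each input by a fixed
# percentage and record how much the model output moves. The model function,
# baseline values, and the 10% perturbation are hypothetical assumptions.
def model(inputs):
    # Placeholder stand-in for a vendor model over clicks, CPC, and CTR
    return 7.25 + 0.001 * inputs["clicks"] + 0.8 * inputs["cpc"] + 0.0991 * inputs["ctr"]

baseline = {"clicks": 1200.0, "cpc": 1.50, "ctr": 4.2}
baseline_output = model(baseline)

perturbation = 0.10  # +/- 10%
for name in baseline:
    bumped = dict(baseline)
    bumped[name] = baseline[name] * (1 + perturbation)
    delta = model(bumped) - baseline_output
    print(f"+10% {name}: output changes by {delta:+.4f}")
```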
Documentation Review and Conflict Resolution
- Complete an assessment of the documentation provided by the vendor (what can and cannot be used).
- Known issue and error identification (this may include anticipated issues for model stability, performance, and ability to interpret different data sets).
Actualization of Validation Exercises
This is the final execution of validation activities, where model replication is attempted and challenger models are formed. Any industry and peer benchmarking is also likely to occur during this final phase, with the vendor model’s performance being called into question. It is likely that at this stage a model risk officer or controller will coordinate with the internal data science teams to complete implementation. Any challenger models generated will themselves be tested for validity to ensure stability, completeness, and conformance with the prescribed confidence levels.
Interpretation of Results and Renewed Recommendation
Upon completion of validation exercises, the organization may choose to pursue the vendor model implementation or provide a renewed recommendation to the client based on the results. Where the vendor model is deemed appropriate for the client’s marketing objectives, the analyst may simply propose the solution directly to the respective client teams. In the event that the vendor model is invalidated and underperforms compared to the challenger model, the organization retaining the client may proceed with an ensemble model merging the challenger and vendor models, or propose an internal model solution altogether.

Conclusion and Future Deliberations
The advent of Machine Learning in specialized applications also brought about the drive by ML-inclined organizations to bring the technology to the masses. At the time of writing, nearly every large financial institution, advertising agency, consumer packaged goods conglomerate, and technology giant uses some form of modeling in their day to day operations. With vendors of each of these entities providing modeling solutions, validation of their work is now a booming subsector of the data science community.
Looking to the future, we already see several advancements shaping how model validation will evolve in the early 2020s. Some of the most interesting work comes out of the Defense Advanced Research Projects Agency (DARPA) in the United States, well renowned for its work in artificial intelligence and robotics. Enhancements to model validation at the agency revolve around the concept of Explainable AI (XAI): the notion that machine learning can provide an explanation of the results generated by another AI or ML model. XAI is meant to parse the code of another artificial intelligence by attempting model replication, interpreting the results of the replicate, and ultimately generating a challenger model automatically, without human interference. At each step of the process, XAI produces documentation in a human-readable format to provide full transparency. This work is currently in a research and development stage, but nearly $70 million in funding has gone toward bringing the technology to the public in the coming decade.¹⁰
Moving forward, it is anticipated that model validation and computational verification will grow to the same level of prominence that modern audit firms hold across the globe today. As enterprise solutions give rise to self-serve platforms such as AWS, it is not expected to take long until the lay small-business owner is modeling the sales of their sole proprietorship. When the time comes to gain confidence in the model that owner generated in the cloud, validation services catering to them at a fraction of the cost of today’s solutions are likely to be widespread. Based on advances made in ML validation even today, it is possible that the future will have machines simply validating machines.

References
1 "Enterprise-Wide Model Risk Management for Deposit-Taking Institutions." Office of the Superintendent of Financial Institutions, Office of the Superintendent of Financial Institutions, Sept. 2017, www.osfi-bsif.gc.ca/Eng/Docs/e23.pdf.
2 "Sound Practices for Model Risk Management." Office of the Comptroller of the Currency, Office of the Superintendent of Financial Institutions, Apr. 2011, www.occ.gov/news-issuances/bulletins/2011/bulletin-2011-12.html.
3 Crespo, Ignacio, et al. "The evolution of model risk management." McKinsey & Company, Feb. 2017, www.mckinsey.com/business-functions/risk/our-insights/the-evolution-of-model-risk-management.
4 The hyperlinked URL for the example is https://support.google.com/searchads/answer/4552409?hl=en
5 Regan, Samantha, et al. "Validating Machine Learning and AI Models in Financial Services." Accenture Finance and Risk, Accenture, 2017, www.accenture.com/_acnmedia/Accenture/ConversionAssets/MainPages/Documents/Global/Accenture-Emerging-Trends-in-the-Validation-of-ML-and-AI-Models.pdf.
6 Brown, Beverly. "How to Perform Sensitivity Analysis with SAS Marketing Optimization." SAS Communities Library, SAS, 11 Sept. 2017, www.communities.sas.com/t5/SAS-Communities-Library/How-to-Perform-Sensitivity-Analysis-with-SASMarketing/ta-p/311069.
7 Halliday, Elaine. "Practical Considerations for Various Model Validation Techniques." FMS, June 2017, www.fmsinc.org/documents/forum17/slides/InternalAuditRisk-ModelValidations-Halliday-Zheng-McGuire-tues400p.pdf.
8 Ruder, Sebastian. "An overview of gradient descent optimization algorithms." Ruder.IO, Jan. 2016, ruder.io/optimizinggradient-descent/.
9 Paul, Sayak. "Essentials of Linear Regression in Python." DataCamp Community Tutorials, DataCamp.com, 31 Oct. 2018, www.datacamp.com/community/tutorials/essentials-linear-regression-python.
10 Arel, Itamar, et al. "Deep Machine Learning – A New Frontier in Artificial Intelligence Research [Research Frontier]." IEEE Computational Intelligence Magazine, pdfs.semanticscholar.org/ea58/af907495e97c93997119db4a59fab5cd3683.pdf.