
The Case For Mystery in Machine Learning

A use case for a black-box machine learning algorithm.

Black-box models, meaning algorithms too complex to understand without sophisticated analysis, are becoming more common in the quest for increasingly accurate predictive analytics. While these models have driven advances in data science, their use also raises ethical concerns. See this article in Nature for an overview of the dangers of black-box models and of the secondary models used to explain them, and see the book "Interpretable Machine Learning" by Christoph Molnar for a detailed explanation of model interpretation methods.

Image by author

If these interpretation methods are followed, there is less cause for concern about bias. Even so, some believe that all black boxes are bad, while others believe they can be useful when carefully applied. Still others prefer to avoid the term "black box" entirely and simply describe models as more or less interpretable.

Image by Author

Before getting into the details of what makes a black box a black box, it helps to nest the terms: algorithms sit inside the machine learning box, and machine learning sits inside the AI box. Machine learning is the science of using algorithms to approximate quantitative estimates from data. Here is a breakdown of machine learning models that are interpretable versus ones considered to be "black box."

Readily Interpretable Models

Interpretable models include Linear Regression and its cousins Logistic Regression and GLMs (Generalized Linear Models), Decision Trees, Decision Rules, RuleFit, the Naive Bayes Classifier, and K-Nearest Neighbors.
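
To make "readily interpretable" concrete, here is a minimal sketch in Python with scikit-learn. The data and feature names are synthetic, invented purely for illustration; the point is that every coefficient of a fitted linear regression maps directly to one input feature.

```python
# Minimal sketch: why a linear model is "readily interpretable".
# Synthetic data; the feature names are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3))  # 100 samples, 3 features
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# Each coefficient reads as a plain statement: "holding everything else
# fixed, one unit of this feature moves the prediction by this amount."
for name, coef in zip(["floor_area", "window_ratio", "insulation"], model.coef_):
    print(f"{name}: {coef:+.2f}")
```

No secondary explanation model is needed here: the fitted object itself is the explanation.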

RuleFit model. Image credit: Christoph Molnar, posted with permission

Less-Interpretable / Black-box Models

Less interpretable models can include any algorithm other than those above, as well as any model with many transformations applied to it. A number of transformations help improve model performance but decrease interpretability. These include basic statistical manipulations such as standardization, normalization, imputation, and log transformation, as well as more specialized techniques such as sub-sampling, super-sampling, and noise injection. Zhiwen Yu et al. found that "random combination of transformation operators across different dimensions outperforms most of the conventional data transformation operators for different kinds of datasets."
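
As a rough illustration of that finding, and of how it erodes interpretability, here is a simplified sketch of randomly composing transformation operators. This is my own toy interpretation, not Yu et al.'s actual framework; the operator names and pipeline logic are invented for illustration.

```python
# Toy sketch: compose a random sequence of transformation operators.
# The composite pipeline is harder to reason about than any single step.
import numpy as np

rng = np.random.default_rng(seed=2)

OPERATORS = {
    "log": lambda a: np.log1p(np.abs(a)),
    "standardize": lambda a: (a - a.mean(axis=0)) / a.std(axis=0),
    "min_max": lambda a: (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0)),
    "noise": lambda a: a + rng.normal(scale=0.01, size=a.shape),
}

def random_pipeline(X, n_ops=3):
    """Apply a randomly chosen sequence of operators to X."""
    chosen = rng.choice(list(OPERATORS), size=n_ops, replace=False)
    for name in chosen:
        X = OPERATORS[name](X)
    return X, list(chosen)

X = rng.normal(loc=5.0, size=(100, 4))
X_transformed, applied = random_pipeline(X)
print("Applied operators:", applied)
```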

Here is a rundown of the usual transformations applied to data, with a code sketch of each after the list. These are just a start: any mathematical operation that can be applied to a function can potentially be applied within an algorithm, if there is a good reason to do so.

Log Transformation
This transformation replaces each value with its logarithm, compressing large ranges of data.

Min-Max Scaling
This transformation brings all values between 0 and 1.

Standardization
This transformation rescales the data to a mean of 0 and a standard deviation of 1.

Mean Normalization
This transformation centers the data at a mean of 0, with all values between -1 and 1.

Unit Vector Transformation
This transformation divides a vector by its Euclidean norm, scaling it to a length of 1.
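
Below is a minimal NumPy sketch of each transformation above, applied to a small synthetic array. In practice the parameters (min, max, mean, standard deviation) would be computed on training data only and reused on new data.

```python
# Each of the five transformations above, on a synthetic array.
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])

# Log transformation: replace each value with its logarithm.
log_x = np.log(x)

# Min-max scaling: rescale all values into [0, 1].
minmax_x = (x - x.min()) / (x.max() - x.min())

# Standardization: mean 0, standard deviation 1.
standardized_x = (x - x.mean()) / x.std()

# Mean normalization: center on 0, bounded within [-1, 1].
mean_norm_x = (x - x.mean()) / (x.max() - x.min())

# Unit vector transformation: divide by the Euclidean norm,
# so the result has length 1.
unit_x = x / np.linalg.norm(x)
```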

Every data set calls for its own set of transformations, depending on the structure of the data itself as well as the target being predicted. Without opining on a one-size-fits-all solution, the application of machine learning models is highly nuanced: a "black-box" approach may be useful in some situations and harmful in others. The term "black box" is also somewhat misleading. The data scientist creating the model should know what transformations are being applied and why. The disconnect comes when interpreting the results after the model is built, and in whether the process of obtaining those results can be explained to a person without data science knowledge.

This post explores a use case for a black-box model, to debunk the idea that black-box models should be avoided at all costs.

Image credit: Alexander Abero, posted with permission

Consider LEED, or Leadership in Energy and Environmental Design. LEED is a green-building rating system that rewards buildings with high environmental performance. Buildings are awarded points based on various features, such as the efficiency of the mechanical system or the quantity of recycled materials. With enough points, a building can be rated Certified, Silver, Gold, or Platinum. The relative value of every point is published on the web in a checklist, so architects and engineers can design a building to achieve a particular certification level.
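
Because the checklist and thresholds are public, the scoring rule is completely transparent. The sketch below, using what I understand to be the LEED v4 point thresholds, shows how mechanical the mapping from points to rating is:

```python
# Transparent scoring: public point thresholds (LEED v4, out of 110).
def leed_rating(points: int) -> str:
    if points >= 80:
        return "Platinum"
    if points >= 60:
        return "Gold"
    if points >= 50:
        return "Silver"
    if points >= 40:
        return "Certified"
    return "Not certified"

print(leed_rating(62))  # -> Gold
```

Anyone who can read this function can also work out exactly which points to chase, which is precisely the problem described next.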

The problem with publicly publishing the checklist is that it becomes easy to game the system. Cheating on LEED certification is a known problem within the industry. Every LEED credit has to be backed up by documentation, a characteristic of a trust-less system. For example, the site boundary is used in many calculations; it can be drawn in a way that earns credits that, in reality, should not be awarded. Even worse, a design team may submit documentation saying that a particular system will be installed, only for it to be downsized during construction, resulting in poorer energy performance. A conflict arises because the same system is being used for both design and evaluation. Essentially, this is like taking a test while already knowing the answers.

Image credit: Chris Liverani, posted with permission

Continuing the test analogy: on a standardized test, the correct answers are unknown to the person taking it. Similarly, the exact formula used to calculate a consumer’s credit score is unknown. The company keeps the formula a secret because, if it were known, it would be gamed; the integrity of the system would fall apart under total transparency. To improve their credit score, a person is given qualitative recommendations, such as ‘keep low balances on revolving accounts,’ or ballpark estimates, such as ‘keep balances below 30% of the limit.’ However, the exact way revolving balances affect the score is unknown.

Image credit: Stephen Phillips, posted with permission

What if an architect turned statistician wanted to compare all buildings to one another in quantitative terms? Given industry knowledge of how the system is gamed, LEED scores may not be an accurate way to evaluate a building’s environmental performance. Instead, key performance indicators and various other data points can be used. This is exactly the project being undertaken by my startup, the Building Quality Index (BQI). The purpose of the BQI is to stop the problem of disposable buildings, where buildings are built to ever lower quality standards and demolished in less time than a human life span.

A comprehensive quality and risk metric is needed, and it needs to be one that can’t be gamed. For example, an owner might want to know whether to invest money in renovating a building or to buy a new one. Knowing from a set of data that the risk of future leaks, structural problems, or earthquake vulnerability is high would be highly useful information. But if the person supplying the data knew what would be considered high risk, they might be tempted to doctor the input so as to receive a favorable outcome.

For this reason, the insurance industry does not disclose exactly which factors cause it to raise or lower premiums, or by how much. In a system where gaming is a known problem, or where inaccurate results could cause loss of property or life, using a ‘black-box’ algorithm may be the perfect solution to prevent gaming.

‘Black-box’-ness can be summarized as the cost of high precision: the more finely tuned a model is, the less interpretable and transparent it will typically be, and the easier it will be to maintain privacy. In addition to the common transformations mentioned above, ensemble methods, including bagging and boosting, also increase ‘black-box’-ness, decreasing transparency and interpretability.
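
To illustrate both points, here is a sketch on synthetic data with scikit-learn: a gradient-boosted ensemble has no directly readable structure, so one common secondary-model technique, a shallow ‘global surrogate’ decision tree, is fit to the ensemble’s predictions rather than to the raw targets. The data and feature names are invented for illustration.

```python
# Sketch: a boosted ensemble (black box) explained by a global surrogate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(seed=1)
X = rng.normal(size=(500, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# Hundreds of shallow trees: accurate, but no readable coefficients.
black_box = GradientBoostingRegressor(n_estimators=300).fit(X, y)

# Global surrogate: approximate the black box's *predictions* with a
# shallow tree, trading precision for a rule set a human can read.
surrogate = DecisionTreeRegressor(max_depth=3).fit(X, black_box.predict(X))
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))
```

The surrogate is exactly the kind of secondary explanation model mentioned at the start of this article, with the caveat that it describes the black box only approximately.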

The tradeoff between transparency, privacy, and precision is a delicate balance. The diagram below is a starting point for deciding which of the three to prioritize. Use cases in government, science, social science, and business may have very different priorities. By asking the right questions, data scientists can develop best practices for different types of use cases.

Image by Author

References:

  1. Dallas Card, The "Black Box" Metaphor in Machine Learning, 2017, Towards Data Science.
  2. Virginia Dignum, On Bias, Black-Boxes and the Quest for Transparency in Artificial Intelligence, 2018, Medium.
  3. Jeffrey Heer, Joseph M. Hellerstein, and Sean Kandel, Predictive Interaction for Data Transformation, 2015, 7th Biennial Conference on Innovative Data Systems Research (CIDR).
  4. Milo R. Honegger, Shedding Light on Black Box Machine Learning Algorithms, 2018, Karlsruhe Institute of Technology.
  5. P. Krammer, Ondrej Habala, and Ladislav Hluchý, Transformation Regression Technique for Data Mining, 2016, IEEE 20th Jubilee International Conference on Intelligent Engineering Systems (INES).
  6. Marina Krakovsky, Finally, a Peek Inside the ‘Black Box’ of Machine Learning Systems, 2017, Stanford Engineering.
  7. Colin Lewis and Dagmar Monett, AI & Machine Learning Black Boxes: The Need for Transparency and Accountability, 2017, KDnuggets.
  8. Rob Matheson, Cracking Open the Black Box of Automated Machine Learning, 2019, MIT News.
  9. Bennie Mols, In Black Box Algorithms We Trust (or Do We?), 2017, The Association for Computing Machinery.
  10. ODSC – Open Data Science, Cracking the Box: Interpreting Black Box Machine Learning Models, 2019, Medium.
  11. Arun Rai, Explainable AI: From Black Box to Glass Box, 2020, Journal of the Academy of Marketing Science.
  12. Cynthia Rudin and Joanna Radin, Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson from an Explainable AI Competition, 2019, MIT Press.
  13. Cynthia Rudin, Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead, 2019, Nature.
  14. Chris Walker, White Box vs. Black Box Models: Balancing Interpretability and Accuracy, 2020, Dataiku.
  15. Zhiwen Yu, Hau-San Wong, Jane You, Guoxian Yu, and Guoqiang Han, Hybrid Cluster Ensemble Framework Based on the Random Combination of Data Transformation Operators, 2012, Pattern Recognition.
