Interpreting the model is for humans, not for computers

Cruz Julian
Towards Data Science
11 min read · May 12, 2020


It is about translating, not rephrasing.

By Cruz Julian and Valeria Fonseca Diaz

Critique of pure interpretation

The scientific method, the tool that has served us to explain how things work and to make decisions, has brought us a challenge that, as of 2020, we have probably still not overcome: giving useful narratives to numbers, also known as "interpretation".

Just as a matter of clarification, the scientific method is the pipeline of finding evidence to prove or disprove hypotheses. Science here covers everything about how things work, from the natural sciences to the economic sciences. Most delightfully, by "evidence" humanity has come to understand "data", and data cannot be anything less than numbers.

Without losing generality, the problem of interpretation is particularly entertaining along the path of statistical analysis within the scientific method pipeline: finding models that are written in a mathematical language, and then finding an interpretation for them within the context that delivered the data.

Interpreting a model has two crucial implications that many scientists and science technicians have long skipped (hopefully not forgotten). The first is that if there is a model to interpret now, there must have been a research question asked before, in a context that delivered the data used to build that model. The second is that the narratives we create about a model can do much more by expressing ideas about a number within the context of the research question rather than purely inside the model. After all, as of 2020, decisions are made by humans based on the meaning of those numbers, not really by computers. This last statement matters, because in the 21st century computers might actually take over many tasks and end up making decisions for us, communicating those decisions among their own networks. Only then will human narratives stop counting, since computers only understand numbers.

As statisticians, we have adopted the practice of finding problems to solve, finding questions to answer and answers to explain using available data. This mindset has kept us running in a circle of nonsense narratives and interpretations, because problems are not found or looked for: problems and questions emerge from the ongoing interactions and reactions of different phenomena. This implies that statistical models and other analytical approaches are tools to be used upon the core central problem or question; they are not the spine.

This ugly art of fitting a linear regression on some data and saying that "the beta coefficient is the number of units that y increases when x increases by one unit", or of calculating an average and saying that "it is the value around which we can find the majority of the data points", is a ruthless product that we statisticians have been offering to the scientific method.

The bubble of interpretation

Teaching statistics has made clear to us that people can perfectly understand how the models work and how to train them to get the numbers. What we have still not digested, however, is that of all the numbers produced when training models, most are simply not communicable to non-statistical people. Let us present some of these numbers whose communication remains obscure:

- The p-value is one particular concept that may in fact deserve an entire essay of its own. On social media, for instance, we constantly see people asking for an explanation of the p-value, and right away there is a storm of statisticians offering their own interpretations.

- The odds ratio stars in this debate. After 10 years of working in this field, we must admit that we have never managed to explain to an expert in another field how to think about the odds ratio. Even Wikipedia has tried and, in our opinion, has more than failed at it.

- The beta coefficients of a dummy variable in a logistic regression are of the same kind. We have the feeling that they are odds ratios of the categories with respect to the baseline category. But how can we make them understandable and actionable in practice? This we simply do not know.
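To make the second and third points concrete, here is a minimal sketch of where the odds ratio comes from, using an invented 2x2 table (the counts are assumptions for illustration, not real data). It also shows the textbook link between the odds ratio and the beta coefficient of a single binary predictor in a logistic regression:

```python
import math

# Hypothetical 2x2 table: a binary exposure versus a binary outcome.
# All counts below are invented for illustration.
a, b = 30, 70   # exposed group: 30 with the outcome, 70 without
c, d = 10, 90   # unexposed group: 10 with the outcome, 90 without

odds_exposed = a / b            # odds of the outcome when exposed
odds_unexposed = c / d          # odds of the outcome when unexposed
odds_ratio = odds_exposed / odds_unexposed

# In a logistic regression with this single binary predictor, the fitted
# coefficient beta satisfies exp(beta) == odds_ratio, i.e. beta = log(OR).
beta = math.log(odds_ratio)

print(f"odds ratio: {odds_ratio:.3f}")   # about 3.857
print(f"beta:       {beta:.3f}")         # about 1.350
```

The number is easy to compute; the essay's point is that "the odds of the outcome are 3.86 times higher" is still model-world phrasing, not a context-world narrative.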

This problem is central for the scientific and statistical community. The models that we train lose their value because of such a lack of interpretation.

After some years of talking with colleagues about this problem and looking for an appropriate framework that clears up the problem of interpretation, we have come to one obvious conclusion: the interpretation process exists and can only happen within a given context. It makes no sense to fight for interpretation inside the model. The inner processes of the model are all numerical, and their results can be communicated and understood only by numerical, statistical people. Making interpretations of numbers inside the model is a bubble of rephrasing. To interpret the results of a model so that they become tools for taking action, it is essential to keep in mind the context the data comes from and the research questions being asked.

One of our favorite examples to depict the fight for interpretation is the common preference for a logistic regression over a neural network when solving a classification problem that requires interpreting the model. As commonly known, the community says, and publishes papers saying, that logistic regression is good when we need to interpret parameters, while a neural network does the job if our purpose is only prediction. Declaring that we prefer a logistic regression only to sink into the lake of the odds ratio? This is precisely the effect of wanting an interpretation of a number inside the model. If our task is a classification problem and we want to understand the effect of the control variables, what about setting up what-if scenarios for those variables and checking the resulting yes-or-no decision? We believe revenue management companies would pay far more for those what-if scenarios than for a PowerPoint slide saying that beta is the odds ratio of our binary problem. Likewise, in medicine and biology, huge knowledge could thrive from what-if scenarios that the odds ratio will never bring us.
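The what-if idea can be sketched in a few lines. Here is a toy logistic model for a revenue-management-style question ("does the customer book?"); the coefficients and variable names are invented for illustration, not fitted to real data. Instead of reporting exp(beta), we vary the control variables and read off the decision:

```python
import math

# Invented coefficients for a toy logistic model: probability that a
# customer books (1) or not (0), as a function of price and a weekend flag.
BETA = {"intercept": 2.0, "price": -0.03, "weekend": 0.8}

def predict_books(price, weekend):
    """Return (decision, probability) for one what-if scenario."""
    z = BETA["intercept"] + BETA["price"] * price + BETA["weekend"] * weekend
    p = 1.0 / (1.0 + math.exp(-z))
    return p >= 0.5, p

# What-if scenarios: instead of interpreting exp(beta) as an odds ratio,
# we ask the model concrete questions and read off yes-or-no answers.
for price in (50, 80, 120):
    for weekend in (0, 1):
        books, p = predict_books(price, weekend)
        print(f"price={price:>3}, weekend={weekend}: books={books} (p={p:.2f})")
```

A domain expert can act on "at 80 the customer books only on weekends" without ever hearing the words "odds ratio"; that is the translation this essay argues for.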

The process of interpretation

The burst of the bubble

Setting up the framework that makes interpretation possible requires defining with precision which conditions belong to the model and which to the context.

During the model-building stage, the knowledge we use is numerical, statistical and technical. Our core conceptions and concerns are about the accuracy and robustness of the methods we use to build the models. Quantities, parameters, mathematical formulas and algorithms are the main components of this world. As such, the main researcher does not need to fully understand every single component of this process, because most of them are just required pieces that keep the big machine running.

On the other hand, the context is essentially full of non-statistical technical knowledge. The concepts of this world are transformed. The numbers that represent the data, for example, are translated into units of measure. We do not refer to the data as a mathematical or computational representation; we talk about numbers in meters, kilograms, seconds, persons, homes, etc. Understanding these concepts is the main faculty of the head researcher. Just as before, the technical statistician may not deeply understand every concept in the context, but it is crucial that they understand the big picture of the problem that gives meaning to both worlds.

With that said, the interpretation process is about connecting a number in the model with a concept in the context. It is about bursting the bubble.

If we think about this process from a linguistic point of view, semantics distinguishes monosemic (one meaning) and polysemic (multiple meanings) words, as well as denotative meanings (the dictionary meaning) and connotative meanings (a meaning in context). In these semantic terms, the conventional statistical interpretation process gives a number only the one meaning presented in statistics textbooks: a monosemic, denotative way. We believe that giving a number many meanings in different context situations, that is, in a polysemic, connotative way, is an improvement to statistical understanding for non-statistical people.

Let us dive into other small and big examples. In financial analysis, the standard deviation of the returns of an asset is called volatility. The term volatility is a concept in the financial context, and this connection between the number and the concept makes no sense outside that context. The same happens with film ratings, where the standard deviation of a film's scores is called controversy. In quality control and process improvement, the Six Sigma standard goal is to have at most 3.4 defects per million opportunities. We find many more of these, such as elasticity in economics, satisfaction in marketing, and lethal dose in toxicology, among others. As we can see, this connection is precisely a translation, and this translation makes it possible to read the number in a meaningful way.
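The volatility example fits in three lines. The monthly returns below are invented for illustration; the point is that the model-world object (a standard deviation) and the context-world object (volatility) are the same number read in two languages:

```python
import statistics

# Invented monthly returns (in percentage points) for some asset.
returns = [1.2, -0.5, 2.3, 0.8, -1.1, 1.9]

# Model world: the sample standard deviation of a list of numbers.
# Context world: the "volatility" of the asset.
volatility = statistics.stdev(returns)

print(f"volatility: {volatility:.2f} percentage points")
```

Exactly the same computation on a column of film scores would be reported as "controversy"; only the context changes the name and the action it suggests.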

We do not even need to narrow this debate to the statistical and research community. As of 2020, every citizen with access to the internet has likely come across the famous "curve" of the COVID-19 pandemic. All the media have talked about the goal of "flattening the curve". This "flattening the curve" is probably the most delightful example of translating the model to the context. The "curve" is exactly what we mean here by the "model": it has a very specific mathematical formula relating the number of infected cases, cases at risk of infection, and so on, to time. Very specific numbers inside this model define the capacity of the population to sustain the disease. No effort is needed to connect the number with the context, because the models were created for this context. While these numbers change in the hands of the statistician in charge as more data comes in and the models are re-estimated, the experts wait for the moment to be told whether the curve got flattened, so that they can communicate to us citizens what we can and cannot do.
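A minimal sketch makes the translation visible. The code below runs a toy discrete SIR-style simulation with assumed parameters (population size, contact and recovery rates invented for illustration, not fitted to any real epidemic). The model-world number is the peak of the infection curve; the context-world concept is whether the health system can sustain it, and "flattening the curve" is simply lowering that peak:

```python
def peak_infected(beta, gamma=0.1, n=1_000_000, i0=10, days=365):
    """Toy discrete SIR model: return the peak number of simultaneous cases.

    beta is the daily contact rate, gamma the daily recovery rate; all
    parameters here are assumptions for illustration only.
    """
    s, i, r = n - i0, i0, 0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        peak = max(peak, i)
    return peak

# "Flattening the curve": a lower contact rate gives a lower peak, which is
# the number the healthcare-capacity concept attaches to.
print(f"unmitigated peak:     {peak_infected(beta=0.30):,.0f}")
print(f"with distancing peak: {peak_infected(beta=0.15):,.0f}")
```

The statistician re-estimates beta as data comes in; the expert reads the peak against hospital capacity and tells citizens what they can and cannot do.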

We can come closer to this debate with something more familiar within the statistical community: interpreting a PCA (principal component analysis). This is one of those methodologies adopted to handle data in many different ways: we use PCA for visualization, data cleaning, dimension reduction, etc. It is also used across innumerable fields of research, both in academia and in industry. Yet we ask: to what extent are we really exploiting the potential of this type of statistical tool? Consider one typical case in the social sciences, in which researchers have demographic and social variables and implement a PCA in order to "understand the data and the relationships among the variables". Note that right here there is already a flaw in the framework, because "understanding the data and the relationships among the variables" is not a research question or objective. But alright. After calculating the PCA numbers, one of the things we do is build a scores plot declaring the percentage of variance retained by each axis. Now, what does this percentage of variance mean? What do we really gain from saying that the first component retains 30% of the variability in the data? Nothing, actually. However, imagine that our variables are salary and number of hobbies, and the researcher hypothesizes a latent happiness trait. What about telling the researcher how much of the information in these variables accounts for that hypothetical trait simultaneously? That would be the 30%, and so 70% of salary and number of hobbies accounts for other latent traits. Beyond that, most biplots, eigenvalues and the like may mean very little within this context.
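The salary-and-hobbies reading can be sketched numerically. The data below is simulated so that the two variables genuinely share a common latent trait (all parameters are assumptions for illustration); the PCA is computed via a plain SVD so the example stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for 200 people: salary and number of hobbies (both
# standardized), built to share a common latent trait. All coefficients
# are invented for illustration.
latent = rng.normal(size=200)
salary = 0.8 * latent + 0.6 * rng.normal(size=200)
hobbies = 0.8 * latent + 0.6 * rng.normal(size=200)

X = np.column_stack([salary, hobbies])
X = X - X.mean(axis=0)          # center before PCA

# PCA via SVD: squared singular values give the variance per component.
_, s, _ = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Model world: "PC1 retains this share of the variance."
# Context world: that share is how much salary and hobbies move together,
# i.e. the evidence for one shared latent trait such as "happiness".
print(f"PC1 retains {explained[0]:.0%} of the joint variability")
```

The number is the same either way; only the second phrasing gives the social-science researcher something to reason about.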

From computer language to human narratives

The reconstruction after the burst

Our examination of this framework has brought us to some statements which we believe to be true.

First. Not every number has a translation from the model to the context, because there does not necessarily exist a concept to connect every number. This happens, for example, in neural networks, whose output has an easy interpretation but whose inner-layer weights mostly have no direct interpretation.

Second. The connection between numbers and concepts is not always obvious; some numbers find their concepts more directly than others. In economics, the concept of GDP (the market value of all final goods and services produced in a specific period of time) is associated, by definition, with a number. Therefore, when the number is calculated from the model (i.e., sampling and estimation), the connection is instantaneous, obvious and sharp. On the other hand, also in economics, the concept of inequality (the difference in economic well-being between population groups) is a completely abstract idea, and its manifestation in a model as the GINI coefficient (i.e., twice the area between the line of perfect equality and the Lorenz curve) is not obvious. In the first example, the concept is a number with units and the model delivers an estimate of it. In the second, the concept is an abstract idea and the number is an area between two curves.
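The GINI example shows how far a number can sit from its concept. Below is a minimal sketch with invented incomes: the Lorenz curve is the cumulative share of income against the cumulative share of people, and the coefficient is twice the area between that curve and the equality diagonal:

```python
import numpy as np

# Invented incomes for a small population (units are arbitrary).
incomes = np.sort(np.array([10, 12, 15, 20, 25, 30, 60, 80, 120, 200], float))
n = incomes.size

# Lorenz curve: cumulative income share at each cumulative population share,
# with the (0, 0) starting point prepended.
lorenz = np.concatenate(([0.0], np.cumsum(incomes) / incomes.sum()))

# Trapezoidal area under the Lorenz curve (x runs from 0 to 1 in steps 1/n).
area_under_lorenz = float(((lorenz[:-1] + lorenz[1:]) / 2).sum()) / n

# Area under the equality diagonal is 0.5, so:
# Gini = (0.5 - area_under_lorenz) / 0.5 = 1 - 2 * area_under_lorenz.
gini = 1.0 - 2.0 * area_under_lorenz

print(f"Gini coefficient: {gini:.3f}")
```

Nothing in `gini = 1 - 2 * area` says "inequality"; the translation from an area between two curves to a statement about economic well-being is exactly the interpretive work this essay describes.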

Third. The idea that the goal of interpretation is communication between the people who know the context of the problem and the people who build the model as a solution is also not very obvious. This is a strong statement. Initially you might think this idea changes nothing, but it does. The idea of interpretation as an act of communication between different worlds has not been used before in its deepest meaning. We must admit that the traditional scope of interpretation consists in rephrasing the numbers inside the model: talking about the numbers in statistical terms, inside the mathematical formulas, with no mercy for communication.

What comes now

We acknowledge that this interpretation approach is challenging, but we want to be precise about the actions that can be taken.

Defining and structuring an interpretation framework in the fields of study where statistical models serve as tools is a fundamental step. Finding meaning for the most used models in different areas can not only provide better usability for non-statistical researchers and users, enhancing their analytical experience, but also build and spread a better statistical culture.

Studying the development of this framework in order to establish good interpretation practices for models is the emerging training. The introduction of some concepts from semantics makes us think there is a great deal of theory to be developed: statistical interpretation is a communication process that has not yet been addressed by the language sciences.

Spreading the word inside and outside the academy. Lectures, seminars, conferences and the like need to take place in faculties of statistics. We need to create the environment to bring up this discussion and its implications. Both statistics and linguistics students must be involved in order to build an accurate approach to communication in statistics.
