Data science
Data science or how to decide on how to decide

Cassie Kozyrkov, at Google, has been promoting the idea of decision intelligence. Decision intelligence, a term coined by Kozyrkov, aims to bring known decision-making methods, insights, and best practices together under a common umbrella. Examples of these known methods and insights are:
- Analytics
- Statistics, both frequentist and Bayesian
- Machine learning
- Decision science
- Behavioral economics
- The psychology and neurology of decisions
Data science is purposefully omitted from the list above. Data science is a popular name, but not a well-defined one: isn’t all science data science? Perhaps this post is an experiment to see if the term decision intelligence better demarcates a coherent field of interest. To start, a central tenet of decision intelligence is that much of the field currently known as data science has to do with making decisions, on a large or a smaller scale. How does one decide on how to decide? That is the scope of this post.
Part of the outcome of this new demarcation is that data science projects as we knew them can be better executed by being more precise about the structure of the different tasks and the required roles.
Is there such a thing as a decision architecture or infrastructure, and how would that look?
The goal of this post is to summarize, hopefully, a large part of the decision intelligence content posted by Google’s Kozyrkov in various blog posts and other media. As a summary, it is preliminary: although based on a sizeable chunk of blog posts, it does not aim to include all content, and it is not clear whether all information is published online. In this post, I will purposefully use the term ‘decision intelligence’, as an experiment, but feel free to read ‘data science’ instead.
Overview of this post
For your convenience here is a list of the sections and topics included in this post:
- Decisions: from zero, to few to many
- Analytics
- Hypothesis testing with frequentist statistics
- An example of Bayesian statistics
- A comparison: analytics, frequentist and Bayesian statistics
- Machine Learning
- On explainability, testing and bias
- It is o.k. to do applied
- The decision intelligence process
- Decision intelligence teams and the decision maker
- Some added thoughts
- Final thoughts
Decisions: from zero, to few to many
The name of the game is to make more informed decisions. In this section, an initial categorization for decision making is introduced, which will be expanded on in the following sections.
One of the central ideas of decision intelligence is that one can be confronted with the following scenarios:
- One wants to make decisions at scale, a large number of decisions
- One wants to make a few specific decisions
- One is not yet ready to make a decision because one lacks insight
Let’s start with making decisions on a large scale.
Machine learning fits this bill. Want to estimate the value of millions of houses? Want to recommend billions of tweets to millions of people? Want to tag or caption billions of photos? The current information revolution is in large part about scaling both data and decisions beyond what was previously imagined to be possible. Before this revolution of scale, making decisions, more on the scale of a few, was the realm of statistics, be it frequentist or Bayesian.
Statistics helps to manage, not take away, the uncertainty that comes with deciding based on data from a population sample instead of on information about the complete population. Making millions of decisions does not fit this framework; one would need to correct for repeated testing using, for example, a Bonferroni correction, and that has its limits. More specific to the frequentist approach, one is supposed to bring a default course of action, represented by the null hypothesis, to the table. There are rules to abide by, more on this later.
Last, but not least, sometimes one is simply not yet sure how to think about the decision at hand: inspiration is what is required. In this case, analytics can and should help out.
To give this categorization more substance, the three areas of interest will be expanded on in the next sections.
Analytics
Analytics is about analyzing the data at hand and more or less forgetting about the world outside of that data set. A good analyst can quickly dissect a data set and select patterns of interest from the data. Analytics is about generating ideas or generating better questions. Some of these questions can be picked up by a statistician for generalization beyond the current data set, using another data set. In some specific cases, analytics can be enough to drive decisions too, as the next example illustrates.
Let’s suppose one can acquire, at an equal price, either a batch of blue t-shirts or a batch of red t-shirts to sell at a local fair, but not both batches. Having no established preference, or default action, one asks around randomly for a color preference to determine which batch is the better choice. If there is any majority vote on the color, it is o.k. to go with that majority without applying statistics. Why? Since you have no preference, no mathematics or statistics will turn the majority outcome in favor of the minority outcome. The difference in votes might be meaningless, but the majority vote is still the safest bet.
Note, there is no political messaging intended in this example, other than that one should take care where the sample is taken in relation to where the t-shirts are sold. Mathematics will not change the decision in favor of the minority vote, but the concept of sample bias could: that, however, is not a mathematical issue.
On to making a decision using hypothesis testing.
Hypothesis testing with frequentist statistics
In the previous example, there was no default action. In a lot of situations, there is a default action; often it is keeping things as they are. This default action carries some weight, be it in terms of trust, belief, or investment. (There are also more technical reasons, more on that later.) For example, one might want to know whether a specific treatment is worth it or not. Put shortly, in this case, hypothesis testing can help shape the belief about a population parameter, the absence or presence of some condition, after applying the treatment to a population sample. Let’s expand on this a bit more.
Causality is important when assessing treatments. Typically one would use a randomized trial with the population sample divided into a treatment and a control group. If done correctly, the experiment involves the formulation of a null hypothesis, which is to be falsified, and an alternative hypothesis. After applying the treatment to the treatment group, based on the outcomes of both groups, and the possible difference found between them, the notorious p-value or probability value can be calculated. The p-value is the probability that a difference at least as large as the one observed would occur if the null hypothesis is true, i.e. if there is no effect. If this probability turns out to be very small, then the null hypothesis is rejected in favor of the alternative hypothesis.
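As a minimal sketch of this procedure, the snippet below simulates hypothetical outcomes for a treatment and a control group and runs a two-sample t-test. The group sizes, the means, and the 0.05 significance level are assumptions chosen purely for illustration, not something prescribed by the original posts.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Hypothetical trial data: the treatment group's mean outcome is slightly higher.
control = rng.normal(loc=10.0, scale=2.0, size=200)
treatment = rng.normal(loc=10.5, scale=2.0, size=200)

alpha = 0.05  # significance level, chosen before looking at the data

# Null hypothesis: both groups share the same mean (the treatment has no effect).
t_stat, p_value = ttest_ind(treatment, control)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: keep the default course of action")
```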
The intended use of this example is not to dive into the more philosophical details of hypothesis testing, but rather to show that the hypothesis testing procedure is designed for a very specific purpose. Repeating the testing procedure is likely to render similar results; there is an emphasis on objective facts and repeatability. The framework is intended to be used when one has to make a small number of well-defined decisions. Scaling up the number of decisions, one runs into problems that can only be corrected for up to a certain degree.
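To make the scaling problem concrete, here is a small hypothetical simulation: it runs many tests in which the null hypothesis is true by construction and counts how many come out ‘significant’ with and without a Bonferroni correction. The number of tests and the significance level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tests = 1_000
alpha = 0.05
# Under a true null hypothesis, p-values are uniformly distributed.
p_values = rng.uniform(0.0, 1.0, n_tests)

naive_hits = np.sum(p_values < alpha)                 # roughly 5% false alarms
bonferroni_hits = np.sum(p_values < alpha / n_tests)  # far stricter threshold

print(f"Uncorrected 'significant' results: {naive_hits} out of {n_tests}")
print(f"Bonferroni-corrected:              {bonferroni_hits} out of {n_tests}")
```

The correction keeps the false alarms in check, but at a thousand tests the threshold becomes so strict that real effects are easily missed, which is the limit referred to above.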
If the decision is less well defined, or more complex, Bayesian statistics could offer a good alternative. Although I have not seen a lot of content on Bayesian statistics in the context of decision intelligence, the framework of Applied Information Economics (AIE) as defined by Douglas Hubbard can provide a nice example as a contrast to frequentist statistics.
An example of Bayesian statistics
Complex decisions can require a lot of information. Some of this information might be at hand, but some might not be. More generally, formulating complex decisions using only frequentist hypothesis testing could prove insurmountable. It is not always possible to test everything in a rigorous way. Sometimes prior beliefs or expertise have to be part of a decision-making process. Bayesian statistics allows you to do that. And although the notion of prior beliefs might sound fuzzy, Bayesian statistics can be very powerful, not least because one can update the beliefs using observations. Hopefully, the following example gives a flavor of its workings. Note that this illustration of Bayesian statistics is purposefully skewed towards an applied business setting and very much simplified.
Suppose one is interested in building a system, any system. Building this system involves the completion of 1500 individual tasks. The prior belief is that the 90% confidence interval for the number of hours needed to complete a single random task is between 15 minutes and 7 hours. Further, the prior belief is that the market price for an hour of work is between $19.50 and $47.50. Using these beliefs, and assuming normal distributions for both, one can calculate the probability that the system’s labor costs exceed, say, $10,000 by running a Monte Carlo simulation. How? One can draw 1500 samples from the task distribution and multiply each sample by a draw from the hourly rate distribution to get one ‘realization’. By repeating these steps many times one can create a simulated probability distribution of the labor costs; and this distribution can help to reason about possible outcomes.
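A minimal sketch of this simulation is given below. It assumes that the 90% confidence intervals correspond to roughly 1.645 standard deviations on either side of the mean of a normal distribution; the number of simulation runs is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)

def normal_from_90ci(low, high):
    """Translate a 90% confidence interval into the mean and standard deviation
    of a normal distribution (90% of the mass lies within +/- 1.645 std)."""
    return (low + high) / 2, (high - low) / (2 * 1.645)

task_mean, task_std = normal_from_90ci(0.25, 7.0)     # hours per task
rate_mean, rate_std = normal_from_90ci(19.50, 47.50)  # dollars per hour

n_tasks, n_runs = 1500, 10_000

# One 'realization' = 1500 task durations, each multiplied by an hourly rate draw.
hours = rng.normal(task_mean, task_std, size=(n_runs, n_tasks))
rates = rng.normal(rate_mean, rate_std, size=(n_runs, n_tasks))
labor_costs = (hours * rates).sum(axis=1)

print(f"P(labor costs > $10,000) = {np.mean(labor_costs > 10_000):.3f}")
```

The same simulated distribution can be queried for any threshold or percentile of interest, which is what makes it useful for reasoning about possible outcomes.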
This simulation is made possible by accepting prior beliefs as the starting point, a central tenet of Bayesian statistics. And although the example is very simplistic, the framework allows the complexity to scale. To show that we are not riding roughshod over reality, observations, and statistics, I will expand on the AIE framework a bit more. The following notions, supported by scientific evidence, are advocated within the AIE framework:
- Most people can be trained to provide accurate, calibrated probability or confidence interval estimates.
- Estimates, although a good starting point, are easily influenced by emotions and the environment.
- Any uncertainty in the estimates can be reduced by updating the prior beliefs with sample observations: a prior distribution is transformed into a posterior distribution in which the evidence is taken into account (see the sketch after this list).
- The sample size needed to reduce the uncertainty in the distributions is smaller than expected; often 5 to 100 samples can work miracles.
- Observations have the highest impact where the uncertainty is highest.
- Given some model outcomes, it is possible to assess the value of adding additional information – indicating what costs are justified.
- Modularization of complex decisions makes decisions more reliable and more accurate.
- The goal is not necessarily to be perfect but to beat the current method of decision making (often expert advice only).
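As a minimal illustration of the prior-to-posterior update mentioned above, here is a sketch using a Beta-Binomial model. The prior parameters and the ten hypothetical observations are assumptions chosen purely for illustration; they are not part of the AIE framework itself.

```python
from scipy.stats import beta

# Prior belief about some success rate: centered around 50%, but uncertain.
prior_a, prior_b = 2, 2  # weakly informative Beta prior

# Hypothetical evidence: 7 successes and 3 failures in a small sample.
successes, failures = 7, 3

# Conjugacy makes the update a simple addition of counts.
post_a, post_b = prior_a + successes, prior_b + failures

print("prior mean:     ", prior_a / (prior_a + prior_b))
print("posterior mean: ", post_a / (post_a + post_b))
print("90% credible interval:", beta.interval(0.90, post_a, post_b))
```

Even ten observations narrow the credible interval noticeably, in line with the claim that a handful of samples can already reduce uncertainty considerably.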
The AIE framework makes it possible to modularize and manage complex decisions and the related information in a remarkable way; and it is all backed by statistics, notwithstanding that the process might start out as a thought experiment.
It is time for a recap and a comparison.
A comparison: analytics, frequentist and Bayesian statistics
In the previous three sections, the virtues and shortcomings of three different decision frameworks were illustrated. In this section, some comparisons and antagonisms will be discussed to add more depth.
Analytics and frequentist statistics often share common terminology and calculations like mean, variance, and p-values. Still, the goals of the frameworks are different, and this can lead to confusion. An analyst might observe a difference in some statistics and obtain a p-value. What does this p-value mean? Well, according to the statistician, nothing. P-values belong to the realm of hypothesis testing, and hypothesis testing requires adherence to a well-defined process. And the statistician is probably right here. Any scientist obtaining a data set, turning that data set inside out, obtaining a p-value on some statistics, and then claiming to have tested a hypothesis is likely to lose his or her job.
One does not use a data set for inspiration and then use that same data set for hypothesis testing. Splitting the data, whilst often advisable, only helps so much. How about the controlled setup of the experiment to prove causality? Or is causality not relevant? Or confused with correlation? Was a significance level set for the hypothesis test, and how was that level set? Was it adjusted for the power of the test? There are lots of questions to be answered before a data set can be used to test a hypothesis. Analytics is more on the ‘business’ side of the process, and can provide valuable insights that serve as inspiration; but avoid making it messy.
On the flip side, all the rigor of frequentist hypothesis testing has its limits. When confronted with complex decisions involving lots of information, frequentist statistics is likely to fall short. Pure mathematical rigor is not a catch-all for all conditions. Bayesian statistics makes it possible to scale decisions by using, and verifying, prior beliefs regarding the issues at hand. It allows one to analyze where exactly more information is required or most impactful, and when the costs of acquisition justify the benefits. Any amount of data can be made useful; significance is not the key issue in this situation.
Both frequentist and Bayesian statistics in these examples focused on making a small number of decisions in a rigorous way. So how about making decisions at scale, a.k.a. machine learning? Or AI in common parlance.
Machine learning
The term AI was introduced in the fifties of the last century. Since then the term has covered a lot of methods. Currently it usually refers to machine learning, but there are certainly more methods that could fit the bill and have fit it in the past.
Kozyrkov labels machine learning as the labeling of things. Labels can be of various sorts: cats, dogs, numbers, joystick movements. The latter example suggests that reinforcement learning falls squarely under machine learning. As noted before, machine learning is the tool of choice for labeling on a large scale. For that reason, machine learning can be more software-intensive. This probably explains why machine learning grew big under the wings of computer science.
The main considerations in machine learning are bias and variance. Controlling for these types of errors is done through the notions of train, validation, and test data (or variations thereof). From a user perspective, statistical distributions are less in the foreground. They are still relevant though, in the optimization of models and in the assumption that train, validation, and test data share a common distribution.
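A minimal sketch of such a split, using scikit-learn on synthetic data; the split ratios, the model, and the metric are arbitrary choices for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=5000, n_features=20, noise=10.0, random_state=0)

# Split off a test set first, then carve a validation set out of the remainder.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# A large gap between train and validation error suggests variance (overfitting);
# high error on both suggests bias (underfitting).
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("val MSE:  ", mean_squared_error(y_val, model.predict(X_val)))
# The test set is touched only once, at the very end.
print("test MSE: ", mean_squared_error(y_test, model.predict(X_test)))
```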
Within machine learning, there is a great emphasis on model performance. For pure technical applications like predictive maintenance, this is no surprise and no reason for concern. When it involves making decisions about humans, or other animals, moral issues can quickly arise.
On explainability, testing and bias
To be precise, algorithms are never biased, but data can be. Therefore the outcome of models using these algorithms and data can be biased too. And that is a real concern. Machine learning makes it very easy to scale decisions. It also makes it very easy to make mistakes on a large scale. And if those mistakes systematically disadvantage large groups of people, because the system reflects the sometimes unconscious values of ultimately its creators, then morality is at stake and ethical discussions need to enter the fray. But let’s first dive into a small example of bias.
Imagine a system that is to identify tech talent for machine learning positions, and in order to do so, the system scans and scores resumes. It is a known fact that area codes can act as a proxy for a lot of hidden variables. With schooling becoming more segregated, it is not hard to imagine that school names can work in the same way. Is it useful to let an advanced NLP system act basically as a Stanford, Berkeley, or MIT scanner? Adding more similar profiles to an industry that already suffers from a lack of diversity? At some point in the process the goal of the model will have to be translated or formalized into a function, the objective function, that can be optimized. This formalization, together with the selected data, introduces assumptions and often unconscious bias. And as a final twist, wasn’t a resume scanner designed to show bias in the first place? You have to know what you are doing.
In Europe, under the GDPR, people have a right to know how an automated decision has been made. Not all models are transparent in this sense, to say the least. And a lot are not transparent at all. Even a simple linear regression model can pose challenges if the conditions are less than perfect. It is in this context that methods like SHAP or LIME have been created. These methods aim to provide insight into which features impact the decision most. This helps. The problem, though, is that these methods provide local approximations to the underlying algorithm. The explanation can be sketchy, and/or not universally applicable.
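As a minimal, hedged sketch of what such a local explanation looks like in practice: the snippet below assumes the shap package is installed and uses an arbitrary scikit-learn model and data set, none of which come from the original posts.

```python
import shap  # assumption: the shap package is installed
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer approximates each individual prediction as a sum of
# per-feature contributions (SHAP values) added to a baseline value.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:5])  # local explanations for 5 rows

# One row of SHAP values per prediction, one column per feature.
print(shap_values.shape)
```

This brings up an interesting point.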
The whole point of machine learning was that the workings of the model are too complex to program by hand. Programming consists of data and rules in, and answers out. Machine learning consists of data and answers in, and rules out. Trying to make the rules transparent is more or less defeating the purpose; if that were possible, then one should have programmed the rules of the system. This does not mean that the situation is hopeless. An alternative could be to rigorously test a model for bias. Although testing a model might not catch every form of bias, it might make it at least as fair as a comparable process.
If you are somewhat dazzled by all the information up till now, then that is o.k. Integrating all the different concerns and applying the right methods is non-trivial, and that is the topic of the next section.
It is o.k. to do applied
The practical application of the different decision frameworks, and all their implications, is quite involved. And when it involves machine learning, there is also the heavy lifting, uptime, monitoring, concept drift, and what have you at play. Covering all the details could easily fill up some semesters in a curriculum or could require multiple roles to pull off successfully. And that is the point.
It is not necessary to be a scientific researcher within any one of the fields related to decision intelligence. Almost on the contrary, there is enough to know and consider to make applied decision intelligence a specialism of itself. How about gathering some business acumen? Or learning exactly at what point in the process the leverage of decision intelligence is highest? It is easy to mix in more managerial sciences, behavioral economics, or psychology.
At the same time, science is currently doing just fine without decision intelligence. Science at large has come up with methods to answer questions. Each field of research has its own set of well-recognized methods that are aimed at driving decisions in the respective fields. Actuarial science, logistics, psychometrics, biology, epidemiology; the list continues. A lot of the methods in these fields overlap, but there are also distinct methods within each field. These methods were developed to cope with specific challenges in these fields, and are in a way bound to the context of those fields of science. It is not exactly clear how decision intelligence is bounded or where to draw the line.
The decision intelligence process
‘Design with the end in mind’: this rule also applies to the design of data products; not everything has changed. Pinpointing exactly what the outcome of a data product should be, and how exactly it fits into current processes, can prevent a lot of misery. Having cleared this up, one can look at the available data, an algorithm, and finally a suitable model to deploy. But there is more.
How about setting performance criteria for the product? Not just any performance will do. Perhaps apply some hypothesis testing to make sure these criteria are met? And how about the costs? Knowing where and how the data product fits into the process, one should be able to estimate the benefits of its application. But since making a profit is the game, what multiple should be applied to compare costs and benefits?
The aim is to ‘Make data useful’. Let’s break that down into its three different parts:
- Make – The decision intelligence process aims to create things; it is coding. Although the coded end result is important, the code and/or frameworks like TensorFlow, sklearn, or pandas are not where the bacon is.
- Data – If handling data, there are some methodological aspects in the realm of analytics and statistics that need to be taken into consideration.
- Useful – Zooming out a bit, making data useful requires designing the right products that fit the business. That includes asking why the model is so great in the first place, and what assumptions and values underlie the product.
Decisions, decisions, decisions. Who is the decision-maker in all this? This is where the question pops up whether to go for the unicorn data scientist or for a mixed team setup for creating data products. Let’s revisit the topic of roles.
Decision intelligence teams and the decision maker
A common view is that managing data products is harder than managing most regular IT projects. The market for data science tools, end-to-end platforms that take care of it all, is growing. In the case of machine learning, there is easily the same amount of software involved as in a regular IT project. On top of that, there is data provenance, model validation and monitoring, model bias, data labeling, and what have you. Still, the do-it-all data scientist is in high demand. Could it be that the emerging end-to-end platforms are capable of scaling all the complexity back down to a one-man show?
End-to-end platforms certainly provide convenience. Less highlighted is that there is still a need for a diverse skill set, even after taking these platforms into account. In this section, an eclectic list of possible roles is given. Rarely would one hire for all these roles, but it is good to know what is being left out:
- Domain expert
- Product manager or product owner
- DevOps and site reliability engineer
- Data scientist
- Data engineer
- Privacy expert/ lawyer
- Ethicist, lawyer
- Statistician
- Machine learning engineer
This is a short list of possible roles. The list is a bit software-heavy; the team is likely involved in machine learning, and that is usually at scale. With a large scale comes a large responsibility. So while machine learning projects can be executed at a small scale, adding scale means adding responsibility. Yes, a data scientist can deploy a model to SageMaker, but that is often not the goal, right? Gradually the nature of the project changes as the scale changes. This is why one role was left out of the list above: that of the decision maker (a decision science term) or data science leader.
In this post, the number of concerns in decision intelligence projects has been piling up. One could argue that all this thinking and nuance is not agile. But confusing agility with ignorance is just plain hubris. And the GDPR is no joke. Sooner or later, having an applied expert with knowledge of both the opportunities and the pitfalls of decision intelligence projects becomes quite handy.
It is time to wrap this all up.
Final thoughts
Decision intelligence is about defragmenting the current decision-making landscape. Whilst there is an emphasis on application, it is not possible to execute properly without an understanding of the theory and the deeper mechanics of decision methods. There is both a practical and a theoretical aspect to the subject. It would certainly help the execution of decision-type projects if more people had a better, both general and deeper, understanding of these methods.
The methods that fall under decision intelligence stem from different fields of study and often overlap. Some deduplication and cleaning up would definitely be of value. Recognizing common patterns in decision making perhaps makes it possible to better formalize the decisions about decisions, and by doing so raising the bar for all.
But there is something distinctly meta about decision intelligence. Most decision methods are a means to an end. Methods are often developed because there is a particular need for a new way of looking at things. This need is driven by fields such as economics, biology, physics, or engineering. Not all methods make sense without the context in which they were invented. Decision methods are not invented in and of themselves: what will drive the innovation in the decision intelligence field? Decision intelligence provides good starting points, but how is it bounded?
Having said that, I thoroughly enjoy the many blog posts of Kozyrkov and the valuable insights they provide. Hopefully, this post will motivate you to build your own decision-making expertise.