The shaky bridge from Data to Action

One reality, alternative facts, different opinions, opposite actions

Note: In this article I attempt to examine the relationship between facts and data on the one hand, and our subjective interpretations and subsequent decisions on the other. My interest in the topic has intensified recently as we increasingly witness diverging interpretations of what seems to be the same reality.

I’m too hot – I’m too cold …

If I say "it’s too hot" and you say "it’s too cold", those are just opinions; neither of us is right! If you state instead "it’s 70 degrees", that is a fact, not debatable, and we can have a more fruitful conversation about its implications.

That’s how one of my colleagues used to interrupt heated debates during meetings and restart the discussion by urging people to focus on the facts and the data. His intervention often had the merit of calming spirits in the most tense situations. In many ways, this statement had the flavor of a truism, but something in it really appealed to my Cartesian mind: cold facts and data are the essence of truth; subjective opinions take us further from it. The founding charter of the Statistical Society of London (founded in 1834) said something along these lines: data has to receive priority in all cases over opinions and interpretations. Data are objective; opinions are subjective. This creed still guides many statisticians today: in fact, I think the "blind faith" in data is stronger than it has ever been, to the point where we purposefully decide to ignore human experts’ opinions and instead rely on "what the data says".

In an era with an unparalleled abundance of facts ("big data" collected on everything, accelerating scientific progress, multiplying computing power to process the data …), we should be closer to the truth than any past generation has ever been! And yet our times are also filled with increasingly diverging opinions and beliefs, not only on ideologies, as in the 20th century, but on core facts (fake news, anyone?). Never in our modern era have facts and data been more abundant, and yet "truth" more elusive, relative and subjective.

To me this is a fascinating paradox that merits further investigation. In what follows, I attempt to do so by analyzing how we actually apprehend reality and decide what is true and what is false. I summarize my approach in the following chart.

Source: Wissam Kahi
  • Facts are the raw data that describe reality, generated through a process of observation; they are the closest we can get to reality. This is the observation layer.
  • Descriptors (or estimates, models, statistical summaries) allow us to summarize the raw data to make sense of it (e.g. averages, variances, medians, correlations, etc.). This is the analysis layer.
  • Charts are the various ways we represent the truth (using graphs, pictorials, images, or even the descriptors themselves). This is the visualization layer.
  • Narratives are our subjective interpretations. Narratives can be informed by estimates, but they are also heavily influenced by a number of external factors such as our values and beliefs (education, background …). This is the opinion layer.

Finally, opinions are extremely important because they ultimately drive actions.

Let us illustrate this with another "temperature" analogy: climate change.

The facts are the massive data sets of temperature measurements, sea-level rise and man-made carbon emissions across the years and locations around the earth. Those are of course indisputable, but unattainable (nobody has yet consolidated all measurements) and unsuitable in their raw format for human consumption (our brains cannot assimilate them, as I have illustrated in a previous blog post).

The estimates and descriptors are the various ways scientists model, summarize and make sense of the data: the average increase in temperatures and sea levels worldwide, the occurrences of catastrophic weather events and heat waves, etc., and their correlations with man-made emissions.

The representation and visualization are the various publications, articles and charts that represent the above.

The resulting opinions and beliefs can of course span the whole spectrum, from people affirming that there is a definite relationship between climate change and man’s behavior, to those at the other end stating that there is not enough evidence for this claim and that climate change is a purely cyclical phenomenon unlinked to human activity. In reality, scientists will often state their conclusions in terms of probability (for example, a scientist may claim there is a 70% probability that man-made climate change will cause a 50 cm rise in sea level in the next 50 years; on the opposing side, "there is an 80% probability that natural climate variations caused the rise; the probability that it is provoked by greenhouse gas emissions is so small it is insignificant"), but in practice these will often be simplified into more generic, definitive, bite-size statements ("fossil fuels cause sea-level rise" – "sea-level rise is not provoked by man and there is nothing we can do about it").

Actions are the government policies (or non-policies) that are put in place as a result of these opinions.

This simple example clearly shows how smart, educated scientists can reach different conclusions from the same facts and, more importantly, how governments can end up implementing very different policies.

So where does the disconnect in this "Reality – Facts – Estimates – Opinions – Action" chain happen? Let us examine each layer separately.


I – Facts (observation layer)

The coastline paradox

There is a story in the history of science that has always fascinated me: Lewis Fry Richardson was an English physicist who wanted to test a curious theory: that the probability of war between two countries depended on the length of their common border! While researching the data, he stumbled on troubling inconsistencies: the length of a border, say between the Netherlands and Belgium, varied significantly between sources. It could be, for example, 380 km in one source and 449 km in another.

Richardson was eventually able to resolve this paradox by realizing something that seems quite intuitive to us today: the length of a border or any natural coastline depends on the size of the ruler. You will get a different answer with a 200 km ruler than with a 50 km ruler (see image below).

Coastline paradox illustration – Source: Coastline paradox – Wikipedia

This finding was later called the "coastline paradox" and was an instrumental element in the study of fractals by mathematician Benoit Mandelbrot.

You may think that this is a rather trivial finding: of course, with a finer ruler I will get a longer length for an irregular line. But what is interesting to me is not Richardson’s discovery itself, but rather when it was made: 1950! Think about that: by then the scientific community had made significant strides in areas like quantum mechanics and general relativity, and yet nobody had noticed this simple fact, which could very well have been discovered by the Ancient Greeks!

What was considered an unquestionable "fact" – the length of the border between Spain and Portugal was 987 km according to the Spanish and 1,214 km according to the Portuguese – turned out to be wrong for both. In fact, fractal theory shows that this length diverges to infinity as the ruler becomes finer: from a theoretical perspective, there is no length to speak of …
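To make this concrete, here is a minimal sketch of Richardson’s divider method on a synthetic jagged line (the "coastline" here is a random walk I invented purely for illustration): the same curve yields a longer measured length as the ruler shrinks.

```python
# Measuring the same jagged line with rulers of different sizes
# (Richardson's "divider method"). Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "coastline": a jagged 2-D path built from random steps
coast = np.cumsum(rng.normal(size=(5000, 2)), axis=0)

def divider_length(points, ruler):
    """Walk along the path with a fixed ruler; return the total measured length."""
    total, anchor = 0.0, points[0]
    for p in points[1:]:
        if np.linalg.norm(p - anchor) >= ruler:
            total += ruler
            anchor = p
    return total

for ruler in [50, 20, 10, 5, 2]:
    print(f"ruler = {ruler:>2}  ->  measured length = {divider_length(coast, ruler):.0f}")
# The finer the ruler, the longer the measured "coastline" — the paradox in action.
```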

What we see and what we don’t see

Key chart from Nassim Nicholas Taleb’s paper "What You See and What You Don’t See"

One of the most interesting mathematical demonstrations I have read recently is by Nassim Taleb. In a nutshell, the conclusion reads as follows: if you have n observations drawn from a particular distribution (i.e. the histogram of the observations) and K is the largest observation in your sample, the probability of a future observation being larger than K is roughly 1/n (for all power-law distributions, which is quite a generic family). This rather simple result has profound implications: it answers questions like "how likely are we to have a heat wave stronger than the 100 already on record?" or "how likely is the next financial crisis to be stronger than the ones we have witnessed?". Indeed, many "experts" rely too heavily on existing historical data, which makes them underestimate the potential "fat tail" hidden behind the maximum we have observed historically. Once again, this is a humble reminder that our observations are only what Nature has chosen to reveal to us; she may have kept some hidden cards in reserve …

PS: If you are interested in an easy illustration of this property, check out Mike Lawler’s great post. The mathematical demonstration can be accessed here.
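For the curious, here is a quick Monte Carlo sketch of the claim under assumptions of my own choosing (i.i.d. draws from a Pareto distribution with shape 1.5, not Taleb’s exact setup): by exchangeability, the probability that a fresh draw exceeds the sample maximum of n i.i.d. draws is exactly 1/(n+1), which is approximately 1/n for large n.

```python
# Monte Carlo check: how often does one more draw exceed the max of n draws?
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 100_000

# Each row: n draws plus one extra draw, from a power-law (Pareto) distribution
samples = rng.pareto(a=1.5, size=(trials, n + 1))

# Fraction of trials where the extra draw beats the max of the first n
p_hat = np.mean(samples[:, -1] > samples[:, :-1].max(axis=1))
print(f"empirical: {p_hat:.4f}   theory 1/(n+1): {1 / (n + 1):.4f}")
```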

Conclusion

How many other "truths" do we consider unquestionable today, only to have our confidence shaken by some future discovery? Of course, discoveries like the one above do not happen every day, but when we consider that facts are often "measurements" of reality, we have to allow for some margin of error (even with the best of intentions – like the margin of error caused by how fine the ruler is).

I work with raw data myself, and I know from experience and from observing colleagues that there is a strong bias towards over-confidence in the quality of the raw data, often because it is the best you can get. Questioning the source, double-checking data quality and assigning proper margins of error is a tedious task that most analysts avoid, because it is much more appealing to dive straight into the analysis … just like Richardson did with his theory on the probability of war (that is much more exciting than measuring borders …).

Kant said we never really experience reality but only perceive it through the veil of our senses (he called it the veil of perception). It is good to remember that data is itself an artificial human construct: it is how we perceive reality – and it is appropriate to say it is through a "veil" as well.


II – Descriptors (Analytical layer)

Never cross a river 4-feet deep on average

I wrote a separate article titled "Never cross a river 4-feet deep on average – summarizing War and Peace" on this very topic. (The article highlights the challenges and the inevitable loss of information when we summarize data.) Its key premise is that using descriptors such as averages to summarize data sets is a little like trying to summarize "War and Peace" in a couple of sentences: a noble effort, but to some extent ludicrous if the goal is to convey to a dear friend what the masterpiece is about.

I will not expand on the topic here, but below is one of the examples detailed in the article, to give you a flavor.

I happen to be a co-owner of a catering company that hires talented refugee chefs from all over the world. The chefs have developed dozens of dishes drawn from their culinary heritage – while all exceptional in my mind, their cuisine can be more or less successful with the typical American palate. In our continuous effort to provide the best to our customers, we wanted to understand who our best chefs are, so we sent out a survey in which customers could rate every chef’s cooking. At each event we cater, each chef typically prepares one dish.

The average rating for every chef is represented below:

Ratings per chef at Eat Offbeat — Wissam Kahi

In this representation, Faven has much lower ratings than her fellow chefs. Should we fire her, or at least ask her to help in the kitchen instead of torturing our customers with unknown and apparently unappreciated flavors?

Well … not so fast. It turns out that these conclusions are wrong, or at least misleading. Our survey did ask each customer to rate the chef, and we know that each chef prepared a different dish at each event. What if, instead of averaging the results per chef, we averaged the results per dish? The results are below:

Ratings per dish at Eat Offbeat – Wissam Kahi

As we can clearly see, the most popular dish is the Egyptian Moussaka, and it is the creation of … chef Faven! By relying on the average per chef alone, we were about to fire the person responsible for our most popular dish! Chef Faven has two other dishes that bring her average down; those are manifestly not so popular with New Yorkers and should be removed from the menu. But she clearly has talent, and we should encourage her to create new dishes.
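Here is a hedged sketch of the pitfall in pandas; apart from Faven and the Moussaka, the names and all the numbers are invented for illustration.

```python
# Same ratings, two groupings: the per-chef average hides the per-dish signal.
import pandas as pd

ratings = pd.DataFrame({
    "chef":   ["Faven", "Faven", "Faven", "Chef B", "Chef C", "Chef C"],
    "dish":   ["Egyptian Moussaka", "Dish 2", "Dish 3",
               "Dish 4", "Dish 5", "Dish 6"],
    "rating": [4.8, 2.1, 2.3, 3.9, 4.0, 3.7],   # made-up survey scores
})

print(ratings.groupby("chef")["rating"].mean())   # Faven looks worst ...
print(ratings.groupby("dish")["rating"].mean())   # ... yet makes the top dish
```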


III – Representation (Visualization layer)

Picasso has a famous quote that may be the best subtitle for this chapter: "We all know that Art is not truth. Art is a lie that makes us realize the truth, at least the truth that is given to us to understand." I would argue that when we visualize data, we are actually engaging in a form of artistic creation where Picasso’s quote becomes very relevant.

Change colors … change minds

Let me illustrate with an example that I read on the blog of Tableau Tech Evangelist Andy Cotgreave. Below is a chart of Iraq’s casualties published by the South China Morning Post in 2011:

Iraq’s Bloody Toll (Dec 17 2011 – South China Morning Post)

What is striking of course in the chart is the use of the red "bloody" color and the inverted axis that suggests dripping blood. It is extremely effective in stirring the reader’s emotions and the title "Iraq’s Bloody Toll" is quite fitting.

Consider now the spin that Andy Cotgreave put on this chart:

Andy Cotgreave’s take on Iraq’s Bloody Toll

In fact, not only did he use the same data, he used the same chart – all he did was flip the chart vertically (as you can see from the inverted digits) and give it a different color. Of course, a different title is now more appropriate: "Iraq: Deaths on the decline".
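As a rough sketch of the trick (not Cotgreave’s actual code, and with placeholder numbers rather than the real casualty data), flipping the y-axis and swapping the color is all it takes in matplotlib:

```python
# Same data, two messages: a "dripping blood" chart vs. a neutral decline chart.
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(24)
deaths = 3000 * np.exp(-months / 10) + 200   # placeholder series, not real data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.fill_between(months, deaths, color="darkred")
ax1.invert_yaxis()                            # bars now "drip" from the top
ax1.set_title("Iraq's Bloody Toll")

ax2.fill_between(months, deaths, color="steelblue")
ax2.set_title("Iraq: Deaths on the decline")

plt.show()
```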

Let’s pick another example that may resonate strongly with many US-based readers: the 2016 US presidential election map below.

2016 US presidential election map by county (Red = Republican, Blue = Democrat) – Source: Land doesn’t vote – people do by Rain Noe

The above map, filled with red, seems to suggest a landslide win by Trump. But as you may have guessed by now, it is extremely misleading to use a land map to represent how people vote. A counter-balance to this representation was given by Karim Douïeb below:

Karim Douïeb’s population-adjusted 2016 election map (Red = Republican, Blue = Democrat)

In this visualization, each county’s size is proportional to its population rather than its geographical surface (after all, people vote, not land masses!), and it paints a very different picture.
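A minimal sketch of the same idea, with entirely synthetic county coordinates, populations and vote shares (not the real election data): encode population in the marker size instead of shading land area.

```python
# "Land doesn't vote": plot each county as a circle sized by population.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
n = 300
lon = rng.uniform(-120, -70, n)               # fake county centroids
lat = rng.uniform(28, 48, n)
population = rng.lognormal(mean=10, sigma=1.5, size=n)
republican = rng.random(n) < 0.75             # most counties red, as on area maps

plt.scatter(lon, lat,
            s=population / 5000,              # marker AREA encodes people, not land
            c=np.where(republican, "red", "blue"),
            alpha=0.6)
plt.title("Counties sized by population (synthetic data)")
plt.show()
```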

A story is worth a thousand … data points

"A single death is a tragedy; a million deaths is a statistic" – Joseph Stalin, to whom this grim but sadly true quote is attributed, cunningly used it to his advantage (and to the detriment of humanity, of course). It is a known truth that stories have more impact than statistics, that the particular and specific are more striking than the generic, and that addressing the heart is more effective than appealing to reason …

What is still shocking to me is how true this remains, even among educated people who should be aware of their own biases: a charity raising money for refugees, for instance, will raise more by showing the photo of an individual girl than by showing a photo of a group of refugees (and even less by showing statistics). Even worse, showing the same girl with her brother or her family will also raise less money … (there are several studies on the topic; one example: https://www.musestorytelling.com/blog/importance-of-storytelling)


IV- Opinions (The interpretation/narrative layer)

I was recently selected to be a member of a Grand Jury. For 4 weeks I sat in a room with my 22 fellow Grand Jurors; Assistant District Attorneys would come and present their case against a particular defendant, and we had to vote on whether there was enough evidence to indict (that is, enough evidence to send the case to trial). We reviewed roughly 40 cases over those 4 weeks.

What I initially thought would be a terrible burden and a huge waste of time turned out to be a very interesting experiment: it was insightful to observe how different jurors drew different conclusions when exposed to the same facts. In many instances the facts are incomplete – and it is a characteristic of the human mind to fill these factual gaps with subjective assumptions and build a narrative – isn’t that precisely what an opinion is? Not having witnessed the crime with our own senses, we are invited to build a narrative of it in our own minds, filling the gaps accordingly.

Judging whether there is enough evidence to send a case to trial or dismiss it is obviously an opinion (in that light, the system is not perfect, as it relies on people’s opinions rather than pure facts, but interestingly it is in many ways the only "fair" option we have). I would argue, similarly, that interpreting or "understanding" a data chart necessarily means forming an opinion about it – otherwise it is just numbers on paper.

Let’s consider another familiar analogy: "the cup is half full" and "the cup is half empty" are both accurate opinions (subjective reality) about the accurate statement "the 6 oz cup has 3 oz of water in it" (objective reality). The latter is not any "truer" than the former. Interestingly, an artificial intelligence will only be able to express the objective statement, such as "the cup has 50% water in it", and will not be able to distinguish between the half-full and half-empty opinions. Those are distinctions that, for the foreseeable future, will remain intrinsically human – which is precisely why I consider them critical.

Simply put, the way we process information depends only partly on the information itself, and just as much on our biased perception. If the representation layer (previous chapter) deals with the biases that the author introduces in conveying the data, the interpretation layer deals with the biases that the reader introduces in making sense of it. To extend the metaphor: if the representation layer is about the author painting the data with different colored brushes, the interpretation layer is about the reader looking at the facts through their own colored glasses.

There are several factors that can significantly influence this interpretation: some are inherent to our subconscious human biases, others are rooted in our intrinsic beliefs, culture and world views, and others still lie in external factors such as the group we happen to be in when we "consume" the facts.

1 – Subconscious biases — the narrative fallacy

The Nobel-winning economist Daniel Kahneman wrote an entire book, "Thinking, Fast and Slow", on the topic of subconscious biases. I will not attempt to summarize it here — though I would highly encourage you to read it — but will illustrate with one example of the narrative fallacy.

The narrative fallacy

Imagine you are trying to assess the effectiveness of sales reps at generating revenue. You plot the revenue per customer against the number of calls the sales reps made to each customer and obtain the chart below.

Increasing calls to the customers increases revenue

This scatter plot shows a strong correlation between the number of customer calls and the revenue generated per customer. You present it to the company leadership and everyone sees the clear pattern: when we increase our calls to customers, revenue increases accordingly. The logical conclusion, therefore, is to incentivize our salespeople to make more customer calls – especially to customers we have neglected in the past.

Imagine now that you swap the axes and plot the number of calls on the Y axis. The chart looks as below.

When customer revenue is higher, salespeople increase the number of calls to the customer

Now at least some people in the audience will come to a different conclusion: when a customer’s revenue is higher, salespeople tend to call on them more often!

You have probably guessed that there is a logical fallacy in both statements. In fact, a perfectly plausible explanation for the correlation is a third, confounding variable that drives both metrics: for example, the customer’s size or propensity to spend could influence both how frequently the salespeople call on the customer and the revenue the customer generates – said differently, the salespeople are smart enough to call on the customers that are "worth it".

Potential explanation for the correlation of customer revenue to frequency of sales calls
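A small simulation makes the point; all numbers are invented. If a hidden confounder (customer size) drives both calls and revenue, the two will correlate strongly even though neither causes the other.

```python
# Confounding demo: calls and revenue both depend on customer size,
# yet neither causes the other.
import numpy as np

rng = np.random.default_rng(7)
n = 500

size = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # hidden confounder
calls = 5 * size + rng.normal(0, 0.5, n)            # reps call big customers more
revenue = 100 * size + rng.normal(0, 10, n)         # big customers spend more

print(f"corr(calls, revenue) = {np.corrcoef(calls, revenue)[0, 1]:.2f}")
# Strong correlation — yet forcing more calls here would not move revenue at all.
```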

Many of you may be familiar with the famous "correlation does not imply causation" that we learn in statistics class. The shocking thing is how often executives, managers and highly educated people who know the correlation/causation distinction still fall into this trap every day (in fact, the fictitious example above is inspired by true events).

Daniel Kahneman talks about the mind’s tendency to "jump to conclusions". Nassim Nicholas Taleb, another one of my favorite authors, explains this as the narrative fallacy:

The narrative fallacy addresses our limited ability to look at sequences of facts without weaving an explanation into them, or, equivalently, forcing a logical link, an arrow of relationship upon them. Explanations bind facts together. They make them all the more easily remembered; they help them make more sense. Where this propensity can go wrong is when it increases our impression of understanding.

In our example, we are trained to explain the variable on the y axis by the behavior of the variable on the x axis. It is much more interesting if we can link the variables in a causal story. But it is enough to flip the axes to get a different interpretation … Of course, people who are trained not to trust their first instincts and to dig deeper will often interpret the same data differently from "common sense".

2 – Intrinsic values and beliefs

Subconscious biases are intrinsic to human nature and, by definition, common across individuals. In contrast, our values and beliefs are specific to each individual, as they are shaped by education, upbringing, cultural environment, past experience, etc.

The climate change topic is ideal for showcasing how these values can affect the interpretation of data. The following chart from NOAA² shows the increase of CO2 in the earth’s atmosphere, which has recently surpassed 410 ppm.

CO2 ppm trend – Source: climate.gov

Mark Levin, the conservative commentator and radio host, uses this information to argue that CO2 is only a trace gas in the planet’s atmosphere. He condemns the establishment’s strong efforts to dissuade counterarguments and to claim scientific near-unanimity on climate change. "They never mention what a tiny fraction of the atmosphere CO2 is," says Levin (https://www.washingtonexaminer.com/americans-for-carbon-dioxide-mark-levins-idea-whose-time-has-come).

From a purely factual perspective, Levin is right: CO2 is indeed a tiny fraction of the atmosphere. But perhaps someone should give him a glass of water with a tiny fraction of arsenic in it and ask him to show us how brave he is – after all, it’s so tiny …

Joking aside, this is a perfect illustration of how adherence to moral values and group thinking (in this case, the climate-denialist group) can heavily affect the interpretation of facts. A more nuanced chart could be the following, which looks at a much longer time period:

CO2 ppm trend – Source: NOAA at Climate.gov

This shows that 410 ppm is unprecedented historically; a more dramatic figure to quote would be "the CO2 concentration today is more than 30% higher than its historical peak over the last 800,000 years" (the pre-industrial peak was roughly 300 ppm, and 410/300 ≈ 1.37).

What we see with Levin is confirmation bias at work: when exposed to several facts, people will pick and choose the data that best supports their current beliefs and ignore everything else. We see the same confirmation bias on the other end of the political spectrum, with people picking the data that sounds most dramatic. The reality is that while there is a scientific majority on one side of this argument, it is not yet a full consensus, as there are still gaps in the factual analysis that prevent an "irrefutable proof" one way or the other that a) the increase is provoked by humans and b) it will lead to catastrophe. I have my own reading on what to do despite this uncertainty, which I will elaborate on in the conclusion.

3 – External factors

I am borrowing this example from an article by Howard Wainer and Harris L. Zwerling, and expanding on it. I was also exposed to it in Daniel Kahneman’s book Thinking, Fast and Slow.

You are a data analyst at a leading private laboratory developing a new cancer treatment, and you have been tasked by leadership to understand the geographies with the highest incidence of kidney cancer. You do your research and present the following data map to the leaders.

Note: This scenario is purely fictitious, but the data is real, from the National Cancer Institute.

Incidence rate of kidney cancer per 100,000 vs. US population by county – Source: data by Howard Wainer and Harris L. Zwerling – visualization by Wissam Kahi

You show the chart to the executive leaders assembled around the table, and for the first minute or so nobody knows what to make of it. But then the CEO, who has a sharp eye, says: "I get it! The counties where the cancer incidence is highest are mostly rural, sparsely populated, and in the Midwest and the South. These places have higher poverty, a higher-fat diet, too much alcohol and tobacco, and no access to good medical care. It makes total sense!" And immediately everyone sees the pattern and nods in agreement.

If you nodded in agreement yourself, think again, because the conclusion is in fact false. The counties where the cancer incidence is lowest are also rural, sparsely populated, and in the Midwest, South and West … The chart below splits the counties into those with population below 50,000 and those above 50,000, and shows how they divide between low and high incidence of kidney cancer. As you can see, both groups contain plenty of low-incidence counties, and among the smaller counties there are roughly as many low-incidence as high-incidence ones.

Distribution of counties with low, moderate and high cancer incidence rates in the South, West and Mid-West – counties with population <50,000 vs. >50,000 – Source: data by Howard Wainer and Harris L. Zwerling – visualization by Wissam Kahi

The rational explanation here is the "law of small numbers": smaller sample populations show higher variance around the mean incidence rate. As a result, we are more likely to find extremes (both low and high) among the smaller samples (in this case, smaller counties), whereas the larger samples will tend to sit closer to the mean with less variation (indeed, you can see a higher percentage of larger counties in the "moderate" category).
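A short simulation illustrates this; the populations and the true rate below are invented. With a single true incidence rate and binomial sampling, the extreme observed rates cluster in the small counties purely through sampling noise.

```python
# Law of small numbers: one true rate, but small counties show wild extremes.
import numpy as np

rng = np.random.default_rng(3)
true_rate = 16 / 100_000          # same underlying incidence everywhere

small = rng.integers(1_000, 20_000, size=1_500)        # small-county populations
large = rng.integers(100_000, 2_000_000, size=1_500)   # large-county populations

for name, pops in [("small", small), ("large", large)]:
    cases = rng.binomial(pops, true_rate)              # observed cancer cases
    rates = cases / pops * 100_000                     # observed rate per 100k
    print(f"{name} counties: observed rates from {rates.min():.1f} "
          f"to {rates.max():.1f} per 100k (true rate = 16.0)")
# Small counties produce both the highest and the lowest rates — no causal story needed.
```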

This explanation is essentially randomness – there is no causal reason why smaller counties would have lower or higher incidence. But our minds do not like randomness and will always actively seek causal relationships (more on this in Daniel Kahneman’s book).

This is similar to the narrative fallacy above, but there is another effect of interest here: note that the explanation could have gone a completely different way. Some of the people in the room could have noticed that many small counties have a very low incidence of cancer and explained this by the benefits of the rural lifestyle, the lack of pollution, etc. That did not happen: the opinion shifted to one end, driven by the power dynamics and role modeling of those in the organization with moral authority (in this instance, the CEO). While this example has been deliberately exaggerated for the purpose of illustration, this happens in organizations extremely frequently, albeit in more subtle ways. These group dynamics can emerge, for example, from who has the stronger character or authority, or sometimes simply from who states an opinion first. A healthier debate in these instances could be encouraged by soliciting everyone’s opinion before engaging in discussion, or by having someone with deeper knowledge of statistics provide clearer charts.


V – Actions – Closing the loop

There was something that always bothered me in the "too hot / too cold" example that my friend used to illustrate "facts over opinions"; something felt off. It took me some time to figure out what, and in hindsight it was obvious: the analogy makes sense if the debate is just about knowing what the temperature is, like a scientist measuring a natural phenomenon; but if the debate is about whether people feel comfortable with the temperature, the analogy breaks. Knowing that the temperature was "objectively" 70 degrees did not diminish in any way the subjective feeling of cold or heat. After all, the only way to bring satisfaction was for one of them to stand up and adjust the thermostat, thereby bringing about a new reality. Said differently, the metaphor implied that the objective truth should always trump the subjective truth, and yet what ultimately leads to action often has little to do with the objective and everything to do with the subjective.

This "objective vs. subjective" is an age-old debate in many disciplines¹ What is relevant to this article though is that action is a result of the subjective – or collective subjectives – and these actions have an impact on reality. So the linear chain "reality → opinion → action" we described is in reality a closed loop.

It is worthwhile here to introduce the concept of reflexivity. So far we have been talking about the observer as an individual apprehending the data and facts through several lenses (observation, analysis, representation, interpretation …) while assuming that this individual is a mere passive observer of reality, like a scientist observing ants. Reflexivity posits that in many fields it is incorrect to consider the individual a purely passive observer, as the views he develops will actually have an impact on the situation he is observing – making the relationship reflexive. Said differently: reality is not independent of the perception of the participants – it can be shaped by this perception. We established in the previous chapters how two individuals can develop two very different perspectives when exposed to the same facts. I will explain here why this perception matters, especially when it becomes the reflection of a collective. Indeed, this is particularly relevant in fields related to human affairs, such as finance or the social sciences, where a collective of subjective observers are also participants in the objective reality they are observing.

Reflexivity was taken up as an issue in science in general by Karl Popper, who highlighted the influence of a prediction upon the event predicted, calling this the "Oedipus effect" in reference to the Greek tale in which the sequence of events fulfilling the Oracle’s prophecy is greatly influenced by the prophecy itself. George Soros (his pupil) later significantly expanded on the topic (check this article for a great summary). A quick example can be found in the financial markets:

Source: Fallibility, Reflexivity, and the Human Uncertainty Principle by George Soros

a) The participants (e.g. hedge funds) develop a biased view of the price of assets that differs from the facts (i.e. from what should be reflected by the market fundamentals according to the prevailing economic theories) – it is easy to see, based on the chapters above, why this perception can be biased. Soros calls this the cognitive function, and introduces the concept of fallibility to describe the human bias.

b) The participants then act upon their biased perception, and the market adjusts prices accordingly. This favors companies that have been more favorably priced, as they will have more access to capital, and penalizes companies unfairly biased downwards. To some extent this becomes a self-fulfilling prophecy. Soros calls this the manipulative function³.

c) The continuous interaction between the participants and reality reflects this cyclical effort to both understand and shape reality.

A very similar phenomenon applies to political surveys and polls: an "imperfect poll" published by CNN or Fox is not just a snapshot of the underlying sentiment – it also affects it!

Let us illustrate with a contemporary issue why this is also very relevant for our data-to-action framework: fake news and fake accounts have had a significant impact on a number of elections around the world. Beyond just affecting opinions, a number of these fake accounts created Facebook pages organizing demonstrations on a variety of controversial topics – often based on fake news – to which real co-hosts were unwittingly enlisted and went on to organize the events. "Fictional people" have organized real events, attended by real people whose emotions had been stoked by "fictional news"! To demonstrate further how the lines between reality and fiction can be blurred, imagine the following scenario:

In a certain town, a survey shows that 5% of people believe in issue X. A fake account creates a demonstration based on fake news to promote issue X and enlists real co-hosts who start organizing rallies in town. A new survey now shows 10% of the town behind issue X … which percentage (5% or 10%) would you say represents the real sentiment of the town on issue X? To be honest, I do not have a good answer myself – but the example illustrates how we cannot disentangle the dynamic closed loop: reality → opinion → action → reality.


Conclusion – applying all this to our current crisis?

So now that we have, at least conceptually, closed the loop, we have established that:

  1. Facts and observations are not to be trusted blindly – Nature may have chosen not to reveal all its cards yet
  2. Descriptors that summarize and analyze the facts are just a summarized view of reality – and summaries are imperfect
  3. Charts and representations of the descriptors can be manipulated by the "painter" to influence the interpretation
  4. People can have different interpretations of the same charts because of various biases (the narrative fallacy, values and beliefs, external group pressure, the law of small numbers …)
  5. The relationship between facts and actions is not just linear but circular, especially for phenomena involving human behavior and collective action (finance, social science …), further complicating the picture

So what are we to do? I want to conclude on a positive note. What I am driving towards is not to give up, but rather to be aware of the uncertainties along the flow and to keep the right dose of skepticism. This is what will continue to differentiate us humans from A.I.: an AI is objective by nature – unencumbered by values and cultural norms, it should be able to reach unbiased conclusions (e.g. based on probabilities of outcomes). Our imperfections are not a glitch in our software, but rather a feature of what makes us human.

There is a strong belief in certain powerful circles that "the answer is in the data". I hope that with the above illustrations I have contributed to dispelling that myth. As a data enthusiast myself, I do believe data is a very important piece of a larger puzzle, one where decision making needs to draw on many other disciplines: domain-specific expertise, but also philosophy, ethics and history. Even in the face of uncertainty, we can still choose our actions wisely.

I will now close the loop and illustrate with climate change again. While the facts and data are largely undisputed, there is no full consensus in the scientific community on the causal relationships (e.g. is this caused by human action?) and the forecasting models (i.e. how fast will the impact materialize?), as these are higher-order questions and less open to falsification. That said, I believe this uncertainty does imply a certain course of action: even if we assign a small likelihood to the pessimistic models being right, the precautionary principle implies that we cannot afford to do nothing, because the cost of failure is far too high.

Footnotes:

1 For example, I recently came across this passage in Killing Commendatore, Haruki Murakami’s latest book: "The objective does not necessarily surpass the subjective, you know. Reality does not necessarily extinguish fantasy."

2 National Oceanic and Atmospheric Administration

3 Reflexivity is inconsistent with general equilibrium theory, which stipulates that markets move towards equilibrium and that non-equilibrium fluctuations are merely random noise that will soon be corrected. In equilibrium theory, prices at long-run equilibrium reflect the underlying economic fundamentals, which are unaffected by prices. Reflexivity asserts that prices do in fact influence the fundamentals, and that this newly influenced set of fundamentals then proceeds to change expectations, thus influencing prices; the process continues in a self-reinforcing pattern. Because the pattern is self-reinforcing, markets tend towards disequilibrium.

