Experimentation and Causal Inference

Updated on Feb-20, 2021
Data Scientists are aware of the caveat that ‘correlation does not imply causation.’ This month alone, Medium has recommended 20+ posts containing this catchphrase. Judging by their tone, these posts imply, or say outright, that:
Correlation is not as good as causality!
The bias toward causation in the data world is understandable, as it takes years of professional training to grasp causal reasoning. In contrast, correlational studies have a low barrier to entry.
On the application side, many business scenarios require causal insights, including pinpointing the target audience, iterating on future products, and generating actionable insights into customer behavior.
However, these are not reasons to reject correlational studies or treat them with less appreciation. Each type of work has its own wide range of applications.
Part 1: Concept 101
1. Correlation
In its simplest form, correlation means Events A and B happen together, without any causal claim. In other words, we don’t know whether A causes B or the other way around.
For example, suppose an online booking agency (e.g., Booking.com) just spent $1 million on a new web design launched on November 10th, and website traffic surged one week later. Event A (the new web design) and Event B (the traffic surge) are correlated, but we don’t know if there is a causal relationship.
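As a toy illustration (all numbers invented), correlation is easy to compute and says nothing about causation on its own:

```python
import numpy as np

# Hypothetical weekly figures for illustration only:
# marketing spend (in $1,000s) and website visits (in 1,000s) over 8 weeks.
spend = np.array([10, 12, 11, 15, 18, 20, 22, 25])
visits = np.array([200, 210, 205, 240, 260, 280, 300, 320])

# Pearson correlation coefficient: strength of the linear association.
r = np.corrcoef(spend, visits)[0, 1]
print(round(r, 3))  # strongly positive, yet silent on the direction of causation
```

A near-perfect correlation here is equally consistent with spend driving visits, visits driving spend, or a third factor (e.g., seasonality) driving both.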
UX Designers are waiting impatiently and want to claim the credit.
2. Causality
On top of being associated, causality adds two additional requirements: a time sequence and the absence of alternative explanations. In the above case, a causal claim that Event A causes Event B rests on two conditions. First, A happens before B. Second, there is no alternative explanation for the website traffic spike, which needs to be verified.
3. Alternative Explanations
Working with the Product team, Data Scientists have come up with three additional possible explanations for the traffic spike:
Hypothesis 1: Increased spending in digital marketing from the last three quarters finally pays off.
Hypothesis 2: Improved macro-economic conditions boost customers’ willingness to travel.
Hypothesis 3: It’s about the time of the year for travel. Customers start planning for family trips for Christmas.
Here lies the difference in focus between the two approaches: the school of correlation only tells us how strongly these events are related; it cannot determine whether a causal relationship exists, nor the direction of causation. The school of causation must address these puzzles.
To rule out the alternatives, Data Scientists choose one of these three paths: experimental, quasi-experimental, and observational designs.
Part 2: Three Causal Approaches
Solution 1: Experimental Designs
For the hardcore causal "inferencer," the Gold Standard of Causal Inference is a Randomized Controlled Trial, which randomly assigns research subjects to experimental conditions to eliminate bias, so any difference between groups can be attributed to the treatment.
In the above web traffic case, we can run an A/B test and randomly assign customers from different localities to receive the new design or the old version.
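A minimal sketch of such an A/B test, with simulated data and assumed conversion rates (10% baseline, 12% under the new design; both numbers are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical setup: 10,000 visitors randomly assigned to old vs. new design.
n = 10_000
assignment = rng.integers(0, 2, size=n)  # 0 = old design, 1 = new design

# Simulated conversion outcomes under the assumed rates.
converted = rng.random(n) < np.where(assignment == 1, 0.12, 0.10)

# Because assignment is random, the difference in means estimates the causal effect.
lift = converted[assignment == 1].mean() - converted[assignment == 0].mean()
t_stat, p_value = stats.ttest_ind(converted[assignment == 1].astype(float),
                                  converted[assignment == 0].astype(float))
print(f"observed lift: {lift:.3f}, p-value: {p_value:.4f}")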
However, the devil is in the details.
Technically speaking, random assignment should eliminate covariate bias and create balanced data distributions. In practice, it’s easy to introduce a spillover effect given social media’s penetration into people’s lives. We all post pictures of what we eat, what we wear, and what we do online, which may dilute or even invalidate the estimated causal effect.
In addition, Data Scientists should duly note the downsides of experiments, including:
- Time-Consuming: It takes a long time to run and collect data.
- Ethics: We can’t randomly assign people to smoke and track cancer rates. Facebook, for example, came under fire for its experiment manipulating users’ emotional content online.
- Threats to Validity. Spillover effects are hard to eliminate entirely.
- Financial Cost. How much money does it cost to recruit 1000+ people and conduct onsite experiments? For online experiments, the cost is also worth noting. Poorly designed web experiments can cost companies millions of dollars.
- Engineering and Data Manpower. Does your organization have the necessary engineering staffing to implement a full-fledged online experiment? Does a Power Analysis ring a bell? How do you reduce variance when the treatment effect is small? How do you find the optimal stopping time for online experiments?
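The power-analysis question above can be sketched with the standard two-sample sample-size formula; the effect size, alpha, and power below are illustrative conventional defaults:

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample z-test
    detecting a standardized effect size (Cohen's d)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the test
    z_beta = norm.ppf(power)           # quantile for the desired power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# A "small" effect (d = 0.2) already needs roughly 393 subjects per group.
print(ceil(sample_size_per_group(0.2)))
```

Note how the required sample size grows with the inverse square of the effect size, which is exactly why small treatment effects make experiments expensive.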
Solution 2: Quasi-Experimental Designs
For the reasons listed above, we cannot always run an RCT, and we may settle for a Quasi-Experimental Design. In this type of design, researchers do not have full control over random assignment and may have to deal with imbalanced covariate distributions.
In a sequence of blog posts, I’ve introduced multiple quasi-experimental methods to identify causation.
- Regression Discontinuity Design: The Crown Jewel of Causal Inference
- A Practitioner’s Guide To Difference-In-Differences Approach: Wage Goes Up, Employment Goes Down?
All of the quasi-methods (e.g., DID, RDD, ITS, etc.) share the same design idea: account for any prior differences between the treatment and control groups and find a way to rule them out. We can seek help from the time dimension (i.e., comparing today to yesterday; panel data) and the space dimension (i.e., comparing to other similar cases; cross-sectional data).
We have to check available resources and constraints before choosing the most appropriate method. If applied correctly, these quasi-methods can derive causal inferences nearly as credible as an RCT’s.
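The two-by-two difference-in-differences logic boils down to simple arithmetic; the group means below are invented for illustration:

```python
# Hypothetical average outcomes (e.g., weekly bookings per store).
# Groups: treated vs. control; periods: before vs. after the intervention.
treated_pre, treated_post = 100.0, 130.0
control_pre, control_post = 90.0, 105.0

# The control group's change (+15) proxies for what would have happened
# to the treated group anyway; the excess change is the treatment effect.
did_estimate = (treated_post - treated_pre) - (control_post - control_pre)
print(did_estimate)  # 15.0
```

The key (untestable) assumption is parallel trends: absent treatment, both groups would have changed by the same amount.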
Solution 3: Observational Design
The observational approach is the last resort. Researchers have no knowledge of the data-generating process and no control over treatment assignment. That’s why observational methods often generate imprecise and biased estimates.
For example, the Facebook Data Science team compared the performance of experimental and non-experimental approaches. They found that observational methods perform poorly in advertising measurement (see the original paper).
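A small simulation (with an invented confounder) shows why naive observational comparisons mislead: here ad exposure has zero true effect on purchases, yet the naive estimate is far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: high-intent users both see more ads and buy more.
intent = rng.random(n)
saw_ad = rng.random(n) < intent              # ad exposure rises with intent
bought = rng.random(n) < 0.1 + 0.3 * intent  # purchases ignore the ad entirely

# Naive observational comparison: buyers among exposed vs. unexposed users.
naive = bought[saw_ad].mean() - bought[~saw_ad].mean()
print(f"naive observational 'effect': {naive:.3f} (true effect: 0)")
```

The entire gap is driven by the confounder; without knowing the data-generating process, an analyst would wrongly credit the ads.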
Part 3: Business Partners
Running experiments is expensive, and the observational approach is not reliable.
What should business folks do?
As always, I’d recommend the following iteration steps:
- Run a mini-experiment.
- Keep it going for a while and collect some preliminary findings.
- Look out for any updates: something new and different, or the same old?
- If a discrepancy occurs, adapt the workflow, e.g., go back to your business questions, collect new variables/data, etc.
- If nothing changed over the time frame, congrats: we just tested our hypothesis without running an A/B test on 10 million+ customers.
Always go back to our business questions/hypotheses to validate our models.
Always keep iterating over the workflow.
This way, we have immediately usable findings that help the product team get started, with minor calibrations if needed.
Big tech companies have incorporated experimental thinking into their business strategies and product development pipelines (check out this excellent work from Facebook, Netflix, and Airbnb).
Part 4: Which Is More Important?
Voice 1: Why Causality?
Causal inference brings benefits for today and tomorrow.
Causality research shows how users engage with our products and quantifies that engagement, producing actionable insights for today.
Human beings change their behaviors over time, and businesses must adapt alongside them. Longitudinal causal work helps us track such changes, predicting future trends for tomorrow.
Voice 2: Why Correlation?
Correlational research has a broader market with more business scenarios because it requires fewer "picky" statistical assumptions.
For example, big retail companies arrange store layouts and put similar products together.
As far as I know, Target, Walmart, and Costco rearrange store layouts following associational analyses.
You may have heard of the Diaper-Beer Syndrome: new dads grab a cold one after shopping for diapers for their newborns on the way out of the store. So, businesses put Pampers and Bud Light nearby to bundle the sale.
Honestly, shopping is way too heavy-duty for men.
The D-B Syndrome is a business scenario in which we care about WHAT products sell together and less about WHY.
Some items may be correlated for good reasons, but more often for no reason at all. They simply are, and it’s OK not to know why. A strong correlation is good enough.
Things are related for a reason.
Things are related for no reason at all.
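The Diaper-Beer example can be sketched as a basic association analysis over invented transactions; support, confidence, and lift are the standard market-basket metrics:

```python
# Tiny, hypothetical transaction log (each set = one shopping basket).
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer"},
    {"diapers", "beer", "chips"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]
n = len(transactions)

support_d = sum("diapers" in t for t in transactions) / n              # P(diapers)
support_b = sum("beer" in t for t in transactions) / n                 # P(beer)
support_db = sum({"diapers", "beer"} <= t for t in transactions) / n   # P(both)

confidence = support_db / support_d          # P(beer | diapers)
lift = support_db / (support_d * support_b)  # > 1 means positive association
print(round(confidence, 3), round(lift, 3))  # 0.75 1.125
```

A lift above 1 says the two items co-occur more often than independence would predict; it answers WHAT sells together without touching WHY.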
Voice 3: When and How To Use?
For Causality:
- Why do customers only browse the product catalog but never complete the transaction on Walmart’s website?
- How would the new web design affect customer retention and satisfaction?
- Why do users disengage with the product?
- Why do customers in emerging markets shop only offline and not online?
- For all other questions related to whys and hows.
For Correlation:
- What other products sell together besides Pampers and Bud Light?
- Where to put a food court at a Costco store?
- Where to open another Starbucks, another Amazon warehouse?
- Life Science. Doctors don’t fully understand how certain diseases develop and rely on associated signs and symptoms for diagnosis.
- Personalized Recommendation Systems. Amazon adopts an item-to-item collaborative filtering system. It analyzes past browsing/purchasing history and recommends associated merchandise to customers.
- For other millions of questions that don’t need to know whys and hows, correlation design is preferred.
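As a sketch of item-to-item collaborative filtering (not Amazon’s actual implementation; the purchase matrix is invented), item similarity can be computed directly from a toy user-item matrix:

```python
import numpy as np

# Hypothetical user-item purchase matrix (rows: users, cols: items).
# Items: 0 = diapers, 1 = beer, 2 = chips, 3 = formula
R = np.array([
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between item columns of a user-item matrix."""
    norms = np.linalg.norm(R, axis=0)
    return (R.T @ R) / np.outer(norms, norms)

sim = item_similarity(R)
# To recommend alongside item 0 (diapers), rank the other items by similarity.
ranked = np.argsort(-sim[0])
ranked = ranked[ranked != 0]
print(ranked)  # diapers pairs most strongly with formula (item 3)
```

Note that the method never asks why items co-occur; it only exploits the correlation structure of past behavior.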
Takeaways
Instead of asking which one is more important, the real questions we should be asking are:
- What are the pros and cons of each school?
- What available information do we have? What constraints are we facing?
- How and when to adopt each one?