In a world constantly embracing new ideas and capabilities, data scientists, feeding ever-larger datasets into increasingly complex algorithms, can seem more mystical than ever in how they arrive at their predictions.
Working as a data scientist whose main deliverables are product demand forecasts, I couldn’t help but wonder: are we just modern-day fortune tellers, blessing businesses with wise decisions through mysterious but powerful magic?
This post takes a peek into the glamorous yet comprehensible world of predicting the future, based on my experience and reflections as a data scientist who consistently strives to earn trust in my predictions, even from myself.
I have never regarded myself as a fortune teller, maybe because I always believed the science behind my work would distinguish it from the ‘black-box’ magic of fortune-telling. My model makes sense because we have ample features, trustworthy algorithms, and decent historical performance.
I used to think that was enough of an explanation, even when facing the technical gap between business decision-makers and model builders. Models built with the most cutting-edge buzzwords and unfathomable amounts of data always evoke mixed feelings in stakeholders.
On the one hand, such models infuse so much "science" into the process, true science or not. The sheer effort behind the analysis offers a sense of reassurance, particularly in uncertain times, when traditional methods of making simple predictions based solely on trends and seasonal patterns no longer work.
On the other hand, the predictions made by the most advanced models are often too complex to comprehend and interpret. Such "black box" magic is no clearer than asking a fortune teller about your chances of getting married or becoming rich by a certain age.

If I have gained anything beyond the hard skills required to derive demand forecasts during these two-plus years in my role, it’s the conviction that we data scientists really should own the entire process.
The work does not end when you land a decent-looking model and submit your forecasts downstream. Rather, it extends much further, until you’ve earned the trust of stakeholders, even when your model forecasts something very different from what they expect.
When your model’s predictions differ from stakeholders’ expectations, recency bias can be a major obstacle to convincing them that the model is trustworthy.
According to Wikipedia, recency bias is a cognitive bias that gives "greater importance to the most recent event." To see why it matters here, consider the business benefits of product demand forecasts: inventory management, manufacturing planning, revenue outlook, and so on. Forecasting helps businesses foresee the future and make informed decisions in the present.

Good practice for product demand forecasting before the big data era was to plan based on recent events. Last month’s sales might already provide enough useful information to plan the next few months’ inventories.
If you add a bit of seasonality, a growth rate capturing the sales trend, a straightforward analysis of the customer base, and some forward-looking information on future sales, your accuracy in a stable environment can easily reach the 80%-90% range.
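To make this concrete, here is a minimal sketch of such a recency-based baseline, assuming a monthly sales history stored in a pandas Series indexed by month-start dates (the function name, the growth window, and the toy data are my own illustrative choices):

```python
import pandas as pd

def naive_seasonal_forecast(sales: pd.Series, horizon: int = 3,
                            season_length: int = 12) -> pd.Series:
    """Recency-based baseline: repeat last year's same months,
    scaled by the recent year-over-year growth rate.
    Assumes horizon < season_length."""
    # Year-over-year growth estimated from the trailing `horizon` months.
    recent = sales.iloc[-horizon:].sum()
    year_ago = sales.iloc[-horizon - season_length:-season_length].sum()
    growth = recent / year_ago if year_ago else 1.0

    # The same months one year ago, scaled by the growth rate.
    seasonal = sales.iloc[-season_length:-season_length + horizon].to_numpy()
    future_index = pd.date_range(sales.index[-1], periods=horizon + 1,
                                 freq="MS")[1:]
    return pd.Series(seasonal * growth, index=future_index)

# Example: three years of monthly sales with a December spike and a trend.
idx = pd.date_range("2021-01-01", periods=36, freq="MS")
sales = pd.Series(100 + 10 * (idx.month == 12) + range(36), index=idx)
print(naive_seasonal_forecast(sales))
```

Simple as it is, a baseline like this encodes recency bias by construction: it assumes the next few months will look like the same months last year, adjusted by the most recent growth.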
It’s very hard to fight recency bias because there is little motivation to do so when it works so well most of the time. Now that the era of big data has arrived, everyone has heard about machine learning, deep learning, and LLMs, and no one minds a second opinion from these scientific but hard-to-comprehend models, especially when their forecasts align with expectations.
After all, who would mind putting a cherry on top, especially when everyone is talking about that cherry as the most advanced innovation?

However, we all know what happened over the last couple of years. The worldwide economy experienced massive turbulence, which probably reshaped everyone’s beliefs about what could happen in business.
Recency-biased and correlation-based models stopped working. No matter how sophisticated the algorithm, it is very hard to make reasonable forecasts without enough training data points from the new reality.
The black swans not only brought uncertainty but also redefined what counts as normal. Data scientists and stakeholders alike scratched their heads, agreeing that we were in a tough, unpredictable environment and needed to lower our expectations of forecasting accuracy.
Is the environment truly unpredictable, or have we simply failed to build a predictive model that is robust to uncertainty, a model that has learned the true relationships?
I’ll save the argument for why models emphasizing causality are superior to correlation-based models for my Read with Me article series, and here is a fun read that kick-starts the series:
Anyway, the story goes on. Once causal models have proven their stability and robustness through uncertain times, what happens next? The whole economy seems to have stabilized again, and by stabilized I mean that the recency-biased and correlation-based models have started working well again. We are reaching the point where your business analysis tells you sales will go down while the causal model says otherwise; whom would you trust?
What a long introduction, but that’s the challenge I encountered recently: how do you continuously establish your models’ credibility when they predict against everyone’s expectations?

To make others trust your model, you have to trust it yourself first. That trust does not come from adopting algorithms that have demonstrated superiority in other use cases, or from interactive dashboards showing excellent historical performance across various metrics.
Don’t get me wrong. These are necessary, but they are the natural outcome of a predictive, sensible model rather than its founding stone. The founding stone is the why: whether you can explain your model’s forecasts and build a convincing storyline makes all the difference.
In traditional correlation-based models, there is a trade-off between complexity and explainability. The more hidden layers you embed, the harder it becomes to explain why the model makes the forecasts it does, even when it exceeds expectations on every performance metric.
Causal models don’t really face this trade-off, because they are built on answering the question of why, and that question is embedded in every step of model building:
- For feature engineering, we run causal discovery algorithms to identify causal features and establish causal diagrams.
- For model discovery, we choose the model not only with the best fit but also with reasonable edge functions that capture correct, explainable relationships between each feature and the target, and among the features themselves.
- For future forecasts, we can clearly explain why a forecast is heading in a certain direction, what the main driver is, and whether it makes sense to everyone.
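As an illustration of what these steps can look like in practice, here is a minimal sketch using the open-source DoWhy library on a toy dataset. The variables (season, discount, demand), the hand-specified graph, and the linear estimator are illustrative assumptions on my part, not a real production pipeline:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data: the season drives both discounting and demand, so the raw
# correlation between discount and demand is confounded.
rng = np.random.default_rng(0)
n = 500
season = rng.normal(size=n)
discount = 0.5 * season + rng.normal(size=n)
demand = 2.0 * discount + 1.5 * season + rng.normal(size=n)
df = pd.DataFrame({"season": season, "discount": discount, "demand": demand})

# Causal diagram in GML format (hand-specified here; in practice it can
# come from causal discovery algorithms plus domain expertise).
graph = """graph [directed 1
  node [id "season" label "season"]
  node [id "discount" label "discount"]
  node [id "demand" label "demand"]
  edge [source "season" target "discount"]
  edge [source "season" target "demand"]
  edge [source "discount" target "demand"]
]"""

model = CausalModel(data=df, treatment="discount", outcome="demand", graph=graph)

# Identification answers the "why": which confounders (season) must be
# adjusted for before the effect estimate can be trusted.
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand,
                                 method_name="backdoor.linear_regression")
print(f"Effect of discount on demand: {estimate.value:.2f}")  # ~2.0

# A simple robustness check: the estimate should survive refutation.
print(model.refute_estimate(estimand, estimate,
                            method_name="data_subset_refuter"))
```

The point is not the numbers but the artifacts: the graph, the estimand, and the refutation are all things you can put in front of stakeholders and defend, which is exactly what a black-box forecaster cannot offer.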

Metrics stay in the background. We need these numbers to build common ground between us and the stakeholders. However, we don’t use them to fine-tune or modify the models, because performance is not the primary deciding factor here. If the features used don’t really make sense, we may find the best-fit model breaking down in the next forecasting period.
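To make "common ground" concrete: one typical shared number in demand forecasting is a weighted MAPE. Here is a minimal sketch with made-up figures; the metric choice and the numbers are illustrative:

```python
import numpy as np

def wmape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Weighted MAPE: total absolute error relative to total actual demand.
    More stable than plain MAPE when some periods have near-zero sales."""
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()

# Comparing held-out months against forecasts (illustrative numbers).
actual = np.array([120.0, 95.0, 130.0, 80.0])
forecast = np.array([110.0, 100.0, 125.0, 90.0])
print(f"WMAPE: {wmape(actual, forecast):.1%}")  # -> WMAPE: 7.1%
```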
Explainability, rather than model performance, is the only way to fight recency bias. Your model should perform well enough to be brought to the front of the stage, but you need a storyline accompanying your forecasts to fully win your audience’s attention and trust, especially when they are expecting otherwise.
Solving a forecasting problem is not about building the best-performing model or using the most cutting-edge technology. It is about uncovering the causal structure of an ever-more-connected world and using it to guide future actions. Fortune tellers have their secret magic; we data scientists have causal AI. Moreover, once we discover the causal diagrams, we can do what fortune tellers can’t: explain the magic and answer the question of why.

However, no one said uncovering the causal structure would be easy. It demands much more expert intelligence and human intervention, but we really do need to guide our AIs toward making human-like decisions. It’s the only way.
They say that when an obstacle is placed in ants’ familiar route while they are carrying items, they panic and scurry around, never realizing that a small detour would lead them home. From our almost god-like vantage point, we watch these tiny ants trapped by obstacles, unable to overcome their recency bias, just as we watch laboratory mice endure electric shocks for non-existent rewards simply because they had tasted those rewards repeatedly in the past.
I wonder if there exists a higher-dimensional being observing humanity with the same detached clarity. Do they watch us as we make predictions and navigate through our lives, much like the ants searching for their paths or laboratory mice seeking rewards? Perhaps to them, our actions and decisions are as transparent and predictable as those of the ants and mice we so curiously study.
If there are higher-dimensional beings, the real fortune tellers, they must have mastered causal diagrams and sorted out all the causal relationships.
Thanks for reading. If you like this article, don’t forget to:
- Check out my recent articles about continuous learning in data science; the seven principles I follow to be a better data scientist; and how I became a data scientist;
- Check out my other articles on different topics, like data science interview preparation and causal inference;
- Subscribe to my email list;
- Or follow me on YouTube and watch my most recent video: