
Ten years ago, I embarked on my journey in the field of Data Science. I clearly remember the beginning of this adventure – my thoughts, my emotions, and the excitement that came with stepping into a new territory. It all started with a consultancy position at a large insurance company.
I was part of the Research and Development team, where our goals were – quite frankly – undefined. We hunted for data across the company so we could experiment, try something different, and innovate.
Neither I nor the company was truly ready for this. But looking back, I see how essential that step was. The company has since grown into a tech leader, and for me, those years were a playground of learning – exploring, studying, and mastering the tools and technologies of the time.
When I picture myself in that context, I see a young professional brimming with energy and a desire to innovate. I look back on all my daily failures with great affection and a hint of nostalgia because they shaped the professional I am today.
If I could speak to that younger version of myself, I would offer some advice to ease his journey. Interestingly, it would be the same advice I now give to my team members.
This article will summarize five things I have learned the hard way over these ten years.
Understanding Business Logic

It may sound surprising, but I still find people who fail to connect technical decisions to business objectives.
Understanding the nature of the problem is critical not just for building an algorithmic solution, but also for creating meaningful variables and evaluating their impact.
Once, I evaluated the impact of temperature data from a "sort of oven" on a food production process.
We were investigating a defect that caused a "sort of burning effect" on the product. What the business (and common sense) suggested was that abnormally high temperatures might be correlated with this defect.
However, our analysis revealed a significant spike in the burning effect when the temperature was lower than normal.
An inexperienced analyst might have been satisfied with finding such a strong predictive pattern. However, business logic told us to dig deeper. Together with the team, we opted to treat it – although formally predictive – as a spurious phenomenon induced by some factor in the production process that was unknown at the time.
By tracing back the possible causes, we discovered that the phenomenon was driven by an event that had remained exogenous until then: a preceding, untracked process step!
This insight allowed us to trace a new data source related to the exogenous event, leading to more predictive and interpretable variables.
I could mention countless examples like this. The lesson? Always anchor your analysis in business logic. This is what allows us to guide mathematics towards ever-rosier horizons.
If You’re Not Skilled, You Can’t Write Bad Code
You actually can, but please don’t.

There isn't always time to follow every programming best practice. Moving forward dirty and fast is often necessary in experimental contexts where time is tight. Focusing your effort on trying different solutions instead of chasing the "best" code can be a strategic advantage.
However, writing "bad code" is a privilege for the experienced data scientist.
If you are not yet a "pro" developer, it is safer to go slower and try to adhere to high-quality standards. This way, you minimize the risk of making gross errors during your experimentation.
Contradiction? Not really. Here is the thing: experimenting more gives you more opportunities in the next phases, but only under the assumption that you don't commit gross mistakes.
Evaluation errors during experiments can snowball into costly setbacks later – or worse, lead to unfeasible solutions that also damage your reputation.
As usual, we might then look for a trade-off. Would you indeed be happy if your plumber broke your entire bathroom only to tell you that the renovation you dreamed about cannot be done?
I needed a lot of time to learn to write good code, but it took me even longer to learn to write bad and useful code.
Bad and useless code, however, everyone knows how to write.
If You Don’t Want to Dive Deep into the Maths, Master the Intuitions

I fondly remember when I used to read scientific articles two at a time. Unfortunately, staying on the cutting edge is a full-time job.
I've read more useless papers than useful ones in my career. Many seemingly brilliant ideas have been forgotten over time; others surprisingly became very popular through subsequent studies.
Keeping up, therefore, is not always easy.
Not everyone can afford to study full-time; many people must study and update themselves while working.
For this reason, the technique I have identified over time is to focus on intuitions instead of on mathematical reproducibility.
I come from a very theoretical Italian university, which taught me that if you can't prove it, you don't know it. It was very difficult to limit myself to managing intuitions. I was inclined to delve deeply, to do the calculations by hand, but that was impossible with that volume of information.
I had to change the paradigm to: if you can’t summarize it (if you can’t draw it), it means you don’t know it.
In this way, I shifted from wanting to understand the HOW in detail to focusing on the WHY.
I'm not suggesting you completely neglect mathematics; I'm suggesting you choose where and when to delve into the mathematics, and master the intuitions first.
But please don’t just "use" things!
No Rules Rules

The human brain is presumptuous. It seeks preconceptions and creates prejudices to simplify the computational process. We are programmed this way; I am not the right person to talk to you about these things, but I assure you that just looking at the literature on the subject leads to this conclusion.
Why this introduction on preconceptions? Because people tend to simplify by building constructs and procedures to follow. Some examples?
"One hot encoding": I remove a category otherwise "it breaks". I see lots of decision trees with missing categories.
Missing values? "I impute the mean, the mode; I drop the rows." How many XGBoost models have I seen fed heaps of missings imputed in fanciful ways? XGBoost handles missing values natively – it's simply not necessary there!
Data on different scales? "I standardize, normalize, MinMaxScale." And the trees look at us, puzzled: their splits are invariant to monotonic scaling.
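To see why the trees are puzzled, here is a minimal pure-Python sketch (with hypothetical data and a hypothetical `best_stump_split` helper): a tree split is just a threshold comparison, so any monotonic rescaling of a feature leaves the induced partition of the data unchanged.

```python
def best_stump_split(xs, ys):
    """Return the threshold minimising misclassifications for a 1-feature stump."""
    best_t, best_err = None, float("inf")
    candidates = sorted(set(xs))
    for i in range(len(candidates) - 1):
        t = (candidates[i] + candidates[i + 1]) / 2  # midpoint between values
        # predict class 1 when x > t
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_err, best_t = err, t
    return best_t

xs = [120.0, 340.0, 95.0, 410.0, 560.0, 80.0]  # feature on a "raw" scale
ys = [0, 1, 0, 1, 1, 0]

t_raw = best_stump_split(xs, ys)

# Rescale the feature (standardisation-style affine transform) and refit.
xs_scaled = [(x - 250.0) / 100.0 for x in xs]
t_scaled = best_stump_split(xs_scaled, ys)

# The partition of the samples is identical either way:
left_raw = [x > t_raw for x in xs]
left_scaled = [x > t_scaled for x in xs_scaled]
print(left_raw == left_scaled)  # True – the scaling bought us nothing
```

The learned thresholds differ numerically, of course, but the two stumps separate exactly the same samples, which is why scaling features before tree-based models is wasted effort.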
All the young people in the field ask me the same question: "What should I do?"
The correct answer is the easiest one: it depends!
There are no golden rules; otherwise, there would be a _datascientist.py script applying them all, and I wouldn't need a team of people, just a way to scale that script.
Fortunately (as of today 🙂 – who knows for how much longer), there is still a need for someone to think, someone to evaluate. Every story, every problem, has its peculiarities. You need to get into the business logic, critically evaluate the problems, and find micro-solutions with creativity and competence.
So? What should I do? It depends!
Mathematics is the Simple Part

In my experience, data science projects rarely fail due to mathematical shortcomings.
Instead, they usually falter because of poor alignment with business needs or user expectations.
Is the solution addressing a real problem? Is it cost-effective? Is it user-friendly?
Let's consider a really silly case. Assume we have a problem that could be tackled with artificial intelligence, and suppose that the problem itself costs the company €10,000/year. What happens if your solution costs €50,000/year?
You keep the problem, since it isn't worth solving.
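A toy back-of-the-envelope check, using the hypothetical figures from the text:

```python
# What the problem costs the company if left unsolved (hypothetical figure).
problem_cost_per_year = 10_000
# What running the AI solution would cost (hypothetical figure).
solution_cost_per_year = 50_000

# Positive net value = worth doing; negative = the cure is worse than the disease.
net_value = problem_cost_per_year - solution_cost_per_year
print(net_value)  # -40000: deploying the solution destroys value
```

Trivial arithmetic, yet it is exactly the check that too many projects skip before the first line of code is written.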
Similarly, what about a brilliant solution that is, however, inconvenient for the user?
No one will use it!
Rarely in my career have I seen a project fail due to mathematics; many times, I have seen them fail due to careless behavior, and only a few for computational reasons.
Conclusion
Here we are. I hope these five tips can be useful to someone. In my daily life, I often find myself talking about them to refresh the memory of some forgetful person.
See you around.