10 Strategies to Boost your Impact as a Data Scientist

Experiences from academia, freelancing, startups and large corporations

Dennis Eilers
Towards Data Science
15 min read · Jun 24, 2021

Photo by Bill Jelen on Unsplash

Like many other professionals in the field of Data Science, I am one of those people who derive their intrinsic motivation from the feeling of doing something relevant and valuable. I want to make a difference with my work and move things forward. However, at the beginning of my career, when aspiration met reality, I quickly realized that my technical knowledge alone would only be of limited help here. So I’ve summarized my experiences into 10 strategies that have helped me generate more impact with my work as a Data Scientist over the past few years.

Note: The 10 points follow a certain logical order based on a Data Science product lifecycle and are not sorted by importance or relevance. The collection is mainly based on experiences from my scientific career, various consulting jobs as a freelancer, working in a startup, and my current position in a large e-commerce company. My remarks therefore make no claim to completeness or general validity. However, I hope to have gathered some useful food for thought for as many people as possible and would love to hear about your experiences on this topic!

1. Create and Select the Right Ideas

Good ideas are, of course, the basic building block for making a meaningful contribution later in the process. However, I would argue that this point is much simpler and much less important than one might initially think.

Why? Because there are so many tools and methods for generating great ideas that the complexity lies more in their selection and implementation. Almost everyone will know the feeling of having the next one-billion-dollar idea while sitting in the bar with friends or just standing in the shower. But let’s be honest, this usually does not lead to real impact. That’s why I’m focusing here on some hands-on tips that will turn you from a dreamer into a doer. ;-)

  • Do more than your job description says: Ever thought about working as a Data Scientist in the warehouse, or interning in your company’s call center? In my experience, the probability of generating relevant ideas at your desk while reading a paper is significantly lower than when you feel the pain of your colleagues yourself.
  • Look for low-hanging fruit: The greatest impact is often not achieved by a complex new system with many moving parts, but by a simple solution to a concrete, self-contained problem. If the problem is too multifaceted, try to break it down into its core elements and do not immediately chase the big idea as a whole.
  • Think in terms of products and not in terms of ad-hoc analyses: Even if an ad-hoc analysis can deliver a lot of value, that value is usually short-lived. For a truly lasting impact, your analysis or model should, if possible, become part of a product so that it keeps generating value over the long term without much additional effort.
  • Focus on one idea at a time: Try not to chase every temptation immediately when you are working on a promising idea.

Photo by Andrew Neel on Unsplash

2. Follow a Customer-Centric Approach

What exactly does it mean to have impact? I don’t define impact as an end in itself, in order to see my analyses flow into as many management decisions as possible, or to sit on every steering committee. For me, impact is when a customer gets the most value possible. The term customer here includes both end customers of my company’s products or services and internal customers, who in turn can produce greater value for the end customer through the use of my work.

Putting the customer at the center facilitates the daily work process and helps with questions like “what do I do”, “how do I do it” and, in difficult phases, “why do I do it”.

In order to prove the importance of one’s own work to oneself and others, it is necessary to define at an early stage how the impact can be measured. This question will always be of crucial importance in the following steps and therefore it is important to think about it carefully. But there is good news: As a Data Scientist, you are perfectly suited to think about key figures, the necessary data procurement and evaluation methods. ;-)

But beware, typical Data Science methods for measuring model quality can quickly lead you astray when it comes to measuring impact. Let’s take a sales forecast for a manufacturing company as an example. Suppose your task is to predict as accurately as possible how much of certain products will be sold at certain times. You might now have the idea of using the deviation between your prediction and reality as a measure of success. This value is important, especially for optimizing a model, but it must not be confused with the real impact of the developed solution.

Measuring impact requires different questions, such as “can we now better meet our delivery commitments to the customer?” or “can we reduce inventory costs through more accurate purchases?”. And by the way, can you roughly estimate how costly the whole implementation of a possible solution will be?

These questions are customer-oriented and may even lead directly to the conclusion that the added value of even a perfect sales forecast would not be as great as one might have suspected at the beginning of the idea generation. Perhaps the impact of a model that predicts the best time to purchase raw materials is much greater in this case?
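The gap between model quality and impact can be illustrated with a small sketch. The demand numbers, the cost parameters (`holding_cost`, `stockout_cost`) and the function names are made up for illustration, and the sketch assumes that exactly the forecast quantity is stocked each period; a real cost model would have to come from the domain.

```python
from statistics import mean

def mean_absolute_error(actual, forecast):
    """Classic model metric: average absolute deviation per period."""
    return mean(abs(a - f) for a, f in zip(actual, forecast))

def inventory_cost(actual, forecast, holding_cost=1.0, stockout_cost=5.0):
    """Hypothetical business metric: cost of stocking exactly the forecast.

    Overstock is charged holding_cost per unit, unmet demand
    stockout_cost per unit (both values are illustrative assumptions).
    """
    return sum(
        (f - a) * holding_cost if f >= a else (a - f) * stockout_cost
        for a, f in zip(actual, forecast)
    )

actual = [100, 120, 90, 110]
forecast_a = [95, 115, 85, 105]   # always 5 units too low
forecast_b = [108, 128, 98, 118]  # always 8 units too high

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name, mean_absolute_error(actual, forecast), inventory_cost(actual, forecast))
```

Under these assumed costs, forecast A has the lower error (5 vs. 8 units) but the higher cost (100 vs. 32), because running out of stock is penalized more heavily than overstocking. The "better" model by the usual metric is the worse one for the customer.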

So, to increase your impact you need to be clear about what you are optimizing for and what it brings to your customer. To do this, it is crucial to know the domain inside out. Optimize for the customer, not for any model metric. Think about meaningful metrics that reflect this benefit as well as possible and develop a feeling for “what is good enough” early on. Being significantly better on a metric does not necessarily mean more impact. This will help you later on not to over-optimize or even optimize the completely wrong metric.

A question for everyday life:

Would you as a customer pay for your analysis/product? If yes, how much and why?

Photo by Austin Distel on Unsplash

3. Try to Disprove Yourself to Gather Confidence

The goal of many companies today is to act data-driven to avoid gut decisions or unnecessary discussions. This should also be the goal of your work as a Data Scientist. Most of us have certainly had the feeling of holding on to something irrationally because it was our own idea or because our own ego wanted to get its way. As human as this feeling is, it is harmful to act on it, because your time is subject to opportunity costs. If you work on a low-impact project, you won’t have time for another high-impact project.

Your goal in your daily work must be to become a data-driven person yourself (mind you, only in terms of your decisions at work). If your metrics, which you have defined beforehand, clearly show that you are not achieving the desired impact with your approach (anymore), then a personal attachment to the idea should never stop you from discarding it. On the contrary, be happy that you have recognized early on that your work has more impact elsewhere.

Killing your darlings is crucial.

It sounds easier than it is. Go one step further and actively try to disprove your idea, or the impact of your idea, again and again based on your self-defined objective criteria. Are there exploratory analyses or existing, similar implementations that invalidate the impact of your idea? Use everything you can think of. This procedure, known from science, of trying to disprove theories again and again through experiments and experience should become second nature. After all, the word scientist is in your job description… ;-)

But be careful! Do not become a victim of impostor syndrome. Don’t artificially denigrate yourself and your ideas or be afraid to defend your point of view. On the contrary, the data-driven approach and the continuous attempt to disprove your ideas serve to massively increase your confidence and conviction in your work (if it is worth it). Making data-driven decisions means neither over- nor underestimating your ideas but arriving at a realistic perception of your impact.

For everyday life:

Just be honest with yourself.

Photo by Jakayla Toney on Unsplash

4. Avoid Overengineering

We probably would not have become Data Scientists if we weren’t at least a little bit in love with technology. Personally, I found it fascinating how computers, without direct programming of rules, suddenly make decisions based on patterns from the past. In my view, this love of technology should never fall by the wayside and certainly not be something to be ashamed of.

But one should always be aware of this trait and use it at the right time and to the right extent. When it comes to maximizing the impact of an idea, especially at the beginning, this way of thinking is usually a hindrance. In fact, your work should rather be driven by questions like “is Data Science even necessary for this?” or “what is the simplest solution for the given problem?”.

Maybe the identified problem is indeed highly relevant, but could be solved by a simple adaptation in the frontend of the already existing application? If a prediction model is to be developed, does it really need a deep learning approach or is a linear regression with comprehensible parameters already sufficient? If simpler approaches also solve the problem, they generally have a higher impact because they can be implemented more quickly, are less prone to error, and are better understood by other stakeholders.
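To make the "linear regression with comprehensible parameters" concrete, here is a minimal sketch of a one-feature least-squares fit in plain Python. The data and the names `ad_spend`, `units_sold` and `fit_line` are invented for illustration; in practice a library implementation would usually be used instead.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_mean, y_mean = mean(x), mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Made-up example: weekly ad spend (in thousands) vs. units sold (in hundreds).
ad_spend = [1, 2, 3, 4, 5]
units_sold = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, units_sold)
print(f"each extra unit of spend adds about {slope:.2f} units sold "
      f"(baseline {intercept:.2f})")
```

The two fitted numbers can be explained to a stakeholder in one sentence, which is exactly the advantage over a more complex model if the simple fit is already good enough.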

To be able to make such assessments, a broad technical understanding beyond the Data Science cosmos is necessary. Only those who have broad knowledge can assess what the best solution is or, if necessary, combine approaches from different areas. Therefore, look beyond your own field as often as possible, for example into software engineering and DevOps.

For everyday life:

Don’t be proud of using complex machine learning models, be proud of not needing them.

Photo by Nicolas Thomas on Unsplash

5. Deliver Fast, Deliver Small, Fail Early

This is the main insight from my time in a startup. A good idea is too good to spend a year developing it in the basement while completely ignoring the customer’s needs. This also applies to your work as a Data Scientist. Let’s say you have an idea that requires the use of a predictive model, for example. If you are convinced of your idea based on your data, then look for the fastest way to get into implementation and build a first minimum viable product (we’ll talk about the proof-of-concept trap later ;-)).

For this, it is crucial to know your toolbox as a Data Scientist inside out. With Google AutoML, for example, you might be able to achieve an acceptable result with minimal effort in order to further validate your assumptions about the impact based on your metrics from strategy 2. You don’t have to start with hyperparameter tuning to get a feel for the potential impact and demonstrate the value proposition to yourself and others.

The frameworks used in development also play an important role here. If you know how to generate an initial sample of data from multiple sources, how to create a confusion matrix with one line of Python, and how to put columns into the right format in a few simple steps, you can achieve meaningful and comprehensible results in the shortest possible time. Every for-loop should already be a warning signal for unnecessary complexity. Simple and fast implementations in particular require a high level of expertise and experience. For everyday use, basically assume that the hard problems are already solved by a framework.
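As a rough illustration of the one-line confusion matrix and the column formatting mentioned above, here is a sketch that uses only the Python standard library. The labels and rows are made up; with pandas or scikit-learn installed, `pandas.crosstab` or `sklearn.metrics.confusion_matrix` serve the same purpose and are probably closer to what one would use in a real project.

```python
from collections import Counter
from datetime import date

y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "spam", "spam", "ham", "ham"]

# Confusion matrix in one line: counts of (actual, predicted) pairs.
confusion = Counter(zip(y_true, y_pred))
print(confusion)

# Bring string columns into the right types without an explicit for-loop.
raw_rows = [
    {"day": "2021-06-24", "units": "12"},
    {"day": "2021-06-25", "units": "7"},
]
clean_rows = [
    {"day": date.fromisoformat(row["day"]), "units": int(row["units"])}
    for row in raw_rows
]
print(clean_rows)
```

Each key of `confusion` is an `(actual, predicted)` pair, so `confusion[("spam", "ham")]` is the number of spam messages that were missed.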

Seniority is not necessarily characterized by mastering complex problems, but by simplicity and comprehensibility of the implementation.

And very important: Get feedback from your customers and/or colleagues as early as possible. This constantly ensures that development is going in the right direction. Keep your customers in the loop and communicate a lot to understand what they really need. And again, if you realize you’re riding a dead horse, get off as quickly as possible. Failure is not the exception, it’s the rule.

Photo by Jon Tyson on Unsplash

6. Storytelling

The classic among Data Science advice… I have long thought about whether I should include this point in my collection at all, because it is so obvious, has been described so often and is completely overrated from my point of view.

But how do I arrive at this perhaps somewhat provocative assessment? Everyone knows this feeling that you are so fascinated or excited about something that you can hardly stop talking about it. And everyone has also had the feeling that they have to make something up out of thin air for a boring seminar paper at university or similar. What I want to say:

It is extremely difficult to tell a good story badly and it is damn hard to tell a bad story well.

So, when you get to the point of presenting your idea, your analysis, your data-driven product, notice whether it comes easily from your lips and the graphics just shine with clarity and unambiguity, or whether you unconsciously try to present something more beautifully than it is. I am convinced that you cannot really present a great idea or a precise analysis in a bad way.

There’s no doubt that you can add the cherry on top of the cake, but others have already thought about that enough. Content is and always will be king. Never use storytelling to blind yourself and others. Be honest with yourself here, too.

Photo by Lukas Blazek on Unsplash

7. Be a Team Player and a Visionary

I cannot emphasize this point enough. Each of us only has 24 hours in a day. That doesn’t scale and is far too little to have a really big impact. But what does that mean in concrete terms? Is it enough to work with your team on the ideas? From my point of view, for a really good idea you need real comrades-in-arms.

You need a data engineer who is as passionate about the topic as you are and who will search even the last legacy database for useful input factors. You need a frontend developer who has internalized the topic so much that she can build a suitable interface for a presentation without much explanation. And depending on the size of the company, you need a strong network of colleagues and potential internal customers who can help you get the idea out there and give you honest feedback early on.

Impact is not created at the desk of a single person.

You can see where this is going. I would call it leadership by vision. You don’t need a formal title to confirm that you’re in charge. You simply light the fire in your colleagues for great ideas. The more convinced you are, the more easily the fire will spread and the more real potential your idea will have to scale. To do this, always fall back on your metrics to see if the course is still the right one. This increases your conviction and therefore the confidence of others that they are working on something big.

And don’t get hung up on the concrete implementation and the exact building blocks of your idea. Surely most of us have experienced this as well: You want to implement your idea the way you imagine it. However, this is not necessarily advisable, because too rigid a framework inhibits the enthusiasm of others to actively participate in the idea and thus weakens the result. All participants should be guided exclusively by the desired impact. The concrete realization is secondary. Therefore, appreciate every idea and contribution if it serves the overall goal.

A note for everyday life:

Arguments and emotions are not necessarily a bad thing, but can also be a sign of commitment and the feeling of working on a great cause that is worth fighting for.

Photo by Rahul Bhosale on Unsplash

8. Focus on Execution and Automation

Ideas without implementation are worthless. Implementation can mean presenting an ad-hoc analysis neatly with the help of Jupyter Notebooks. However, if you are focused on generating sustainable impact in the form of Data Science products, entirely new questions arise with regard to implementation that go beyond the usual day-to-day business of a Data Scientist. Keywords such as MLOps or classic DevOps as well as software engineering principles play a role here.

There is a seemingly endless debate about whether Data Scientists should be concerned with the overall operations of models and software, or whether they should rather focus on their core tasks where they can make the biggest impact. Personally, I am in favor of Data Scientists dealing intensively with the deployment and embedding of their models into production systems, or at least having a good understanding of it.

A complete decoupling of model development and production use leads to results that look great under laboratory conditions but are difficult to integrate into existing target systems in reality, or cannot meet certain service-level agreements such as maximum response times. As a Data Scientist, it can therefore make sense to deal with questions of model efficiency in addition to the actual model optimization. How, for example, can you reduce the model size and thus the necessary response time without impairing the model quality?
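A first step towards such efficiency questions is simply measuring response times against the agreed limit. The following sketch times a stand-in prediction function and compares the 95th percentile with an assumed budget. `dummy_predict` and `SLA_MS` are hypothetical placeholders, and the measurement covers only in-process compute time, not network or serialization overhead.

```python
import math
import time

def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list (q between 1 and 100)."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]

def measure_latencies_ms(predict, inputs):
    """Time each single prediction call in milliseconds."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def dummy_predict(features):
    """Hypothetical stand-in for a real model's predict function."""
    return sum(w * f for w, f in zip((0.2, 0.5, 0.3), features))

SLA_MS = 50  # assumed maximum response time, not taken from a real agreement

latencies = measure_latencies_ms(dummy_predict, [(1.0, 2.0, 3.0)] * 200)
p95 = percentile(latencies, 95)
print(f"p95 latency: {p95:.4f} ms, within SLA: {p95 <= SLA_MS}")
```

Running the same check before and after a model-size reduction, alongside the quality metric, shows whether the trade-off is worth it.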

Depending on the size of the company, you may have no choice but to familiarize yourself with the necessary technologies for operational use in order to achieve a real impact of your own models.

Without reliable operation, an idea or initial design remains in the proof-of-concept trap and the impact fizzles out because no value can be delivered to the customer.

Therefore, try to find out early on where your idea can plug into the existing IT landscape and which requirements it has to meet to do so. Not every application can wait for the response of a REST API and not every application can import a scikit-learn pipeline directly. If you know the conditions, you can develop your code, model, and infrastructure early and specifically for them, increasing the chances of smooth operation.

Photo by tian kuan on Unsplash

9. Stay True to your Product

As Data Scientists, we tend to always want to keep up with the latest technical developments. To a certain extent, this is indeed necessary and useful. However, it must not lead to a situation where an idea, once implemented, no longer receives attention. On the contrary. Nothing is worse than an unreliable system in production that is associated with your name or your team.

The production phase in particular deserves your special care.

Products in this phase are your flagships and your cash cows that, when running smoothly, give you the freedom and legitimacy to research and develop new things.

In practice, this can mean taking time for small optimizations, making the infrastructure more reliable, cleaning up technical debt, or refactoring code to increase traceability and maintainability. As a Data Scientist, you may be reminded of a core principle of reinforcement learning: find a good mix between exploration and exploitation to maximize your overall long-term impact.
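For readers less familiar with the reinforcement learning analogy, here is a minimal epsilon-greedy sketch of the exploration/exploitation trade-off. The reward model (normally distributed payoffs around `true_means`) and all parameter values are assumptions chosen for illustration, not a recipe for planning your work.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, rounds=1000, seed=0):
    """Pull arms for `rounds` steps; return (pull counts, estimated means)."""
    rng = random.Random(seed)
    arms = range(len(true_means))
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))        # explore: try anything
        else:
            arm = max(arms, key=estimates.__getitem__)  # exploit: best so far
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running average reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

# Arm 0: an established option, arm 1: a clearly better one that must first be found.
counts, estimates = epsilon_greedy([0.0, 5.0])
print(counts, [round(e, 2) for e in estimates])
```

A small `epsilon` keeps most of the effort on what already works while still leaving room to discover something better, which mirrors the balance between caring for running products and developing new ones.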

Although described here in a rather short paragraph, this approach is crucial to ensure a truly sustainable relevance of your work.

Photo by DLKR Life on Unsplash

10. You

Ok, Data Science product lifecycle complete, 9 strategies… Come on, let’s round it up to 10 to make it sound better. No, I would have left it at 9 or made it 11 if it had made sense, but I would like to explicitly emphasize one essential factor once again. You. The person behind it all.

The pursuit of impact is a personal trait and I have identified this as my driver for intrinsic motivation. However, it should be clear that impact must not become an end in itself for the ego. Of course, we all have this little voice inside of us that wants to be important and paid attention to. But keep the bigger purpose in mind. What are you doing these things for? I’m not saying that your ego should be completely out of the way. It’s probably a very powerful driver for most of us, if we’re honest with ourselves. And if the result is a passionate Data Scientist who comes up with great work, I have no problem with that. But ego should not be disconnected from the larger goal.

Ask yourself if your ego is still a driver to do the right thing or if it has become an end in itself.

So be aware of your drivers, stay humble, and use your motivation for a meaningful purpose!

Photo by Iulia Mihailov on Unsplash

Conclusion

These are my 10 strategies for making a bigger impact as a Data Scientist. Of course, this is always a snapshot and I look forward to gathering more experience in the future to add to or revise this list.

What are your suggestions, opinions and experiences on this topic? Let me know what you think about it, or how you managed to increase your impact as a Data Scientist or in any other position. I would be happy to learn from your experiences.

Software / Data Engineer, Lecturer. Writing about MLOps and real-world data science solutions.