Essential Questions to Ask Before Starting a Data Science Project

How to make sure your project gets off to a good start

Shahwaiz
Towards Data Science


Before the start of every project, it is important to ask questions that help you understand what you will be working on for the next few weeks or even months. Questions such as what we are trying to accomplish, why we are trying to accomplish it, and how it will benefit the end user are essential to ask at the start of a project: they drive successful outcomes and bring clarity to the problem you are trying to solve.

Here’s a list of questions that you should ask before the start of your data science project:

1. Who is the client, and what business domain are they in?

Understanding the customer's business domain, how they operate, what matters to them, and which key variables define success in that space will allow you to build a solution that directly affects what is important to the client.

2. What business problem are we trying to address?

The book Fundamentals of Machine Learning For Predictive Data Analytics describes this perfectly:

Organizations don’t exist to do predictive data analytics. Organizations exist to do things like make more money, gain new customers, sell more products, or reduce losses from fraud. Unfortunately, the predictive analytics models that we can build do not do any of these things. The models that analytics practitioners build simply make predictions based on patterns extracted from historical datasets. These predictions do not solve business problems; rather, they provide insights that help the organization make better decisions to solve their business problems.

A key step, then, in any data analytics project is to understand the business problem that the organization wants to solve and, based on this, to determine the kind of insight that a predictive analytics model can provide to help the organization address this problem. This defines the analytics solution that the analytics practitioner will set out to build using machine learning [1].

If your company’s goal is to reduce customer churn rate, one possible solution could be to build a prediction model that would identify which customers are most likely to churn in the near future.
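To make the distinction concrete, here is a minimal Python sketch, assuming a churn model has already scored each customer. The model only produces probabilities; deciding whom to target with a limited number of retention offers is a separate, business-driven step. The function name, customer IDs, scores, and offer budget are all hypothetical.

```python
# Hypothetical sketch: the churn model's output (a probability per customer)
# feeds a business decision (who gets one of a limited number of offers).

def customers_to_target(churn_scores: dict[str, float], budget: int) -> list[str]:
    """Return the `budget` customers with the highest predicted churn probability."""
    # Sort customer IDs by their score, highest churn risk first.
    ranked = sorted(churn_scores, key=churn_scores.get, reverse=True)
    return ranked[:budget]


scores = {"c001": 0.82, "c002": 0.10, "c003": 0.55, "c004": 0.91}
print(customers_to_target(scores, budget=2))  # ['c004', 'c001']
```

The budget is what ties the prediction back to the business problem: the same scores lead to different actions depending on how many offers the organization can afford.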

3. How is it going to be consumed by the customer?

Understanding how your customer will use the output of your model allows you to tailor your work to them. For example, are you building models that serve internal users and influence company strategy, or models that are customer-facing?

4. What is the economic impact of this project?

Putting a dollar amount on a project is one of the hardest things to do. But knowing how your data product will drive revenue or reduce costs for the customer allows you to get leadership on board and keep their support throughout the project.

5. What type of decisions will our data science feature drive?

What will the model empower them to do that they could not do before?

6. What metric will we use to call this project a success and how will we measure it?

Having a specific target in mind ensures that your project has a clear end point and that you don't work on it indefinitely. Quantify what improvement in the metric values is useful for the customer scenario (e.g., reduce labor costs by 20%). The metric must be SMART (Specific, Measurable, Achievable, Relevant, and Time-bound). For example: achieve a customer churn prediction accuracy of X% by the end of this 3-month project so that we can offer promotions to reduce churn [2].
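One way to keep a success metric honest is to write it down in a form that can be checked. The sketch below is a hypothetical Python encoding of a SMART criterion, using the labor-cost example above; the class name, its fields, the deadline date, and the rule that a result reached after the deadline counts as a miss are illustrative assumptions, not part of any template.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SuccessCriterion:
    """Hypothetical SMART-style criterion: one metric, one target, one deadline."""
    metric: str             # Specific: exactly what is measured
    target: float           # Measurable: the value that counts as success
    deadline: date          # Time-bound: when it must be reached
    higher_is_better: bool = True

    def is_met(self, observed: float, as_of: date) -> bool:
        """True if the observed value reaches the target on or before the deadline."""
        if as_of > self.deadline:
            return False  # In this sketch, a late result is treated as a miss.
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# "Reduce labor costs by 20%", expressed as the fraction of cost removed.
labor_goal = SuccessCriterion("labor cost reduction", 0.20, date(2024, 3, 31))
print(labor_goal.is_met(0.22, as_of=date(2024, 3, 15)))  # True
print(labor_goal.is_met(0.22, as_of=date(2024, 4, 2)))   # False (past the deadline)
print(labor_goal.is_met(0.15, as_of=date(2024, 3, 15)))  # False (target not reached)
```

Writing the criterion down this explicitly forces the conversation about what "done" means before any modeling starts.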

Imagine a dialogue between a data scientist (DS) and a product manager (PM) about introducing a new ML feature in an app built to provide better visibility into warehouse operations. Suppose the product manager knows the warehouse space well and already has a feature in mind.

DS: I believe customer ABC is facing a problem. Would you help me understand what it is?

PM: Sure. ABC is constantly struggling to hit their daily order goal.

DS: What’s a daily order goal?

PM: Warehouses usually set an order goal at the start of the day: the number of orders they try to ship before the end of their day. For example, at the start of the day, an operator in the warehouse might set a goal of 45,000 orders that need to go out the door before the end of the day.

DS: Got it! And why is hitting this daily goal important to them?

PM: Good question. Not hitting their order goal for the day means not delivering to their customers on time, which can lead to additional support costs, reputational damage, and churn for our client. To make life easier for the client, I propose releasing an ML feature in the app that helps them see whether they are on track to hit today's order goal based on their current performance.

DS: I see. And why do you think this feature is useful to them? What types of decisions will this drive?

PM: Good question. One of the most important use cases is that it will allow warehouse operators to allocate labor early in the day. For example, if our predicted number of orders shipped for the day is below their daily goal, they can add workers to move things more quickly. So it helps them run day-to-day operations more efficiently.

DS: How is it going to be consumed by the client?

Image by Author

PM: Let me share my screen for a second and show you. Users will be able to see this feature in our app. This is how I envision it: the solid blue line shows the orders they have shipped so far, the dotted green line is the prediction generated by our model, and the solid red line is their goal for today.

DS: Ah, that's a good visual; it makes things very clear. So this will be a real-time feature where we update the day's forecast every hour?

PM: Yep. That’s correct.

DS: Another question: what do they currently use and what is the baseline (current) value of that metric?

PM: They currently do not use anything, and that’s why this feature will bring a lot of clarity to their operations.

DS: What's the economic impact of this project? And what are the success criteria?

PM: Great question. Well, if our forecasts have a mean absolute percentage error (MAPE) of less than 30% by the end of this 2-month period, we can call the first iteration of this project done. As for the economic impact, my rough estimate is that this feature will also let them optimize their resource planning and allocation decisions, which will help them decrease labor dependence and reduce costs by 30%. I'll have to do some more investigation and crunch some numbers to get the exact dollar amount.

DS: Ah, it seems this feature will increase efficiency across many of our customer's departments. Let me examine the data, then use all this information to draft a rough plan for the project and share it with you and the team for feedback.

PM: Awesome! Thanks.
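The feature discussed above can be sketched in a few lines of Python. This is not the model the PM has in mind, which the dialogue does not describe: it swaps in a naive run-rate baseline (project the rest of the day at the average hourly pace so far), which is also a natural baseline to beat given that the client currently uses nothing. It then applies the two rules from the conversation: suggest adding workers when the forecast falls short of the goal, and treat the first iteration as done when forecast error stays under 30%. The shift length, worker throughput, hourly counts, and past-day figures are hypothetical, and reading the 30% target as mean absolute percentage error (MAPE) is an assumption, since a plain mean absolute error is measured in orders rather than percent.

```python
import math

# All inputs in the demo below are hypothetical; only the 45,000-order goal
# and the 30% error target come from the dialogue.

def run_rate_forecast(shipped_so_far: int, hours_elapsed: float, shift_hours: float) -> float:
    """Naive baseline: assume the rest of the shift ships at the average hourly pace so far."""
    if not 0 < hours_elapsed <= shift_hours:
        raise ValueError("hours_elapsed must be in (0, shift_hours]")
    hourly_rate = shipped_so_far / hours_elapsed
    return shipped_so_far + hourly_rate * (shift_hours - hours_elapsed)


def mape(actuals: list[float], forecasts: list[float]) -> float:
    """Mean absolute percentage error, as a fraction (0.30 means 30%)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("need two non-empty lists of equal length")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals)


def extra_workers_needed(predicted_shipped: float, daily_goal: int,
                         hours_left: float, orders_per_worker_hour: float) -> int:
    """Extra workers that would close the gap between forecast and goal (0 if on track)."""
    if orders_per_worker_hour <= 0:
        raise ValueError("orders_per_worker_hour must be positive")
    shortfall = daily_goal - predicted_shipped
    if shortfall <= 0 or hours_left <= 0:
        return 0  # On track, or no time left for staffing changes to help.
    # Each extra worker ships orders_per_worker_hour orders in each remaining hour.
    return math.ceil(shortfall / (orders_per_worker_hour * hours_left))


if __name__ == "__main__":
    DAILY_GOAL = 45_000  # the example goal from the dialogue
    SHIFT_HOURS = 10     # hypothetical shift length
    RATE = 25            # hypothetical orders shipped per worker per hour

    # Hypothetical cumulative orders shipped at the end of hours 1-4;
    # the forecast is recomputed every hour, as in the real-time feature.
    for hour, shipped in enumerate([4_000, 8_500, 13_000, 17_000], start=1):
        forecast = run_rate_forecast(shipped, hour, SHIFT_HOURS)
        extra = extra_workers_needed(forecast, DAILY_GOAL, SHIFT_HOURS - hour, RATE)
        print(f"hour {hour}: forecast {forecast:,.0f}, extra workers suggested: {extra}")

    # Success check: forecast error under 30%, read here as MAPE.
    past_actuals = [44_000, 46_500, 43_200]    # hypothetical end-of-day totals
    past_forecasts = [40_000, 47_000, 45_000]  # hypothetical forecasts for those days
    print("MAPE below 30%:", mape(past_actuals, past_forecasts) < 0.30)
```

A run-rate baseline ignores intraday patterns such as slow mornings or end-of-shift rushes, which is exactly the kind of gap a learned model would need to close to justify itself.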

In his book Anticipate Failure, Lak Ananth stated that "Every business starts with this component of what is the problem, what is the solution, and why is it a compelling business". Similarly, a data science project must start with a hypothesis of what customer problem we are trying to solve, why we are trying to solve it, and what its impact will be.

References

[1] Kelleher, J. D., Mac Namee, B., & D'Arcy, A. (2015). Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. The MIT Press.

[2] Microsoft. (2021). Azure-TDSP-ProjectTemplate.
