
Data Science Success for Start-ups


Why success starts with business decisions, not data

Once a company reaches a certain level of maturity, it will often seek its next bit of growth by way of data science, becoming what is frequently termed a data-driven organisation. For an established yet smallish company or mature start-up, the commitment to become data-driven is typically preceded by the development of a service or platform that generates data. That platform has usually been in place for some years, so there should be a sizeable dataset as well. Such a company may also have existing data practices, such as some dashboarding and some level of data infrastructure. The company’s historic data usually becomes the jumping-off point for data-driven initiatives.

Photo by Lukas Blazek on Unsplash

However, such a transition is not without its difficulties. In recent years there has been a growing sense of disillusionment with such initiatives, as true data-driven value has proven more challenging than anticipated for many companies, if not entirely elusive. This in turn has led to a shift in emphasis from data science as the immediate focus to data engineering instead, i.e. build up the infrastructure first. The shift comes from the understanding that data science doesn’t work in a vacuum, not least because it is highly dependent on an established infrastructure, due in part to the numerous complexities of creating, deploying, and maintaining a trained model in production.

I’m happy to see such a reassessment, as it represents meaningful maturation within the industry and a growing understanding of what real data-driven value looks like. However, I would push the emphasis of a company, especially one making its first steps in data-driven initiatives, back one stage further: the initial focus should be on fully understanding existing business decisions in data science terms. That is, before thinking about the analysis you’d want to make, before thinking about the infrastructure, before even thinking about the data you have, you need to think about whether the existing business decisions can fit within the context of a data science project.

Before we go further, it’s worth stating what I mean by a "business decision". For our purposes, this will be understood as any specific action or process taken to deliver business value, e.g. revenue. Examples include how to process and respond to customer calls, or how to choose between suppliers.

Data Alone is Not Enough

It may not be immediately clear why simply having a large body of data is often not enough to lead to the successful delivery of a meaningful data science project. This is for two main reasons, one of context and one of understanding.

Firstly, up to this point in the company’s life, it’s likely that no one within the company has had the responsibility to really think about the data collected in any terms other than simple performance metrics. There may be a general awareness that daily active users, for example, is a KPI worth tracking, but not within a systematic framework that tells us what it means to the company, whether it’s good or bad, or what other metrics it relates to. In other words, the data is not put in context.

Secondly, and relatedly, the data has likely not been collected in aid of answering a specific analytical question, so the precise hypothesis to test against this data will remain unclear. In particular, if the data itself is not related to an ongoing activity, then any insights gained from historical data will be largely moot, as no future actions can be taken in response. The data doesn’t inform any specific understanding of the domain.

Photo by Mika Baumeister on Unsplash

Laying the Foundations for Success

A successful data science project must relate to a business decision with an existing closed feedback loop. That is, any candidate business decision must:

  • Be repeatable and so testable,
  • Generate feedback, so that we know what success looks like and can validate business value,
  • Lead to actions that can be implemented in a timely manner, so we know how to act on new knowledge.

The business decisions in this case are the actions taken as part of the feedback loop, which are precisely what we want to adjust in response to the output of the data science project.
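As a rough illustration, the three criteria above can be written down as a simple screening checklist. This is only a sketch: the `BusinessDecision` class, its field names, and the yes/no answers given for the two example decisions are all hypothetical, and in practice each answer would come from talking to the people who own the decision.

```python
from dataclasses import dataclass


@dataclass
class BusinessDecision:
    """A candidate business decision (hypothetical structure, for illustration)."""
    name: str
    is_repeatable: bool          # can the decision be made again, and so tested?
    generates_feedback: bool     # does each decision produce an observable outcome?
    is_actionable_in_time: bool  # can we act on new knowledge quickly enough?


def has_closed_feedback_loop(decision: BusinessDecision) -> bool:
    """Return True only if all three criteria are satisfied."""
    return (
        decision.is_repeatable
        and decision.generates_feedback
        and decision.is_actionable_in_time
    )


# Assumed answers for two example decisions; these are not measured facts.
candidates = [
    BusinessDecision("respond to customer calls", True, True, True),
    BusinessDecision("launch an entirely new product", False, False, True),
]

suitable = [d.name for d in candidates if has_closed_feedback_loop(d)]
print(suitable)
```

The point of the checklist is less the code than the conversation it forces: a decision that fails any one of the three checks is a weak candidate, however much data surrounds it.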

To elaborate, by focussing on existing business decisions, we can also address two other related issues. Firstly, given that the project relates to an existing decision, there should be some technical understanding of how to deliver the decision, how to collect relevant information, and how to determine success or failure, ideally in quantifiable terms. This last point is generally taken to mean that a well-defined metric exists, is supported by data, and is accessible. If this is not the case, then key stakeholders should be approached to set out a defined metric. Secondly, a key step in any data science project is collecting feedback to determine the impact of any innovations introduced, meaning it should be possible to repeat any such process, possibly quite frequently.

Imagine instead scoping a project for an entirely new product or business function: we would quickly find ourselves struggling to define what success looks like, as there is no historic data or process to work with.

This last paragraph also hints at why historic data, though not sufficient for a data project, is still valuable. We can still use the data to sense-check any initial assumptions we might have about the problem area, or to validate what degree of improvement is realistic or achievable. For instance, if we were planning on automating elements of a customer service system, we might be able to see the maximum length of time a customer is willing to wait for a response, and work to ensure we get under that value.
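The customer service example might be sketched as follows. The records here are invented purely for illustration, and the assumption that each historic record stores a wait time together with whether the customer gave up is mine; a real system may log this quite differently.

```python
# Hypothetical historic records: (seconds waited, whether the customer gave up).
historic_waits = [
    (30, False), (45, False), (60, False), (90, False), (120, False),
    (150, True), (180, False), (240, True), (300, True), (420, True),
]

stayed = [wait for wait, gave_up in historic_waits if not gave_up]
abandoned = [wait for wait, gave_up in historic_waits if gave_up]

# The longest wait any customer tolerated: an upper bound on what is acceptable.
longest_tolerated = max(stayed)

# The shortest wait at which a customer gave up: a more conservative bound.
earliest_abandonment = min(abandoned)


def is_safe_target(target_seconds: int) -> bool:
    """Check a proposed response-time target against the conservative bound."""
    return target_seconds < earliest_abandonment


print(longest_tolerated, earliest_abandonment, is_safe_target(120))
```

Even a check this crude tells us whether a proposed target is plausible before any modelling work begins.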

With both defined metrics in place and data sources understood for a set of business decisions, project prioritisation can commence, with a shared understanding of the importance, risks, and possible benefits of a project proposal. If nothing else, the process of identifying suitable candidates will help prevent committing valuable resources to the wrong areas of the business.

So to wrap up: Success starts by thinking business decisions first, data second.


For more data science articles, make sure to check out my blog.

