Top 5 Mistakes Companies Make with Data Science

What to avoid in your journey towards data-driven decision-making

Tom Martin
Towards Data Science

Becoming a data-driven company is one of the hardest things an organisation can strive towards. This is far from an exhaustive list, but these are some of the main issues I see companies experience during their data science journey:

  • Not Having Defined Metrics
  • Making the Wrong Hires
  • Being Buzzword Focussed
  • Not Addressing Data Quality Issues
  • Misapplication of Agile Management

Put simply, the problems tend to stem from not addressing more fundamental issues within the business, which may only become apparent once data and related concerns take centre stage.

Photo by Franki Chamaki on Unsplash

Not Having Defined Metrics

To act correctly on collected data, you need to know which decisions it informs and how to interpret the results. Metrics are the means of contextualising data in this way. Without metrics in place, it becomes anyone’s guess which inputs drive which outputs, meaning the company doesn’t really understand the value of its data. In many cases, every new analytical question then triggers a deep dive into each and every available data point, which is clearly not sustainable. These issues are amplified further with a new data science hire, who will typically lack the domain knowledge to contextualise the data independently. Defined metrics should act as the foundation upon which a data-driven organisation is built, allowing transparent and readily available reporting of relevant company data.
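To make this concrete, here is a minimal sketch of what a defined metric might look like in practice, assuming a hypothetical events table with user_id, event, and timestamp columns; the table, the column names, and the metric itself are illustrative only, not a prescription.

```python
import pandas as pd

# Hypothetical raw event data: one row per user action.
events = pd.DataFrame({
    "user_id":   [1, 1, 2, 3, 3, 4],
    "event":     ["visit", "purchase", "visit", "visit", "purchase", "visit"],
    "timestamp": pd.to_datetime([
        "2021-03-01", "2021-03-02", "2021-03-02",
        "2021-03-03", "2021-03-05", "2021-03-08",
    ]),
})

def weekly_conversion_rate(df: pd.DataFrame) -> pd.Series:
    """Share of visiting users who also purchased, per calendar week."""
    by_week = df.set_index("timestamp").groupby(pd.Grouper(freq="W"))
    visitors = by_week.apply(lambda g: g.loc[g["event"] == "visit", "user_id"].nunique())
    buyers = by_week.apply(lambda g: g.loc[g["event"] == "purchase", "user_id"].nunique())
    return (buyers / visitors).rename("conversion_rate")

print(weekly_conversion_rate(events))
```

The point is less the code than the shared definition: once the calculation is written down and agreed, everyone in the business interprets the same number in the same way.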

Making the Wrong Hires

Very often, a company will take on a data scientist as its first dedicated data hire. The reasons vary from case to case, but I suspect this stems primarily from a lack of understanding of what is really required in the early stages of a data-driven transformation and which skills make it possible. What is needed is someone with contextual business experience who can link business decisions to data, meaning that someone with a business analyst background is almost always the better fit for most companies. Business analysts are also more likely to do the bulk of their work and share their outputs in software like Microsoft Excel, which is probably more widely available and understood than, say, a script or notebook. This latter point is particularly important, as success as a data-driven company requires that acting on any such findings is as frictionless as possible.

A related issue is only looking for unicorns: those special people who can single-handedly deliver a data science project from idea to release, being some hybrid of business analyst, data scientist, and data engineer. Simply put, these people do not exist, and searching for them indicates a company’s failure to understand and prioritise its requirements.

Being Buzzword Focussed

It can be tempting for companies looking to become data-driven to obsess over the buzzwords: AI, deep learning, NLP, and so on. By focussing only on what makes the headlines, companies all too often ignore the important groundwork required to actually make these things a reality. This usually manifests itself in the prioritisation of complex and ambitious projects which, even if successfully delivered, will not easily find their place within the existing business. Instead, companies should look closely at their existing business decisions and see how analytics can aid them in the short term. This should lead to more reasonable expectations around projects that can deliver real value, and help avoid data science disillusionment. It also requires that data science hires are integrated more fully into existing business processes, so that their analysis addresses a current need.

Not Addressing Data Quality Issues

Your analytics will only ever be as good as your data, or to put it more succinctly: “garbage in, garbage out”. This is not a hard point to argue in the case of malformed data entry, e.g. where an input field is not correctly validated, but data quality issues can manifest themselves in much more subtle ways. For instance, numerous near-duplicate fields may be introduced to solve separate but related requirements for different business stakeholders, and these differences are not always well communicated. While innocent enough on the face of it, this introduces unnecessary complexity for any subsequent analysis, if it does not entirely invalidate work done without clarification. Left unaddressed, these issues also make automation and reporting much more difficult, as manual workarounds may be the only possible fix.
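As a rough illustration, here is a minimal sketch of the kind of automated check that can surface these problems early. The customers table, the column names, and the thresholds are all hypothetical.

```python
import pandas as pd

# Hypothetical customer records, including one unvalidated field and a pair of
# near-duplicate date columns added for different stakeholders.
customers = pd.DataFrame({
    "email":        ["a@x.com", "b@y.com", "not-an-email", "c@z.com"],
    "signup_date":  ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"],
    "created_date": ["2021-01-04", "2021-01-05", "2021-01-05", "2021-01-07"],
})

# Check 1: flag malformed entries from an input field that was never validated.
valid_email = customers["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", regex=True)
print(f"rows with malformed email addresses: {(~valid_email).sum()}")

# Check 2: flag pairs of columns that agree on almost every row, a hint that
# they are near-duplicate fields to reconcile before analysis. The 0.7
# threshold is illustrative only.
cols = list(customers.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        overlap = (customers[a] == customers[b]).mean()
        if overlap > 0.7:
            print(f"'{a}' and '{b}' match on {overlap:.0%} of rows: clarify which one to use")
```

Even a crude check like this is cheaper than discovering, halfway through an analysis, that two stakeholders have been reporting from two subtly different fields.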

Misapplication of Agile Management

Data science is by its very nature exploratory and open-ended, meaning progress and expectations can vary day to day, which may be quite different to your experience of software engineering. This can make it hard to define specific tasks and deliverables that fit a typical Scrum approach, especially when it is the first time any such data-related work is being done in the company. It can also mean that the usual format of daily standups, retrospectives, and sprint planning becomes unnecessary, if not somewhat overbearing, although company culture as a whole has a large part to play. Unfortunately there is no one right answer here, but in general it may prove more useful to follow a loosely applied Scrum approach, or something closer to Kanban. In many cases, the exploration itself should be scoped as the deliverable rather than anything more concrete. Reading around the CRISP-DM model, which does well to characterise the cyclical nature of data mining work, is useful here.

For more data science articles, make sure to check out my blog.

Data scientist and machine learning engineer, fascinated by DS/ML business applications. Check out my newsletter: https://tpgmartin.substack.com/