
The gap between Data science and the organization

Why do data science projects fail more often than succeed?

Source: Unsplash

The term ‘Data scientist’ was nonexistent when I started my journey in the data analytics space, but now it is famously called the ‘sexiest job of the 21st century’: probably right after space crews at SpaceX and Virgin Galactic! Data has always fascinated me, and I am sure it will continue to do so for many years to come. Throughout this journey I have seen many projects take off as well as fall apart at various stages. VentureBeat’s 2019 finding still holds true: ‘87% of data science projects never make it to production’, and there are several reasons which need serious intervention and fixes to improve this number.

In Gartner’s latest predictions, several points are worth noting:

  1. Through 2025, 80% of organizations seeking to scale digital business will fail because they do not take a modern approach to data and analytics governance (Gartner, 2021)
  2. Through 2022, only 20% of analytic insights will deliver business outcomes (Gartner, 2019)
  3. By 2025, 80% of data and analytics governance initiatives focused on business outcomes, rather than data standards, will be considered essential business capabilities (Gartner, 2021)
  4. By 2024, the degree of manual effort required for the contract review process will be halved in enterprises that adopt advanced contract analytics solutions (Gartner, 2021)
  5. By 2023, 50% of chief digital officers in enterprises without a chief data officer (CDO) will need to become the de facto CDO to succeed (Gartner, 2021)

Take a look at the difference between points 2 and 3: the humongous gap between analytic insights and actual business outcomes shows why organizations should fix their data analytics capabilities as soon as possible. This led me to drill down into the top five reasons why data analytics projects fail and how to bridge this gap. Looking into these pull-backs and reassessing will be a revelation for you and your organization.

1. Looking at it as a project instead of a product

Source: Unsplash

Most organizations treat data science initiatives as projects and never as a product. Such scenarios make room for standalone projects. Without an appropriate product landscape, a project gets pushed into a very high-risk failure area. These projects lose their importance very quickly after deployment, due to causes like:

  1. Stakeholders losing interest or end-users not adopting your result
  2. No feedback loop and no updates to the existing solution
  3. Imbalance in development team, leading to no future maintenance and support
  4. No value to the current business challenges

When an organization thinks of data science as a product, it integrates deep into strategies and value-driven initiatives, with a presence in all business units. Such organizations will have a deep-rooted culture of data-driven decision making.

2. Lost between C and B-levels

Source: Unsplash

Data analytics and business outcomes can’t be effective without organizational support, both from top and mid-level executives. Everyone should go through the maturity curve of data analytics from descriptive to prescriptive analytics (in other words, from dashboards to predictive outcomes such as the probability that a customer stops using the product) to appreciate as well as understand the value generated by data analytics. When the learning curve takes too long or breaks, that’s where the value chain is lost and projects fail to demonstrate meaning to the organization.
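To make the descriptive-to-prescriptive step concrete, here is a minimal sketch of the churn example above: a tiny logistic regression fitted by gradient descent in plain Python. The dataset, the single feature (days since last login), and the resulting coefficients are all hypothetical, purely for illustration.

```python
import math

# Hypothetical toy dataset: (days_since_last_login, churned)
data = [(1, 0), (2, 0), (3, 0), (5, 0), (20, 1), (25, 1), (30, 1), (40, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit weight w and bias b by simple gradient descent on the log-loss
w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    gw = gb = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)   # predicted churn probability
        gw += (p - y) * x        # gradient w.r.t. w
        gb += (p - y)            # gradient w.r.t. b
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

def churn_probability(days_since_last_login):
    """Predicted probability that this customer churns."""
    return sigmoid(w * days_since_last_login + b)
```

A dashboard would only report how many customers churned last month; a model like this (however toy-sized) answers the forward-looking question, which is where the value of the maturity curve sits.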

Effective communication with non-technical stakeholders is needed at all stages of product development for sustainable development. This calls for the art of ‘data storytelling’: giving a powerful narrative to the approach and to the value generated by these decisions.

3. Availability / Understanding of the data

Source: Unsplash

This is one of the most common problems for any analytics team and product out there. It always takes the magical eyes of a data scientist to see the data the right way and include it in the data pipeline. Data can be challenging to organize, collect, create, or purchase. Even after you overcome this and make the data accessible, you will hit the next set of roadblocks:

  1. Is there a bias in the data?
  2. Is it ethical and legal to use the data for the intended purpose?
  3. What is the right way to slice and dice the data to get descriptive results?
  4. Is there an impact of time and other factors?
  5. How to clean the data for the purpose?
  6. When it comes to ML how to label the data adequately to minimize errors and bias?
  7. The biggest threat of all: How do I secure the data and manage it well? As per Gartner, by 2024, most organizations will attempt trust-based data sharing programs, but only 15% will succeed and outperform their peers on most business metrics.
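Several of these roadblocks can at least be surfaced early with simple data profiling. Below is a minimal sketch in plain Python covering two of the items above — a missing-value check (cleaning) and a label-balance count (a crude first signal of bias). The records and field names are invented for illustration only.

```python
# Hypothetical raw records; None marks a missing value
records = [
    {"age": 34, "country": "DE", "churned": 0},
    {"age": None, "country": "DE", "churned": 0},
    {"age": 51, "country": "FR", "churned": 1},
    {"age": 29, "country": "DE", "churned": 0},
    {"age": 45, "country": None, "churned": 0},
]

def missing_rate(rows, field):
    """Fraction of rows where `field` is missing."""
    return sum(1 for r in rows if r[field] is None) / len(rows)

def label_balance(rows, label_field):
    """Count of rows per label value -- a crude imbalance/bias signal."""
    counts = {}
    for r in rows:
        counts[r[label_field]] = counts.get(r[label_field], 0) + 1
    return counts
```

Here `missing_rate(records, "age")` is 0.2 and `label_balance(records, "churned")` is `{0: 4, 1: 1}`, flagging both a gap to fill and a skew to account for before any modelling starts.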

4. Product without Value Driver

Source: Unsplash

During the inception of any new initiative, it’s always important to scrutinize the approach and the business outcome as much as possible to arrive at the right plan. Many machine learning models get deployed and remain a number in a database rather than a valuable insight to the business. After many iterations of feedback and fine-tuning of the results, when the business realizes it can’t use these results anymore, whether because of a new strategy, a new tool, or any other reason, the predictions fade away with no one using or updating them.

When the product drives great value and is well connected to the downstream system, there is always feedback coming back from that system to perfect the outcome. Skills like social listening, wherever applicable, will make a huge difference to the product by keeping it relevant to business practices. Golden rule: depleting value is an indication of the diminishing life of the product.
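One lightweight way to keep that feedback loop honest is to track a rolling quality metric from downstream outcomes. The sketch below is a hypothetical illustration in plain Python: it compares each prediction against the actual outcome reported back by the downstream system, so a declining rolling accuracy becomes an early warning of depleting value.

```python
from collections import deque

class FeedbackMonitor:
    """Rolling accuracy over the last `window` feedback events (illustrative)."""

    def __init__(self, window=100):
        # True if the prediction matched the actual downstream outcome
        self.outcomes = deque(maxlen=window)

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None  # no feedback received yet
        return sum(self.outcomes) / len(self.outcomes)
```

If `rolling_accuracy()` trends downward release after release, the product’s diminishing life is visible in the numbers before stakeholders feel it.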

5. The Data scientist

Source: Unsplash

Sometimes the wrong data scientist can themselves be a challenge to the project, and there are numerous reasons attached to this problem.

1. Limited understanding of the business

The first and foremost step for any data scientist is understanding the background of the problem and how the solution would help. Without this knowledge, the solution lands in a puddle and gets messier as it progresses.

2. Capability of data storytelling with and without visualization

Follow the magical rule of Speak, Explain, and Draw when talking to stakeholders. The more interaction with the stakeholders, the higher the confidence in the results and the better the impact. A void is created when this chain breaks, leading to low confidence in the solution.

3. Mismatch between problem difficulty and approach adopted

Adopting too complex an approach for a simple problem, or boiling the problem down to an oversimplified statement and trying to solve that: both are vicious and need to be addressed with the right weightage, in alignment with the stakeholders.

4. Not getting the right talent

Experience, domain understanding, and comfort with the data and tools matter the most when solving a complex business problem.

Closing thoughts

Product life cycles come with challenges at various stages and in diverse forms. As Albert Einstein said, "Try not to become a man of success. Rather become a man of value." This applies just as well to the products we build: when there is value in what we do, success inevitably gets attached.

I will leave you with a few more Gartner predictions to think about:

  1. By 2024, 30% of organizations will invest in data and analytics governance platforms, thus increasing the business impact of trusted insights and new efficiencies.
  2. By 2024, most organizations will attempt trust-based data sharing programs, but only 15% will succeed and outperform their peers on most business metrics.
  3. By 2024, 60% of the data used for the development of AI and analytics solutions will be synthetically generated.
