How to ensure your data science projects are successful every time

Chris Parmer
Towards Data Science
4 min read · Feb 3, 2020

In a 2017 survey, Gartner analysts found that more than half of data science projects never deploy. This might lead some to believe the flaws lie in the data, the analytics tools, or the underlying ML models, but that’s not the case. At Plotly, we know from experience that failures to launch typically stem from an inability to connect model outputs to real business or organizational next steps.

As the statistic suggests, this is a major sticking point across our industry. To help navigate this roadblock, we’ve pulled together lessons from years of work with our clients and from what we’ve heard in the Plotly community. They consistently come back to three main points: get these right, and you’re on your way to projects with real business impact every time.

First, ask the right questions. It’s a typical story: a colleague from another department comes by with a dataset and asks something like, “Which chart type should I use?” That’s not the right approach. Instead of churning out a potentially unhelpful model or chart, ask your colleague what problem they are trying to solve and what the ideal outcomes and goals are. Understanding what questions you want to ask of the data is key, and the only path to a successful project is to clearly identify what the group is trying to learn and reverse engineer a solution from there.

Most teams have turned to Python or R as their language of choice for AI and data science initiatives, along with established tools for data management and machine learning (ML) model building. This is familiar ground, and most teams already have a preferred stack for it. It’s also the first step in going from model to analytic web app.
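
For illustration, a minimal sketch of that first step in Python might look like the following, assuming pandas and scikit-learn; the file name, columns, and model choice are placeholders for whatever your team already uses.

```python
# A hypothetical first step: load data and fit a model with pandas + scikit-learn.
# "sales.csv", the column names, and the model choice are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("sales.csv")
X = df[["price", "ad_spend", "web_traffic"]]   # hypothetical features
y = df["units_sold"]                           # hypothetical target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2:", model.score(X_test, y_test))
```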

Hmm, what’s an analytic web app? Think of an analytic web application as a front end to your models and data. It’s decidedly different than a BI tool or dashboard in that it enables end users to directly interact with models and data without needing to understand code. Imagine handing your colleague a model they can explore through interactive graphs and UI controls to perform their own downstream analysis.

With an analytic app as the end goal, the second step is to build the right UI. Once you have a solid grasp of why a particular dataset matters and what your business and organizational partners hope to do with the outcomes, you can focus on developing a UI for your model that offers the most accurate and explorable view of your analysis. It’s much less about presenting results and more about building and customizing a way to understand an AI model or dataset. Give your audience an opportunity to engage with the data as their needs evolve. There are numerous tools that help visualize data in ways that are beautiful and easy for anyone to understand and interact with. Find one that works for you and your organization, but remember that you’re going for interactivity: think beyond static graphs so that others can explore and experiment.
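
For example, even a single Plotly Express figure gives viewers hover details, zooming, and legend filtering out of the box; the file and column names below are hypothetical.

```python
# A hypothetical interactive view: predictions vs. actuals that viewers can zoom,
# hover over, and filter by clicking the legend. Column names are placeholders.
import pandas as pd
import plotly.express as px

results = pd.read_csv("predictions.csv")

fig = px.scatter(
    results,
    x="predicted_value",
    y="actual_value",
    color="region",                  # clicking a legend entry isolates that segment
    hover_data=["customer_id"],      # extra context appears on hover
    title="Model predictions vs. actuals",
)
fig.show()
```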

Third, define the structure for operationalizing your application. More often than not, models, data, and even interactive apps get stuck in a notebook on someone’s machine. This is the major roadblock to sharing at scale. The solution (and challenge) is to deploy a standalone app that anyone can access: in other words, free the data. Effective solutions account for provisioning, security, visual design and branding, and maintenance, allowing teams to truly operationalize key projects by connecting data science to real business results. There are many approaches, ranging from full-stack development teams, which can be time-consuming and expensive, to tools that help you deploy on your own. Again, you’ll need to find a deployment process that works for you. Once you do, you’ll have an end-to-end process.
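
As one illustration, and certainly not the only route, a Dash app like the sketch above exposes a standard Flask/WSGI server, so any common WSGI host can serve it; the module name, host, and worker settings below are assumptions.

```python
# A hypothetical deployment of the Dash app sketched earlier (saved as app.py).
# Exposing the underlying Flask server lets a standard WSGI host run it.
from dash import Dash

app = Dash(__name__)
server = app.server   # the Flask instance a WSGI server such as gunicorn can import

# ... layout and callbacks as in the earlier sketch ...

# On the deployment host (shell command shown as a comment):
#   gunicorn app:server --bind 0.0.0.0:8000 --workers 4
# Provisioning, authentication, branding, and maintenance still need an owner,
# whether that's an internal platform team or a managed deployment product.
```

Whatever host you choose, the key is that the app lives at a URL anyone in the organization can open, not in a notebook on one machine.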

Now is an amazing time to be a data scientist. Society is practically drowning in data, and the vast amount to be collected, collated, analyzed, and operationalized only grows each day. That last piece, operationalizing data and models, is the most important. In our work, we have to ensure that we are not only surfacing insights, but doing so in ways that are relevant to business and organizational success, easy to understand, and simple to interact with.

Cofounder & Chief Product Officer at @plotlygraphs. Most recently author of Dash.