The call for a Data Science Readiness Level

Scott Zelenka
Towards Data Science
7 min read · Sep 6, 2018


NASA Technology Readiness Levels

In the 1970s, NASA developed the Technology Readiness Level (TRL) scale to measure the research and development of cutting-edge technology. Its purpose is to estimate the maturity of a technology during the acquisition process, on a scale from 1 to 9, with 9 being the most mature. TRLs enable consistent, uniform discussions of technical maturity across different types of technologies.
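For reference, the nine levels can be summarized in a few words each. The snippet below paraphrases NASA's published definitions into a simple Python mapping; the wording is my own shorthand, not the official text.

```python
# Approximate one-line summaries of NASA's nine Technology Readiness Levels,
# paraphrased from the published definitions (not the official wording).
TRL_DESCRIPTIONS = {
    1: "Basic principles observed and reported",
    2: "Technology concept and/or application formulated",
    3: "Analytical/experimental proof of concept",
    4: "Component validation in a laboratory environment",
    5: "Component validation in a relevant environment",
    6: "System/subsystem prototype demonstrated in a relevant environment",
    7: "System prototype demonstrated in an operational environment",
    8: "Actual system completed and qualified through test and demonstration",
    9: "Actual system proven in successful operations",
}
```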

This concept is well known to researchers seeking grants from many government agencies, but it seems to have lost favor in other engineering applications. With cutting-edge discoveries accumulating in Artificial Intelligence, Machine Learning, and Data Science, this blog will explore using the scale to measure progress and guide the success of data science projects by linking them to a value on the TRL scale.

The application of this scale to site reliability engineering and data science was first proposed in Emily Gorcenski’s fascinating talk, Going Full Stack with Data Science: Using Technical Readiness. If you have 30 minutes, her talk is definitely worth your time!

She does a great job of dispelling the myth of the “full-stack” or “unicorn” data scientist, and of showing how you actually need a “full-stack” team with specialized roles mapped to each phase of the data consumption cycle and to the TRLs. She borrows disciplines from site reliability engineering and incorporates TRLs to build a solid plan for scoping and executing data science work.

Some key points worth reflecting on:

Data Science Readiness Levels mapped

Discovery vs. Delivery

At a macro level, there are really only two types of data science projects: Discovery/Research vs. Delivery.

In a Discovery project, you’re working in the TRL range of 1 to 4, focused on questions such as “Can we solve the problem as stated?” The entire project is about exploring new algorithms, datasets, technologies, etc. to see if the stated problem is solvable. These projects are typically isolated to a data-driven team and may not yet have a committed project sponsor to carry the outcome forward and extract real-world value from it.

In a Delivery project, you’re working in the TRL range of 5 to 9. These projects must have a dedicated project sponsor with the authority to implement the outcome in an existing or new workflow/process to “turn data into money”. Once we have that authority, it’s up to the data science team to make consuming the outcome seamless and to capture a feedback loop so the model keeps learning.

If you’ve ever dabbled in a data science project, you’ll recognize the stark contrast between these two types of projects. Yet most analytical workflows treat them the same! Attaching a TRL to a project can help set proper expectations for everyone involved, as in the sketch below.
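To make the split concrete, here is a minimal sketch of how you might encode it. The function name and the 1–4 / 5–9 split simply reflect the convention described above; they are not a standard API.

```python
def project_phase(trl: int) -> str:
    """Map a TRL (1-9) to the macro-level project type described above."""
    if not 1 <= trl <= 9:
        raise ValueError("TRL must be between 1 and 9")
    return "Discovery" if trl <= 4 else "Delivery"

# A project still exploring whether the problem is even solvable:
print(project_phase(3))  # Discovery
# A project with a committed sponsor integrating the outcome into a workflow:
print(project_phase(6))  # Delivery
```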

Outcomes are more granular than “turn data into money”

While the ultimate goal always leads back to the dollar, that should only be the focus of projects in the “Delivery” phase (i.e. TRL 5 to 9). When starting a “Discovery” project (i.e. TRL 1 to 4), you should have a vision for how the outcome could be consumed by an identified sponsor, but it is not the primary focus of the project at this phase. In some cases you may not even have a committed sponsor bought into the project yet! Asking a data science team to quantify the ROI of projects at those early TRLs is inappropriate and futile.

Even when a project reaches the “Delivery” phase, the data science team can assist the project sponsor in quantifying the value of the outcome and how it translates into business value and/or monetization. But the primary driver for this still must come from the project sponsor. The best examples of “turning data into money” come from embedded data science teams, since they are both the subject matter experts and the ones delivering the outcome.

For each work cycle, know where your Readiness Level starts, and set realistic targets for where it ends

Most data science teams I’ve been privileged to be part of like to champion the Agile methodology and use short-duration sprints for project execution. However, what was missing from the sprint planning was clear communication on where we were starting and where we would be at the end of the sprint.

I’ve seen DMAIC, CRISP-DM, CAP, and many other methodologies attempted, but none captured the nature of the sprint as succinctly as the TRL scale. Rather than elaborating on the cyclic nature of whichever phase your project is currently in, and how it can revert to a previous phase at any time, it’s much easier to tell a sponsor that we’re currently at TRL 1 and expect to be at TRL 3 by the end of this sprint.

Having the TRL scale also clearly sets expectations for all parties with an interest in the outcome of the project, along with a clear path for how they can move the project up the stack.
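One lightweight way to bake that communication into sprint planning is to record the starting and target TRL alongside the sprint itself. The dataclass below is a hypothetical sketch of what such a record might look like, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class SprintPlan:
    """Hypothetical sprint record that makes readiness explicit to sponsors."""
    sprint_name: str
    trl_start: int   # where the project stands at sprint planning
    trl_target: int  # realistic level we expect to reach by sprint end

    def summary(self) -> str:
        return (f"{self.sprint_name}: currently at TRL {self.trl_start}, "
                f"targeting TRL {self.trl_target} by the end of the sprint")

print(SprintPlan("Sprint 12", trl_start=1, trl_target=3).summary())
```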

While any data science team can perform discovery and exploratory analysis up through TRL 4, moving beyond that level requires more investment and hard commitments from the sponsors. Data science alone can only do half the job of monetizing your data. The visualization of the TRL scale does an excellent job of communicating the collaborative nature of any data science project.

Diving deeper, TRL 5–7 is where we start to integrate the outcome of the data science project into an identified workflow to augment the decision makers’ process. This could be entirely automated, or the outcome could aid a human worker. The workflow and process need to be identified by the sponsors, along with a commitment from those consuming the outcome that they are trained and can interpret it correctly.

Without these commitments from the sponsors and consumers of the outcome, it’s impossible to move beyond TRL 4 and actualize monetary value from the data. By linking these expectations to the TRL scale, the scale becomes a powerful tool when collaborating with your sponsors.
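To make that gate explicit, a simple check like the one below could be attached to the project readout. The field names are made up for illustration, but the rule mirrors the commitments described above.

```python
def can_advance_past_trl4(has_committed_sponsor: bool,
                          workflow_identified: bool,
                          consumers_trained: bool) -> bool:
    """Illustrative gate: the commitments described above must all be in place
    before a project can move from Discovery (TRL 1-4) into Delivery (TRL 5-9)."""
    return has_committed_sponsor and workflow_identified and consumers_trained

# A sponsor exists, but the consuming workflow is still unidentified:
print(can_advance_past_trl4(True, False, False))  # False -> stuck at TRL 4
```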

Different Readiness Levels require different skills

Machine Learning Engineer

Each TRL requires a different skill set to accomplish the task at hand. Make sure you have the right cohort to tackle it, and engage the sponsors and stakeholders who will be needed at your target TRL at the start of your work cycle.

This is where the team aspect of a data science team really comes into play. Great leaders will leverage individuals’ strengths at the appropriate levels, and use the TRL scale as a roadmap for personal and professional development in the areas where you’re weaker. While there’s no such thing as a “full-stack” data scientist, you can still train to become one. The more you know from the other TRLs, the more value you can add to the project!
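As one hypothetical illustration of leveraging strengths at the appropriate levels, a lead might sketch who typically drives each TRL range. The role names and ranges below are my own illustration, not a canonical mapping.

```python
# Hypothetical mapping of TRL ranges to the roles that might lead them.
# The titles and ranges are illustrative, not a canonical assignment.
ROLE_BY_TRL_RANGE = {
    range(1, 5): "researcher / data scientist (exploration, proof of concept)",
    range(5, 8): "machine learning engineer (integration into a workflow)",
    range(8, 10): "data/platform engineer and SRE (hardening and operations)",
}

def lead_role(trl: int) -> str:
    """Return the illustrative lead role for a given TRL."""
    for trl_range, role in ROLE_BY_TRL_RANGE.items():
        if trl in trl_range:
            return role
    raise ValueError("TRL must be between 1 and 9")

print(lead_role(2))  # researcher / data scientist (exploration, proof of concept)
```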

The process is not monotonic

As in other project methodologies, we can move up and down the TRL scale as the project progresses. Making it to TRL 5 in one sprint doesn’t mean we can’t slip back down to TRL 3 in the next. Because most machine learning algorithms and cloud technologies are constantly evolving, sometimes we need to reevaluate our previous expectations and adjust accordingly.

Failure is normal! Some work efforts have a maximum Readiness Level

Failure is normal. In fact, I would take this idea further and state that “failure is expected.” I’ve grown cynical in my career, to the point where I typically approach every project with the mindset that it’s simply not ready for data science. Sometimes the project is merely an idea that doesn’t have any supporting data available. Worse, it could be nothing more than an attempt to add “machine learning” to a dashboard. Along with educating sponsors about what’s possible, the TRL scale lets us educate them on how to get to a deployed solution that actualizes value from their data.

With the TRL scale, it becomes a matter of illustrating where on the scale the project lies. This barometer clearly communicates how far the project can get with the current datasets, technology, workflow, process, etc. If the project gets stuck at a certain level and the sponsor is invested enough, they’ll find ways to remove the roadblocks and get the team what it needs. Otherwise, the team can cleanly disengage and move on to the next project. Not every project will reach full maturity and deployment.

Next Steps

The use of the TRL scale addresses a lot of the challenges of working in the data science project space. If you’re reading this, I encourage you to include the scale in your next project readout and see what kind of discussion it starts!
