
Keys to Success when Adopting a Pre-Existing Data Science Project

The code may not have been yours originally, but it is yours now. So what next?


Photo from Hitesh Choudhary on Pexels

I recently adopted an extensive collection of notebooks that, combined, aid in the creation of analytics. It sounded like a daunting project to take on, but the more I read into the code, the more I realized it wasn’t all that bad. The notebooks looked overwhelming, but the code was relatively simple when broken down into smaller, more manageable chunks.

Adopting the Code

As I adopted this set of notebooks, the first thing I did was read it. I spent half a week looking through every notebook, reading every line of code at least three times, and breaking down the flow of how one notebook transitioned into another. I wanted to make sure I understood the inputs the code required, the outputs it was expected to generate, and the codebase’s overall architecture.

As most of the original developers were gone, I leaned on the one person I could, asking as many questions as possible. These questions and answers cleared up any misunderstandings or confusion I had and allowed me to better architect the code’s future state in my head. I began to see the broader picture of the code and how it could be used moving forward.

As you adopt someone else’s project, keep in mind that they may have had only one use-case in mind when they first developed the code. Knowing what this initial use-case was can help you understand how to translate the code into a more generalized solution that allows for code reuse and expansion. You may not be a software developer, but you can start thinking like one. Sit back and consider the logic that appears multiple times in the code. Can you create a function or class from that? As you work through the design of the code, document everything. It may seem tedious at times, but documentation can be the key to onboarding new teammates and making sure they can quickly pick up any code and use it.
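
As a minimal sketch of that idea, suppose the same summary logic had been pasted into several notebook cells, once per column. The function name summarize_column and the sample orders data are hypothetical, made up for illustration and not taken from the notebooks I adopted. Pulling the logic into one documented function might look something like this:

```python
from statistics import mean


def summarize_column(rows, column):
    """Return count, mean, min, and max for one numeric column.

    rows   -- a list of dicts, one dict per record (hypothetical layout)
    column -- the key to summarize
    Records where the value is missing or None are skipped.
    """
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        raise ValueError(f"no values found for column {column!r}")
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical sample data standing in for a notebook's input.
orders = [
    {"price": 10.0, "quantity": 2},
    {"price": 30.0, "quantity": None},
    {"price": 20.0, "quantity": 4},
]

# One call per column replaces a copied-and-edited cell per column.
price_summary = summarize_column(orders, "price")
quantity_summary = summarize_column(orders, "quantity")
```

The docstring doubles as the documentation a new teammate would read first, which is one small way to document as you go.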


Determining the Value

As I began to clean up and repurpose the code I adopted, it became clear that there were some known use-cases for the work, but also that migrating the code from notebooks to a Python library might result in the tool not being used. People not using the code is a risk you need to consider. If the project takes one week to read, clean up, and migrate to its new location, can your team afford that week if the project becomes a dead end?

After I migrated the code for the tool, it became clear after a few uses that it might sit dormant in the library with no one using it. A few times, the tool was used to drive insights on other work, but it lost its glamor quickly. Months later, while I was working on other projects, I was sitting in on a sprint review when someone demoed their work. As they began to present their analysis, I realized they had used the tool! After all this time dormant, the tool had become relevant again for a use-case no one had expected. The effort became worth it, and now new use-cases are coming out; the code is updated as more people use it, and it provides value.

When you adopt someone else’s project, it is essential to understand the business justification for cleaning, improving, and expanding the work. You need to know how the team or customers will use this code and whether the work you are doing will serve a purpose. We can often get so caught up in a cool idea that we miss the business justification for the project. If no customer is asking for the work, and there is no reason to expand on or rerun the analyses, then what are you doing? Sit back and make sure you see a clear path forward for the migration of the code. Understand how this work will be used and how it will provide value to the customer. Developing your ability to understand business justification and value-add will help as you continue to build data science projects and present your work to others.


Performing the Migration

Now you have read through the code and determined whether the level of effort to migrate and clean the code will be worth it. Assuming the answer to that question is yes, how do you start?

The first place I like to start is to develop a strategy, sooner rather than later, for how I will migrate the code to its new home. If you are moving the code from one notebook to another, the cleanup work may be different from integrating the code into a software library. As you migrate your code, think about how you may extend the work in the future and which areas need to stay open to that extension. Doing so may help as you decide where to create classes or functions.

As I am migrating code, I also like to focus on testing. You want to make sure you are getting the same output as when you initially adopted the code. That may not hold if you are updating the code at the same time as you migrate it; the functionality may be similar, but the output may vary. For that reason, I like to validate my code output several times as I migrate to make sure it looks the same before adding changes that will alter the outcome. It can also help to develop unit tests that validate the code is operating as expected, or to have others on the team test out your code. Their feedback will be vital in understanding any changes needed as you continue with your migration.
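
One way to check for the same output is a parity test that runs the original logic and the migrated logic on the same input and compares the results. The sketch below uses Python's built-in unittest module; both growth-rate functions and the sample values are hypothetical stand-ins, not code from the projects I adopted.

```python
import math
import unittest


def legacy_growth_rates(values):
    # Stand-in for logic copied unchanged out of the original notebook.
    rates = []
    for i in range(1, len(values)):
        rates.append((values[i] - values[i - 1]) / values[i - 1])
    return rates


def migrated_growth_rates(values):
    # Stand-in for the cleaned-up version that lives in the library.
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


class TestMigrationParity(unittest.TestCase):
    def test_same_output_as_legacy(self):
        sample = [100.0, 110.0, 99.0, 120.0]
        expected = legacy_growth_rates(sample)
        actual = migrated_growth_rates(sample)
        self.assertEqual(len(expected), len(actual))
        # Compare floats with a tolerance instead of exact equality.
        for old, new in zip(expected, actual):
            self.assertTrue(math.isclose(old, new, rel_tol=1e-9))

    def test_single_value_has_no_growth_rate(self):
        self.assertEqual(migrated_growth_rates([5.0]), [])
```

Saved in a test file, this can be run with python -m unittest. Once the migrated version is trusted, the legacy function can be replaced by stored expected values so the test keeps guarding against accidental changes.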

This past year I adopted two projects, each housed in its own set of notebooks, and migrated each into its own library. As I worked with other data scientists and developers, it became clear that the functionality could be combined into one software library that allows the two tools to share similar functions or classes. As I migrated code, that feedback was valuable in understanding how I should proceed with my work. These conversations and others like them will help you determine the next steps for your work, including areas for improvement or expansion of the code.
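
To illustrate what two tools sharing a function inside one library could look like, here is a small sketch. The names normalize_name, ReportTool, and ForecastTool are invented for this example and are not the tools described above; the point is only that both classes call one helper instead of each carrying its own copy.

```python
def normalize_name(raw):
    """Shared helper: trim, lowercase, and join words with underscores."""
    return "_".join(raw.strip().lower().split())


class ReportTool:
    """Hypothetical first tool: builds clean column labels for a report."""

    def column_labels(self, headers):
        return [normalize_name(header) for header in headers]


class ForecastTool:
    """Hypothetical second tool: builds lookup keys for forecast series."""

    def series_key(self, region, metric):
        return f"{normalize_name(region)}.{normalize_name(metric)}"


labels = ReportTool().column_labels([" Total Sales ", "Region"])
key = ForecastTool().series_key("North  East", "Unit Price")
```

A fix or improvement to the shared helper then reaches both tools at once, which is the main benefit of combining them into one library.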


Key Points to Remember

Adopting someone else’s data science project may seem cumbersome or stressful at first, but it can come with many perks. As you adopt the project, you can improve your skills in reviewing code and understanding its functionality, grasping the business justification for the migration and improvements, and learning better coding practices as you clean up the work you have acquired.

Adopting the Code

  • Read, Read, Read. You want to understand the code you are dealing with.
  • Understand the initial use-case(s) and future expectations.
  • Design for code reuse and expandability.
  • Document everything. The original comments can be misleading; update them and document all the changes.

Determining the Value

  • If there is no value in adopting and cleaning up the code, why are you doing it?
  • Understand the value-add the project will have to the team.
  • What is your business justification?

Performing the Migration

  • Develop your strategy sooner rather than later for migrating code.
  • Determine how the code will be architected.
  • Test, Test, Test, and even more Testing! You want to make sure your code is working as expected.
  • Develop your unit tests to validate the code is operating as expected.
  • Determine the next steps. Is this work done, or did you identify possible areas for improvement or expansion of the code?


