The world’s leading publication for data science, AI, and ML professionals.

12 Reasons Your Code Built in Isolation Is Not Production-Ready

Lessons learned from developing a repo in isolation and introducing new developers to the project

Opinion

You may often find yourself with what you feel is a brilliant idea, so you decide to spend some time creating a new repository or set of notebooks. You code for a few hours or a weekend, and voilà! You have a new tool or application you want to share with others. The problem is that code built in isolation by a developer or data scientist is not production-ready code.


I have worked on several projects where I started as the lone developer. When working this way, I take time to gather requirements, draft the codebase’s architecture, build a minimum viable product (MVP), and get feedback. Once feedback starts flowing in, I continue development and update the code where needed. After finishing the MVP, I begin introducing new developers to the codebase. As others become involved, new work emerges: overlooked use-cases, new ideas, and code updates based on stakeholder input. Code I created in isolation has never stayed in isolation; it has only expanded in reasonable and valuable ways.


Lack of Stakeholder Buy-in and Feedback

As I mentioned, I have started many data science projects in isolation, but they never stay that way. The two main reasons these projects expand beyond one developer are (1) stakeholder buy-in, where stakeholders believe the MVP should expand, and (2) interest from other developers and data scientists, who then provide feedback.

1. You need buy-in from the stakeholders that this is the correct direction to go in. If you are not sure about your team's roadmap, ask what it entails. Understand how your work fits into the larger picture. 
2. Stakeholders will provide feedback on how the work is coming along and present the next steps. This feedback will mainly focus on the direction of the project, the end-customer implementation and impact, and how you can expand the work. 
3. Feedback is vital in any coding project. Have at least one other developer review your code. It will help you immensely. There may be use-cases you missed, logic bugs you overlooked, optimizations to consider, and more.

As you add more people to the project, each person can focus on a specific area. When I add new individuals, I tend to assign them to one of three main parts of the project. Breaking teammates into these groups allows for focused work on the current codebase, on the codebase's processes, and on research.

  • Maintainability – These individuals maintain the current codebase, fix bugs, and look for areas of improvement. This maintenance ensures the codebase is stable and operating as expected.
  • Processes – Those assigned to the codebase's processes focus on CI/CD pipelines, unit testing, and code review. The code should be production-ready when merged into master, with robust processes in place to ensure code quality.
  • Research – The research group may or may not include people from the other two groups. This group researches new features and functions for the project, decides whether each is a go or no-go, and works on the implementation. The implementation may be an initial MVP feature before another developer takes over the work to continue and maintain it.

Maintain Current Code

When adding developers to a project, the first group focuses on the existing implementation as it was originally written. This team looks for areas of improvement and works on stabilizing the codebase. Once the codebase has been reviewed and feedback comes in, you will build up a large backlog of work, including optimizations, new features, and bug fixes.

10. More developers and data scientists allow for faster development and more features. As a lone developer, it becomes hard to keep up with new features and bugs to fix.
11. More features to implement means an evolving backlog of work. Keeping up with the backlog needed to maintain an application will require more people. Backlogs can become very large and require frequent grooming. Keep your backlog current. 
12. As you add developers and data scientists to a project, you can continue to fail fast and iterate quickly. Failing fast is a great way to learn what is or isn't working on the project. Pivot when necessary to keep up with the changes. 

Updated Processes

The second group of developers focuses heavily on implementing updated processes in the new repository. When migrating from a lone-developer project to a production-ready one, I have found it helpful to have one group focus on processes. This group ensures the standards and processes set by our other repositories are also implemented in the new repository.

7. Developers may not properly test code developed in isolation. Developers and data scientists will write unit tests, set up QA processes, and determine how to conduct user testing. The level and type of testing may vary slightly by project, but the expectation is that the code is well written and tested before it is deployed.
8. The right processes need to be put in place. In our group, we focus on robust processes to handle the code's development, testing, and production states. These processes include testing, CI/CD pipelines, output validation, and peer reviews.
9. Code written by a lone developer has not been thoroughly peer-reviewed. Code must go through a peer review to get at least one other developer's opinion on best practices, issues, and possible logic errors. These peer reviews rely on the tests and pipelines to confirm the code is well tested before a reviewer looks it over.
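As a small illustration of the kind of unit test the process group might add, here is a sketch using Python's built-in `unittest` module. The function `normalize_column` is hypothetical, standing in for any data-processing helper a lone developer might have written without tests; the edge cases shown are examples of what a second set of eyes tends to catch.

```python
import unittest


def normalize_column(values):
    """Hypothetical helper: min-max scale a list of numbers to [0, 1].

    Returns all zeros when every value is identical, so the function
    never divides by zero.
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # Constant input: no spread to scale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


class TestNormalizeColumn(unittest.TestCase):
    def test_scales_to_unit_range(self):
        # The "happy path" the original developer probably checked by hand.
        self.assertEqual(normalize_column([10, 20, 30]), [0.0, 0.5, 1.0])

    def test_constant_input_does_not_divide_by_zero(self):
        # An edge case a solo developer can easily miss.
        self.assertEqual(normalize_column([5, 5, 5]), [0.0, 0.0, 0.0])

    def test_empty_input_raises(self):
        # Empty input should fail loudly rather than return nonsense.
        with self.assertRaises(ValueError):
            normalize_column([])
```

Saved as, say, `test_normalize.py`, running `python -m unittest` in that directory discovers and runs these tests. Wiring the same command into a CI/CD pipeline step means every change is tested automatically, so a reviewer can see the tests pass before reading the code.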

More Use-Cases and Research

The last group is the researchers. As you invite stakeholders, developers, and data scientists into the project work, you will become aware of use-cases that were overlooked and areas that lacked enough investment in research. This group defines which areas need to be researched based on feedback, determines the expected outcome of the work, and develops an MVP feature as required. They push the boundaries of what has been done and fill the gaps.

4. You may not consider every possible use-case. Gathering feedback and including other developers on the work will help ensure those missed cases are accounted for in the backlog of work.
5. Additional developers mean more time to invest in researching specific aspects of the project. Having data scientists and developers focused on related research areas means a particular part of the work can be improved. These topics may include runtime speed, algorithm optimizations, and data processing techniques.
6. Lone developers may lack enough knowledge of the data to handle every scenario. Subject matter experts (SMEs) and other data scientists can provide this background where it is missing. The added input will improve the overall result of the work.

Final Thoughts

You can’t write an entire production-ready data science application in isolation. You need stakeholder buy-in and feedback, an understanding of the possible use-cases, CI/CD pipelines, and a plan to maintain your code. After getting stakeholder buy-in for the MVP and receiving feedback, migrate the project from a lone developer to a team. At that point, you can break the team into three groups: maintainability, processes, and research.

  • You want the code to be maintainable to continue improvements and bug fixes.
  • You want to develop robust processes around the codebase to maintain a production-ready implementation.
  • You want a subset of the team focused on researching new features and functions to incorporate into the work.

You can’t write an entire production-ready application in isolation, but you can start it and invite the team to continue the work.

Have you started a project in isolation and then brought the team into it? What did you learn from this process?



