Don’t toss your notebooks when you finish your analysis. Doing so can hurt your productivity when you start your next project. As I work on different data science projects, I look back at the notebooks I have created and think, "Can this code be reused?" The answer is almost always yes! Developing a reusable code base will allow you to run repeated analyses and make your code more readable when you share it with others.
What is Code Reuse?
Code reuse is the use of existing code or notebooks to build new functions or applications. As stated in the book Systems Programming, code reuse has many benefits: it allows for better readability of code, better code structure, and reduced testing.
Keeping your old notebooks can help with code reuse. You can start to see code patterns that you are repeating and use those to develop functions or classes. This refactoring effort will help you improve your code over time as you avoid duplication. You will have more maintainable functions that are easier to update as needed in the future. As you create these functions, you can begin to incorporate unit testing to verify that they are working correctly. Unit testing is a valuable effort that will help you avoid issues in the future by showing you whether a function produces the expected result.
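As a minimal sketch of this idea, suppose several notebooks repeat the same few lines to compute a count, mean, and median. The function name `summarize` and the test class below are hypothetical, chosen only for illustration; the example uses Python's standard `statistics` and `unittest` modules.

```python
import statistics
import unittest


def summarize(values):
    """Return the count, mean, and median of a non-empty list of numbers.

    Hypothetical helper: it replaces lines that would otherwise be
    copied from notebook to notebook.
    """
    if not values:
        raise ValueError("values must not be empty")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


class SummarizeTest(unittest.TestCase):
    """Unit tests that check the helper against results worked out by hand."""

    def test_known_values(self):
        result = summarize([1, 2, 3, 4])
        self.assertEqual(result["count"], 4)
        self.assertEqual(result["mean"], 2.5)
        self.assertEqual(result["median"], 2.5)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            summarize([])
```

If this code were saved in a module, the tests could be run with `python -m unittest`. Once the function lives in one place, a fix or improvement made there reaches every analysis that imports it.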
Why Should You Practice Code Reuse?
Creating functions and classes for use in later projects is a valuable technique that will help you become more productive and save time when running your analyses. As discussed in MATLAB’s post on code reuse, modularizing your code also enables multiple people to use the same functions easily. If you decided to create your own software library for your data science projects, you would have many opportunities for code reuse. This library would contain functions and classes responsible for different aspects of your data science work.
Learning to work with reusable, object-oriented code also allows you to add new behaviors through custom implementations. For example, you may find yourself using a mathematical library that does not include a function you need or does not behave in the specific way you require. Learning to reuse code and write in an object-oriented style will allow you to extend the library’s functionality and tailor it to your use case.
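One way to picture this is a small subclass that adds a missing method to an existing class. The sketch below uses `collections.Counter` from Python's standard library as a stand-in for whichever library you actually work with; the class name `FrequencyTable` and its `proportions` method are hypothetical additions, not part of the standard library.

```python
from collections import Counter


class FrequencyTable(Counter):
    """A Counter extended with a method the base class does not provide."""

    def proportions(self):
        """Return each key's share of the total count as a fraction."""
        total = sum(self.values())
        if total == 0:
            return {}
        return {key: count / total for key, count in self.items()}


table = FrequencyTable(["a", "b", "a", "a"])
print(table.proportions())     # {'a': 0.75, 'b': 0.25}
print(table.most_common(1))    # inherited behavior still works: [('a', 3)]
```

Because the subclass inherits everything from its parent, existing code that expects a `Counter` keeps working, and the new behavior is available wherever you need it.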
Thomas Huijskens makes a good point that your code should be production-ready. Learning to create functions and clean up your code will also be valuable when you want to rerun an analysis. If you develop a useful analysis, you will want to show it to different levels of management or use it to drive action in a business unit. In this case, your code should be easy to rerun and maintain. Organizing your code into readable functions will make it easier to recreate the results the next time you need the analysis. As you continue to develop, you may find that your analytics and visualizations are rolled out to other teams or customers, and writing functions will help keep your analysis reproducible and your code readable.
Final Thoughts
Keeping your old notebooks and developing reusable code will aid your productivity in data science. Creating a reusable codebase will allow you to run repeated analyses and make your code more readable when you share it with others. Your code should be production-ready so that anyone can pick up where you left off, rerun the analysis, and understand the code. As you start your next project, consider using functions, writing clean documentation, and finding areas of high reusability.
Additional Reading
- Systems Programming by Richard John Anthony (2016), Chapter 7.5.4, Reuse Code When Opportunities Arise
- What is Code Reuse? by MATLAB
- Misconceptions of Code Reuse by Arho Huttunen
- Data Scientists, the Only Useful Code is Production Code by Thomas Huijskens