You did it. After years of hard work, you got hired as a junior data scientist. Your first few weeks flew by with company onboarding, and before you realized it, a few years had passed. You worked on countless projects, both individually and as part of a team, and your solutions are making a positive impact on the company.
Now, though, you’re ready for your next challenge: becoming a senior data scientist. But how do you bridge the gap? What does a senior data scientist need to know? And most importantly: how do you transform your junior-level data science code into senior-level data science code?
Luckily, this last question is the easiest to answer, and code quality is the most straightforward skill to improve on your path toward becoming a senior data scientist. I’ve singled out the top four areas where your junior-level code can be transformed into something that would encourage any company to promote you to a senior data scientist position. The key is to master the fundamentals, ditch the spaghetti code, develop testing and QA skills, and learn to optimize your code.
Master the fundamentals of data science code
You can’t run before you walk, so it follows that before you can write senior-level data science code, you need to master the fundamentals.
At the beginning of your data science journey, it’s an accomplishment to simply write code that runs properly. Now, however, is the time to begin mastering those fundamentals so that it’s no longer a surprise when your code runs properly.
This is the one tip you can’t speed up; it’s achieved only by spending time doing the work. Over your first few years as a junior data scientist, you’ll be given opportunities every day to work on mastering the fundamentals of data science code, from programming fundamentals to algorithms, data structures, and design patterns.
Furthermore, now is the time to deepen your knowledge base by learning other programming languages (likely the ones your company uses, or those you have time to learn on your own for fun) and other technologies that can improve your quality of work (e.g., Notion for organizing your projects, Git for version control, code syntax-checking extensions in your code editor, etc.). Some of these languages and tools will stick, while others will simply provide insightful lessons that will make you a better data scientist even if you never use them again.
Now is also the time to stretch your capabilities and begin exploring more advanced data science concepts. For example, you may currently be in more of a data analyst position, where you’re explaining the causes of past events. However, your boss now wants you to move into the predictive side of things, which requires you to begin learning about machine learning and artificial intelligence. Pushing yourself to learn these topics will allow you to move into more senior and supervisory roles, where you can begin passing on your knowledge to new junior data scientists who are starting out just like you did.
Focus on writing clean, maintainable, and readable code
I’ve often joked in previous articles that data scientists write terrible code. The spaghetti code is real, especially when you’re starting out. This may be permissible for the first couple of years that you’re working as a junior data scientist, but as your experience increases, it becomes less and less acceptable to write messy code.
One thing that will set you apart as the perfect candidate for a senior data scientist position is your ability to write clean, maintainable, and readable code. Not only does this make you easy to work with and immensely professional, but it also shows that you can pass on these techniques to future junior data scientists under your tutelage.
Therefore, to upgrade your junior-level code to senior-level code, you need to focus on making your code clean, maintainable, and readable at all times.
Both Python and R have great guides on best practices and style (PEP 8 for Python and the tidyverse style guide for R, for example) that can help you begin formatting your code more professionally. Code cleanliness, maintainability, and readability are the cornerstones of a data scientist who is a pleasure to work with, which is why these standards should be emblazoned on your brain (or at the very least, have a prominent place on your desk within easy reach). Best practices and style should always be considered and reviewed heavily before pushing your final commit or sending your code to the software engineering department for translation into production-ready code.
This also means adhering to the DRY (“don’t repeat yourself”) principle at the very least, and to the SOLID principles at the more advanced end, to ensure that you’re writing the best code possible. While these principles may not seem essential if you’re primarily writing code that will never be touched by anyone else or that will only run on a small set of internal machines, it’s worth becoming proficient in them in case you ever change jobs or begin producing production-level code.
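To make the DRY principle concrete, here is a minimal sketch in Python; the column names and the cleaning rule are hypothetical stand-ins for whatever repeated logic shows up in your own projects:

```python
import pandas as pd

# Non-DRY: the same cleaning logic copy-pasted for every column.
#   df["revenue"] = df["revenue"].fillna(0).clip(lower=0)
#   df["cost"] = df["cost"].fillna(0).clip(lower=0)

# DRY: the cleaning rule lives in one place and is reused everywhere.
def clean_numeric(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Fill missing values with 0 and remove negative entries in the given columns."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(0).clip(lower=0)
    return out

df = pd.DataFrame({"revenue": [100.0, None, -5.0], "cost": [40.0, 10.0, None]})
df = clean_numeric(df, ["revenue", "cost"])
print(df)
```

If the cleaning rule ever changes, it now changes in exactly one place, which is the whole point of DRY.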
Additionally, at this point in your career, you should be a beacon for pristine industry/company code standards. Each code commit you push to the repository should be a gleaming example of what your industry or company is looking for, and should be something that could be printed off and used in a training manual. Yes, it will take extra time, but the extra bit of thoughtfulness will pay dividends when it comes time for your company to promote internally. What’s one thing they’ll look for? An employee who consistently writes clean, maintainable, and readable code – and that should be you!
Develop testing and QA skills
Becoming proficient in unit tests, integration tests, and automated testing frameworks is a great way to immediately take your code to the next level. While these are all skills you should be aware of as a junior data scientist, they’re skills you should be proficient in as a senior data scientist.
Testing and QA skills are where you begin to write excellent code that works as designed and can operate in tandem with other pieces of code. Where before you may have simply sent your code off to the software engineering department to be made ready for integration, you are now writing code like a senior data scientist and must ensure that your code functions properly and can be integrated into larger code bases.
While your company may have specific unit and integration tests they want you to run, it’s not a bad idea to begin building your own to ensure that your code is running and integrating the way it should. Your own forms of quality assurance are a great way to take responsibility for your code and to ensure that if it can pass your own tests, it can pass your company’s tests with no issues. Not only does this make you a better data scientist in the long run, but it also makes you more efficient when writing code in the first place.
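As a minimal sketch of what writing your own unit tests might look like, here is a pytest example; it assumes the hypothetical clean_numeric helper from the earlier sketch has been saved in a module called cleaning.py:

```python
# test_cleaning.py -- run with `pytest`
# Assumes the hypothetical clean_numeric helper from the earlier DRY sketch
# lives in a module called cleaning.py; adapt the import to your own project.
import pandas as pd
import pandas.testing as pdt

from cleaning import clean_numeric


def test_fills_missing_values_and_clips_negatives():
    raw = pd.DataFrame({"revenue": [100.0, None, -5.0]})
    result = clean_numeric(raw, ["revenue"])
    expected = pd.DataFrame({"revenue": [100.0, 0.0, 0.0]})
    pdt.assert_frame_equal(result, expected)


def test_does_not_mutate_the_input_frame():
    raw = pd.DataFrame({"revenue": [1.0, None]})
    clean_numeric(raw, ["revenue"])
    assert raw["revenue"].isna().sum() == 1  # original frame left untouched
```

Small, focused tests like these document how your code is supposed to behave, and they catch regressions before the software engineering department (or your future self) ever sees them.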
Developing testing and QA skills is a great way to show your company that you’re committed to improving your craft and that you care about the quality of your work and the code you push to the production environment. These are all attributes that make you a great candidate for a senior data scientist position.
Make performance optimization a priority
Nothing is a better motivator to learn how to optimize code than walking past the software engineering department after you’ve pushed your code to them and hearing the grumbles synonymous with having received a data scientist’s code. It’s a humbling experience that every data scientist should go through.
Learning code optimization isn’t just about maintaining a healthy working relationship with the software department – it’s also about making yourself a more surefooted data scientist who can write excellent code without the support of another department. Being able to write stable, optimized code the first time is a great move toward becoming a senior data scientist.
Topics such as caching (storing a copy of the data in front of the main data store; not relevant in every application, but useful when producing dashboards for clients), time complexity (how your algorithm’s running time grows with the size of its input), database indexing (a structure that speeds up data retrieval operations on a database table), and query optimization (figuring out the best way to improve query performance) are great places to get started in optimizing your data science code.
While not all of the topics mentioned above are relevant for all types of data scientist work, they’re all great tools to keep in your back pocket, whether for future jobs or for that one time the need arises and you can immediately hit the ground running to solve the problem – an essential attribute of a senior data scientist.
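To make a couple of these ideas concrete, here is a small Python sketch; slow_feature_lookup is a hypothetical stand-in for an expensive per-client computation, not a real API:

```python
import functools
import time

# Caching: memoize an expensive, repeatedly requested computation so it is
# only computed once per input (handy when the same values feed a dashboard).
@functools.lru_cache(maxsize=256)
def slow_feature_lookup(client_id: int) -> float:
    time.sleep(0.1)          # stand-in for a slow query or heavy computation
    return client_id * 1.5   # placeholder result

slow_feature_lookup(42)      # first call: computed and cached
slow_feature_lookup(42)      # second call: returned instantly from the cache

# Time complexity: membership checks are O(n) on a list but O(1) on average
# for a set, which matters once your data stops being small.
active_ids_list = list(range(1_000_000))
active_ids_set = set(active_ids_list)
print(999_999 in active_ids_set)   # fast
print(999_999 in active_ids_list)  # noticeably slower at scale
```

Neither trick is always appropriate (caches can serve stale data, and sets trade memory for speed), but knowing when to reach for them is exactly the kind of judgment expected at the senior level.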