
[Disclaimer: This post contains affiliate links to Book Depository]
The data scientist role is, in my opinion, one of the most exciting ones in the tech industry. As a data scientist, one has the opportunity to learn skills tied to software engineering, statistics, finance and even project management.
While this blend of skills makes the job extremely exciting, it also brings a lot of challenges to whoever starts working in the field as most individuals are, probably, not formally trained in all the skills named above.
One of the areas that data scientists may struggle with is software development and engineering. A lot of people migrate to the industry via economics, statistics or marketing, and they are mostly self-taught when it comes to coding. While this should be celebrated as it showcases most data scientists’ resilience, it doesn’t remove the need for data scientists to improve their software engineering and general coding skills.
Adding to that, MLOps and proper deployment of machine learning models (both deeply tied to software engineering) are becoming more relevant to organizations that want to get out of the POC (Proof of Concept) rabbit hole. Even if you will not be responsible for the deployment of your model, you should, at least, be aware that someone will have to look at your code in the future.
Personally, I have a background in statistics. In the first 7 years of my career I worked for big corporations that were not very mature when it comes to their data science landscape. During that time, I was mostly focused on developing models that were good at showcasing the potential of machine learning, and not on building things with proper software engineering best practices. Since joining DareData Engineering, my whole mindset has changed, as I started to work with different companies at the same time and understood the necessity of having things developed properly and in a scalable way.
While learning on the job, books were one of the main mediums I used to improve my skills. Alongside videos and written articles, I loved to read good engineering-related books that helped me improve technically or brought me completely new perspectives that were unknown unknowns to me.
The goal of this post is to give you a list of 6 books that were super relevant on my journey to write better code (something that I’m still trying to improve, every day!).
These books are a mixture of technical and non-technical books. Most of them have coding examples but a couple of them are related to general methods and best practices. Keep in mind that most of these books are probably more relevant for people who are not formally trained as software engineers or who are starting their career. Let’s start!
The Pragmatic Programmer

The Pragmatic Programmer is an excellent choice for people that are picking up code without any software engineering background.
Mostly through stories and lessons, the book gives you some cool tips and principles on implementation, logic, software management, among others. Reading it will bring you a new perspective when developing Data Science models and algorithms – for instance, it will help you understand why it’s important that you avoid repetition and some tactics of how to debug and test your code.
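To make the "avoid repetition" idea concrete for data work, here is a minimal sketch of my own (it is not taken from the book). The `standardize` helper and the `features` columns are hypothetical names and made-up values: the point is that a scaling formula lives in one function instead of being copy-pasted per column, and that a one-line check guards it.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature columns, made up for illustration.
features = {
    "age": [20, 30, 40],
    "income": [1000, 2000, 3000],
}

# One call site replaces a copy of the formula for every column.
scaled = {name: standardize(column) for name, column in features.items()}

# A tiny sanity check: each standardized column should be centered on zero.
for column in scaled.values():
    assert abs(mean(column)) < 1e-9
```

If the scaling rule ever changes, there is a single place to edit and a single check to update.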
The first edition of this book came out in 1999 and some of its advice is still relevant today, which is a great argument for the quality of the book since, in tech, things tend to get outdated pretty fast. This is also an excellent choice to understand some of the jargon and concepts used throughout software development that may be a bit challenging to follow when you work with other people that have a SE background.
This book was super important to improve the overall architecture of my code and gave me something that is the ultimate goal of most learning journeys: I’ve defined new habits (in this case, when writing code) that are extremely valuable for my productivity and professionalism.
A relevant detail: If you are already pretty familiar with software engineering, you may not get much value from it.
Find it here:
The Hitchhiker’s Guide to Python

What’s better than a Python book written by hundreds of contributors around the world?
A really practical hands-on guide to Python’s medium to advanced stuff. As this is not focused on the most basic stuff (such as data structures) or the Machine Learning components of Python, it will help you develop proper Python coding practices and use other stuff that you may not be incorporating in your code such as Object-Oriented Programming or tests.
If you are a self-taught Python coder, you will benefit enormously from reading this book. Whether by improving your code’s readability or its speed, this book is a really great choice for those who feel that their Python skills have recently hit a plateau. And what better way to improve in one of the world’s most famous open source languages than by reading the thoughts and ideas of hundreds of Python coders around the world?
Personally, I found it extremely enlightening when it comes to dealing with Python’s specificities. Python is such a unique language, with so many details, that I was finding it hard to find a learning source that went beyond the basics. Luckily, I found this book, which did that while keeping the learning fun and engaging.
Find it here:
Design Patterns: Elements of Reusable Object-Oriented Software

A great book to understand some software design patterns. Have trouble understanding how to design your functions or classes? This should be your go-to book!
Design Patterns showcases 23 different design patterns that you can use to build your code’s logic and structure. When building your machine learning pipelines, you may find yourself implementing the same type of patterns over and over again. This book can give you some alternative structures that can spice up your code and improve readability or performance.
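As an illustration of how one of these patterns can show up in a pipeline, here is a small sketch of the Strategy pattern applied to missing-value imputation. The example is my own and does not come from the book; the class names (`ImputationStrategy`, `MeanImputation`, `MedianImputation`, `Imputer`) are hypothetical.

```python
from abc import ABC, abstractmethod
from statistics import mean, median


class ImputationStrategy(ABC):
    """Interface shared by every interchangeable imputation rule."""

    @abstractmethod
    def fill_value(self, observed):
        """Return the value used to replace missing entries."""


class MeanImputation(ImputationStrategy):
    def fill_value(self, observed):
        return mean(observed)


class MedianImputation(ImputationStrategy):
    def fill_value(self, observed):
        return median(observed)


class Imputer:
    """Pipeline step that delegates the 'which value?' decision to a strategy."""

    def __init__(self, strategy):
        self.strategy = strategy

    def transform(self, column):
        # Assumes at least one non-missing value; mean/median raise on empty input.
        observed = [v for v in column if v is not None]
        fill = self.strategy.fill_value(observed)
        return [fill if v is None else v for v in column]


column = [1.0, None, 2.0, 9.0]
mean_filled = Imputer(MeanImputation()).transform(column)
median_filled = Imputer(MedianImputation()).transform(column)
```

The `Imputer` never changes when a new rule is added: a new rule is just one more small class with a `fill_value` method.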
If you are self-taught when it comes to coding, this book is probably a good fit to your learning journey as it can unveil some software architecture patterns that you are not aware of.
It’s common that we, as data scientists, don’t use object-oriented programming (OOP) as soon as we start coding. If your OOP journey has just started, you will find this book really, really helpful.
It really helped me when I jumped from writing code in a strictly functional and linear approach to using OOP. Although I did not find examples tailored for Data Science and Machine Learning, reading this gave me some tools that I could easily adapt to the industry.
You can find it here:
Refactoring by Martin Fowler

One of the most famous software architecture books, Martin Fowler’s Refactoring is a standard for the continuous improvement and delivery world.
This book gives us excellent tips and ideas on how to keep improving our code for readability, usability and performance, something that is crucial in the Agile world we live in. Through a lot of practical examples, the author presents several scenarios where one can improve their code beyond its functionality.
I found this extremely relevant for the Data Science and Machine Learning world. Although the book doesn’t contain examples tailored to that, refactoring techniques fit really well into Data Science projects due to their ever-changing nature.
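To show what a refactoring step can look like in a Data Science setting, here is a sketch of my own (not an example from the book) in which a metric computation is split into smaller functions without changing its behavior. All function names and the sample labels are hypothetical; the final check confirms that the old and new versions agree.

```python
def summarize_before(y_true, y_pred):
    # Original version: counting and ratio logic tangled in one block.
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


def count_outcomes(y_true, y_pred):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize_after(y_true, y_pred):
    # Refactored version: same result, but each step has a name and can be tested alone.
    tp, fp, fn = count_outcomes(y_true, y_pred)
    return {"precision": safe_ratio(tp, tp + fp), "recall": safe_ratio(tp, tp + fn)}


# Made-up binary labels and predictions, used only to compare both versions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
assert summarize_before(y_true, y_pred) == summarize_after(y_true, y_pred)
```

Keeping a check like the last line around while restructuring is what makes it possible to change code with some confidence that the results stay the same.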
Although this book uses Java examples, which makes it a bit challenging for most data professionals, I hope that you find it as valuable as I did. To me, it was a really great glimpse into how to improve code after the first stages of development, and I’ve learned how to tackle the difficult task of changing pieces of my code while being time efficient.
Find Refactoring in the following links:
The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations

Speaking of agile, The DevOps Handbook is a "principles" book focused on the DevOps philosophy and best practices on collaborative development. A lighter read than other books I have on this list, it is particularly relevant for technical people that want to continue their career into any technical leadership role.
Although not as technical as other books, this one details some stories to debunk myths around the DevOps culture while showcasing examples from some of the most famous tech companies in the world (Netflix, Amazon, Etsy, etc.).
This book is specifically interesting in detailing:
- How to incorporate Continuous Delivery into your deployment practices;
- How to set up a team around a Continuous Integration (CI) and Continuous Delivery (CD) way of work;
- How to align business and IT practices, something that most tech companies thrive on.
As my role combines leadership with technical skills, this book was extremely valuable in understanding how big tech organizations work. Their success, when it comes to development and organizing teams, is enormous, and taking a glimpse of some of the principles they use was a great way to make small changes in my own projects.
A warning: if you are already pretty experienced in the concepts of CI/CD, this book may fall a bit short and offer little that is new to you.
Find The DevOps Handbook in the following links:
Clean Code: A Handbook of Agile Software Craftsmanship

This book was one of the major contributors to breaking down my huge Python functions. Like most data scientists, I tended to develop huge functions that were doing a lot of things and that were hard to debug. You know the one: that master function that ingests your data and transforms every single column, adding 400 new variables right after!
This book was responsible for destroying that mindset and improving my ability to write clean code.
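Here is a rough sketch, in Python rather than the book’s Java, of what breaking up such a master function can look like. It is my own illustration with hypothetical function names and made-up records: each function does one thing, and a short `prepare` function only composes them.

```python
def parse_row(raw):
    """Convert one raw record (all strings) into typed values."""
    return {
        "item": raw["item"].strip().lower(),
        "price": float(raw["price"]),
        "quantity": int(raw["quantity"]),
    }


def is_valid(row):
    """Keep only rows with a non-negative price and a positive quantity."""
    return row["price"] >= 0 and row["quantity"] > 0


def add_revenue(row):
    """Return a copy of the row with a derived 'revenue' field."""
    return {**row, "revenue": row["price"] * row["quantity"]}


def prepare(raw_rows):
    """Compose the small steps: parse, filter, then derive new fields."""
    parsed = (parse_row(raw) for raw in raw_rows)
    return [add_revenue(row) for row in parsed if is_valid(row)]


# Made-up raw records for illustration.
raw_rows = [
    {"item": " Apple ", "price": "0.5", "quantity": "4"},
    {"item": "Pear", "price": "1.25", "quantity": "0"},
]
prepared = prepare(raw_rows)
```

When something breaks, each small function can be called and inspected on its own, which is much easier than stepping through one giant block.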
The downside of this book (just like Refactoring) is that it is tailored for Java programming and I’m certain that I missed some of the arguments around it as I’m not really fluent in the language. Even so, the principles it teaches are portable to other programming languages (such as Python or R) and gave me a really excellent perspective on how to write a cleaner version of my code.
The upside is that this is a really (I mean, really) practical book: don’t expect any fluff from the author when presenting the examples. It’s as straight to the point as it can get, focused mostly on practical examples.
Find Clean Code in the following links:
Thank you for taking the time to read this post! It took me around two years to read these books, and I’m currently finishing "Clean Code".
When developing new projects, I still revisit them from time to time as they help me write better machine learning and data science software. Reading them was a great journey towards becoming better at writing code, a never-ending journey you start when you become a data scientist.
I hope you also extract some value from them and that having them on your shelf will be as valuable to your professional growth journey as they were to me. Have other recommendations to add? Write them in the comments!