
5 pro tips to grow your python skills as a data scientist or ML engineer


Patrick’s blog – Data Science

Lessons learned from my journey as a data scientist & machine learning engineer

Example of what -not- to do, and how you can improve (Inspiration for this picture here. Image by author.)

Like most data scientists who started out after an interrupted academic career (voluntarily or not; in my case, a bit of both!), I began refining my Python skills out of necessity, to become a data scientist and machine learning engineer. It wasn’t until I had worked with Python for about a year that I started seriously worrying about my Python skills. (I have some people to thank for that: shoutout to Tomás Moyano, Daniel Montaño and Eric Morfa Morales for their critique of me as a developer. It helped me improve a lot!)

So I wanted to share the tips that helped me improve the fastest over my last two years, as a data scientist but also as a developer.

"What one programmer can do in one month, two programmers can do in two months."

  • Frederick P. Brooks

  1. Master git, and use a GUI for it

Most people who begin will get told "worry about git later, first start by writing good code" or "use the terminal at the beginning, that’s how the true pros do it". I totally disagree with that, because my personal experience tells me otherwise… even though it is also paradoxically true that the first rule of creating any kind of content is "focus on the content, then the content, and then, if you have time, improve the content".

Git is not just a tool in your toolbox. It records your changes, so that you can keep track not just of your work, but also of your improvements, and it gives you the ability to repair your mistakes by reverting changes. You can also work on different ideas simultaneously and manage your time by having them separated. This is invaluable, especially in a learning phase, where you can annotate modifications to your code by adding useful messages to your commit.

I also highly recommend using a GUI such as SourceTree or GitKraken to see the evolution of your codebase visually. To get started, the Git Handbook will give you the basics, and I also strongly recommend creating a dummy repository and making test changes to it (or even better, realistic changes on a small project) to try out the commands and gain familiarity with git and the GUI you want to use. The main reason to use a GUI is very simple: you open the thing, and you see what happened in the entire repository in the blink of an eye. The git command-line tool will never achieve this level of communication, and being in sync with the current status of your repository is crucial. A Git repository is, in my opinion, a very visual object, so it should be visualized.


  2. Learn the basics of the programming language you use

Python is the main language used for Data Science and machine learning, but it’s not its only use, and some people may also be using R, JavaScript, Go, or even Java (yuck!) for their use-case/job. Python was designed as a full-fledged programming language and it has an insane amount of use-cases, so it is extremely versatile, flexible and useful. Learning to use it will make you more efficient and productive not just in your data science or machine learning career but also as a developer in general.

Here is a short list of things you might want to have a look into (in order of difficulty):


  3. Use Makefiles

A Makefile can be seen as a collection of terminal shortcut commands stored in a text file that integrates with the make command-line utility. Have a look at this tutorial to understand how to use Makefiles. This is an easy tip to start implementing, but it has two main impacts:

  • It saves you time by making complex commands easier to type. You won’t have to remember all the required parameters, paths and environment variables required to run a particular command, and that leaves you with more mental energy for more important things.
  • It naturally documents your repository, not in its functionality but in its usage. When a repository contains a Makefile that has been used to run its code, a developer seeing your code for the first time already has a rough idea of how to run your scripts. (You can also put comments in a Makefile, but ideally, put your instructions in a README.md file where you can explain in more detail how the commands should be run!)
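As a rough sketch of what such a file can look like, here is a small Makefile. The target names, the `requirements.txt` file, the `tests` folder, the `scripts/train.py` script and the `DATA_DIR` variable are all hypothetical placeholders, so adapt them to your own project. Note that each command line under a target must start with a tab character.

```makefile
# Hypothetical targets; adjust the paths and commands to your project.
.PHONY: install test train

# Install the project dependencies (assumes a requirements.txt file).
install:
	python -m pip install -r requirements.txt

# Run every unit test found in the (assumed) tests/ folder.
test:
	python -m unittest discover -s tests

# Run a hypothetical training script with its environment variable and flags.
train:
	DATA_DIR=data/raw python scripts/train.py --epochs 10
```

With this file at the root of the repository, `make test` replaces the longer unittest command, and a newcomer can read the targets to see how the project is meant to be run.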

  4. Learn how to work with unit tests early

There’s one thing I always did when I began writing code: I would write some code, open a terminal, paste the code into it, and keep playing with it to see if it worked the way I wanted. If you’re going to use this code once or twice, this is fine. If you’re going to use it more than that, it’s a total waste of time. (That’s a general pattern in programming: if you do something more than twice, you should probably automate it. Sometimes even if you do it twice. Less often, even if you do it once. But at some point, Donald Knuth’s warning should kick in: premature optimization is the root of all evil!)

What you should do is get acquainted with Python’s unittest library. It’s built into Python and it works wonderfully. What it allows you to do is the following: that thing you typed in the terminal? You write it in a unit test, and then all you have to do is run that test from the terminal (ideally, write a shortcut to call your test script in a Makefile).
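Here is a minimal sketch of what that can look like. The function `normalize_column_name` is a made-up example of "that thing you were checking by hand in the terminal"; only `unittest.TestCase` and `assertEqual` come from the library itself.

```python
import unittest


def normalize_column_name(name):
    """Hypothetical helper under test: tidy up a raw column name."""
    return name.strip().lower().replace(" ", "_")


class TestNormalizeColumnName(unittest.TestCase):
    def test_strips_and_lowercases(self):
        # The check you would otherwise have typed by hand in the terminal.
        self.assertEqual(normalize_column_name("  Total Sales "), "total_sales")

    def test_already_clean_name_is_unchanged(self):
        self.assertEqual(normalize_column_name("price"), "price")
```

If you save this as, say, `test_columns.py` (a hypothetical file name), running `python -m unittest test_columns` from the terminal executes both checks at once, every time you change the function.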

  5. Learn the standard library of Python

How many times did I try to be smart and write some super complicated code (that also looked really complicated) to solve a problem already solved by python’s standard library! I’ll give you an example.

Let’s say you have in front of you the menu of the cafeteria for each day of the week. So you’re given two dictionaries:
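As an illustration, suppose the two dictionaries hold the main course and the dessert for each day. The names `main_courses` and `desserts` and all the dishes are assumptions made up for this sketch.

```python
# Hypothetical menus (illustrative data): one dictionary per course,
# both keyed by weekday and listed in the same order.
main_courses = {"Monday": "lasagna", "Tuesday": "curry", "Wednesday": "tacos"}
desserts = {"Monday": "tiramisu", "Tuesday": "rice pudding", "Wednesday": "flan"}
```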

and you want to write a function that generates the following sentences:
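For three days of made-up menus (the dishes and the wording are assumptions for illustration), the target output might look like this:

```python
# Target output for hypothetical menus; one sentence per day.
expected_sentences = [
    "On Monday, the cafeteria serves lasagna with tiramisu for dessert.",
    "On Tuesday, the cafeteria serves curry with rice pudding for dessert.",
    "On Wednesday, the cafeteria serves tacos with flan for dessert.",
]
```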

Now of course, you can see the template:
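One possible template, written as an f-string. The wording and the variable names `day`, `main` and `dessert` are illustrative assumptions.

```python
# Hypothetical values for a single day.
day, main, dessert = "Monday", "lasagna", "tiramisu"

# The template: three slots, one per piece of information.
sentence = f"On {day}, the cafeteria serves {main} with {dessert} for dessert."
print(sentence)
```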

But how do we get to iterate over all of those at once?

A naive way would be to create a new data structure (ideally the first example, with list comprehension, less ideally the second example, not automated at all, but they create the same structure):
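A sketch of both variants, assuming hypothetical `main_courses` and `desserts` dictionaries keyed by weekday:

```python
# Hypothetical menus (illustrative data).
main_courses = {"Monday": "lasagna", "Tuesday": "curry", "Wednesday": "tacos"}
desserts = {"Monday": "tiramisu", "Tuesday": "rice pudding", "Wednesday": "flan"}

# First example: build (day, main, dessert) tuples with a list comprehension.
menu_rows = [(day, main_courses[day], desserts[day]) for day in main_courses]

# Second example: the same structure typed out by hand.
menu_rows_manual = [
    ("Monday", "lasagna", "tiramisu"),
    ("Tuesday", "curry", "rice pudding"),
    ("Wednesday", "tacos", "flan"),
]
```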

and then use that data structure in a for loop:
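For instance, with a hypothetical list of `(day, main, dessert)` tuples:

```python
# Intermediate structure built beforehand (illustrative data).
menu_rows = [
    ("Monday", "lasagna", "tiramisu"),
    ("Tuesday", "curry", "rice pudding"),
    ("Wednesday", "tacos", "flan"),
]

# Loop over the intermediate structure and fill in the template.
sentences = []
for day, main, dessert in menu_rows:
    sentences.append(f"On {day}, the cafeteria serves {main} with {dessert} for dessert.")

print("\n".join(sentences))
```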

A little bit less naive way would be to do a for loop and use the dictionaries:
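A sketch with hypothetical `main_courses` and `desserts` dictionaries, looking each value up by its key inside the f-string:

```python
# Hypothetical menus (illustrative data).
main_courses = {"Monday": "lasagna", "Tuesday": "curry", "Wednesday": "tacos"}
desserts = {"Monday": "tiramisu", "Tuesday": "rice pudding", "Wednesday": "flan"}

# Iterate over the keys and index both dictionaries inside the f-string.
sentences = []
for day in main_courses:
    sentences.append(
        f"On {day}, the cafeteria serves {main_courses[day]} with {desserts[day]} for dessert."
    )
```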

but that makes the f-string a bit unnatural to read. The ideal way would be if we could name those variables what they actually are, and this is possible with zip!
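A minimal sketch, again with made-up `main_courses` and `desserts` dictionaries. It assumes both dictionaries list the days in the same order (dictionaries keep insertion order in Python 3.7+), and keep in mind that `zip` stops at the shortest input.

```python
# Hypothetical menus (illustrative data), listed in the same day order.
main_courses = {"Monday": "lasagna", "Tuesday": "curry", "Wednesday": "tacos"}
desserts = {"Monday": "tiramisu", "Tuesday": "rice pudding", "Wednesday": "flan"}

# zip pairs each (day, main) item with the dessert in the same position,
# so every variable in the f-string is named after what it holds.
sentences = []
for (day, main), dessert in zip(main_courses.items(), desserts.values()):
    sentences.append(f"On {day}, the cafeteria serves {main} with {dessert} for dessert.")

print("\n".join(sentences))
```

If the two dictionaries might not share the same order, looping over the keys and indexing both dictionaries is the safer choice.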

It has many advantages:

  • Improved readability. Now your f-string says exactly what it’s doing, and your variables are named exactly after what they are. (Did I mention you should use f-strings in python? They’re amazing!)
  • No need for an intermediate data structure. All you did was combine the data you already had with functions that ship with Python, so it’s quick, efficient and re-usable. Plus, when you learn to do these tricks often, you can re-use the data structures you already have in a very flexible and efficient way.
  • Less code. Since zip is built into Python, you don’t need to re-read or maintain the logic that constructs the iterator. The loop basically documents itself (because your variables are named after what they represent), and you didn’t implement anything new, so your code stays more concise.

There are so many more tools from the standard library that I encourage you to explore to improve your code as in the above situation, such as

  • itertools (count, repeat, cycle, product, combinations, permutations, etc.)
  • collections (Counter (!!!), defaultdict)
  • random (choice, sample)
  • functools (partial, wraps for decorators)

and many more!
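To give a taste of a few of these, here is a short sketch. All the data and the `scale` helper are made up for illustration; the imported names are the actual standard library ones.

```python
import random
from collections import Counter, defaultdict
from functools import partial
from itertools import product

# Counter: count occurrences without writing the loop yourself.
labels = ["cat", "dog", "cat", "bird", "cat", "dog"]
label_counts = Counter(labels)
most_common_label, most_common_count = label_counts.most_common(1)[0]

# defaultdict: group values without checking whether the key exists yet.
scores = [("train", 0.91), ("test", 0.85), ("train", 0.93)]
scores_by_split = defaultdict(list)
for split, score in scores:
    scores_by_split[split].append(score)

# product: every combination of two hyperparameter lists, with no nested loops.
grid = list(product([0.01, 0.1], [3, 5]))


# partial: freeze one argument of a (hypothetical) helper function.
def scale(value, factor):
    return value * factor


double = partial(scale, factor=2)

# random.sample: draw two distinct items; a seeded generator keeps it reproducible.
rng = random.Random(0)
picked = rng.sample(labels, k=2)
```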


I believe that if, as a data scientist, you take the time to practice each of these five tips a little in your spare time, your coding skills will grow extremely fast and you will quickly become much more productive.

Let me know what you think of this article in the comments!

