Background
Nowadays, Data Scientists are becoming more and more involved in the production side of deploying a machine learning model. This means we need to be able to write production-standard Python code like our fellow software engineers. In this article, I want to go over some of the key tools and packages that can help you write production-worthy code for your next model.
Linters
Overview
A linter is a tool that catches small bugs, formatting errors, and odd design patterns that can lead to runtime problems and unexpected outputs.
In Python, we have PEP8, which gives us a shared style guide for how our code should look. Numerous Python linters check code against PEP8; my preference is flake8.
Flake8
Flake8 combines the Pyflakes, pycodestyle and McCabe linting packages. It checks for errors and code smells, and enforces PEP8 standards.
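To give a feel for the kind of check a linter performs, here is a toy sketch that flags imports that are never used, which is one of the errors Pyflakes reports. It is only an illustration built on the standard library's ast module, not how flake8 is implemented, and the function name find_unused_imports is hypothetical.

```python
import ast
from typing import List


def find_unused_imports(source: str) -> List[str]:
    """Return names that are imported but never referenced in the source."""
    imported = set()
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a as b" binds "b"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Every bare name that appears in the code counts as a use
            used.add(node.id)
    return sorted(imported - used)


example = "import os\nimport sys\nprint(sys.argv)\n"
unused = find_unused_imports(example)
```

A real linter handles many more cases (scopes, __all__, star imports), so treat this as a mental model rather than a replacement.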
To install flake8, run pip install flake8, and to use it, run flake8 <file_name.py>. It really is that simple!
For example, let’s say we have the function add_numbers in a file flake8_example.py:
def add_numbers(a,b):
    result = a+ b
    return result
print(add_numbers(5, 10))
To call flake8 on this file, we execute flake8 flake8_example.py, and the output looks like this:
[Screenshot: flake8 output for flake8_example.py]
Flake8 has picked up several styling errors that we should correct to be in line with PEP8.
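As a sketch of the fix, the version below adds whitespace after the comma, puts spaces on both sides of the + operator, and leaves two blank lines after the function definition. These are the changes I would expect flake8 to ask for with its default settings; that is an assumption, since the exact messages are not reproduced here.

```python
def add_numbers(a, b):
    result = a + b
    return result


print(add_numbers(5, 10))
```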
See the flake8 documentation for more information about flake8 and how to customise it for your needs.
Code Formatters
Overview
Linters often just tell you what’s wrong with your code but don’t actively fix it for you. Formatters do fix your code: they speed up your workflow, ensure your code adheres to style guides, and make it more readable for other people.
isort
The isort package sorts your imports into the order recommended by PEP8. It can be installed with pip install isort.
Imports should be written on separate lines:
# Correct
import pandas
import numpy
# Incorrect
import pandas, numpy
They should also be grouped in the following order:
- Standard library (e.g. sys)
- Related third party (e.g. pandas)
- Local (e.g. functions from other files in the repo)
# Correct
import math
import os
import sys

import pandas as pd
# Incorrect
import math
import os
import pandas as pd
import sys
Finally, the names imported from a package should be in alphabetical order, which isort also handles:
# Correct
from collections import Counter, defaultdict
# Incorrect
from collections import defaultdict, Counter
The following commands show you how to run isort from the terminal:
# Format imports in every file
isort .
# Format in specific file
isort <file_name.py>
For more information on isort, check out the isort documentation.
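The grouping rule (standard library first, then everything else) can be sketched with the standard library alone. This is a simplified illustration, not isort's algorithm: the helper names are hypothetical, third-party and local imports are lumped together, and sys.stdlib_module_names only exists on Python 3.10+, so a tiny hand-written fallback set is assumed for older versions.

```python
import ast
import sys
from typing import List

# Python 3.10+ exposes the standard library module names; the fallback set is
# a small hand-written assumption that only covers the examples in this post.
STDLIB = getattr(
    sys, "stdlib_module_names", frozenset({"math", "os", "sys", "collections"})
)


def import_groups(source: str) -> List[str]:
    """Label each top-level import as 'stdlib' or 'other', in file order."""
    groups = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            groups.append("stdlib" if top_level in STDLIB else "other")
    return groups


def is_grouped(source: str) -> bool:
    """Return True if no standard library import follows a non-stdlib one."""
    seen_other = False
    for group in import_groups(source):
        if group == "other":
            seen_other = True
        elif seen_other:
            return False
    return True


correct = "import math\nimport os\nimport sys\n\nimport pandas as pd\n"
incorrect = "import math\nimport os\nimport pandas as pd\nimport sys\n"
```

Here is_grouped(correct) is True and is_grouped(incorrect) is False, matching the two examples above. The code only parses the import statements, so pandas does not need to be installed.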
Black
Black reformats your code based on its own style guide, which is a subset of PEP8. See the Black documentation for the current style it applies when formatting.
To install black, simply run pip install black, and to call it on a file, run black <file_name.py>.
Below is an example for a file called black_example.py:
# Before running black
def add_numbers ( x, y ) :
    result= x +y
    return result
Then we run black black_example.py:
# After running black
def add_numbers(x, y):
    result = x + y
    return result
In the terminal, black also reports which files it reformatted:
[Screenshot: black output for black_example.py]
For more information and how to customise your black formatter, see the Black homepage.
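A formatter should change layout only, never behaviour. One way to convince yourself of that for the example above is to check that the before and after versions parse to the same abstract syntax tree. As far as I know, Black runs a similar equivalence check itself by default; the same_ast helper below is a hypothetical and much simpler stand-in.

```python
import ast

before = (
    "def add_numbers ( x, y ) :\n"
    "    result= x +y\n"
    "    return result\n"
)
after = (
    "def add_numbers(x, y):\n"
    "    result = x + y\n"
    "    return result\n"
)


def same_ast(source_a: str, source_b: str) -> bool:
    """Return True if both sources parse to the same abstract syntax tree."""
    # ast.dump omits line and column numbers by default, so only structure is compared
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


formatting_only = same_ast(before, after)
```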
Unit Tests
Overview
Unit tests provide a structured way to ensure your code is doing what it is meant to do. They test small pieces of your code, like functions and classes, to verify they behave as expected. Tests are quite simple to set up and can save you hours of debugging time, so they are highly recommended for Data Scientists.
PyTest
Pytest is one of the most popular unit testing frameworks, alongside Python’s built-in unittest package, and is easily installed through pip install pytest.
To use pytest, we first need a function we can test. Let’s go back to our add_numbers function, which will be in a file called pytest_example.py:
def add_numbers(x, y):
    result = x + y
    return result
Now in a separate file called test_pytest_example.py, we write the corresponding function’s unit test:
from pytest_example import add_numbers
def test_add_numbers():
    assert add_numbers(5, 13) == 18
To run this test, we simply execute pytest test_pytest_example.py:
[Screenshot: pytest output for test_pytest_example.py]
As you can see, our test passed!
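A single test rarely covers every case, so it is worth adding a few more. The sketch below adds two hypothetical tests for negative numbers and floats. They are plain functions with assert statements, which pytest collects by default when the function name starts with test_. The function is redefined here only to keep the block self-contained; in the layout above it would be imported from pytest_example.

```python
import math


def add_numbers(x, y):
    result = x + y
    return result


def test_add_numbers_negative():
    assert add_numbers(-5, 3) == -2


def test_add_numbers_floats():
    # 0.1 + 0.2 is not exactly 0.3 in binary floating point,
    # so compare with a tolerance instead of ==
    assert math.isclose(add_numbers(0.1, 0.2), 0.3)
```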
If you want a more detailed and comprehensive tutorial on pytest and unit testing, check out my previous post on the subject:
Debugging Made Easy: Use Pytest to Track Down and Fix Python Code
Type Checker
Overview
The final topic we will cover is typing, and no, not the keyboard kind! Python is a dynamically typed language, which means it does not enforce fixed types for its variables. A variable x can be an integer and later a string in the same code. However, this can be problematic and lead to unexpected bugs. Therefore, there are tools to make Python behave more like a statically typed language.
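As a minimal illustration of dynamic typing, the same name can hold an int and then a str without any complaint from the interpreter. The variable names here are only for illustration.

```python
x = 10
type_before = type(x).__name__  # the name x currently refers to an int

x = "ten"
type_after = type(x).__name__  # the same name now refers to a str
```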
Mypy
We can ensure our variables and functions have the expected types by using the package mypy. This package checks that inputs and outputs match the declared types.
For example, for the add_numbers function, we expect the inputs and outputs to both be float. This can be specified in the function:
def add_numbers(x: float, y: float) -> float:
    result = x + y
    return result
Now, let’s say we pass the following arguments into the function and print the results:
print(add_numbers(10, 10))
print(add_numbers("10", "10"))
The output would look like this:
print(add_numbers(10, 10))
>>> 20
print(add_numbers("10", "10"))
>>> 1010
We see the first output is what we expect, but the second is not. This is because we passed in two str values, and + concatenates strings. The Python interpreter didn’t raise an error because Python is dynamically typed and does not enforce type hints at runtime.
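The sketch below makes this concrete: the type hints are stored on the function as metadata, but nothing checks them when the function is called, so the str arguments pass straight through.

```python
def add_numbers(x: float, y: float) -> float:
    result = x + y
    return result


# The hints are only metadata attached to the function object
hints = add_numbers.__annotations__

# No runtime check happens, so two strings are simply concatenated
concatenated = add_numbers("10", "10")
```

This is why a separate static checker is needed to catch the mismatch before the code runs.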
We can use mypy to catch these errors and avoid any bugs downstream. To do this, call mypy as mypy <file_name.py>. So, for this example we execute mypy mypy_example.py:
[Screenshot: mypy output for mypy_example.py]
As we can see, mypy has picked up that the arguments specified in line 6 are str, whereas the function expects float.
If you want a more detailed and comprehensive tutorial on mypy and typing, check out my previous post on the subject:
A Data Scientist’s Guide to Python Typing: Boosting Code Clarity
What’s The Need?
To summarise, you might be thinking, why do we need all these tools? Well, all these lead to your Python code having:
- Readability: Your code becomes instantly more intuitive and readable to other developers and data scientists. This allows for better collaboration and quicker delivery times.
- Robustness: The code will be less prone to errors, and it will be harder to introduce new ones, particularly when you use unit tests.
- Easier To Identify Bugs: Through the use of linters and tests, we can detect any inconsistencies and odd results from the code, which limits the risk of shipping to production with code errors.
You can view the whole code used in this post at my GitHub here:
Medium-Articles/Software Engineering /code-quality-example at main · egorhowell/Medium-Articles