
A Data Scientist’s Guide to Python Typing: Boosting Code Clarity

The importance of typing and how to implement it in Python

What is ‘Typing’?

By typing, we are not referring to physically touching our keyboard, but rather to the data types that our variables (and functions) take on in our Python code!

Python is inherently a dynamically typed language, which means there is no formal requirement to declare the data types of our variables. For example, a variable may start as an integer but be reassigned to a string somewhere else in the code. This flexibility can lead to runtime errors that are hard to debug.
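As a minimal illustration (the variable name is purely hypothetical), Python happily lets the same name hold an integer and then a string. The problem only surfaces when an incompatible operation is actually executed at runtime:

```python
# Python is dynamically typed: the same name can be rebound to a value of any type
count = 5      # count refers to an int
count = "5"    # now it refers to a str, and Python does not complain

try:
    # The type mismatch only surfaces when this line actually runs
    result = count + 1
except TypeError as err:
    print(f"Runtime error: {err}")
```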

Other languages are statically typed, which means the types of their variables are declared up front and cannot change while the program runs. If a variable is declared as an integer, it stays an integer for the entire runtime of the program. Examples of statically typed languages are Fortran and C++.

However, Python has supported type hints since version 3.5, and in recent years using them has become standard practice across the industry. This is especially true for Data Scientists who need to deploy robust machine learning models into production.

In this post, I want to take you through the basic syntax of typing in Python and show how to use the mypy package, which lets us type-check our code with a single command.

The standard syntax for type hints was introduced in PEP 484.

Basic Example

Let’s walk through a simple example to show why type checking is useful in Python. Below is a function, ingeniously called adding_two_numbers, that adds two numbers together:
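The original listing is not reproduced here; a minimal version consistent with the outputs below (the parameter names num1 and num2 are taken from later in the article, the rest is an assumption) would be:

```python
def adding_two_numbers(num1, num2):
    # No annotations: any two objects that support + are accepted
    return num1 + num2


print(adding_two_numbers(5, 5))
print(adding_two_numbers("5", "5"))
```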

What is the output from the two print statements? Well, the first one is:

print(adding_two_numbers(5, 5))

>>> 10

This is expected. However, the output of the second print statement is:

print(adding_two_numbers("5", "5"))

>>> 55

Python’s + operator concatenates strings, so this result is ‘technically’ correct, but it is clearly not what we were trying to achieve with this particular function.

To help prevent this, we can add type annotations to the function that make clear which argument types it expects and what type it returns:
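A sketch of the annotated function, assuming its body simply returns num1 + num2 as in the example above:

```python
def adding_two_numbers(num1: int, num2: int) -> int:
    # num1 and num2 are annotated as int, and so is the return value
    return num1 + num2
```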

In the above example, we have made it clear that num1 and num2 should both be integers and the expected output should be an integer as well.

It is important to note that these are truly just ‘hints’: if you pass in strings, the program still runs without any error, because Python is fundamentally dynamically typed and does not enforce annotations at runtime.
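A quick way to convince yourself of this (a small sketch reusing the annotated signature): Python simply stores the hints in the function’s `__annotations__` attribute and then runs the call as usual.

```python
def adding_two_numbers(num1: int, num2: int) -> int:
    return num1 + num2


# The hints are stored as metadata on the function object...
print(adding_two_numbers.__annotations__)

# ...but they are not checked at runtime, so the strings are still concatenated
print(adding_two_numbers("5", "5"))
```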

So, the general syntax for annotating a function is:

def function_name(variable: variable_type) -> return_type:
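Here is that pattern applied to a couple of hypothetical functions (the names are illustrative only). The same colon syntax also annotates plain variables, and it combines naturally with default argument values:

```python
# Variable annotation: name: type = value
threshold: float = 0.5


# Each parameter is written as "name: type"; the return type follows "->"
def classify(probability: float, cutoff: float = threshold) -> bool:
    # True when the predicted probability exceeds the cutoff
    return probability > cutoff


# A function that returns nothing is annotated with "-> None"
def log_prediction(label: str) -> None:
    print(f"Predicted label: {label}")
```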

Furthermore, if you are unsure what data type an object or variable has, you can check it by calling the type() function:

print(type(1))

>>> <class 'int'>

Typing Module

What if you want a specific function to return a list in which every element must be an integer? Before Python 3.9, the built-in types could not express this directly. This is where the typing module comes in. It has been part of the standard library since Python 3.5, so there is nothing to install (the typing package on PyPI is only a backport for older Python versions).

We can use the typing module to declare our data types in much more detail.
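The article’s original examples are not shown, so the following are illustrative ones (all function names are hypothetical). They use `List`, `Dict`, `Tuple`, `Optional` and `Union` from the typing module:

```python
from typing import Dict, List, Optional, Tuple, Union


# A list in which every element should be an int
def squares(n: int) -> List[int]:
    return [i * i for i in range(n)]


# A dict mapping str keys to lists of floats, returning str -> float
def average_scores(scores: Dict[str, List[float]]) -> Dict[str, float]:
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


# A fixed-length tuple: (rows, columns)
def shape(matrix: List[List[float]]) -> Tuple[int, int]:
    return len(matrix), (len(matrix[0]) if matrix else 0)


# Optional[str] means "a str or None"
def find_label(labels: Dict[int, str], key: int) -> Optional[str]:
    return labels.get(key)


# Union[int, float] accepts either type
def scale(value: Union[int, float], factor: float = 2.0) -> float:
    return value * factor
```

From Python 3.9 the built-in types can be subscripted directly (`list[int]`, `dict[str, float]`), and from Python 3.10 `Union[int, float]` can also be written as `int | float`.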

There are also many more types available in the typing module to cover ‘any’ variable you come across (no pun intended!). See this cheat sheet if you are interested in delving further.
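Two more that come up often are `Any` (the pun above), which effectively opts a value out of type checking, and `Callable`, which describes functions. A short sketch with hypothetical names:

```python
from typing import Any, Callable, List


# Any: the type checker accepts any value here and does not check how it is used
def describe(obj: Any) -> str:
    return f"{type(obj).__name__}: {obj!r}"


# Callable[[float], float]: a function taking one float and returning a float
def apply_to_all(func: Callable[[float], float], values: List[float]) -> List[float]:
    return [func(v) for v in values]
```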

Creating Types

You can also create your own types simply by defining a class; the class name can then be used in annotations just like a built-in type.
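The author’s Dog class is not reproduced here; the following hypothetical stand-in shows the idea, using the class both to build objects and as a type in annotations:

```python
from typing import List


class Dog:
    # A minimal user-defined class; its name can be used as a type
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def bark(self) -> str:
        return f"{self.name} says woof!"


# Dog appears in annotations exactly like a built-in type
def oldest_dog(dogs: List[Dog]) -> Dog:
    return max(dogs, key=lambda dog: dog.age)


rex = Dog("Rex", 5)
fido = Dog("Fido", 3)
print(oldest_dog([rex, fido]).bark())
```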

mypy Tutorial

mypy is the most widely used static type checker for Python and has become an industry standard. It is commonly run on production code, including machine learning pipelines, so it is well worth knowing as a Data Scientist.

To get started with mypy, install it with pip install mypy. Then, to use it, simply run mypy <file_name.py>. That’s really all there is to it!

See here if you want to learn some of the more advanced features in mypy.

Let’s go over an example to make this more concrete. Go back to our earlier annotated function adding_two_numbers, saved in a file called adding_two_numbers.py:
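The original file is not shown; a version of adding_two_numbers.py consistent with the line numbers in the mypy output (print calls on lines 5 and 6) could look like this. The comments are kept at the ends of lines so the line numbers still match:

```python
def adding_two_numbers(num1: int, num2: int) -> int:
    return num1 + num2


print(adding_two_numbers(5, 5))  # line 5: int arguments, no mypy error
print(adding_two_numbers("5", "5"))  # line 6: str arguments, flagged by mypy
```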

Running mypy adding_two_numbers.py then produces the following output:

adding_two_numbers.py:6: error: Argument 1 to "adding_two_numbers" has incompatible type "str"; expected "int"  [arg-type]
adding_two_numbers.py:6: error: Argument 2 to "adding_two_numbers" has incompatible type "str"; expected "int"  [arg-type]
Found 2 errors in 1 file (checked 1 source file)

Notice that the errors refer only to line 6, where we passed in strings even though the function expects integers. The error message even states this explicitly.

mypy raised no errors for the print statement on line 5, because there we passed in integers and the function returns an integer, exactly as annotated.

Summary: Pros & Cons

Let’s wrap up this article by listing some of the main pros and cons of typing in Python:

Pros

  • Helps with **[linting](https://en.wikipedia.org/wiki/Lint_(software))** and reduces the chance of bugs occurring within your code.
  • Improves readability and documentation of your code.

Cons

  • Time spent implementing and writing the types.
  • Some typing features are not available in every Python version, which can cause backward-compatibility issues.
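As one concrete case (a sketch; the function names are hypothetical): subscripting the built-in `list` in an annotation, as in `list[float]`, only works at runtime from Python 3.9 onwards (PEP 585), whereas `typing.List[float]` also works on older versions.

```python
import sys
from typing import List


# typing.List can be used in annotations on any Python 3 version with type hints (3.5+)
def mean_legacy(values: List[float]) -> float:
    return sum(values) / len(values)


# The built-in list only supports subscripting (list[float]) from Python 3.9 (PEP 585);
# on older versions this def would raise a TypeError, so it is guarded by a version check
if sys.version_info >= (3, 9):

    def mean_modern(values: list[float]) -> float:
        return sum(values) / len(values)
```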

Overall

Typing is standard practice for most production Python code, and that includes Data Science work. It is therefore an important, and relatively easy, skill to learn and apply in your work. Not only will it make your code easier to understand, but it will also help prevent your machine learning models from breaking in production!


Another Thing!

I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist. There is no "fluff" or "clickbait," just pure actionable insights from a practicing Data Scientist.

Dishing The Data | Egor Howell | Substack
