What is ‘Typing’?
By typing we are not referring to physically touching our keyboard, but rather the datatypes our variables (and functions) take on in our Python code!
Python is inherently a dynamically typed language, which means there is no formal requirement to declare what datatype our variables take on. For example, a variable may start as an integer but change to a string somewhere else in the code. This flexibility can often lead to errors at runtime that are hard to debug.
Other languages are statically typed, which means the types of their variables are fixed before the program runs (usually by declaring them explicitly) and cannot change at runtime. If a variable is declared as an integer, it has to be an integer for the whole runtime of the program. Examples of statically typed languages are Fortran and C++.
However, since Python 3.5 (released in 2015), the language has supported optional type hints, and nowadays they are widely used across industry. This is especially true for Data Scientists who need to deploy robust machine-learning models into production.
In this post, I want to take you through the basic syntax and processes behind typing in Python and how to use the mypy package, which allows us to seamlessly type check our code.
The syntax for type hints is standardized in PEP 484, which also makes clear that they remain optional.
Basic Example
Let’s walk through a simple example to explain the need for type-checking in Python. Below we have a function that adds two numbers together called, ingeniously, adding_two_numbers:
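Assuming the function simply returns num1 + num2 with no annotations, a minimal sketch of it, followed by the two print calls discussed next, could look like this:

```python
def adding_two_numbers(num1, num2):
    # No annotations: Python accepts any two arguments that support +
    return num1 + num2


print(adding_two_numbers(5, 5))
print(adding_two_numbers("5", "5"))
```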
What is the output from the two print statements? Well, the first one is:
print(adding_two_numbers(5, 5))
>>> 10
This is expected. However, the output of the second print statement is:
print(adding_two_numbers("5", "5"))
>>> 55
Because the + operator concatenates strings, the function joined "5" and "5" into "55". Despite this result being ‘technically’ correct, it is clearly not what we were trying to achieve with this particular function.
To help prevent this, we can add type annotations to the function that make clear which argument types we need to pass in and which type it returns:
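A sketch of the annotated version, assuming the body is unchanged; only the signature gains annotations:

```python
def adding_two_numbers(num1: int, num2: int) -> int:
    # num1 and num2 are annotated as int, and so is the return value
    return num1 + num2
```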
With these annotations, we have made it clear that num1 and num2 should both be integers and that the expected output should be an integer as well.
It is important to mention that these are truly just ‘hints’: if you pass in a string, there will still be no runtime error when running the program, because Python is fundamentally dynamically typed.
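A quick way to see this for yourself is the sketch below, which assumes the annotated function from above. The annotations are simply stored on the function object in its __annotations__ attribute, and the call with strings still runs:

```python
def adding_two_numbers(num1: int, num2: int) -> int:
    return num1 + num2


# The annotations are stored on the function object...
print(adding_two_numbers.__annotations__)

# ...but they are not checked when the function runs, so strings still "work"
result = adding_two_numbers("5", "5")
print(result)  # prints 55
```

A static checker such as mypy is what actually turns these hints into errors.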
So, the general syntax for annotating a function is:
def function_name(variable: variable_type) -> return_type:
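The same name: type syntax also works for plain variables, and a function that returns nothing is annotated with -> None. A short sketch, where greeting, count, repeat_greeting and show are hypothetical names chosen for illustration:

```python
# Variable annotations use the same "name: type" syntax
greeting: str = "Hello"
count: int = 3


def repeat_greeting(text: str, times: int = 2) -> str:
    # A parameter can carry both an annotation and a default value
    return " ".join([text] * times)


def show(text: str) -> None:
    # Functions that only have side effects are annotated as returning None
    print(text)


show(repeat_greeting(greeting, count))  # prints: Hello Hello Hello
```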
Furthermore, if you are unsure what datatype an object or variable has, you can check it at runtime by calling the type() function:
print(type(1))
>>> <class 'int'>
Typing Module
What if you want a specific function to return a list, but every element in the list must be an integer? Before Python 3.9, the built-in types could not express this directly (from Python 3.9 onwards you can write list[int]). This is where we use the typing module, which has been part of the standard library since Python 3.5, so there is nothing to install with pip.
We can use the typing module to declare our datatypes far more precisely. Below are some examples:
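A sketch of a few commonly used constructs from typing (List, Dict, Tuple, Optional and Union); the functions themselves are hypothetical examples:

```python
from typing import Dict, List, Optional, Tuple, Union


def get_squares(n: int) -> List[int]:
    # Every element of the returned list is an int
    return [i * i for i in range(n)]


def count_words(text: str) -> Dict[str, int]:
    # Maps each word (str) to the number of times it appears (int)
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def min_max(values: List[float]) -> Tuple[float, float]:
    # A fixed-length tuple: (smallest, largest)
    return min(values), max(values)


def find_index(items: List[str], target: str) -> Optional[int]:
    # Optional[int] means "an int or None"
    return items.index(target) if target in items else None


def to_float(value: Union[int, str]) -> float:
    # Union[int, str] accepts either an int or a str
    return float(value)
```

On Python 3.9 and later, the built-in list[int], dict[str, int] and tuple[float, float] work in the same way, and from Python 3.10 Optional[int] can also be written as int | None.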
There are also many more types available within the typing module to meet ‘any’ variable you come across (no pun intended!). See this cheat sheet if you are interested in delving further.
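The pun refers to Any, which tells a type checker to accept a value of any type. Another frequently used one is Callable, for functions passed as arguments. A brief sketch with hypothetical helper functions:

```python
from typing import Any, Callable


def describe(value: Any) -> str:
    # Any is compatible with every type, so a type checker accepts any argument
    return f"{value!r} has type {type(value).__name__}"


def apply_twice(func: Callable[[int], int], x: int) -> int:
    # Callable[[int], int] is a function taking one int and returning an int
    return func(func(x))


print(describe(3.5))                     # 3.5 has type float
print(apply_twice(lambda n: n * 10, 2))  # 200
```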
Creating Types
You can also create your own types by simply defining a class. Below is an example of a dog class that I made:
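A minimal sketch of what such a class might look like; the name and age attributes and the oldest_dog helper are hypothetical. Once defined, Dog can be used in annotations just like int or str:

```python
from typing import List


class Dog:
    def __init__(self, name: str, age: int) -> None:
        # Attribute types follow from the annotated parameters
        self.name = name
        self.age = age

    def bark(self) -> str:
        return f"{self.name} says woof!"


def oldest_dog(dogs: List[Dog]) -> Dog:
    # The class itself is used as a type, both inside List[...] and as the return type
    return max(dogs, key=lambda dog: dog.age)


pack = [Dog("Rex", 3), Dog("Bella", 7)]
print(oldest_dog(pack).bark())  # Bella says woof!
```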
mypy Tutorial
mypy is one of the most widely used tools for checking types in Python code. It is common in production codebases, including those that deploy machine learning models, so it is well worth knowing as a Data Scientist.
To get started with mypy, simply install it with pip install mypy. Then, to use it, all you need to do is run mypy <file_name.py>. That’s really all there is to it!
See here if you want to learn some of the more advanced features in mypy.
Let’s go over an example to make this more concrete by going back to our previous function, adding_two_numbers, which looked like this:
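Assuming adding_two_numbers.py contains the annotated function followed by the two print calls from earlier, a sketch of the file is below. With two blank lines after the function, as PEP 8 recommends, the calls fall on lines 5 and 6, matching the line numbers in the mypy output (which is also why the file has no comments):

```python
def adding_two_numbers(num1: int, num2: int) -> int:
    return num1 + num2


print(adding_two_numbers(5, 5))
print(adding_two_numbers("5", "5"))
```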
If we then run mypy adding_two_numbers.py, we get the following output:
adding_two_numbers.py:6: error: Argument 1 to "adding_two_numbers" has incompatible type "str"; expected "int" [arg-type]
adding_two_numbers.py:6: error: Argument 2 to "adding_two_numbers" has incompatible type "str"; expected "int" [arg-type]
Found 2 errors in 1 file (checked 1 source file)
Notice that the errors are only reported for line 6, where we passed in strings but the function expected integers. The error message even states this explicitly.
It raised no errors for the print statement on line 5, because there we passed in integers and the function returned an integer, exactly as annotated.
Summary: Pros & Cons
Let’s wrap up this article by listing some of the main pros and cons of typing in Python:
Pros
- Helps with **[linting](https://en.wikipedia.org/wiki/Lint_(software))** and reduces the chance of bugs occurring within your code.
- Improves readability and documentation of your code.
Cons
- Time spent implementing and writing the types.
- Some typing features are not available in older Python versions; for example, writing list[int] instead of typing.List[int] requires Python 3.9 or later.
Overall
Typing is standard practice in many professional Python codebases, and that includes Data Science work. Therefore, it is an important and also relatively easy skill to learn and implement in your work. Not only will it make your code more intuitive, but it will also help prevent your machine learning model from breaking in production!
Another Thing!
I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist. There is no "fluff" or "clickbait," just pure actionable insights from a practicing Data Scientist.