
Compared with most other popular programming languages, Python probably has the most flexible object serialisation. In Python, everything is an object, so we can say that almost everything can be serialised. Yes, the module I am talking about is Pickle.
However, compared with other "regular" serialisation approaches such as JSON, Pickle has more aspects that we need to be careful about when we use it. That’s what the title says: do not use Pickle unless you know these facts.
In this article, I’ll organise some important notes about Pickle and hope they will help.
1. Basic Usage

By using the Python Pickle module, we can easily serialise almost all types of objects into a file. Before we can use it, it needs to be imported.
import pickle
Let’s take a dictionary as an example.
my_dict = {
    'name': 'Chris',
    'age': 33
}

We can use the method pickle.dump() to serialise the dictionary and write it into a file.
with open('my_dict.pickle', 'wb') as f:
    pickle.dump(my_dict, f)

Then, we can read the file and load it back to a variable. After that, we have the exact dictionary back. They are 100% identical in terms of content.
with open('my_dict.pickle', 'rb') as f:
    my_dict_unpickled = pickle.load(f)
my_dict == my_dict_unpickled
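If no file is needed, the same round trip can be done in memory. The sketch below uses pickle.dumps() and pickle.loads(), the bytes-based counterparts of dump() and load(); the variable names are only for illustration.

```python
import pickle

my_dict = {'name': 'Chris', 'age': 33}

data = pickle.dumps(my_dict)        # serialise to a bytes object
my_dict_copy = pickle.loads(data)   # rebuild a new object from the bytes

print(my_dict_copy == my_dict)   # True: the content is equal
print(my_dict_copy is my_dict)   # False: it is a separate object
```

The unpickled dictionary is a copy with equal content, not the original object itself.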

2. Why Pickle? What are the Pros and Cons?

Admittedly, JSON would be the better choice for serialising the simple dictionary in the example above. Pickle serialisation has three main drawbacks.
Cons-1: Pickle is Unsafe
Unlike JSON, which is plain text that is only ever parsed as data, pickle data can be crafted maliciously so that it executes arbitrary code during unpickling.
Therefore, we should NEVER unpickle data that could have come from an untrusted source, or that could have been tampered with.
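To make the risk concrete, here is a deliberately harmless sketch. The class name Payload is hypothetical, and the expression passed to eval() just does arithmetic; a real attacker could put any callable and arguments in its place.

```python
import pickle

class Payload:
    """Hypothetical attacker-controlled class, for illustration only."""

    def __reduce__(self):
        # Tells Pickle: "to rebuild this object, call eval('6 * 7')".
        return (eval, ('6 * 7',))

malicious_bytes = pickle.dumps(Payload())

# The receiver does not need the Payload class at all.
# Simply loading the bytes calls eval() on the embedded string.
result = pickle.loads(malicious_bytes)
print(result)  # 42
```

Nothing here looks dangerous from the receiver's side: it is a single pickle.loads() call, which is exactly why untrusted pickle data must not be loaded.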
Cons-2: Pickle is Unreadable
The most significant benefit of serialising a Python dictionary to a JSON string is that the result is human-readable. However, that’s not true for a Pickle file. If we open the pickle file for the dictionary we’ve just pickled as a text file, we will only see unreadable binary content.
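As a rough illustration of the difference, this sketch serialises the same dictionary both ways in memory and prints the results. The exact pickle bytes depend on the protocol version, so only the JSON output is shown in the comments.

```python
import json
import pickle

my_dict = {'name': 'Chris', 'age': 33}

as_json = json.dumps(my_dict)      # a str
as_pickle = pickle.dumps(my_dict)  # a bytes object

print(as_json)    # {"name": "Chris", "age": 33}
print(as_pickle)  # binary opcodes mixed with fragments of the keys and values
```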

Cons-3: Pickle is Limited to Python
A pickle object is designed to be loaded by Python only. Other languages may be able to read it, but that requires third-party libraries and may still not be fully supported.
In contrast, a JSON string is very commonly used in the programming world and is well supported by most programming languages.
Pickle’s Pros
Pickle constructs arbitrary Python objects by invoking arbitrary functions, which is why it is not secure. However, this is also what enables it to serialise almost any Python object, including many that JSON and other serialisation methods cannot handle.
Unpickling an object usually requires no boilerplate code, so it is very suitable for quick and easy serialisation. For example, you can dump your variables into pickle files and terminate your program. Later on, you can start another Python session and recover them from the serialised files. This lets us run a program piece by piece in a much more flexible way.
Another example is multiprocessing. When we use the multiprocessing module to run a program in multiple processes, Pickle is what allows us to send arbitrary Python objects to other processes or compute nodes.
In these scenarios, the security concern usually does not apply, and humans won’t have to read the objects. We just need something quick, easy and compatible with Python objects. In these cases, Pickle can be the perfect tool.
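A small sketch of the "almost any Python object" point: the dictionary below (made up for this example) holds a set, a datetime and a tuple. The standard json module rejects it without custom handling, while Pickle round-trips it unchanged.

```python
import datetime
import json
import pickle

record = {
    'tags': {'python', 'pickle'},              # a set
    'created': datetime.datetime(2022, 1, 1),  # a datetime
    'pair': (1, 2),                            # a tuple
}

try:
    json.dumps(record)
    json_error = None
except TypeError as exc:
    json_error = exc  # json cannot encode sets or datetimes by default

restored = pickle.loads(pickle.dumps(record))

print(json_error)
print(restored == record)      # True
print(type(restored['pair']))  # <class 'tuple'>
```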
3. What else can be pickled?

I keep saying that almost everything can be serialised by Pickle. Now, let me show you some examples.
Pickle a Function
The first example will be a function. Yes, we can serialise a function in Python, because a function is also an object in Python.
def my_func(num):
    print(f'my function will add 1 to the number {num}')
    return num + 1

Just define a simple function for demo purposes. Now, let’s pickle it and load it into a new variable.
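with open('my_func.pickle', 'wb') as f:
    pickle.dump(my_func, f)
with open('my_func.pickle', 'rb') as f:
    my_func_unpickled = pickle.load(f)
my_func_unpickled(10)
The new variable can be used as a function and behaves like the original one. Note that Pickle stores only a reference to the function (its module and name), not its code, so the function must still be defined or importable in the session that unpickles it.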
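One detail worth checking is how a function is actually stored. The sketch below pickles the built-in len and inspects the bytes, which contain the names needed to look the function up again rather than its code. It also tries a lambda, which has no importable name and therefore cannot be pickled.

```python
import pickle

data = pickle.dumps(len)

# The bytes hold the module name and function name, not the function body.
names_only = b'builtins' in data and b'len' in data
same_object = pickle.loads(data) is len

try:
    pickle.dumps(lambda x: x + 1)
    lambda_error = None
except (pickle.PicklingError, AttributeError) as exc:
    # The exact exception type depends on where the lambda is defined.
    lambda_error = exc

print(names_only)   # True
print(same_object)  # True
print(lambda_error)
```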

Pickle a Pandas Data Frame
Another example will be a Pandas data frame. Let’s define a Pandas data frame.
import pandas as pd
my_df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chris'],
    'age': [25, 29, 33]
})

Now, we can pickle it and unpickle it to a new variable. The new DataFrame will be identical.
with open('my_df.pickle', 'wb') as f:
    pickle.dump(my_df, f)
with open('my_df.pickle', 'rb') as f:
    my_df_unpickled = pickle.load(f)

Note that Pandas has built-in methods for this: DataFrame.to_pickle() and pd.read_pickle(). They do the same job as above with cleaner code, and the performance should be comparable.
Then there might be a question: why should we use Pickle for a data frame rather than a CSV?
The first answer is speed. CSV is human-readable, but it is one of the slowest ways to store a Pandas data frame.
This SO post benchmarked the performance of different ways of serialising a Pandas data frame.

The second benefit of pickling a Pandas data frame is that data types are preserved. When we write a data frame to a CSV file, everything has to be converted to text, which can cause inconvenience or trouble when we load it back. For example, if we write a datetime column to CSV, we will likely need to parse it again (and possibly specify the format string) when we load it back.
However, this issue doesn’t exist for a pickle object. What you pickled is exactly what you get back when you load it, with no extra steps.
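The type-loss effect can be seen without Pandas. The sketch below is a standard-library analogue, not the Pandas code path: it sends one made-up row through the csv module and through Pickle, then compares what comes back.

```python
import csv
import datetime
import io
import pickle

row = {'name': 'Chris', 'joined': datetime.date(2020, 5, 17), 'age': 33}

# CSV round trip: every value comes back as text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
buffer.seek(0)
from_csv = next(csv.DictReader(buffer))

# Pickle round trip: the original types survive.
from_pickle = pickle.loads(pickle.dumps(row))

print(from_csv['joined'], type(from_csv['joined']))        # 2020-05-17 <class 'str'>
print(from_pickle['joined'], type(from_pickle['joined']))  # 2020-05-17 <class 'datetime.date'>
```

After the CSV round trip the date and the age are both strings and have to be converted back by hand, whereas the unpickled row compares equal to the original.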
4. Pickle Protocol Version

It is quite common to use Pickle the way I did in the previous examples. That is not wrong, but it is better to specify the protocol version explicitly (usually the highest). Simply put, the Pickle serialisation format has several versions; as Python has evolved, so has the Pickle module.
If you are interested in which versions exist and what each one improved, here is the list from the official documentation.
- Protocol version 0 is the original "human-readable" protocol and is backwards compatible with earlier versions of Python.
- Protocol version 1 is an old binary format that is also compatible with earlier versions of Python.
- Protocol version 2 was introduced in Python 2.3. It provides a much more efficient pickling of new-style classes.
- Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
- Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8.
- Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data.
Generally speaking, a higher version is better than a lower one in terms of
- The size of the pickled objects
- The performance of unpickling
If we pickle the Pandas data frame using different versions, we can see the difference in size.
with open('my_df_p4.pickle', 'wb') as f:
    pickle.dump(my_df, f, protocol=4)
with open('my_df_p3.pickle', 'wb') as f:
    pickle.dump(my_df, f, protocol=3)
with open('my_df_p2.pickle', 'wb') as f:
    pickle.dump(my_df, f, protocol=2)
with open('my_df_p1.pickle', 'wb') as f:
    pickle.dump(my_df, f, protocol=1)
import os
print('P4:', os.path.getsize('my_df_p4.pickle'))
print('P3:', os.path.getsize('my_df_p3.pickle'))
print('P2:', os.path.getsize('my_df_p2.pickle'))
print('P1:', os.path.getsize('my_df_p1.pickle'))
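The same comparison can be made in memory for every protocol your interpreter supports. This sketch uses a plain list of integers instead of the data frame (an assumption made so that it runs without Pandas) and measures the length of the bytes returned by pickle.dumps().

```python
import pickle

payload = list(range(1000))

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(payload, protocol=protocol)
    sizes[protocol] = len(data)

for protocol, size in sizes.items():
    print(f'P{protocol}: {size} bytes')
```

For this payload the text-based protocol 0 is clearly the largest. The gaps between the binary protocols can be small for small objects, so it is worth measuring with your own data.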

Why does Python still keep the old versions if the newer ones are generally better? Because older Python versions cannot read the newer protocols. That means we have to choose a lower version if the pickle must be loadable by an older Python.
However, if backward compatibility is not needed, we can use the constant pickle.HIGHEST_PROTOCOL to guarantee that our program uses the latest protocol available. An example follows.
with open('my_df.pickle', 'wb') as f:
    pickle.dump(my_df, f, protocol=pickle.HIGHEST_PROTOCOL)
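When compatibility matters, it can help to check which protocol an existing pickle was written with. Pickles written with protocol 2 or later begin with a PROTO opcode (byte 0x80) followed by the version number. The helper name pickle_protocol below is hypothetical, and the function is only a best-effort sketch.

```python
import pickle

def pickle_protocol(data):
    """Best-effort guess of the protocol used to write a pickle byte string."""
    # Protocol 2+ starts with the PROTO opcode (0x80) and then the version byte.
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None  # protocol 0 or 1, which carry no version marker

newest = pickle.dumps({'age': 33}, protocol=pickle.HIGHEST_PROTOCOL)
older = pickle.dumps({'age': 33}, protocol=2)
text = pickle.dumps({'age': 33}, protocol=0)

print(pickle_protocol(newest) == pickle.HIGHEST_PROTOCOL)  # True
print(pickle_protocol(older))                              # 2
print(pickle_protocol(text))                               # None
```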
5. Pickle a Custom Class

Although Pickle supports almost all objects in Python, we still need to be careful when we pickle an object that was instantiated from a custom class. Briefly, the class must be defined (or importable) when we load the pickled object back.
For example, let’s define a simple class "Person" with two attributes and one method.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def self_introduce(self):
        print(f'My name is {self.name} and my age is {self.age}')
p = Person('Chris', 33)
p.self_introduce()

Now, let’s serialise the object "p" using Pickle.
with open('person.pickle', 'wb') as f:
    pickle.dump(p, f)

A problem arises if the class does not exist at load time. This happens when we try to load the pickled object in a new session in which the class has not been defined. We can simulate this scenario by deleting the class definition.
del Person
Then, if we try to load the pickled object back, Pickle raises an AttributeError because it can no longer find Person.
with open('person.pickle', 'rb') as f:
    p_unpickled = pickle.load(f)

Therefore, we need to make sure that the class exists when we load the object back. However, if the class definition has changed in the meantime, loading might not fail, but the object will behave according to the new class definition.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def self_introduce(self):
        print(f'(Modified) My name is {self.name} and my age is {self.age}')
In the new class definition, I have modified the print message of the self-introduction method.

Then, if we load the pickled object back, there will not be any errors, but the self-introduction method will differ from its original one.
with open('person.pickle', 'rb') as f:
    p_unpickled = pickle.load(f)
p_unpickled.self_introduce()
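Both behaviours in this section can be reproduced in one self-contained script. To stand in for "another session", the sketch below creates a throwaway module with the hypothetical name person_demo and a simplified Person class with a greet method; none of these names come from the examples above.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module file such as person_demo.py.
demo = types.ModuleType('person_demo')
sys.modules['person_demo'] = demo

exec(
    "class Person:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n"
    "    def greet(self):\n"
    "        return f'Hi, I am {self.name}'\n",
    demo.__dict__,
)

data = pickle.dumps(demo.Person('Chris'))

# The pickle names the class and stores the attributes, not the method code.
stores_class_name = b'Person' in data
stores_method_name = b'greet' in data

# 1) Class missing at load time: unpickling fails.
del demo.Person
try:
    pickle.loads(data)
    load_error = None
except AttributeError as exc:
    load_error = exc

# 2) Class redefined with a different method body: the new behaviour wins.
exec(
    "class Person:\n"
    "    def greet(self):\n"
    "        return f'(Modified) Hi, I am {self.name}'\n",
    demo.__dict__,
)
restored = pickle.loads(data)
greeting = restored.greet()

print(stores_class_name, stores_method_name)  # True False
print(greeting)                               # (Modified) Hi, I am Chris
```

The pickle only records where to find the class and what the instance attributes were, which is why the redefined method is the one that runs after loading.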

6. Not all objects can be pickled

In this last section, I have to go back to my original statement "almost all Python objects can be pickled". I use "almost all" because there are still some types of objects that cannot be serialised by Pickle.
A typical example is a live connection object, such as a network or database connection. That makes sense because Pickle would not be able to re-establish the connection after it is closed. These objects can only be re-created with proper credentials and other resources.
Another type worth mentioning is a module. An imported module cannot be pickled either. See the example below, which raises a TypeError.
import datetime
with open('datetime.pickle', 'wb') as f:
    pickle.dump(datetime, f)

This is important to know because it means we cannot simply pickle everything in globals(), since the imported modules are in there too.
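One possible workaround is to filter out whatever cannot be pickled before saving. The helper picklable_items below is a hypothetical sketch, not part of the pickle module. It is shown on a small made-up dictionary rather than on globals() itself, and it uses a lock as a stand-in for a live resource.

```python
import datetime
import pickle
import threading

def picklable_items(namespace):
    """Return the subset of a dict whose values pickle without errors."""
    kept = {}
    for name, value in namespace.items():
        try:
            pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError):
            continue  # skip modules, locks, connections and similar objects
        kept[name] = value
    return kept

session = {
    'answer': 42,
    'names': ['Alice', 'Bob'],
    'datetime': datetime,      # an imported module
    'lock': threading.Lock(),  # a live resource
}

saved = picklable_items(session)
print(sorted(saved))  # ['answer', 'names']
```

Trying to pickle each value is simple but costs one extra serialisation per object, so for large objects a type-based filter may be preferable.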
Summary

In this article, I have introduced Pickle, the built-in serialisation module in Python. It can be used for quick and easy serialisation. It supports almost all types of Python objects, such as a function and even a Pandas data frame. When using Pickle across different versions of Python, we also need to bear in mind that the available Pickle protocol versions may differ.