Written By: Amal Hasni & Dhia Hmila

Exporting your fitted model after the training phase is the last crucial step in every Data Science project. However, as important as it is, the methods we use to store our models weren't specifically designed with Data Science in mind.
In fact, Python's pickle or the well-established joblib package, which we often use with Scikit-learn, are general-purpose serialization methods that work on any Python object. Therefore, they're not as optimized as we'd like them to be.
After this article, you’ll see that we can do much better in terms of memory and time efficiency.
Table of contents
· What is piskle
· What makes piskle special
· How to use piskle
· Supported estimators (so far)
· Efficiency-wise, what should you expect in numbers
What is piskle
Piskle is a Python package we created that allows you to serialize Scikit-learn's final models in an optimized way. If you're not familiar with the term, here's how Wikipedia defines serialization:
Serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later
This is especially useful if you have lots of estimators to store (maybe updated versions of the same model), or if you'd like to store them on the cloud for a web application or an API.
If you're wondering about the naming choice, piskle is a combination of pickle and scikit-learn 😉
What makes piskle special
Piskle lets you store Scikit-learn models (and Python objects in general) efficiently by selectively keeping only the parts that matter. In other words, piskle stores only the attributes used by a given action, such as the predict method. Those are the only attributes necessary to perform that action once the model is reloaded.
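To get a feel for the idea (this is an illustration of the principle, not piskle's actual implementation), consider that fitted estimators often carry attributes inference doesn't need. For example, Scikit-learn's TfidfVectorizer keeps a stop_words_ set that the docs describe as provided only for introspection and safe to remove before pickling; dropping it shrinks the serialized size while transform still works:

```python
# Illustration of the principle behind piskle (NOT its actual code):
# prune attributes that prediction/transform doesn't need before serializing.
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer

vec = TfidfVectorizer(max_df=0.5).fit(
    ["the cat sat", "the dog sat", "the bird flew", "the fish swam"]
)

full = len(pickle.dumps(vec))
del vec.stop_words_          # introspection-only attribute, unused by transform()
pruned = len(pickle.dumps(vec))

print(full, pruned)          # pruned is smaller than full
vec.transform(["the cat"])   # the vectorizer still works
```

Piskle automates this kind of pruning for you, per estimator and per action.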
How to use piskle
To use piskle, you first need to install it with the following command:
pip install piskle
The next thing you need is a model to export. You can use this as an example:
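The original embedded example isn't reproduced here; as a minimal stand-in, any fitted Scikit-learn estimator will do, for instance a logistic regression trained on the classic iris dataset:

```python
# A minimal stand-in model to export: logistic regression on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
```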
Exporting the model is then as easy as the following:
import piskle
piskle.dump(model, 'model.pskl')
Loading it is even easier:
model = piskle.load('model.pskl')
If you want even faster serialization, you can disable the optimize feature. Note, however, that this feature reduces the size of the exported file even further and improves loading time, so only disable it when export speed is your priority.
piskle.dump(model, 'model.pskl', optimize=False)
Supported estimators (so far)
So far, piskle supports 23 Scikit-learn estimators and transformers. The included models have been tested against the latest version of Scikit-learn (0.24.0 at the time of writing). You can check the full list here.
Efficiency-wise, what should you expect in numbers
To demonstrate the potential of piskle, we can conduct a simple experiment. We will export the same model using three different methods and compare the sizes of the resulting files.
Before we start exporting Scikit-learn models, let's get a big enough dataset to highlight the difference piskle can make. For convenience, we will use a Python package called datasets that allows you to easily download more than 500 datasets. The dataset we chose is called Amazon US Reviews and has textual attributes we can use with TF-IDF as follows:
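The original loading snippet isn't shown here; a minimal sketch of the idea follows, with a tiny in-memory sample standing in for the real dataset so it runs offline (the dataset name and column below are assumptions about the datasets package's catalog):

```python
# Sketch: fit a TF-IDF model on review text.
# In the article, the text would come from the `datasets` package, e.g.:
#   from datasets import load_dataset
#   ds = load_dataset("amazon_us_reviews", ...)  # dataset name/config assumed
#   reviews = ds["train"]["review_body"]         # column name assumed
# A small in-memory sample stands in here so the snippet runs offline.
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "Great product, works exactly as advertised.",
    "Terrible quality, broke after one day.",
    "Decent value for the price.",
]

model = TfidfVectorizer()
model.fit(reviews)
```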
To compare piskle with joblib and pickle, we export our model using the three packages and observe the resulting files using these lines of code:
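The original comparison snippet isn't reproduced; a sketch of the approach, using a small stand-in model (piskle must be pip-installed, so its call is guarded here):

```python
# Sketch: export the same fitted object with pickle, joblib, and piskle,
# then compare the resulting file sizes on disk.
import os
import pickle

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

# Small stand-in model; the article uses a TF-IDF model fitted on Amazon reviews.
model = TfidfVectorizer().fit(["some sample text", "another review body"])

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
joblib.dump(model, "model.joblib")

try:
    import piskle
    piskle.dump(model, "model.pskl")
except ImportError:
    pass  # piskle may not be installed in every environment: pip install piskle

for path in ("model.pkl", "model.joblib", "model.pskl"):
    if os.path.exists(path):
        print(f"{path}: {os.path.getsize(path) / 1024:.0f} KB")
```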
Here’s a recap of the resulting three file sizes:

+----------+----------+----------+
| Pickle | Joblib | Piskle |
+----------+----------+----------+
| 1186 KB | 1186 KB | 388 KB |
+----------+----------+----------+
We can observe a significant size reduction when using piskle as opposed to pickle and joblib: almost a 67% reduction in file size.
💡 Note that we can optimize this further using compression algorithms for the three packages.
Final thoughts
Piskle was born out of a real need: an efficient way to export a scikit-learn model for a web app. It has shown significant efficiency gains on the models tested so far, and thus adds real value. Don't hesitate to try it out, especially if you plan to store your model on the cloud and/or you're short on space.
As piskle is still a work in progress with a lot of potential improvements planned, we will be pleased to receive your feedback and/or suggestions.
Thank you for sticking with us this far and for your interest. Stay safe, and we'll see you in our next article 😊!