Create Your requirements.txt Using This Technique

Stop using ‘pip freeze’ without additional filters

Fabrício Barbacena
Towards Data Science

Photo by Joel & Jasmin Førestbird on Unsplash

INTRODUCTION

The requirements.txt file is a very important document in a Python Data Science or Machine Learning project, since it lists not only the packages needed to run the code but also their respective versions. This information improves the project's reproducibility. For example, other people can create a new virtual environment on their machines, activate it, and run pip install -r requirements.txt. They will then have the same packages, at identical versions, installed locally, often in just a few seconds.
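The reproduction workflow described above can be wrapped in a small bash function. This is a minimal sketch, not part of the original article. It assumes a Linux or macOS shell, python3 on the PATH with the standard venv module, and a requirements file in the current directory. The function name recreate_env and the default .venv folder are hypothetical choices.

```bash
# Hypothetical helper (name and defaults are illustrative): rebuild a project's
# environment from a requirements file, assuming a POSIX shell and python3 with venv.
recreate_env() {
    local req_file="${1:-requirements.txt}"   # requirements file to install from
    local env_dir="${2:-.venv}"               # where to create the virtual environment

    if [ ! -f "$req_file" ]; then
        echo "Requirements file not found: $req_file" >&2
        return 1
    fi

    # 1. Create a new virtual environment
    python3 -m venv "$env_dir" || return 1
    # 2. Activate it in the current shell (bin/ is the Linux/macOS layout)
    . "$env_dir/bin/activate" || return 1
    # 3. Install the packages and versions listed in the file
    pip install -r "$req_file"
}
```

Calling recreate_env with no arguments uses requirements.txt and .venv. Because the function runs in the current shell, the environment stays activated after it returns.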

THE PROBLEMS WITH THE TRADITIONAL requirements.txt FILES

The most common technique to create a requirements.txt file is to run pip freeze > requirements.txt once all packages are installed. The problem with this approach is that it writes to requirements.txt not only the Python packages you actually installed via pip install <package_name>, but also all of their dependencies. Here is what I mean by that.

Let’s consider the following scenario: in a new virtual environment, the only extra Python packages I install for my project are Pandas and Django. So, I just run:

pip install pandas django

However, these two popular Python packages depend on other packages to work properly. Pandas, for example, is built on top of NumPy. The latter is therefore installed automatically when pip install pandas runs, so that we can use the former.

The same happens with Django: when pip install django is executed, the packages Django depends on are installed at the same time. If you now execute pip freeze > requirements.txt, the new file will not contain just two lines. In my case it had 9: one for Pandas, one for Django, and seven for their dependencies. The exact count depends on the package versions and your platform.
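You can check which extra packages a library pulls in, because pip show prints a Requires: field for each installed package. This is a minimal sketch, assuming the packages are already installed in the active environment. The helper name show_deps is hypothetical.

```bash
# Hypothetical helper: print each installed package's name and the packages it
# requires, keeping only the "Name:" and "Requires:" fields from pip show.
show_deps() {
    pip show "$@" | grep -E '^(Name|Requires):'
}

# Usage, in an environment where pandas and django are installed:
#   show_deps pandas django
```

Every name on a Requires: line is a dependency that pip freeze will also list. The exact names depend on the installed versions.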

This is what annoys me most about the plain pip freeze approach: it pollutes your requirements.txt with needless information (all the extra dependency packages). Wouldn't it be better to have a requirements.txt that lists only the packages you actually installed with pip? If your answer is yes, keep reading and I will show you how to do that with the grep command-line tool.

THE WAY I CREATE A requirements.txt FILE NOW

Since I only installed Django and Pandas, I want just these two to be listed in my requirements.txt. The following commands do exactly that:

pip freeze | grep -i pandas >> requirements.txt
pip freeze | grep -i django >> requirements.txt

Notice that the only difference between these two commands is the package name.

Each command uses a pipe (the | symbol), which passes the output of pip freeze as input to grep. Grep then keeps only the lines containing the given word: pandas in the first command, django in the second. The -i flag makes grep case-insensitive, which is necessary because pip freeze lists some packages with a capital first letter (Django, for instance). Finally, the >> operator appends the filtered lines to requirements.txt and creates the file if it does not exist yet. Running the same command twice therefore appends the same line twice.
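Because grep matches substrings, grep -i pandas would also keep a line such as geopandas==... if that package happened to be installed. A stricter filter can anchor the match at the start of the line and require == right after the name.

The sketch below is an illustration, not the article's method. The helper name freeze_filter is hypothetical, and it only handles the usual name==version lines. Lines that pip prints in other forms, such as name @ URL for direct-URL installs, would not match.

```bash
# Hypothetical helper: keep only the pip-freeze lines whose project name is
# exactly $1 (case-insensitive). Reads "name==version" lines from stdin.
freeze_filter() {
    local pattern
    # Treat "-", "_" and "." as interchangeable, since pip considers
    # e.g. python-dateutil and python_dateutil to be the same project.
    pattern=$(printf '%s' "$1" | sed 's/[-_.]/[-_.]/g')
    # Anchor at the start of the line and require "==" right after the name,
    # so "pandas" does not also match "geopandas" or "pandas-stubs".
    grep -i "^${pattern}=="
}

# Illustrative input only: the version numbers are made up.
printf '%s\n' Django==1.0.0 django-extensions==1.0.0 geopandas==1.0.0 pandas==1.0.0 |
    freeze_filter pandas    # prints: pandas==1.0.0
```

In practice it slots into the same pipeline: pip freeze | freeze_filter pandas >> requirements.txt.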

CREATE A BASH FUNCTION TO AUTOMATE THIS PROCESS

Taking this idea one step further, I thought it would be useful to have a bash function that installs any number of Python packages with pip and automatically appends their information to a requirements.txt file. The package names are passed as arguments. After some research online, I created the bash function reproduced below:

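pip_requirements() {
    # Refuse to run without at least one package name
    if test "$#" -eq 0
    then
        echo $'\nProvide at least one Python package name\n' >&2
        return 1
    fi
    for package in "$@"
    do
        # Install the package and, only if that succeeded, append its
        # "name==version" line from pip freeze to requirements.txt
        pip install "$package" &&
            pip freeze | grep -i "$package" >> requirements.txt
    done
}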

Once you define this function in your terminal session, you can call it in place of plain pip install commands. To keep it across sessions, add it to your ~/.bashrc. Here is an example:

pip_requirements django pandas seaborn streamlit

So, with this single command, you install these four Python packages and append only their names and version numbers to requirements.txt. The file is created if it does not exist yet.
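The function above keeps grep's substring matching and runs pip freeze once per package. It also appends a duplicate line whenever a package is recorded twice. A stricter variant could look like the sketch below.

This is an illustration, not the author's function. The name pip_requirements_strict is hypothetical. It assumes each argument is a bare package name with no version specifiers or extras, and that pip points to the active environment.

```bash
# Hypothetical variant of pip_requirements (the name is made up for this sketch).
# Assumes bare package names as arguments, e.g. "pandas", not "pandas==2.0".
pip_requirements_strict() {
    if [ "$#" -eq 0 ]; then
        echo "Provide at least one Python package name" >&2
        return 1
    fi

    # Install everything in a single pip call; stop if the install fails
    pip install "$@" || return 1

    # Snapshot the environment once instead of once per package
    local frozen package pattern
    frozen=$(pip freeze)

    for package in "$@"; do
        # Treat "-", "_" and "." as interchangeable in the project name
        pattern=$(printf '%s' "$package" | sed 's/[-_.]/[-_.]/g')
        # Exact, case-insensitive match on "name==" at the start of the line
        printf '%s\n' "$frozen" | grep -i "^${pattern}==" >> requirements.txt
    done

    # Remove duplicate lines from earlier runs (this also sorts the file)
    sort -u -o requirements.txt requirements.txt
}
```

Usage is the same as before, for example pip_requirements_strict django pandas seaborn streamlit. Using sort -u keeps the sketch short, but it reorders the file alphabetically instead of preserving installation order.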

FINAL REMARKS

Now that my requirements.txt files are free from unnecessary information, I think I can sleep better and even have a happier life!

All jokes aside, even though these procedures might seem like too much trouble to some people, I do think a cleaner requirements.txt file is in line with Python’s philosophy of removing unnecessary code from our projects. It also makes it easier to check quickly which packages the project owners actually installed to build their code.

Thank you so much, dear reader, for having honored my text with your time and attention.

I have written many articles, mostly on Python and Django. If you liked this one, consider following me here on Medium and subscribing to receive a notification right after I publish a new article.

Happy coding!


Python and Django Developer • Data Analyst • BI Consultant • Data Science • Data Engineering • https://linktr.ee/fabriciobarbacena