Create Your requirements.txt Using This Technique
Stop using ‘pip freeze’ without additional filters
INTRODUCTION
The requirements.txt file is a very important document in a Python data science or machine learning project: it lists the packages needed to run the code and records their exact versions. This information makes the project more reproducible. Other people can, for example, create a new virtual environment on their machines, activate it, and run pip install -r requirements.txt. They will then have the same packages installed locally, with identical versions, in just a few seconds.
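The reproduction steps described above can be sketched as a small shell function. This is only an illustration: the function name bootstrap_env and the .venv directory name are assumptions, not something the workflow requires.

```shell
# bootstrap_env: hypothetical helper name; ".venv" is an assumed directory name.
bootstrap_env() {
    # Create an isolated virtual environment in ./.venv
    python3 -m venv .venv || return 1
    # Activate it for the current shell session
    . .venv/bin/activate || return 1
    # Install the exact versions pinned in requirements.txt
    pip install -r requirements.txt
}
```

Calling bootstrap_env from the project folder (the one that contains requirements.txt) runs the three steps in order and stops if creating or activating the environment fails.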
THE PROBLEMS WITH THE TRADITIONAL requirements.txt FILES
The most common way to create a requirements.txt file is to run pip freeze > requirements.txt once all packages are installed. The problem with this approach is that it writes to requirements.txt not only the Python packages you explicitly installed via pip install <package_name>, but also the packages they depend on. Here is what I mean by that.
Let’s consider the following scenario: in a new virtual environment, I will install only Pandas and Django as the extra Python packages for my project. So, I just run:
pip install pandas django
However, those two important Python packages depend on other packages to work properly. Pandas, for example, is built on top of NumPy, so NumPy is installed automatically when pip install pandas is run, because Pandas cannot work without it.
The same happens with Django: when pip install django is executed, other packages are installed automatically at the same time because Django depends on them to function. If you execute pip freeze > requirements.txt now, the new file won’t have just two lines. In my case there were nine (one for Pandas, one for Django, and seven unnecessary ones for their dependencies).
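To see where the extra lines come from, you can ask pip which packages a given package pulls in. The pip show command prints a Requires: field listing a package's direct dependencies. Below is a minimal sketch; the helper name direct_deps is hypothetical.

```shell
# direct_deps: hypothetical helper name.
# Prints the "Requires:" line that pip show reports for one installed package.
direct_deps() {
    # "^Requires:" does not match the separate "Required-by:" line
    pip show "$1" | grep '^Requires:'
}

# Usage (the output depends on the installed versions):
# direct_deps pandas
# direct_deps django
```

Each name on the Requires: line will show up as its own line in a plain pip freeze, even though you never installed it by hand.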
This is what annoys me most about the plain pip freeze approach: it pollutes your requirements.txt with needless information (all the extra dependency packages). Wouldn’t it be better to have a requirements.txt that lists only the packages you installed yourself using pip? If your answer is yes, keep reading and I will show you how to do that with the grep Linux command.
THE WAY I CREATE A requirements.txt FILE NOW
Since I only installed Django and Pandas, I want just these two to be listed in my requirements.txt. The following commands do exactly that:
pip freeze | grep -i pandas >> requirements.txt
pip freeze | grep -i django >> requirements.txt
Notice that the only difference between these two commands is the package name.
The new command structure uses a pipe (the symbol |). It passes the output of pip freeze as input to the grep command, which keeps only the lines where the word pandas or django appears. The -i flag makes grep case insensitive, which is necessary because some packages are listed by pip freeze with a capitalized first letter. Then we use the >> operator to append the filtered lines to the requirements.txt file.
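The two filtered commands can also be collapsed into one, with a stricter match. The sketch below is my own variation, not a pip feature: it assumes a hypothetical helper named freeze_only that reads pip freeze output on standard input and keeps only the lines whose package name matches one of its arguments. Anchoring the pattern to the start of the line and to == avoids a pitfall of a plain grep -i django, which would also keep a line such as django-extensions.

```shell
# freeze_only: hypothetical helper (not part of pip).
# Reads "pip freeze" output on stdin and prints only the lines whose
# package name matches one of the arguments, ignoring case.
freeze_only() {
    local pattern="" name
    for name in "$@"; do
        # Build an alternation such as "pandas|django"
        pattern="${pattern:+$pattern|}$name"
    done
    # ^(...)== keeps exact name matches only, so "django" does not
    # match "django-extensions==..."
    grep -i -E "^(${pattern})=="
}

# Usage (overwrites the file, so rerunning does not duplicate lines):
# pip freeze | freeze_only pandas django > requirements.txt
```

The names are inserted into a regular expression as they are, so this sketch assumes plain package names without extras such as name[extra]. It also only matches lines pinned with ==.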
CREATE A BASH FUNCTION TO AUTOMATE THIS PROCESS
Taking this idea one step further, I thought it would be interesting to have a bash function that, when called with any number of Python package names as arguments, installs them with pip and automatically appends their information to a requirements.txt file. So, after doing some research online, I created the bash function reproduced below:
pip_requirements() {
    if test "$#" -eq 0
    then
        echo $'\nProvide at least one Python package name\n'
    else
        for package in "$@"
        do
            # Install the package, then append its pinned line from pip freeze
            pip install "$package"
            pip freeze | grep -i "$package" >> requirements.txt
        done
    fi
}
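The function above has two rough edges: grep -i matches substrings (django also matches django-extensions), and >> adds a duplicate line every time the same package is installed again. The variant below is a sketch of one way to address both; the name pip_requirements_strict is hypothetical, and the approach assumes plain package names pinned with == in the pip freeze output.

```shell
# pip_requirements_strict: hypothetical, stricter variant of pip_requirements.
pip_requirements_strict() {
    local package line
    if [ "$#" -eq 0 ]; then
        printf '\nProvide at least one Python package name\n\n' >&2
        return 1
    fi
    for package in "$@"; do
        # Stop on a failed install so nothing is recorded for it
        pip install "$package" || return 1
        # Match the name at the start of the line, followed by "==",
        # so "django" does not also match "django-extensions"
        line=$(pip freeze | grep -i -- "^${package}==") || continue
        # Drop any earlier entry for this package before appending the new one
        if [ -f requirements.txt ]; then
            grep -v -i -- "^${package}==" requirements.txt > requirements.txt.tmp || true
            mv requirements.txt.tmp requirements.txt
        fi
        printf '%s\n' "$line" >> requirements.txt
    done
}
```

It is called the same way as the original function, for example pip_requirements_strict django pandas. Running it again for a package that is already listed moves that package's line to the end of the file instead of duplicating it.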
After you define this function in your terminal session, you can call it in place of plain pip install commands. Here is an example:
pip_requirements django pandas seaborn streamlit
So, with just the command above, you will install these four Python packages and create a clean requirements.txt with only their names and version numbers.
FINAL REMARKS
Now that my requirements.txt files are free from unnecessary information, I think I can sleep better and even have a happier life!
All jokes aside, even though these procedures might seem like too much trouble for some people, I do think that a cleaner requirements.txt file fits Python’s philosophy of removing unnecessary code from our projects. It also helps when we want to quickly check exactly which packages the project owners installed to build their code.
Thank you so much, dear reader, for having honored my text with your time and attention.
Happy coding!