The world’s leading publication for data science, AI, and ML professionals.

The Single Responsibility Principle in Python

One Principle to Rule Them All

Photo by freestocks on Unsplash
Photo by freestocks on Unsplash

Introduction

The Single Responsibility Principle (SRP) states that a function or a class should have one responsibility. Following this principle will help keep your functions small and manageable.

Breaking this rule is sometimes justified, but the more you rely on an individual function to do, the more difficult it will be to understand, update, change, debug, or refactor it.


Examples of Function Responsibilities

Getting data from a database is a responsibility.

Filtering, sorting, or transforming that data is another responsibility.

Storing data is a different responsibility.

So is presenting that data to the user.

Functions that handle more than one responsibility have more than one reason to change. When you must change a function, the more things that function does, the more difficult it will be to change its implementation without introducing unwanted behavior.


Motivating Example: Extracting Data From Archives

Let’s say that I have a directory with files, and I need to access the data in those files. Some of them are archived in .zip, .tar, or .gz formats, so I also need to extract those archives first.

Photo by Jason Pofahl on Unsplash
Photo by Jason Pofahl on Unsplash

Look at the function I have to accomplish this below:

It seems pretty straightforward, right? It’s short and easy to understand, why split it up?


Short Functions Aren’t Always Enough

Unfortunately, this little function is doing a lot. You could think of this function as having 3 responsibilities in 5 lines of code:

  1. It iterates through the files in the given directory.
  2. It determines if the file is an archive.
  3. If it identifies the file as an archive, it extracts the files it contains.

Complexity Grows Exponentially with More Responsibilities

If this function does everything thing you need it to do in its current form and will never change or be looked at again, its current form would be sufficient. Not every function needs to be continuously split up into smaller sub-functions.

However, think about the following cases:

  • You need to change the regular expression to specify more types of archives that need to be extracted.
  • You have a nested directory structure, such as when an archive contains a directory or another archive.
  • You are developing on Windows and some of the file paths in the archives are longer than the Windows limit of 260, preventing shutil.unpack_archive from extracting the files.
  • You have another filetype other than an archive that you must prepare in a different way.
  • You have an additional archive format that is not handled by shutil.
  • You don’t need to or can’t extract all of the files in the archive, so instead of shutil.unpack_archive, you must use lower level archive-handling functions like those in zipfile, tarfile, and gzip.

Encountering any of these cases, or more likely a combination of them, could make your little function explode in complexity. I’ve encountered all of the above while working on a similar task to our example.

If the 3 responsibilities each have 3 separate cases you need to deal with, that’s 3³ or 27 combinations. Your tidy, little function could soon become a tangled mess.

Photo by Pat Krupa on Unsplash
Photo by Pat Krupa on Unsplash

The function above could be good enough as written if you know for certain that you will not have any updates or tricky edge cases (you don’t). For the rest of us – we are mere mortals and we deal with non-trivial problems, so we must defend against the unforeseen.


Refactoring Based on The Single Responsibility Principle

Let’s say that I don’t need to or can’t extract files from a particular type of archive. Luckily for me, the names of those archives have been prefixed with "skip_".

Now the logic for determining the file paths is getting more complex, so I want to separate my file path generation from the rest of the function.

The following code below splits the above function into three, one for creating the list of file paths, one for determining if it should be extracted, and one to iterate through those files and call the extraction function:

Notice that get_archive_filepaths is actually the same length as our original function! Shorter functions are generally better, but not necessarily in this case.

The 5 lines for building the archive_paths could be made into a one-line list comprehension, but I think the result would be less clear.

There’s still room for improvement with our prepare_files function as its responsibilities grow, but we’re off to a good start. By separating the logic for extracting data and the logic for generating the list of file paths, any changes that are made to one function will not affect the other.


A Rule of Thumb for Using the Single Responsibility Principle

Try to describe the purpose of your function fully and succinctly. If you can’t fully describe what your function does in a short sentence without using the word "and," then your function should probably be split up.

Example: "prepare_files iterates through the files in the given directory to locate the archives and extracts all of their files."


Using the Single Responsibility Principle to Keep Growing Code Organized

Let’s say that I also need to prepare excel files in addition to extracting the archives. Excel files can be in either .xls or .xlsx formats, and I need to convert them into .csv format.

Here’s an outline of the major responsibilities:

  1. Iterate through the files in the directory.
  2. For each file, determine if it is an archive or an excel file.
  3. If it is an archive, extract the files.
  4. If is an excel file, convert to .csv.

Here is one way to structure your functions to each handle one of the above responsibilities:

Now each function has a single responsibility. Since one of the responsibilities is extracting data from a known archive, a call directly to shutil.unpack_archive is appropriate. There is no need to wrap it in your own function unless there are additional things you must do with it.

Notice that I am no longer building a lists of file paths. I could make a list of archive file paths and a list of excel filepaths, similar to the get_archive_filepaths from the first example, but due to the additional file format, it’s simpler to iterate through the files once and prepare the relevant files as soon as the function encounters them.

Photo by Quino Al on Unsplash
Photo by Quino Al on Unsplash

Such a change in the structure is good. Don’t be tied to a structure that no longer makes sense. You’re in control of the code, so change its structure when you find a better way.

A Word of Caution

Remember that premature optimization is the root of all evil. Resist the urge to split up functions based on edge cases that you may never encounter. You could waste time in building something you just aren’t going to need.

Wait until you are forced to account for those edge cases, then think about refactoring.


Happy Coding!


Related Articles