
All You Need to Know about Writing Effective Comments in Your Code

It's as important as writing the source code

Photo by Thomas Bormans on Unsplash

Comments are short, programmer-readable explanations or annotations written directly into the source code of a computer program. Although the computer ignores them while executing a program, writing effective comments in your source code may be as important as the code itself, for the simple reason that software is never truly finished.

There’s always something that could be done to improve a software product or service, which means a codebase would have to be updated from time to time. It’s near enough impossible to make changes or add new functionality to code that you don’t understand, thus it’s vital that code is always constructed such that it can be read by a human.

"Programs must be written for people to read, and only incidentally for machines to execute."

Comments are employed to notify, warn, and remind others who didn’t write the code [and your future self] of important things the code is doing. In this article, we’ll focus on writing comments in Python.

Table of Contents 
- 5 Unwritten Rules about comments in Python
    - Rule #1 Comments are complete sentences
    - Rule #2 Comments should obey line limits
    - Rule #3 Comments must be on the same indentation level as the code they comment
    - Rule #4 Put a space after the #
    - Rule #5 Links don't replace explanations
- The different types of comments 
    - Inline comments
    - Explanation comments
    - Summarization comments 
    - Legal comments
    - Code tag comments 

5 Unwritten rules about comments in Python

We can write single-line or multi-line comments in Python. A single-line comment starts with the # symbol and runs to the end of the line. Python has no dedicated syntax for multi-line comments, so Pythonistas can choose between stacking several single-line comments, collectively known as a block comment, or using a triple-quoted multi-line string.

# This is a single line comment. 
# This is an example 
# of multiple single-line
# comments that make up a 
# multi-line comment.
"""This is an example
of triple quotes multi-line
strings that make up a
multi-line comment."""

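All three approaches are valid, but you’ll notice the triple-quoted multi-line string is much more readable than a stack of single-line comments. Thus, it’s better to use a triple-quoted multi-line string when a multi-line comment is required; remember, "programs must be written for people to read." Keep in mind, though, that a triple-quoted string is a string literal, not a true comment: PEP 8 builds block comments from # lines, and a triple-quoted string placed as the first statement of a module, class, or function becomes that object’s docstring.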
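To see why the literal-versus-comment distinction matters, here is a minimal sketch (the function name describe is hypothetical). The # comment never reaches the running program, the first triple-quoted string becomes the function’s docstring, and the second triple-quoted string has no effect at all.

```python
def describe():
    """This string is the function's docstring."""
    # This comment is dropped before the code ever runs.
    """A string on its own here has no effect; Python discards it."""
    return "done"


# Only the first triple-quoted string is kept, as __doc__.
print(describe.__doc__)  # This string is the function's docstring.
print(describe())  # done
```

In practice, a triple-quoted "comment" placed first in a function will show up in help() output as that function’s documentation.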

You may come across people who treat comments as an afterthought or tell you "comments aren’t important"; ignore them. Read the source code of any popular library to see how much comments matter. Comments are essential if you want your code to be 1) more readable and 2) more professional.

That said, there are 5 golden rules to follow to ensure you’re writing effective comments.

Rule #1 Comments are complete sentences

The entire purpose of commenting code is for the comments to be read by other programmers [and your future self] so they can better understand what’s happening in the code being commented. Therefore, comments should follow proper grammatical rules and include punctuation so they give the reader clear information.

# data from train dir               <-- Not good 
data = load_data()
# Load data from train directory.        <-- Good
data = load_data() 

Rule #2 Comments should obey line limits

Python’s style guide, PEP 8, was created to serve as a set of best practices for programming in Python. One of its guidelines limits lines to 79 characters, and this applies to source code and comments alike; for comments and docstrings, PEP 8 goes further and recommends a limit of 72 characters.

# This comment is so long that it doesn't even fit on one line in the code cells provided by Medium.    <-- Not good 
# This comment is much shorter and within the limits.   <-- Good

Programmers break the PEP 8 suggested line limit all the time, and that’s okay – it’s just a guide after all. But your comments should still obey whatever line limit you agree upon with your team [or by yourself if you’re working alone].
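If you want to check Rule #2 automatically, Python’s standard-library tokenize module can tell real comments apart from # characters inside strings. The sketch below is a hypothetical helper (find_long_comments is not part of any library) that reports lines containing a comment that exceed a chosen limit. It defaults to 72 characters, PEP 8’s recommendation for comments, but you can pass whatever limit your team agrees on. In a real project, a linter such as flake8, whose E501 check flags long lines, is usually the more practical option.

```python
import io
import tokenize

# PEP 8 suggests 72 characters for comments; pass your team's limit instead.
DEFAULT_COMMENT_LIMIT = 72


def find_long_comments(source, limit=DEFAULT_COMMENT_LIMIT):
    """Return (line_number, length) for each comment on an over-long line."""
    lines = source.splitlines()
    results = []
    # generate_tokens() needs a readline callable that returns str lines.
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.COMMENT:
            line_number = token.start[0]
            length = len(lines[line_number - 1])
            if length > limit:
                results.append((line_number, length))
    return results


sample = "\n".join([
    "x = 1  # A short inline comment.",
    "# " + "a" * 90,
    "s = '# Not a comment, just a string.'",
]) + "\n"

print(find_long_comments(sample))  # [(2, 92)]
```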

Rule #3 Comments must be on the same indentation level as the code they comment

Indentation refers to the whitespace at the beginning of a line of code; Python uses indentation to group statements, and PEP 8 recommends four spaces per indentation level. Writing comments at a different indentation level won’t cause your program to crash, but the code is much easier to follow when each comment sits at the same level as the code it describes.

def example_function(): 
# Perform a random calculation.        <-- Not good
    random = 24 + 4
    return random
def example_function(): 
    # Perform a random calculation.        <-- Good
    random = 24 + 4
    return random

Rule #4 Put a space after the #

Putting a space after # helps with readability.

#This is valid but not easy to read.        <-- Not good
# This is valid and easy to read.           <-- Good

Rule #5 Links don’t replace explanations

Sometimes we need to link to external pages to further explain what our code is doing. Merely leaving a link is not as effective as explaining why the code is implemented that way before linking to the external resource. Pages can be taken down, which would leave you with an unexplained link that leads nowhere.

Note: _The example code below is a snippet from the Scikit-learn codebase._

# Not good
def _check_input_parameters(self, X, y, groups): 

    --- snip ---
    # see https://github.com/scikit-learn/scikit-learn/issues/15149
    if not _yields_constant_splits(self._checked_cv_orig): 
        raise ValueError(
            "The cv parameter must yield consistent folds across "
            "calls to split(). Set its random_state to an int, or "
            " set shuffle=False."
        )
    --- snip ---
# Good
def _check_input_parameters(self, X, y, groups): 

    --- snip --- 
    # We need to enforce that successive calls to cv.split() yield 
    # the same splits: 
    # see https://github.com/scikit-learn/scikit-learn/issues/15149
    if not _yields_constant_splits(self._checked_cv_orig): 
        raise ValueError(
            "The cv parameter must yield consistent folds across "
            "calls to split(). Set its random_state to an int, or "
            " set shuffle=False."
        )
    --- snip --- 

Note: The good example could have been made better by using a triple-quoted multi-line string, but the convention in the Scikit-learn codebase is to use multiple single-line comments, which is why it’s been followed here.

One of the first arguments you’ll hear from developers who aren’t as keen on following rules is "you’re overthinking it." To some degree, they have a point: your program will not crash if you decide to go against every rule presented above.

However, a major part of programming is collaboration. As much as you strive to be a great programmer that builds phenomenal things, you should also remember that a large portion of your work will be done as part of a team [in most cases]. Thus, it helps to consider how you can make your code easier for others to understand and one aspect of this is how you write your comments.

The more readable your comments are, the more likely other developers [and your future self] are to read them: your comments are only beneficial if they’re being read. Part of ensuring your comments are read is knowing how to place them effectively.

The different types of comments

Comments serve as an extra form of documentation for your source code. Thus, the general purpose of them is to inform others [including your future self] of why some functionality in the code is implemented in a certain manner, but there are several different ways to deploy comments to achieve this goal.

Let’s take a look at a few of these methods:

Inline comments

Inline comments come at the end of a line of code. There are two main reasons to use an inline comment:

#1 If a variable has been defined but the reason for choosing a specific value is unclear, you can use an inline comment to justify your decision.

AGE = 18  # The legal drinking age in the UK.

#2 Another good use of an inline comment is to reduce ambiguity by providing more context to what’s being defined.

day = 3  # Days in a week range from 0 (Mon) to 6 (Sun).
height = 1.75  # Height is given in meters.
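The day comment earns its place because Python’s own standard library uses two different numbering schemes: in the datetime module, date.weekday() counts from 0 (Monday) to 6 (Sunday), while date.isoweekday() counts from 1 (Monday) to 7 (Sunday). A short sketch, with the date chosen only for illustration:

```python
import datetime

new_year = datetime.date(2024, 1, 1)  # 1 January 2024 fell on a Monday.

day = new_year.weekday()  # Days range from 0 (Mon) to 6 (Sun).
iso_day = new_year.isoweekday()  # ISO days range from 1 (Mon) to 7 (Sun).

print(day, iso_day)  # 0 1
```

Without the inline comment, a reader has no way to tell which of the two conventions day = 3 follows.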

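You may also see codebases that use comments to communicate with type checkers, either to specify a variable’s data type or, as in the example below, to silence type-checking errors. A # type: ignore comment tells a static type checker to skip errors on that line, for example an import it cannot resolve.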

"""Code taken from my fraud detection model project.
See: https://github.com/kurtispykes/fraud-detection-project/blob/main/IEEE-CIS%20Fraud%20Detection/packages/fraud_detection_model/fraud_detection_model/train_pipeline.py"""
from config.core import config  # type: ignore
from pipeline import fraud_detection_pipe  # type: ignore
from processing.data_manager import load_datasets, save_pipeline  # type: ignore
from sklearn.model_selection import train_test_split  # type: ignore

The only reason I did this in my project was that I was using an automated type-checking step called typechecks to validate data types, and these comments stop it from reporting errors on those imports. I assume that’s why they appear in other codebases too.

More often than not, you don’t need to specify the data type because it’s obvious from the assignment statement.
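For contrast, a comment that genuinely specifies types follows the PEP 484 type-comment syntax, which static type checkers such as mypy understand but Python itself ignores. In modern code, annotations are usually preferred, partly because Python keeps them on the function at runtime. A minimal sketch; both function names are hypothetical:

```python
def cost_with_type_comment(words, rate):
    # type: (int, float) -> float
    return words * rate


def cost_with_annotations(words: int, rate: float) -> float:
    return words * rate


# The type comment is invisible at runtime; the annotations are stored.
print(cost_with_type_comment.__annotations__)  # {}
print(cost_with_annotations.__annotations__)
```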

Explanation comments

The main goal of a comment is to explain why a specific piece of code has been implemented in a certain way. There are usually several ways to do something, so an explanation gives more insight into the programmer’s intentions.

For example, take a look at this comment:

number_of_words *= 0.2  # Multiply the number of words by 0.2.

The above scenario is a perfect example of a pointless comment. It doesn’t take a rocket scientist to figure out that you’re multiplying the number_of_words by 0.2 so it’s not necessary to state it again in a comment.

Here’s an improved version:

number_of_words *= 0.2  # Account for each word valued at $0.20.

This comment provides more insight into the intent of the programmer: we now know that 0.2 is the price per word.

Note: _If you’ve read 7 Code Smells You Should Know About and Avoid, you’d know that we could have improved this code further by replacing the magic number with a named constant (i.e., PRICE_PER_WORD = 0.2)._
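Here is one way that refactor might look, as a sketch: total_cost is a hypothetical function, and storing the price as a Decimal rather than a float is our own choice, not part of the original example. The explanation comment now sits on the constant, and the function’s name makes clear it returns a cost rather than a word count.

```python
from decimal import Decimal

# Each word is billed at $0.20. Decimal avoids binary floating-point
# rounding errors in currency arithmetic.
PRICE_PER_WORD = Decimal("0.20")


def total_cost(number_of_words):
    """Return the total price, in dollars, for the given number of words."""
    return number_of_words * PRICE_PER_WORD


print(total_cost(250))  # 50.00
```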

Summarization comments

Sometimes we have no choice but to use several lines of code to implement some functionality. Helping your colleagues [and your future self] by giving them a summary of what your code is doing is extremely beneficial as it permits them to quickly skim through your code.

"""This is a small piece of functionality extracted from 
the Scikit-learn codebase. See: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/impute/_knn.py#L262"""
# Removes columns where the training data is all nan
if not np.any(mask):
    return X[:, valid_mask]
row_missing_idx = np.flatnonzero(mask.any(axis=1))
non_missing_fix_X = np.logical_not(mask_fit_X)
# Maps from indices from X to indices in dist matrix
dist_idx_map = np.zeros(X.shape[0], dtype=int)
dist_idx_map[row_missing_idx] = np.arange(row_missing_idx.shape[0])

Summary comments are simply a high-level overview of what is going on in the code. Dotting them in various places in your codebase makes it extremely easy for teammates [and your future self] to quickly skim through code for a better understanding – it also shows you know how the code works.
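As a self-contained illustration, here is a simplified, pure-Python sketch in the spirit of the Scikit-learn snippet. It is not Scikit-learn’s implementation, and drop_all_missing_columns is a hypothetical name. Each summary comment describes the block beneath it, so a reader can skim the comments alone to follow the function.

```python
import math


def drop_all_missing_columns(rows):
    """Drop all-NaN columns and report which rows still have missing values."""
    n_columns = len(rows[0])

    # Keep only the columns that contain at least one observed value.
    keep = [
        j for j in range(n_columns)
        if not all(math.isnan(row[j]) for row in rows)
    ]
    cleaned = [[row[j] for j in keep] for row in rows]

    # Record the indices of rows that still contain a missing value.
    rows_with_missing = [
        i for i, row in enumerate(cleaned)
        if any(math.isnan(value) for value in row)
    ]
    return cleaned, rows_with_missing


nan = float("nan")
data = [[1.0, nan, nan], [nan, nan, 3.0], [4.0, nan, 6.0]]
cleaned, missing = drop_all_missing_columns(data)
print(missing)  # [0, 1]
```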

Legal comments

Depending on where you work, you may have to include copyright, software licensing, and authorship information at the top of your scripts – it’s more of an internal policy than a necessity for all programmers.

Here’s an example from Scikit-learn’s [_knn.py](https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/impute/_knn.py#L262) source file:

# Authors: Ashim Bhattarai <...>
#          Thomas J Fan <...>
# License: BSD 3 clause

Code tag comments

It’s not unusual to find short reminder comments scattered around various source files. Developers do this to remind themselves about things they haven’t gotten round to just yet but intend to in the future.

The most common tag you’ll find is the TODO tag.

"""This is an example of a code tag comment in the
Scikit-learn codebase. The full code can be found here:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/metrics/pairwise.py"""
def pairwise_distances_argmin_min(
    X, Y, *, axis=1, metric="euclidean", metric_kwargs=None
):
    --- snip ---
    else:
        # TODO: once PairwiseDistancesArgKmin supports sparse input
        # matrices and 32 bit, we won't need to fallback to
        # pairwise_distances_chunked anymore. Turn off check for
        # finiteness because this is costly and because arrays
        # have already been validated.
    --- snip ---

Notice how the example above provides a clear description of what’s to be done – this is imperative if you’re using code tags.

These comments shouldn’t be used as a replacement for a formal issue tracker or bug-reporting tool, since it’s extremely easy to forget they exist if you’re not reading the section of code they’re in.
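If code tags are part of your workflow, it helps to surface them periodically so they don’t get lost. A minimal sketch using the standard-library tokenize module follows; find_code_tags and the TODO/FIXME tag set are assumptions for illustration, and the output is best fed into your issue tracker rather than used in place of it.

```python
import io
import tokenize

# Assumed tag set; extend it to match your team's conventions.
TAGS = ("TODO", "FIXME")


def find_code_tags(source, tags=TAGS):
    """Return (line_number, text) for every comment that starts with a tag."""
    found = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type != tokenize.COMMENT:
            continue
        # Strip the leading '#' and surrounding whitespace before matching.
        text = token.string.lstrip("#").strip()
        if text.startswith(tags):
            found.append((token.start[0], text))
    return found


sample = "\n".join([
    "def area(r):",
    "    # TODO: validate that r is non-negative.",
    "    return 3.14159 * r * r  # FIXME: use math.pi.",
    "",
    "label = 'TODO: this is a string, not a tag'",
]) + "\n"

for line_number, text in find_code_tags(sample):
    print(line_number, text)
```

Because tokenize only reports real comments, the TODO inside the string literal on the last line is ignored.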

It’s important to note that Python doesn’t enforce any of these conventions. Teams may have their own conventions for commenting code, and in those situations it’s better to follow them. But always remember that your code is going to be read, so making it as readable as possible will make that process easier for whoever that reader may be.

Are there any types of comments I’ve missed? Leave a comment.

Thanks for reading.
