
How to Generate Docstrings for Data Science Projects

Generate clear and well-formatted Python docstrings in seconds

Photo by Kati Hoehl on Unsplash

Introduction

"I wrote this function 6 months ago and now I can’t remember what it does!" Does this sound familiar? In the rush to meet deadlines, we often overlook the importance of good documentation (aka docstrings) for the classes, methods and functions that we create.

So what are docstrings?

Docstrings, also known as documentation strings, are string literals that describe a Python class, function or method. Here’s an example of a docstring for a function in Google’s docstring format.

def add_two_values(x:int, y:int = 0) -> int:
    """Add two values and return the sum.
    Args:
        x (int): first value
        y (int, optional): second value. Defaults to 0.
    Raises:
        TypeError: x and y must be integers
    Returns:
        int: summation of x and y
    """
    if not all([isinstance(i, int) for i in [x,y]]):
        raise TypeError('inputs must be integer')
    else:
        z = x + y
    return z

The docstring above provides a general description of the add_two_values function, followed by descriptions of the input arguments x and y, the output z and the raised errors. Docstrings serve as a guide for developers who want to use the class, function or method that you have developed.
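Because a docstring is stored on the object it documents, it can be read back at runtime. The following is a minimal sketch using only the standard library; the add_two_values function is repeated so that the snippet is self-contained.

```python
import inspect


def add_two_values(x: int, y: int = 0) -> int:
    """Add two values and return the sum.

    Args:
        x (int): first value
        y (int, optional): second value. Defaults to 0.

    Raises:
        TypeError: x and y must be integers

    Returns:
        int: summation of x and y
    """
    if not all(isinstance(i, int) for i in (x, y)):
        raise TypeError('inputs must be integer')
    return x + y


# The raw docstring is stored in the __doc__ attribute.
raw_doc = add_two_values.__doc__

# inspect.getdoc() returns the same text with the indentation cleaned up.
clean_doc = inspect.getdoc(add_two_values)
first_line = clean_doc.splitlines()[0]
print(first_line)

# help(add_two_values) prints the signature together with the docstring.
```

This is the same text that editors and the built-in help() function show to anyone calling the function.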

In this article, we will look at common Python docstring formats and how to automatically generate docstrings for Python functions, classes and class methods using autoDocstring and VSCode.

Common Python Docstring Formats

These are some commonly used Python docstring formats [1]. In general, a docstring format includes the following elements:

  • Description of what the function does
  • Arguments: descriptions and data types
  • Return values: descriptions and data types
  • Description of raised errors

Google

def abc(a: int, c = [1,2]):
    """_summary_
    Args:
        a (int): _description_
        c (list, optional): _description_. Defaults to [1,2].
    Raises:
        AssertionError: _description_
    Returns:
        _type_: _description_
    """
    if a > 10:
        raise AssertionError("a is more than 10")
    return c

NumPy

def abc(a: int, c = [1,2]):
    """_summary_
    Parameters
    ----------
    a : int
        _description_
    c : list, optional
        _description_, by default [1,2]
    Returns
    -------
    _type_
        _description_
    Raises
    ------
    AssertionError
        _description_
    """
    if a > 10:
        raise AssertionError("a is more than 10")
    return c

Sphinx

def abc(a: int, c = [1,2]):
    """_summary_
    :param a: _description_
    :type a: int
    :param c: _description_, defaults to [1,2]
    :type c: list, optional
    :raises AssertionError: _description_
    :return: _description_
    :rtype: _type_
    """
    if a > 10:
        raise AssertionError("a is more than 10")
    return c

PEP257

def abc(a: int, c = [1,2]):
    """_summary_
    Arguments:
        a -- _description_
    Keyword Arguments:
        c -- _description_ (default: {[1,2]})
    Raises:
        AssertionError: _description_
    Returns:
        _description_
    """
    if a > 10:
        raise AssertionError("a is more than 10")
    return c
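The four formats carry the same information in different layouts. As a rough illustration for the Google format only, the sketch below lists the section headers present in a docstring. The helper name google_sections and the set of recognised headers are assumptions made for this example; neither is part of autoDocstring.

```python
import inspect

# Section headers that appear in the Google-style examples in this article.
GOOGLE_HEADERS = ("Args:", "Raises:", "Returns:")


def google_sections(func) -> list:
    """Return the Google-style section headers found in func's docstring."""
    doc = inspect.getdoc(func) or ""
    # After getdoc() cleans the indentation, headers start at column 0.
    return [line for line in doc.splitlines() if line in GOOGLE_HEADERS]


def abc(a: int, c=[1, 2]):
    """_summary_

    Args:
        a (int): _description_
        c (list, optional): _description_. Defaults to [1,2].

    Raises:
        AssertionError: _description_

    Returns:
        _type_: _description_
    """
    if a > 10:
        raise AssertionError("a is more than 10")
    return c


sections = google_sections(abc)
print(sections)
```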

Generate Automated Docstrings in VSCode

In this section, we will walk through examples of how to generate docstrings automatically in VSCode.

Setup

To generate docstrings automatically in VSCode, we need:

  1. VSCode
  2. autoDocstring VSCode extension installed
  3. Python 3.5 and above

VSCode Settings

autoDocstring allows us to choose from a range of commonly used docstring formats in VSCode’s user settings.

Image by author

We will be using the default Google docstring format for the rest of the examples.

Docstring for Function

Let’s write a simple function to add two integer values.

def add_two_values(x, y):
    return x + y

To generate the docstring, place the cursor on the line directly below the function definition (i.e. below the line containing the def keyword) and perform any one of the following steps:

  1. Start the docstring with either triple double quotes or triple single quotes and press the Enter key
  2. Use the keyboard shortcut CTRL+SHIFT+2 on Windows or CMD+SHIFT+2 on Mac
  3. Use Generate Docstring from VSCode’s command palette

GIF by author

This will populate the function body in the following manner.

def add_two_values(x, y):
    """_summary_
    Args:
        x (_type_): _description_
        y (_type_): _description_
    Returns:
        _type_: _description_
    """
    return x + y

The _summary_, _type_ and _description_ are placeholders that we need to replace with actual descriptions.
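It is easy to leave a placeholder behind by accident. The sketch below reports which placeholders are still present in a function's docstring. The helper name unfilled_placeholders is hypothetical and written for this example; it is not a feature of autoDocstring.

```python
import inspect
import re

# Placeholders that appear in the freshly generated docstrings shown above.
PLACEHOLDER = re.compile(r"_(summary|type|description)_")


def unfilled_placeholders(func) -> list:
    """Return the distinct placeholders still present in func's docstring."""
    doc = inspect.getdoc(func) or ""
    return sorted({match.group(0) for match in PLACEHOLDER.finditer(doc)})


def add_two_values(x, y):
    """_summary_

    Args:
        x (_type_): _description_
        y (_type_): _description_

    Returns:
        _type_: _description_
    """
    return x + y


remaining = unfilled_placeholders(add_two_values)
print(remaining)
```

A check like this could be run in a test suite so that unfinished docstrings are caught before code is merged.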

autoDocstring can populate the _type_ placeholder automatically by inferring parameter types from type hints. Let’s include type hints for the arguments and return value in the function.

def add_two_values(x:int, y:int)->int:
    return x + y

After generating the docstring, the function body will be populated in the following manner.

def add_two_values(x:int, y:int)->int:
    """_summary_
    Args:
        x (int): _description_
        y (int): _description_
    Returns:
        int: _description_
    """
    return x + y

Notice that the _type_ placeholders are now populated with the data types of the arguments and the return value.
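The type information that fills the _type_ placeholders is the same information Python exposes through a function's annotations. The sketch below reads it with typing.get_type_hints from the standard library. It only illustrates what type hints provide; it is not a description of how autoDocstring works internally.

```python
from typing import get_type_hints


def add_two_values(x: int, y: int) -> int:
    return x + y


hints = get_type_hints(add_two_values)

# Map each annotated name (including 'return') to the name of its type.
type_names = {name: hint.__name__ for name, hint in hints.items()}
print(type_names)
```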

Most docstring formats also include a description of raised errors. Let’s raise a TypeError if the x or y arguments are not integers.

def add_two_values(x:int, y:int)->int:
    if not all([isinstance(i, int) for i in [x,y]]):
        raise TypeError('inputs must be integer')
    else:
        return x + y

The newly generated docstring will include a subsection that describes the raised errors. This is what we get after replacing the _description_ and _summary_ placeholders.

def add_two_values(x:int, y:int)->int:
    """Add two values and return their sum.
    Args:
        x (int): first value
        y (int): second value
    Raises:
        TypeError: input values must be integers
    Returns:
        int: sum of input values
    """
    if not all([isinstance(i, int) for i in [x,y]]):
        raise TypeError('inputs must be integer')
    else:
        return x + y
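A quick check, written as a sketch, confirms that the function behaves as its docstring describes. One caveat worth knowing: bool is a subclass of int in Python, so boolean inputs also pass the isinstance check.

```python
def add_two_values(x: int, y: int) -> int:
    """Add two values and return their sum.

    Args:
        x (int): first value
        y (int): second value

    Raises:
        TypeError: input values must be integers

    Returns:
        int: sum of input values
    """
    if not all(isinstance(i, int) for i in (x, y)):
        raise TypeError('inputs must be integer')
    return x + y


# Valid integer inputs return their sum.
total = add_two_values(2, 3)
print(total)

# A non-integer input raises the TypeError documented in the docstring.
try:
    add_two_values(2, "3")
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# bool is a subclass of int, so booleans are accepted by the isinstance check.
bool_total = add_two_values(True, 1)
print(bool_total)
```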

Docstring for Class and Class Methods

A similar approach can be used to generate docstrings for classes and class methods.

GIF by author
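As a sketch of what the filled-in result might look like in the Google format, here is a small class with a class docstring and method docstrings. The Calculator class and its total attribute are hypothetical, made up for this example, and the exact template autoDocstring produces for a class may differ.

```python
class Calculator:
    """Keep a running total of integer values.

    Attributes:
        total (int): current running total
    """

    def __init__(self, start: int = 0) -> None:
        """Initialise the calculator.

        Args:
            start (int, optional): initial total. Defaults to 0.
        """
        self.total = start

    def add(self, value: int) -> int:
        """Add a value to the running total.

        Args:
            value (int): value to add

        Raises:
            TypeError: value must be an integer

        Returns:
            int: updated running total
        """
        if not isinstance(value, int):
            raise TypeError('value must be an integer')
        self.total += value
        return self.total


calc = Calculator(start=10)
result = calc.add(5)
print(result)
```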

Conclusion

In this article, we examined the following:

  1. Importance of docstrings
  2. Common docstring formats
  3. Auto-generating docstrings in VSCode with autoDocstring

Docstrings are an important component of any code, as they help developers understand the overall functionality of a function, class or module. Imagine the confusion if data science libraries such as scikit-learn, pandas and NumPy did not come with docstrings! Happy documenting!


Join Medium with my referral link – Edwin Tan

Reference

[1] https://github.com/NilsJPWerner/autoDocstring/tree/master/docs

