My first contribution to open-source software

Marc Laforet
Towards Data Science
Mar 12, 2018


Open-source software brings together many altruistic programmers who freely distribute their code to the world. As a programmer who relies on open-source software to make a living, I have always wanted to contribute back to the open-source community. This past weekend I achieved this goal during the global pandas documentation sprint.

I know that I didn't have to wait for this global event to make an open-source contribution. In fact, I often browse the issue boards of the open-source projects I use. However, since this was my first time contributing to an open-source project, it really helped to have the support of the community along with detailed instructions. Being given a very specific task helped too.

The PyData TO group hosted an event at the hacklab space in Toronto. The organizer assigned me the pandas.DataFrame.all documentation, but before I got started I had to set up my environment. The pandas website has excellent instructions for doing this.

I prefer to manage my virtual environments with venv and although there were a lot of warnings, everything worked pretty well. I could now start writing the documentation, which started as the following:

I ran the following line of code to produce the doc string output.

python ../scripts/validate_docstrings.py pandas.DataFrame.all

Underneath the first set of hash signs is what the docstring currently looks like, and underneath the second set is the current list of errors. The first errors are:

Docstring text (summary) should start in the line immediately after the opening quotes (not in the same line, or leaving a blank line in between)

Use only one blank line to separate sections or paragraphs

I was given the line number associated with this docstring, and reading the code there left me confused.

I was expecting something a little closer to what the actual docs looked like. After some investigation I realized that I was missing the magic within the decorators. The appender decorator appends the content found in the _bool_doc variable. The _bool_doc variable itself has references to variables that are interpolated from the arguments given in the substitution decorator.
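To make the decorator mechanics concrete, here is a minimal sketch of how an appending decorator and a substituting decorator can cooperate. The two classes below are simplified stand-ins written for illustration, not the pandas implementations, and the template, the `logical_all` function and the argument values are hypothetical.

```python
# Simplified stand-ins for the two decorators described above.
# These are illustrations only, not the pandas source code.


class Substitution:
    """Fill %(name)s placeholders in a function's docstring."""

    def __init__(self, **params):
        self.params = params

    def __call__(self, func):
        if func.__doc__:
            func.__doc__ = func.__doc__ % self.params
        return func


class Appender:
    """Append extra text to a function's docstring."""

    def __init__(self, addendum):
        self.addendum = addendum

    def __call__(self, func):
        func.__doc__ = (func.__doc__ or "") + self.addendum
        return func


# Hypothetical shared template, loosely modelled on the idea of _bool_doc.
_bool_doc = """
%(desc)s

Returns
-------
%(outname)s : bool
"""


# Decorators apply bottom-up: Appender attaches the template first,
# then Substitution fills in the placeholders.
@Substitution(desc="Return whether all elements are True.", outname="all")
@Appender(_bool_doc)
def logical_all(values):
    return all(values)


print(logical_all.__doc__)
```

The function body itself carries no docstring at all, which is why the source code looks so different from the rendered documentation.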

When I tracked down the _bool_doc variable, I found the source of the first error staring at me.

The %(desc)s variable placeholder has a blank line after the open quotes. This is not allowed. So I removed this blank line and voila! I had debugged my first error.
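The difference is easy to miss by eye, so here is a small sketch of it. The two templates are shortened, hypothetical versions of the real one, and the helper function is my own simplified check, not the logic used by validate_docstrings.py.

```python
# Hypothetical, shortened templates; the real _bool_doc is longer.
template_before = """

%(desc)s
"""

template_after = """
%(desc)s
"""


def summary_starts_right_after_quotes(doc):
    """Return True when the summary is on the line directly after the opening quotes."""
    lines = doc.split("\n")
    # lines[0] is whatever shares a line with the opening quotes (should be empty);
    # lines[1] is the next line, which should hold the summary.
    return len(lines) > 1 and lines[0] == "" and lines[1].strip() != ""


print(summary_starts_right_after_quotes(template_before))
print(summary_starts_right_after_quotes(template_after))
```

The first template fails the check because of the extra blank line, and the second one passes.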

I kept working away at these errors and was then presented with a challenge. I was completing the documentation for DataFrame.all, but it actually shares its documentation with the DataFrame.any method. So when I needed to add See Also and Examples sections, I had to add arguments to the function that instantiates these docs. This would allow the two related methods to share the template for their documentation but differ where needed.

I added the examples and see_also arguments to the _make_logical_function and passed them in as variables.

This is an example of a variable that was passed to the argument that generated the docs.

Notice how I added the _all_doc, _all_examples and _all_see_also variables to the cls.all object.
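The pattern can be sketched roughly as follows. This is a heavily simplified illustration that assumes a plain factory function. The name `make_logical_function`, its signature, the template and the docstring text are hypothetical and much smaller than what lives in pandas.

```python
# One shared template with placeholders for the parts that differ.
_doc_template = """
%(desc)s

See Also
--------
%(see_also)s

Examples
--------
%(examples)s
"""


def make_logical_function(name, desc, func, see_also, examples):
    """Build a function whose docstring is rendered from the shared template."""

    def logical_func(values):
        return func(values)

    logical_func.__name__ = name
    logical_func.__doc__ = _doc_template % {
        "desc": desc,
        "see_also": see_also,
        "examples": examples,
    }
    return logical_func


# Method-specific pieces (illustrative text only).
_all_doc = "Return whether all elements are True."
_all_see_also = "any : Return whether any element is True."
_all_examples = ">>> all_([True, False])\nFalse"

_any_doc = "Return whether any element is True."
_any_see_also = "all : Return whether all elements are True."
_any_examples = ">>> any_([True, False])\nTrue"

all_ = make_logical_function("all", _all_doc, all, _all_see_also, _all_examples)
any_ = make_logical_function("any", _any_doc, any, _any_see_also, _any_examples)

print(all_.__doc__)
```

Both functions share one template, and only the description, See Also and Examples text differ.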

These changes resulted in the new documentation page looking like the following:

python make.py --single pandas.DataFrame.all

An improvement for sure.

The next step was to get this branch ready to be merged, which included making sure that it compiled and met the PEP 8 standards.

git diff upstream/master -u -- "*.py" | flake8 --diff

I submitted a pull request and had comments within minutes. The comments ranged from stylistic changes to a request to add information about the all method also being applicable to Series. Three separate core pandas developers commented on the branch. Although I frequently go through this process at work, it was really cool to do it with people across the world whom I had never met.

Finally, at 10:42 am on March 11th, my branch was merged and I officially became a contributor to the pandas project in the 0.23.0 release.

I hope to make more contributions to open-source software and will hopefully contribute to implementation details as well as more documentation.

https://pixabay.com/en/background-art-wallpaper-panorama-3104413/
