The IPO Model

Published in Towards Data Science, Jun 29, 2017

During my General Assembly Data Science Immersive course, my colleague David Ortiz suggested that I write a blog post on how I write functions. The model I tend to use when writing a script is the Input Process Output (IPO) model, which helps organize and classify your functions. It can take some effort to make your code conform to this model, but the reward is code that is well organized and easy to debug.

The following is a walkthrough of how I used IPO in my first blog post, "Which GA Campus is Most Happening", which analyzed how many events each GA campus hosted.

Input

This should be a function or set of functions that loads the data needed to create the desired output. Often this means reading in a CSV file or using an ODBC connection to pull data from a SQL database.

In this case, I created a function called scrape_ga to scrape the individual city event pages on General Assembly's website. The scraped data is then put into a dataframe for the corresponding city.
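The post's scrape_ga function isn't reproduced here, but the two other common input sources mentioned above can be sketched as small input functions. This is a minimal sketch assuming pandas; `load_csv` and `load_sql` are hypothetical names, and the standard-library `sqlite3` module stands in for an ODBC connection (with ODBC, the connection would come from an ODBC driver library instead).

```python
import io
import sqlite3

import pandas as pd


def load_csv(source):
    """Input: read a CSV file (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def load_sql(conn, query):
    """Input: run a SQL query on an open database connection and return a DataFrame."""
    return pd.read_sql_query(query, conn)


# Usage with in-memory stand-ins for a real file and a real database
csv_events = load_csv(io.StringIO("title,campus\nIntro to SQL,NYC\nPython 101,SF\n"))

conn = sqlite3.connect(":memory:")  # sqlite3 used here in place of an ODBC connection
conn.execute("CREATE TABLE events (title TEXT, campus TEXT)")
conn.execute("INSERT INTO events VALUES ('Data Viz', 'LA')")
sql_events = load_sql(conn, "SELECT title, campus FROM events")
conn.close()
```

Each input function does one thing, returning a DataFrame, so the processing step never needs to know where the data came from.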

Process

This should be a function or set of functions that prepares the data so that the desired output can be produced. This may mean cleaning or aggregating the data.

The events data on General Assembly's website is fairly clean, so not much processing was needed here. I combined all of the individual city dataframes created by the scraping function and then deduplicated the resulting master dataframe.
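The combine-and-deduplicate step described above could look like the following sketch, assuming pandas. The function name `combine_events` and the toy frames (which happen to share one row) are hypothetical.

```python
import pandas as pd


def combine_events(city_frames):
    """Process: stack the per-city DataFrames into one master frame and deduplicate it."""
    master = pd.concat(city_frames, ignore_index=True)
    # Drop rows that are exact duplicates, then renumber the index from 0
    return master.drop_duplicates().reset_index(drop=True)


# Toy frames standing in for the scraped per-city dataframes
frame_a = pd.DataFrame({"title": ["Intro to SQL", "Python 101"], "campus": ["NYC", "NYC"]})
frame_b = pd.DataFrame({"title": ["Python 101", "UX Basics"], "campus": ["NYC", "SF"]})
events = combine_events([frame_a, frame_b])
```

`drop_duplicates()` with no arguments compares every column; if only some columns identify an event, pass them with its `subset` parameter.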

Output

This should be a function or set of functions that creates the desired output from the cleaned, processed data. This could be a new spreadsheet, the output of a model, or a graphic.

In this case, the output was a bar graph comparing the number of events per campus.
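An output function for this step might look like the sketch below, assuming matplotlib for the chart. The function name, axis labels, and toy data are hypothetical, and the original post's plotting code may differ.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_events_per_campus(events):
    """Output: bar graph of the number of events at each campus."""
    counts = events["campus"].value_counts()  # events per campus, most first
    fig, ax = plt.subplots()
    ax.bar(list(counts.index), counts.to_numpy())
    ax.set_xlabel("Campus")
    ax.set_ylabel("Number of events")
    ax.set_title("Events per GA campus")
    return fig, counts


events = pd.DataFrame({"title": ["Intro to SQL", "Python 101", "UX Basics"],
                       "campus": ["NYC", "NYC", "SF"]})
fig, counts = plot_events_per_campus(events)
```

To keep the graphic, you could call `fig.savefig("events_per_campus.png")` after the function returns.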

Why Use IPO?

Before I learned about IPO, I used functions only to avoid repeating myself. While this let me write code quickly, it caused some problems.

Organization and Debugging Code

IPO forces you to know the broad purpose of each of your functions and to organize them in a logical order. IPO has a clear flow: input functions feed into processing functions, which feed into output functions. Following this model makes it easier for your colleagues and your future self to read and modify your code, and when there is an error, it is much easier to locate where the correction needs to be made.
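A minimal, standard-library-only sketch of that flow, under stated assumptions: the pipe-separated "title|campus" lines are a hypothetical stand-in for scraped pages, and all function names are illustrative.

```python
from collections import Counter


def get_events(pages):
    """Input: turn raw 'title|campus' lines into (title, campus) tuples."""
    return [tuple(line.split("|")) for page in pages for line in page]


def clean_events(events):
    """Process: drop duplicate events while keeping their original order."""
    return list(dict.fromkeys(events))


def count_by_campus(events):
    """Output: number of events per campus."""
    return Counter(campus for _title, campus in events)


def main(pages):
    # Input feeds Process, which feeds Output
    return count_by_campus(clean_events(get_events(pages)))


pages = [["Intro to SQL|NYC", "Python 101|NYC"], ["Python 101|NYC", "UX Basics|SF"]]
print(main(pages))
```

Because each stage is its own function, a wrong final count can be tracked down by checking one stage at a time, for example by inspecting `clean_events(get_events(pages))` on its own.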

Keeping the Global Namespace Clean

One of Python's most useful features is namespaces. A variable created inside a function can only be used by that function: it is a local variable. A variable created outside of any function can be read by every function: it is a global variable.

```python
a = 1  # This is a global variable

def function_1():
    b = 2  # This is a local variable

def function_2():
    print(a)  # Works because a is a global variable
    print(b)  # NameError: b is local to function_1

function_2()  # prints 1, then raises NameError
```
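One nuance worth knowing: a function can read a global variable, but assigning to that name inside the function makes it local to that function unless it is declared with `global`. A small sketch with illustrative function names:

```python
count = 0  # global variable


def read_count():
    return count  # reading a global works


def broken_increment():
    # Assigning to count makes it local to this function,
    # so reading it first raises UnboundLocalError
    count += 1
    return count


def global_increment():
    global count  # explicitly opt in to rebinding the global name
    count += 1
    return count


print(read_count())  # 0
try:
    broken_increment()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
print(global_increment())  # 1
```

The `global` route works, but it is exactly the kind of shared state IPO tries to avoid; passing values in as arguments and returning results is usually cleaner.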

Keeping the global namespace clean isn't an issue for short scripts, but in long scripts it gets hard to keep track of all your variables, which leads to more bugs.

```python
a = 1
# ... 1000 lines of code ...
a = "Cat"
# ... 1000 more lines of code ...
a += 1  # I forgot that I changed a to a string, so this raises a TypeError
```
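A sketch of the same situation written IPO-style, with hypothetical function names: because each `a` lives inside its own function, reusing the name for a string in one place cannot break arithmetic somewhere else.

```python
def load_count():
    """Input: a holds a number, but only inside this function."""
    a = 1
    return a


def load_label():
    """Input: a holds a string here, with no effect on load_count."""
    a = "Cat"
    return a


def main():
    count = load_count() + 1  # still an int, so the arithmetic works
    label = load_label()
    return count, label


print(main())  # (2, 'Cat')
```

After `main()` runs, neither `a`, `count`, nor `label` exists in the global namespace.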

Modified IPO for an Analytics Context

Much of our work as data scientists is presenting our process and results in an easy-to-read format such as a Jupyter Notebook. I often find myself writing a line of code and then using a markdown cell to explain it. Having all of your code in three or so main IPO functions doesn't suit this format, except for smaller projects.

In this case I use a modified version of IPO: I keep a main set of input and processing functions, and do all of my analysis (AKA my output) globally. The reasoning is that even though 80% of our work as data scientists is getting and cleaning data, stakeholders mainly care about the remaining 20%, the analysis. I still document my input and processing functions with comments, but I make sure the analysis is highlighted by markdown cells throughout the notebook.
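A minimal sketch of this layout as it might look in a notebook exported to a script, assuming pandas and a hypothetical CSV of events; the function names are illustrative, and the comments mark where markdown cells would sit.

```python
import io

import pandas as pd


# --- Input and Process: kept in functions, documented with comments ---
def load_event_data(source):
    """Input: read the raw events CSV."""
    return pd.read_csv(source)


def dedupe_events(events):
    """Process: drop duplicate rows and renumber the index."""
    return events.drop_duplicates().reset_index(drop=True)


# --- Analysis (Output): done globally, one step per notebook cell ---
raw_csv = io.StringIO(
    "title,campus\nIntro to SQL,NYC\nIntro to SQL,NYC\nPython 101,NYC\nUX Basics,SF\n"
)
events = dedupe_events(load_event_data(raw_csv))

# Markdown cell: how many events does each campus host?
events_per_campus = events["campus"].value_counts()

# Markdown cell: which campus hosts the most events?
busiest_campus = events_per_campus.idxmax()
```

The globals created here (`events`, `events_per_campus`, `busiest_campus`) are the analysis results the reader is meant to see, while the loading and cleaning details stay tucked inside functions.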

You can see the full Jupyter Notebook for the General Assembly web scraping project on my portfolio. I am a big advocate for IPO because it has helped me write code that remains usable in the long run. It may be tempting to stray from IPO, but every time I have strayed, the result has been disorganized code that I eventually had to rewrite into IPO.


Written by Brendan Bailey