
An important principle of software programming is DRY, an acronym for "Don’t Repeat Yourself". The goal of DRY is to avoid needless repetition in code. Applications of DRY include implementing abstractions through functions, classes, decorators, class decorators and metaclasses. In this post, we will use a function decorator to wrap existing model-building functions and add extra processing to them.
Let’s get started!
For our example, we will define a decorator function that reports the execution time of an input function. As a data scientist, I often have to consider the execution time of fit and predict calls made in production. Let’s consider this use case.
We will be using the synthetic medical data from Medical Costs Personal Dataset which can be found here. We will define functions for reading data, fitting data and making predictions. We will then define a decorator function that will report the execution time for each function call.
To start, let’s read our data into a Pandas data frame:
import pandas as pd
df = pd.read_csv("insurance.csv")
Let’s print the first five rows of data:
print(df.head())

Let’s now frame our prediction problem. Let’s use the ‘age’, ‘bmi’, and ‘children’ columns as input features and ‘charges’ as our target. Let’s also split our data for training and testing. First, let’s import some necessary packages:
import numpy as np
from sklearn.model_selection import train_test_split
Next, let’s define our input and output. Let’s also split our data into training and testing sets:
X = np.array(df[['children', 'bmi', 'age' ]])
y = np.array(df['charges'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
Here, we select a test size corresponding to a random sample of 20% of the data. Now, let’s put all of this into a single function that takes the test size as an argument:
def read_and_split(test_size):
    df = pd.read_csv("insurance.csv")
    print(df.head())
    X = np.array(df[['children', 'bmi', 'age']])
    y = np.array(df['charges'])
    # Use the caller's test_size rather than a hard-coded value
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
    return X_train, X_test, y_train, y_test
Next, let’s define a function, ‘fit_model’, that we will use to fit our model to our training data. Let’s import the ‘LinearRegression’ class:
from sklearn.linear_model import LinearRegression
In our ‘fit_model’ function let’s define a ‘LinearRegression’ object and fit our model to the training data:
def fit_model():
    # Relies on X_train and y_train defined at module level
    model = LinearRegression()
    model = model.fit(X_train, y_train)
    return model
Finally, let’s define a function that will make predictions on the test set:
def predict():
    # Relies on model and X_test defined at module level
    result = model.predict(X_test)
    return result
Now that we have our functions defined, let’s define our decorator function that will report execution times. Our decorator function will be a timer function, called ‘timethis’, and it will take a function as input:
def timethis(func):
    ...
Next, we will define a ‘wrapper’ function within our ‘timethis’ function:
def timethis(func):
    def wrapper(*args, **kwargs):
        ...
In our ‘wrapper’ function we will define ‘start’ and ‘end’ variables that we will use to record the start and end of a run. In between defining our ‘start’ and ‘end’ variables we will call the input function and store its return value in a variable called ‘result’:
import time
def timethis(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
Next, we place the ‘@wraps’ decorator, imported from ‘functools’, on the line before our ‘wrapper’ function and return ‘result’ from it:
from functools import wraps
def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        return result
The ‘@wraps’ decorator takes the function passed into ‘@timethis’ and copies its metadata, such as the function name, docstring and annotations, onto ‘wrapper’, so the decorated function still identifies itself as the original.
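To see what ‘@wraps’ changes, here is a minimal sketch that compares a decorator written without it to one written with it. The names ‘plain_decorator’, ‘wrapping_decorator’, ‘fit_a’ and ‘fit_b’ are made up for illustration and are not part of this post’s code:

```python
from functools import wraps


def plain_decorator(func):
    # No @wraps: the returned wrapper keeps its own name and docstring.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def wrapping_decorator(func):
    # @wraps copies func's metadata onto wrapper and sets __wrapped__.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain_decorator
def fit_a():
    """Fit a model (hypothetical placeholder)."""


@wrapping_decorator
def fit_b():
    """Fit a model (hypothetical placeholder)."""


print(fit_a.__name__, fit_a.__doc__)  # wrapper None
print(fit_b.__name__, fit_b.__doc__)  # fit_b Fit a model (hypothetical placeholder).
print(hasattr(fit_b, "__wrapped__"))  # True
```

Without ‘@wraps’, a timing report based on the decorated function’s name would show ‘wrapper’ for every function, which is why it is worth including.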
We will then print the name of the function and the run time (‘end’ – ‘start’). We also return the input function’s return value, which we stored in the ‘result’ variable:
def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, end - start)
        return result
Finally, the ‘timethis’ function returns the ‘wrapper’. Here is the complete decorator, together with the imports it needs:
import time
from functools import wraps
def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, end - start)
        return result
    return wrapper
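As a possible variation, not the version used in this post, the sketch below swaps ‘time.time()’ for ‘time.perf_counter()’, a standard-library clock intended for measuring short durations, and formats the output in seconds. The names ‘timethis_precise’ and ‘slow_square’ are hypothetical, and ‘slow_square’ is only a stand-in for a fit or predict call:

```python
import time
from functools import wraps


def timethis_precise(func):
    """Hypothetical variant of timethis that uses time.perf_counter()."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # Report the wrapped function's name and elapsed time in seconds.
        print(f"{func.__name__}: {elapsed:.6f} s")
        return result
    return wrapper


@timethis_precise
def slow_square(x):
    # Stand-in for a slower call such as fitting a model.
    time.sleep(0.01)
    return x * x


value = slow_square(3)
```

The decorated function still returns its usual result, so ‘value’ holds 9 here and the timing line is printed as a side effect.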
Now we can use the ‘@timethis’ decorator on any of our functions. Let’s apply ‘@timethis’ to our ‘read_and_split’ function. We simply put ‘@timethis’ in the line right before the function we’d like to wrap:
@timethis
def read_and_split(test_size):
    df = pd.read_csv("insurance.csv")
    X = np.array(df[['children', 'bmi', 'age']])
    y = np.array(df['charges'])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
    return X_train, X_test, y_train, y_test
Now if we call our ‘read_and_split’ function, our decorator ‘@timethis’ should print the execution time:
X_train, X_test, y_train, y_test = read_and_split(0.2)

Let’s do the same for our fit function:
@timethis
def fit_model():
    model = LinearRegression()
    model = model.fit(X_train, y_train)
    return model
model = fit_model()

And for our predict function:
@timethis
def predict():
    result = model.predict(X_test)
    return result
prediction = predict()
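The ‘@timethis’ syntax used above is shorthand for passing a function to the decorator and binding the result to a name. That makes it possible to time a function that is already defined elsewhere without editing its definition. In this minimal sketch, ‘mean’ and ‘timed_mean’ are hypothetical names, and ‘timethis’ is repeated only so the block runs on its own:

```python
import time
from functools import wraps


def timethis(func):
    # Same decorator as in the post, repeated so this block is self-contained.
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, end - start)
        return result
    return wrapper


def mean(values):
    # Hypothetical stand-in for a function defined in another module.
    return sum(values) / len(values)


# Equivalent to writing @timethis on the line above "def mean".
timed_mean = timethis(mean)

average = timed_mean([1, 2, 3, 4])
```

Because ‘mean’ itself is left untouched, both the timed and untimed versions remain available.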

I’ll stop here, but feel free to play around with the code and data yourself. I encourage you to analyze the execution times for some other regression models that you can build with this data, such as random forest or support vector regression.
CONCLUSIONS
To summarize, in this post we discussed function wrappers in Python. To start we defined three functions for building a linear regression model. We defined functions for reading and splitting our data for training, fitting our model to training data, and making predictions on our test set. We then defined a function wrapper that allowed us to report execution time for each function call. I hope you found this post useful/interesting. The code in this post is available on GitHub. Thank you for reading!