Modeling Marketing Mix with Constrained Coefficients

How to fit a constrained linear regression with SciPy and call R's Ridge regression from Python using the RPy2 interface

Slava Kisilevich
Towards Data Science



The most common approach in Marketing Mix Modeling (MMM) is Multiple Linear Regression, which finds a linear relationship between a dependent variable, such as sales or revenue, and independent variables that include media advertising channels like TV and Print as well as additional variables like trend, seasonality, and holidays. One question marketers often ask is what effect each media channel has on the outcome. Linear regression estimates a coefficient for each independent variable plus an intercept, which provide the following indications:

  • the intercept shows the mean outcome when media advertisement spend is zero
  • the coefficients show the magnitude of change in the outcome for a unit change in an independent variable, and
  • the direction of change: coefficients can be positive, negative, or zero

However, the assumption in Marketing Mix Modeling is that media advertising channels have a non-negative effect: with every unit increase in advertising spend, sales or revenue should increase, slow down, or at least stay flat (the saturation effect). This means that the resulting coefficients for media channels should be non-negative.

So how can we constrain the coefficients?

In my last article, I used a tree-based algorithm instead of Linear Regression, so there was no issue with coefficients at all.

We can also define positive priors for the coefficients of media channels in a Bayesian context, as I did in the following article.

In this article, I show two ways to constrain coefficients:

  • by fitting a linear regression using a non-linear least squares curve_fit function from the Python SciPy package as a general solution
  • by using Ridge, a regression with regularization using the R glmnet package wrapped in Python with RPy2

Data

I continue using the dataset made available by Robyn under MIT Licence, as in my previous articles, for practical and non-trivial examples, and follow the same data preparation steps, applying Prophet to decompose trend, seasonality, and holidays.
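For readers who did not follow the earlier articles, a minimal sketch of that decomposition step is shown below. It assumes a weekly dataframe data with a ds date column and a revenue column, plus a holidays dataframe in Prophet's format; these names and the final_data assembly are illustrative assumptions, not the exact code from the repository.

# Sketch of the Prophet decomposition step; column names and the holidays
# dataframe are illustrative assumptions
from prophet import Prophet

prophet_data = data.rename(columns = {"revenue": "y"})   # Prophet expects columns ds and y
prophet = Prophet(yearly_seasonality = True, holidays = holidays)
prophet.fit(prophet_data[["ds", "y"]])

decomposition = prophet.predict(prophet_data[["ds"]])
final_data = data.copy()
final_data["trend"] = decomposition["trend"]
final_data["season"] = decomposition["yearly"]
final_data["holiday"] = decomposition["holidays"]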

The dataset consists of 208 weeks of revenue (from 2015-11-23 to 2019-11-11) and includes:

  • 5 media spend channels: tv_S, ooh_S, print_S, facebook_S, search_S
  • 2 media channels that also have exposure information (Impressions, Clicks): facebook_I, search_clicks_P (not used in this article)
  • Organic media without spend: newsletter
  • Control variables: events, holidays, competitor sales (competitor_sales_B)

Linear Regression with Unconstrained Coefficients

Let’s first define our independent and dependent variables:

target = "revenue"
media_channels = ["tv_S", "ooh_S", "print_S", "facebook_S", "search_S"]
organic_channels = ["newsletter"]
control_features = ["trend", "season", "holiday", "competitor_sales_B", "events"]
features = control_features + media_channels + organic_channels

We have 11 independent variables in total: 5 media spend channels plus 1 organic channel, and 5 control variables. Let’s apply a classical multiple linear regression using the statsmodels package and then check whether curve fitting with curve_fit gives the same results for the unconstrained linear regression.

from statsmodels.regression import linear_model
from statsmodels import tools

data_with_intercept = tools.add_constant(final_data[features])
ols = linear_model.OLS(endog = final_data[target],
                       exog = data_with_intercept)
ols_res = ols.fit()
print(ols_res.summary())
print(ols_res.params)

OLS Coefficients

The documentation of the curve_fit function says that it expects a model function:

It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments

The model function describes our linear relationship as:

revenue = a + b1*x1 + b2*x2 + … + b11*x11

The model function must declare each parameter explicitly, so there is no way to make it adapt to a different number of parameters. Since we have 11 coefficients plus 1 intercept, we have to provide 12 parameters:

def linear_function(data, a, b1, b2, b3, b4,
                    b5, b6, b7, b8, b9, b10, b11):
    return a + b1 * data[:, 0] \
             + b2 * data[:, 1] \
             + b3 * data[:, 2] \
             + b4 * data[:, 3] \
             + b5 * data[:, 4] \
             + b6 * data[:, 5] \
             + b7 * data[:, 6] \
             + b8 * data[:, 7] \
             + b9 * data[:, 8] \
             + b10 * data[:, 9] \
             + b11 * data[:, 10]

Call curve_fit with the model function, the independent variables as a NumPy matrix, and the target variable as a NumPy array, explicitly setting the method to ‘lm’, which uses Levenberg-Marquardt least squares optimization:

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

curve_fit_coefs, _ = curve_fit(f = linear_function,
                               method = "lm",
                               xdata = final_data[features].values,
                               ydata = final_data[target].values)
#print coefficients
pd.DataFrame(curve_fit_coefs, index = ["const"] + features, columns = ["coefficient"])

SciPy curve_fit Coefficients

We see that we get the same coefficients in both cases. The intercept (const) and the search_S media channel are negative. Let’s constrain the intercept and all media channels to be non-negative. There is some interesting discussion on Robyn's GitHub about whether the intercept should be non-negative or unconstrained; check this thread.

Linear Regression with Constrained Coefficients

First, we define a helper function to quickly generate the constraints in the format that curve_fit accepts: an array with a length equal to the number of parameters.

bounds: 2-tuple of array_like, optional: Lower and upper bounds on parameters. Defaults to no bounds. Each element of the tuple must be either an array with the length equal to the number of parameters, or a scalar (in which case the bound is taken to be the same for all parameters). Use np.inf with an appropriate sign to disable bounds on all or some parameters.

def prepare_bounds(intercept_value,
                   control_length,
                   media_length,
                   control_value,
                   media_value):
    bounds_array = []
    bounds_array.append(intercept_value)
    for i in range(control_length):
        bounds_array.append(control_value)
    for i in range(media_length):
        bounds_array.append(media_value)

    return bounds_array

Let’s prepare the constraints:

  • The lower bounds for intercept and media channels are 0, and -infinity for control variables.
  • The upper bounds are infinity for all variables.

lower_bounds_array = \
    prepare_bounds(intercept_value = 0,
                   control_length = len(control_features),
                   media_length = len(media_channels) + len(organic_channels),
                   control_value = -np.inf,
                   media_value = 0)

upper_bounds_array = \
    prepare_bounds(intercept_value = np.inf,
                   control_length = len(control_features),
                   media_length = len(media_channels) + len(organic_channels),
                   control_value = np.inf,
                   media_value = np.inf)

The lower bounds are:

lower_bounds_array
#[0, -inf, -inf, -inf, -inf, -inf, 0, 0, 0, 0, 0, 0]

The upper bounds are:

upper_bounds_array
#[inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]

Let’s apply curve_fit with the constraints:

curve_fit_bounded_coefs, _ = \
    curve_fit(f = linear_function,
              xdata = final_data[features].values,
              ydata = final_data[target].values,
              bounds = (tuple(lower_bounds_array),
                        tuple(upper_bounds_array)))
#print coefficients
pd.DataFrame(curve_fit_bounded_coefs, index = ["const"] + features, columns = ["coefficient"])

SciPy curve_fit Constrained Coefficients

This time all the parameters of the model are positive.

This solution has several disadvantages:

  • The model function requires the exact number of parameters to be defined in advance, which makes it cumbersome when we want to experiment with a different number of variables
  • Classical linear regression is prone to overfitting, does not handle correlated or redundant variables well, and does not generalize well. In Marketing Mix Modeling the variables are often correlated and the data is limited, so better modeling approaches such as Ridge regression are required. Ridge regression has been used in real-life scenarios to overcome overfitting and multicollinearity through a regularization term that penalizes the coefficients by shrinking their magnitude. The degree of penalization is controlled by the lambda parameter (the objective is sketched after this list).
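For reference, a minimal statement of the Ridge objective in its standard textbook form (not tied to this dataset): the coefficients minimize the squared error plus an L2 penalty,

$$\hat{\beta} = \arg\min_{\beta}\; \sum_{t}\Big(y_t - \beta_0 - \sum_{i}\beta_i x_{t,i}\Big)^2 + \lambda \sum_{i}\beta_i^2,$$

so a larger lambda shrinks the coefficients more strongly toward zero (the intercept is typically left unpenalized).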

Ridge Regression with Constrained Coefficients using R glmnet

Why did I decide to use R's glmnet in Python? I simply couldn’t find a better solution. There are glmnet wrappers for Python, but due to their Fortran dependencies I couldn’t compile them on my Windows machine. Besides, glmnet comes with two handy functions out of the box: cv.glmnet, which performs cross-validation and determines the optimal lambda parameter, and glmnet, which builds the final model. Both functions perform data standardization and allow controlling the sign of the coefficients and the intercept. I only had to figure out how to call R code from Python. Luckily, there is the RPy2 package, a Python interface to the R language, precisely for this:

rpy2 is running an embedded R, providing access to it from Python using R’s own C-API through either:

a high-level interface making R functions objects just like Python functions and providing a seamless conversion to numpy and pandas data structures

a low-level interface closer to the C-API

Installing R and RPy2

Of course, you should have R installed. Then install RPy2:

pip install rpy2

Check if RPy2 recognizes your R installation by importing RPy2:

from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr
import rpy2.situation
base = importr("base")
glm = importr("glmnet")

If the import went smoothly, you can get some information about your R installation:

for row in rpy2.situation.iter_info():
    print(row)

If there were errors during the import, you may try the following steps:

  • check that the etc/Rprofile file doesn’t load any libraries
  • create an R_HOME environment variable pointing to your R installation (a minimal sketch follows this list)
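A minimal sketch of the second step is shown below; the installation path is hypothetical and must match your system, and R_HOME has to be set before rpy2 is imported.

import os

# hypothetical path -- point it to your own R installation before importing rpy2
os.environ["R_HOME"] = r"C:\Program Files\R\R-4.2.1"

from rpy2 import robjects as ro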

Wrapping the glmnet package

The wrapping process consists of three steps:

  • Writing a function in R that calls Ridge regression and returns the output we are interested in
  • Writing Python-to-R type converters for Pandas DataFrames, Boolean, and Float types
  • Combining all in a Python function that prepares the data types, calls the R code, and returns the results

Step 1: Writing the R code: fitting a Ridge regression in R

RPy2 provides the rpy2.robjects.r function, which takes any string and evaluates it as R code:

run_glmnet_in_r = \
    ro.r("""
    # pure R code
    function(x_train,
             y_train,
             x_test,
             lambda,
             is_intercept,
             lower_limits,
             upper_limits,
             alpha = 0) {
        mod = glmnet(x_train,
                     y_train,
                     family = 'gaussian',
                     alpha = alpha,
                     lambda = lambda,
                     intercept = is_intercept,
                     lower.limits = lower_limits,
                     upper.limits = upper_limits,
                     type_measure = 'mse')
        coefs = as.data.frame(as.matrix(coef(mod)))
        names(coefs)[1] = "value"
        y_pred = predict(mod, s = lambda, newx = x_test)
        return (list(coefs = coefs, y_pred = y_pred))
    }
    """)

I defined a pure R function that fits a Ridge regression and expects 8 parameters:

  • x_train — training set
  • y_train — response / target variable
  • x_test — test set whose predictions we get from the fitted model
  • lambda — the value of lambda that we get from cv.glmnet
  • is_intercept — whether we want the intercept to be fitted (unconstrained) or set to 0
  • lower_limits, upper_limits — vector of lower/upper limits for each variable
  • alpha — glmnet fits Ridge, Lasso, or Elastic Net regression depending on alpha; when alpha is 0, Ridge regression is fitted.

The function returns a list containing the coefficients and the test set predictions.

Step 2: Writing Python-to-R type converters for Pandas DataFrames, Boolean, and Float types

The following code converts Pandas DataFrame into an R Data Frame:

with ro.conversion.localconverter(ro.default_converter + pandas2ri.converter):
    r_df = ro.conversion.py2rpy(x_train)

However, glmnet expects a matrix as input, so I had to write my own converter that takes an R Data Frame and converts it into a matrix:

data_frame_to_r_matrix = ro.r('function(x) as.matrix(x)')

The vectors of target values and of lower and upper limits are converted using RPy2:

r_y_train = ro.vectors.FloatVector(y_train)

r_lower_limits = ro.vectors.FloatVector(lower_limits)
r_upper_limits = ro.vectors.FloatVector(upper_limits)

The float scalar value of lambda is converted as if it were a vector with a single value:

lambda_best_float = ro.vectors.FloatVector([lambda_best])

The boolean variable is_intercept is converted similarly:

is_intercept_bool = ro.vectors.BoolVector([is_intercept])

The conversion from an R list into a Python dictionary requires a custom solution:

def convert_r_list_into_dictionary(r_list):
    py_dict = {}
    keys = r_list.names
    for i in range(len(keys)):
        if isinstance(r_list[i], ro.vectors.DataFrame):
            # R data frames are converted into pandas DataFrames
            with ro.conversion.localconverter(ro.default_converter + pandas2ri.converter):
                py_dict[keys[i]] = r_list[i]
        elif isinstance(r_list[i], ro.vectors.FloatVector):
            # R numeric vectors become NumPy arrays (or a scalar if they hold one value)
            array = np.array(r_list[i])
            if len(array) == 1:
                array = array[0]
            py_dict[keys[i]] = array
        else:
            py_dict[keys[i]] = r_list[i]
    return py_dict

Step 3: Combining all steps in a Python function that prepares the data types, calls the R code, and returns the results

def run_glmnet(x_train,
               y_train,
               x_test,
               lambda_best,
               is_intercept,
               lower_limits,
               upper_limits,
               alpha = 0):
    # type conversions
    with ro.conversion.localconverter(ro.default_converter + pandas2ri.converter):
        r_df = ro.conversion.py2rpy(x_train)
        r_test_df = ro.conversion.py2rpy(x_test)

    r_x_train = data_frame_to_r_matrix(r_df)
    r_x_test = data_frame_to_r_matrix(r_test_df)

    r_y_train = ro.vectors.FloatVector(y_train)

    r_lower_limits = ro.vectors.FloatVector(lower_limits)
    r_upper_limits = ro.vectors.FloatVector(upper_limits)

    lambda_best_float = ro.vectors.FloatVector([lambda_best])

    is_intercept_bool = ro.vectors.BoolVector([is_intercept])

    # call glmnet
    mod = run_glmnet_in_r(r_x_train,
                          r_y_train,
                          r_x_test,
                          lambda_best_float,
                          is_intercept_bool,
                          r_lower_limits,
                          r_upper_limits,
                          alpha)
    # prepare the results
    mod = convert_r_list_into_dictionary(mod)

    mod["coefs"] = mod["coefs"].reset_index()

    return mod

Fitting the Ridge Regression

Let’s define the lower and upper bounds:

# Set lower and upper bounds for feature coefficients
lower_limits = np.array([-np.inf for _ in range(len(control_features))] + [0 for _ in range(len(media_channels) + len(organic_channels))])
upper_limits = np.array([np.inf for _ in range(len(control_features))] + [np.inf for _ in range(len(media_channels) + len(organic_channels))])
print(lower_limits)
print(upper_limits)
#[-inf -inf -inf -inf -inf 0. 0. 0. 0. 0. 0.]
#[inf inf inf inf inf inf inf inf inf inf inf]

This time we define constraints only for the variables; the intercept is controlled explicitly by the is_intercept function parameter.

Next, we cross-validate to find the best lambda parameter:

cv_glmnet = run_cv_glmnet(x_train = final_data[features],
                          y_train = final_data[target],
                          is_intercept = True,
                          lower_limits = lower_limits,
                          upper_limits = upper_limits)
print(cv_glmnet["lambda_best"])
#166018.4312548283
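The run_cv_glmnet helper is not listed in this article. A minimal sketch of such a wrapper, mirroring run_glmnet above, could look as follows; the R function body is my assumption, and only the lambda_best key is dictated by how the helper is called here.

# Sketch of a cv.glmnet wrapper mirroring run_glmnet; the details are assumptions
run_cv_glmnet_in_r = \
    ro.r("""
    function(x_train, y_train, is_intercept,
             lower_limits, upper_limits, alpha = 0) {
        cv_mod = cv.glmnet(x_train,
                           y_train,
                           family = 'gaussian',
                           alpha = alpha,
                           intercept = is_intercept,
                           lower.limits = lower_limits,
                           upper.limits = upper_limits,
                           type.measure = 'mse')
        return (list(lambda_best = cv_mod$lambda.min))
    }
    """)

def run_cv_glmnet(x_train, y_train, is_intercept,
                  lower_limits, upper_limits, alpha = 0):
    # reuse the same Python-to-R conversions as in run_glmnet
    with ro.conversion.localconverter(ro.default_converter + pandas2ri.converter):
        r_df = ro.conversion.py2rpy(x_train)
    r_x_train = data_frame_to_r_matrix(r_df)
    r_y_train = ro.vectors.FloatVector(y_train)
    r_lower_limits = ro.vectors.FloatVector(lower_limits)
    r_upper_limits = ro.vectors.FloatVector(upper_limits)
    is_intercept_bool = ro.vectors.BoolVector([is_intercept])

    res = run_cv_glmnet_in_r(r_x_train, r_y_train, is_intercept_bool,
                             r_lower_limits, r_upper_limits, alpha)
    return convert_r_list_into_dictionary(res)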

Finally, we fit Ridge regression using the lambda we found:

results = run_glmnet(x_train = final_data[features],
                     y_train = final_data[target],
                     x_test = final_data[features],
                     lambda_best = cv_glmnet["lambda_best"],
                     is_intercept = True,
                     lower_limits = lower_limits,
                     upper_limits = upper_limits)
results.keys()
#dict_keys(['coefs', 'y_pred'])

The final coefficients are:

results["coefs"]
Ridge Coefficients

Conclusion

In some situations, when following business logic as in Marketing Mix Modeling, we want to force a linear regression to find a solution with positive coefficients. However, not all statistical packages support this functionality, and a workaround is needed.

I showed two ways to constrain coefficients: using the SciPy non-linear least squares optimization function curve_fit, and using Ridge regression from the R glmnet package. The latter approach requires wrapping R code with the RPy2 interface, which allows calling arbitrary R functions from Python.

The complete code can be downloaded from my GitHub repo.

Thanks for reading!
