
The Effect of Fast-shipping

Harvard IACS Capstone Project, Fall 2020

Making Sense of Big Data

Team members: Yixing Guan, Paul-Emile Landrin, Jordan Turley, Lin Zhu

Advised by: Nick Stern, Chris Tanner, Nathaniel Burbank

Disclaimer: The views expressed in this blog are those of the authors and are not endorsed by Harvard or Wayfair.

Introduction

Wayfair is an American e-commerce company that sells furniture and home goods. Its digital platform offers 14 million items from more than 11,000 global suppliers.

Fast-shipping products are small-parcel products that can be delivered to a portion of Wayfair’s customers in two business days or less. However, offering fast shipping incurs additional cost for Wayfair. We are therefore interested in whether adding a fast-shipping option to a product increases the profit from that product. In particular, we would like to:

  1. Estimate the average boost in sales across all products, for each category, and for each individual product.
  2. Measure the marginal effect of the sales boost: how the sales boost changes as the proportion of products offered with fast shipping increases.
  3. Identify key product characteristics that make a product’s sales more sensitive to fast shipping.

Experiment Design & Data Collection

Every user is assigned an s-value, an integer between 25 and 75, chosen at random. The s-value directly corresponds to the probability that the user sees the fast-shipping flag on an item that offers it: when a user browses an item with fast shipping available, the flag is shown with probability S%.

The data we were given is in an aggregated form. We have data for every (item, shipping flag, s-value) combination, so we know attributes for every item, at every s-value, when it was seen with and without fast shipping. An example is given below with two fake items so the reader can understand the exact format.

Examples to illustrate the exact format of the data we have

For every item we therefore have 51 (one row for each s-value from 25 to 75) × 2 (fast-shipping flag true/false) = 102 rows.

The exact attributes we are given are as follows:

  1. wfsku – an internal unique item identifier
  2. s-value – the s-value assigned to the customer viewing the item
  3. cartshowshipflag – whether or not the item was seen with the fast shipping flag
  4. impressioncnt – the number of search page impressions this item had during the experiment
  5. avgsortrank – the average sort rank of the item when seen on search pages
  6. pdpcnt – the number of product display page hits the item received, i.e. the number of search page clicks the item received
  7. numord – the number of orders that this item was contained in
  8. qtyord – the quantity of this item that was ordered

The difference between numord and qtyord is as follows: if one user places an order for three identical chairs, then numord = 1 and qtyord = 3.

We are also given generic attributes about the item like the product’s class and market category name, review count, average rating, average order weight, as well as numbers for 90 days before the experiment corresponding to impressions, product display page hits, number of orders, and quantity ordered.
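To make this layout concrete, here is a minimal sketch of how data in this format might be loaded and sanity-checked. The file name and some column names (svalue, reviewcnt, avgrating) are assumptions based on the attribute list above, not the actual extract we received.

```python
import pandas as pd

# Hypothetical file name; one row per (item, s-value, fast-shipping flag).
df = pd.read_csv("fast_shipping_experiment.csv")

# Every item should appear 51 (s-values 25..75) * 2 (flag on/off) = 102 times.
rows_per_item = df.groupby("wfsku").size()
print(rows_per_item.value_counts())   # expect a single value: 102

cols = ["wfsku", "svalue", "cartshowshipflag", "impressioncnt",
        "avgsortrank", "pdpcnt", "numord", "qtyord"]
print(df[cols].head())
```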

Data Exploration & Filtering

We performed a preliminary analysis of the collected data before picking and training models for this project. Here we highlight some of the key problems and findings, along with how we address them.

Skewed Data

Figure 1: skewed data distribution of impressioncnt, pdpcnt, numord and qtyord. (Numerical scales omitted per partner’s request in some figures. Unless otherwise mentioned, all scales omitted are linear. For this figure, the y-axis is in log-scale.)

As shown in Figure 1, customers bought very large quantities of popular products during the experiment, overwhelming the sales of the remaining products. The same skew appears in product impression count, product display page hit count, average sort rank, and other variables. When we use these variables in modeling, taking a log or applying some other re-scaling is necessary so that the very few large values do not dominate the model. In addition, we have to make sure the vast number of close-to-zero values does not overwhelm the rest of the values.
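A minimal re-scaling sketch, continuing from the data frame and column names assumed in the earlier sketch; log1p is one reasonable choice among several:

```python
import numpy as np

# log1p keeps zero counts at zero while compressing the heavy right tail,
# so a handful of best-sellers no longer dominates a fitted model.
for col in ["impressioncnt", "pdpcnt", "numord", "qtyord"]:
    df["log_" + col] = np.log1p(df[col])
```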

The Linear Scaling Pattern

Figure 2: how sales with and without fast-shipping change as s-value increases for category class ID 12.

As shown in Figure 2, for many categories of products, sales with fast shipping within that category increase linearly as s-value increases; sales of products without fast shipping decrease linearly as s-value increases. This is a mechanical artifact of how the experiment was run. In addition, the increase in sales of products with fast shipping is always larger than the decrease in sales of products without fast shipping as s-value increases.

Data Filtering

Since the data are highly skewed, we have to filter out noise and outliers to avoid undefined or extreme estimates. Here we list the filters used in this project, with a justification for each:

  1. a minimum number of orders and quantities sold both with and without fast-shipping: avoids undefined or extreme sales multipliers
  2. consistency with the past 90 days of sales records: suppose a product sold N90 copies in the 90 days leading up to the experiment (with fast-shipping), N0 copies without fast-shipping and N1 copies with fast-shipping during the experiment; we then enforce N0 + N1 ≥ (1 − t) × 0.7 × (41/90) × N90, where t is the maximum variation in sales we can tolerate (set to 0.5 in this project), 41 is the number of days the experiment spans, and 0.7 is the fraction of customers directed into the experiment (see the sketch after this list)
  3. the top 50% of products by impressions or sales: popular products receive most of the customer exposure and generate the majority of sales, so our stakeholder cares more about them, and their data are less susceptible to noise
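A minimal sketch of these three filters applied per product, continuing from the earlier sketch. The ninetydaycnt column is the 90-day order count attribute, and MIN_ORDERS is an illustrative threshold rather than the value actually used:

```python
# Totals per product, split by whether the fast-shipping flag was shown.
per_item = (df.groupby(["wfsku", "cartshowshipflag"])[["numord", "impressioncnt"]]
              .sum().unstack("cartshowshipflag"))
n0 = per_item[("numord", 0)]               # orders without the fast-shipping flag
n1 = per_item[("numord", 1)]               # orders with the fast-shipping flag
impressions = per_item["impressioncnt"].sum(axis=1)
n90 = df.groupby("wfsku")["ninetydaycnt"].first()   # 90-day pre-experiment orders

MIN_ORDERS, T_MAX = 5, 0.5                 # MIN_ORDERS is illustrative; t = 0.5
keep = (
    (n0 >= MIN_ORDERS) & (n1 >= MIN_ORDERS)                 # filter 1
    & (n0 + n1 >= (1 - T_MAX) * 0.7 * (41 / 90) * n90)      # filter 2
    & (impressions >= impressions.quantile(0.5))            # filter 3: top 50%
)
filtered = df[df["wfsku"].isin(keep[keep].index)]
```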

Literature Review

Before presenting our modeling approaches, we first give a brief review of the statistical models, packages, and notation used in this project.

Regression Models

To study the effect of fast-shipping, we begin with linear regression on the number of orders. We then move on to Poisson regression on the number of orders, since the response variable is a count. Finally, we use binomial regression to model the number of (completed) orders given how many times a product was loaded onto the search page, taking into account the increased exposure as the s-value increases.

Heterogeneous Treatment Effect & The EconML Package

While regression indeed gives us a way to study the effect of fast-shipping, a more formal approach to this type of problem is heterogeneous treatment effect analysis and inference. Heterogeneous treatment effect analysis acknowledges that the interaction between the treatment and various other attributes is not necessarily linear, and provides a formal mechanism to model that interaction. This section gives a basic overview of EconML, a powerful Python library that lets us perform heterogeneous effect analysis with inference in various settings, along with the notation we will use extensively when we later present the models for estimating the heterogeneous treatment effect.

Let Y(T) be the response variable, T the treatment indicator, and X the heterogeneity indicator. The heterogeneous treatment effect of moving from treatment t0 to t1 is τ(t0, t1, x) = E[Y(t1) − Y(t0) | X=x].

We assume that we have data of the form: {Yi(ti), ti, xi, wi}:

  1. Yi(ti): observed outcome for the chosen treatment → numord, qtyord
  2. ti: the treatment → cartshowshipflag (1,0)
  3. xi: covariates used for heterogeneity (i.e., the variable that might interact with ti) → S-value
  4. wi: other parameters that might affect our response variables → generic product attributes like ninetydaycnt, rating, price, etc.

We assume the following structure for our target quantity τ (heterogeneous treatment effect):

  1. Y=H(X,W)T + g(X, W, ϵ)
  2. T=f(X, W, η)
  3. τ(t1, t0, x) = E[H(X,W) | X=x]

Here η and ϵ are noise terms, and H, g, f are unknown functions. The BaseCateEstimator interface from the EconML package provides confidence intervals for the treatment effect with a modest amount of coding, as sketched below.
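As a concrete illustration, the following sketch fits a partially linear model of the form above with EconML's LinearDML and queries the shared CATE-estimator interface for effects and confidence intervals. The nuisance models and the column names used for X and W are our own choices for the sketch, not the project's actual configuration:

```python
from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

# Y: response (numord), T: treatment (cartshowshipflag),
# X: heterogeneity feature (the s-value), W: other attributes (names assumed).
Y = df["numord"].values
T = df["cartshowshipflag"].values
X = df[["svalue"]].values
W = df[["ninetydaycnt", "reviewcnt", "avgrating"]].values

est = LinearDML(
    model_y=GradientBoostingRegressor(),   # nuisance model for E[Y | X, W]
    model_t=GradientBoostingClassifier(),  # nuisance model for E[T | X, W]
    discrete_treatment=True,
)
est.fit(Y, T, X=X, W=W)

# The shared CATE-estimator interface gives effects and confidence intervals.
tau = est.effect(X)                                # tau(0, 1, x) for each row
lower, upper = est.effect_interval(X, alpha=0.05)  # 95% confidence interval
```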

Tree-based Models

  1. Bayesian Additive Regression Trees

Parametric models have limits: unlike tree methods, they cannot fit many types of functions. Bayesian Additive Regression Trees (BART) sum the contributions of sequential weak learners, much like gradient boosting. The main difference between the two is that BART uses a prior distribution to regularize the trees (e.g., it controls tree depth and shrinks the spread around the mean within each leaf). This built-in penalty avoids having to tune hyper-parameters and hand-craft regularization. The other main benefit of this model is that we obtain a richer set of information from the computed posterior distributions. An MCMC sampler is used to grow new trees and draw samples.

  2. Honest Trees and Causal Forest

These models were designed for causal inference to measure treatment effects. Honest trees are called ‘honest’ because they are trained on one part of the data and evaluated on the other. More precisely, such a tree is built to minimize a criterion that accounts for the variance within each leaf and the covariance among leaves, and the treatment effect in each leaf is then estimated on the held-out part of the data. These trees are designed to balance accuracy against uncertainty. A Causal Forest is an aggregation of such trees; as in a Random Forest, the individual honest trees can be grown on different sub-samples of the data and features to ensure some independence between their results. The advantage of a Causal Forest for our problem is twofold: first, its non-linearity fits our data better; second, the ‘honesty’ component makes the model robust to outliers and limits the noise measured when evaluating the treatment effect. A minimal usage sketch follows.
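This sketch reuses Y, T, X, and W from the previous sketch and fits EconML's honest-forest CATE estimator, CausalForestDML; the hyper-parameters shown are illustrative, not the project's tuned values:

```python
from econml.dml import CausalForestDML
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

# An honest causal forest: each tree places its splits on one half of its
# subsample and estimates the leaf treatment effects on the held-out half.
cf = CausalForestDML(
    model_y=GradientBoostingRegressor(),
    model_t=GradientBoostingClassifier(),
    discrete_treatment=True,
    n_estimators=500,
)
cf.fit(Y, T, X=X, W=W)

tau_hat = cf.effect(X)                       # per-observation effect estimates
lb, ub = cf.effect_interval(X, alpha=0.05)   # forest-based confidence intervals
```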

Aggregation-based Approach

One natural way to estimate the average effect of fast-shipping is to simply divide the total sales with fast-shipping by the total sales without fast-shipping.

Task 1: Average Boost in Sales

Without filtering out products with little or no sales, the final value we get is 1.25x (i.e. a 25% increase in sales). This number is probably an overestimate, since we are not taking into account the increased product exposure due to fast-shipping.

Figure 3: per category and per product sales multiplier distribution

We also calculated the average sales multiplier for each category and for each product; Figure 3 shows their distributions. As we narrow the scope from all items to a specific category or product, the sales data become much noisier. Hence, we filter out some data to avoid undefined or extreme sales multiplier values, based on the following criteria (sketched in code below):

1) for a category, if the total sales of products within that category either with or without fast-shipping is below a threshold (set to 100 for Figure 3), we skip that category.

2) for a product, if the total sales of that product either with or without fast-shipping is below a threshold (set to 40 for Figure 3), we skip that product.
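A sketch of the aggregation-based multipliers with the threshold filter, continuing from the earlier sketches; class_id is an assumed name for the product-category column:

```python
# Total quantities sold with (1) and without (0) the fast-shipping flag.
sales = df.groupby("cartshowshipflag")["qtyord"].sum()
overall_multiplier = sales[1] / sales[0]     # roughly 1.25x on the unfiltered data

# Per-category multipliers, skipping categories below the sales threshold.
THRESHOLD = 100                              # 40 when computed per product
by_cat = (df.groupby(["class_id", "cartshowshipflag"])["qtyord"]
            .sum().unstack("cartshowshipflag"))
valid = (by_cat[0] >= THRESHOLD) & (by_cat[1] >= THRESHOLD)
category_multiplier = by_cat.loc[valid, 1] / by_cat.loc[valid, 0]
```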

Task 2: Impact of S-value

As a reminder, the goal of task 2 is to measure how the boost in sales changes as the portion of products offered with fast shipping changes. We used three different models to try to answer this question: linear regression, Poisson regression, and binomial regression.

As a naive baseline model, we calculated the total number of orders and the total quantity of items ordered for each s-value and regressed the sum on the s-value. This is visualized below in Figure 4.

Figure 4: linear regression.

However, this model is naive because it does not consider the number of impressions. At a higher s-value, items receive more exposure because the search sorting algorithm boosts items shown with fast shipping. Because of this, we would automatically expect more orders at a higher s-value, so these numbers represent the combined effect of two different treatments: the sort boost and the shipping flag.

We also repeated the above model with Poisson regression, since the numbers of orders and quantities sold are counts. We found that an increase of one in the s-value results in a multiplicative increase of about 1.001 in both the number of orders and the quantity of items ordered. This does not sound like a lot, but moving from S = 25 to S = 75 corresponds to a multiplicative increase of about 1.05. However, with this model as well, we are not incorporating the exposure, so we cannot directly use this result to answer task 2.

To incorporate the exposure level, we considered a binomial regression model. Let Yi be the total number of orders at S = Si and Ni the exposure at S = Si, so that Yi ~ Binomial(Ni, pi), where pi depends on Si. We would intuitively expect that at a higher s-value we would receive more orders due to the sort boost from fast-shipping. A sketch of the Poisson and binomial regressions is given below.
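A minimal sketch of both regressions with statsmodels, aggregating over s-values as described; column names follow the earlier sketches:

```python
import numpy as np
import statsmodels.api as sm

# Aggregate orders and exposure by s-value.
agg = df.groupby("svalue")[["numord", "pdpcnt", "impressioncnt"]].sum().reset_index()
X = sm.add_constant(agg["svalue"])

# Poisson regression: exp(coefficient) is the multiplicative change in orders
# per unit increase of the s-value; the section above reports roughly 1.001.
poisson = sm.GLM(agg["numord"], X, family=sm.families.Poisson()).fit()
print(np.exp(poisson.params["svalue"]))

# Binomial regression: orders out of impressions, i.e. the conversion rate.
endog = np.column_stack([agg["numord"], agg["impressioncnt"] - agg["numord"]])
binom = sm.GLM(endog, X, family=sm.families.Binomial()).fit()
print(binom.summary())
```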

However, we found that this is not the case. Consider Figure 5 below.

Figure 5: binomial regression issues

In these plots, we sum the total number of orders, product display page hits, and search page impressions. We plot the number of orders divided by the exposure level (product display page hits on the left, search page impressions on the right), which gives the rate of product display page hits or impressions that result in orders. As the s-value increases, the rate stays flat or decreases. We looked at specific categories, specific items, and items with 10 or more orders, and they all followed similar trends.

The first explanation we could think of is that users may experience choice paralysis when offered several items with fast shipping. If you are shopping for a couch and see several with fast shipping, it may be harder to decide than if you only see one couch with fast shipping. Another explanation is that these two rates might behave differently depending on whether the product received the fast-shipping treatment or not. Intrigued by these findings, we decided to take a deeper look at the relationship between the different conversion rates and fast-shipping, which led to a different modeling approach that we present in a later section.

Task 3: Product Characteristics Sensitive to Fast-Shipping

The main goal of this task is to identify key product characteristics that will make the product sales more sensitive to fast-shipping.

  1. Distribution of Product Characteristics

One way to approach the problem above is to first categorize all products into three groups:

  1. the group of products that received at least a 10% increase in sales when offered with fast-shipping,
  2. the group of products that received at least a 10% drop in sales when offered with fast-shipping, and
  3. the group of products in neither of the first two groups (i.e., not very sensitive to fast-shipping).

Then, for each product characteristic, we visualize its distribution within each of the three groups above. A clear separation between the distributions would mean, qualitatively, that this product characteristic makes a product more sensitive to fast-shipping. A sketch of the grouping is given below.
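A minimal sketch of the three-way grouping; the 10% cut-offs follow the definition above, and reviewcnt is an assumed name for the review-count attribute:

```python
import numpy as np
import pandas as pd

# Per-product sales multiplier: quantity sold with the flag vs. without it
# (column names as in the earlier sketches; zero-sale products are dropped).
qty = (df.groupby(["wfsku", "cartshowshipflag"])["qtyord"]
         .sum().unstack("cartshowshipflag"))
multiplier = (qty[1] / qty[0]).replace([np.inf, -np.inf], np.nan).dropna()

group = pd.Series(
    np.select([multiplier >= 1.10, multiplier <= 0.90],
              ["boost of at least 10%", "drop of at least 10%"],
              default="not very sensitive"),
    index=multiplier.index,
)

# Distributions as in Figure 6: summarize a characteristic within each group.
reviews = df.groupby("wfsku")["reviewcnt"].first()
print(reviews.groupby(group).describe())
```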

Figure 6: Distribution of selected product characteristics

Figure 6 shows the clear separation of distributions for three product characteristics: review count, impression count (the number of times a product is loaded onto a page), and product price. A product was more likely to receive at least a 10% drop in sales when offered with fast-shipping if it had fewer reviews, fewer impressions, or a lower price. However, this approach cannot quantitatively determine how changes in one product characteristic affect the final sales multiplier, and we tackle this problem in the next section.

  2. Baseline Model for the Treatment Effect – Linear Regression

In this section, we develop a baseline model with linear regressions to quantify the average change in treatment effect with the variation of each quantitative variable (e.g., the number of reviews, the average ratings, the weight, the previous observations from the 90 days before the experiment, …).

In this model, we sum the quantity ordered over the different s-values and keep one characteristic per regression in order to isolate its effect. To be more precise, the predictor variables for each regression model are the fast-shipping binary variable (the treatment), the product characteristic to evaluate, and the interaction between the two. We choose the quantity ordered as our output variable; a sketch of one such regression is given below.
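A sketch of one per-characteristic regression using the statsmodels formula interface; the characteristic names are assumptions carried over from the earlier sketches, and each is normalized before entering the interaction:

```python
import statsmodels.formula.api as smf

# One row per (product, flag), quantity summed over s-values.
d = (df.groupby(["wfsku", "cartshowshipflag"], as_index=False)
       .agg(qtyord=("qtyord", "sum"),
            reviewcnt=("reviewcnt", "first"),
            ninetydaycnt=("ninetydaycnt", "first")))

interaction_coefs = {}
for feat in ["reviewcnt", "ninetydaycnt"]:
    d["z"] = (d[feat] - d[feat].mean()) / d[feat].std()   # normalize the feature
    fit = smf.ols("qtyord ~ cartshowshipflag * z", data=d).fit()
    # Average change in quantity ordered per +1 SD of the characteristic
    # for products shown with the fast-shipping flag.
    interaction_coefs[feat] = fit.params["cartshowshipflag:z"]
print(interaction_coefs)
```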

Figure 7: coefficients of interaction between treatment and normalized product characteristics in linear regression models.

Figure 7 displays the coefficient of the interaction term between the treatment and each product characteristic. All product characteristics have previously been normalized. This coefficient can be interpreted as the average increase of quantity ordered if the product characteristic feature increases by one standard deviation. We run simulations with random Gaussian variables to understand the significance of these coefficients. We observe that the characteristics which caused the most change in the average treatment effect are the ones related to popularity (90-day order count, product review count, 90-day product display page click count, …).

The values of the interaction coefficients are small compared with the range of quantities ordered. This can be explained by the fact that linear regression gives equal weight to every product: since the distribution of orders per product is skewed (most products have low sales), the model gives little credit to the few products that may have experienced a larger treatment effect. To improve our study, we should consider non-linear models and potentially include cross-interactions between different variables and the treatment.

  3. Tree-based Models

Before we define the treatment effect for the tree-based models, we first introduce some notation:

  • N: number of orders generated by the 70% of customers in the experiment
  • N0: number of orders without the fast-shipping treatment
  • N1: number of orders with the fast-shipping treatment
  • N = N0 + N1

If fast-shipping had no effect at all, then N0 = 49% of N and N1 = 51% of N (the asymmetry stems from the implementation flaw discussed under Issues below).

If the experiment ran under the same conditions as the preceding 90-day period, then N ≈ 0.7 × (41/90) × N90, where N90 is the number of orders in the 90 days before the experiment.

With these notations, there are two ways to define the treatment effect:

  1. N1 − N0: the absolute increase in the number of orders. This is the most straightforward definition of the treatment effect, but the resulting variable is highly correlated with the popularity of the product (products with many orders dominate). The results generated by both BART and Causal Forest are similar to what the baseline model already identified: the key characteristics are the ones related to product popularity.
  2. N1/N̂1 − N0/N̂0, where N̂1 = 0.51·N and N̂0 = 0.49·N are the expected counts if fast-shipping had no effect: the rescaled percentage increase in sales. However, the division makes the treatment effect defined in this fashion highly sensitive to noise for products with few sales, as shown below in Figure 8. We find that, in the end, the predictions generated by tree-based models trained with this definition of the treatment effect are only valid for popular products, and the identified key characteristics are again the ones related to product popularity.

We obtained similar results with BART and Causal Forest when estimating the treatment effect as the rescaled percentage increase in sales. Figure 8 shows the predicted treatment effect from the Causal Forest model compared with the observed rescaled difference in sales, plotted against the number of orders in the 90 days prior to the experiment. The predicted treatment effect is always slightly positive, which suggests that the observed negative treatment effects are only noise. The top predictors for our model were those related to product popularity, as shown in Figure 9. The confidence intervals from these models were far from accurate, as most observed points did not fall within the 95% margins around their predictions; we therefore avoided drawing conclusions from these forest models about the impact of product characteristics on the treatment effect.

Figure 8: Predictions generated by tree-based models. Note the variance of the treatment effect decreases as the 90 day order count(the popularity of the product) increases.
Figure 9: Key product characteristics identified by tree-based models. Note that the top 3 key characteristics: product display page click count, review count, and average sort rank all correlate with the popularity of the product.

Issues

We now summarize the drawbacks of the aggregation-based approach before we address them using the conversion rate-based approach in the next section:

First, as mentioned above, this approach does not take into account the fact that the sort boost provided by fast-shipping could have also increased the exposure of the product and thus the final sales number. We have to model the interaction among fast-shipping, product exposure and sales if we want to quantify only the effect of "pure" fast-shipping.

Second, the filtering threshold values we use before computing the sales multipliers to filter noisy products are somewhat arbitrary. When we look at these estimates on subcategories, some of these estimates are sensitive to the threshold used.

Third, due to a minor flaw in the implementation of this experiment, the treatment is slightly asymmetric: when we aggregate over all s-values the number of customers who did or did not see the fast-shipping flag for a product, we find that approximately 51%, rather than 50%, of customers were shown the flag. We have to factor this in somewhere in our sales multiplier estimates.

Conversion Rate-based Approach

In response to the issues above, after careful consideration, we decide to view the effect of fast-shipping as a composition of the following two effects:

  1. A better impressioncnt to pdpcnt conversion rate: if loaded onto a search page, a product is more likely to be clicked through to its product display page because of the fast-shipping badge.
  2. A better pdpcnt to numord/qtyord conversion rate: once its product display page is loaded, a product is more likely to be ordered because fast-shipping is offered.

Then for each product we do the following steps to measure the effect of fast-shipping:

  1. Calculate the impressioncnt to pdpcnt conversion rates without and with fast shipping: x0 and x1, respectively.
  2. Calculate the pdpcnt to numord/qtyord conversion rates without and with fast shipping: y0 and y1, respectively.
  3. The conversion rate multiplier estimate is then x1y1/(x0y0).

This approach addresses the three main issues of the aggregation-based approach:

1) It is in some sense "impression-independent": suppose a product p is loaded t times onto the page. Without fast shipping it will be sold x0·y0·t times; with fast shipping it will be sold x1·y1·t times. The sales multiplier estimate is x1y1t/(x0y0t) = x1y1/(x0y0), independent of the impression count t.

2) The need for filtering is greatly reduced. For example, in this framework we no longer need to worry about cases where a product made only one or two sales without fast-shipping yet made tens or hundreds of sales when offered fast-shipping.

3) The aforementioned 51/49 bias is no longer a concern, since it only affects the impressions/exposure a product received, and this approach is to a large extent "impression-independent".
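A sketch of the per-product conversion-rate multipliers from the steps above, continuing from the earlier sketches; products with zero denominators would still need to be dropped or handled separately:

```python
# Per-product totals with (1) and without (0) the fast-shipping flag.
g = (df.groupby(["wfsku", "cartshowshipflag"])[["impressioncnt", "pdpcnt", "qtyord"]]
       .sum())
x = g["pdpcnt"] / g["impressioncnt"]   # impression -> product display page rate
y = g["qtyord"] / g["pdpcnt"]          # product display page -> quantity rate

x0 = x.xs(0, level="cartshowshipflag")
x1 = x.xs(1, level="cartshowshipflag")
y0 = y.xs(0, level="cartshowshipflag")
y1 = y.xs(1, level="cartshowshipflag")

# Impression-independent sales multiplier estimate for each product.
conversion_multiplier = (x1 * y1) / (x0 * y0)
```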

Task 1: Average Boost in Sales

Figure 10 shows the new per-product sales multiplier distribution, now defined via the conversion rate from impressions to quantities ordered. Interestingly, under this framework only about 30% of products receive a sales boost when offered fast-shipping. For those products, the average sales multiplier estimate is 1.46x (i.e., a 46% increase in sales); the average becomes 0.85x (i.e., a 15% decrease in sales) if all products are included.

Figure 10: per product sales multiplier distribution under the new framework

To make sense of this number, we also look into the x0, y0, x1, y1 values for each product defined as above, and find that:

  1. For only 40% of the products are customers more likely to place an order once they are on the product display page and the product offers fast-shipping.
  2. For only 20% of the products are customers more likely to click through to the product display page when the product is loaded onto the search page they see. The remaining 80% of the products cannot efficiently utilize the sort boost caused by fast-shipping: if these products are shown to more customers on the search page due to the sort boost, a smaller proportion of those customers will click on them.

Task 2: Impact of S-value

Figure 11 shows how the different conversion rates change as the s-value increases from 25 to 75. Here the conversion rates are averaged across all products. We take the average across all products because for most (product, s-value) pairs the product display page clicks and quantities sold are 0, so the corresponding conversion rates are 0 as well.

Figure 11: Conversion rate: (a) from impression to quantities ordered, (b) from impression to product display page clicks, (c) from product display page clicks to quantities ordered

Several important observations can be made from Figure 11. First, the conversion rate from impressions to quantities ordered without fast-shipping and the conversion rate from product display page clicks to quantities ordered without fast-shipping both decrease as the s-value increases, as can be seen in Figures 11(a) and 11(c). This makes sense: as customers find more and more products offered with fast-shipping, not having fast-shipping makes a product less appealing.

Second, the conversion rate from impressions to product display page clicks remains largely unchanged with and without fast-shipping as the s-value increases, as can be seen in Figure 11(b), in contrast to the conversion rates in Figure 11(c). Since we view the effect of fast-shipping as a composition of an improved conversion rate from impressions to product display page clicks and an improved conversion rate from product display page clicks to quantities ordered, we can deduce that in this project fast-shipping has a much larger impact on the conversion rate from product display page clicks to quantities ordered than on the (nearly negligible) conversion rate from impressions to product display page clicks.

Third, the conversion rate from impressions to quantities ordered with fast-shipping increases as the s-value increases, as can be seen in Figure 11(a). As discussed above, we attribute this increase largely to the increase in the conversion rate from product display page clicks to quantities ordered with fast-shipping. Initially, we expected this conversion rate from impressions to quantities ordered with fast-shipping to decrease as the s-value increases, reasoning as follows: as the s-value increases, customers see more choices with fast-shipping, so fast-shipping as a bonus should weigh less in their decision process, leading to lower conversion rates from impressions to quantities ordered with fast-shipping for any given product. However, in light of what we found above, we realize that when customers pick from a range of online options, product attributes like price, cover photo, and review ratings are in general much more important factors than fast-shipping. Therefore, our initial reasoning does not hold.

Task 3: Product Characteristics Sensitive to Fast-Shipping

The conversion rates from product display page clicks to orders and from impressions to orders range from 1e-6 to 1, even among the 50% most popular products. With such variability, we decided to measure the treatment effect on the logarithm of these two conversion rates. For instance, the treatment effect on the logarithm of the conversion rate from product display page (PDP) clicks to orders is

log(N¹_order / N¹_PDP) − log(N⁰_order / N⁰_PDP),

where N⁰ and N¹ are respectively the observed quantities without and with the treatment (i.e., without and with fast shipping).

Figure 12 below shows this variable against the logarithm of the PDP count (product display page hits, which reflect the popularity of a product to some extent). Except for the most popular products, the variance of the treatment effect on this variable does not vary as much with the popularity of a product as we previously saw with the rescaled number of orders as the output variable. The mean of this treatment effect is negative (close to −0.050), hence on average the PDP conversion rate is 9.2% lower when a product is offered with fast shipping than without. This trend can be explained by the fact that the PDP count is more sensitive to fast shipping than the number of orders: in percentage terms, the PDP count increases more with fast shipping than the number of orders does, so the conversion rate on average decreases with fast shipping. We observed a similar pattern for the impression-to-order conversion rate, which decreases on average by 32%.

Figure 12: the observed Treatment Effect on the log PDP Conversion Rate v.s. the log of PDP (for each top 50% most popular product)

We then evaluated multiple models by splitting our filtered dataset into a training and a testing set. We compare the MSE and the 95% confidence interval accuracy (i.e., the fraction of observed points that fall within the confidence interval of the predictions). The results are shown below in Figure 13.

Here Linear DRL, based on the Doubly Robust Learning (DRL) algorithm, first applies arbitrary Machine Learning models to fit the outcome and the treatment assignment (a propensity model), and then fits a linear model of the doubly robust outcome differences on the features. We estimate the doubly robust outcome Y for a particular treatment t as

Y(i, t) = g(t, Xi, Wi) + (Yi − g(t, Xi, Wi)) · 1{Ti = t} / p(t, Xi, Wi),

where T is the one-hot encoding of the discrete treatment, p(Xi, Wi) is the propensity model obtained by running a classification to predict T from (X, W), g(X, W) is a regression model such that g(X, W) = E[Y | T, X, W], and we can learn the treatment effect Θ(X) by simply regressing Y(i,1) − Y(i,0) on X. For this task we focus on investigating the following parameters of this model:

  1. model_regression: an estimator for E[Y | T, X, W], trained by regressing Y on the concatenation of (features, controls, one-hot-encoded treatments).
  2. ϕ(X) (featurizer): used to create composite features of X in the final CATE regression.

In summary, we first train a regression model g(X, W) by regressing Y on (X, W) and the one-hot-encoded treatment, and a propensity model p(Xi, Wi) by running a classification to predict T from (X, W). We then construct the doubly robust random variables described above and regress them on X; a usage sketch is given below.
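A sketch of this estimator using EconML's LinearDRLearner; the outcome construction, feature choices, and nuisance models are illustrative assumptions rather than the project's exact configuration:

```python
import numpy as np
from econml.dr import LinearDRLearner
from sklearn.preprocessing import PolynomialFeatures
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

# One row per (product, flag): outcome is the log PDP-to-order conversion
# rate; column names are assumptions carried over from the earlier sketches.
d = (df.groupby(["wfsku", "cartshowshipflag"], as_index=False)
       .agg(numord=("numord", "sum"), pdpcnt=("pdpcnt", "sum"),
            ninetydaycnt=("ninetydaycnt", "first"),
            reviewcnt=("reviewcnt", "first")))
d = d[(d["numord"] > 0) & (d["pdpcnt"] > 0)]
Y = np.log(d["numord"] / d["pdpcnt"]).values          # log conversion rate
T = d["cartshowshipflag"].values                      # fast-shipping flag
X = d[["ninetydaycnt", "reviewcnt"]].values           # heterogeneity features

drl = LinearDRLearner(
    model_regression=GradientBoostingRegressor(),     # g(X, W) = E[Y | T, X, W]
    model_propensity=GradientBoostingClassifier(),    # p(X, W) = P(T = 1 | X, W)
    featurizer=PolynomialFeatures(degree=2),          # phi(X) composite features
)
drl.fit(Y, T, X=X)

print(drl.summary(T=1))    # linear CATE coefficients with p-values
```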

Figure 13: model performance comparison

After experimenting with various combinations for Linear DRL, we found that Linear DRL with WeightedLassoCV and second-degree polynomial features performs best, as shown in Figure 13.

Figure 14: predictions by different models v.s. the observed effect

Figure 14 shows the prediction distribution of the different models compared against the true distribution of the observed treatment effect. All of the models’ predictions underestimate the true treatment effect, indicated by a center shifted to the left on the graph. Overall, the DRL learner with polynomial features has the widest distribution compared with the other models, but its predictions are still centered around zero. A detailed visualization of the predictions by Linear DRL versus the observed effect is shown in Figure 15. If we investigate the treatment effect coefficients of our model, as shown in Figure 16, we find no features that are significantly predictive of the treatment effect according to the p-values under this framework (which must be smaller than 0.05). In addition, among all characteristics, 90-day order count, a product attribute related to product popularity, still seems to be the most important one, albeit not highly predictive of the treatment effect.

Figure 15: predictions by Linear DRL v.s. the observed effect
Figure 16: Linear DRL coefficients

However, when we further filter out unpopular products, our Linear DRL learner improves significantly in both the prediction distribution and the accuracy of the confidence intervals, as shown in Figures 17 and 18, yet the linear model and its coefficients remain largely unchanged. The Causal Forest method produces similar results, though it does not perform as well as Linear DRL under this framework.

Figure 17: predictions by different models v.s. the observed effect, with the unpopular products filtered out
Figure 18: predictions by Linear DRL v.s. the observed effect, with the unpopular products filtered out

Conclusion

We studied the effect of fast-shipping in this project, starting with the aggregation-based approach before moving on to the conversion rate-based approach. With the aggregation-based approach, which does not take into account how the sort boost could have contributed to the final sales, we find that fast-shipping brings about a 25% increase in sales, which is probably an over-estimate, and that the number of orders and quantities sold scale linearly as the s-value increases. With the conversion rate-based approach, we find that if we fix the amount of exposure received by a product, fast-shipping generates 46% more sales for products whose conversion rates do not degrade with fast-shipping. The effect of fast-shipping has a much larger impact on the conversion rate from product display page clicks to quantities ordered than on the conversion rate from impressions to product display page clicks, and the conversion rate from impressions to quantities ordered with fast-shipping increases as the s-value increases. Finally, under both frameworks we find that product attributes related to the popularity of the product are the key characteristics that make product sales more sensitive to fast-shipping.

Acknowledgments

We would like to thank Nathaniel Burbank (Wayfair), Nick Stern (Harvard), and Chris Tanner (Harvard) for their support and guidance throughout this work.

In partnership with Wayfair

Poster Link: https://drive.google.com/file/d/15Jqb_gmwF_gK_zQ0lm0REZWzZGAJ-nHA/view?usp=sharing

Short Poster Presentation Video Link: https://drive.google.com/file/d/1lsmx6X2JEzI3ouCF2in2_0IFAstBgjP0/view?usp=sharing

References

  1. Chipman, H. et al. "BART: Bayesian Additive Regression Trees." The Annals of Applied Statistics 4 (2010): 266–298.
  2. Wager, S. and S. Athey. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests." Journal of the American Statistical Association 113 (2018): 1228–1242.
  3. EconML: Python SDK, developed by the ALICE team at MSR New England
  4. GRF: Generalized Random Forest, a pluggable package for forest-based statistical estimation and inference.
