Analytics to measure user experience
A/B testing is a method used to test whether the response rate differs between two variants of the same feature. For instance, you may want to test whether a specific change to your website, such as moving the shopping cart button from the right-hand panel to the top right-hand corner of the page, changes the number of people who click on the shopping cart and buy a product.
A/B testing is also called split testing: two variants of the same web page are shown to different samples of your website visitors at the same time, and the number of conversions is compared between the two variants. Generally, the variant that yields the higher proportion of conversions is the winning variant.
However, as this is a data science blog, we want to ensure that the difference in conversion proportions between the two variants is statistically significant. We may also want to understand which attributes of the visitors are driving those conversions. So, let's move on to your data problem.
The Data Problem
- An A/B test was recently run, and the Product Manager of your company wants to know whether the new variant of the web page resulted in more conversions. Make a recommendation to the Product Manager based on your analysis.
- The CRM Manager is interested in knowing how accurately we can predict whether users are likely to engage with our emails, based on the attributes we collected about the users when they first visited the website. Report back to the CRM Manager on your findings.
The Dataset
Four datasets are provided.
- Visits contains data from 10,000 unique users and has the following columns:
- user_id: unique identifier for the user
- visit_time: timestamp indicating date and time of visit to website
- channel: marketing channel that prompted the user to visit the website
- age: user’s age at time of visiting website
- gender: user’s gender
- Email engagement contains data on those users that engaged with a recent email campaign. The file contains the following columns:
- user_id: unique identifier for the user
- clicked_on_email: flag to indicate that the user engaged with the email where 1 indicates that the user clicked on the email
- Variations contains data indicating which variation of the A/B test each user saw. The file has the following columns:
- user_id: unique identifier for the user
- variation: variation (control or treatment) that the user saw
- Test conversions contains data on those users that converted as a result of the A/B test. The file contains the following columns:
- user_id: unique identifier for the user
- converted: flag to indicate that the user converted (1 indicates that the user converted)
Importing the dataset and cleaning
I always start by combining the files using a primary key or unique identifier, and then decide what to do with the data. I find this approach useful because I can discard what I don't need later, and it lets me view the dataset at a holistic level.
In this instance, our unique identifier is user_id. After merging the files using the following code,
# Merge the four files on user_id; left joins (all.x = TRUE) keep every
# visitor even if they have no conversion or email-engagement record
merge_1 <- merge(variations_df, visits_df, by = "user_id")
merge_2 <- merge(merge_1, test_conv_df, by = "user_id", all.x = TRUE)
merge_3 <- merge(merge_2, eng_df, by = "user_id", all.x = TRUE)
I discovered that I had to create my own binary variables for whether or not a user converted and whether or not they clicked on an email, since users who did neither do not appear in the test_conversions.csv and email_engagement.csv files. I did this by replacing all NAs with 0s.
library(dplyr)  # for if_else()

# Users absent from the conversion/engagement files did not convert or click,
# so replace NA with 0 and convert both flags to factors
merge_3$converted        <- if_else(is.na(merge_3$converted), 0, 1)
merge_3$clicked_on_email <- if_else(is.na(merge_3$clicked_on_email), 0, 1)
merge_3$converted        <- as.factor(merge_3$converted)
merge_3$clicked_on_email <- as.factor(merge_3$clicked_on_email)
The next task was to transform variables like visit time into features that provide meaningful information about the users.
library(plyr); library(lubridate)  # for mapvalues() and hour()

# Bucket the visit hour (0-23) into a time-of-day category
merge_3$timeofday <- mapvalues(hour(merge_3$visit_time), from = c(0:23),
    to = c(rep("night", times = 5), rep("morning", times = 6),
           rep("afternoon", times = 5), rep("night", times = 8)))
merge_3$timeofday <- as.factor(merge_3$timeofday)
Now that the data had been cleaned, it was time to explore it to understand whether there was an association between user conversion and the variation of the website they saw.
Data Exploration and Visualization
The simplest thing to check is whether there is indeed a difference in the proportion of users who converted, based on the variation they viewed. Running the code provided at the end of the blog post gives the following graph and proportions:
control : 0.20 treatment : 0.24
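The full code is linked at the end of the post; as a rough sketch (assuming the merged merge_3 data frame from above), the proportions and a simple bar chart can also be produced in base R like this:

# Proportion of users converting within each variation (rows sum to 1)
conv_rates <- prop.table(table(merge_3$variation, merge_3$converted), margin = 1)
conv_rates

# Bar chart of the conversion rate (the "1" column) for control vs treatment
barplot(conv_rates[, "1"], xlab = "Variation", ylab = "Proportion converted")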

Statistical testing for significance of A/B Testing
To test whether the difference in proportions is statistically significant, we can carry out either a difference-in-proportions test or a chi-squared test of independence, where the null hypothesis is that there is no association between whether a user converted and the variation they saw.
For both tests, a p-value < 0.05 was observed indicating a statistically significant difference in proportions.
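As a sketch of how these tests can be run in base R (again assuming the merged merge_3 data frame; the exact code is in the linked gist):

# Contingency table of variation vs conversion
conv_counts <- table(merge_3$variation, merge_3$converted)

# Two-sample test for equality of proportions (successes = converted users)
prop.test(x = conv_counts[, "1"], n = rowSums(conv_counts))

# Chi-squared test of independence between variation and conversion
chisq.test(conv_counts)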
I went a step further and ran a logistic regression to understand how the other attributes of the users contributed to the difference in proportions. Only the type of variation and income (p-values less than 0.05) appeared to contribute to the difference in conversion proportions. McFadden's R-squared tells us that only 12.94% of the variation in conversion can be explained by the variation type and the user attributes in our dataset.
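A sketch of that model and of the McFadden's R-squared calculation follows; the exact predictor set here is my assumption, and the full code is in the linked gist.

# Keep only the modelling columns and drop rows with missing values
conv_data <- na.omit(merge_3[, c("converted", "variation", "channel",
                                 "age", "gender", "timeofday")])

# Logistic regression of conversion on the variation and user attributes
conv_model <- glm(converted ~ ., data = conv_data, family = binomial)
summary(conv_model)

# McFadden's R-squared: 1 - logLik(fitted model) / logLik(intercept-only model)
null_model <- glm(converted ~ 1, data = conv_data, family = binomial)
1 - as.numeric(logLik(conv_model)) / as.numeric(logLik(null_model))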
Hence, my response to the Product Manager would be: there is a statistically significant difference in conversion rates between the treatment and control variations. However, it is difficult to understand why this is the case, and it would be best to repeat the test two or three more times to validate the results.
Exploratory Data Analysis to understand drivers of user engagement with emails
Barplots were produced to check for a visual relationship between user attributes and whether or not users clicked on an email.



While running the exploratory data analysis, I noticed that age was missing for 1,243 users. These users were omitted from the analysis, as I could not impute their ages without additional information. Boxplots and numerical summaries were produced to understand any difference in the average age of users who clicked on emails.
It was found that those who clicked on emails ("1") on average had a higher income than those who didn't. However, both groups have very high standard deviations, so income does not appear to be a useful indicator.
Using statistical modelling for significance testing
The dataset was randomly split into training (70%) and test (30%) sets for modelling. Logistic regression was run to determine which attributes made a statistically significant contribution to explaining whether or not users clicked on an email.
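A minimal sketch of this step (the 70/30 split proportion is as above; the seed and the exact predictor set are my assumptions):

set.seed(123)  # assumed seed for a reproducible split

# Drop rows with missing age and keep the modelling columns
email_data <- na.omit(merge_3[, c("clicked_on_email", "channel", "age",
                                  "gender", "timeofday")])

# 70/30 split into training and test sets
train_idx <- sample(seq_len(nrow(email_data)), size = floor(0.7 * nrow(email_data)))
train_set <- email_data[train_idx, ]
test_set  <- email_data[-train_idx, ]

# Logistic regression of email clicks on user attributes
email_model <- glm(clicked_on_email ~ ., data = train_set, family = binomial)
summary(email_model)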
The model was trained on the training set and predictions were made on the test set to assess accuracy. An ROC curve was generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings; the AUC is the area under this curve. As a rule of thumb, a model with good predictive ability should have an AUC closer to 1 (1 is ideal) than to 0.5. In our example, the AUC is 0.84, indicating good discriminative ability.
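One way to produce the ROC curve and AUC is with the pROC package (my choice here; the original code is in the gist below):

library(pROC)

# Predicted probabilities of clicking on an email for the held-out test set
test_probs <- predict(email_model, newdata = test_set, type = "response")

# ROC curve and area under it
roc_obj <- roc(response = test_set$clicked_on_email, predictor = test_probs)
plot(roc_obj)
auc(roc_obj)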

Though the score is promising, it would be worthwhile to carry out some form of cross-validation to validate the results further and ensure reproducibility.
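One option, sketched below, is k-fold cross-validation with cv.glm from the boot package (my suggestion rather than part of the original analysis):

library(boot)

# Misclassification rate at a 0.5 threshold as the cross-validation cost
misclass <- function(y, prob) mean(abs(y - prob) > 0.5)

# 10-fold cross-validation of the logistic regression on the cleaned data
full_model <- glm(clicked_on_email ~ ., data = email_data, family = binomial)
cv.glm(email_data, full_model, cost = misclass, K = 10)$delta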
A summary of the logistic regression model confirms what we saw visually: the top predictors of the likelihood of a user clicking on an email are:
- channel
- age
- gender
My response to the CRM Manager would be that the top predictors of email engagement are age (older users are more likely to click), channel (PPC is popular amongst users who click) and gender (males are more likely to click than females). However, I would like to validate these results with a larger sample that allows for cross-validation.
Final Thoughts
Hopefully, this blog post has demystified A/B testing to some extent, given you some ways to test for statistical significance and shown you how exploratory data analysis and statistical testing work together to validate results.
Please note that a very small sample size was used in this example (around 4000 users) and as such it did not make sense to run and train a complex machine learning algorithm.
I would love your feedback and suggestions. All of the code is provided below and on GitHub for download. 🙂
https://gist.github.com/shedoesdatascience/de3c5d3c2c88132339347c7da838a126