In our previous article, we explained how to use PyMC3 to do Bayesian A/B testing for discrete variables. In this article, we will do the same thing for continuous variables.
This article is structured as follows.
- An overview of the BEST method (Bayesian Estimation Supersedes the t-Test), introduced by John Kruschke in 2013.
- To make statistical inferences, we need data! We will explain how to use a Python package called Beautiful Soup to scrape data from the internet. In our case, we will scrape NBA salaries from ESPN.com for our analysis.
- A concrete example of the BEST method on NBA players' salaries. More precisely, we will compare the salaries of NBA players across different positions using the data from the second section.
As the second section is mainly about getting the relevant data, readers who just want to see how BEST works can skip it. The data we scraped is available in this GitHub repository.
Before we dive into the details, we want to make a small remark about our analysis. From our discussions with Austin Rochford, who is a big fan of the Philadelphia 76ers, we learned that the relationship between salaries and positions should be treated as correlative, not causal. One particular reason is that salaries are not randomized at all, as they depend on past performance. We are thankful to Austin for his expert feedback on our results.
Part 1: Bayesian Estimation Supersedes the t-Test
As we explained in the first part, the classical t-test has several drawbacks, especially from a business perspective. To overcome these issues, in 2013, John Kruschke at Indiana University proposed a new procedure for A/B testing from a Bayesian perspective.
In this procedure, the data are described by a Student-t distribution. This choice is more flexible than a normal distribution because it can represent distributions with heavy tails, which is the case for our data.
We need three parameters to describe a Student-t distribution: the mean μ, the standard deviation σ, and the degrees of freedom ν. Kruschke used the following prior distributions for these parameters.
- μ follows a normal distribution.
- σ follows a uniform distribution.
- ν follows a shifted exponential distribution.
Note that the ν parameter controls the normality of the data: when ν > 30, the Student-t distribution is close to a normal distribution, whereas for small ν it has heavy tails. This is why we use the rate 1/29 in the exponential prior on ν − 1: the value 29 acts as a "cut-off" for normality.
The following graphical representation summarizes the above description.

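To make the description concrete, here is a minimal PyMC3 sketch of these priors and likelihoods. The data arrays group_A and group_B are hypothetical placeholders; the actual model we build for the NBA data appears in Part 3.1.
import numpy as np
import pymc3 as pm

# Hypothetical placeholder data for two groups of continuous observations
group_A = np.random.normal(5, 2, size=100)
group_B = np.random.normal(6, 2, size=100)
pooled_mean = np.r_[group_A, group_B].mean()
pooled_std = np.r_[group_A, group_B].std()

with pm.Model() as best_sketch:
    # Normal priors on the group means
    mu_A = pm.Normal("mu_A", mu=pooled_mean, sd=2 * pooled_std)
    mu_B = pm.Normal("mu_B", mu=pooled_mean, sd=2 * pooled_std)
    # Uniform priors on the group standard deviations
    std_A = pm.Uniform("std_A", 1 / 100, 100)
    std_B = pm.Uniform("std_B", 1 / 100, 100)
    # Shifted exponential prior on the degrees of freedom (nu = nu_minus_1 + 1)
    nu_minus_1 = pm.Exponential("nu_minus_1", 1.0 / 29)
    # Student-t likelihoods for the two groups
    obs_A = pm.StudentT("obs_A", mu=mu_A, lam=std_A**-2, nu=nu_minus_1 + 1, observed=group_A)
    obs_B = pm.StudentT("obs_B", mu=mu_B, lam=std_B**-2, nu=nu_minus_1 + 1, observed=group_B)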
Part 1.1. Region of practical equivalence.
The region of practical equivalence (ROPE) is the region around the null value containing values that are negligibly different from the null value. The choice of ROPE depends on the particular application. For example, suppose we suspect that a shorter product description makes people more likely to purchase. In that case, we can run an A/B test to compare the new strategy with the old one (a procedure for doing so is explained in our previous article). In the Bayesian framework, we are interested in whether the difference in average orders between the two versions is practically close to 0. If we are very confident that the new strategy would work, we can choose a large ROPE around 0; if we are less confident, we can choose a small ROPE.
For more details on ROPE, see this article.
Part 1.2. Highest Density Intervals.
The definition of the highest density interval (HDI) is rather technical. Roughly speaking, the HDI is the interval that contains points with higher probability density than any point outside it. As with confidence intervals, we can choose how much of the posterior distribution the HDI covers. Typically, we choose an HDI that covers 95% of the distribution.
A more general concept in the Bayesian world, directly analogous to confidence intervals, is that of credible intervals. We will not go into details about this concept; readers are encouraged to see the details here.
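To build intuition, a 95% HDI can be approximated from posterior samples by scanning all intervals that contain 95% of the sorted samples and keeping the narrowest one. Below is a minimal sketch of this idea; the function compute_hdi is our own helper, not part of PyMC3 (later in the article we simply use PyMC3's built-in pm.stats.hpd).
import numpy as np

def compute_hdi(samples, prob=0.95):
    """Approximate the highest density interval from posterior samples."""
    sorted_samples = np.sort(samples)
    n = len(sorted_samples)
    # Number of samples every candidate interval must contain
    n_included = int(np.floor(prob * n))
    # Widths of all intervals containing n_included consecutive sorted samples
    widths = sorted_samples[n_included:] - sorted_samples[:n - n_included]
    narrowest = np.argmin(widths)
    return sorted_samples[narrowest], sorted_samples[narrowest + n_included]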
Part 1.3. BEST decisions.
Once the ROPE and the HDI are specified, we can use them to make statistical decisions.
- If the ROPE lies entirely outside the HDI, we reject the null hypothesis.
- If the ROPE contains the entire HDI, we accept the null hypothesis.
- Otherwise, we withhold our decision.
In the last case, the BEST test says that there is not enough evidence to make a conclusive decision. There are several options in this case: for example, we can change the ROPE and/or the HDI; more practically, we can collect more samples.
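As a concrete illustration, the decision rules can be written as a small helper function (this helper is our own, not part of PyMC3), where both the HDI and the ROPE are given as (lower, upper) pairs:
def best_decision(hdi, rope):
    """Apply the BEST decision rules to an HDI and a ROPE."""
    hdi_low, hdi_high = hdi
    rope_low, rope_high = rope
    # The ROPE and the HDI do not overlap: reject the null hypothesis
    if rope_high < hdi_low or hdi_high < rope_low:
        return "reject"
    # The ROPE contains the whole HDI: accept the null hypothesis
    if rope_low <= hdi_low and hdi_high <= rope_high:
        return "accept"
    # Otherwise: withhold the decision
    return "withhold"
For example, with the HDI [0.56, 3.57] and ROPE [-0.1, 0.1] that we obtain in Part 3.1, best_decision((0.56, 3.57), (-0.1, 0.1)) returns "reject".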
Part 2: Web-scraping NBA salary data
In this section, we explain how to use the Python package Beautiful Soup to get data from the internet. We will use the ESPN website to obtain the salaries of NBA players during the 2019–2020 season. First, we need to import several packages (for this section, the relevant ones are BeautifulSoup, tabulate, and requests).
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
import pymc3 as pm
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns  # used for the bar plots and box plots below
%matplotlib inline
from IPython.core.pylabtools import figsize
import theano.tensor as tt
from scipy.stats.mstats import mquantiles
We use the following function to get the raw data.
def get_raw_data(link):
    """
    Given a link, get the raw salary table from ESPN.com.
    """
    # Get the raw HTML from the website and parse the first table
    res = requests.get(link)
    soup = BeautifulSoup(res.content, 'lxml')
    table = soup.find_all('table')[0]
    df = pd.read_html(str(table))[0]
    # Remove the repeated header rows
    df = df.drop(df[df[1] == 'NAME'].index)
    # Rename the columns
    df.columns = ['RK', 'Name', 'Team', 'Salary']
    # Make the RK column the index
    df.set_index('RK', inplace=True)
    return df
For example, let us apply this function to this link.
link="http://www.espn.com/nba/salaries/_/seasontype/3"
df1=get_raw_data(link)
df1.head()
The result is

This is not the complete list of all players and their salaries; the data spans 14 pages in total. Fortunately, the URLs for pages 2 through 14 follow a simple pattern, which we can use to fetch the rest of the data quickly.
df = df1
for i in range(2, 15):
    # Get the data for page i
    df_i = get_raw_data(link=f"http://www.espn.com/nba/salaries/_/page/{i}/seasontype/3")
    # Append it to the combined data frame
    df = pd.concat([df, df_i], axis=0)
Next, we need to clean this data. Firstly, we have to extract the positions of the NBA players. As we can see, this information is contained in the Name column, after the player's name and a comma. We can use the following function to extract it.
def get_position(text):
    return text.split(', ')[-1]
Secondly, the salaries contain dollar symbols, so we have to convert them into the float data type. To avoid working with big numbers, we will use one million dollars as the unit. The following lines of code do this cleaning.
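Note that convert_string_to_float, used below, is not defined in the original snippet; a minimal sketch of such a helper, assuming the salaries are strings like "$8,000,000", could be:
def convert_string_to_float(text):
    # Strip the dollar sign and the thousands separators, then cast to float
    return float(text.replace('$', '').replace(',', ''))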
df['Position']=df['Name'].apply(get_position)
df['Salary']=(df['Salary'].apply(convert_string_to_float))/10**6
df.head()

We can save the data to a CSV file for future use.
df.to_csv('NBA_salary')
Part 3: NBA salaries across different positions
We will use the data obtained in the second part for our analysis and take a visual approach to help readers build an intuitive understanding of the results.
First of all, let us get all the positions in this dataset.
figsize(12.5,4)
positions=df['Position'].value_counts()
positions.sort_values(inplace=True)
bar=sns.barplot(x=positions.index, y=positions)
plt.xlabel("Position")
plt.ylabel("Number of players")
plt.title("Number of players by positions")
plt.show()

There are seven different positions:
- Shooting Guard (SG)
- Small Forward (SF)
- Power Forward (PF)
- Center (C)
- Point Guard (PG)
- Guard (G)
- Forward (F)
Among those, SF and SG have the most players. The sample sizes for the Forward and Guard positions are significantly smaller than for the other positions.
Let us compare the mean salary for each position.
figsize(12.5,4)
salary=df.groupby('Position')['Salary'].mean()
salary.sort_values(inplace=True)
bar=sns.barplot(x=salary.index, y=salary)
plt.xlabel("Position")
plt.ylabel("Salary, million dollar")
plt.title("Average salary across different positions")
plt.show()

We see that, on average, Point Guard and Center players earn the most. On the other hand, Shooting Guard players earn the least. We can also use boxplots to get a broader picture.
figsize(16,6)
my_order = df.groupby("Position")['Salary'].median().sort_values(ascending=False).index
sns.boxplot(data=df, x='Position', y='Salary', order=my_order)
plt.title("Salary for different positions")
plt.show()

For all positions, the salary distributions have heavy tails. Furthermore, the salary distributions for PG and C are quite similar. Overall, it seems that SG players earn the least.
Part 3.1: Shooting Guard and Point Guard
In this first comparison, we look at the salaries for the Shooting Guard and Point Guard positions. As the above plots show, we expect that, on average, PG players earn more than SG players.
First, we filter the data for the SG and PG positions. We also compute the pooled sample mean and pooled sample standard deviation, which we use to set the priors, as discussed in [1].
SG = df[df['Position']=='SG']['Salary']
PG = df[df['Position']=='PG']['Salary']
pooled_mean = np.r_[SG, PG].mean()
pooled_std = np.r_[SG, PG].std()
# Prior scale for the group means: twice the pooled standard deviation
variance = 2 * pooled_std
We set prior distributions for the means, the standard deviations, and the degrees-of-freedom parameter.
with pm.Model() as model_1:
    mu_A = pm.Normal("mu_A", pooled_mean, sd=variance)
    mu_B = pm.Normal("mu_B", pooled_mean, sd=variance)
    std_A = pm.Uniform("std_A", 1/100, 100)
    std_B = pm.Uniform("std_B", 1/100, 100)
    nu_minus_1 = pm.Exponential("nu-1", 1.0/29)
Next, we fit the model to the data.
with model_1:
    obs_A = pm.StudentT("obs_A", mu=mu_A, lam=1.0/std_A**2, nu=nu_minus_1+1, observed=SG)
    obs_B = pm.StudentT("obs_B", mu=mu_B, lam=1.0/std_B**2, nu=nu_minus_1+1, observed=PG)
    start = pm.find_MAP()
    step = pm.Metropolis(vars=[mu_A, mu_B, std_A, std_B, nu_minus_1])
    trace_1 = pm.sample(25000, step=step)
burned_trace_1 = trace_1[10000:]
Once this is done, we have samples for the posterior distributions of our variables. We can then plot them to compare the salaries of SG and PG players.
First, we plot the mean salaries for these two positions.
figsize(12.5, 4)
SG_mean = burned_trace_1['mu_A']
PG_mean = burned_trace_1['mu_B']
plt.hist(SG_mean, bins=40, label=r'Posterior distribution of $\mu_{SG}$')
plt.hist(PG_mean, bins=40, label=r'Posterior distribution of $\mu_{PG}$')
plt.title('Posterior distributions of average salaries for SG and PG')
plt.legend()
plt.show()

This plot shows that PG players earn more than SG players on average. What about the salary fluctuations for these positions?
figsize(12.5, 4)
SG_std = burned_trace_1['std_A']
PG_std = burned_trace_1['std_B']
plt.hist(SG_std, bins=40, label=r'Posterior distribution of $\sigma_{SG}$')
plt.hist(PG_std, bins=40, label=r'Posterior distribution of $\sigma_{PG}$')
plt.title('Posterior distributions of standard deviations derived from PyMC3')
plt.legend()
plt.show()

For the PG position, even though the average salary is higher, it also fluctuates more.
Let us apply the BEST decision rules with ROPE = [-0.1, 0.1] and a 95% HDI. This is best seen with the following visualization.
figsize(12.5, 4)
difference = PG_mean - SG_mean  # difference of the posterior means
hdi = pm.stats.hpd(difference, hdi_prob=0.95)  # the 95% HDI of the difference
rope = [-0.1, 0.1]  # the ROPE region
plt.hist(difference, bins=50, density=True, label='Difference of the mean salaries')
plt.vlines(hdi[0], 0, 0.6, linestyle='--', color='red', label='HDI')
plt.vlines(hdi[1], 0, 0.6, linestyle='--', color='red')
plt.vlines(rope[0], 0, 0.6, linestyle='--', color='black', label='ROPE')
plt.vlines(rope[1], 0, 0.6, linestyle='--', color='black')
plt.title('Posterior distribution of the difference of the mean salaries for PG and SG')
plt.legend(loc='upper right')
plt.show()

As the ROPE lies entirely outside the HDI, we reject the null hypothesis: we are confident that, on average, PG players earn more than SG players. We note that the HDI in this case is
[0.5580346 , 3.57072747]
Therefore, even if we take a wider ROPE such as [-0.5, 0.5], we can still reject the null hypothesis. Furthermore, as discussed in the previous article, the Bayesian perspective allows us to answer more complicated questions. For example, how confident are we that a PG player earns at least 50% more than an SG player?
figsize(12.5, 4)
rel_difference = 100*(PG_mean - SG_mean)/SG_mean  # relative difference of the means, in percent
prob = len(rel_difference[rel_difference > 50])/len(rel_difference)
plt.hist(rel_difference, bins=50, density=True, label='Relative difference of the mean salaries')
plt.title('Posterior distribution of the relative difference of the means')
plt.vlines(50, 0, 0.011, linestyle='--', color='red', label='50% threshold')
plt.legend()
print(f"The probability that a PG player earns at least 50% more than an SG player is: {prob}")
plt.show()
The probability that a PG player earns at least 50% more than an SG player is: 0.8830833333333333

So we are quite confident that, on average, PG players earn at least 50% more than SG players.
Part 3.2: Small Forward and Power Forward
We can use the framework in the previous section to do a similar analysis.
SF = df[df['Position']=='SF']['Salary']
PF = df[df['Position']=='PF']['Salary']
pooled_mean = np.r_[SF, PF].mean()
pooled_std = np.r_[SF, PF].std()
variance = 2 * pooled_std
with pm.Model() as model_2:
    mu_A = pm.Normal("mu_A", pooled_mean, sd=variance)
    mu_B = pm.Normal("mu_B", pooled_mean, sd=variance)
    std_A = pm.Uniform("std_A", 1/100, 100)
    std_B = pm.Uniform("std_B", 1/100, 100)
    nu_minus_1 = pm.Exponential("nu-1", 1.0/29)
    obs_A = pm.StudentT("obs_A", mu=mu_A, lam=1.0/std_A**2, nu=nu_minus_1+1, observed=SF)
    obs_B = pm.StudentT("obs_B", mu=mu_B, lam=1.0/std_B**2, nu=nu_minus_1+1, observed=PF)
    start = pm.find_MAP()
    step = pm.Metropolis(vars=[mu_A, mu_B, std_A, std_B, nu_minus_1])
    trace_2 = pm.sample(25000, step=step)
burned_trace_2 = trace_2[10000:]
We plot the histograms for the posterior distributions of the mean salaries for SF and PF positions.
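The plotting code is analogous to the one in Part 3.1; a sketch using burned_trace_2 might look like this:
figsize(12.5, 4)
SF_mean = burned_trace_2['mu_A']
PF_mean = burned_trace_2['mu_B']
plt.hist(SF_mean, bins=40, label=r'Posterior distribution of $\mu_{SF}$')
plt.hist(PF_mean, bins=40, label=r'Posterior distribution of $\mu_{PF}$')
plt.title('Posterior distributions of average salaries for SF and PF')
plt.legend()
plt.show()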

Unlike the previous comparison, we see that the two posterior distributions overlap much more. We can study the difference between these two distributions. To start, we use ROPE = [-0.1, 0.1] and the 95% HDI.
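Again, the code mirrors Part 3.1 (the vertical line heights are chosen purely for display):
figsize(12.5, 4)
difference_2 = PF_mean - SF_mean  # difference of the posterior means, B minus A as before
hdi_2 = pm.stats.hpd(difference_2, hdi_prob=0.95)  # the 95% HDI of the difference
rope = [-0.1, 0.1]
plt.hist(difference_2, bins=50, density=True, label='Difference of the mean salaries')
plt.vlines(hdi_2[0], 0, 0.8, linestyle='--', color='red', label='HDI')
plt.vlines(hdi_2[1], 0, 0.8, linestyle='--', color='red')
plt.vlines(rope[0], 0, 0.8, linestyle='--', color='black', label='ROPE')
plt.vlines(rope[1], 0, 0.8, linestyle='--', color='black')
plt.title('Posterior distribution of the difference of the mean salaries for PF and SF')
plt.legend(loc='upper right')
plt.show()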

With these choices of ROPE and HDI, the decision rules above tell us that we do not have enough evidence to either reject or accept the null hypothesis. Let us take a closer look at the HDI.
array([-0.44822444, 1.34450148])
We see that if we stick with this HDI, then unless we choose a very wide ROPE (and hence accept the null hypothesis), our data are not enough to make a conclusive decision.
Part 3.3: Center vs Power Forward
With similar lines of code, we can compare the salaries for the Center and Power Forward positions. We will plot the difference in mean salaries for several choices of HDI and ROPE.


We see that with ROPE = [-0.1, 0.1] and the 95% HDI, our data cannot make a final decision. However, if we lower the HDI coverage to 80%, we are confident that Center players, on average, earn more than Power Forward players.
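For reference, changing the HDI coverage only requires recomputing the interval. Assuming the posterior samples for this comparison are stored in a trace named burned_trace_3 (a name we introduce here by analogy with the earlier sections, with mu_A for Center and mu_B for Power Forward), the sketch would be:
difference_3 = burned_trace_3['mu_A'] - burned_trace_3['mu_B']  # hypothetical: Center minus PF
hdi_95 = pm.stats.hpd(difference_3, hdi_prob=0.95)  # 95% HDI: overlaps the ROPE, so no decision
hdi_80 = pm.stats.hpd(difference_3, hdi_prob=0.80)  # 80% HDI: per the plots above, lies outside the ROPE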
Conclusions
We hope that there are several things readers can take away from this article. First of all, BEST provides a general framework for Bayesian A/B testing that can be tailored to a client's business needs. Second of all, BEST can be conveniently implemented with PyMC3. Last but not least, for the NBA, we are quite confident that Point Guard players earn more than Shooting Guard players on average; we are less confident that Center players earn more than Power Forward players; and we do not have enough evidence to distinguish the salaries of Small Forward and Power Forward players.
References
[1] John K. Kruschke, Bayesian Estimation Supersedes the t-Test
[2] PyMC3 tutorial on BEST, https://docs.pymc.io/notebooks/BEST.html