How to create a Binomial distribution graph using Plotly, Python

Muhammad Tariq
Towards Data Science
Apr 26, 2020

Binomial Distribution along with gradient at any given slope

Sometimes a Python graph is a necessary part of the argument or data case you are trying to build. This tutorial shows how to create a binomial distribution graph, which for a large number of trials looks much like a normal curve. We start by generating an array of binomially distributed numbers, which we can do by importing binom from scipy.stats.

from scipy.stats import binom
n = 1024
size = 1000
prob = 0.1
y = binom.rvs(n, prob, size=size)

This code generates a thousand numbers, each the number of successes in 1024 trials with 0.1 as the probability of success. Next, we want to count how often each number occurs in the array. We can do this by putting the array into a data frame and building frequency bins using the logic below.

import numpy as np
import pandas as pd
# Creating the x array of every integer from the minimum to the maximum of y (inclusive) and making it a dataframe
x = np.arange(y.min(), y.max() + 1)
xx = pd.DataFrame(x)
# Making y a dataframe and creating an empty list yy
d = pd.DataFrame(y, columns=['Data'])
yy = []
# Calculating the frequency of every number between the minimum and maximum values
for k in range(len(x)):
    yy.append((d['Data'] == x[k]).sum())
# Making the frequency data frame and concatenating it with xx
freq = pd.DataFrame(yy, columns=['Frequency'])
data = pd.concat([xx, freq], axis=1)
data.columns = ['Score', 'Frequency']
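
The same frequency table can also be built without an explicit loop. The sketch below is one possible alternative, not the approach used in this tutorial: it relies on np.bincount to count every value in one pass. The seed and the names rng_seed, counts and data_alt are assumptions introduced only for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import binom

n, size, prob = 1024, 1000, 0.1
rng_seed = 0  # hypothetical seed, only so the sketch is reproducible
y = binom.rvs(n, prob, size=size, random_state=rng_seed)

# Every integer score from the smallest to the largest observed value (inclusive)
x = np.arange(y.min(), y.max() + 1)

# bincount counts occurrences of 0, 1, 2, ...; keep only the observed range
counts = np.bincount(y, minlength=y.max() + 1)[y.min():]

data_alt = pd.DataFrame({'Score': x, 'Frequency': counts})
```

Because every generated number falls in exactly one bin, the Frequency column should add up to size.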

Now we have the score and frequency bins. We can use this data to generate a binomial graph using plotly.graph_objects.

import plotly.graph_objects as go
fig = go.Figure(
    # Loading the data into the figure
    data=[go.Scatter(x=data['Score'], y=data['Frequency'],
                     mode="lines",
                     line=dict(width=2, color="blue"))],
    # Setting the layout of the graph
    layout=go.Layout(
        xaxis=dict(range=[y.min(), y.max()], autorange=False),
        yaxis=dict(range=[data['Frequency'].min(), data['Frequency'].max()], autorange=False),
        title="Binomial Curve",
        updatemenus=[dict(
            type="buttons",
            buttons=[dict(label="Play",
                          method="animate",
                          args=[None])])]
    ))
fig.show()

The following graph is generated using the code above.

As we can see, the curve shows the basic binomial shape but with a lot of noise, hovering randomly above and below the expected path. It can easily be turned into a smooth binomial curve by repeating this procedure many times and averaging the results. In my case, I repeated the above step 1500 times. Please check out the code below:

n = 1024
size = 1000
prob = 0.1
x = np.arange(70, 135)
yy = []
tt = []
# Repeating the step 1500 times; the rest of the code is the same as above
for a in range(1500):
    y = binom.rvs(n, prob, size=size)
    d = pd.DataFrame(y, columns=['Data'])

    for k in range(len(x)):
        yy.append((d['Data'] == x[k]).sum())

    tt.append(yy)
    yy = []
    y = []
kk = pd.DataFrame(tt).T
y = kk.mean(axis=1)
N = len(y)

The above code generates a new array “y”, which holds the average frequency of each score across the 1500 runs. We can build a new data frame and inspect the data using the code below:

data = pd.DataFrame([x,y]).T
data.columns = ['Score', 'Frequency']
data.head()
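
As a rough check on what the averaging converges to, the expected frequency of a score in a sample of size draws is size multiplied by the binomial probability of that score. The sketch below compares averaged simulated frequencies with that theoretical curve. It assumes the same n, size and prob as above; the names expected, runs, averaged and max_gap, the seed, and the smaller number of runs are assumptions made to keep the example quick.

```python
import numpy as np
from scipy.stats import binom

n, size, prob = 1024, 1000, 0.1
x = np.arange(70, 135)

# Expected frequency of each score in one sample of `size` draws
expected = size * binom.pmf(x, n, prob)

# Averaged simulated frequencies, using fewer runs than the 1500 in the tutorial
runs = 200  # hypothetical value chosen to keep the sketch fast
samples = binom.rvs(n, prob, size=(runs, size), random_state=0)

# Compare every draw with every score, count matches per run, then average over runs
averaged = (samples[:, :, None] == x[None, None, :]).sum(axis=1).mean(axis=0)

# Largest gap between the simulated average and the theoretical curve
max_gap = np.abs(averaged - expected).max()
```

The gap should shrink as runs grows, which is why the averaged curve looks so much smoother than a single sample.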

Plotting this data again using the same Plotly code, we get a good-looking binomial graph.

With the above graph in hand, we can play with the data a little more. For example, we can animate the gradient of the curve, that is, its derivative at every point, using Plotly as well.

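The derivative at any point can be calculated numerically with the central difference formula: the difference between the following and the preceding values, divided by twice the spacing between the points.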
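
In symbols, a central difference of this kind can be written as follows, where f is the frequency curve and h is the spacing between neighbouring scores. Taking h = 1 is an assumption read off the code in this tutorial, which divides the difference by 2.

```latex
f'(x) \approx \frac{f(x + h) - f(x - h)}{2h}
\qquad\Longrightarrow\qquad
f'(x_i) \approx \frac{f(x_{i+1}) - f(x_{i-1})}{2} \quad \text{for } h = 1
```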

We can implement this formula using pandas to calculate the value of the gradient at all relevant points.

# Declaring an empty array
deri = []
# Setting first derivative to zero
fir = 0
deri.append(fir)
# Calculating the derivative at all points other than the first and last points
for a in range(1, 64):
    diff = (data['Frequency'][a+1] - data['Frequency'][a-1]) / 2
    deri.append(diff)
# Setting last derivative to zero
end = 0
deri.append(end)
der = pd.DataFrame(deri, columns = ['Derivatives'])
data = pd.concat([data, der], axis = 1)

Please note that we have deliberately set the derivative to zero at the first and last points in the data. This is done because the derivative formula requires both the preceding and the following values. The preceding value is missing for the first point and the following value is missing for the last point. Hence both are set to zero for convenience.
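
The same derivative can be sketched with NumPy slicing instead of a loop. To keep the block self-contained, it uses theoretical frequencies (size times the binomial probability) as a stand-in for the averaged data, which is an assumption of this example. np.gradient is shown only for comparison: it applies the same central difference at interior points but uses one-sided differences at the two ends rather than zero.

```python
import numpy as np
from scipy.stats import binom

x = np.arange(70, 135)
# Stand-in for the averaged frequencies (assumption: theoretical expected counts)
freq = 1000 * binom.pmf(x, 1024, 0.1)

# Central difference at interior points, zero at both ends as in the tutorial
deri = np.zeros(len(freq))
deri[1:-1] = (freq[2:] - freq[:-2]) / 2

# NumPy's built-in version: identical inside, one-sided differences at the edges
grad = np.gradient(freq)
```

Which edge treatment is preferable depends on the plot; with zeros, the gradient line is drawn flat at the first and last scores.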

Now that we have the derivatives, we need to calculate the starting and ending coordinates of the gradient line that we want to animate in Plotly.

sx = []
sy = []
ex = []
ey = []
Gap = 3.5
for b in range(0, 65):
    # Computing start coordinates
    ssx = data['Score'][b] - Gap
    sx.append(ssx)
    ssy = data['Frequency'][b] - Gap * data['Derivatives'][b]
    sy.append(ssy)
    # Computing end coordinates
    eex = data['Score'][b] + Gap
    ex.append(eex)
    eey = data['Frequency'][b] + Gap * data['Derivatives'][b]
    ey.append(eey)
cord = pd.DataFrame([sx, sy, ex, ey]).T
cord.columns = ['XStart', 'YStart', 'XEnd', 'YEnd']

Now that we are done, we can visualize the resulting animations.

Binomial Distribution along with gradient at any given slope

Further, if we want to mark regions on this figure and add text labels to them, we can easily do so using the snippet below.

fig.add_trace(go.Scatter(x=[70,85,85,70], y=[0,0,50,50], fill='toself', mode='lines', line_color='#FF5A5F', opacity = 0.3))
fig.add_trace(go.Scatter(x=[85,102,102,85], y=[0,0,50,50], fill='toself', mode='lines', line_color='#C81D25', opacity = 0.3))
fig.add_trace(go.Scatter(x=[102,119,119,102], y=[0,0,50,50], fill='toself', mode='lines', line_color='#0B3954', opacity = 0.3))
fig.add_trace(go.Scatter(x=[119,135,135,119], y=[0,0,50,50], fill='toself', mode='lines', line_color='#087E8B', opacity = 0.3))
fig.add_trace(go.Scatter(
    x=[77.5, 93.5, 110.5, 127],
    y=[40, 40, 40, 40],
    mode="text",
    name="Regions",
    text=["Low Risk", "High Risk", "Stabilization", "Recovery"],
    textposition="top center",
    textfont=dict(
        family="sans serif",
        size=20,
        color="black"
    )
))
fig.show()

We end up with the following figure.

If you are still curious about what I’m up to, please check out my next blog here, where all these techniques are used to visualize the “Global Status of Covid-19”. Stay tuned, cheers.

A full Jupyter notebook implementation of Binomial Curve can be found here.

Data enthusiast transforming raw data into actionable insights. Skilled in Python, SQL, ETL, and Data Pipelines.