As the sun began to dim and the city lights came to life, the inevitability of a late night in the office settled in. I found myself in a race against time. A crucial sales presentation was looming less than a day away, and success hinged on an unfulfilled request: an insightful data visualization that could succinctly convey the underlying methodology of our new TV attribution model.
I needed something complex enough to show off, but simple enough to convey most of the information without explanation. Not an easy task with a ticking clock. Thankfully, a beacon of artificial intelligence, my trusty new assistant ChatGPT, sat ready to simplify the task at hand. Its main mission: to help me translate data that was no more than a concept in my head into an eloquent visual narrative, capable of captivating even the most data-averse minds in the room.
This might be a little high drama, but I was in a bit of a crunch. A sales deck called for a deliberately simple example, which I couldn’t readily find in our database, so made-up example data it was.
I knew in my head exactly what I wanted to display: a web traffic baseline, a spike in web responses following a TV ad, and some way to convey, in little to no words, that in this example we would attribute the difference between the baseline and the spike to the ad.
Matplotlib charts are solid in an academic paper, but producing something both complex and truly flashy can require a deft hand. I often turn to Power BI for its out-of-the-box looks and quick charts, but it isn’t ideal when you need to build something from scratch with no existing data.
I’d been working with Plotly for a while, which seemed like a happy medium. However, I wasn’t confident enough with the package to put my thoughts into code without spending precious minutes combing through documentation and Stack Overflow. So, with time short, I started my first prompt.
I’m creating a visualization to display web traffic response following a TV spot airing. Let’s start by representing web traffic as a per minute count of site visits over the course of 30 minutes. The first ten minutes need to be a random number between 0 and 4. The second ten minute period needs to be random numbers between 4 and 20, with the numbers distributed in what looks like a right skewed histogram. The third ten minute period is the same as the first ten minute period. Please create that data and chart it for me using a line graph in Plotly.
The prompt returned the following, with my only edits in the update_layout function:
import numpy as np
import plotly.graph_objects as go
# Set seed for reproducibility
np.random.seed(42)
# Create arrays with the specified distributions
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4 # shift the gamma distribution
# Combine the arrays
measurements = np.concatenate((first_ten, middle_ten, last_ten))
# Create x values
x_values = np.arange(1, 31)
# Create the line chart
fig = go.Figure(
data=go.Scatter(
x=x_values,
y=measurements,
mode='lines+markers')
)
fig.update_layout(
title='Ad Influence on Web Traffic',
xaxis_title='Time',
yaxis_title='Visits per Minute',
width=1000,
height=600
)
fig.show()
I was already impressed here, as I hadn’t expected the request for a right-skewed, histogram-like distribution to be handled so well. Maybe I’d even be finished before the after-work session down the block was over.
Next, I needed something to represent the web traffic baseline. In reality, this could be as simple as a rolling median, but that won’t work well with only 30 minutes of data. Instead, I figured the median of the first and third sets of ten observations would convey the right idea.
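So why does a shifted gamma draw produce something that looks like a right-skewed histogram? A gamma distribution with shape k has skewness 2/sqrt(k). That value is positive for any k, so shape=2 gives a long right tail, and adding 4 only moves the values without changing their shape. The sketch below is not part of the generated code. It checks the skew empirically on a large sample, where the default_rng seed of 0 and the 100,000-draw size are arbitrary choices. It also shows that the shift guarantees values above 4 but does nothing to cap them at 20. Clipping with np.clip is one way, assumed here, to enforce the prompt's upper bound.

```python
import numpy as np

# Large, separately seeded sample (arbitrary seed) to estimate the shape reliably
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2, scale=2, size=100_000)

# Sample skewness: the third standardized moment
centered = sample - sample.mean()
sample_skew = np.mean(centered**3) / np.std(sample) ** 3
theoretical_skew = 2 / np.sqrt(2)  # gamma skewness is 2 / sqrt(shape)

# Reproduce the middle segment exactly as the generated code draws it
np.random.seed(42)
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4

# The +4 shift enforces the lower bound; clipping (an assumption, not in
# the generated code) would also enforce the requested upper bound of 20
middle_capped = np.clip(middle_ten, 4, 20)

print(f"sample skew {sample_skew:.2f} vs theoretical {theoretical_skew:.2f}")
print(f"middle range: {middle_ten.min():.1f} to {middle_ten.max():.1f}")
```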
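Here is a small sketch that compares the two options on the same seeded data. The 11-minute window is a hypothetical choice, and sliding_window_view requires NumPy 1.20 or later. With only 30 points, a window wide enough to smooth out noise has two problems. It drops 10 minutes at the edges, and while it passes over the spike it is dominated by spike values, so the "baseline" climbs well above the quiet-period level. The combined median of the first and last ten minutes avoids both problems.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Same seeded data as the generated code
np.random.seed(42)
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4
measurements = np.concatenate((first_ten, middle_ten, last_ten))

# Option used in the chart: median of the quiet periods only
combined_median = np.median(np.concatenate((first_ten, last_ten)))

# Hypothetical alternative: an 11-minute rolling median over all 30 minutes
windows = sliding_window_view(measurements, window_shape=11)  # shape (20, 11)
rolling_median = np.median(windows, axis=1)  # only 20 values survive

print(f"combined median: {combined_median:.2f}")
print(f"rolling median peak: {rolling_median.max():.2f}")
```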
Please calculate the combined median of the two arrays first_ten and last_ten. Plot the calculated median as a constant horizontal line.
import numpy as np
import plotly.graph_objects as go
# Set seed for reproducibility
np.random.seed(42)
# Create arrays with the specified distributions
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4 # shift the gamma distribution
# Combine the arrays
measurements = np.concatenate((first_ten, middle_ten, last_ten))
# Create x values
x_values = np.arange(1, 31)
# Calculate combined median
combined_median = np.median(np.concatenate((first_ten, last_ten)))
# Create the line chart
fig = go.Figure(
data=go.Scatter(
x=x_values,
y=measurements,
mode='lines+markers')
)
# Add median line
fig.add_trace(
go.Scatter(
x=x_values,
y=[combined_median]*len(x_values),
mode='lines',
name='Baseline',
line=dict(color='firebrick', width=2, dash='dash'))
)
fig.update_layout(
title='Ad Influence on Web Traffic',
xaxis_title='Time',
yaxis_title='Visits per Minute',
width=1000,
height=600
)
fig.show()
I then just changed the name of the median line trace from "Median" to "Baseline".
Now the idea of "lift" comes through if you’re familiar with marketing models. To the untrained eye, however, the chart needs more focused elements: the less a visual needs explaining, the more effective it is. The main things missing at this point were a chart element implying that the TV ad caused the spike in web traffic, and a way to convey the volume of traffic attributed to the ad.
I started with the causality element.
At minute 10 in the data, a TV spot airs and drives a spike in web traffic. What kind of chart element would you suggest to convey not just that the ad ran that minute, but that the ad is the causal factor behind the spike?
The response suggested a vertical line and an annotation, and provided the updated code.
import numpy as np
import plotly.graph_objects as go
# Set seed for reproducibility
np.random.seed(42)
# Create arrays with the specified distributions
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4 # shift the gamma distribution
# Combine the arrays
measurements = np.concatenate((first_ten, middle_ten, last_ten))
# Create x values
x_values = np.arange(1, 31)
# Calculate combined median
combined_median = np.median(np.concatenate((first_ten, last_ten)))
# Create the line chart
fig = go.Figure()
# Add scatter plot
fig.add_trace(
go.Scatter(
x=x_values,
y=measurements,
mode='lines+markers',
name='Web Traffic')
)
# Add median line
fig.add_trace(
go.Scatter(
x=x_values,
y=[combined_median]*len(x_values),
mode='lines',
name='Baseline',
line=dict(color='firebrick', width=2, dash='dash'))
)
# Add vertical line at minute 10
fig.add_shape(
type='line',
x0=10, y0=0,
x1=10, y1=max(measurements),
line=dict(
color="Purple",
width=3,
dash="dot",
)
)
# Add annotation to indicate TV spot
fig.add_annotation(
x=10,
y=max(measurements),
text="TV ad aired",
showarrow=True,
arrowhead=1,
ax=-100,
ay=-25,
font=dict(
size=16,
color="Black"
),
)
fig.update_layout(
title='Ad Influence on Web Traffic',
xaxis_title='Time',
yaxis_title='Visits per Minute',
width=1000,
height=600
)
fig.show()
I changed only the font size of the annotation.
Now we had a clearly defined cause and point of occurrence for the spike. The chart still needed something to denote the volume of traffic attributed to the ad. I already knew what I wanted for this: shading the area between the baseline and the web traffic line for the duration of the "lift window" (ten minutes). I went back to my AI colleague to implement this.
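Shading shows the lift visually, but the same idea also produces a number. The sketch below assumes the lift window covers minutes 11 to 20, which are the ten post-spot measurements, and counts each minute's visits above the baseline once. The shaded polygon in the chart starts at minute 10 and connects points with straight lines, so its geometric area will differ somewhat from this per-minute sum. Clipping negative differences at zero is a further assumption, chosen so that a minute dipping below the baseline doesn't subtract from the lift.

```python
import numpy as np

# Same seeded data as the generated code
np.random.seed(42)
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4
measurements = np.concatenate((first_ten, middle_ten, last_ten))
combined_median = np.median(np.concatenate((first_ten, last_ten)))

# Assumed lift window: minutes 11-20, i.e. indices 10..19
lift_window = measurements[10:20]

# Visits above baseline per minute; negatives clipped to zero (an assumption)
excess = np.clip(lift_window - combined_median, 0, None)
attributed_visits = excess.sum()

print(f"Visits attributed to the ad: {attributed_visits:.1f}")
```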
I want to display the volume of web traffic above baseline that is attributed to the ad. Please add green shading between the baseline line and web traffic per minute line. Only shade the area between the 10th and 20th minute of the data. Also add a legend item for the shaded area.
import numpy as np
import plotly.graph_objects as go
# Set seed for reproducibility
np.random.seed(42)
# Create arrays with the specified distributions
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4 # shift the gamma distribution
# Combine the arrays
measurements = np.concatenate((first_ten, middle_ten, last_ten))
# Create x values
x_values = np.arange(1, 31)
# Calculate combined median
combined_median = np.median(np.concatenate((first_ten, last_ten)))
# Create the line chart
fig = go.Figure()
# Add scatter plot
fig.add_trace(
go.Scatter(
x=x_values,
y=measurements,
mode='lines+markers',
name='Web Traffic')
)
# Add median line
fig.add_trace(
go.Scatter(
x=x_values,
y=[combined_median]*len(x_values),
mode='lines',
name='Baseline',
line=dict(color='firebrick', width=2, dash='dash'))
)
# Add shaded region for the influence of the ad
fig.add_trace(
go.Scatter(
x=list(x_values[9:20]) + list(x_values[9:20])[::-1],
y=list([combined_median]*11) + list(measurements[9:20])[::-1],
fill='toself',
fillcolor='rgba(0,200,0,0.2)',
line=dict(width=0),
hoverinfo="skip",
showlegend=True,
name='Ad Impact')
)
# Add vertical line at minute 10
fig.add_shape(
type='line',
x0=10, y0=0,
x1=10, y1=max(measurements),
line=dict(
color="Purple",
width=3,
dash="dot",
)
)
# Add annotation to indicate TV spot
fig.add_annotation(
x=10,
y=max(measurements),
text="TV ad aired",
showarrow=True,
arrowhead=1,
ax=-100,
ay=-25,
font=dict(
size=16,
color="Black"
),
)
fig.update_layout(
title='Ad Influence on Web Traffic',
xaxis_title='Time',
yaxis_title='Visits per Minute',
width=1000,
height=600
)
fig.show()
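The shading works because fill='toself' closes whatever path the trace draws. The generated code walks left to right along the baseline for minutes 10 to 20, then right to left back along the traffic line, which encloses the band between them. The hypothetical helper below wraps that list arithmetic in a reusable function and reproduces the same coordinates as the generated code.

```python
import numpy as np

def band_outline(x, lower, upper):
    """Hypothetical helper: closed outline of the band between two curves.

    Returns x and y coordinates that go forward along `lower` and back
    along `upper`, the kind of path Plotly's fill='toself' fills in.
    """
    x = np.asarray(x)
    # Allow a scalar lower edge, such as a constant baseline
    lower = np.broadcast_to(np.asarray(lower, dtype=float), x.shape)
    upper = np.asarray(upper, dtype=float)
    xs = np.concatenate([x, x[::-1]])
    ys = np.concatenate([lower, upper[::-1]])
    return xs, ys

# Same seeded data as the generated code
np.random.seed(42)
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4
measurements = np.concatenate((first_ten, middle_ten, last_ten))
x_values = np.arange(1, 31)
combined_median = np.median(np.concatenate((first_ten, last_ten)))

xs, ys = band_outline(x_values[9:20], combined_median, measurements[9:20])

# Compare with the generated list-based construction
expected_x = list(x_values[9:20]) + list(x_values[9:20])[::-1]
expected_y = list([combined_median] * 11) + list(measurements[9:20])[::-1]
print(np.array_equal(xs, expected_x), np.allclose(ys, expected_y))
```

The returned arrays can be passed straight to go.Scatter(x=xs, y=ys, fill='toself', ...) in place of the inline list expressions.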
I felt like this was getting to a good place, but I still wasn’t happy with the "clarity" of what I was trying to convey. I made some additions based on the code I’d already been given.
I added a constant vertical line and annotation to denote the lift window cutoff, and changed the colors of the ad aired and window cutoff lines to green and red, respectively. I also changed the baseline constant line to purple and styled it as dotted to better match the other constant lines.
I then wanted one final element – shading for the baseline region. Wanting to tap the immaculate performance so far, I gave this one to my assistant.
Please add purple shading to the entire baseline region. Add an annotation saying "Baseline" at minute 15, underneath the baseline constant horizontal line.
Note that I didn’t reference any coordinates or specific numbers for the location of the shading.
import numpy as np
import plotly.graph_objects as go
# Set seed for reproducibility
np.random.seed(42)
# Create arrays with the specified distributions
first_ten = np.random.uniform(low=0, high=4, size=10)
last_ten = np.random.uniform(low=0, high=4, size=10)
middle_ten = np.random.gamma(shape=2, scale=2, size=10) + 4 # shift the gamma distribution
# Combine the arrays
measurements = np.concatenate((first_ten, middle_ten, last_ten))
# Create x values
x_values = np.arange(1, 31)
# Calculate combined median
combined_median = np.median(np.concatenate((first_ten, last_ten)))
# Create the line chart
fig = go.Figure()
# Add baseline region
fig.add_trace(
go.Scatter(
x=list(x_values) + list(x_values)[::-1],
y=[combined_median]*len(x_values) + [0]*len(x_values),
fill='toself',
fillcolor='rgba(128, 0, 128, 0.2)',
line=dict(width=0),
hoverinfo="skip",
showlegend=True,
name='Baseline')
)
# Add scatter plot
fig.add_trace(
go.Scatter(
x=x_values,
y=measurements,
mode='lines+markers',
name='Web Traffic',
line=dict(color='#636EFA'))
)
# Add median line
fig.add_trace(
go.Scatter(
x=x_values,
y=[combined_median]*len(x_values),
mode='lines',
name='Median',
line=dict(color='Purple', width=2, dash='dot'))
)
# Add shaded region for the influence of the ad
fig.add_trace(
go.Scatter(
x=list(x_values[9:20]) + list(x_values[9:20])[::-1],
y=list([combined_median]*11) + list(measurements[9:20])[::-1],
fill='toself',
fillcolor='rgba(0,200,0,0.2)',
line=dict(width=0),
hoverinfo="skip",
showlegend=True,
name='Ad Impact')
)
# Add vertical line at minute 10
fig.add_shape(
type='line',
x0=10, y0=0,
x1=10, y1=max(measurements),
line=dict(
color="Green",
width=3,
dash="dot",
)
)
# Add annotation to indicate TV spot
fig.add_annotation(
x=10,
y=max(measurements),
text="TV ad aired",
showarrow=True,
arrowhead=1,
ax=-100,
ay=-25,
font=dict(
size=16,
color="Green"
),
)
# Add vertical line at minute 20
fig.add_shape(
type='line',
x0=20, y0=0,
x1=20, y1=max(measurements),
line=dict(
color="Red",
width=3,
dash="dot",
)
)
# Add annotation to indicate lift window cutoff
fig.add_annotation(
x=20,
y=max(measurements),
text="Lift window cutoff",
showarrow=True,
arrowhead=1,
ax=100,
ay=-25,
font=dict(
size=16,
color="Red"
),
)
# Add annotation for lift
fig.add_annotation(
x=15,
y=combined_median + (max(measurements[9:20])-combined_median)/4,
text="Lift due to Ad",
showarrow=False,
font=dict(
size=18,
color="Green"
),
)
# Add annotation for baseline
fig.add_annotation(
x=15,
y=combined_median/2,
text="Baseline",
showarrow=False,
font=dict(
size=18,
color="Purple"
),
)
fig.update_layout(
title='Ad Influence on Web Traffic',
xaxis_title='Time',
yaxis_title='Visits per Minute',
width=1000,
height=600
)
fig.show()
ChatGPT’s handling of the baseline shading request, without being given any specific values, was top-tier, and at this point I was beyond impressed. Happy with the visualization, I dropped the chart into the presentation, fired off a Teams message, and, with a fond look at the ChatGPT tab, closed my laptop. As I was packing up, a notification from the after-work group chat flashed across my phone screen.
Saw the presentation message, thought you said you’d be going late? We’ll have a pint waiting for you.
Cheers, ChatGPT.