Move from data to visualization and say more with less while adding depth to some basic designs.

"Some books are to be tasted, others to be swallowed, and some few to be chewed and digested; that is, some books are to be read only in parts; others to be read, but not curiously; and some few are to be read wholly, and with diligence and attention." ~ Sir Francis Bacon
The vast majority of our data exploration is relatively simple. We need to compare two values or we need to see how our data appears in context. So we quickly whip up a bar chart or line graph in our notebook of choice, nod our head with a knowing grunt, and move on to the next item on our list.
Sometimes we need to dive deeper. We need to wring just a little more out of the information delivered. We need to offer up our visualization to someone else and not waste their time with it. In other cases we just need to make the most use of limited space.
In any of these cases, we need to think a bit more deliberately about what we’re doing, because the result may be with us for a while.
For this article, I’m going to take you on a little journey of putting together a slightly more complex visualization that tells a story. Don’t worry. Although I’ve used the keyword "complex", this will be useful: maybe not as a direct copy and paste, but at the very least as a mechanism to build out your own toolkit and maybe churn up some creative ideas.
Just to manage expectations, this is not a complete application walk-through. That would be a huge undertaking. What this does is walk you through the process to get to a functional end result using one of my own specific applications. The goal is to highlight some key concepts I believe are missing in the most common locations for searching out answers. I can only hope it helps someone even if it’s just in one small way.
Why Dash?
When I first started doing data work in Python, I really dug into Matplotlib. It made a lot of sense and fit with the way I was programming. It was very easy to whip up some nice plots quickly that could be output for other purposes like publishing or distribution. One day I needed something (exactly what, I can’t remember anymore), and it led me to Dash.
Previously, I had come across both Plotly and Dash in my travels across the ‘net, but didn’t pay them too much mind. I wanted to get things done, not spend time learning the nuances of yet another library. But this time was different. The intro was quick and easy, and it piqued my interest. So I dipped my toe in the water with Plotly.
I was sold. In about three lines of code, I had a stunning (to me at least) plot that did what I needed. After a while, I ventured into Dash because I saw the benefit of whipping up my own dashboards quickly.
My first few attempts were utter failures. It was a simple case of my reach exceeding my grasp: I was attempting to leap over the simple and basic to land on the complex and ideal. But I did get it soon enough and was able to build a usable platform for what I needed.
Dash has provided me with a relatively lightweight, responsive platform that does not require major infrastructure or technical plumbing to get quick results. The fact it works hand-in-hand with Plotly is a bonus.
My one complaint about the platform is that the documentation is often lacking. It frequently jumps from very simple to fairly advanced without a bridge in between, and it can be very challenging to track down what’s missing. Like any area of learning where the answer has to be wrested away from the knowledge keepers, it’s a well-earned and rewarding victory, but it’s also a major pain in the butt.
That’s a big reason I decided to put this article together.
About the Example
To some, this might be an esoteric example, but I hate using cliché datasets like Titanic survival or botany or some of the other tried-and-true tropes. For this, I am using data from my own collected data stores.
As a finance nerd, I have a special affinity for the futures market, specifically precious metals. However, as a business weenie, I like to see the interplay between raw material pricing and consumer retail price. Therefore, my data consists of both market data and item-level retail pricing from multiple distributors, covering a 90-day period. Market data was collected at 5-minute intervals, while retail data was collected at 15-minute intervals.
Market data was pulled from a table constructed as datetime, open, high, low, close, volume (i.e., a pretty standard market data format). Retail data consisted of datetime, item, distributor, price.
For the most part, I have gone to great lengths to ensure my data comes in clean, so there isn’t much processing on ingestion. The only modification I performed was to write a view that shifted each retail price timestamp to the nearest 5-minute interval. This allowed for a clean join between the market and retail data tables, and for purposes of the analysis it was an inconsequential change.
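As a side note, the same rounding idea can be sketched in pandas if you’d rather not build a database view. This is just a conceptual sketch (the helper name is made up), not the view I actually wrote:

import pandas as pd

# Conceptual sketch only: snap each retail timestamp to the nearest
# 5-minute mark so it lines up with the market data's 5-minute bars.
def round_to_five_minutes(ts: pd.Series) -> pd.Series:
    return pd.to_datetime(ts).dt.round('5min')

# e.g. retail_df['datetime'] = round_to_five_minutes(retail_df['datetime'])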
Getting the Data
All art requires some raw material. In this case, our raw material is the data. This was procured from the database with a simple query:
SELECT
    date(silver.datetime) AS datetime,
    DATE_PART('week', CURRENT_DATE) - DATE_PART('week', silver.datetime) + 1 AS week_number,
    item,
    price,
    close AS spot
FROM retail_silver_norm_time
JOIN silver ON (retail_silver_norm_time.datetime = round_time(silver.datetime))
WHERE
    silver.datetime BETWEEN CURRENT_DATE - 90 AND CURRENT_DATE + 1
ORDER BY datetime
Let’s break this down.
The FROM clause pulls from retail_silver_norm_time, which is the normalized retail silver price view. The silver table just contains market data. These two are joined on their timestamps. Because they come from different sources, the timestamps have different precisions, but the round_time function solves that challenge.
From that joined result, I only want rows whose timestamp falls within a 90-day window. I am using a PostgreSQL database, so I use the built-in identifier "CURRENT_DATE"; it’s a handy feature, and most databases have an analogue.
Note, my window extends up to and beyond today. I do this as a bit of a failsafe in case I get an odd date stamp. While it isn’t terribly common, it has been known to happen from time to time when dealing with data without a quality of service guarantee.
Now, with the entire base of the query defined, what specifically do I want from the big table? Here I’ve selected the silver table’s datetime, extracted down to just the date portion. I keep the name the same just for consistency’s sake. I then calculate a "week number" value, which will be used in the final product. After that, I pull the three other main pieces of data: item, retail price, and spot price.
This gives everything I need to build the rest of my plot.
Pre-processing
As my data comes out of the database, I try to get it as close to directly useful as possible. However, I have this nasty habit of using the results of the same query in multiple places. This is no different, so I need to do a little data wrangling to get everything I need.
For this portion, I take the results of my query and push it into a Pandas dataframe.
Note: Something you might have noticed by now is that I’m not going deep into the items that can easily be found elsewhere. If you don’t know how to get data from a query into a data frame, there are some wonderful tutorials (much better than this) on the first page of a quick search.
Anyway, this is the code I’m using to run my process:
df = dbf.query_return(qu.silver_regression_query1)   # run the query above through my db helper
df.columns = ['datetime', 'week_number', 'item', 'price', 'spot']
df = df_cleanup(df)                                   # generic gap-filling / routine cleanup
df = silver_normalizing(df)                           # normalize item prices to a common scale
df['ps_gap'] = df.price - df.spot                     # retail premium over spot
df.ps_gap = df.ps_gap.round(2)
df['date'] = pd.to_datetime(df['datetime']).sub(pd.Timestamp('2021-03-01')).dt.days
df.date = df.date / 10                                # scale days-since-March-1 down by ten
df.date = df.date.round()                             # round to the nearest integer
silver_reg_item = df['item'].unique()                 # unique item names for the dropdown
Let me walk through the above code.
Lines one and two simply set up the dataframe from the query results returned by my function. Line three sends the dataframe through a generic cleanup function to fill in gaps and perform some routine processes I like to have in this application. Line four is a conversion function that normalizes the item prices onto a consistent scale so I can make an apples-to-apples comparison; just another nuance of the data within the application.
Lines five, six, and seven add data columns and formats. Line five creates a calculated column for the difference between retail and spot pricing. Line six rounds that value to two digits. Line seven creates a calculated column denoting days since March 1st. Lines eight and nine modify that calculation from line seven: they divide the day count by ten and then round it off to the nearest integer.
The reason for these modifications isn’t obvious yet and is likely confusing, but they do have a purpose as we build out the chart. Also note that many of these steps can be consolidated in various ways or performed at runtime. I chose, for purposes of this example, to break them out so the entire process is more iterative and obvious to follow. Consolidate as you choose.
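For example, here is a minimal sketch of how lines five through nine could be consolidated with pandas’ assign (same result, just chained):

# Equivalent to lines five through nine above, expressed as one chained step.
df = df.assign(
    ps_gap=(df.price - df.spot).round(2),
    date=(pd.to_datetime(df['datetime'])
          .sub(pd.Timestamp('2021-03-01'))
          .dt.days / 10).round(),
)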
The final line pulls out an array of only the unique item values. This provides a quick and easy way to filter by item, and doing it here eliminates the need for another query.
Dash Constructs
Here is where this can get a bit convoluted for some. But, I’ll walk through it in detail.
Right now, we have a pre-processed dataframe holding the full population of data we’ll be working with. I’ve let you in on the data that’s driving this, but you can use your own data just as easily. The important part is to get to a set of data that is ready to go.
What we are trying to get to is a filtered view of that data that is visualized in a chart. To do this, we need a few mechanisms.
First, we need somewhere to hold the chart at the end of it all. In my case, I use the bootstrap templates (since my front-end design skills are nothing to write home about) and populate a modal. I mention that just to note it’s possible, but the core of making this work is just three basic constructs (a minimal layout sketch follows the list):
- dcc.Graph – this provides a placeholder and an id to target.
- dcc.Dropdown – this control just gives us a place to make our item selection.
- dcc.RangeSlider – this control allows us to filter by date.
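To make that concrete, here is a stripped-down sketch of how those three constructs might sit together in a layout. The ids match the ones used below, but the bootstrap modal wrapper and all styling are omitted, and the import style assumes Dash 2.x:

from dash import dcc, html

# Bare-bones arrangement: the dropdown and slider feed the callback,
# and the Graph is the placeholder the callback fills in.
layout_sketch = html.Div([
    dcc.Dropdown(id='silver_regresssion'),      # item selection (options set below)
    dcc.RangeSlider(id='silver_week_slider'),   # week-number filter (configured below)
    dcc.Graph(id='silverReg'),                  # target for the returned figure
])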
dcc.Dropdown
There isn’t much special here beyond what many tutorials have covered. The biggest aspects are 1) the id and 2) the loop to populate the values.
dcc.Dropdown(
    id="silver_regresssion",
    options=[{
        'label': i,
        'value': i
    } for i in silver_reg_item],
    value='One Ounce Generic Silver')
The id is designated as "silver_regresssion". Note this string, extra "s" and all, because it has to match exactly when it comes back later in the callback.
The list is populated from silver_reg_item, which is the array of unique items we created in line 10 of pre-processing.
dcc.RangeSlider
The range slider is added so the view is limited not only by item (from the dropdown) but also by date. Or in this case, not so much by date as by weeks relative to the current week.
dcc.RangeSlider(
    id='silver_week_slider',
    min=df['week_number'].min(),
    max=df['week_number'].max(),
    value=[
        df['week_number'].min(),
        df['week_number'].max()
    ],
    marks={int(n): n - 1 for n in df['week_number'].unique()},
    step=None
)
Like before, these are taken pretty verbatim from the examples.
This is a decent example of why some of the work was pushed into the query and pre-processing. As you can see, the min, max, and value fields are all simple functions of the dataframe’s week_number column. Those calculations could have been performed on the fly here instead, but that would make the code a lot less obvious.
As you create various displays, for your own sanity and those who will come after you, try to think through the entire process first. An odd step early can often save some headaches later.
dcc.Graph
dcc.Graph(id='silverReg')
Seriously, as I said before, the Graph object just gives us an id to target.
The Callback
By a long shot, callbacks were the most difficult thing for me to wrap my head around. They did not seem intuitive and the explanations didn’t seem adequate. It was only after A LOT of trial and error that something finally clicked. I also know I can’t be the only one. So let me try to make this much clearer in a way I wish it had been when I was learning.
The callback decorator uses a few statements to create the plumbing between the interface components and the logic performed. Using this arrangement, we can make some amazingly complex applications that are highly interactive.
Each statement in the decorator is denoted by its function: Input, Output, or State. Input is data feeding into the callback. Output is data leaving the callback. State lets the callback read a component’s current value without changes to that component triggering the callback.
For each statement we provide two pieces: the id of the relevant component and the component property being read or written (such as 'value' or 'figure'). This is why all the components needed their id clearly defined and called out.
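To illustrate the three statement types with a toy example (the ids and the function here are made up and are not part of this application; it assumes the usual app = dash.Dash(__name__) setup):

# Hypothetical callback: fires when the button is clicked, reads the text
# box's current contents without firing on every keystroke, and writes the
# result into the 'greeting' component.
@app.callback(
    dash.dependencies.Output('greeting', 'children'),
    [dash.dependencies.Input('submit-button', 'n_clicks')],
    [dash.dependencies.State('name-input', 'value')]
)
def greet(n_clicks, name):
    return "Hello, {}!".format(name)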
Here is my callback decorator:
@app.callback(
    dash.dependencies.Output('silverReg', 'figure'),
    [
        dash.dependencies.Input('silver_regresssion', 'value'),
        dash.dependencies.Input('silver_week_slider', 'value')
    ]
)
We can see that we have two Input statements. These correlate to the dropdown id and range slider id, respectively. The second item in each is the component property being watched, in this case value.
Similarly, our output statement notes the target id and the data being returned will be a figure. This matches up with our dcc.Graph object.
Beyond the Decorator
Once we plumb in the input and output and create that link between the body of the application and our logic, we can build out the backend. This is where the rubber meets the road, so to speak, and it takes the form of a function.
I am going to piece this out to walk you through the logical decisions and hopefully understand what is going on and why:
def update_graph(silver_reg_item, week_num):
    if silver_reg_item is None:
        silver_reg_item = "One Ounce Generic Silver"
    if week_num is None:
        first_week = 52
        last_week = 0
    else:
        first_week = week_num[0]
        last_week = week_num[1]
Our first line is the basic function definition. It takes two arguments, which is convenient since our decorator grabbed two inputs (please note humor).
The rest of the top lines exist to set a default state. In Dash applications, all callbacks fire on startup, so you have to account for that.
The first conditional defines a default item. The second creates a default range. Since the components themselves have defaults, this shouldn’t come up, but if it does, it leaves us in a soft state that won’t break anything.
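As an aside, newer Dash releases can also suppress that initial firing entirely with prevent_initial_call. Here is a sketch of what that would look like on this decorator; it is an alternative, not what I used, since I stuck with defaults instead:

# Alternative to default handling: skip the automatic call on page load.
# The prevent_initial_call argument is available in newer Dash versions.
@app.callback(
    dash.dependencies.Output('silverReg', 'figure'),
    [dash.dependencies.Input('silver_regresssion', 'value'),
     dash.dependencies.Input('silver_week_slider', 'value')],
    prevent_initial_call=True
)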
Processing
It seems like data processing is just never done, right? That’s true here as well. We need to filter our dataframe down to the selected item and date range, and then perform some direct calculations:
df1 = df[df['item'] == silver_reg_item].copy()        # filter the master dataframe to the selected item
df1 = df1[(df1['week_number'] >= first_week) & (df1['week_number'] <= last_week)]   # filter to the selected weeks
y_max = df1.price.max()                               # used later to position the annotation
x_min = df1.spot.min()
X = df1['price'].values.reshape(-1, 1)                # scikit-learn expects 2-D arrays
Y = df1['spot'].values.reshape(-1, 1)
linear_regressor = LinearRegression()
linear_regressor.fit(X, Y)
df1['Y_pred'] = linear_regressor.predict(X)           # predicted values for the regression trace
This segment does a few things. First, the top two lines perform the filters. Line one filters the main dataframe by item and creates a copy. We create a copy so we don’t lose that original master dataframe and can continue to pull from it. Line two filters the new dataframe by the range values. This leaves us with just the abbreviated dataframe.
Next, I wanted a linear regression on the data. This is done via the SciKit Learn library and requires the dataframe columns be converted to numpy arrays.
Lines three and four isolate two values from the dataframe, the maximum retail price and the minimum spot price. These get used a little later to position the annotation.
Lines five and six reshape the price and spot columns into the arrays scikit-learn expects. Lines seven and eight create the regressor and fit it, and line nine generates a column of predicted values from it. This could also be done directly in Plotly, but I wanted to perform the calculation myself.
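For reference, here is a sketch of the Plotly-only route, letting Plotly Express fit an ordinary least squares trend line itself. This assumes the statsmodels package is installed, and it is not what I used here:

# Plotly Express can compute and draw the OLS fit on its own.
fig = px.scatter(df1, x='spot', y='price', trendline='ols')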
The Chart
And here we are. The code we’ve been building up to:
fig = px.scatter(
    df1,
    y='price',
    x='spot',
    size=(df1.date * df1.date) * 2,
    opacity=.5,
    color='ps_gap',
    trendline="lowess",
    trendline_color_override="purple",
    hover_data=["datetime", "price", "spot"],
    color_continuous_scale=px.colors.diverging.RdYlGn_r,
    range_color=[df1.ps_gap.min(), df1.ps_gap.max()]
)
This is the main plot. It is a Plotly Express object and uses the df1 dataframe we created in the processing step above. As a scatter plot, it puts spot on the x-axis and retail price on the y-axis.
If I left the plot there, it would be a very boring scatter that would be informative, but not as much as it could be. This is where I added some dimensionality.
Using the size parameter, I tied marker size to the preprocessed date value, so size varies with recency. The farther back the date (the closer to March 1st), the smaller the marker; the more recent the data point, the larger it is. At a glance, I can judge relevance based on size alone.
Second, I wanted a clear and concise way to identify the retail-to-spot difference. For this, I used color and a color gradient. The color of each marker is based on the "ps_gap" value, with the colors drawn from the "px.colors.diverging.RdYlGn_r" scale. This is a relatively high contrast scale with good distinctions, and the "_r" at the end reverses the gradient. Finally, the color range is dynamic, based on the minimum and maximum values in the ps_gap column.
A couple of other details in this section: I set the opacity to .5 to provide some intensity variation in clustered groups, and I added a default LOWESS trend line. That smoothing trend line complements the linear regression calculated above.
fig.add_traces(go.Scatter(
    y=df1['price'],
    x=df1.Y_pred,
    name='Regression',
    showlegend=False,
    line_width=2,
    line_color='black',
))
The next addition is the linear regression. For this, we simply add a trace to the figure. You might notice this is a standard plotly Graph Object and not a plotly express object. Yes, you can mix and match.
This trace takes the retail price as the Y value and sets the X value equal to the predicted value from the calculation. (Yes, I’m aware of what’s going on there)
A challenge I realized when I printed my chart to a file is that it tended to lose context. I did not have the benefit of the drop down or any other identifiers to tell me what I was looking at. So, the easy way to solve that challenge was with an annotation:
fig.add_annotation(
    y=y_max,
    x=x_min,
    text=silver_reg_item + "<br>" +
         str(df1.datetime.min()) + " through " +
         str(df1.datetime.max()),
    showarrow=False,
    font_size=10,
    bordercolor='rgba(255,255,255,0.8)',
    bgcolor='rgba(235,171,52,0.5)',
    borderpad=5
)
This annotation positions itself using the y_max and x_min values we defined in the processing portion, which places it in the upper-left region of the data. That tends to keep it out of the way without a lot of complex hassle and without modifying the axes.
The final step is just some formatting:
fig.update_traces(
    marker=dict(
        line=dict(
            width=1,
            color='black')),
    selector=dict(mode='markers'))

fig.update_layout(
    title="Silver Spot vs. Retail Relationship",
    plot_bgcolor="#FFFFFF",
    xaxis=dict(
        title="Spot",
        linecolor="#BCCCDC"
    ),
    yaxis=dict(
        title="Retail",
        linecolor="#BCCCDC"
    ),
    coloraxis_colorbar=dict(
        title="Retail-Spot Variance",
        ticks="outside",
        tickprefix="$")
)
The first update adds a thin black outline to the scatter markers, which makes them a little easier to distinguish. The second update sets the title, axis details, and colorbar formatting.
The last thing we do is return the completed figure:
return fig
This will take that completed Plotly figure and push it out to the placeholder in dcc.Graph as defined in the decorator. Dash then does the rest.
The Final Result

The picture above shows what the chart looks like in the application. At the top we can see the dropdown, and at the bottom, the range slider. However, when we save the image, we get the following:

The saved chart, as mentioned previously, has a lot less context. This makes the annotation much more important.
Final Thoughts
We took the long way to get where we are, but I wanted to make sure the focus wasn’t just on the dimensions of the data presented, but also on what happens behind the scenes. The final chart is both simple and complex. A good mix.
At a casual glance, we can see the relationship between the data as we would expect from this type of scatter plot. However, if we really examine it, we can be drawn in by not only the axis values but the sizes and colors as well. It rounds out the story with more elements that are just as critical.
As a thought experiment, if we left off size and color, what would we need to present that data? How many other charts? What types of charts? Would it be effective or would the real estate cost be too expensive?
Visualization is like writing. We stagger our sentence structure. Some thoughts are quick and obvious. Others are dense and need careful consideration. But we pace everything so our audience comprehends what we need them to.
Sometimes, a very simple and direct visual is enough. Other times we need to dig into complexity. The best method, especially when constructing a cohesive dashboard or report, is to build toward what we need to say. That requires tastefully composing our data in the proper ways to get our conclusions across, or to enable our audience to draw their own.
At the end of it all, experiment. Some ideas will work and others will fall flat. But, if you caught what I was throwing out here, you have the beginnings of concepts and tools to build out your own test environment to explore your data.