
Control chart
Statistics
"a chart on which observed values of a variable are plotted, usually against the expected value of the variable and its allowable deviation, so that excessive variations in the quality, quantity, etc., of the variable can be detected" [1]
Introduction
During my bachelor’s studies, one insightful Uni course was "Introduction to Quality Assurance". Essentially, this was yet another statistics course wrapped in a different name but aimed to transfer knowledge of techniques and instruments for tracking production process quality.
Among others, control charts were one of these instruments, as they show the following:
- how process measures evolve,
- if process measures are within expected control limits,
- and which measures should be improved to improve the production process quality.
In other words, they are process pictures that estimate process quality by setting the following elements for observed measures:
- centre line – an average historical value of the measure,
- upper control limit (UCL) – set three standard deviations (+3σ) above the centre line, and
- lower control limit (LCL) – set three standard deviations (-3σ) below the centre line.
And these three elements are easily calculated and can be used for tracking the targets of the continuous KPIs in various business areas, not only production.
With this said, this post aims to show the implementation methodology of the control charts using the BigQuery statistical aggregate function for calculating the standard deviation and finding the UCL and LCL values.
The work presented here is a follow-up of the previous development for setting the KPI targets using the BigQuery ML and the hard-coded Looker approach:
Analog to the previous post, the data will be modelled and visualized with the Looker instance.
Let’s start. 🙂
Use case explanation
In the past work, we focused on presenting the methodologies for tracking marketing costs
, [cost revenue ratio](https://www.indeed.com/career-advice/career-development/cost-revenue-ratio#:~:text=What%20is%20a%20cost%20revenue,including%20marketing%20and%20shipping%20costs.)
, and [return on ad spend](https://www.klipfolio.com/metrics/marketing/return-on-ad-spend-roas)
KPIs.
Now, we will focus on the [Cost per Order (CPO)](https://www.klipfolio.com/resources/kpi-examples/ecommerce/cost-per-order)
**** KPI and present a methodology for creating a statistical control chart for tracking the CPO
values.
Before we start explaining the new development, the following implementation notes should be mentioned:
- From the business perspective, the future values of
Cost per order
KPI shouldn’t diverge too much from their historical values, and that’s why monitoring its targets with statistical control limits is feasible. - Multi-series dataset will again be used in the development.
- The development focus is placed on the methodology rather than the datasets, and the underlying data sources won’t be shared.
Development methodology
The picture below presents a 3-step methodology for data modelling and visualization of the control charts in Looker.
![Break down of the development methodology [Image by author]](https://towardsdatascience.com/wp-content/uploads/2022/07/1Sz9ze4-Q8HZldy-11k52cw.png)
In the first step, we will extend the existing LookML view with a new calculation of the CPO
measure. Following this step, we will calculate the CPO
upper and lower control limits. Lastly, we will visualize the insights and present a tutorial for creating the control charts in the Looker.
The first two steps are intended for Looker developers, while the last step is intended for business users.
Step 1/3: LookML modelling: Extend the existing view with CPO calculation and develop a new view for calculating statistical control limits
The tasks of the first step are straightforward:
- Extend the view
[kpi_targets_and_actuals](https://gist.github.com/CassandraOfTroy/5e206447fb715fa189bfb643de2c604a#file-kpi_targets_and_actuals-sql)
withCPO
calculation. - Create a new view
[cpo_targets](https://gist.github.com/CassandraOfTroy/6d375f59431d0a6bd579e00d33c26266)
for calculating the upper and lower statistical control limits.
The formula for calculating the CPO
is a ratio of the actual performance marketing costs and the total number of orders created in the observed timeframe:
measure: cost_per_order {
label: "CPO Actual"
description: "Cost per Order = PM Costs / Total Order Number"
type: number
value_format_name: eur
sql: ${pm_costs_actual}/NULLIF(${total_number_of_orders},0);;
}
After we created a new measure, the development of a derived table view from the existing data model [marketing_control_charts](https://gist.github.com/CassandraOfTroy/494188438a739c098b3f8adefc72a5c6#file-marketing_control_charts-model)
follows.
We will select the needed dimensions and measures from the model and create a derived column with the Bigquery statistical function to calculate the CPO
standard deviation.
The functionality of the newly created view can be summarized as follows:
- We used the STDDEV_SAMP statistical aggregate function, which returns the sample (unbiased) standard deviation of the values.
- In addition to calculating standard deviation, we calculated the
CPO
average, and from these two measures, we calculated the UCL and LCL values. - Measures
cpo_lcl_2sigma
andcpo_ucl_2sigma
were calculated to set early alerts of the possibleCPO
outliers.
Let’s move to the next step to show how to extend the existing data model with the new development.
Step 2/3: LookML modelling: Extend the existing data model with the new view
In this step, we created a new data model version, i.e. [[marketing_control_charts](https://gist.github.com/CassandraOfTroy/494188438a739c098b3f8adefc72a5c6#file-marketing_control_charts-model)_v2](https://gist.github.com/CassandraOfTroy/d52fd4c87b662afe469d3c37cd3bc20b)
, to show the code difference between the old model marketing_control_charts
.
In the new model, we added a join
on the two views:
- the "old" view
kpi_actual_vs_target
– holds theCPO
actual values, and - the "new" view
cpo_target
– contains theCPO's
statistical control limits.
The join between views was created on the series identifiers (shop_brand
and marketing_shop_types)
and not on the date
identifier, as we want to have the flexibility to observe the changes in statistical control limits and actual CPO
values in distinct timeframes.
With this explained, we can move to the last step. 🙂
Step 3/3: Looker: Create a Control Chart
The insights presentation step is finally here. With the previously developed data model, business users can explore it and create a control chart.
Tutorial:
- Select filters:
CPO Target Date
,Order Date Date
,Shop Brand
. - Select dimensions and measures:
Order Date Date
,Shop Brand
,CPO Actual
,CPO Average
,CPO LCL
,CPO UCL
,CPO 2 sigma LCL
,CPO 2 sigma UCL
. - Select the
Table (Report)
visualization type to present the outcomes.
![Looker: Control chart for monitoring CPO values [Image by author]](https://towardsdatascience.com/wp-content/uploads/2022/07/1um39e7WhZcqOIkFCvpKWmg.png)
Functionality:
- with the above-described setup, we can track
CPO
daily for each shop (series) in the dataset, - two dates in the filter section, i.e.
CPO Target Date
andOrder Date Date
, enable us to track the actualCPO
values agnostic of the date of theCPO
targets (statistical control limits). For example, we can compare the actualCPO
values in the current month to the targetCPO
values in the directly previous month or the same month in the last year, - 2 sigma UCL and LCL are calculated and can be used to alert business users early of the possible
CPO
outliers.
Conclusion
The goals of this post can be summarized as follows:
- The development aim was to present the methodology for creating control charts with BigQuery statistical functions and Looker instance.
- The business aim was to provide a simple framework for tracking the KPI scorecard.
Overall, the "big" goal was to create a business value by enabling end-users to identify the KPI outliers early. In addition, the proposed methodology aims to decrease reaction time for taking actions to keep the KPIs within control limits.
With these notes, we are concluding our post. Like always…happy coding! 🙂
References:
[1] Collins Dictionary, accessed: June 11th 2022, https://www.collinsdictionary.com/dictionary/english/control-chart