Getting Started

A sensitivity analysis varies a system’s inputs to measure how much each one affects the output. Sensitivity analyses are used across many disciplines, for example in business for financial modeling and in engineering for optimizing the efficiency of a system. Used correctly, a sensitivity analysis can be a powerful tool for revealing insights that would otherwise be missed.
While data scientists are good at modeling and at turning datasets or workflows into actionable information, the sensitivities of basic inputs are often ignored. Conducting a simple sensitivity analysis could add value to a data science project by giving stakeholders more information for making decisions. Sensitivity analyses are not feasible or desirable for every task, but they can serve as an exploratory tool for deriving further insights from multivariate datasets.
In this tutorial, we will go over a simple sensitivity analysis using some real gemstone data. First, we will conduct an exploratory data analysis on the diamonds dataset so that we can better understand the results of the sensitivity analysis which will be discussed in the section after.
Exploring the diamonds dataset
We’ll be using a well-known gemstone dataset that is available in R (through the ggplot2 package) and can also be found on Kaggle.
The dataset contains 53,940 round-cut diamonds and records various attributes for each diamond. We will focus on the following five:
- Price is the price of the diamond in US dollars (USD) and ranges from $326 to $18,823
- Weight is the mass of the diamond, measured in carats (one carat is equal to 0.2 grams) and ranges from 0.2 to 5.01 carats
- Clarity quantifies how clear a diamond is based on the quantity, location, and type of inclusions it contains
- Color measures the degree to which the diamond has a slight tint or is colorless
- Cut refers to the quality of the diamond cut as it has a significant impact on the diamond’s optical properties
For more information on diamond attributes or if you are curious to know about how they are quantified, refer to this link.
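Cut, color, and clarity are ordered grades rather than numbers, so they need a numeric encoding before they can enter a correlation matrix or a sensitivity analysis. The snippet below is a minimal sketch of one possible encoding, assuming a simple integer rank where a higher number means a better grade. The grade orderings follow the dataset's documented scales, but the rank values, the `weight` key, the function names, and the example row are illustrative assumptions rather than the encoding used for the figures in this tutorial.

```python
# Ordered grade scales used in the diamonds dataset, listed from worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]  # D is the most colorless grade
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def ordinal_codes(order):
    """Map each grade to an integer rank, with 1 as the worst grade."""
    return {grade: rank for rank, grade in enumerate(order, start=1)}


def encode_diamond(row):
    """Return a copy of `row` with cut, color and clarity replaced by ranks."""
    encoded = dict(row)
    encoded["cut"] = ordinal_codes(CUT_ORDER)[row["cut"]]
    encoded["color"] = ordinal_codes(COLOR_ORDER)[row["color"]]
    encoded["clarity"] = ordinal_codes(CLARITY_ORDER)[row["clarity"]]
    return encoded


# Hypothetical example row, not taken from the dataset.
example = {"weight": 0.7, "cut": "Ideal", "color": "G", "clarity": "VS2", "price": 2500}
encoded_example = encode_diamond(example)
print(encoded_example)
```

An integer rank treats the gap between neighboring grades as equal, which is a simplification worth keeping in mind when reading the correlations that follow.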
Attribute correlations and apparent relationships
Correlation coefficient matrices are often the first tool used when determining relationships between variables. Below is the correlation coefficient matrix for the five diamond attributes we are considering.

Some correlations are expected, such as the coefficient of 0.92 between price and weight, which makes it clear that weight has the biggest impact on price. However, we also get some unusual trends, such as the small negative correlations of cut, color, and clarity with price. These negative correlations arise because lighter diamonds in the dataset tend to have better cut, color, and clarity than heavier ones. Since weight dominates the price, the matrix misleadingly suggests that improving any of these attributes lowers the price.
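To make this confounding effect concrete, here is a small sketch that computes the Pearson correlation by hand on made-up numbers. The six rows are invented for illustration only and are not taken from the diamonds dataset: the light stones have higher clarity ranks but much lower prices, while within each weight the price rises with clarity.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up (weight, clarity rank, price) rows: light stones are clearer but cheaper.
rows = [
    (0.3, 5, 600), (0.3, 6, 700), (0.3, 7, 800),
    (1.5, 1, 8000), (1.5, 2, 9000), (1.5, 3, 10000),
]

# Correlation over all stones mixes the weight effect into the clarity effect.
overall = pearson([r[1] for r in rows], [r[2] for r in rows])

# Restricting to a single weight removes the weight effect.
light = [r for r in rows if r[0] == 0.3]
within_light = pearson([r[1] for r in light], [r[2] for r in light])

print(f"clarity vs price, all stones:  {overall:+.2f}")
print(f"clarity vs price, 0.3 ct only: {within_light:+.2f}")
```

In this toy data the overall coefficient is negative even though clarity raises the price at a fixed weight, which is the same pattern the correlation matrix shows.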
A closer look at the data to reveal additional insights
We can use a scatter plot and implement some multidimensional visualization techniques to better understand how the diamond attributes are related. Below is a log-log scatter plot showing the relationship between diamond weight, color, and clarity with price.

The figure above shows that, when the other attributes are held constant, the price increases whenever the value of any single attribute increases. Specifically, we observe:
- The positive slope in the data indicates that price increases with weight
- For any single diamond weight, increasing only the color or clarity also increases the price; this is most clearly seen as the marker color and size increase along the x-axis (price) at any constant weight value on the y-axis
Note that to improve the visualization above, the data was sorted by clarity, which controls the marker size, so that smaller dots are plotted on top of bigger dots. Unfortunately, with this much data, the larger markers that would show that lighter diamonds typically have better clarity and color end up hidden behind many smaller dots.
Now that we better understand how the diamond attributes relate to one another, we can conduct our sensitivity analysis.
Sensitivity analyses
There are many types of sensitivity analyses we could do. Here we will present two practical techniques that have a wide range of applications: (1) comparing the effect of each input on the output, and (2) conducting a what-if analysis.
Note that the dominant effect of diamond weight on the price would make the effect of every other attribute look insignificant. To allow a clearer comparison between the attributes, we will only consider diamonds whose weight is within ±10% of the mean weight.
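The weight restriction can be sketched as a simple filter. This is a minimal illustration that assumes each diamond is a dictionary with a `weight` key in carats; the function name, the `tolerance` parameter, and the sample weights are made up for the example.

```python
def within_weight_band(rows, tolerance=0.10):
    """Keep rows whose weight lies within +/- `tolerance` of the mean weight."""
    mean_weight = sum(row["weight"] for row in rows) / len(rows)
    low = mean_weight * (1 - tolerance)
    high = mean_weight * (1 + tolerance)
    return [row for row in rows if low <= row["weight"] <= high]


# Made-up weights in carats, purely for illustration (their mean is 0.8).
sample = [{"weight": w} for w in (0.30, 0.75, 0.80, 0.85, 1.30)]
band = within_weight_band(sample)
print([row["weight"] for row in band])
```

Narrowing the band reduces the influence of weight further but also leaves fewer diamonds to analyze, so the tolerance is a trade-off.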
Assessing input sensitivities on the output
The most common approach is to hold all the attributes at their mean value while varying just one input at a time to assess the effect of that variable alone. More advanced analyses could vary multiple inputs at the same time to study their combined effect.
In this example, we will vary one attribute at a time over multiple steps to assess the overall sensitivity of each variable. Adding half a step means using the midpoint between the mean and the maximum of a single variable, and adding a full step means using the maximum of that variable. Subtracting steps works the same way between the mean and the minimum. The figure below illustrates the stepwise increase and decrease of a variable from its mean to its maximum or minimum.

Note that more or fewer steps could be used depending on how finely or coarsely you want to model the sensitivity of each variable. In this example, we will be using the -1, -1/2, +1/2, and +1 steps as shown above.
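The step definition can be written as a short helper. The sketch below assumes that negative steps mirror the positive ones, so that -1/2 is the midpoint between the mean and the minimum and -1 is the minimum. The function name and the example statistics are hypothetical.

```python
def step_value(step, minimum, mean, maximum):
    """Value of a variable at a fractional step between its mean and an extreme.

    step = +1 gives the maximum, step = -1 the minimum, and step = 0 the mean.
    """
    if not -1 <= step <= 1:
        raise ValueError("step must lie between -1 and +1")
    # Positive steps move toward the maximum, negative steps toward the minimum.
    span = (maximum - mean) if step >= 0 else (mean - minimum)
    return mean + step * span


STEPS = (-1, -0.5, 0.5, 1)

# Hypothetical clarity ranks: minimum 1, mean 4, maximum 8.
levels = {step: step_value(step, minimum=1, mean=4, maximum=8) for step in STEPS}
print(levels)  # {-1: 1, -0.5: 2.5, 0.5: 6.0, 1: 8}
```

For graded attributes such as cut, a half step like 2.5 does not correspond to a real grade, so in practice it would have to be rounded to a neighboring grade or passed to a model that accepts fractional ranks.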
After iterating through all the attributes and varying single variables by increasing and decreasing their value by our predefined steps, we can plot the effect each input had on the output. One good approach to visualize this information would be to use a diverging bar chart as shown in the example below for the diamond attribute effects on the price at each tested step.

From the above figure, we can see that varying clarity has the largest effect, changing the average diamond price over a total range of ~65%. Varying the cut has the smallest influence, changing the average price over a total range of ~17%. These sensitivities give stakeholders actionable information because they show which inputs the output is most sensitive to.
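The one-at-a-time procedure itself can be sketched as a small loop. This example assumes that the price for a given set of inputs comes from some callable `predict`; here it is a toy linear function invented for illustration, not a model fitted to the diamonds data, and the baseline and tested levels are hypothetical as well.

```python
def one_at_a_time(predict, baseline, levels):
    """Percent change in the output when one input at a time leaves its baseline.

    predict  -- callable mapping a dict of inputs to a single number
    baseline -- dict of inputs, e.g. every attribute at its mean
    levels   -- {attribute: {step: value}} giving the values to test
    """
    base_output = predict(baseline)
    changes = {}
    for name, by_step in levels.items():
        changes[name] = {}
        for step, value in by_step.items():
            trial = {**baseline, name: value}  # only `name` is changed
            changes[name][step] = 100 * (predict(trial) - base_output) / base_output
    return changes


def toy_price(inputs):
    """Hypothetical stand-in for a price model; not fitted to any data."""
    return 1000 * inputs["clarity"] + 500 * inputs["cut"]


baseline = {"clarity": 4, "cut": 4}
levels = {
    "clarity": {-1: 1, -0.5: 2.5, 0.5: 6, 1: 8},
    "cut": {-1: 1, -0.5: 2.5, 0.5: 4.5, 1: 5},
}
changes = one_at_a_time(toy_price, baseline, levels)

# Total swing per attribute: highest minus lowest percent change.
swing = {name: max(c.values()) - min(c.values()) for name, c in changes.items()}
print(swing)
```

The per-step values in `changes` are what a diverging bar chart would display, and `swing` corresponds to the total range quoted for each attribute.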
We can also visualize the attribute sensitivity information in a way that allows us to better compare the variables to each other. Below we plotted the contribution to the price change of each attribute for our four tested steps.

The plot above makes it easier to see how the relative effect of each attribute changes as we go from the minimum to the maximum value of each variable. For the tested steps, the contributions of cut and weight stay roughly constant while those of clarity and color vary significantly. In this example, color seems to be the most important attribute near the minimum values, but its contribution decreases until clarity eventually overtakes it as the most impactful variable.
When all attributes are at their maximum, diamond clarity seems to account for over 50% of the price change. Along with the diverging bar chart, this highlights that clarity is the most sensitive input, as it has the largest effect on the output diamond price.
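One way to compute such relative contributions at a single step is to divide the size of each attribute's price change by the sum over all attributes. This normalization is an assumption about how a contribution plot of this kind could be built, and the percent changes below are hypothetical numbers chosen for easy arithmetic, not the results of this analysis.

```python
def contribution_shares(changes_at_step):
    """Share of the summed absolute price change attributed to each input."""
    total = sum(abs(change) for change in changes_at_step.values())
    return {
        name: 100 * abs(change) / total
        for name, change in changes_at_step.items()
    }


# Hypothetical percent changes in price at the +1 step.
at_max = {"clarity": 30.0, "color": 15.0, "weight": 10.0, "cut": 5.0}
step_shares = contribution_shares(at_max)
print(step_shares)
```

Absolute values are used so that the same function also works for the negative steps, where every price change has a minus sign.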
To simplify things a step further and narrow down on the average effect of each attribute on the diamond price, we can use a simple pie chart as shown below. We simply take the average of each individual change in diamond price due to varying each input by the four steps.

Based on the pie chart above we can clearly see that the effect of each input on the diamond price decreases in the order: clarity, color, weight, and cut. We can also report the effect of each attribute relative to one another. These types of figures would be the simplest way to communicate the findings of a sensitivity analysis to executives or stakeholders.
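The averaging behind such a pie chart can be sketched as follows. The sketch assumes that the absolute value of each change is averaged, since otherwise the positive and negative steps would cancel out, and it uses hypothetical percent changes for two attributes only.

```python
def average_effect(changes):
    """Mean absolute percent change per input, normalized to shares of 100."""
    mean_abs = {
        name: sum(abs(c) for c in by_step.values()) / len(by_step)
        for name, by_step in changes.items()
    }
    total = sum(mean_abs.values())
    return {name: 100 * value / total for name, value in mean_abs.items()}


# Hypothetical percent changes at the -1, -1/2, +1/2 and +1 steps.
changes = {
    "clarity": {-1: -30, -0.5: -10, 0.5: 10, 1: 30},
    "cut": {-1: -10, -0.5: -5, 0.5: 2, 1: 3},
}
shares = average_effect(changes)

# Attributes ordered from the largest to the smallest average effect.
ranking = sorted(shares, key=shares.get, reverse=True)
print(ranking, shares)
```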
What-if analysis
A what-if analysis is commonly used to model how changing specific variables affects the outcome. Note that this is more of a forward modeling approach because it is independent of the dataset.
Let’s say 100K diamonds are sold at an average of $5K, resulting in a gross profit of $500M, and we want to determine how the gross profit would change if we varied the number and the average price of the diamonds we sell. This is easily done by creating a table with one variable assigned to the columns and the other to the rows. Each cell is then filled with the product of its column and row header values. Below is the table generated from our synthetic what-if analysis problem.

The table above can allow for more informed decisions to be made in terms of supply and demand, or setting pricing and quantity targets. In this example, if the number of diamonds sold can be increased by an additional 30K, even if the average price goes down to $4,500, the gross profit would rise by an additional $85M. Comparatively, if the average price is increased to $6,500, even if the number of diamonds sold goes down to a total of 90K, the same additional profit would be achieved.
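A what-if table of this kind takes only a few lines to build. In the sketch below, the baseline and the two scenarios discussed above come from the text, while the remaining grid values and the function name are illustrative choices. As in the text, the cell value is simply quantity multiplied by average price.

```python
def what_if_table(quantities, prices):
    """Gross profit for every (quantity, price) pair, as quantity x price."""
    return {qty: {price: qty * price for price in prices} for qty in quantities}


# Small illustrative grid containing the baseline and the two scenarios.
quantities = [90_000, 100_000, 130_000]
prices = [4_500, 5_000, 6_500]
table = what_if_table(quantities, prices)

# Baseline: 100K diamonds at an average price of $5K.
baseline = table[100_000][5_000]

# Print the change relative to the baseline for every cell.
print(f"{'units':>8}" + "".join(f"{p:>14,}" for p in prices))
for qty in quantities:
    cells = "".join(f"{table[qty][p] - baseline:>+14,}" for p in prices)
    print(f"{qty:>8,}" + cells)
```

Printing the difference from the baseline rather than the raw product makes it easier to spot combinations that lead to the same outcome, such as the two $85M scenarios.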
Conclusion
Having conducted the sensitivity analysis on the diamonds dataset, we have obtained various insights that we would not otherwise have, such as:
- Among diamonds of similar weight (within ±10% of the mean weight), clarity and color are the two attributes that influence the price the most, while weight and cut influence it the least
- Using the sensitivity information, the average consumer could make the decision to sacrifice a small amount of clarity and color to buy a much bigger diamond with a significantly higher cut quality at the same price
- Synthetic scenarios can be run by varying certain diamond attributes or supply and pricing parameters to generate the lowest risk business model based on input sensitivities
The sensitivity analysis is a great tool for deriving more insights and knowledge from multivariate datasets.
The sensitivity analysis would best serve as an additional exploratory tool for analyzing data. Rather than simply reporting outputs from a model, data scientists could implement sensitivity analyses to provide their executives or stakeholders with additional actionable information based on the influence of the specific inputs.
While the example presented here used a real gemstone dataset, the same approach can also be used when modeling or forecasting a system with synthetic data.