If you have spent time learning Tableau, you have seen multiple references to the Sample Superstore data set. This data set comes pre-packaged with Tableau, containing information about a fictitious business’s products, sales, and profits. I’m an applied social scientist, so I am not interested in this content area. I remember cringing every time I heard the phrase, "sum of sales."
Perhaps later than I should have, I realized that my aversion to this data set was detrimental to my skill development. Taking the time to understand the Sample Superstore data set resulted in immediate and ongoing learning gains. I have also elevated the importance of the data set in the data visualization courses I teach. As my students are also in the applied social sciences, many of them express the same aversion that I experienced in the early stages of my learning.
This article aims to help new learners think about the value of this data set. I explain why you should learn about and use this data set, even if the content area doesn’t align with your interest.
1. You can better understand Tableau documentation
To be effective with complex software, you need to understand the documentation. I think the documentation for Tableau is excellent. Most examples and explanations are built on the Sample Superstore data set. Thus, if you don’t have an understanding of the data set, you will struggle in your efforts to use the documentation.
2. You have more learning examples from the DataFam
The Tableau community, often called the DataFam, provides learning activities and examples based on the Sample Superstore data set. The rationale is simple: Everybody has access to this data set, allowing learners to reproduce whatever is demonstrated. The data set contains all types of data, so just about any chart can be created.
3. You can stay out of the weeds
A deep understanding of a specific content area allows you to think carefully and critically about the various nuances of data. At the same time, getting into the nuances – you can get into the weeds very quickly and easily. Sometimes the depth of knowledge can interfere with learning a specific technique or procedure because you are focused on the wrong thing.
I try to explain to my students that working with the Sample Superstore data set is like performing core drills when learning a sport. If you are learning a racquet sport like tennis or badminton, you must develop the basic strokes. Specific drills and exercises can help build the basic strokes more quickly than learning through actual gameplay. I think the same applies to building data skills.
4. You can receive technical assistance without sharing sensitive data
My research involves working with sensitive data that I cannot share, which is typical of human subjects research. So, the usual approach to seeking assistance is to create a small fictitious dataset that is modeled after the problem I am trying to solve. Because the Sample Superstore data set contains just about every type of data, I can often model my problem using this data set, saving a lot of time. I don’t have to spend unnecessary time creating a new data set. And the broader DataFam is also familiar with this data set, allowing others to focus their time on solutions rather than understanding a new data set.
5. You can build your substantive thinking outside
Again, I am an applied social scientist and don’t find analyzing the sales of things like tables, chairs, copiers, and envelopes very interesting. And I certainly don’t expect others to be interested in my study area. However, keep in mind that a significant difference exists between things that are interesting and things that are useful. Learning to use data outside my area has proven beneficial in my data consulting work with nonprofit organizations, where I am routinely exposed to many new issues. New learners can use the Sample Superstore data set to expand their substantive thinking.
Limitations and practical next steps
While I am advocating for new learners to spend time working with the Sample Superstore data set, I think highlighting a few limitations is essential for building an overall learning strategy. The Sample Superstore data set is very clean, which is important for its intended purpose. At the same time, this is a limitation since it doesn’t reflect what you are likely to encounter in the real world. New learners who work almost exclusively with the Sample Superstore data set will soon face an unavoidable reality. When working with real-world data, a significant amount of time developing data visualizations requires considerable attention to preparing data. In fact, in most of my projects, this is where most of my resources are spent.
While I am a fan of data visualization, I strongly recommend that new learners actively and continually build Data Preparation skills as part of their overall learning plan. Most time and resources are often devoted to data preparation in data-intensive products. Being involved in data preparation helps promote a deep understanding of your data, which is essential to creating compelling visualizations. New learners of Tableau should spend time familiarizing themselves with a relatively new tool called Tableau Prep, which has become a vital tool for my data science toolbox. New learners can use Tableau Prep to prepare other data sources, enabling them to apply the skills acquired by working with the Sample Superstore data.