A Simple Yet Effective 5-Step Framework to Master Data Visualization

Let’s slowly break down the framework using an example and create the visualization. Screenshots included.

Anushiya Thevapalan
Towards Data Science


Image by Freepik on Freepik

Visualizing data has always fascinated me. Presenting my thoughts and findings as visuals and narrating a story with them has always been my strength.

We all know the importance of data visualization in the era of big data.

Your professor loves a well-designed presentation for the classes. Your boss loves when you uncover patterns, trends, and business insights through your visualizations. You have seen those visualizations at the new iPhone launches, haven’t you?

You might be awestruck when people narrate a compelling story through visualizations, but trust me: it’s not rocket science. I can do this consistently because I have developed my own 5-step framework over time, and I rely on it to get it right every time.

The key is to take the framework and practice it repeatedly until creating the “magic” becomes natural for you.

Instead of just putting it out there, I’ll take you through the framework using an example.

I’m picking the US Superstore dataset, which consists of a list of transactions of an e-commerce platform from 2014 to 2018 and is available on Kaggle. Go ahead, download the data (all you need is a free Kaggle account), and follow the example.

1. Be clear on the purpose

This is something many people neglect at the beginning, me included. Before you are clear on what you want to visualize, you start going through the data. You spend hours understanding it, only to realize later that much of the effort had little value.

As a first step, let’s make the purpose clear.

  • What question are you trying to answer?
  • What do you want to convey through your visualization?
  • How will your visualization help the audience?
  • What are you trying to accomplish with the visualization?

These simple questions will help you be productive in the next stages.

Applying this to the example:

For our example, the US Superstore dataset, here are some possible problems you might want to solve:

  1. How do sales vary between states?
  2. How much profit is made from each product?
  3. Which product category is generating high profit?
  4. What percentage of the profit does each product contribute?

These are a few examples of problems, and there can be many more. Moving forward, we’ll use the first problem, “How do sales vary between states?”, as the example.

2. Understand your data

Now that you are clear on the purpose, or the problem you are about to answer, you should understand the data presented to you.

You need to understand the dataset very well. The dataset might have hundreds of columns and might be overwhelming at a glance. But take the time to make yourself familiar with the variables, what each variable represents, and the significance of the variable in the dataset.

Now that you have a clear purpose in your mind (from step 1) and have understood what each variable represents, you will be able to filter out the columns you will need for your visualization.

Understanding the dataset will also clarify whether you can use the data as it is or whether you should perform any modifications.

Applying this to the example:

While inspecting the data, you could ask these questions to understand it better:

  1. What does each variable in the dataset represent?
  2. What are the variables needed to answer the question?
  3. What are the modifications required to solve the problem?
  4. What are the variables you will need to plot?
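If you prefer to inspect the data in code, the questions above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library. The three rows are made up, and the column names (`State`, `Sales`, and so on) are assumptions about how the Kaggle file is laid out, so check them against your own download.

```python
import csv
import io

# Hypothetical three-row excerpt. The column names are assumptions about the
# Kaggle file; the real dataset has many more columns and rows.
SAMPLE_CSV = """Order ID,State,Category,Sales,Profit
A-1,California,Furniture,261.96,41.91
A-2,Texas,Office Supplies,14.62,6.87
A-3,California,Technology,907.15,90.72
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def select_columns(rows, columns):
    """Keep only the columns needed to answer the question."""
    return [{col: row[col] for col in columns} for row in rows]


rows = load_rows(SAMPLE_CSV)

# Questions 1 and 2: which variables exist, and which ones do we need?
print("Columns:", list(rows[0].keys()))
needed = select_columns(rows, ["State", "Sales"])

# Question 3: a required modification. CSV values arrive as text, so the
# sales figures must be converted to numbers before they can be summed.
sales_values = [float(row["Sales"]) for row in needed]
print(needed[0], sales_values)
```

For "How do sales vary between states?", only the state and the sales amount are needed, which is why every other column is dropped here.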

Now that you have a clear purpose and an understanding of the data, let’s move to the next step. Shall we?

3. Define your audience

Understanding your audience is vital because it shapes the kind of visualization you should create.

For example, if your audience is fellow data scientists, you could probably use matplotlib as your visualization tool. But let’s say you are asked to present to a business analyst or a salesperson; creating the visualization with matplotlib would be a blunder. You could probably use MS Excel or more advanced tools like Tableau and Power BI.

If you are pitching your data to your customers, you might want it to be as attractive as possible. In this case, you might not want to use MS Excel, but choose Tableau or Power BI instead.

Knowing your audience affects not only the tools you select but also the titles and captions you use.

Applying this to the example:

A few questions you could ask yourself to understand your audience better:

  1. Who is your target audience? (or for whom are you creating the visualization?)
  2. Is your target audience technical?
  3. What competency do they possess in interpreting data?
  4. In which form do they want the visualization to be? (e.g., Online dashboard, MS Excel sheet, presentation)

In our case, let’s assume salespeople are our target audience. We can choose to create Tableau dashboards to show how sales vary between different US states.

4. Develop your visualization

The first three steps should have given you a clear picture of the visualization you are about to create. Now it’s time to get your hands dirty and develop it.

Type of visualization

Choosing the right type of visualization is essential. If you don’t make the right choice in the type of visualization, all the effort you have put in so far will end up in failure.

I could probably write a whole article on choosing between different visualization types for the problem, but I’ll keep it simple for this article.

  1. Bar charts: You can use bar charts when you want to perform a comparison on your data.
  2. Line charts: These charts can be used to visualize trends in data over time. For example, the price of a product or the daily profit generated over a year.
  3. Pie charts: These are used to show composition, that is, the percentages that make up a whole. For example, the percentage of profit contributed by each product.
  4. Maps: You can choose to use maps to visualize data based on geographical location. This will give the end-user a better understanding of the location.
  5. Gantt charts: These are widely used to visualize a project schedule or activity over time.

The types of visualizations are not limited to these five; I have mentioned only these for simplicity. Once you have chosen the right type of chart, pay attention to the color and scale selection.
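As a quick reference, the five rules of thumb above can be written down as a small lookup. This is only a sketch: the goal labels and the `suggest_chart` helper are hypothetical names made up for illustration, not part of any library.

```python
# Rules of thumb from the list above, keyed by what you want to show.
# The goal labels are made up for this sketch.
CHART_FOR_GOAL = {
    "comparison": "bar chart",
    "trend over time": "line chart",
    "composition": "pie chart",
    "geography": "map",
    "schedule": "Gantt chart",
}


def suggest_chart(goal):
    """Return the suggested chart type for a goal, ignoring case and padding."""
    key = goal.strip().lower()
    if key not in CHART_FOR_GOAL:
        known = ", ".join(CHART_FOR_GOAL)
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {known}")
    return CHART_FOR_GOAL[key]


# Comparing sales across states is a comparison problem.
print(suggest_chart("comparison"))
```

A lookup like this is deliberately crude. It covers the starting point, while the final choice still depends on your audience and the feedback you receive.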

Applying this to the example:

In the example we were looking at, we could create a bar chart similar to the one shown below.

Image created by the Author
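Whatever tool draws the chart, the numbers behind it are the total sales per state. The sketch below shows that aggregation with Python's standard library and prints a rough text version of the bar chart. The transactions are made-up values rather than figures from the dataset, and `total_sales_by_state` and `text_bar_chart` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical (state, sales) pairs standing in for the real transactions.
transactions = [
    ("California", 261.96),
    ("Texas", 14.62),
    ("California", 907.15),
    ("New York", 120.50),
    ("Texas", 85.38),
]


def total_sales_by_state(records):
    """Sum sales per state and return (state, total) pairs, largest first."""
    totals = defaultdict(float)
    for state, sales in records:
        totals[state] += sales
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def text_bar_chart(totals, width=40):
    """Render one line per state, scaling bars to the largest total.

    Assumes `totals` is non-empty and sorted largest first.
    """
    largest = totals[0][1]
    lines = []
    for state, value in totals:
        bar = "#" * round(value / largest * width)
        lines.append(f"{state:<12}{bar} {value:,.2f}")
    return lines


totals = total_sales_by_state(transactions)
for line in text_bar_chart(totals):
    print(line)
```

Sorting the states from largest to smallest makes the comparison easier to read, which is the main job of a bar chart.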

5. Test and Improve

In this stage, you test your implementation to make improvements. It’s very common to make assumptions while developing. To reduce this bias, you should always seek feedback from your colleagues.

Applying this to the example:

In our example, we could turn the bar chart created in the previous step into a map. (I intentionally created the bar chart in the previous step to show the kind of enhancement that feedback can lead to.)

Image created by the Author

Let’s recap to remember it forever

The simple five-step guide can save you hours on creating your visualizations.

  1. Be clear on the purpose: Be clear about the problem you are about to solve.
  2. Understand your data: Inspect the data and have a clear picture of what each variable represents. Filter out the variables depending on the problem.
  3. Define your audience: Knowing your audience will help you decide which tools and phrasing to use in the development stage.
  4. Develop your visualization: Now you know exactly what to create. Choose the right type of visualization for the dataset and start creating.
  5. Test and improve: Get feedback on the visualizations developed and improve on them. Finally, share it with the stakeholders.

So there you have it. It’s not magic. It’s simple. You’ll know how effective it is when you put this into practice. All the best!

Thank you so much for reading this far. I hope you enjoyed reading and this article has added some value to you. I’d love to hear your feedback on how I can improve. Looking forward to seeing your stellar visualizations!
