The six traps in data analysis and how to escape them

Julia Luz
Towards Data Science
7 min read · Jan 4, 2021
Image: SOURCE

Anyone who works in data analysis knows the importance of following a process. It ensures we do not skip important steps, and it gives us a clear view of the timeline: where we started, where we want to go, and which challenges we found along the way.

A defined process also helps to report the current status of the analysis, and it helps us organize ourselves to deliver a better quality analysis.

There are several references in the literature we can use. The Women In Data page published a pyramid that highlights several interesting points in an analysis process.

There are six steps: Frame the Problem, Collect the raw data needed for your problem, Process the data for analysis, Explore the data, Perform the in-depth analysis, and Communicate results of the analysis.

Image by Author, inspired by source[https://www.linkedin.com/posts/women-in-data_what-is-your-data-science-process-do-you-activity-6707328826029965313-Vz59/]

Each step has its purpose and plays a key role in generating valuable information.

In this article, we’ll cover the steps that data analysis typically follows, as well as the traps that come along and how to escape them…

The trap of each step!

The image below gives an initial idea of the relationship between steps and traps:

Image by Author, inspired by source[https://www.linkedin.com/posts/women-in-data_what-is-your-data-science-process-do-you-activity-6707328826029965313-Vz59/]

As we can see, not all steps contain traps and some have more than one.

Now that you have an overview of the data analysis process, it’s time to go deeper into each step and understand each one!

The Steps

Step 01: Frame the Problem

Here is the key to all the questions that will come after starting the analysis.

When I start an analysis, I like to list two points first, because they make the work easier to understand:

1- What questions do we want to answer?

At the end of our analysis, what questions do we want to have answers to? This directs the next steps very well. Based on this, we will know better which strategy to adopt for our analysis.

At this point it is important to talk and align with those involved: which questions led to this request for analysis, how the demand arose, and so on.

It also makes us ask whether the time we spend will generate value.

2- What are the actionable insights from the conclusion of the analysis?

Depending on the answers to those questions, what do you want to do with that information? Which decisions do you want to make?

Without these answers, you can fall into the trap of “I don’t know which problem to solve” and start analyzing without direction. Also, given the risks of bad communication, the analysis conclusion can differ from what was expected…

These answers will also greatly guide the choice of data, the assumptions that will be used, and the sampling method. They also help us question the problem itself: Are we working on the right problem? Are there other perspectives to consider?

Therefore, spend the necessary time in this step, because the way we identify the problem will determine the context, objective, significance, and scope of the analysis.

The important thing is to keep the answers to these questions in mind. When new doubts arise, you will be able to summarize the objective of the analysis and reinforce the decisions you have taken.

Also, bear in mind that new questions may come up, and that these questions will complement the scope and strengthen the premises and methodologies adopted.

Step 02: Collect the raw data needed for your problem

In this step, we need to collect data from the sources to answer the questions raised.

A good exercise that I do before extracting data or creating a database is to draw the final output I believe will answer these questions, whether on paper or even in Excel. This way, we can spot repeated or unnecessary data that can hinder the analysis, either by slowing down the extraction or by confusing the audience.

This is where the trap “Bring data to solve all problems in the universe” can appear. When we start extracting data, it is common to keep excess data with thoughts like: “What if they ask for that data? Better to leave it here in the database...”. Suddenly we have a lot of data that doesn’t help to answer the question, and it gives us more work, because one thing is a fact for those working with data analysis:

If you are going to work with that data, it is your responsibility to ensure quality.

Why are we going to waste time doing quality analysis of data that won’t help?
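One way to put this into practice is to turn the output you drew into an explicit list of required columns and keep only those. The snippet below is a minimal sketch, assuming pandas; the table, the column names, and the keep_required helper are all hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical raw extract; every column name here is illustrative.
raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_date": ["2020-11-02", "2020-11-05", "2020-12-01"],
    "region": ["South", "North", "South"],
    "revenue": [120.0, 80.5, 42.0],
    "customer_email": ["a@x.com", "b@x.com", "c@x.com"],  # not needed for the question
    "warehouse_code": ["W1", "W2", "W1"],                 # not needed for the question
})

# Columns taken from the final output sketched before extracting.
REQUIRED_COLUMNS = ["order_date", "region", "revenue"]


def keep_required(df, required):
    """Return only the required columns, failing loudly if one is missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")
    return df[required].copy()


analysis_df = keep_required(raw, REQUIRED_COLUMNS)
print(analysis_df.columns.tolist())
```

Everything left out of the list is data you no longer have to validate.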

Step 03: Process the data for analysis

In this step, we will understand the extracted data, check its quality, and clean it up.

It seems like a simple step, but the data is rarely consistent. On the contrary, a lot of data is formatted differently from what you need: fields are null or hold unexpected values, categories differ from what the team thought they would be, and so on.

It is part of our job to raise these errors and, more than that, to share them with those responsible so that they are visible to the company. This makes it easier to prioritize the correction, keeps other areas aware of the problem, and makes the company more confident that it is not using the wrong data.
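A small report can make these checks repeatable and easy to share. This is a minimal sketch, assuming pandas; the data, the expected categories, and the quality_report helper are hypothetical, and a real analysis would choose checks that match its own business rules.

```python
import pandas as pd

# Hypothetical extract with the kinds of problems described above.
df = pd.DataFrame({
    "price": [250000.0, None, 310000.0, -5.0],
    "property_type": ["house", "apartment", "Apartment ", "castle"],
})

# Categories the team expects to see (an assumption for this example).
EXPECTED_TYPES = {"house", "apartment"}


def quality_report(data, category_col, expected, numeric_col):
    """Summarize nulls, unexpected categories, and negative numeric values."""
    # Normalize spacing and case so formatting differences are not
    # reported as new categories.
    normalized = data[category_col].str.strip().str.lower()
    return {
        "null_counts": data.isna().sum().to_dict(),
        "unexpected_categories": sorted(set(normalized.dropna()) - expected),
        "negative_values": int((data[numeric_col] < 0).sum()),
    }


report = quality_report(df, "property_type", EXPECTED_TYPES, "price")
print(report)
```

The resulting dictionary is something you can paste into a message to the team responsible for the data.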

Step 04: Explore the data

With the data extracted and validated it is time to enter the world of exploration. At this stage, we will most likely want to understand the relationship between two variables, analyze their behavior in different data clusters, compare different scenarios, etc.
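As a small illustration of why looking at different clusters matters, the sketch below compares the correlation between two variables overall and within each segment. It assumes pandas, and the data and column names are made up for the example.

```python
import pandas as pd

# Hypothetical data: two segments where the same pair of variables
# moves in opposite directions.
df = pd.DataFrame({
    "segment": ["a", "a", "a", "b", "b", "b"],
    "area_m2": [50, 70, 90, 50, 70, 90],
    "price": [100, 140, 180, 300, 250, 200],
})

# Pearson correlation over the whole dataset.
overall = df["area_m2"].corr(df["price"])

# The same correlation computed separately inside each segment.
per_segment = {
    name: group["area_m2"].corr(group["price"])
    for name, group in df.groupby("segment")
}

print(overall)
print(per_segment)
```

In this made-up data the relationship is positive in one segment and negative in the other, which the overall number alone would hide.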

When you start to share the first insights into your analysis, many other questions arise:

- “Why don’t you analyze this variable too?”

- “Why don’t you simulate a result with this premise?”

- “Why don’t you change the date range?”

And so on. In those moments we can fall into the so-called “Down the Rabbit Hole” trap, named after the novel “Alice in Wonderland”.

In the novel, Alice falls into a rabbit hole that transports her to a fantastic place populated by peculiar creatures, revealing a logic of the absurd. In the end, when she is attacked by the Queen’s soldiers, Alice wakes up and discovers that the whole trip was a dream…

What do I mean by that?

That sometimes we enter a cycle of analysis and more analysis that lasts a long time without our realizing it.

Naturally, curiosities arise with the results, but we must always remember the question:

Will this additional analysis help us answer the main questions?

If so, spend some time on it, but set yourself a time limit so you do not get stuck. If not, write it down on your to-do list.

Another common trap: “Bad Moment”. It’s about suffering from all these questions and starting to doubt the objective of the analysis and whether you are on the right path. This is quite common, and you often end up suffering alone. At these times, it is very important to remain calm and share these pains and insights with the team. They will help you to validate, raise new hypotheses, and reinforce the focus of the analysis.

Take advantage of all available channels of communication for this: meetings, e-mails, etc.

One more trap: “I have no results”.

A data analysis task usually takes a long time, especially when we are dealing with data that has errors or with differing business rules, and we are often hit by the feeling that we are not delivering.

However, it is necessary to remember that identifying errors in the data, or even raising doubts about charts, counts as a delivery as well.

We don’t need to conclude the analysis just to have this feeling of closing.

Step 05: Perform in-depth analysis

It is important to go into detail in the analysis, both to validate the data and to support the results and conclusion. This helps avoid biasing your analysis.
You need to understand which variables reflect different behaviors in the data and analyze the information at that level.
For example, in an analysis of real estate sales, the value of a property can vary a lot by state, city, and neighborhood, so it’s essential to factor each of these into the analysis.
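The real estate example can be sketched as a grouped summary. This is only an illustration, assuming pandas; the place names and prices are made up, and the median is one possible choice of summary among many.

```python
import pandas as pd

# Hypothetical sales; states, cities, and neighborhoods are placeholders.
sales = pd.DataFrame({
    "state": ["X", "X", "X", "X", "Y", "Y"],
    "city": ["X1", "X1", "X2", "X2", "Y1", "Y1"],
    "neighborhood": ["A", "B", "C", "C", "D", "E"],
    "price": [900000, 300000, 400000, 420000, 1200000, 350000],
})

# A single number for the whole dataset hides the variation between places.
overall_median = sales["price"].median()

# The same summary at the level where behavior actually differs.
# The count shows how many sales support each median.
by_level = (
    sales.groupby(["state", "city", "neighborhood"])["price"]
    .agg(["median", "count"])
    .reset_index()
)

print(overall_median)
print(by_level)
```

Keeping the count next to the median is a reminder that a group with very few sales should be read with care.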

Step 06: Communicate results of the analysis

No analysis is complete until the results are communicated, both the answers to the questions and the insights.

This gives you a feeling of closure, and here’s the last trap: “Don’t believe in yourself”. Don’t be ashamed of presenting results. If you have gone a long way in your analysis and discussed the methodologies and strategy adopted, you will certainly be prepared for any questions that may appear.

If there are any questions you cannot answer, it is okay to say that you do not know and that you will look into it. Recognizing that you don’t have all the answers is part of your maturity as a professional.

Conclusion

Everything I’ve said so far was based on my experience, the lessons I learned from each analysis I did, and how I deal with each trap today.

I believe that whatever your job is, it should be encouraging, something that always motivates you, so you must overcome the moments that can hold you back or even make you give up.

There will always be challenges to overcome, and it’s okay! Stop and reflect: How did you overcome them? Next time you will already know how to escape.

Do you have any questions, remember any other trap, or have any suggestions? Post in the comments!

Thanks to Manu and Madu!
