Office Hours

I’m going to explain how you can add value as a data analyst to your current or future employer without actually ‘analyzing’ data!
‘Boom!’ – Your Mind, Because It’s Just Been Blown
What’s This Crazy Talk?!?
Acquire, prep, and explore are three of the earlier stages of the data pipeline. ‘Ok, but how does that help me?’, you might ask. You’re an analyst, not an engineer. All you need is data in a usable form and you can throw all sorts of tables together and make some sweet viz’s. Right?
‘Weeeelllllll…’ – Me, about to break some bad news to you
What Is Usable Data?
During my time at a super great data science bootcamp, Codeup, I was taught a very valuable mantra to live by: ‘It depends on the application.’ Usable data depends on how it will be used. Anyone can ‘SELECT *’ from any table they have access to, or take an Excel file a colleague uses, and start making all sorts of bar and line graphs. Doing that without checking the data first can set off a lot of unnecessary fire drills.
This next part focuses on exploring your data and recognizing when something seems awry. This is where you can show that you belong in the world of making good use of data. All the visualizations and in-depth analysis in the world mean nothing if the data you’re working with has major issues. Analytics leaders would agree.
Side note: This is the not-so-fun part of data science that you hear about. It is also a very important part, and one where you should focus your attention.
Let’s Make This Not So Abstract
These next few paragraphs might get a little boring, but they’re very much needed for context.
I’m going to show you a couple of examples from the ‘Hotel Booking Demand’ data set on Kaggle. This data set consists of rows/observations from two hotels, a city hotel and a resort hotel, and each row represents a reservation. Example columns include the guest’s arrival/check-in date, the number of guests (adults, children, babies), the ADR (average daily rate), and which of the two hotels the reservation is for.
For this, I’m going to focus on a boolean column indicating whether or not the guest has previously stayed at the hotel (0 is a new guest, 1 is a repeat guest). Two other columns give the number of previous reservations the guest cancelled and the number the guest did not cancel.
Logic tells us that a new guest would not have stayed at the hotel before, and that a repeat guest would have previous visits. Yet we see rows where this isn’t the case.
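A check like this is easy to automate. Below is a minimal Python sketch, assuming the Kaggle CSV names these columns `is_repeated_guest` and `previous_bookings_not_canceled`, and that rows arrive as dictionaries of strings (the way `csv.DictReader` returns them). The function name and the sample rows are hypothetical, made up for illustration.

```python
def flag_inconsistent_guests(rows):
    """Return reservations whose repeat-guest flag contradicts their history.

    A new guest (is_repeated_guest == 0) should have no earlier non-cancelled
    bookings; a repeat guest (is_repeated_guest == 1) should have at least one.
    """
    flagged = []
    for row in rows:
        is_repeat = int(row["is_repeated_guest"])
        earlier_stays = int(row["previous_bookings_not_canceled"])
        new_with_history = is_repeat == 0 and earlier_stays > 0
        repeat_without_history = is_repeat == 1 and earlier_stays == 0
        if new_with_history or repeat_without_history:
            flagged.append(row)
    return flagged


# Hypothetical rows, shaped the way csv.DictReader would return them (all strings).
sample = [
    {"is_repeated_guest": "0", "previous_bookings_not_canceled": "0"},  # consistent
    {"is_repeated_guest": "0", "previous_bookings_not_canceled": "3"},  # new guest, past stays
    {"is_repeated_guest": "1", "previous_bookings_not_canceled": "0"},  # repeat guest, no past stays
    {"is_repeated_guest": "1", "previous_bookings_not_canceled": "2"},  # consistent
]
flagged = flag_inconsistent_guests(sample)
print(len(flagged))  # 2
```

Only non-cancelled bookings count as a previous stay here, so a new guest whose only history is cancellations is not flagged. That is a judgment call you may want to revisit with whoever owns the data.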


In the real world, you could probably go talk to an engineer or whoever supplied the data to check the logic in the code that produced it. In this example, you can’t.
‘Garbage in, garbage out.’ – A whole heap of data folk
A big part of being an analyst is making sure that the data you use has passed sanity checks. The tricky part here is that there might be a bajillion different ways to look at the data for business purposes. You and your crew most likely won’t even be able to think of every possible item to check when building tables. Things become more obvious once you start using the data with a purpose in mind.
Now, We Do Focus On Visualizing Data
This next part focuses on what stakeholders would find important in this type of data set. It also takes some creativity: you need to be able to transform the data into a data set that answers your particular question.
We have two hotels and we can easily see how many check-ins each hotel has each month. The data set consists of just over two years of observations.
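Counting check-ins per hotel per month is a simple group-and-count. This sketch assumes the Kaggle columns `hotel`, `arrival_date_year`, `arrival_date_month` and `is_canceled`, and treats a cancelled reservation as one that never checked in (an assumption, not something the data set guarantees). The function name and sample rows are hypothetical.

```python
from collections import Counter


def monthly_check_ins(rows):
    """Count check-ins per (hotel, year, month name)."""
    counts = Counter()
    for row in rows:
        if int(row["is_canceled"]) == 1:
            continue  # assumption: a cancelled reservation never checked in
        key = (row["hotel"], int(row["arrival_date_year"]), row["arrival_date_month"])
        counts[key] += 1
    return counts


# Hypothetical rows in the shape of the Kaggle CSV (values as strings).
sample = [
    {"hotel": "City Hotel", "arrival_date_year": "2016", "arrival_date_month": "July", "is_canceled": "0"},
    {"hotel": "City Hotel", "arrival_date_year": "2016", "arrival_date_month": "July", "is_canceled": "0"},
    {"hotel": "City Hotel", "arrival_date_year": "2016", "arrival_date_month": "July", "is_canceled": "1"},
    {"hotel": "Resort Hotel", "arrival_date_year": "2016", "arrival_date_month": "July", "is_canceled": "0"},
]
counts = monthly_check_ins(sample)
for (hotel, year, month), n in sorted(counts.items()):
    print(hotel, year, month, n)
# City Hotel 2016 July 2
# Resort Hotel 2016 July 1
```

Sorting by month name only works here because every sample row falls in July; for real data you would map month names to numbers before sorting so the months come out in calendar order.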
If you’re a regional manager, how focused are you going to be on the number of check-ins of two different hotels?
Why I wouldn’t care as much about the number of check-ins:
- The two hotels most likely have a different number of rooms available.
- If one hotel is more transient (shorter stays) and the other has longer stays, check-in counts will favor the transient hotel, because its rooms turn over more often.
Visualizing check-ins isn’t an apples-to-apples comparison.
I would imagine that someone in this role would be more concerned with the occupancy rate each hotel is running: the number of rooms occupied divided by the number of rooms available, expressed as a percentage. Let’s look at the difference between check-ins and occupancy rate.
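Getting from reservations to an occupancy rate takes one more step: spreading each stay across the nights it covers. Here is a minimal sketch, assuming each non-cancelled reservation holds exactly one room, that stay length is `stays_in_weekend_nights + stays_in_week_nights`, and that the room count per hotel is known. The `ROOM_COUNTS` values, the `booking` helper and the sample reservations are hypothetical; the data set does not appear to include room capacity, so real numbers would have to be estimated.

```python
import calendar
from collections import Counter
from datetime import date, timedelta

# arrival_date_month is assumed to hold English month names such as "June".
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
MONTH_NUMBER = {name: i + 1 for i, name in enumerate(MONTHS)}

# Hypothetical room counts, for illustration only.
ROOM_COUNTS = {"City Hotel": 4, "Resort Hotel": 2}


def occupied_room_nights(rows):
    """Count occupied room-nights per (hotel, year, month).

    Assumes one room per non-cancelled reservation. A stay that crosses a
    month boundary is split across the months its nights fall in.
    """
    nights_by_month = Counter()
    for row in rows:
        if int(row["is_canceled"]) == 1:
            continue  # assumption: cancelled reservations occupied no rooms
        arrival = date(int(row["arrival_date_year"]),
                       MONTH_NUMBER[row["arrival_date_month"]],
                       int(row["arrival_date_day_of_month"]))
        nights = int(row["stays_in_weekend_nights"]) + int(row["stays_in_week_nights"])
        for offset in range(nights):
            night = arrival + timedelta(days=offset)
            nights_by_month[(row["hotel"], night.year, night.month)] += 1
    return nights_by_month


def occupancy_rates(rows, room_counts):
    """Occupancy rate = occupied room-nights / (rooms * days in the month)."""
    rates = {}
    for (hotel, year, month), occupied in occupied_room_nights(rows).items():
        days_in_month = calendar.monthrange(year, month)[1]
        rates[(hotel, year, month)] = occupied / (room_counts[hotel] * days_in_month)
    return rates


def booking(hotel, day, week_nights, weekend_nights=0):
    """Hypothetical helper: build a June 2016 row shaped like the Kaggle CSV."""
    return {"hotel": hotel, "is_canceled": "0",
            "arrival_date_year": "2016", "arrival_date_month": "June",
            "arrival_date_day_of_month": str(day),
            "stays_in_week_nights": str(week_nights),
            "stays_in_weekend_nights": str(weekend_nights)}


sample = [
    booking("City Hotel", 6, 1),        # three short, one-night stays
    booking("City Hotel", 7, 1),
    booking("City Hotel", 8, 1),
    booking("Resort Hotel", 1, 10, 4),  # one 14-night stay
]
rates = occupancy_rates(sample, ROOM_COUNTS)
for (hotel, year, month), rate in sorted(rates.items()):
    print(f"{hotel} {year}-{month:02d}: {rate:.1%}")
# City Hotel 2016-06: 2.5%
# Resort Hotel 2016-06: 23.3%
```

With these made-up numbers the city hotel has three check-ins to the resort’s one, yet the resort’s occupancy rate is far higher. Dividing by rooms times days is what puts two differently sized hotels on the same scale.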


The two hotels look very different in these images. The city hotel has more estimated rooms and guests with shorter reservations, so it dwarfs the resort hotel in the first image, which shows check-ins.
Take a look at the Kaggle notebook to get more details about the data cleaning performed and transformations made to get to a data set that allowed for these visualizations.
To further get an idea of the Data Analytics world, this read focuses on more than just the standard tools you might be expected to know to be a great data analyst. Learn about some of the lesser discussed characteristics that will help you in the long run. As always, keep on learnin’!