Office Hours

What helped me become a better Data Analyst

Wenling Yao

Published in

Towards Data Science

11 min read · Apr 17, 2021

My blog journey continues!

Since I will start my MBA journey this August, I will be away from the workplace for 10 months. I definitely want to continue working in a data-related role afterward, so I’d like to write a summary of what I found really helpful during my first years as a Data Analyst, also for my own record. There are a lot of useful resources on the Internet from which I have learned a lot, so I will try my best to keep this post as unique and personal as possible to reduce overlap.

Before I dive into those tips, let me share my definition of a good Data Analyst, i.e., where I wanted to be and what I have been working towards over the last couple of years. Note that this might be very different from what a specific company, or the market in general, expects from a Data Analyst. 😃

Back at the end of 2019, I wrote the following New Year’s resolution on LinkedIn, with an amazing view from Cabo da Roca (https://www.linkedin.com/posts/wenling-y-25146887_wordoftheyear-2020resolution-activity-6618244271386447873-bz3D), where I already had a rough idea of what, in my view, makes a good data person. Those points are still valid: helping stakeholders ask the right questions, telling a good data story, and learning fast.

In my first blog post, I shared some good practices on data storytelling. Here I would like to expand on some other tips. They are:

🧭 Find the “compass” KPIs and use them as benchmarks for quick sanity checks when you generate new insights. This allows for quick learning in a new context.

🔢 Back yourself up with data whenever possible. I mean … there is the word “data” in our titles, right? 😉

⛳️ Start with the goals. Always. This helps you identify your real value as a data person beyond “writing queries, creating dashboards, and sending spreadsheets”.

🙌 Establish your accountability: position yourself as an equal, competent, and trustworthy business partner, and grow the same accountability in your coworkers.

Find your own compass(es) in the ocean of data.

During my five years with my current employer, my team and I have created and shared tons of dashboards and reports, built a company-wide OKR system, and created a payback calculator for our board. I can proudly say that I have become one of the people most familiar with our data. Nevertheless, I still feel every day that I am learning something new about our business, because we can always find a new lens through which to look at our data.

Navigating this complexity while maintaining our quality standards, i.e., adapting to a new context and generating insights for our stakeholders quickly and accurately, is definitely a challenge. One thing that has helped me a lot is to keep 5–7 high-level KPIs in mind and always use them as benchmarks for quick sanity checks when I generate new insights.

How do I decide which KPIs to remember? The AARRR framework provides a good direction:

Acquisition: what is the size of our total customer base? How many new customers are acquired each month?
Activation: what is the conversion rate of our signup funnel? How many customers sign in at least once per month?
Retention: what is the Day1 retention rate of our app (%)?
Referral: how many customers (%) are acquired through our referral program? How many customers (%) recommend their families & friends to use our product?
Revenue: how many customers (%) have generated revenue for our business? What is the average of our monthly revenue in the last 12 months?

Note that we don’t have to remember the exact figures, only their orders of magnitude. Depending on the size of your business, for an absolute number we may only need to recall whether it is 10K, 50K, 100K, or 1M, and for a percentage whether it is 10%, 50%, or 90%. Does this sound familiar to you? Yes, this is one application of the Fermi estimate!
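A minimal sketch of this rounding in Python may make the idea concrete. The function name `compass_value`, the 1-2-5 rounding grid, and the example figures are all assumptions made up for illustration, not numbers from any real business.

```python
import math


def compass_value(x: float) -> float:
    """Round a positive figure to the nearest 1, 2, or 5 times a power of ten.

    This is only one possible way to get a memorable "order of magnitude"
    anchor; the 1-2-5 grid is an assumption, not a rule from the post.
    """
    if x <= 0:
        raise ValueError("expected a positive figure")
    # Power of ten just below x, e.g. 10_000 for 48_300.
    base = 10 ** math.floor(math.log10(x))
    mantissa = x / base  # value between 1 and 10
    # Snap the mantissa to the closest easy-to-remember step.
    nearest = min((1, 2, 5, 10), key=lambda step: abs(step - mantissa))
    return nearest * base


# Hypothetical KPI values, purely illustrative.
kpis = {"total_customers": 1_120_000, "new_customers_per_month": 48_300}
compass = {name: compass_value(value) for name, value in kpis.items()}
```

With these made-up inputs, the total customer base is remembered as roughly 1M and monthly new customers as roughly 50K, which is all the precision a quick sanity check needs.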

The last reminder: in a rapidly growing business, these numbers would need to be refreshed regularly. :)

Why is this helpful? Imagine now I am working on an urgent ad-hoc request. I wrote a query to look at a customer segment I have never looked at before. The query output suggests that 70% of this segment generated revenue. Hold on — I remember that this ratio should be roughly 15% across the whole customer base. 70% vs 15%… that sounds too good to be true! Immediately I suspect that something is off with my query and I quickly figure out that I missed a WHERE clause. Phew! That 15% in my mind helped me with a quick sanity check and saved us from a potentially inaccurate deliverable.
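The check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `within_benchmark` and the threshold of a factor of 3 are hypothetical choices, and a flagged figure is not necessarily wrong, it only deserves a second look at the query.

```python
def within_benchmark(observed: float, benchmark: float, max_ratio: float = 3.0) -> bool:
    """Return True if `observed` is within a factor of `max_ratio` of `benchmark`.

    The default factor of 3 is an arbitrary assumption; pick whatever
    tolerance fits how much segments normally differ in your business.
    """
    if observed <= 0 or benchmark <= 0:
        raise ValueError("both figures must be positive")
    ratio = max(observed, benchmark) / min(observed, benchmark)
    return ratio <= max_ratio


# The example from the text: 70% for the new segment vs. roughly 15% overall.
needs_second_look = not within_benchmark(observed=0.70, benchmark=0.15)
```

Here 70% is more than four times the 15% benchmark, so `needs_second_look` ends up `True` and the query gets re-read before anything is sent out.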

This is why I call these figures “compass(es)”: I may not keep the exact coordinates in mind while traveling over the sea, but as long as I know where south is, I can always quickly find my way out.

Back yourself up with data whenever possible.

There is the word “Data” in my title, and I know what it makes others expect from me. Even when I am not delivering any data or insights, I always make sure my arguments are backed up by data whenever possible. The word “data” here does not necessarily mean that you have to know the exact figures or mathematical formulas. It is more about specifying what data you would look at and how you would use it to support your arguments and make better decisions.

I want to use one interview question as an example to explain how this works. During one interview I was asked how I would help assess the effectiveness of a catalog campaign. I started by defining the success metric as the conversion rate of the campaign, i.e., out of those who receive the catalogs, how many make purchases afterward.

Then the interviewer asked me, “Sounds great. How would you determine which purchases are triggered by the catalogs?” This is essentially a question about tracking and collecting data. I proposed that we embed a promotional code in the catalog for users to enter during checkout and use this code to identify purchases from catalog recipients.

The interviewer nodded and followed up with another question, “What if some users don’t use the promotional code?” I answered, “We can apply an attribution window, that is, we attribute all purchases made by the recipients within X days after the catalogs are sent to that campaign.” This is where human logic kicks in to cover the limitations of data tracking.

The interviewer agreed to that proposal and asked the final question, “How would you determine X?” I said, “I would look at historical data from similar campaigns for a benchmark. If I see that 90% of purchases from these recipients were made within 14 days after the catalogs were sent, then I would take 14 days as the attribution window, because with that I can cover at least 90% of conversions. If no historical data is available, I would design a small-scale experiment to collect that data.”
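The last step, choosing X from historical data, can be sketched as follows. This is a minimal illustration, assuming we already have, for each historical purchase by a catalog recipient, the number of days between the catalog being sent and the purchase. The function name `attribution_window` and the sample data are hypothetical.

```python
def attribution_window(days_to_purchase: list[int], coverage: float = 0.9) -> int:
    """Smallest X such that at least `coverage` of purchases fall within X days.

    `days_to_purchase` holds, per historical purchase, the days elapsed
    between sending the catalog and the purchase (an assumed input format).
    """
    if not days_to_purchase:
        raise ValueError("need at least one historical purchase")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(days_to_purchase)
    total = len(ordered)
    # Walk through purchases from fastest to slowest until enough are covered.
    for count, days in enumerate(ordered, start=1):
        if count / total >= coverage:
            return days
    return ordered[-1]  # not reached for valid coverage; kept for completeness


# Made-up history from a "similar campaign": ten purchases, one late straggler.
history = [1, 2, 2, 3, 5, 7, 9, 12, 14, 30]
window = attribution_window(history, coverage=0.9)
```

With this invented history, nine of the ten purchases happen within 14 days, so the sketch picks a 14-day window and leaves out the 30-day straggler, mirroring the reasoning in the interview answer.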

I really enjoyed this interview question, not because I scored all the points, but because it flowed as naturally as a conversation with my coworkers in daily life and it covered a simple flow of using data to support decision making. No complex calculation is required at all. :)

Start with the goals. Always.

When I started as a junior data analyst, I saw my top priority as delivering exactly what people asked of me. If someone came to me and asked for a monthly sales report, I made sure I delivered the accurate figures: not monthly customer figures, not daily sales trends, but exactly the amount of sales we closed in each month.

It did not take long before I got the impression that this may not be the best way of working. Sometimes people ask for a monthly sales report, I deliver it, and both of us are happy. However, the next day they come to me again and ask, “Hey, actually I need this report to design a new bonus scheme for our sales agents. Can we break it down by agent name?” For sure! But wouldn’t it be even better if I understood my colleague’s goal beforehand, so that together we could come up with a better solution on our first try?

Sometimes I also get requests that do not fully make sense in their original form. For this, I can share an even more concrete example. In our current setup, my team is responsible for helping Product Managers set up success metrics and data tracking for their features. One day a PM came to me and asked me to implement a backend event that is triggered when a certain document is SENT to a user. The first thing I did together with him was to check which KPIs he was looking to measure with the tracking. It turned out that he wanted to see, out of those users who SEE the document in their accounts, how many would take a certain action. I then explained to him that we already have a frontend event in place that tracks users’ views of that document, and that the new backend event he suggested is not necessary and would not help measure the KPI he wants to look at (as it tracks the SEND action rather than the VIEW action). My colleague eventually took my suggestion. By doing so, I did not deliver the thing that my stakeholder asked me to do, but did I deliver any value? I think yes! I helped my colleague clear up a misunderstanding around the KPI definition and saved us one task that would have cost engineering capacity. We have really severe resource constraints in this area, just like every startup :)

My takeaway from this example is that as a data person (be it data analyst, data scientist, data product manager… all those data-related non-engineering roles seem to have increasingly blurry boundaries these days), sometimes you create value for your coworkers and your business not by executing exactly their wishes but by helping them achieve their ultimate goals. This starts by figuring out together with them what their goals really are, instead of taking their requirements as they are first presented. Actually, I will quote one of my supervisors: “The difference between junior data analysts and senior data analysts is that juniors always do what they are told to do, yet seniors don’t.”

One last bonus point on this topic: understanding the goal of your stakeholders, that is, how the data or insights you deliver will help them make their decisions and what kind of decisions they are, definitely helps you understand your real value as a data person and boosts your confidence. Your real value is not demonstrated by how many dashboards or analyses you have delivered, but by the impact on business outcomes and user experiences that is inspired and driven by your deliverables.

Establish your accountability as a trusted business partner.

We’ve seen and heard the term “stakeholder management” in many places, so many that, to be honest, I am starting to get a bit fed up with it. 😅 There is a lot of great content on the internet about this area, so here I just want to share two things that I found really useful in earning the trust of my stakeholders as a data person. In my first blog post, How I do Data Storytelling, I mentioned the tip “Know your data through and through” as one way to maintain people’s trust in the data. The two things I share here are a little more generic and can be transferred to other professions as well.

One thing is to keep your main stakeholders well informed about the major milestones and blockers while working on rather big analyses or projects. “Big analyses” refers to the type of analyses that usually have: 1) explorative and open-ended research questions, 2) high uncertainty about the quality of source data and thus of the deliverables (remember: garbage in, garbage out 😉), and 3) a rather long timeline for delivery (e.g., >1 week).

What I usually do in this case is schedule 2–3 touchpoints along the timeline with my stakeholders, where I give a brief description of where we stand, what we have achieved, and what is blocking us. It does not have to be a meeting; a short email can do the job as well. The goal of these touchpoints is not simply to let people know “hey, we are doing our job”, but to seek instant feedback on our methodology and delivery, as well as to seek support when we are stuck. By doing so, both my stakeholders and I gain more transparency into the delivery, more flexibility to adapt the scope, more valuable feedback for improvements, and more support if we just ask for it! In the end, together we will make better-informed decisions.

Another thing that I found valuable is to always close the loop. Every day we deliver new insights and data, lively exchanges take place, and new ideas and feedback land on our desks. In a rapidly changing workplace, it is very easy to lose track and end up with an unresolved backlog, which leaves me feeling unaccomplished. Hence, most of the time I make sure I get feedback or acknowledgment for my deliverables (at least for those with high-level goals behind them). If I deliver any insights or data, rather than just sending over an Excel sheet or PDF, I always invite feedback (e.g., “let us know if you still have any questions”), and I do not consider a task resolved until I get a confirming “yes, we don’t have any questions for now”. This helps me keep a maintainable backlog, as I know there is hardly any debt, just more new ideas to be explored. Eventually, this contributes to the efficiency and reliability of my way of working, as I don’t have to worry that one day some unresolved task will blow up like a mine and block other deliveries.

In fact, this is not only about establishing my own image as an equal, competent, and trustworthy business partner but also about growing the same accountability in my coworkers. This process is definitely not easy, but I believe that in the end it helps foster strong trust and a strong bond between me and my coworkers (at least those who share the same goal).

That’s it!

As always, let me know if you find this useful or share any feedback!