
If you’re a data analyst and you’re currently searching for a job, you’ll likely need to prepare for the technical interview, most commonly carried out in the form of a Sql challenge. Structured Query Language (SQL) is the bread and butter of data analytics. It’s the language that communicates with the database and helps retrieve, clean and analyze information. As a result, it’s only natural for a hiring committee to want to know the candidate’s level of comfort using SQL to query a database.
In this post, I won’t be sharing the what of the SQL challenge (what do I need to know? What questions should I prepare for? What resources can help me prepare?) but rather the how of it: How should I be thinking about the problem? How to break it down? How can I communicate potential challenges to the interviewer? Sometimes, having a framework of reference to tackle difficult situations beats the sheer repetition of going through the difficult situation itself. This post is about giving you that framework of reference.
Ok, enough talking, let’s get into it! 😄
Where do I start?
Before jumping into the problem, ask yourself the following questions:

Can I trust the data?
Often, datasets won’t come in a neat little package that’s ready for analysis – you’ll see email addresses recorded twice, a lot of blank values, spelling mistakes, etc. Express skepticism about the data. Are there any duplicate values? Is there missing information? Do I need to exclude outliers from the final output? Should I perform transformation on the data (e.g. remove dashes in a series of digits)? While it’s not your job to clean the data during the interview, you want to make sure that you have all the pieces of the puzzle to solve the question you’re being asked. Don’t assume that the data you’re working with is pristine. In real life, it never is.
--identifying duplicate values
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
--identifying blank values
SELECT column_name
FROM table_name
WHERE column_name IS NULL
/*transforming a column that contains
SSN information (can be both a series
of digits only or a series of digits
and dashes)*/
SELECT
ssn,
REPLACE(ssn, '-', '') AS transformed_ssn
FROM table_name
What assumptions should I make?
While data wrangling is an important step, you likely have between 30 to 60 minutes to solve the challenge. Given the time constraint, it’s important to make assumptions and move on. Be sure to explicitly convey the assumptions you’ll be making. If you’re not sure you’re making the right assumptions, it’s ok to ask the interviewer whether you’re on the right track. You don’t want to go too far into your analysis only to realize halfway through that you didn’t make the right assumptions (or not realize it and have an output that is far from the expected result).
What is the problem to solve?
That’s your Big Picture. Think about the problem you’re trying to solve for. While the SQL challenge may ask you to do x or y, what is it that you’re actually being asked? Again, make sure to restate the problem to the interviewer. This helps for two reasons: (1) you demonstrate your communication skills; and (2) you ensure that you and the interviewer are on the same page.
How to break down the question?
This is the essence of the interview (and probably why you’re still reading this article!). You’ll spend about 2/3 of your time here, setting up your logic, analyzing the data and testing it to make sure it runs and returns the expected output. A word of caution: don’t press the "run" button every 5 seconds, hoping that the output will help you solve the challenge. Really sit down with the problem and be your own query interpreter, running line by line to understand what happens at each step.
This step really depends on the question you’re being asked but you can use the following framework to help you through the SQL challenge: Isolate, Rank, Build.

Isolate
The first step is to isolate the different pieces required to solve the puzzle. A difficult SQL challenge may seem daunting. That’s why it’s important to break it down into smaller units that are easier to solve on their own. For that, you’ll need to take a look at the question and determine what those units are.
Rank
The next part is to identify how you’ll prioritize each unit to solve. After isolating each one of them into manageable bite-pieces, which one should you tackle first? And why? A tip to keep in mind: start general, end specific. Get the data you need first and do the aggregations last.
Build
The final step is to build the logic for each of the units you’ve identified. At this point, you’ve written pseudo-code, maybe some SQL to explore the data and understand how you’ll solve the problem. Now is the time to write the building blocks of your query. That’s where all your SQL knowledge comes in handy: CASE statements, JOIN types, window functions, common table expressions (CTEs), and subqueries are good tools to have in your back pocket.
I’m stuck – how do I continue?
There’s a chance you might not know how to continue. Been there, done that. First of all, relax. Take a deep breath. A mind that spirals out of control doesn’t do much good when faced with a complex problem. Second of all, remember that the interview is a conversation. The interviewer wants to see how you react when a difficult situation is thrown at you. Will you persist or give up? Will you seek help or try to solve yourself before asking for guidance? What is your thought process?
If you’re stuck, here are some tips that can help you:

Let the interviewer know
Make sure to let the interviewer know you’re stuck. Tell them where and why. It’ll help for a few reasons: (1) You might end up answering your own question – communicating challenges can help you identify an important piece of information you might have initially missed but that can help you resolve the issue you were dealing with (2) You’re not wasting time staring blankly at the screen or the whiteboard (pre-pandemic times!) not knowing how to proceed (3) You’re showing humility and a will to collaborate, which are key qualities in any role you’ll take on.
Ask questions
No data analyst comes into a role with all the answers and know-how required for the job. A huge component of a DA’s success is the ability to ask good questions. In your interview, you want to demonstrate curiosity and showcase your problem-solving skills by asking clarifying questions. You won’t be penalized for it. However, be strategic about the questions you ask. If you’re stuck because you don’t know the basic structure of a SELECT statement, you probably should review your fundamentals before even taking the interview.
Talk out loud
One way to work through a challenging problem is to talk out loud. You can use a process of elimination to figure out how to continue the challenge. "Should I do a self-JOIN on the table? Probably not, I don’t need to get the same information from the same table in different ways. What about using a CTE to get the information I need and join outside of it? Possible, but I don’t account for the two kinds of aggregations I’m being asked to do." Etc. It’ll help you narrow down solutions while allowing the interviewer to see how you think and what hint to give if you are still stuck.
Conclusion
A framework is a good way to structure your thoughts and guide you through a complex SQL challenge. To see how other analysts tackle the SQL question, you can check out Tina Huang’s SQL Sundays playlist. In 5- to 10-minute videos, she gives a step-by-step demo on how to solve the SQL challenge.
While using a framework helps, you’ll still need a good amount of practice to ace the Technical Interview. Here are fantastic resources to prepare for the tech interview: "How to Ace Data Science Interviews: SQL", "SQL Interview Questions you Must Prepare", "Cracking the SQL Interview." You can practice SQL challenges using platforms like Leetcode, InterviewQuery, Codewars, Stratascratch, etc.
In my next post, I’ll do a step-by-step walkthrough of a SQL challenge – stay tuned!
This post was last edited on October 8, 2021. The opinions expressed here belong solely to myself, and do not reflect the views of my employer.