My Top 3 SQL Interview Questions

Shirish Joshi
Towards Data Science
4 min readDec 16, 2019

--

I frequently interview candidates for positions of Data Analysts, Data Engineers and Data Architects, and sometimes participate in interviews for app server/microservice developers too.

Before I start an interview, I am always reminded of a friend’s favorite quote about interviews “A fool can ask more questions than a wise man can answer”. He told me this before I took my first interview, and I assumed it was to put me at ease. Over time, I realized he meant more. Now, looking back, I think that it sure helps to have friends who drill sense into you about keeping yourself forever grounded!

Participating in interviews where the interviewer tries to dazzle and overshadow the candidate with edge cases or learnings from experiences has me frequently shaking my head, about how the true intent of the interview is defeated. The notion that the interviewer is in a position of power is very wrong.

Interviews, for me, are less about specific syntax, and more about gauging the candidate’s understanding of concepts, application of problem solving methodology, ability to learn from mistakes, approach to testing and validation, and how much that person would be a team player.

TL:DR version: Top 3 SQL interview questions (beginner, intermediate, experienced)

Interview stock image (Photo by Amy Hirschi on Unsplash)

Q1: If you make a typo in every clause of the SQL query, what would the first error detected?

This question is intended for beginners, or those that have SQL-as-a-Second-Language, typically Java, JS or dotNET developers.

Here is an example query (with 1 added to each line to represent the typo)

SELECT count(e.empno1) count_emp, d.dname

FROM emp1 e

INNER JOIN dept d on e.deptno = d.deptno

WHERE e.sal1 > 2000

GROUP BY d.dname1

ORDER BY count_emp1;

The answer to the question is always the FROM clause. Why? Because SQL is about sets. Here is the query in plain English.

From the set of Employees, that match to Departments by department name,

Filter those with salary greater than 2000,

Group on department name, and count employees per department,

Select the count of employees, and name of department

Order the result by count of employees.

Understanding this flow assures me that the candidate knows enough of SQL concepts, and knowing that SQL comes from set theory.

Starting with SELECT is not the right approach.

Overall, it is important that the candidate thinks about sets and relational theory.

Q2. Can you rewrite an SQL query using IN or EXISTS to a JOIN query ? How does it impact performance ?

This question is targeted to intermediate experienced database professionals or those who have worked in data processing applications over some time.

The key to this question is query rewriting. Query equivalence and rewriting is important when you are refactoring queries, or migrating from a database platform to another. It demonstrates an understanding of SQL where you would be considered an independent database developer.

The IN, and EXISTS can be replaced by an INNER JOIN, with some caveats.

  1. If the dataset in the IN or EXISTS has multiple rows, the JOIN would result in more rows than intended.
  2. The IN or EXISTS would be processed differently. IN and EXISTS may be rewritten by the query optimizer as a JOIN sometimes. In some cases, that IN or EXISTS subquery would have been materialized. But if not, it would be processed per row of the outer query.
  3. Indexes, and data storage locality would play a large part in how it will be processed in both cases.

I look for people talking from experience in the database they are working in, and their experiences in each of the DB platforms and configurations — Indexes, Materialized Views, Vertica Projections, SQL Server Columnstore Indexes, Hybrid Columnar Storage in Oracle, RedShift sort keys, Teradata MPP processing — are valuable in understanding the candidate.

Q3. Explain and discuss the difference between HASH JOIN and SORT MERGE JOIN, and how the database you have worked on handles them.

This one is targeted to somewhat more experienced database professionals.

Understanding how JOINs are processed in queries, reading the query plans, trying to figure out how the database processes the queries, is always something that experienced engineers are supposed to look at. Follow-up questions could be to check what was the performance before, and after. What kind of testing or validation framework was established to measure performance? If a query rewrite was involved, what was the validation mechanism to ensure data correctness?

Talking about the situations experienced, the learnings from the situation, how the situation was addressed, helps understand the candidate better.

Like before, each database and query processing engine presents its own challenge, and set of options. A candidate who can discuss issues in depth, and displays a systems thinking approach, is obviously favored.

Conclusion

Some would say that by writing out these questions, I have shown my cards. However, the same concepts can be tested using other questions. The literal questions don’t matter.

I felt the need to put these questions out, because most SQL interview questions on the web range from “Explain 3rd Normal Form” to “Find Duplicate records”. None of these questions on the web really address concepts of query processing, understanding DB technologies, or check depth of understanding a practitioner would develop.

References and Further Reading

  1. Faroult, S., & Robson, P. (2006). The Art of SQL. “ O’Reilly Media, Inc.”.
  2. Evans, Julia. SQL Queries Don’t Start with SELECT — Julia Evans, jvns.ca/blog/2019/10/03/sql-queries-don-t-start-with-select/.

--

--