Introduction
SQL is the universal language in the data world. If you’re a data analyst, data scientist, data engineer, data architect, etc., you need to write good SQL code.
Learning how to write basic SQL queries isn’t hard, but learning how to write good SQL code is another story. Writing good SQL code isn’t necessarily hard, but it does require learning some rules.
If you’re fluent in Python or another coding language, some of these rules might seem familiar with you, and that’s because they’re very much transferable!
And so, I’m going to share with you five tips to write cleaner and better SQL queries. Let’s dive into it:
1. Programming Style
If you think that your Programming style is trivial for SQL, think again. The style in which you code is essential for interpretability and minimizing errors.
Let me give you a quick example – which of the following two blocks of code are more legible?
SELECT name,height,weight,age,gender,CASE WHEN age<18 THEN "child" ELSE "adult" END AS childOrAdult,salary
FROM People LEFT JOIN Income USING (name)
WHERE height<=200 and age<=65
or
SELECT name
, height
, weight
, age
, gender
, CASE WHEN age < 18 THEN "child"
ELSE "adult"
END AS childOrAdult
, salary
FROM People
LEFT JOIN Income USING (name)
WHERE height <= 200
AND age <= 65
It’s quite obvious that the second one is much easier to read and that’s completely attributed to its programming style! There are several components of programming style, but we’re going to focus on indentation and spaces.
Indentation/Alignment
The main reason why the second block of code is more legible is because of its indentation and vertical alignment. Notice how all of the column names in the SELECT clause are aligned, the conditions in the WHERE clause are aligned, and the conditions in the CASE statement are aligned.
You don’t necessarily have to follow this exact way, but you should apply indentation and align your statements where ever you can.
Spaces
Spaces refers to the white space that you use in your code. For example, instead of…
WHERE height<=200 AND age<=65
consider using white spaces to make it more legible:
WHERE height <= 200 AND age <= 65
Most importantly, you want to make sure that your programming style is consistent throughout your code. There are other things to consider like naming conventions and commenting, which we’ll cover later in this article.
2. Modularize Code via Common Table Expressions
Using common table expressions (CTEs) is a great way to modularize and break down your code, the same way that you would break down an essay into several paragraphs.
You can check out this article if you want to learn more about CTEs in greater depth but if you ever wanted query a query, that’s when CTEs come into play – CTEs essentially create a temporary table.
Consider the following query with a sub query in the where clause.
SELECT name
, salary
FROM People
WHERE name in (SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto")
AND salary >= (SELECT AVG(salary)
FROM salaries
WHERE gender = "Female")
This may not seem to difficult to understand, but what if there were many sub queries or sub queries in sub queries? This is where CTEs come into play.
with toronto_ppl as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto"
)
, avg_female_salary as (
SELECT AVG(salary) as avgSalary
FROM salaries
WHERE gender = "Female"
)
SELECT name
, salary
FROM People
WHERE name in (SELECT DISTINCT FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary)
Now it’s clear that the WHERE clause is filtering for names in Toronto. If you noticed, CTEs are useful because you can break down your code into smaller chunks, but they’re also useful because it allows you to assign a variable name to each CTE (i.e. toronto_ppl and avg_female_salary)
Note: programming style also applies for CTEs!
Speaking of naming conventions, that leads to my next point:
3. Variable Naming Conventions
There’s two parts to naming conventions that you should consider: the type of letter-casing that you use and how descriptive your variables are.
Letter-casing
Personally, I like to use snake_case for naming CTEs and camelCase for column names. You can choose to style your variables however you like but make sure that you’re consistent.
Here’s an example:
with short_people as (
SELECT firstName
FROM people
WHERE height < 165
)
SELECT * FROM short_people
Notice how i used snake_case for the CTE (short_people) and used camelCase for firstName.
Descriptive Names
Ideally, you want your variable names to describe what they represent. Consider my previous example:
with toronto_ppl as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto"
)
, avg_female_salary as (
SELECT AVG(salary) as avgSalary
FROM salaries
WHERE gender = "Female"
)
SELECT name
, salary
FROM People
WHERE name in (SELECT DISTINCT FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary)
It’s very clear that the first CTE is querying people from toronto and the second CTE is getting the average female salary. This would be an example of poor naming conventions:
with table1 as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto"
)
, table2 as (
SELECT AVG(salary) as var1
FROM salaries
WHERE gender = "Female"
)
SELECT name
, salary
FROM People
WHERE name in (SELECT DISTINCT FROM table1)
AND salary >= (SELECT var1 FROM table2)
If you only read the last query, it’s very difficult to understand what it’s doing. So make sure that your casing is consistent and that your variable names are descriptive!
4. Simplify Code Using Temporary Functions
If you want to read more about temporary functions, check this out, but temporary functions are also a good way of breaking down code, writing cleaner code, and being able to reuse code.
Consider the following example:
SELECT name
, CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END AS seniority
FROM employees
Instead, you can leverage a temporary function to capture the CASE clause.
CREATE TEMPORARY FUNCTION seniority(tenure INT64) AS (
CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END
);
SELECT name
, seniority(tenure) as seniority
FROM employees
With a temporary function, the query itself is much simpler, more readable, and you can reuse the seniority function!
5. Write Useful Comments
It’s important to write comments only when you need to. By using descriptive names, writing modular code, and having a clean programming style, you shouldn’t need to write many comments.
That being said, comments are useful when the code itself can’t explain what you’re trying to achieve. Comments typically answer "why" you’re doing something rather than "what" you’re doing.
Here’s an example of bad commenting:
# Getting names of people in Toronto, Canada
with table1 as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto"
)
# Getting the average salary of females
, table2 as (
SELECT AVG(salary) as var1
FROM salaries
WHERE gender = "Female"
)
These are poor comments because it’s telling us what we’d already know by reading the code itself. Remember, comments typically answer "why" you’re doing something rather than "what" you’re doing.
Thanks for Reading!
If you made it to the end, I hope that you learned a thing or two. As I said before, learning how to write good Sql code is essential for all kinds of data professions, so make sure that you take the time to learn how to write good code!
If you enjoyed this, be sure to follow me on Medium for future content, and as always, I wish you the best in your learning endeavors!
Not sure what to read next? I’ve picked another article for you:
and another one!
A Complete 52 Week Curriculum to Become a Data Scientist in 2021
Terence Shin
- If you enjoyed this, follow me on Medium for more
- Interested in collaborating? Let’s connect on LinkedIn
- Sign up for my email list here!