The world’s leading publication for data science, AI, and ML professionals.

Five Best Practices for Writing Clean and Professional SQL Code

Level up your SQL code with these five tips!

Photo by The Creative Exchange on Unsplash
Photo by The Creative Exchange on Unsplash

Introduction

SQL is the universal language in the data world. If you’re a data analyst, data scientist, data engineer, data architect, etc., you need to write good SQL code.

Learning how to write basic SQL queries isn’t hard, but learning how to write good SQL code is another story. Writing good SQL code isn’t necessarily hard, but it does require learning some rules.

If you’re fluent in Python or another coding language, some of these rules might seem familiar with you, and that’s because they’re very much transferable!

And so, I’m going to share with you five tips to write cleaner and better SQL queries. Let’s dive into it:


1. Programming Style

If you think that your Programming style is trivial for SQL, think again. The style in which you code is essential for interpretability and minimizing errors.

Let me give you a quick example – which of the following two blocks of code are more legible?

SELECT name,height,weight,age,gender,CASE WHEN age<18 THEN "child" ELSE "adult" END AS childOrAdult,salary
FROM People LEFT JOIN Income USING (name)
WHERE height<=200 and age<=65

or

SELECT name
       , height
       , weight
       , age
       , gender
       , CASE WHEN age < 18 THEN "child"
              ELSE "adult"
         END AS childOrAdult
       , salary
FROM People
LEFT JOIN Income USING (name)
WHERE height <= 200
      AND age <= 65

It’s quite obvious that the second one is much easier to read and that’s completely attributed to its programming style! There are several components of programming style, but we’re going to focus on indentation and spaces.

Indentation/Alignment

The main reason why the second block of code is more legible is because of its indentation and vertical alignment. Notice how all of the column names in the SELECT clause are aligned, the conditions in the WHERE clause are aligned, and the conditions in the CASE statement are aligned.

You don’t necessarily have to follow this exact way, but you should apply indentation and align your statements where ever you can.

Spaces

Spaces refers to the white space that you use in your code. For example, instead of…

WHERE height<=200 AND age<=65

consider using white spaces to make it more legible:

WHERE height <= 200 AND age <= 65

Most importantly, you want to make sure that your programming style is consistent throughout your code. There are other things to consider like naming conventions and commenting, which we’ll cover later in this article.


2. Modularize Code via Common Table Expressions

Using common table expressions (CTEs) is a great way to modularize and break down your code, the same way that you would break down an essay into several paragraphs.

You can check out this article if you want to learn more about CTEs in greater depth but if you ever wanted query a query, that’s when CTEs come into play – CTEs essentially create a temporary table.

Consider the following query with a sub query in the where clause.

SELECT name
       , salary
FROM People
WHERE name in (SELECT DISTINCT name 
                   FROM population 
                   WHERE country = "Canada"
                         AND city = "Toronto")
      AND salary >= (SELECT AVG(salary)
                     FROM salaries
                     WHERE gender = "Female")

This may not seem to difficult to understand, but what if there were many sub queries or sub queries in sub queries? This is where CTEs come into play.

with toronto_ppl as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
)
, avg_female_salary as (
   SELECT AVG(salary) as avgSalary
   FROM salaries
   WHERE gender = "Female"
)
SELECT name
       , salary
FROM People
WHERE name in (SELECT DISTINCT FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)

Now it’s clear that the WHERE clause is filtering for names in Toronto. If you noticed, CTEs are useful because you can break down your code into smaller chunks, but they’re also useful because it allows you to assign a variable name to each CTE (i.e. toronto_ppl and avg_female_salary)

Note: programming style also applies for CTEs!

Speaking of naming conventions, that leads to my next point:


3. Variable Naming Conventions

There’s two parts to naming conventions that you should consider: the type of letter-casing that you use and how descriptive your variables are.

Letter-casing

Personally, I like to use snake_case for naming CTEs and camelCase for column names. You can choose to style your variables however you like but make sure that you’re consistent.

Here’s an example:

with short_people as (
   SELECT firstName
   FROM people
   WHERE height < 165
)
SELECT * FROM short_people

Notice how i used snake_case for the CTE (short_people) and used camelCase for firstName.

Descriptive Names

Ideally, you want your variable names to describe what they represent. Consider my previous example:

with toronto_ppl as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
)
, avg_female_salary as (
   SELECT AVG(salary) as avgSalary
   FROM salaries
   WHERE gender = "Female"
)
SELECT name
       , salary
FROM People
WHERE name in (SELECT DISTINCT FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)

It’s very clear that the first CTE is querying people from toronto and the second CTE is getting the average female salary. This would be an example of poor naming conventions:

with table1 as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
)
, table2 as (
   SELECT AVG(salary) as var1
   FROM salaries
   WHERE gender = "Female"
)
SELECT name
       , salary
FROM People
WHERE name in (SELECT DISTINCT FROM table1)
      AND salary >= (SELECT var1 FROM table2)

If you only read the last query, it’s very difficult to understand what it’s doing. So make sure that your casing is consistent and that your variable names are descriptive!


4. Simplify Code Using Temporary Functions

If you want to read more about temporary functions, check this out, but temporary functions are also a good way of breaking down code, writing cleaner code, and being able to reuse code.

Consider the following example:

SELECT name
       , CASE WHEN tenure < 1 THEN "analyst"
              WHEN tenure BETWEEN 1 and 3 THEN "associate"
              WHEN tenure BETWEEN 3 and 5 THEN "senior"
              WHEN tenure > 5 THEN "vp"
              ELSE "n/a"
         END AS seniority 
FROM employees 

Instead, you can leverage a temporary function to capture the CASE clause.

CREATE TEMPORARY FUNCTION seniority(tenure INT64) AS (
   CASE WHEN tenure < 1 THEN "analyst"
        WHEN tenure BETWEEN 1 and 3 THEN "associate"
        WHEN tenure BETWEEN 3 and 5 THEN "senior"
        WHEN tenure > 5 THEN "vp"
        ELSE "n/a"
   END
);
SELECT name
       , seniority(tenure) as seniority
FROM employees

With a temporary function, the query itself is much simpler, more readable, and you can reuse the seniority function!


5. Write Useful Comments

It’s important to write comments only when you need to. By using descriptive names, writing modular code, and having a clean programming style, you shouldn’t need to write many comments.

That being said, comments are useful when the code itself can’t explain what you’re trying to achieve. Comments typically answer "why" you’re doing something rather than "what" you’re doing.

Here’s an example of bad commenting:

# Getting names of people in Toronto, Canada
with table1 as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
)
# Getting the average salary of females
, table2 as (
   SELECT AVG(salary) as var1
   FROM salaries
   WHERE gender = "Female"
)

These are poor comments because it’s telling us what we’d already know by reading the code itself. Remember, comments typically answer "why" you’re doing something rather than "what" you’re doing.


Thanks for Reading!

If you made it to the end, I hope that you learned a thing or two. As I said before, learning how to write good Sql code is essential for all kinds of data professions, so make sure that you take the time to learn how to write good code!

If you enjoyed this, be sure to follow me on Medium for future content, and as always, I wish you the best in your learning endeavors!

Not sure what to read next? I’ve picked another article for you:

How to Write All of Your SQL Queries in Pandas

and another one!

A Complete 52 Week Curriculum to Become a Data Scientist in 2021

Terence Shin


Related Articles