The 6 Steps of a SQL Select Statement Process

A detailed explanation of a query execution order using a simple example

Wendy Navarrete
Towards Data Science

--

Licensed by Pixabay

Introduction

In this article, I will describe step by step the logical process phases during the execution of a query into a relational database. For doing that, I am going to use two simple unnormalized form tables: Citizen and City. They are described as followed:

Tables Description

Citizen table contains data of distinguished citizens and the identification number of the city they live on, and City is the table with city names and their respective identification number.

Data

Let’s say that we want to know the name of only two city names, except San Bruno, where two or more citizens are living on. We also want the result ordered alphabetically.

This is the query to get the required information.

SELECT city.city_name AS "City"
FROM citizen
JOIN city
ON citizen.city_id = city.city_id
WHERE city.city_name != 'San Bruno'
GROUP BY city.city_name
HAVING COUNT(*) >= 2
ORDER BY city.city_name ASC
LIMIT 2

Query Process Steps

1. Getting Data (From, Join)
2. Row Filter (Where)
3. Grouping (Group by)
4. Group Filter (Having)
5. Return Expressions (Select)
6. Order & Paging (Order by & Limit / Offset)

Step 1: Getting Data (From, Join)

FROM citizen
JOIN city

The first step in the process is the execution of the statements in From clause followed by the Join clause. The result of these operations is getting a cartesian product of our two tables.

Cartesian Product

Once the From and Join were executed, the processor will get the qualified rows based on the condition On.

ON citizen.city_id = city.city_id
Qualified Rows

Step 2: Row Filter (Where)

After getting qualified rows, it is passed on to the Where clause. This evaluates every row using conditional expressions. When rows do not evaluate to true, they will be removed from the set.

WHERE city.city_name != 'San Bruno'

Step 3: Grouping (Group by)

The next step is to execute Group by clause, it will group rows that have the same values into summary rows. After this point, all Select expressions will be evaluated per group, instead of being evaluated per row.

GROUP BY city.city_name

Step 4: Group Filter (Having)

The Having clause consists of a logical predicate, it is processed after the Group by and it can no longer refer to individual rows, only to groups of rows.

HAVING COUNT(*) >= 2

The result of executing this operation will keep the set as the figure above. This is because there are two or more elements in every group.

Step 5: Return Expressions (Select)

During this step, the processor evaluates what will be printed as a result of the query, and if there are some functions to run on data like Distinct, Max, Sqrt, Date, Lower, etc. In this case, the select clause just prints the city names and alias the city_name column with the identifier “City”.

SELECT city.city_name AS "City"

Step 6: Order (Order by) and Paging (Limit / Offset)

The final processing steps of the query deal with presentation ordering and the ability to limit the size of the result set. In our example, it is required to present a maximum of two records ordered alphabetically.

ORDER BY city.city_name ASC
LIMIT 2
Result-set

Conclusions

A better understanding of how things work, better results can be obtained. Having fundamental understandings of how a Select statement is executed, it will make easier to get the expected result-set.

Hopefully, this article is helpful for you as well.

--

--