The world’s leading publication for data science, AI, and ML professionals.

7 Examples to Master SQL Joins

A comprehensive practical guide

Photo by Mineragua Sparkling Water on Unsplash
Photo by Mineragua Sparkling Water on Unsplash

SQL is a Programming language used by most relational database management systems (RDBMS) to manage data stored in tabular form (i.e. tables). A relational database consists of multiple tables that relate to each other. The relation between tables is formed with shared columns.

When we are to retrieve data from a relational database, the desired data is typically spread out to multiple tables. In such cases, we use SQL joins which are used to handle tasks that include selecting rows from two or more related tables.

In order to be consistent while selecting rows from different tables, Sql joins make use of the shared column. In this article, we will go over 7 examples to demonstrate how SQL joins can be used to retrieve data from multiple tables.

I have prepared two tables with made up data. The first one is the customer table that contains information about the customers of a retail business.

customer table (image by author)
customer table (image by author)

The second one is the orders table that contains information about the orders made by these customers.

orders table (image by author)
orders table (image by author)

These two tables relate to each other by the cust_id column.


Note: There are many relational database management systems such as MySQL, SQL Server, SQLite, and so on. Although they share mostly the same SQL syntax, there might be small differences. I’m using MySQL for this article.


Example 1

We want to see the average age of customers who made a purchase on 2020–01–17.

mysql> select avg(customer.age), orders.date
    -> from customer
    -> join orders
    -> on customer.cust_id = orders.cust_id
    -> where orders.date = '2020-01-17';
+-------------------+------------+
| avg(customer.age) | date       |
+-------------------+------------+
|           32.7273 | 2020-01-17 |
+-------------------+------------+

In a normal select statement, we only write the name of the columns to be selected. When we join tables, the columns are specified with the name of the table so that SQL knows where a column comes from.

Then we write the names of the tables with join keyword (e.g. customer join orders). The "on" keyword is used to indicate how these tables are related. The where statement filters the rows based on the given condition.


Example 2

We want to see the average order amount made by customers in Austin.

mysql> select avg(orders.amount)
    -> from customer
    -> join orders
    -> on customer.cust_id = orders.cust_id
    -> where customer.location = "Austin";
+--------------------+
| avg(orders.amount) |
+--------------------+
|          50.572629 |
+--------------------+

The logic is the same. You may have noticed a small difference between the second and first examples. In the second example, we did not select the location but use it as a filtering condition in the where statement. Either option works fine. We do not have to select all the columns we use for filtering.

The aggregate function is applied while selecting the column just like in normal select statements.


Example 3

We want to see the average order amount for each city.

This is similar to the second example with a small difference. We have to also select the location column because it will be used to group the rows.

mysql> select customer.location, avg(orders.amount)
    -> from customer
    -> join orders
    -> on customer.cust_id = orders.cust_id
    -> group by customer.location;
+----------+--------------------+
| location | avg(orders.amount) |
+----------+--------------------+
| Austin   |          50.572629 |
| Dallas   |          47.624540 |
| Houston  |          50.109382 |
+----------+--------------------+

Example 4

We want to see the highest order amount and the age of customer who made that order.

mysql> select customer.age, orders.amount
    -> from customer
    -> join orders
    -> on customer.cust_id = orders.cust_id
    -> order by orders.amount desc
    -> limit 1;
+------+--------+
| age  | amount |
+------+--------+
|   41 |  99.95 |
+------+--------+

We select the age from customer table and amount from the orders table. One way to filter the highest amount is to sort the values in descending order and take first one. The order by statement sorts the rows based on the values in the given column. The default behavior is to sort in ascending order but we change it using the desc keyword.


Example 5

We want to see the highest order amount made by customer whose id is 1006.

mysql> select max(orders.amount) 
    -> from customer 
    -> join orders 
    -> on customer.cust_id = orders.cust_id 
    -> where customer.cust_id = 1006;
+--------------------+
| max(orders.amount) |
+--------------------+
|              93.18 |
+--------------------+

We use the max function on the amount column and filter the rows that belong to customer whose id is 1006.


Example 6

We want to see the top 5 customers from Houston in terms of the highest average order amount when there is sale.

We will be using two conditions in the where statement, group the values by the cust_id column, sort the rows based on the average amount in descending order, and select the first 5 rows.

mysql> select c.cust_id, avg(o.amount) as average 
    -> from customer c 
    -> join orders o 
    -> on c.cust_id = o.cust_id 
    -> where c.location = "Houston" and o.is_sale = "True" 
    -> group by c.cust_id 
    -> order by average desc limit 5;
+---------+-----------+
| cust_id | average   |
+---------+-----------+
|    1821 | 70.150000 |
|    1299 | 67.405000 |
|    1829 | 65.225000 |
|    1802 | 64.295000 |
|    1773 | 64.012500 |
+---------+-----------+

In this example, we have used an alias for each table name and the aggregated column. It makes easier to write the query because we write the table names many times.


Example 7

We want to find out the location of the customer who has the lowest order amount on 2020–02–09.

In this example, we will implement a nested select statement to be used as a condition in the where statement.

mysql> select c.cust_id, c.location
    -> from customer c
    -> join orders o
    -> on c.cust_id = o.cust_id
    -> where o.date = "2020-02-09" and o.amount = (
    -> select min(amount) from orders where date = "2020-02-09"
    -> );
+---------+----------+
| cust_id | location |
+---------+----------+
|    1559 | Dallas   |
+---------+----------+

We are given two conditions. The first one is specific date that we can directly write in the where statement. The second one includes an aggregation. We need to find the minimum order amount on a given date.

We can either find this value in a separate query or write a nested select statement as the condition on the order amount. In this example, we have done the latter.


Conclusion

We have done 7 examples to cover SQL joins. The relational databases typically consist of many tables that are related based on shared columns. The data we need to retrieve from a relational database is typically spread out to multiple tables. Thus, it is very important to have a comprehensive understanding of SQL joins.

Thank you for reading. Please let me know if you have any feedback.


Related Articles