Photo by Tim Johnson on Unsplash

4 Really Important Ways to Combine Data in SQL

Because not all data is stored in the same spot.

Scott Fiene
Towards Data Science
6 min readJun 9, 2021

--

For me joins in SQL were one of those things that was difficult to understand, but is super important to understand. After some practice and this article, hopefully the confusion will subside.

Like the headline says there are at least 6 ways to join data in SQL. Previously I wrote about the foundational SQL skills to learn. You can read the article here.

Foundational SQL Know-How. Enough To Be Dangerous. | by Scott Fiene | CodeX | Medium

You can also find all the code used on my GitHub.

Let’s take a look at the tables we will be using and get a quick refresher. The following code will create the temporary tables that we will use to discuss the joins.

DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
CREATE TEMP TABLE customers (
cust_id INT PRIMARY KEY NOT NULL,
first_name VARCHAR(200) NOT NULL,
lifetime_value FLOAT NULL
);
CREATE TEMP TABLE orders (
order_id INT PRIMARY KEY NOT NULL,
cust_id INT NOT NULL,
date DATE NULL,
product VARCHAR(200) NOT NULL,
CONSTRAINT fk_customer
FOREIGN KEY(cust_id)
REFERENCES customers(cust_ID)
ON DELETE CASCADE
);
INSERT INTO customers VALUES (1, 'Jim', 4003)
, (2, 'Albert', 49024)
, (3, 'Juanita', 3200)
, (4, 'Bill', 5000)
, (5, 'Leslie', 23500)
, (6, 'Fredrick', NULL)
, (7, 'Ricky', NULL)
, (8, 'Dean', NULL);
INSERT INTO orders VALUES (1, 1, '01/01/2020', 'Pen')
, (2, 2, '04/20/2021', 'Scissors')
, (3, 1, '02/28/2021', 'Paper')
, (4, 3, '03/25/2021', 'Paper')
, (5, 4, '06/01/2021', 'Computer')
, (6, 5, '12/24/2020', 'Stapler')
, (7, 3, '01/19/2021', 'Desk')
, (8, 4, '01/28/2021', 'Pens')
, (9, 2, '05/08/2021', 'Paper Clip')
, (10, 1, '06/02/2021', 'Scissors');

Once the tables are created let’s take a look at what they look like.

SELECT *
FROM customers;
Customers Table
SELECT *
FROM orders;
Orders Table

Ok, now that we know what the tables contain, let’s move on to the joins.

JOINS

There are plenty of joins in SQL, but the most useful, in my opinion, are the following:

  • Inner Join
  • Left Join

This is by no means an exhaustive list of join, rather just something to whet your whistle. Let begin with some set theory. Now you may remember this from high school, or you may not. Set theory plays an important role in joins because it deals with the collection of two or more objects.

INNER JOIN

In set theory and inner join is written A⋂B, which is said A intersection B. It is simply the place where A and B overlap.

Inner Join

This is probably the most common join type in SQL. It returns only the rows where the specified column matches in both tables, hence the intersection, ⋂, of the two.

To perform an INNER JOIN in SQL you need to specify two things.

  • What type of join, INNER JOIN in this case
  • What columns to join on, cust_id. This is done using the keyword ON
SELECT *
FROM customers c
JOIN orders o ON c.cust_id = o.cust_id;
SELECT *
FROM customers c
INNER JOIN orders o ON c.cust_id = o.cust_id;

Notice that there are two different ways to produce the same result. This is because an INNER JOIN is the default join type unless specified. You can write it either way, but I prefer the second way because it is more readable and will help people who read your code in the future.

The output of both of the above queries will output only rows where there is the same cust_id in each of the tables.

Inner Join Output

LEFT JOIN

Now that we talked about a inner join, let’s discuss what a left join does. When discussing left join in set theory they are written A, or simple all of table A.

Left Join

This is the second most common join type in SQL and it returns all the rows in table A and only the rows from table B that match the specified column from A.

To perform an LEFT JOIN in SQL you need to specify two things.

  • What type of join, LEFT JOIN in this case
  • What columns to join on, cust_id. This is done using the keyword ON
SELECT *
FROM customers c
LEFT JOIN orders o ON c.cust_id = o.cust_id;

SELECT *
FROM customers c
LEFT OUTER JOIN orders o ON c.cust_id = o.cust_id;

Just like with inner join, there are two different way to write a left join both of which produce the same result. Again, I prefer the second way because it is more readable and will help people who read your code in the future.

The output of both of the above queries will output all rows from the left table and the matching rows from the right table. Now, in this case the left table is considered to be the one specified in the FROM clause and the right table is the one specified in the JOIN clause.

Left Join Output

These are probably the most useful joins there are and now you know how to use them. Next, let’s talk about unions.

UNIONS

For unions, we will be using the same two tables we used for the join portion, since we are familiar with the data.

Remember, joins are used to add more columns to the query output while unions are used to add more rows to the query output. With that being said, there are do ways to go about this, UNION and UNION ALL.

UNION

The first union method is simply UNION. This particular method will add rows to the output with a couple nuances. First, the structure of both queries need to be the exact same, and second, this method will automatically remove duplicates.

So lets break this down.

SELECT *
FROM customers
UNIONSELECT *
FROM customers;

You can run the single select and see what the results are, but they will be the same as the union. This is because the UNION removes duplicates, so they are essentially the same thing.

Union Output

UNION ALL

We are going to run the exact same query, but this time we will specify UNION ALL instead of UNION

SELECT *
FROM customers
UNION ALLSELECT *
FROM customers;

With UNION ALL, duplicates are not removed. So running the above query will result in the following, which is essentially the customers table stacked on itself.

A little note about UNION and UNION ALL that I was given while learning on the job is this:

You better have a real good reason as to why you are using a UNION, otherwise use UNION ALL.

I hope that this helps you understand how to combine your data better. Thank you for reading.

--

--