The world’s leading publication for data science, AI, and ML professionals.

10 Most Important SQL Commands Every Data Analyst Needs to Know

Querying data from a database doesn't need to be complicated

Photo by Shunya Koide on Unsplash
Photo by Shunya Koide on Unsplash

As a data analyst or data scientist, it doesn’t matter how good you are at creating fancy visualizations or how skilled you are at building complicated models – at its core, you need data in order to do those things.

When working at a large company, these data are typically stored in a database such that everyone can easily access and query the data that they need to do their job.

How do you query said data? Well, that’s where SQL comes in.

SQL is short for Structured Query Language and as the name suggests, it is a standardized language that is used to query or retrieve data from a relational database. Furthermore, it can also be used to perform various operations on the data such as filtering rows and columns as well as simple data manipulation.

One of the earliest observations that I made when I started working as a data analyst was that everybody around me knew how to use SQL, and I mean literally everybody. Whether if you are an intern who just recently joined the team or a senior analyst who has been around for a long time, this is a skill that everyone needs to know. This goes to show just how significant SQL is in the world of Data Science analytics.

In light of this, I wanted to highlight the top 10 SQL commands in this blog post to help you get started on your journey with using SQL but more importantly, I will be demonstrating each command using practical examples in order to simulate the feeling of actually working with a database.

To do this, we will be using SQL Fiddle, a handy application for testing and sharing SQL queries online. For the purpose of this tutorial, I have created two mock tables called transaction and customers, which represent the transactions of a supermarket chain and customer profile respectively.


1. SELECT and FROM

SELECT and FROM form the foundation of all SQL queries. The most basic SQL query will involve these two commands and as the query gets more complex, more commands will be added on top of them.

SELECT informs which columns you want to select whereas FROM specifies which table you want to query the data from.

Let’s now look at some examples in relation to the transaction table.

To view all the columns in the transaction table:

SELECT *
FROM transaction;

Suppose we only want specific columns in the transaction table:

SELECT transaction_id, purchase_date, sales
FROM transaction;

2. DISTINCT

DISTINCT is used to view the unique values in a column. For example, suppose we would like to see the unique dates with transactions:

SELECT DISTINCT purchase_date
FROM transaction;

3. WHERE

WHERE is used to filter rows that match a certain condition. Moreover, it is also often used with other operators such as AND, OR, BETWEEN, IN and LIKE to combine multiple conditions together.

Here are some examples:

SELECT *
FROM transaction
WHERE purchase_date = '2021-10-15';
SELECT *
FROM transaction
WHERE purchase_date = '2021-10-15'
AND store_location = 'Melbourne CBD';
SELECT *
FROM transaction
WHERE purchase_date = '2021-10-15'
OR store_location = 'Melbourne CBD';
SELECT *
FROM transaction
WHERE store_location IN ('Richmond', 'Brunswick', 'Kew');

4. % Wildcard

The % wildcard is used in a LIKE operator to match string patterns.

Before we look at how this wildcard works, let’s first examine the customer profile table. This table tells us the life stage and the premium status of a particular customer.

SELECT *
FROM customers;

Suppose now we want to filter out the rows in the customers’ table where the customer life stage starts with the word Young.

SELECT *
FROM customers 
WHERE customer_lifestage LIKE 'Young%';

Similarly, if we want to view the rows where the customer life stage ends with the word families.

SELECT *
FROM customers 
WHERE customer_lifestage LIKE '%families';

As you can see, SQL offers a quick and straightforward way to matching string patterns, which can be convenient in many situations when filtering out rows.

5. ORDER BY

ORDER BY can be used to sort the result of a query by a particular column either alphabetically or numerically. It can be ordered in two ways: DESC which is in descending order, or ASC which is in ascending order. Though you will notice that most people don’t write ASC in their queries as SQL sets this by default.

To demonstrate this, suppose we would like to order the transactions in ascending order based on sales amount.

SELECT store_location, sales
FROM transaction 
ORDER BY sales;

Alternatively, we can also sort the transactions in descending order based on sales amount.

SELECT store_location, sales
FROM transaction 
ORDER BY sales DESC;

6. AS

AS enables us to rename a column or table. Note that this will not directly alter the names in their original columns or tables.

The given query will return the date column from the transaction table where _purchasedate is renamed to date.

SELECT purchase_date as date
FROM transaction;

7. CASE WHEN, ELSE and THEN

If you have used any other programming languages before, this is very similar to an if-else statement.

Effectively, in plain English, the command sounds a little something like this: if a condition is satisfied, do this, otherwise do that.

Let’s look at an example to solidify this idea.

Suppose we want to create a new column that tells us if the sales amount of a particular transaction exceeds $20.

SELECT transaction_id, sales,
CASE WHEN sales < 20 THEN 'Sales amount is less than $20'
ELSE 'Sales amount is greater than $20' END AS sales_threshold 
FROM transaction;

8. GROUP BY and aggregate functions

GROUP BY will group data by their identical values. It is frequently used along with aggregate functions to summarise the attribute of a particular group of data.

Aggregate functions, on the other hand, perform calculations on a range of values and return a single value. Some examples of aggregate functions include:

  • COUNT: returns the total number of rows
  • SUM: returns the sum of all the values
  • MAX: returns the maximum value
  • MIN: returns the minimum value
  • AVG: returns the average value

Let’s now turn to some examples.

Suppose we want to know the number of rows in the transaction dataset.

SELECT COUNT(*)
FROM transaction;

What about the highest sales amount in the transaction dataset.

SELECT MAX(sales) as max_sales
FROM transaction;

Lastly, what if we are interested to know the total sales of each day, rounded to the nearest dollar?

SELECT purchase_date, ROUND(SUM(sales)) as total_sales 
FROM transaction 
GROUP BY purchase_date;

9. JOIN

Before we discuss the concept of joins, I think it is important that we first distinguish the difference between a primary key and a foreign key.

In relational databases, a primary key is used to uniquely identify each row in a table. For instance, the primary key for the transaction table is the _transactionid column, whereas the primary key for the customer profile table is the _customerid column.

A foreign key, on the other hand, provides a link between data in two tables. Specifically, a foreign key in one table will link to the primary key in another table. For example, the _customerid column is a foreign key in the transaction table but it is a primary key in the customer profile table.

Given the relationship between a primary key and a foreign key, we can perform a LEFT JOINin this particular scenario.

There are other types of joins in Sql, such as INNER JOIN, RIGHT JOIN and FULL JOIN, which I won’t get to here. If you are interested to learn more, check out this blog post for more details.

For now, suppose we want to perform a LEFT JOIN on the transaction table based on the _customerid column.

SELECT a.*, b.customer_lifestage, b.customer_premium
FROM transaction AS a
LEFT JOIN customers AS b
ON a.customer_id = b.customer_id;

It’s also good practice to always check the number of rows after we perform LEFT JOIN to make sure the new table has the same number of rows as the left table before the join, which in this case is the transaction table.

10. UNION

Last but certainly not least, UNION is used to combine query results from multiple SELECT statements. Note that the tables that you wish to union must have the same number of columns and equally important, the columns must be of the same data type.

I have to admit, both the tables I have put together for this exercise might not be the best at illustrating the power of UNION, however, for the purpose of completeness, I will demonstrate it here.

Suppose we want to union the _customerid column with the _quantitypurchased column, both from the transaction table.

SELECT customer_id AS sample_union
FROM transaction 
UNION
SELECT quantity_purchased FROM transaction;

With that, we conclude the 10 most important SQL commands that you need to get started with using SQL.

I hope, through this blog post, you have gained a better understanding of what SQL is, and its significance in data science analytics but more importantly, realize that it is actually relatively easy to learn once you get hold of the fundamentals.

For any aspiring data analyst or data scientist, this is definitely one of the skills you should consider adding to your toolkit because SQL is here to stay.


Don’t know what to read next? Here are some suggestions.

Pandas Zero to Hero – A Beginner’s Tutorial to Using Pandas

Five Basic Commands to Get Started with dplyr in R

Beginner’s Introduction to NLP – Building a Spam Classifier


Related Articles