We repeat what we don’t repair

Introduction
Welcome to another interesting article on SQL. This article aims to refine your querying skills by pointing out some common, yet ignored mistakes. I’ll elaborate on them using hypothetical tables, and also provide fixes for each. So, sit tight and get ready to polish your querying skills.

Let’s take a hypothetical business example, as always. Suppose you are in the e-Commerce Analytics team at Amazon and have a few simple queries to run. You have two tables called ‘product’ and ‘discount’.

Let’s dive right in.
Mistakes & Fixes

We’ll fix them all, whatever it takes!
1. Counting NULL columns
When you need to count null column values as well, you should be aware of how ‘COUNT’ works. Let’s say you want to count the number of products, even when the primary key ‘product_id’ of the table ‘product’ is missing.
SELECT COUNT(product_id)
FROM product;
Result: 3
You may have been expecting 4 since you wanted to count the null value of column ‘product_id’ as well. But, COUNT() doesn’t count the null values.
Fix:
Use COUNT(*). It counts the null values as well.
Select Count(*)
From product;
Result: 4
That looks easy but is often missed while writing complex queries.
2. Using reserved words as column names
SELECT product_id,
RANK() OVER (ORDER BY price desc) As Rank
FROM product;
This won’t work since the column name ‘Rank’ is a reserved word for the function ‘Rank’.
Fix:
SELECT product_id,
RANK() OVER (ORDER BY price desc) As 'Rank'
FROM product;
And boom! You’re good already!
3. Using comparison operations with NULL
SELECT product_name
FROM product
WHERE product_id=NULL;
This will throw an exception since you used the comparison operator ‘=’ with NULL, and so will the comparison operator ‘!=’.
The logic is that you mean to ask whether the value in the column ‘product_id’ is unknown, not whether it is equal to something that is unknown.
That sounds like a tongue twister, and seems a bit confusing, right? It’s safe to say that this syntax is invalid in SQL. That’s easy!
Fix:
SELECT product_name
FROM product
WHERE product_id ISNULL;
Remember it now!
4. Filtering in the ‘ON’ clause vs Filtering in the ‘WHERE’ clause

This is a very interesting concept and is sure to improve your fundamental understanding of the difference between filtering in the ‘ON’ clause & filtering in the ‘WHERE’ clause. This isn’t exactly a mistake, just a demonstration of the usage of both, and you can choose the best option depending on the business requirement.
SELECT d.product_id,
price,
discount
FROM product p RIGHT JOIN discount d ON
p.product_id=d.product_id
WHERE p.product_id>1;
Result:

Here, the filter is applied after the tables are joined. So, the result has no rows where d.product_id≤1 (and obviously not where p.product≤1)
Fix:
Notice the usage of ‘AND’, and the difference in results.
SELECT d.product_id,
price,
discount
FROM product p RIGHT JOIN discount d ON
p.product_id=d.product_id
AND p.product_id>1;
Result:

Here, the conditional statement ‘AND’ is evaluated before the join occurs. You can think of it as a ‘WHERE’ clause that applies only to one of the tables (‘product’ table). Now, the result has even those rows where d.product_id≤1 (obviously along with p.product_id>1) because of the RIGHT JOIN.
Mind it that filtering in the ‘ON’ clause and ‘WHERE’ clause differ only when it’s a LEFT/RIGHT/OUTER JOIN and not when it’s an INNER JOIN.
5. Using columns created through Windows functions & ‘CASE WHEN’ in ‘WHERE’ clause of the same query
You should be careful not to use the column names created through Windows functions and the ‘CASE WHEN’ clause in the ‘WHERE’ clause of the same query.
SELECT product_id,
RANK() OVER (ORDER BY price desc) AS rk
FROM product
WHERE rk=2;
This query will throw an exception since column ‘rk’ has been created through the Windows function, and it has been used in the ‘WHERE’ clause of the same query.
Fix:
This can be fixed by using a temporary table or a subquery.
WITH CTE AS
(
SELECT product_id,
RANK() OVER (ORDER BY price desc) AS rk
FROM product
)
SELECT product_id
FROM
CTE
WHERE rk=2;
OR
SELECT product_id
FROM
(
SELECT product_id,
RANK() OVER (ORDER BY price desc) AS rk
FROM product;
)
WHERE rk=2;
The same idea applies to a column created using the ‘CASE WHEN’ clause. Remember that the Windows functions can only appear in the ‘SELECT’ or ‘ORDER BY’ clauses.
6. Using BETWEEN incorrectly
If you’re unaware of the inclusivity of the range covered by ‘BETWEEN’, you may get unexpected outputs.
The statement ‘BETWEEN x AND y’ includes both x and y in the required range.
SELECT *
FROM discount
WHERE offer_valid_till BETWEEN '2019/01/01' AND '2020/01/01'
ORDER BY offer_valid_till;
Result:

For this query, you may have been expecting all the dates in 2019, but you get the 1st day of 2020 as well as the output. This is because ‘BETWEEN’ is inclusive of both ‘2019/01/01’ and ‘2020/01/01’.
Fix:
Tweaking the range accordingly takes care of this.
SELECT *
FROM discount
WHERE offer_valid_till BETWEEN '2019/01/01' AND '2019/12/31'
ORDER BY offer_valid_till;
Result:

And now you have all the dates only in 2019. That was an easy fix, right?
7. Using the ‘WHERE’ clause after ‘GROUP BY’
You should be careful about where you write ‘WHERE’ when you use ‘GROUP BY’.
SELECT category,
AVG (price)
FROM product p INNER JOIN discount d ON
p.product_id=d.product_id
GROUP BY category
WHERE discount_amount>10;
This is incorrect because you have used the ‘WHERE’ clause after ‘GROUP BY’. And, why is that?
‘WHERE’ clause is used to filter on the results, and you should logically do that before you group that, not after. Applying the ‘WHERE’ filter first would reduce the Data and then ‘GROUP BY’ will group it according to the aggregation function used (AVG here).
Fix:
SELECT category,
AVG (price)
FROM product p INNER JOIN discount d ON
p.product_id=d.product_id
WHERE discount_amount>10
GROUP BY category;
Be mindful of the order of execution of the main Sql statements:

- FROM clause
- WHERE clause
- GROUP BY clause
- HAVING clause
- SELECT clause
- ORDER BY clause
This will help you in optimizing your SQL queries. Also, give this article on SQL query optimization by using the ‘Indexing’ technique a quick read:
Conclusion
This wasn’t an exhaustive list of all the common mistakes but consisted of mostly the ones where the fixes can be a bit confusing, especially for beginners. Hope it helps you refine your querying skills.
To end with a powerful message by Henry Ford,
"The only real mistake is the one from which we learn nothing."
Feel free to have a look at the other articles that I’ve written around data analytics:
Feel free to provide feedback and follow me for more interesting articles on data analytics. Connect with me on LinkedIn.