The world’s leading publication for data science, AI, and ML professionals.

How to Use These 6 Unusual SQL Functions

Keep these in your toolbox and you'll be ready to solve the hardest problems

Photo by Joni Ludlow on Unsplash
Photo by Joni Ludlow on Unsplash

Whether you’re a data analyst, data engineer, or Analytics engineer, you need to know your SQL. It is an essential language to master in the world of data. We are constantly using it!

As an analytics engineer, I’m constantly writing SQL for my dbt data models. I’m always working to expand the number of functions I know and improve on how I use the ones I already know.

Here are some unusual but helpful Sql functions to learn whether you’re a SQL ninja or just learning the language.

SIGN()

The SIGN() function returns a value depending on whether the column it’s given is positive or negative. If a number is > 0, it returns 1. If a number is < 0, it returns -1. If the number is = 0, it returns 0. This function will only work on columns that are of a number data type.

This function can be helpful to use in CASE statements as well as in WHERE clauses. It makes it easier to filter a column by its sign.

Let’s say we are a bank and we need to look at our customer’s monthly statements. We want to flag all of the customers that have either 0 dollars in their account or a negative amount of money in their account.

We could do something like this:

SELECT
   customer_name,
   amount
FROM bank_statements 
WHERE SIGN(amount) IN (-1, 0)

This would result in a table with the customer_name and amount for customers that had negative or zero dollars in their accounts.

FLOOR()

This function is a popular one to use when doing mathematical calculations within your SQL queries. It returns the largest whole number that is equal to or less than the number provided.

Note that it returns a number that is equal to or less than the number provided. This means that when you use FLOOR() on a negative number, an even larger negative number will result.

SELECT FLOOR(-15.5) FROM numbers

This will return -16 as your answer since that is the largest whole number greater than -15.5.

CEILING()

This function is the sister of the FLOOR() function. Instead of returning the largest whole number, it returns the smallest whole number that is greater than or equal to the number provided.

Let’s look at the same example as above but now using the CEILING() function.

SELECT CEILING(-15.5) FROM numbers

Instead of returning -16 as the FLOOR() function did, this will return -15. -15 is the smallest whole number greater than or equal to -15.5.

Have you ever been to the grocery store and the cashier asks if you want to round up your amount to donate the extra change to a local charity? This function would be helpful behind the scenes in that grocery store’s database. Using CEILING would provide the cashier with the new amount to charge the customer as well as the amount they’d be donating.

SELECT 
   transaction_id,
   transaction_amount,
   CEILING(transaction_amount) AS transaction_with_donation,
   CEILING(transaction_amount) - transaction_amount AS donation_amount 
FROM grocery_db

Using this function, you can get all the information you need about the transaction!

LEAD()

It wouldn’t be a proper SQL article without including at least one window function. I’ve always been taught that they’re not necessary to learn, but I’m not sure who is saying that. Window functions are some of the most useful functions to exist in SQL.

The LEAD() function saves you from needing to do a confusing, messy join to another table. It allows you to access the values in the rows after the row you are currently looking at. It makes comparison super easy.

Let’s continue with the grocery store examples. In a grocery store, there are multiple different checkout lanes 1–10. Then, within each checkout lane are different customers. Whoever arrived first is first in line and whoever arrived last is last in line.

Image by author
Image by author

We want to find the person who is behind each customer in line. In order to do this, we would use the LEAD() function on the customer_name column, so we get the name of the next customer. We then partition it by line_number because only customers in the same line can be before/after one another. Lastly, we order it by arrived_at time so we have an accurate ordering of the customers in each line.

SELECT 
   customer_name,
   LEAD(customer_name) OVER(PARTITION BY line_number ORDER BY arrived_at ASC) AS next_in_line
FROM grocery_customers

This will result in something that looks like this:

Notice that the customers who are last in their line have a NULL value for the next_in_line column. LEAD() will return a NULL value whenever there is no row meeting the specified conditions after the current row. Since Fred and Haley are last in their lines, they have a NULL value for the next_in_line column. Since Sebastian is the only person in line 2, he also has a NULL value for that column.

LAG()

Opposite from the LEAD() function, LAG() allows you to compare your current row to the rows before it, rather than those after it. It is still used for comparison, it just serves a bit of a different purpose.

Similarly, with LEAD(), you choose the column you wish to output from the rows before your current row and partition based on how you wish to separate out your columns. Then, the most important part is ORDER BY. This will determine whether you’re getting the value you want or not. If you choose ASC or DESC values, that can change your whole query. Just be sure it fits the context of your problem and what you’re trying to solve.

SELECT 
   customer_name,
   LAG(Name) OVER(PARTITION BY line_number ORDER BY arrived_at ASC)    AS ahead_in_line
FROM grocery_customers

Here, we have the same exact query except now I’m using the LAG() function. This function will get the customer in line before me rather than after.

Image by author
Image by author

Notice that the NULL values are now opposite what they were using the LEAD() function. Now, those who were first in their lines have NULL values. Sebastian still has a NULL value since he was the only person in line.

See some more examples on how to use these functions in my in-depth article here.

IFF()

I’ve personally never used this function, but I need to start now that I know how useful it can be. It’s essentially a simpler CASE statement. The IFF() function tests a condition and returns a specified value if the condition is TRUE and another specified value if the condition is FALSE.

IFF(condition, value if true, value if false)

This can be helpful when using one column to create a boolean column. Let’s say you want to check which customers in your database are elderly, or over 65. You could write a query that looks like this:

SELECT 
   customer_name, 
   IFF(age >= 65, TRUE, FALSE) AS is_elderly 
FROM grocery_customers 

Now, instead of looking at the age column and adding a filter based on that, we can simply use our boolean is_elderly column to filter our customers.

Conclusion

I don’t use these SQL functions often, but when I do, they make a huge difference in the speed and sophistication of my queries. Having these functions in your toolbox is essential for having the most optimal solution for when a problem needing one of these arises.

They are also super helpful when completing a SQL test in an interview. Sometimes the simple solution doesn’t come to you at the moment and knowing the fancy functions saves you. Trust me, I’ve been there. And, interviewers are actually even more impressed by your knowledge of these.

Learn more about SQL and other tools used by analytics engineers by subscribing to my email list.


Related Articles