Considering the enormous amount of structured data stored in relational databases, data scientist and analysts are likely to interact with a relational database almost daily. SQL is what we use to make these interactions, which make it one of the most in-demand skills in Data Science.
There are many different relational database management systems (RDBMS) that use SQL to interact with and manage data in relational databases. One of the popular RDBMSs is PostgreSQL, which is open-source and has a strong reputation for reliability, feature robustness, and performance.
In this article, we will learn about 2 PostgreSQL functions, which simplify complex operations and come in handy in many cases. I call them as life-saver functions.
1. generate_series
This function can be used for creating ordered tables with numbers and dates. A typical use case with dates is when you have a starting and ending dates for an event and need to expand it as a calendar.
Let’s go over an example. We have a booking table that contains the bookings at a hotel. It includes the booking id, check-in and check-out dates. We want to expand it so that it will be easier to calculate the occupancy.
SELECT
id,
generate_series(
checkIn,
checkOut,
INTERVAL '1 day'
) AS booked
FROM booking
The query above will include the check out date which is actually not regarded as booked. Thus, in order to mark booked days accurately, we need to subtract one day from the check out date before generating the series.
Here is how this can be done:
SELECT
id,
generate_series(
checkIn,
(checkOut - INTERVAL '1 day'),
INTERVAL '1 day'
) AS booked
FROM booking
What this code does is illustrated below:
data:image/s3,"s3://crabby-images/daa27/daa27ff193a6a0c880123a449b10e4ff3e8ac975" alt="(image by author)"
We can join the table on the left to a calendar table and that’s it! We have a calendar with booked days marked.
2. to_char
The to_char function can be used for converting timestamps, intervals, or a numeric values to formatted strings. I find it the most useful when extracting a piece of information from timestamps or dates.
Its syntax is as follows:
TO_CHAR(expression, format)
We can use it to extract month name and month abbreviation using the ‘MONTH’ and ‘MON’ formats, respectively. Consider we have a booking table which has a date column called booked. We can use the to_char function to extract these pieces as follows:
select
booked,
to_char(booked, 'MONTH') AS month,
to_char(booked, 'MON') AS mon,
from booking
data:image/s3,"s3://crabby-images/9e509/9e5091837fb9fc52eb25ea55d911e07a6082ac77" alt="(image by author)"
We can also get the year month information as a string:
select
booked,
to_char(booked, 'YYYY-MM') AS yearmonth
from booking
If the date is 2022–10–08, then year month value will be 2022–10.
There are also useful formats related to days. For instance, we can extract the day name and number.
select
booked,
to_char(booked, 'DY') AS day_name,
to_char(booked, 'DD') AS day,
from booking
data:image/s3,"s3://crabby-images/457e7/457e7d1ead0e86cb97303e83480e3c511135a280" alt="(image by author)"
There are other formats that might be useful for different tasks. You can check the official documentation for other formats supported by the to_char function.
We have covered two important functions of PostgreSQL. They are mainly used in data preprocessing, or making the raw data better fit the tasks you will do afterwards. These functions actually show that SQL is not just a query language but a highly efficient data analysis and manipulation tool as well.
You can become a Medium member to unlock full access to my writing, plus the rest of Medium. If you already are, don’t forget to subscribe if you’d like to get an email whenever I publish a new article.
Thank you for reading. Please let me know if you have any feedback.