The world’s leading publication for data science, AI, and ML professionals.

Learn SQL Server Management Studio – Part 12 Ranking Window Functions

How to get the most recent record for each user using SQL? Step by step tutorial

Photo by Michael Dziedzic on Unsplash
Photo by Michael Dziedzic on Unsplash

In the Last Episodes…

Hi there! Welcome to the 12th instalment of a series of tutorials on Sql and SQL Server Studio. My goal is simple: To make you familiar and comfortable with the tool, and the language. "Why does it even matter?" I see you asking. It turns out that curiosity and side projects are often key in getting picked on new projects or even getting hired for a new job. The mere fact you’ve already used an important tool such as SQL Server Studio and wrote some SQL queries, can and will give you a clear head start.

And don’t forget to come back 😉 .

What to Expect?

Data comes in many forms and shapes. It can happen that a dataset, think a log table, will contain multiple records for a single user or machine, over some time. How could we possibly partition this data, to only retrieve the latest timestamp for a given user, machine or X?

This is where Ranking Window Functions come to the rescue. Their syntax is a bit more complex than the regular SELECT or CRUD operations we have seen in the other episodes (see above). In all honesty, I can never get my head around that syntax, and always need to refer to my holy text file 😉 .

Knowing how to leverage it will solve many headaches and allow you to rank records and filter out what’s not required. Without further ado…

Introduction

We’ll assume that you are all set with your SQL Server Studio environment. If you require any help, please check Episode 1 mentioned above.

Let’s first create a new table ‘Logs’:

CREATE TABLE Logs
(Timestamp datetime,
Username nvarchar(100))

Now let’s populate it with some data:

INSERT INTO [HR_data].[dbo].[Logs]
VALUES
(GETUTCDATE(), 'Max'),
(GETUTCDATE()-1, 'Max'),
(GETUTCDATE()-2, 'Beth'),
(GETUTCDATE()-3, 'Beth')

Let’s add some more noise to it, by duplicating what we just inserted. This will insert the same rows, except that the timestamp will be different by a few seconds, depending on when you executed the query.

/* This is to add a bit more records/noise in our dataset */
INSERT INTO [HR_data].[dbo].[Logs]
VALUES
(GETUTCDATE(), 'Max'),
(GETUTCDATE()-1, 'Max'),
(GETUTCDATE()-2, 'Beth'),
(GETUTCDATE()-3, 'Beth')

Run a quick SELECT query to check the dataset:

SELECT * FROM [HR_data].[dbo].[Logs]

This should return 8 records.

The Ranking Window Function

How could we extract the most recent Timestamp for any single user in our table? The Ranking Window Function comes to our rescue. This is a native T-SQL function.

SELECT Timestamp, Username
ROW_NUMBER() OVER(ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
  • We start by selecting Timestamp and Username. So far so good.
  • Then we add the Row_Number() function, which will define a row number for each record.
  • OVER() helps define our target. In this case, we’ll look at the Timestamp. Within the OVER() function we precise an ORDER BY and define whether it will be ASCENDING (ASC) or DESCENDING (DESC).
  • Last, we set an alias, here calling this column Row#

The output is:

Well, at least they’re ranked. But what we need is to find the latest timestamp for any given user. We need to add another element to our ROW_NUMBER(), the PARTITION BY. The use of PARTITION BY will help us to partition the dataset referred to by the FROM clause. As we have seen above, when PARTITION BY is not used, all rows of the result are treated as a single group.

This is how we can write our query:

SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
  • PARTITION BY is added within the OVER clause.
  • PARTITION BY requires that we define at least one column, in our case we partition by UserName.

Output:

Max and Beth both have 4 records each. Their records are sorted by Timestamp.

  • The PARTITION BY targeted username. There were 2 distinct usernames, the ROW_NUMBER() ranked these records based on the partitions.
  • This is why we’ve got Max and Beth’s records going from 1–4, while we previously had them ranked from 1–8.

Final Step

Our original goal was to obtain the most recent timestamp for each user. As the above output shows, we know which record is the most recent for each user. Just look at Row# = 1. How could we single these rows out? A nested query will help us.

We’ll first reuse what worked well:

SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]

We then wrap it, using parentheses, within another SELECT statement and give an alias ‘x’ to our nested query:

SELECT * FROM
(
SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
) x

The last step is to add a WHERE condition, so that we only look at Row# = 1:

SELECT * FROM
(
SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
) x
where x.Row# = 1

Output:

We now have the two most recent timestamps for our two distinct users. Mission accomplished 🙂

Final Words

We now master a new tool, the Ranking Window Functions. This can solve many problems when for example we face identical rows which only differ by timestamps. ROW_NUMBER() combined with PARTITION BY will allow us to partition our dataset based on one or multiple attributes. In our case the username. We can then rank rows based on timestamp.

Lastly, the nested query allows us to single out given rank(s). In our case, we focused on rank/row #1, the most recent timestamp.


I hope you found this piece useful, let me know what you think or if there’s a topic I should cover. In the meantime, feel free to subscribe and follow me. Until next time!

Happy Coding 🎉 !


Thanks for reading! Enjoyed this story? Join Medium for complete access to all my stories.


Related Articles