In the Last Episodes…
Hi there! Welcome to the 12th instalment of a series of tutorials on Sql and SQL Server Studio. My goal is simple: To make you familiar and comfortable with the tool, and the language. "Why does it even matter?" I see you asking. It turns out that curiosity and side projects are often key in getting picked on new projects or even getting hired for a new job. The mere fact you’ve already used an important tool such as SQL Server Studio and wrote some SQL queries, can and will give you a clear head start.
- In Episode 1, I show you how to set up our environment and local server – Part 1 Step by Step Setup of SQL Server Studio
- In Episode 2, we cover how to create your databases, tables, and an important commentary on naming conventions – Part 2 Databases, Tables & Naming Convention
- In Episode 3, we introduce the CRUD operations and the Primary and Foreign Keys – Part 3 CRUD Operations, Primary & Foreign Keys
- In Episode 4, we discuss schemas & the main normalization steps – Part 4 Schemas & Normalization
- In Episode 5, we cover Stored Procedures & Scheduling, this is a real beast. I can’t stress enough how this will facilitate and automate your daily (data-)life. Part 5 Stored Procedures & Scheduling
- In Episode 6, we introduce SSIS Packages for ETL and reviewed out to import and export data between two databases, and between a database and Excel. Part 6 SSIS Packages Introduction
- In Episode 7, we connect SQL Studio to PowerBI and build our first visuals. Part 7 Connect to PowerBI & First Visuals.
- In Episode 8, we leverage the power of SQL Views, an easy and lightweight tool to combine and retrieve complex tables. You won’t believe how you could live without them. Part 8 Leverage The Views
- In Episode 9, we take advantage of temporary tables, to temporarily store and retrieve data within stored procedures. It’s extremely useful to find out intermediary results, and cache data in a table form for later use. Find out more in Part 9 Temporary Tables
- Episode 10 explores how to export SQL data as a CSV document and email it, using stored procedures and a scheduling task. Part 10 Export SQL Query Data by Email as CSV
- Episode 11 jumps into the cloud, with the setup of a Server and a Database in Azure Cloud.
And don’t forget to come back 😉 .
What to Expect?
Data comes in many forms and shapes. It can happen that a dataset, think a log table, will contain multiple records for a single user or machine, over some time. How could we possibly partition this data, to only retrieve the latest timestamp for a given user, machine or X?
This is where Ranking Window Functions come to the rescue. Their syntax is a bit more complex than the regular SELECT or CRUD operations we have seen in the other episodes (see above). In all honesty, I can never get my head around that syntax, and always need to refer to my holy text file 😉 .
Knowing how to leverage it will solve many headaches and allow you to rank records and filter out what’s not required. Without further ado…
Introduction
We’ll assume that you are all set with your SQL Server Studio environment. If you require any help, please check Episode 1 mentioned above.
Let’s first create a new table ‘Logs’:
CREATE TABLE Logs
(Timestamp datetime,
Username nvarchar(100))
Now let’s populate it with some data:
INSERT INTO [HR_data].[dbo].[Logs]
VALUES
(GETUTCDATE(), 'Max'),
(GETUTCDATE()-1, 'Max'),
(GETUTCDATE()-2, 'Beth'),
(GETUTCDATE()-3, 'Beth')
Let’s add some more noise to it, by duplicating what we just inserted. This will insert the same rows, except that the timestamp will be different by a few seconds, depending on when you executed the query.
/* This is to add a bit more records/noise in our dataset */
INSERT INTO [HR_data].[dbo].[Logs]
VALUES
(GETUTCDATE(), 'Max'),
(GETUTCDATE()-1, 'Max'),
(GETUTCDATE()-2, 'Beth'),
(GETUTCDATE()-3, 'Beth')
Run a quick SELECT query to check the dataset:
SELECT * FROM [HR_data].[dbo].[Logs]
This should return 8 records.
The Ranking Window Function
How could we extract the most recent Timestamp for any single user in our table? The Ranking Window Function comes to our rescue. This is a native T-SQL function.
SELECT Timestamp, Username
ROW_NUMBER() OVER(ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
- We start by selecting Timestamp and Username. So far so good.
- Then we add the Row_Number() function, which will define a row number for each record.
- OVER() helps define our target. In this case, we’ll look at the Timestamp. Within the OVER() function we precise an ORDER BY and define whether it will be ASCENDING (ASC) or DESCENDING (DESC).
- Last, we set an alias, here calling this column Row#
The output is:

Well, at least they’re ranked. But what we need is to find the latest timestamp for any given user. We need to add another element to our ROW_NUMBER(), the PARTITION BY. The use of PARTITION BY will help us to partition the dataset referred to by the FROM clause. As we have seen above, when PARTITION BY is not used, all rows of the result are treated as a single group.
This is how we can write our query:
SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
- PARTITION BY is added within the OVER clause.
- PARTITION BY requires that we define at least one column, in our case we partition by UserName.
Output:

Max and Beth both have 4 records each. Their records are sorted by Timestamp.
- The PARTITION BY targeted username. There were 2 distinct usernames, the ROW_NUMBER() ranked these records based on the partitions.
- This is why we’ve got Max and Beth’s records going from 1–4, while we previously had them ranked from 1–8.
Final Step
Our original goal was to obtain the most recent timestamp for each user. As the above output shows, we know which record is the most recent for each user. Just look at Row# = 1. How could we single these rows out? A nested query will help us.
We’ll first reuse what worked well:
SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
We then wrap it, using parentheses, within another SELECT statement and give an alias ‘x’ to our nested query:
SELECT * FROM
(
SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
) x
The last step is to add a WHERE condition, so that we only look at Row# = 1:
SELECT * FROM
(
SELECT Timestamp, UserName,
ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY Timestamp DESC) AS Row#
FROM [HR_data].[dbo].[Logs]
) x
where x.Row# = 1
Output:

We now have the two most recent timestamps for our two distinct users. Mission accomplished 🙂
Final Words
We now master a new tool, the Ranking Window Functions. This can solve many problems when for example we face identical rows which only differ by timestamps. ROW_NUMBER() combined with PARTITION BY will allow us to partition our dataset based on one or multiple attributes. In our case the username. We can then rank rows based on timestamp.
Lastly, the nested query allows us to single out given rank(s). In our case, we focused on rank/row #1, the most recent timestamp.
I hope you found this piece useful, let me know what you think or if there’s a topic I should cover. In the meantime, feel free to subscribe and follow me. Until next time!
Happy Coding 🎉 !
Thanks for reading! Enjoyed this story? Join Medium for complete access to all my stories.