The world’s leading publication for data science, AI, and ML professionals.

Unconventional Datetime Transformation Made Easy Using Python’s Pandas Library

Explained with a real-life example

Background

Very recently I was tasked with analyzing leaves taken by employees at a client’s company. In particular, I need to understand whether an employee has taken leave in a given period, ultimately setting a benchmark for measuring employees’ compliance with return to office policies.

I was provided with the following two leaves datasets:

  1. Time Off Data ("Dataset A") which sets out shorter term leaves such as annual leave or sick leave taken by an employee. These leaves were unique at a per employee per date level (i.e. each row in the dataset represents a day of leave taken by a particular employee).
  2. Leave of Absence Data ("Dataset B") which sets out the from and to dates of longer term leaves taken by an employee. Some examples of these leaves are parental leave, maternity leave, leave without paid and career break. This dataset is on a ‘leave-as-you-go’ basis for employees who have taken longer term leaves, in that each row represents a date range for an employee, who may appear in multiple rows in the dataset with multiple date ranges (e.g. employees may prefer to take parental leave in batches of 3 days a week for 30 weeks, which would appear as 30 date ranges across 30 rows in the dataset).

The two datasets augment each other as an employee could take both shorter and longer term leaves in a given period.

For my analysis, I would like to merge the two datasets into a common format so that all leaves by a particular employee are accounted for. Since Dataset A is already of a structured tabular format at a per date per employee level, I needed to convert Dataset B into a similar format as shown in the image below (for a dataset I created for the purpose of demonstration).

Methodology

The transformation illustrated in image 1 presented a number of challenges given it’s not ‘one-to-one’ and it involves dates. To achieve the transformation, I followed the methodology below:

  • The date ranges in Dataset B (as shown in the table on the left in image 1) were broken up into unique (working) dates
  • The broken up dates associated with the same employee which appear in multiple rows were converted into two columns grouped by employee and then unique dates (as shown in the table on the right in image 1)

How did I implement the above in Python? Read on for a step-by-step guide!

Datetime Transformation using Pandas

Step 1: Load libraries and data

As the name of the step implies:

# Load libraries

import Pandas as pd
import numpy as np

from datetime import date, timedelta

path = r"Your_Directory_PathData.xlsx"
data = pd.read_excel(path)

The image below shows a print of the data ingested and data type by column. You should ensure that the two date columns are of the datetime data type, which if they are not, they can be converted to using _pd.to_datetime()_.

Step 2: Get unique dates per row

We will then ‘explode’ the date ranges in each row of the data into unique dates. This can be achieved using the _pd.date_range_ method and specifying the freq parameter as ‘D’:

collate = []

for j in range(len(data)):

    start_date = data['Leave From Date'][j]
    end_date = data['Leave To Date'][j] 

    date_range = pd.date_range(start = start_date, end = end_date, freq = 'D')

    collate.append(date_range)

df_date = pd.DataFrame(collate)

The output of this is shown in the image below. For instance, date range from 2023–02–01 to 2023–02–03 for employee B in row 2 is now broken up into three unique dates.

You may notice that there are some non-working dates in the last row as 2023–07–15 and 2023–07-16 in the date range fall on a weekend. To get business dates only, add a line of code to restrict the weekday index to less than 5 per the code below below. In addition, I’ve provided an option to restrict dates to a range in the commented-out code.


collate = []

for j in range(len(data)):

    start_date = data['Leave From Date'][j]
    end_date = data['Leave To Date'][j] 

    date_range = pd.date_range(start = start_date, end = end_date, freq = 'D')

    ## Get business dates only
    work_dates = date_range[date_range.weekday < 5]

    ## Apply a date range filter if you wish

    #from_date = pd.to_datetime('2023-01-01')
    #to_date = pd.to_datetime('2023-06-30')
    #work_dates = work_dates[(work_dates >= from_date) & (work_dates <= to_date)]

    collate.append(work_dates)

df_date = pd.DataFrame(collate)

Step 2: Row to column transformation

Now we have the date ranges broken up. We will move on to collapse these at an employee level. For instance, both row 2 and 5 in image 3 above correspond to employee C.

To convert the dates in multiple rows into just one column by employee, we add back the ‘Employee’ identifier to the _dfdate dataframe generated above, and then apply Pandas’ .melt() method over the ‘Employee’ column.

## Add back employee identifier

df_concat = pd.concat([pd.DataFrame(data['Employee']), df_date], axis = 1) 
                           .sort_values(['Employee']).reset_index(drop = True)

## Apply .melt() over employee identifier, and remove NaT values
df_melt = df_concat.melt(id_vars = ['Employee'], var_name = 'Index', value_name = 'Date')
df_melt.sort_values(['Employee']).dropna().reset_index(drop = True)

The output of the code above is exactly what we are after as shown below.

Mission accomplished! This dataframe can now be appended to Dataset A like-for-like for further analysis.

Concluding Thoughts

When I first encountered this problem, I was contemplating a solution in various tools such as SQL, R and Python. Through a bit of research, I ultimately chose Python as it appeared to be the easiest to implement, particularly for breaking up date ranges into individual dates and unpivoting columns into rows (noting that the melt method is also available in R).

In saying that, I’ll be more than happy to receive feedback on some of the other ways the readers think are more practical in solving this data transformation problem – leave them in the comments!


As I ride the AI/ML wave, I enjoy writing and sharing step-by-step guides and how-to tutorials in comprehensive language with ready-to-run codes. If you would like to access all my articles (and articles from other practitioners/writers on Medium), you can sign up using the link here!


Related Articles