
5 Examples to Learn Date and Time Manipulation with Python Pandas

A practical guide

Photo by Samantha Gades on Unsplash

We often deal with dates and times in data science. If you are working with time-series data, they are an inherent part of your work.

Pandas provides efficient and practical tools for manipulating dates and times. In this article, we will go over 5 examples that solve some of the common date and time manipulation tasks.

Don’t forget to subscribe if you’d like to get an email whenever I publish a new article.

Let’s start by creating a sample DataFrame that contains dates.

import pandas as pd
df = pd.DataFrame({
    "Date": pd.date_range(start="2021-12-28", periods=5, freq="D"),
    "Measurement": [1, 10, 25, 7, 12]
})
(image by author)

1. Datetime manipulation with DateOffset

The DateOffset object can be used to add a calendar-based duration, such as a number of days, months, or years, to dates.

Let’s create a new column by adding 6 months to the existing date column.

df["Date2"] = df["Date"] + pd.DateOffset(months=6)
(image by author)
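Because DateOffset follows the calendar, adding months to a day that does not exist in the target month lands on that month's last day. Here is a minimal sketch using the same five sample dates as the article's DataFrame (the variable names are just for illustration):

```python
import pandas as pd

# Same sample dates as the article's DataFrame
dates = pd.Series(pd.date_range(start="2021-12-28", periods=5, freq="D"))

# Add six calendar months; June has only 30 days, so Dec 31 maps to Jun 30
shifted = dates + pd.DateOffset(months=6)
print(shifted.dt.strftime("%Y-%m-%d").tolist())
# ['2022-06-28', '2022-06-29', '2022-06-30', '2022-06-30', '2022-07-01']
```

Note that both December 30 and December 31 end up on June 30.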

The following line of code adds 1 year to the existing date column.

df["Date2"] = df["Date"] + pd.DateOffset(years=1)

It is important to note that the values in the Date column have the datetime64[ns] data type, which stores a time of day as well as a date. Thus, we can also add time intervals such as hours.

df["Date"] + pd.DateOffset(hours=2)
# output
0   2021-12-28 02:00:00
1   2021-12-29 02:00:00
2   2021-12-30 02:00:00
3   2021-12-31 02:00:00
4   2022-01-01 02:00:00
Name: Date, dtype: datetime64[ns]

To subtract instead, change the "+" sign to "-" or pass a negative value to DateOffset.
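As a quick check that both forms of subtraction give the same result, here is a small sketch on the sample dates (variable names are illustrative):

```python
import pandas as pd

dates = pd.Series(pd.date_range(start="2021-12-28", periods=5, freq="D"))

# Two equivalent ways to go back six months
minus_sign = dates - pd.DateOffset(months=6)
negative_value = dates + pd.DateOffset(months=-6)

print(minus_sign.equals(negative_value))  # True
print(minus_sign.iloc[0])                 # 2021-06-28 00:00:00
```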


2. Datetime manipulation with Timedelta

We can also use the Timedelta function to add durations to dates and times. Its syntax is slightly different from that of the DateOffset function.

df["Date"] + pd.Timedelta(3, unit="day")
# output
0   2021-12-31
1   2022-01-01
2   2022-01-02
3   2022-01-03
4   2022-01-04
Name: Date, dtype: datetime64[ns]

The Timedelta function also accepts strings for specifying the duration to be added. The following line of code does the same operation as above.

df["Date"] + pd.Timedelta("3 days")
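Beyond syntax, the two differ in meaning: a Timedelta is a fixed duration (an exact number of days, hours, and so on), while a DateOffset moves along the calendar. The sketch below shows three equivalent ways to build the same Timedelta, then uses the leap year 2024 (an arbitrary example date) to show where the two approaches diverge:

```python
import pandas as pd

# Three equivalent ways to build the same fixed 3-day duration
a = pd.Timedelta(3, unit="day")
b = pd.Timedelta("3 days")
c = pd.Timedelta(days=3)
print(a == b == c)  # True

# Timedelta counts exact days; DateOffset follows the calendar.
# 2024 is a leap year, so 365 days after 2024-01-01 is not 2025-01-01.
start = pd.Timestamp("2024-01-01")
print(start + pd.Timedelta(days=365))  # 2024-12-31 00:00:00
print(start + pd.DateOffset(years=1))  # 2025-01-01 00:00:00
```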

3. Extracting information from datetime objects

A datetime object contains several pieces of information such as year, month, day, week, hour, microsecond, and so on.

We sometimes need a particular piece of information. For instance, we can extract the month from a date as below:

df["Date"].dt.month
# output
0    12
1    12
2    12
3    12
4     1
Name: Date, dtype: int64

The year and day attributes return the year and day parts of a date, respectively.
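For the sample dates, these attributes return the following (a minimal sketch on a standalone Series with the same dates):

```python
import pandas as pd

dates = pd.Series(pd.date_range(start="2021-12-28", periods=5, freq="D"))

# Each component is an attribute under the .dt accessor (no parentheses)
years = dates.dt.year.tolist()
days = dates.dt.day.tolist()
print(years)  # [2021, 2021, 2021, 2021, 2022]
print(days)   # [28, 29, 30, 31, 1]
```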

In retail analytics, the day of the week is a significant piece of information for analysis and modeling. The dayofweek attribute returns it as an integer, where Monday is 0 and Sunday is 6.

df["Date"].dt.dayofweek
# output
0    1
1    2
2    3
3    4
4    5
Name: Date, dtype: int64

All of these attributes are available through the dt accessor, so make sure to write "dt" before the attribute name.
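If readable labels are more convenient than integers, the dt accessor also offers a day_name() method, and dayofweek makes it easy to derive a feature such as a weekend flag. The is_weekend name below is a hypothetical feature, not something defined in the article:

```python
import pandas as pd

dates = pd.Series(pd.date_range(start="2021-12-28", periods=5, freq="D"))

# day_name() is a method (note the parentheses), unlike dayofweek
names = dates.dt.day_name().tolist()

# Hypothetical retail feature: dayofweek 5 and 6 are Saturday and Sunday
is_weekend = (dates.dt.dayofweek >= 5).tolist()

print(names)       # ['Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
print(is_weekend)  # [False, False, False, False, True]
```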


4. Isocalendar

The isocalendar function returns a DataFrame with the ISO 8601 year, week, and day of week, in columns named year, week, and day. Thus, it is a quick way of extracting multiple pieces of information in a single operation.

For instance, we can add the year, week, and day of week columns to our initial DataFrame as below:

df = pd.concat([df, df["Date"].dt.isocalendar()], axis=1)
(image by author)

The part that creates the additional columns is:

df["Date"].dt.isocalendar()

We use the concat function to combine these columns with the original DataFrame.
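The ISO values can differ from the calendar values. In the sample data, 2022-01-01 still belongs to ISO week 52 of 2021, and ISO days start at Monday = 1 (whereas dayofweek starts at Monday = 0). A small sketch to inspect this:

```python
import pandas as pd

dates = pd.Series(pd.date_range(start="2021-12-28", periods=5, freq="D"))

iso = dates.dt.isocalendar()  # DataFrame with ISO 8601 year, week, day
print(list(iso.columns))      # ['year', 'week', 'day']
print(iso["year"].tolist())   # [2021, 2021, 2021, 2021, 2021]
print(iso["week"].tolist())   # [52, 52, 52, 52, 52]
print(iso["day"].tolist())    # [2, 3, 4, 5, 6]
```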


5. Difference

The difference between two dates or times can be of great importance in some tasks. For instance, we might need to calculate the time between consecutive measurements in a process.
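For gaps between consecutive rows of the same column, such as successive measurements, pandas' Series.diff() subtracts each value from the one before it. A minimal sketch on the sample dates:

```python
import pandas as pd

dates = pd.Series(pd.date_range(start="2021-12-28", periods=5, freq="D"))

# Time elapsed since the previous row; the first row has no predecessor (NaT)
gaps = dates.diff()
print(gaps)
```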

Subtracting one datetime column from another returns the difference as a timedelta.

df["Diff"] = df["Date2"] - df["Date"]
(image by author)

The data type of the Diff column is timedelta64[ns], so we can get the number of days using the days attribute.

df["Diff"].dt.days
# output
0    365
1    365
2    365
3    365
4    365
Name: Diff, dtype: int64

We can also divide it by a Timedelta of 1 day, which returns the number of days as a float.

df["Diff"] / pd.Timedelta(days=1)
# output
0    365.0
1    365.0
2    365.0
3    365.0
4    365.0
Name: Diff, dtype: float64
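The two approaches only agree when the difference is a whole number of days: dt.days keeps the integer day component, while dividing by a 1-day Timedelta keeps the fraction. A sketch with a hypothetical 36-hour gap:

```python
import pandas as pd

# A hypothetical 36-hour gap between two timestamps
diff = pd.Series([pd.Timestamp("2022-01-02 12:00")]) - pd.Series([pd.Timestamp("2022-01-01")])

whole_days = diff.dt.days.iloc[0]                        # integer day component only
fractional_days = (diff / pd.Timedelta(days=1)).iloc[0]  # exact ratio as a float
seconds = diff.dt.total_seconds().iloc[0]                # whole duration in seconds

print(whole_days, fractional_days)  # 1 1.5
print(seconds)                      # 129600.0
```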

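If you want to convert the difference to months or years, use NumPy's timedelta64, because Pandas cannot construct a Timedelta from months or years. Be aware that, depending on your pandas and NumPy versions, this division may raise an error, since a month or a year is not a fixed-length duration.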

import numpy as np
df["Diff"] / np.timedelta64(1, 'M')
# output
0    11.992033
1    11.992033
2    11.992033
3    11.992033
4    11.992033
Name: Diff, dtype: float64
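If your versions reject the "M" unit, one workaround is to divide by a fixed-length "average month" Timedelta. This is a sketch under an explicit assumption: a month is taken as 365.2425 / 12 days (an average Gregorian month), which reproduces the 11.992033 shown above. It is an approximation, not a count of calendar months.

```python
import pandas as pd

dates = pd.Series(pd.date_range(start="2021-12-28", periods=5, freq="D"))
diff = (dates + pd.DateOffset(years=1)) - dates  # 365 days for every row

# Assumption: average Gregorian month = 365.2425 / 12 days (30.436875 days),
# and average Gregorian year = 365.2425 days
avg_month = pd.Timedelta(days=365.2425 / 12)
avg_year = pd.Timedelta(days=365.2425)

months = diff / avg_month  # about 11.992 for each row
years = diff / avg_year    # about 0.9993 for each row
print(months.round(6).tolist())
print(years.round(6).tolist())
```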

We have learned Pandas functions and methods for solving some of the common tasks in date and time manipulation.



Thank you for reading. Please let me know if you have any feedback.

