
Time Series Analysis – Handling Time Zones

How to handle time zones with Pandas

Figure source

Have you ever taken a long flight from east to west? If you have, you know how it feels. Assume the plane takes off at 1 pm and it takes 10 hours to reach your destination. You look outside as soon as the plane lands. It is still bright, as if you had flown for only two hours. You check the local time. It is only 3 pm. It feels like the longest day of your life.

Because of the shape and movements of the earth, different parts of the earth receive sunlight at different angles at the same moment. While you are having your morning coffee in Germany, your friend in the US may be getting ready for bed. Since we want to start the day with sunlight and sleep when it is dark, adjusting clocks to the sun was inevitable. This gave rise to the concept of "local time", which is simply time reckoned by the position of the sun. To make it practical, the earth is divided into regions that share the same local time, and these are called "time zones". Some countries have a single time zone, while others span several. After this short introduction to time zones, let's move on to the real topic.

How do we handle time zones in Time Series Analysis?

Time series data includes data points attached to sequential time stamps. The sources of time series data are periodic measurements or observations. The analysis of time series data is essential for key tasks in many industries. I will focus on time zones in this post but if you would like to learn the basics of time series analysis, you can start with:

Time Series Analysis with Pandas

We live in a globalized world. Large companies usually operate in many countries spanning different time zones. Data is a valuable asset for these companies and needs to be analyzed correctly. In time series analysis, overlooking time zones would be a crucial mistake. In this post, I will cover the key concepts of time zones and how to handle them with Pandas.

Let's start by importing the related libraries:

import numpy as np
import pandas as pd
import datetime
import dateutil
import pytz

Then we create a simple range of dates:

dates = pd.date_range('2019-01-01','2019-01-10')

By default, Pandas time series objects do not have a time zone assigned:

#DatetimeIndex returned by date_range
dates.tz is None
True
#Timestamp object
date = pd.Timestamp('2019-01-01')
date.tz is None
True

We can assign a time zone to these objects using the tz_localize method:

dates_lcz = dates.tz_localize('Europe/Berlin')
dates_lcz.tz
  <DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>
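tz_localize does not shift the clock times; it attaches the time zone to the existing wall times. A minimal sketch (the variable names mirror the example above) showing that the first localized date is still midnight, now at UTC+1, which is 23:00 UTC on the previous day, and that localizing data that already has a time zone raises a TypeError, in which case tz_convert is the method to use:

```python
import pandas as pd

dates = pd.date_range('2019-01-01', '2019-01-10')
dates_lcz = dates.tz_localize('Europe/Berlin')

# The wall-clock time is unchanged: still midnight, now with a +01:00 offset
first = dates_lcz[0]
print(first)                    # 2019-01-01 00:00:00+01:00

# The same instant expressed in UTC is 23:00 on the previous day
print(first.tz_convert('UTC'))  # 2018-12-31 23:00:00+00:00

# Localizing data that already has a time zone is an error
try:
    dates_lcz.tz_localize('US/Eastern')
except TypeError as exc:
    print('TypeError:', exc)
```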

Pytz vs dateutil

Note: Pandas supports time zones from the pytz and dateutil libraries, as well as datetime.timezone objects from the standard library.

To assign a time zone:

  • pass a pytz or dateutil time zone object, or
  • pass an Olson time zone string
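The options listed above can be tried side by side. A minimal sketch, assuming pytz and dateutil are installed (both are imported at the start of this post): the same date is given a time zone as a pytz object, a dateutil object (dateutil.tz.gettz looks up a zone by name), an Olson string, and, for comparison, a fixed-offset datetime.timezone from the standard library. The fixed offset carries no daylight saving rules, so it only matches Berlin in winter.

```python
import datetime

import dateutil.tz
import pandas as pd
import pytz

# 1. pytz time zone object
ts_pytz = pd.Timestamp('2019-01-01', tz=pytz.timezone('Europe/Berlin'))

# 2. dateutil time zone object
ts_dateutil = pd.Timestamp('2019-01-01', tz=dateutil.tz.gettz('Europe/Berlin'))

# 3. Olson time zone string, resolved by Pandas
ts_string = pd.Timestamp('2019-01-01', tz='Europe/Berlin')

# Standard-library fixed offset (UTC+1, no DST rules)
ts_fixed = pd.Timestamp('2019-01-01', tz=datetime.timezone(datetime.timedelta(hours=1)))

# In January, Berlin is at UTC+1, so all four describe the same instant
print(ts_pytz == ts_dateutil == ts_string == ts_fixed)  # True
```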

An example of a pytz time zone string is 'US/Pacific'. The same time zone string for dateutil is 'dateutil/US/Pacific'. You can check the full list of time zone strings using:

  • from pytz import common_timezones
  • from pytz import all_timezones
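The two lists can be used to validate a zone name before passing it to Pandas, or to browse the zones of one region. A short sketch; the 'Australia/' prefix is just an illustrative choice:

```python
from pytz import all_timezones, common_timezones

# common_timezones is a curated subset of all_timezones
print(len(common_timezones), len(all_timezones))

# Check whether a zone name is valid before using it
print('Europe/Berlin' in common_timezones)  # True

# List the zones of one region, e.g. everything under 'Australia/'
australia = [tz for tz in common_timezones if tz.startswith('Australia/')]
print(australia[:5])
```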

We can also create a time series object with a time zone using the tz keyword argument:

#using pytz
date_pytz = pd.Timestamp('2019-01-01', tz = 'Europe/Berlin')
date_pytz.tz
    <DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>
#using dateutil
date_util = pd.Timestamp('2019-01-01', tz = 'dateutil/Europe/Berlin')
date_util.tz
  tzfile('/usr/share/zoneinfo/Europe/Berlin')
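The LMT+0:53:00 in the earlier pytz repr is the default (historical Local Mean Time) entry of the pytz zone object before it is tied to a specific date; Pandas applies the correct offset when it localizes, which is why date_pytz.tz shows CET+1:00:00. The offset only goes wrong when a pytz zone is passed directly as tzinfo to the standard datetime constructor. A minimal sketch contrasting that with pytz's localize method:

```python
import datetime

import pytz

berlin = pytz.timezone('Europe/Berlin')

# Pitfall: passing a pytz zone as tzinfo uses its default (LMT) offset
wrong = datetime.datetime(2019, 1, 1, tzinfo=berlin)
print(wrong.utcoffset())   # the LMT offset, not 1:00:00

# Correct: let pytz pick the offset that applies on that date
right = berlin.localize(datetime.datetime(2019, 1, 1))
print(right.utcoffset())   # 1:00:00
```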

The data type of a time series without a time zone is datetime64[ns]. If a time zone is assigned, then the data type becomes datetime64[ns, tz]:

dates = pd.date_range('2020-01-01', periods = 5, freq = 'D')
dates_tz = pd.date_range('2020-01-01', periods = 5, freq = 'D', tz='US/Eastern')
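Printing the dtype attribute makes the difference visible. A short sketch reusing the two ranges; the commented outputs reflect Pandas 1.x/2.x, where date_range produces nanosecond resolution. The same dtypes carry over when the dates are stored as DataFrame columns:

```python
import pandas as pd

dates = pd.date_range('2020-01-01', periods=5, freq='D')
dates_tz = pd.date_range('2020-01-01', periods=5, freq='D', tz='US/Eastern')

print(dates.dtype)     # datetime64[ns]
print(dates_tz.dtype)  # datetime64[ns, US/Eastern]

# The dtypes are preserved in DataFrame columns
df = pd.DataFrame({'naive': dates, 'aware': dates_tz})
print(df.dtypes)
```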

We may also need to convert time series data to a different time zone:

date_berlin = pd.Timestamp('2019-01-01', tz = 'Europe/Berlin')
date_converted = date_berlin.tz_convert('US/Central')
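Unlike tz_localize, tz_convert changes the clock time to match the new zone. A short sketch that prints the converted value and shows the equivalent Series method, Series.dt.tz_convert, which is the form used on DataFrame columns:

```python
import pandas as pd

date_berlin = pd.Timestamp('2019-01-01', tz='Europe/Berlin')
date_converted = date_berlin.tz_convert('US/Central')

# Midnight in Berlin (UTC+1) is 17:00 on the previous day in US Central time (UTC-6)
print(date_converted)  # 2018-12-31 17:00:00-06:00

# A Series of tz-aware datetimes uses the .dt accessor for the same operation
s = pd.Series(pd.date_range('2019-01-01', periods=3, tz='Europe/Berlin'))
print(s.dt.tz_convert('US/Central'))
```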

Note: UTC

Coordinated Universal Time (UTC) is the time standard against which local times around the world are defined. For example, local time in Malaysia is UTC+8. Internally, Pandas stores time zone-aware timestamps as UTC values. When you convert a timestamp to another time zone, the new object shows a different local date and time, but it represents the same instant and is therefore considered equal to the original object.

Let’s confirm by checking date_berlin and date_converted in the preceding example:

date_berlin == date_converted
True
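The equality holds because both objects wrap the same UTC instant. A minimal sketch that makes this visible through Timestamp.value, the integer number of nanoseconds since the Unix epoch that Pandas keeps for each timestamp:

```python
import pandas as pd

date_berlin = pd.Timestamp('2019-01-01', tz='Europe/Berlin')
date_converted = date_berlin.tz_convert('US/Central')

# Different local clock readings...
print(date_berlin.hour, date_converted.hour)  # 0 17

# ...but the same underlying UTC value
print(date_berlin.value == date_converted.value)  # True

# Converting to UTC makes the shared instant explicit
print(date_berlin.tz_convert('UTC'))  # 2018-12-31 23:00:00+00:00
```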

Removing time zone information

We can also remove time zone information using tz_localize(None) or tz_convert(None):

date = pd.Timestamp('2020-02-01', tz='US/Eastern')
date
  Timestamp('2020-02-01 00:00:00-0500', tz='US/Eastern')
date.tz_localize(None)
  Timestamp('2020-02-01 00:00:00')
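tz_localize(None) and tz_convert(None) are not interchangeable: tz_localize(None) keeps the local wall time, while tz_convert(None) first converts to UTC and then drops the time zone. A short sketch on the same timestamp:

```python
import pandas as pd

date = pd.Timestamp('2020-02-01', tz='US/Eastern')

# tz_localize(None): keep the local wall time, drop the zone
print(date.tz_localize(None))  # 2020-02-01 00:00:00

# tz_convert(None): convert to UTC (US/Eastern is UTC-5 in February), then drop the zone
print(date.tz_convert(None))   # 2020-02-01 05:00:00
```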

To subtract timestamps, they should be in the same time zone or both be without a time zone:

date1 = pd.Timestamp('2020-02-01 15:00', tz='US/Eastern')
date2 = pd.Timestamp('2020-02-01 09:00', tz='US/Pacific')

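Removing the time zones makes the subtraction possible, but note what it measures: tz_localize(None) keeps the local clock times, so the result below is the difference between the clock readings 15:00 and 09:00, not the elapsed time. Since 15:00 in US/Eastern is 20:00 UTC and 09:00 in US/Pacific is 17:00 UTC, only 3 hours actually separate the two moments: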

date1.tz_localize(None) - date2.tz_localize(None)
Timedelta('0 days 06:00:00')
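To get the actual elapsed time between the two moments (3 hours here), convert both timestamps to a common time zone, such as UTC, before subtracting. A minimal sketch showing two equivalent ways:

```python
import pandas as pd

date1 = pd.Timestamp('2020-02-01 15:00', tz='US/Eastern')
date2 = pd.Timestamp('2020-02-01 09:00', tz='US/Pacific')

# Bring both timestamps into one zone before subtracting
print(date1.tz_convert('UTC') - date2.tz_convert('UTC'))  # 0 days 03:00:00

# Equivalent: convert to naive UTC times, then subtract
print(date1.tz_convert(None) - date2.tz_convert(None))    # 0 days 03:00:00
```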

Thank you for reading. Please let me know if you have any feedback.


If you’d like to learn more about time series with Pandas, you can always check the user guide on the official Pandas website.

