Time Travel Made Easy: A Comprehensive Guide to Python Datetime

Probably all you ever need for Python Datetime

Andreas Lukita
Towards Data Science



Working with data that has dates and times can easily be overwhelming, especially if you are not familiar with the ins and outs of datetime manipulation. Terms such as DatetimeIndex, Timestamp, Timedelta, Timezone, and Offset can be hard to grasp and remember, even for intermediate-level analysts. This guide will help you master datetime manipulation and unlock powerful insights from your data. Let’s get started!

The datetime module in Python’s standard library provides classes for working with dates, times, and time intervals¹. This module is particularly important in data analysis because dates and times are often key components of data, and manipulating them accurately is essential for projects such as time series analysis and financial modeling. With datetime, analysts can gain a better understanding of time-based trends and patterns in data, which can lead to more accurate insights and predictions from the dataset. The 6 classes under the datetime module are date, time, datetime, timedelta, tzinfo, and timezone.


Aware vs Naive datetime objects

In simple terms, an aware datetime object contains timezone information, making it unambiguous about the timezone for a specific date and time¹. To create an aware datetime object, a timezone object needs to be attached to the datetime object, for example with the help of the pytz module.

from datetime import datetime
import pytz

tz = pytz.timezone('Asia/Singapore')
dt = datetime(2023, 5, 4, 10, 30, 0, tzinfo=tz)

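This creates an aware datetime object representing May 4, 2023, at 10:30 AM Singapore Time. The tzinfo argument specifies the timezone for the datetime object. Printing dt out will give us the following information: datetime.datetime(2023, 5, 4, 10, 30, tzinfo=<DstTzInfo 'Asia/Singapore' LMT+6:55:00 STD>). Note the LMT+6:55:00 in the output: passing a pytz timezone directly through tzinfo picks up the zone's historical local mean time rather than the modern +08:00 offset. The pytz documentation recommends tz.localize(datetime(2023, 5, 4, 10, 30, 0)) instead, which yields the expected +08:00.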
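For comparison, here is a minimal sketch that builds an aware datetime with the standard library only. It assumes that a fixed +08:00 offset is an acceptable stand-in for Singapore time (no daylight saving time is modeled), and the names SGT and dt_aware are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+08:00 offset stands in for Singapore time.
SGT = timezone(timedelta(hours=8), name="SGT")

# Attach the timezone at construction time to get an aware object.
dt_aware = datetime(2023, 5, 4, 10, 30, 0, tzinfo=SGT)

print(dt_aware.isoformat())  # 2023-05-04T10:30:00+08:00
print(dt_aware.utcoffset())  # 8:00:00
print(dt_aware.tzname())     # SGT
```

A fixed offset is enough when the zone never changes its offset; for zones with daylight saving time a full timezone database is the safer choice.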

On the other hand, a naïve datetime object does not contain timezone information. It does represent a date and time, but it is not clear which timezone that date and time refer to¹.

from datetime import datetime

dt = datetime(2023, 5, 4, 10, 30, 0)

Accessing the attribute dt.tzinfo and calling the method dt.utcoffset() both produce None.

It is worth noting that an aware datetime object stores its local date and time fields together with a tzinfo object; it is not stored in UTC. When two aware objects with different timezones are compared or subtracted, Python first adjusts each one by its UTC offset, which means you can compare aware datetime objects from different timezones directly. An ordering comparison between a naive and an aware object, by contrast, raises a TypeError. More often than not, it is better to use aware datetime objects wherever possible, especially in applications that deal with data from various timezones.
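A minimal sketch of that comparison behaviour, using fixed offsets from the standard library as assumed stand-ins for Singapore (+08:00) and Bangkok (+07:00); the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

sgt = timezone(timedelta(hours=8))  # assumed fixed offset for Singapore
bkk = timezone(timedelta(hours=7))  # assumed fixed offset for Bangkok

# The same instant written in three different timezones.
t_sg = datetime(2023, 5, 4, 10, 30, tzinfo=sgt)
t_bk = datetime(2023, 5, 4, 9, 30, tzinfo=bkk)
t_utc = datetime(2023, 5, 4, 2, 30, tzinfo=timezone.utc)

same_instant = (t_sg == t_bk) and (t_bk == t_utc)
print(same_instant)  # True

# Ordering a naive object against an aware one is not allowed.
naive = datetime(2023, 5, 4, 10, 30)
try:
    naive < t_sg
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```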

Coordinated Universal Time (UTC), Time, Timezone, and Offset

UTC, or Coordinated Universal Time, is the main time standard used worldwide to regulate clocks and time. Prior to 1972, it was known as Greenwich Mean Time (GMT)². Because UTC is globally recognized and coordinated, it is essential for international communication, navigation, and scientific research. UTC is not affected by daylight saving time, which makes it a stable reference point for time-related activities. Instead, it is based on atomic clocks and is adjusted as needed to stay synchronized with the Earth’s rotation by adding or subtracting leap seconds³. As a result, UTC time is consistent worldwide, regardless of the local time in different time zones.

UTC time is unambiguous; it does not repeat.

Timezone: A timezone refers to a region on the globe where all clocks have the same offset from Coordinated Universal Time (UTC). It is significant because it impacts the local time in various parts of the world.

Offset: An offset refers to a certain duration of time that is either added to or subtracted from Coordinated Universal Time (UTC) to obtain the local time in a specific timezone. This is important because it affects the local time in different parts of the world. We can create an offset using the timedelta class from the datetime module:

from datetime import timedelta

offset = timedelta(hours=1)
dt2 = dt + offset

Printing dt2 out will give the following information: datetime.datetime(2023, 5, 4, 11, 30, tzinfo=<DstTzInfo 'Asia/Singapore' LMT+6:55:00 STD>). Notice that the hour attribute changes from 10 to 11 after the addition of the offset.

The interplay of time, timezone, and offset is crucial in manipulating datetime in Python, as together they determine the true time in a specific timezone, including any adjustments for daylight saving time.
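To make the distinction concrete, here is a small sketch contrasting two operations: adding a timedelta moves to a different instant, while astimezone() re-expresses the same instant under another offset. The fixed offsets (+08:00 and a hypothetical -05:00 zone) are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

sgt = timezone(timedelta(hours=8))    # assumed fixed offset, no DST
west = timezone(timedelta(hours=-5))  # hypothetical zone five hours behind UTC

local = datetime(2023, 5, 4, 10, 30, tzinfo=sgt)

# Adding a timedelta produces a different instant one hour later.
shifted = local + timedelta(hours=1)
print(shifted)  # 2023-05-04 11:30:00+08:00

# astimezone() keeps the instant and changes only its representation.
as_utc = local.astimezone(timezone.utc)
as_west = local.astimezone(west)
print(as_utc)   # 2023-05-04 02:30:00+00:00
print(as_west)  # 2023-05-03 21:30:00-05:00
```

Note how the calendar date rolls back to May 3 in the -05:00 zone even though all three printed values of local, as_utc, and as_west describe the same moment.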

Attributes and methods to datetime objects, ISO 8601 Standard

The datetime class has several essential attributes that are commonly used in datetime manipulation. They are year, month, day, hour, minute, second, microsecond, and tzinfo. From our example above,

print(dt.year) #2023
print(dt.month) #5
print(dt.day) #4
print(dt.hour) #10
print(dt.minute) #30
print(dt.second) #0
print(dt.microsecond) #0
print(dt.tzinfo) #Asia/Singapore

Some of the essential methods include date(), time(), replace(), isoformat(), isocalendar(), and strftime(format). Hold on, what is ISO format anyway?

The ISO 8601 format is a standard used globally to represent dates and times in a format that is easy for computer programs to read. The format consists of a specific syntax, where dates are represented using four digits for the year, two digits for the month, and two digits for the day (YYYY-MM-DD). For example, January 1st, 2023 would be represented as "2023-01-01". Moreover, it can also carry more detailed information such as time and timezone, as illustrated in the code below.

print(dt.isoformat())
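#return '2023-05-04T10:30:00+06:55'
#(the +06:55 offset comes from passing the pytz timezone via tzinfo; tz.localize() would give +08:00)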

print(dt.isocalendar())
#return tuple of datetime.IsoCalendarDate(year=2023, week=18, weekday=4)

datetime.fromisoformat("2023-01-05")
#return datetime.datetime(2023, 1, 5, 0, 0)

datetime.fromisoformat('2011-11-04T00:05:23')
#return datetime.datetime(2011, 11, 4, 0, 5, 23)

datetime.fromisoformat('2011-11-04 00:05:23.283')
#return datetime.datetime(2011, 11, 4, 0, 5, 23, 283000)

datetime.fromisoformat('2011-11-04 00:05:23.283+00:00')
#return datetime.datetime(2011, 11, 4, 0, 5, 23, 283000, tzinfo=datetime.timezone.utc)

datetime.fromisoformat('2011-11-04T00:05:23+04:00')
#return datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone(datetime.timedelta(seconds=14400)))

Let’s delve into the last line of code further.

  • 2011-11-04: The date component, representing November 4th, 2011.
  • T: A separator character indicating the start of the time component.
  • 00:05:23: The time component, representing 12:05:23 AM.
  • +04:00: The timezone offset component, indicating a 4-hour time difference from Coordinated Universal Time (UTC) in the positive direction (ahead of UTC).
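The components listed above can be read back from the parsed object. The following is a minimal sketch using only the standard library; the variable names parsed and in_utc are illustrative.

```python
from datetime import datetime, timedelta, timezone

parsed = datetime.fromisoformat("2011-11-04T00:05:23+04:00")

print(parsed.date())       # 2011-11-04
print(parsed.time())       # 00:05:23
print(parsed.utcoffset())  # 4:00:00

# Four hours ahead of UTC means the UTC date is still November 3rd.
in_utc = parsed.astimezone(timezone.utc)
print(in_utc.isoformat())  # 2011-11-03T20:05:23+00:00

# isoformat() and fromisoformat() round-trip the value.
print(datetime.fromisoformat(parsed.isoformat()) == parsed)  # True
```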

Formatting datetime objects using strftime(format) and Parsing strings to datetime objects using strptime(string, format)

In Python, you can use the strftime(format) method to turn datetime objects into strings. You just need to give it a format string that describes how the output should look. Conversely, you can parse strings into a datetime object using the datetime.strptime(input_string, input_format) class method.

The format string can contain a combination of format codes and literal characters. Format codes are special character sequences (introduced by the symbol %) that are replaced with corresponding values from the datetime object. Literal characters are included in the resulting string as-is. Here is a list of common format codes from the Python documentation¹.

  • %Y: The year as a four-digit number.
  • %m: The month as a zero-padded decimal number (01-12).
  • %d: The day of the month as a zero-padded decimal number (01-31).
  • %H: The hour (24-hour clock) as a zero-padded decimal number (00-23).
  • %I: The hour (12-hour clock) as a zero-padded decimal number (01-12).
  • %M: The minute as a zero-padded decimal number (00-59).
  • %S: The second as a zero-padded decimal number (00-59).
  • %a: The abbreviated weekday name (Sun, Mon, Tue, etc.).
  • %A: The full weekday name (Sunday, Monday, Tuesday, etc.).
  • %b: The abbreviated month name (Jan, Feb, Mar, etc.).
  • %B: The full month name (January, February, March, etc.).
  • %p: The AM/PM designation (AM or PM).

from datetime import datetime
import pytz

tz = pytz.timezone('Asia/Singapore')
dt = datetime(2023, 5, 4, 10, 30, tzinfo=tz)

# Format as YYYY-MM-DD
# Output: 2023-05-04
print(dt.strftime('%Y-%m-%d'))

# Format as MM/DD/YYYY
# Output: 05/04/2023
print(dt.strftime('%m/%d/%Y'))

# Format as Weekday, Month DD, YYYY HH:MM PM/AM
# Output: Thursday, May 04, 2023 10:30 AM
print(dt.strftime('%A, %B %d, %Y %I:%M %p'))

# .strptime() converts string to datetime object
# Output: 2023-05-04 20:30:45 with type datetime.datetime
dt_parsed = datetime.strptime("2023-05-04 20:30:45", '%Y-%m-%d %H:%M:%S')
print(dt_parsed)

# Another example of .strptime()
# Output: 2022-05-05 12:30:00
input_str = '5 May 2022 12:30 PM'
input_format = '%d %B %Y %I:%M %p'
dt = datetime.strptime(input_str, input_format)
print(dt)
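Because strftime and strptime are inverses when given the same format string, a round trip is a quick sanity check. The sketch below uses a purely numeric format to avoid locale-dependent names; parse_or_none is a hypothetical helper introduced here to show that a mismatched string raises ValueError.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
original = datetime(2023, 5, 4, 20, 30, 45)

# Format to text, then parse the text back with the same format string.
text = original.strftime(fmt)
restored = datetime.strptime(text, fmt)
print(text)                  # 2023-05-04 20:30:45
print(restored == original)  # True


def parse_or_none(value, pattern):
    """Hypothetical helper: return a datetime, or None if value does not match pattern."""
    try:
        return datetime.strptime(value, pattern)
    except ValueError:
        return None


print(parse_or_none("04/05/2023", fmt))  # None
```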

timedelta objects

A timedelta object in Python represents the duration or difference between two dates or times. It can be used to perform arithmetic with datetime objects, such as adding or subtracting time intervals, or calculating the difference between two datetime objects¹.

To create a timedelta object, you can use the datetime.timedelta() constructor, which takes one or more arguments to specify the duration. The arguments can be integers or floats and may be given as weeks, days, hours, minutes, seconds, milliseconds, or microseconds; internally the value is normalized to days, seconds, and microseconds. For example, timedelta(days=1, hours=3) creates a timedelta object that represents one day and three hours. You can perform arithmetic operations such as addition, subtraction, multiplication, and division with timedelta objects, and they can also be compared using comparison operators. For example,

from datetime import timedelta

delta_1 = timedelta(days=5, seconds=55, microseconds=555)
delta_2 = timedelta(days=1, seconds=11, microseconds=111)

print(delta_1.total_seconds()) #return 432055.000555
print(delta_1 - delta_2) #return 4 days, 0:00:44.000444
print(delta_1 > delta_2) #return True
print(delta_1 * 2) #return 10 days, 0:01:50.001110, i.e. timedelta(days=10, seconds=110, microseconds=1110)
print(delta_1 / 2) #return 2 days, 12:00:27.500278, i.e. timedelta(days=2, seconds=43227, microseconds=500278)
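Subtracting two datetime objects also yields a timedelta, which is the usual way to measure elapsed time. A minimal sketch with illustrative dates:

```python
from datetime import datetime, timedelta

start = datetime(2023, 1, 1)
end = datetime(2023, 5, 4, 10, 30)

# datetime - datetime gives a timedelta.
elapsed = end - start
print(elapsed)       # 123 days, 10:30:00
print(elapsed.days)  # 123

# Dividing by another timedelta expresses the duration in that unit.
hours = elapsed / timedelta(hours=1)
print(hours)         # 2962.5

# datetime + timedelta gives a new datetime.
deadline = end + timedelta(weeks=2)
print(deadline)      # 2023-05-18 10:30:00
```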

POSIX Timestamp

A POSIX timestamp, also called a Unix timestamp or Epoch timestamp, is a way of representing time as a single number that can be easily compared and manipulated. It is widely used in computer systems and programming languages like Python. It represents a point in time as the number of seconds since January 1, 1970, 00:00:00 UTC, which is known as the Unix epoch. It is useful for storing and manipulating dates and times in computer systems because a given timestamp identifies the same instant regardless of time zone or daylight saving time (DST).

from datetime import datetime

# create a datetime object for a specific date and time
dt1 = datetime(2023, 5, 4, 10, 30, 0)
dt2 = datetime(2023, 5, 4, 11, 30, 0)

# convert the datetime object to a POSIX timestamp
timestamp1 = dt1.timestamp()
timestamp2 = dt2.timestamp()

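print(timestamp1) # output on a machine set to UTC+7: 1683171000.0
print(timestamp2) # output on a machine set to UTC+7: 1683174600.0
print(timestamp2 - timestamp1) # output: 3600.0

Note that the difference between the timestamps is 3600 seconds, which is equal to a one-hour interval. The absolute values depend on the machine’s local timezone, because timestamp() treats a naive datetime object as local time; the difference does not.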

Similarly, we can also convert POSIX Timestamp to a datetime object.

from datetime import datetime

# create a POSIX timestamp for a specific date and time
timestamp = 1680658200.0

# convert the POSIX timestamp to a datetime object
dt = datetime.fromtimestamp(timestamp)

print(dt) # output on a machine set to UTC+7: 2023-04-05 08:30:00
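To get results that do not depend on the machine’s local timezone, one option is to work with aware objects in UTC. This is a minimal sketch using the same two timestamp values as above; the variable names are illustrative.

```python
from datetime import datetime, timezone

# An aware datetime maps to one timestamp on every machine.
aware = datetime(2023, 5, 4, 3, 30, tzinfo=timezone.utc)
ts = aware.timestamp()
print(ts)    # 1683171000.0

# Passing tz= makes fromtimestamp() return an aware datetime in that zone.
back = datetime.fromtimestamp(1680658200, tz=timezone.utc)
print(back)  # 2023-04-05 01:30:00+00:00
```

Both values line up with the earlier outputs once the seven-hour offset of the machine that produced them is taken into account.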

Pandas, .dt accessor and datetime64[ns]

The datetime64[ns] data type is a type of data that represents date and time with precision up to nanoseconds. It is part of the NumPy library in Python and is similar to the datetime module, but works better with large sets of date and time data. This makes it more efficient when working with large datasets, especially when combined with other NumPy functions for handling arrays and matrices.

When working with the Pandas library, datetime64[ns] is a great data type to work with, as it gives a Series access to the powerful .dt accessor for working with datetime values. Note that this accessor belongs to a Series and is not available on an individual pd.Timestamp object (which exposes attributes such as hour directly), so it is recommended to convert your column to datetime64[ns] for ease of manipulation. Let’s get started. We will load 500 randomly generated timestamps recorded in 2 cities (Bangkok / Singapore) from a CSV file. By default, the timestamp column is read in as strings (object dtype). This section and the coding style here are inspired by the book Effective Pandas.

import numpy as np
import pandas as pd
random_timestamp = pd.read_csv("random_timestamp.csv")
random_timestamp

FYI, the offset for Bangkok is GMT+7 whereas the offset for Singapore is GMT+8. Our aim is to convert all the times into Singapore time. Here is how to do it with datetime objects in Pandas:

#Creating the offset
offset = np.where(random_timestamp.country == "Bangkok", "+07:00", "+08:00")
offset

#Convert data type using pd.to_datetime, groupby offset, convert to SG time
(pd
 .to_datetime(random_timestamp.timestamp)
 .groupby(offset)
 .transform(lambda s: s.dt.tz_localize(s.name)
                       .dt.tz_convert('Asia/Singapore'))
)

If you are confused by the last operation, here’s the breakdown. First, the datetime series or index is assumed to be timezone-naive, i.e., it does not have any timezone information attached to it.

  1. The .dt.tz_localize() method is used to attach a timezone to the datetime series or index, which effectively makes it timezone-aware.
  2. The method takes a single argument, which is the timezone to which the datetime series or index should be localized.
  3. Once the timezone is attached to the datetime series or index, you can perform datetime operations that require timezone information such as converting the column to a specific timezone (i.e. Singapore).

[Figure: left vs right output, without vs with converting to a specific timezone]
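The localize-then-convert steps can be tried on a tiny Series without the CSV file. This is a sketch under stated assumptions: the two timestamps are made-up sample data, and fixed offsets from the standard library stand in for Bangkok (+07:00) and Singapore (+08:00).

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical sample data standing in for random_timestamp.csv.
naive = pd.to_datetime(pd.Series(["2023-05-04 10:30:00", "2023-05-04 23:45:00"]))

bangkok = timezone(timedelta(hours=7))    # assumed fixed offset
singapore = timezone(timedelta(hours=8))  # assumed fixed offset

# Step 1: attach a timezone to the naive values (wall-clock time is unchanged).
localized = naive.dt.tz_localize(bangkok)

# Step 2: re-express the same instants in another timezone.
converted = localized.dt.tz_convert(singapore)

print(naive.dt.tz)                 # None
print(converted.dt.hour.tolist())  # [11, 0]
print(converted.dt.day.tolist())   # [4, 5]
```

The second value crosses midnight, so its day changes from 4 to 5 after conversion.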

The .dt accessor allows us to retrieve datetime information, as in the following:

offset = np.where(random_timestamp.country == "Bangkok", "+07:00", "+08:00")
offset

(random_timestamp
 .assign(sg_time=(pd
                  .to_datetime(random_timestamp.timestamp)
                  .groupby(offset)
                  .transform(lambda s: s.dt.tz_localize(s.name)
                                        .dt.tz_convert('Asia/Singapore'))),
         sg_hour=lambda df_: df_.sg_time.dt.hour,
         sg_minute=lambda df_: df_.sg_time.dt.minute,
         sg_second=lambda df_: df_.sg_time.dt.second,
         sg_weekday=lambda df_: df_.sg_time.dt.weekday,
         sg_weekofyear=lambda df_: df_.sg_time.dt.isocalendar().week,
         sg_strftime=lambda df_: df_.sg_time.dt.strftime('%A, %B %d, %Y %I:%M %p'))
)
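For a quick feel of what the accessor returns, here is a minimal sketch on two made-up timestamps. It also shows that a single element pulled out of the Series is a pd.Timestamp, which exposes the same information as plain attributes and methods rather than through .dt.

```python
import pandas as pd

# Hypothetical sample values for illustration.
stamps = pd.to_datetime(pd.Series(["2023-05-04 10:30:15", "2023-12-31 23:59:59"]))

# Each .dt property returns a Series aligned with the original.
print(stamps.dt.hour.tolist())                  # [10, 23]
print(stamps.dt.weekday.tolist())               # [3, 6], where Monday is 0
print(stamps.dt.isocalendar().week.tolist())    # [18, 52]
print(stamps.dt.strftime("%Y/%m/%d").tolist())  # ['2023/05/04', '2023/12/31']

# A single element is a pd.Timestamp with direct attributes.
first = stamps.iloc[0]
print(first.hour, first.weekday())              # 10 3
```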

Date as Index, the .resample(), .agg(), .transform() method

This section and the coding style here are inspired by the book Effective Pandas. Imagine we have a dataset of temperatures recorded at several points in time during the day, such as the following.

We would want to perform aggregation of the data to find out the minimum, maximum and mean temperature of the day. We can achieve this by setting the index of the DataFrame to the column containing the datetime object, then using the resample method and specifying the frequency of aggregation.

(temp_record
 .set_index("timestamp")
 .resample('D')
 .agg(['min', 'max', 'mean'])
 .round(1)
)

.agg() summarises and shrinks the records of the DataFrame

If we would like to retain the number of rows instead of shrinking it into 7 days, we can use the method transform() instead of agg(). However, note that transform() cannot take in a list of aggregation methods but is limited to only one aggregation method at a time.

(temp_record
 .set_index("timestamp")
 .resample('D')
 .transform('min')
 .round(1)
)

.transform() retains all the records of the DataFrame
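The difference in output shape between the two methods is easy to see on a toy dataset. The sketch below builds a hypothetical four-row stand-in for temp_record (the column name temperature and the readings are assumptions) and, to keep it simple, calls transform() on a single column.

```python
import pandas as pd

# Hypothetical stand-in for temp_record: two readings on each of two days.
temp_record = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-05-04 06:00", "2023-05-04 14:00",
        "2023-05-05 06:00", "2023-05-05 14:00",
    ]),
    "temperature": [25.0, 31.0, 24.0, 30.0],
})

indexed = temp_record.set_index("timestamp")

# agg() collapses each day into one row with one column per statistic.
daily = indexed.resample("D").agg(["min", "max", "mean"])
print(daily.shape)                               # (2, 3)
print(daily[("temperature", "mean")].tolist())   # [28.0, 27.0]

# transform() keeps the original four rows and broadcasts the daily minimum.
daily_min = indexed["temperature"].resample("D").transform("min")
print(daily_min.tolist())                        # [25.0, 25.0, 24.0, 24.0]
```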

Afterword

Understanding and manipulating datetime objects is a crucial skill for any data analyst or scientist. By mastering the various classes and methods available, you can unlock powerful insights from your data and make informed decisions. Promise me, the next time you encounter a dataset with dates and times, don’t shy away from it; embrace it and let the datetime magic begin!


You can connect with me on LinkedIn: https://www.linkedin.com/in/andreaslukita7/

References:

  1. Python Docs: https://docs.python.org/3/library/datetime.html#timezone-objects
  2. National Hurricane Center and Central Pacific Hurricane Center. What is UTC or GMT Time? https://www.nhc.noaa.gov/aboututc.shtml#:~:text=Prior%20to%201972%2C%20this%20time,%22%20or%20%22Zulu%20Time%22.
  3. National Institute of Standards and Technology. What is USNO time or UTC(USNO)? https://www.nist.gov/pml/time-and-frequency-division/nist-time-frequently-asked-questions-faq#:~:text=USNO%20has%20an%20ensemble%20of,scale%20called%20UTC(USNO).
  4. International Organization for Standardization. ISO 8601 Date and Time Format. https://www.iso.org/iso-8601-date-and-time-format.html
  5. UNIX Time. https://unixtime.org/
  6. Effective Pandas by Matt Harrison: https://store.metasnake.com/effective-pandas-book
