
Working with data that contains dates and times can easily become overwhelming, especially if you are not familiar with the ins and outs of datetime manipulation. Terms such as DatetimeIndex, Timestamp, Timedelta, Timezone, and Offset can be confusing to grasp and remember, even for intermediate-level analysts. This guide will help you master datetime manipulation and unlock powerful insights from your data. Let’s get started!
The datetime module in Python’s standard library provides classes for working with dates, times, and time intervals¹. This module is particularly important in data analysis because dates and times are often key components of data, and manipulating them accurately is essential for projects such as time series analysis and financial modeling. With datetime, analysts can better understand time-based trends and patterns in data, which can lead to more accurate insights and predictions from the dataset. The six classes in the datetime module are date, time, datetime, timedelta, tzinfo, and timezone.
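To make the six classes concrete, here is a minimal sketch that creates one object of each concrete class. The "SGT" label passed to timezone() is an illustrative name chosen for this example; tzinfo itself is an abstract base class, so it only appears in the isinstance check.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo

d = date(2023, 5, 4)                        # calendar date only
t = time(10, 30)                            # time of day only
dt = datetime(2023, 5, 4, 10, 30)           # date and time together
delta = timedelta(hours=1, minutes=30)      # a duration
sgt = timezone(timedelta(hours=8), "SGT")   # fixed UTC+08:00 offset, hypothetical label "SGT"

print(d, t, dt, delta, sgt)
# 2023-05-04 10:30:00 2023-05-04 10:30:00 1:30:00 SGT

print(datetime.combine(d, t) == dt)  # True: a datetime is a date plus a time
print(isinstance(sgt, tzinfo))       # True: timezone is a concrete subclass of tzinfo
```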
Table of Contents
- Aware vs Naive datetime objects
- Coordinated Universal Time (UTC), Time, Timezone, Offset
- Attributes and methods of datetime objects, ISO 8601 Standard
- Formatting datetime objects using strftime(format) and Parsing strings to datetime objects using strptime(format)
- timedelta objects
- POSIX Timestamp
- Pandas, .dt accessor and datetime64[ns]
- Date as Index, the .resample(), .agg(), .transform() methods
Aware vs Naive datetime objects
In simple terms, an aware datetime object contains timezone information, which makes the date and time it represents unambiguous¹. To create an aware datetime object, a timezone (tzinfo) object needs to be attached to it; the example below uses the third-party pytz module.
from datetime import datetime
import pytz
tz = pytz.timezone('Asia/Singapore')
dt = datetime(2023, 5, 4, 10, 30, 0, tzinfo=tz)
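This creates an aware datetime object intended to represent May 4, 2023, at 10:30 AM Singapore Time. The tzinfo argument specifies the timezone for the datetime object. Printing dt gives datetime.datetime(2023, 5, 4, 10, 30, tzinfo=<DstTzInfo 'Asia/Singapore' LMT+6:55:00 STD>).
Notice the odd LMT+6:55:00 offset: Singapore is actually at UTC+08:00. Passing a pytz timezone directly through the tzinfo argument picks up the zone's historical Local Mean Time (LMT) offset, which is why the pytz documentation recommends attaching the timezone with tz.localize() instead.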
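Because passing a pytz timezone through tzinfo= yields Singapore's historical LMT offset rather than +08:00, here is a minimal sketch of the approach pytz documents, localize(), next to the pitfall. It assumes pytz is installed, as in the article.

```python
from datetime import datetime, timedelta

import pytz  # third-party package, as used in the article

tz = pytz.timezone("Asia/Singapore")
naive = datetime(2023, 5, 4, 10, 30, 0)

# Pitfall: attaching the pytz zone directly picks up its historical LMT offset
wrong = naive.replace(tzinfo=tz)
print(wrong.utcoffset())  # 6:55:00

# pytz's documented approach: localize() looks up the offset valid on that date
right = tz.localize(naive)
print(right.utcoffset() == timedelta(hours=8))  # True
print(right.isoformat())  # 2023-05-04T10:30:00+08:00
```

On Python 3.9 and later, the standard-library zoneinfo module offers an alternative: ZoneInfo objects can be passed to tzinfo= directly without this pitfall.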
On the other hand, a naïve datetime object does not contain timezone information. It does represent a date and time, but it is not clear which timezone that date and time refer to¹.
from datetime import datetime
dt = datetime(2023, 5, 4, 10, 30, 0)
Accessing dt.tzinfo or calling dt.utcoffset() returns None.
It is worth noting that an aware datetime object stores its local (wall-clock) date and time together with a tzinfo object that supplies the UTC offset. When you compare or subtract aware datetime objects, Python first adjusts each one by its UTC offset, so aware datetime objects from different timezones can be compared directly. Naive and aware objects, by contrast, cannot be ordered against each other: trying to do so raises a TypeError. More often than not, it is better to use aware datetime objects wherever possible, especially in applications that deal with data from various timezones.
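Here is a minimal sketch of comparing aware datetime objects across timezones. It uses fixed-offset datetime.timezone objects (UTC+08:00 and UTC+07:00) as stand-ins for Singapore and Bangkok, an assumption made to keep the example free of third-party packages.

```python
from datetime import datetime, timedelta, timezone

sgt = timezone(timedelta(hours=8))  # UTC+08:00, e.g. Singapore
ict = timezone(timedelta(hours=7))  # UTC+07:00, e.g. Bangkok

sg_meeting = datetime(2023, 5, 4, 10, 30, tzinfo=sgt)
bkk_meeting = datetime(2023, 5, 4, 9, 30, tzinfo=ict)

# Each object keeps its own wall-clock time...
print(sg_meeting.hour, bkk_meeting.hour)  # 10 9
# ...but both denote the same instant, 02:30 UTC, so they compare equal
print(sg_meeting == bkk_meeting)  # True
print(sg_meeting.astimezone(timezone.utc))  # 2023-05-04 02:30:00+00:00

# Ordering a naive datetime against an aware one raises TypeError
naive = datetime(2023, 5, 4, 10, 30)
try:
    naive < sg_meeting
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)
```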
Coordinated Universal Time (UTC), Time, Timezone, and Offset
UTC, or Coordinated Universal Time, is the primary time standard by which the world regulates clocks and time. Prior to 1972, this time was called Greenwich Mean Time (GMT)². UTC is globally recognized and coordinated, which makes it essential for international communication, navigation, and scientific research. It is worth noting that UTC is not affected by daylight saving time, which makes it a stable reference point for time-related activities. Instead, it is based on atomic clocks and is adjusted as needed, by adding or subtracting leap seconds, to stay synchronized with the Earth’s rotation³. As a result, UTC is the same everywhere, regardless of the local time in different timezones.
UTC time is unambiguous: unlike local time in regions that observe daylight saving time, it never repeats an hour.
Timezone: A timezone refers to a region on the globe where all clocks have the same offset from Coordinated Universal Time (UTC). It is significant because it impacts the local time in various parts of the world.
Offset: An offset is the duration of time that is added to or subtracted from Coordinated Universal Time (UTC) to obtain the local time in a specific timezone. For example, Singapore’s offset is +08:00. In Python, a duration like this is represented by the timedelta class from the datetime module. Note that adding a timedelta to a datetime, as below, shifts the datetime by that duration; it does not change its timezone.
from datetime import timedelta
offset = timedelta(hours=1)
dt2 = dt + offset
Printing dt2 gives datetime.datetime(2023, 5, 4, 11, 30, tzinfo=<DstTzInfo 'Asia/Singapore' LMT+6:55:00 STD>). Notice that the hour attribute changes from 10 to 11 after adding the timedelta, while tzinfo stays the same.
The interplay of time, timezone, and offset is crucial when manipulating datetime objects in Python, as together they determine the correct local time in a specific timezone, including any adjustments for daylight saving time.
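To separate the two ideas, a UTC offset attached to a datetime versus a duration added to it, here is a minimal sketch using the standard library's datetime.timezone, a fixed-offset tzinfo, instead of pytz. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# A UTC offset is itself a timedelta; timezone() wraps it as a tzinfo object
utc_plus_8 = timezone(timedelta(hours=8))
meeting = datetime(2023, 5, 4, 10, 30, tzinfo=utc_plus_8)
print(meeting.utcoffset())  # 8:00:00

# Adding a timedelta moves the moment forward; the offset is unchanged
later = meeting + timedelta(hours=1)
print(later.isoformat())  # 2023-05-04T11:30:00+08:00

# Converting to another offset keeps the moment but changes the wall-clock time
in_bangkok = meeting.astimezone(timezone(timedelta(hours=7)))
print(in_bangkok.isoformat())  # 2023-05-04T09:30:00+07:00
print(in_bangkok == meeting)  # True
```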
Attributes and methods of datetime objects, ISO 8601 Standard
The datetime class has several essential attributes that are commonly used in datetime manipulation: year, month, day, hour, minute, second, microsecond, and tzinfo. Using dt from our example above (May 4, 2023, 10:30 AM):
print(dt.year) #2023
print(dt.month) #5
print(dt.day) #4
print(dt.hour) #10
print(dt.minute) #30
print(dt.second) #0
print(dt.microsecond) #0
print(dt.tzinfo) #Asia/Singapore
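Besides reading individual attributes, several of the methods listed next build new objects from them. As a minimal sketch (using a fixed UTC+08:00 datetime.timezone as a stand-in for pytz), date(), time(), and replace() behave like this:

```python
from datetime import datetime, timedelta, timezone

dt = datetime(2023, 5, 4, 10, 30, tzinfo=timezone(timedelta(hours=8)))

print(dt.date())  # 2023-05-04
print(dt.time())  # 10:30:00  (time() drops tzinfo; timetz() keeps it)

# replace() returns a new datetime with the given fields swapped out
print(dt.replace(hour=18, minute=0))  # 2023-05-04 18:00:00+08:00
print(dt.replace(tzinfo=None))        # 2023-05-04 10:30:00  (now naive)
```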
Some of the essential methods are date(), time(), replace(), isoformat(), isocalendar(), and strftime(format). Hold on, what is ISO format anyway?
ISO 8601 is an international standard for representing dates and times in a format that is easy for computer programs to read⁴. Dates are written with four digits for the year, two digits for the month, and two digits for the day (YYYY-MM-DD). For example, January 1st, 2023 is represented as "2023-01-01". The format can also carry more detailed information, such as the time of day and the UTC offset, as illustrated in the code below.
print(dt.isoformat())
#return '2023-05-04T10:30:00+06:55' (+06:55 is pytz's historical LMT offset, not Singapore's +08:00)
print(dt.isocalendar())
#return tuple of datetime.IsoCalendarDate(year=2023, week=18, weekday=4)
datetime.fromisoformat("2023-01-05")
#return datetime.datetime(2023, 1, 5, 0, 0)
datetime.fromisoformat('2011-11-04T00:05:23')
#return datetime.datetime(2011, 11, 4, 0, 5, 23)
datetime.fromisoformat('2011-11-04 00:05:23.283')
#return datetime.datetime(2011, 11, 4, 0, 5, 23, 283000)
datetime.fromisoformat('2011-11-04 00:05:23.283+00:00')
#return datetime.datetime(2011, 11, 4, 0, 5, 23, 283000, tzinfo=datetime.timezone.utc)
datetime.fromisoformat('2011-11-04T00:05:23+04:00')
#return datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone(datetime.timedelta(seconds=14400)))
Let’s delve into the last line of code further.
- 2011-11-04: The date component, representing November 4th, 2011.
- T: A separator character indicating the start of the time component.
- 00:05:23: The time component, representing 12:05:23 AM.
- +04:00: The timezone offset component, indicating that local time is 4 hours ahead of Coordinated Universal Time (UTC).
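To see what the +04:00 component means in practice, here is a minimal sketch that parses the same string and converts it to UTC with astimezone(). Subtracting the 4-hour offset moves the time back across midnight.

```python
from datetime import datetime, timezone

parsed = datetime.fromisoformat("2011-11-04T00:05:23+04:00")
print(parsed.utcoffset())  # 4:00:00

# Same instant expressed in UTC: 00:05:23 minus 4 hours is 20:05:23 the day before
as_utc = parsed.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2011-11-03T20:05:23+00:00
print(as_utc == parsed)    # True
```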
Formatting datetime objects using strftime(format) and Parsing strings to datetime objects using strptime(format)
In Python, you can use the strftime(format) method to turn datetime objects into strings: pass it a format string that describes how the output should look. Conversely, you can parse a string into a datetime object with the class method datetime.strptime(input_string, input_format).
The format string can contain a combination of format codes and literal characters. Format codes are special character sequences (starting with the % symbol) that are replaced with the corresponding values from the datetime object. Literal characters are included in the resulting string as-is. Here is a list of common format codes from the Python documentation¹.
- %Y: The year as a four-digit number.
- %m: The month as a zero-padded decimal number (01-12).
- %d: The day of the month as a zero-padded decimal number (01-31).
- %H: The hour (24-hour clock) as a zero-padded decimal number (00-23).
- %I: The hour (12-hour clock) as a zero-padded decimal number (01-12).
- %M: The minute as a zero-padded decimal number (00-59).
- %S: The second as a zero-padded decimal number (00-59).
- %a: The abbreviated weekday name (Sun, Mon, Tue, etc.).
- %A: The full weekday name (Sunday, Monday, Tuesday, etc.).
- %b: The abbreviated month name (Jan, Feb, Mar, etc.).
- %B: The full month name (January, February, March, etc.).
- %p: The AM/PM designation (AM or PM).
from datetime import datetime
import pytz
tz = pytz.timezone('Asia/Singapore')
dt = datetime(2023, 5, 4, 10, 30, tzinfo=tz)
# Format as YYYY-MM-DD
# Output: 2023-05-04
print(dt.strftime('%Y-%m-%d'))
# Format as MM/DD/YYYY
# Output: 05/04/2023
print(dt.strftime('%m/%d/%Y'))
# Format as Weekday, Month DD, YYYY HH:MM PM/AM
# Output: Thursday, May 04, 2023 10:30 AM
print(dt.strftime('%A, %B %d, %Y %I:%M %p'))
# .strptime() converts string to datetime object
# Output: 2023-05-04 20:30:45 with type datetime.datetime
dt_parsed = datetime.strptime("2023-05-04 20:30:45", '%Y-%m-%d %H:%M:%S')
print(dt_parsed)
# Another example of .strptime()
# Output: 2022-05-05 12:30:00
input_str = '5 May 2022 12:30 PM'
input_format = '%d %B %Y %I:%M %p'
dt = datetime.strptime(input_str, input_format)
print(dt)
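Two behaviours the examples above do not show are worth a quick sketch: the %z format code (from the Python documentation; it matches a UTC offset such as +0800 and produces an aware datetime), and what happens when a string does not match its format. The sample strings are made up for illustration.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S %z"

# %z parses a UTC offset such as +0800 and yields an aware datetime
stamp = datetime.strptime("2023-05-04 10:30:00 +0800", fmt)
print(stamp.isoformat())  # 2023-05-04T10:30:00+08:00

# strftime with the same format string reproduces the original text
print(stamp.strftime(fmt))  # 2023-05-04 10:30:00 +0800

# A string that does not match the format raises ValueError
try:
    datetime.strptime("04/05/2023", "%Y-%m-%d")
    parsed_ok = True
except ValueError as exc:
    parsed_ok = False
    print("ValueError:", exc)
```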
timedelta objects
A timedelta object in Python represents a duration, such as the difference between two dates or times. It can be used to perform arithmetic with datetime objects, such as adding or subtracting time intervals, or calculating the difference between two datetime objects¹.
To create a timedelta object, use the datetime.timedelta() constructor, which takes one or more arguments to specify the duration: weeks, days, hours, minutes, seconds, milliseconds, and microseconds. Each can be an integer or a float, and Python normalizes the total into days, seconds, and microseconds internally. For example, timedelta(days=1, hours=3) creates a timedelta object that represents one day and three hours. You can add, subtract, multiply, and divide timedelta objects, and compare them using comparison operators. For example,
from datetime import timedelta
delta_1 = timedelta(days=5, seconds=55, microseconds=555)
delta_2 = timedelta(days=1, seconds=11, microseconds=111)
print(delta_1.total_seconds()) #return 432055.000555
print(delta_1 - delta_2) #return 4 days, 0:00:44.000444
print(delta_1 > delta_2) #return True
print(delta_1 * 2) #prints 10 days, 0:01:50.001110, i.e. timedelta(days=10, seconds=110, microseconds=1110)
print(delta_1 / 2) #prints 2 days, 12:00:27.500278, i.e. timedelta(days=2, seconds=43227, microseconds=500278)
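The paragraph above also mentions calculating the difference between two datetime objects, which the snippet does not show. Here is a minimal sketch with made-up start and end times: subtracting datetimes yields a timedelta, and adding a timedelta to a date yields a new date.

```python
from datetime import date, datetime, timedelta

start = datetime(2023, 5, 4, 10, 30)
end = datetime(2023, 5, 6, 8, 15)

elapsed = end - start  # datetime - datetime -> timedelta
print(elapsed)  # 1 day, 21:45:00
print(elapsed.total_seconds() / 3600)  # 45.75 hours

# date + timedelta -> date (any seconds part of the timedelta is ignored)
print(date(2023, 5, 4) + timedelta(weeks=2))  # 2023-05-18
```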
POSIX Timestamp
A POSIX timestamp, also called a Unix timestamp or Epoch timestamp, represents a point in time as a single number that can be easily compared and manipulated⁵: the number of seconds elapsed since January 1, 1970, 00:00:00 UTC, known as the Unix epoch. It is widely used in computer systems and programming languages like Python, where timestamp() returns it as a float. Because it counts from a fixed instant in UTC, it is not affected by timezones or daylight saving time (DST), which makes it useful for storing and comparing dates and times.
from datetime import datetime
# create a datetime object for a specific date and time
dt1 = datetime(2023, 5, 4, 10, 30, 0)
dt2 = datetime(2023, 5, 4, 11, 30, 0)
# convert the datetime object to a POSIX timestamp
timestamp1 = dt1.timestamp()
timestamp2 = dt2.timestamp()
print(timestamp1) # output: 1683171000.0 (naive dt1 is read as local time; this value corresponds to a UTC+7 machine)
print(timestamp2) # output: 1683174600.0
print(timestamp2 - timestamp1) # output: 3600.0
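Note that the difference between the timestamps is 3600 seconds, which equals a one-hour interval. Because dt1 and dt2 are naive, timestamp() interprets them in your machine's local timezone, so the absolute values you get may differ from those above, while the one-hour difference stays the same (barring a daylight saving transition).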
Similarly, we can convert a POSIX timestamp back to a datetime object. Without a tz argument, fromtimestamp() returns a naive datetime in the machine's local time.
from datetime import datetime
# create a POSIX timestamp for a specific date and time
timestamp = 1680658200.0
# convert the POSIX timestamp to a datetime object
dt = datetime.fromtimestamp(timestamp)
print(dt) # output: 2023-04-05 08:30:00 (local time on a UTC+7 machine, i.e. 01:30:00 UTC)
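Because naive conversions depend on the machine's local timezone, here is a minimal sketch of timezone-explicit conversions that give the same result everywhere. The values are chosen so that they correspond to the article's outputs on a UTC+7 machine (10:30 and 08:30 local time).

```python
from datetime import datetime, timezone

# Aware datetime -> timestamp: independent of the machine's timezone
dt_utc = datetime(2023, 5, 4, 3, 30, tzinfo=timezone.utc)
print(dt_utc.timestamp())  # 1683171000.0

# Timestamp -> aware datetime: pass tz to avoid the local-time interpretation
back = datetime.fromtimestamp(1680658200.0, tz=timezone.utc)
print(back)  # 2023-04-05 01:30:00+00:00
```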
Pandas, .dt accessor and datetime64[ns]
The datetime64[ns] data type represents dates and times with nanosecond precision. It comes from the NumPy library and is similar to Python’s datetime objects, but it stores each value as a 64-bit integer in a compact array. This makes it much more efficient for large datasets, especially when combined with other NumPy functions for handling arrays and matrices.
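To see how this data type shows up in pandas, here is a minimal sketch with two hypothetical timestamp strings: pd.to_datetime() converts them into a datetime64 Series, components are reached through .dt, and a single element comes back as a pd.Timestamp with the attributes available directly. The exact dtype printed can vary by pandas version, so no expected output is shown for it.

```python
import pandas as pd

# Hypothetical raw values, stored as plain strings the way read_csv leaves them
raw = pd.Series(["2023-05-04 10:30:00", "2023-05-05 23:15:00"])

times = pd.to_datetime(raw)  # parse into a datetime64 Series
print(times.dtype)

# On a Series, datetime components are reached through the .dt accessor
print(times.dt.hour.tolist())  # [10, 23]

# A single element is a pd.Timestamp, which exposes the same attributes directly
first = times.iloc[0]
print(isinstance(first, pd.Timestamp), first.hour)  # True 10
```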
When working with the pandas library, datetime64[ns] is a great data type to use because it gives us access to the powerful .dt accessor for working with datetime values. Note that .dt is only available on a Series with a datetime-like dtype; a single pd.Timestamp does not have it and exposes attributes such as .hour directly instead. It is therefore recommended to convert date and time columns to datetime64[ns] for ease of manipulation. Let’s get started. We will load 500 records of random timestamps from two cities (Bangkok and Singapore) from a CSV file. By default, read_csv reads the timestamp column as strings rather than datetimes. This section and its coding style are inspired by the book Effective Pandas⁶.
import numpy as np
import pandas as pd

random_timestamp = pd.read_csv("random_timestamp.csv")
random_timestamp

FYI, Bangkok is at UTC+7 (GMT+7) whereas Singapore is at UTC+8 (GMT+8). Our aim is to convert all the times into Singapore time. Working with datetime objects in pandas:
#Creating the offset
offset = np.where(random_timestamp.country == "Bangkok", "+07:00", "+08:00")
offset
#Convert data type using pd.to_datetime, groupby offset, convert to SG time
(pd
.to_datetime(random_timestamp.timestamp)
.groupby(offset)
.transform(lambda s: s.dt.tz_localize(s.name)
.dt.tz_convert('Asia/Singapore'))
)
If the last chain of methods is confusing, here’s the breakdown. First, pd.to_datetime() parses the strings into a timezone-naive datetime series, i.e., one without any timezone information attached.
- .groupby(offset) splits the rows by their offset string ("+07:00" or "+08:00"). Inside .transform(), each group arrives as a Series whose .name is that group's key, which is why s.name can be passed on as the timezone.
- The .dt.tz_localize() method attaches a timezone to the datetime series or index, which makes it timezone-aware. It takes a single argument: the timezone to which the datetime series or index should be localized.
- Once the timezone is attached, you can perform datetime operations that require timezone information, such as converting the column to a specific timezone (here, Singapore) with .dt.tz_convert().

The .dt accessor also allows us to retrieve datetime components, as in the following example.
offset = np.where(random_timestamp.country == "Bangkok", "+07:00", "+08:00")
offset
(random_timestamp
.assign(sg_time = (pd
.to_datetime(random_timestamp.timestamp)
.groupby(offset)
.transform(lambda s: s.dt.tz_localize(s.name)
.dt.tz_convert('Asia/Singapore'))),
sg_hour = lambda df_: df_.sg_time.dt.hour,
sg_minute = lambda df_: df_.sg_time.dt.minute,
sg_second = lambda df_: df_.sg_time.dt.second,
sg_weekday = lambda df_: df_.sg_time.dt.weekday,
sg_weekofyear = lambda df_: df_.sg_time.dt.isocalendar().week,
sg_strftime = lambda df_: df_.sg_time.dt.strftime('%A, %B %d, %Y %I:%M %p'))
)

Date as Index, the .resample(), .agg(), .transform() methods
This section and its coding style are inspired by the book Effective Pandas⁶. Imagine we have a dataset of temperatures recorded at several times of the day, such as the following.

Suppose we want to aggregate the data to find the minimum, maximum, and mean temperature of each day. We can do this by setting the DataFrame’s index to the column containing the datetime values, then calling the resample() method with the frequency of aggregation ('D' for daily).
(temp_record
.set_index("timestamp")
.resample('D')
.agg(['min', 'max', 'mean'])
.round(1)
)

If we would like to keep the original number of rows instead of collapsing them into 7 daily rows, we can use the transform() method instead of agg(). Note, however, that transform() cannot take a list of aggregation functions; it accepts only one at a time.
(temp_record
.set_index("timestamp")
.resample('D')
.transform('min')
.round(1)
)
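Because the temperature dataset is not included, here is a minimal self-contained sketch of both calls on hypothetical readings (four per day over two days, built with pd.date_range). It shows the difference in shape: agg() returns one row per day, while transform() returns one row per original reading.

```python
import pandas as pd

# Hypothetical readings every 6 hours over two days
temp_record = pd.DataFrame({
    "timestamp": pd.date_range("2023-05-01", periods=8, freq="6h"),
    "temperature": [26.0, 29.5, 33.0, 28.0, 25.5, 30.0, 34.5, 27.0],
})

# agg(): one row per day, one column per aggregation
daily = (temp_record
         .set_index("timestamp")
         .resample("D")
         .agg(["min", "max", "mean"]))
print(daily)  # 2 rows: (26.0, 33.0, 29.125) and (25.5, 34.5, 29.25)

# transform(): same 8 rows, each holding its own day's minimum
daily_min = (temp_record
             .set_index("timestamp")
             .resample("D")
             .transform("min"))
print(daily_min)
```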

Afterword
Understanding and manipulating datetime objects is a crucial skill for any data analyst or scientist. By mastering the various classes and methods available, you can unlock powerful insights from your data and make informed decisions. Promise me, next time you encounter a dataset with dates and times, don’t shy away from it: embrace it and let the datetime magic begin!
If you pick up something useful from this article, do consider giving me a Follow on Medium. Easy, 1 article a week to keep yourself updated and stay ahead of the curve!
You can connect with me on LinkedIn: https://www.linkedin.com/in/andreaslukita7/
References:
- Python Docs. datetime module. https://docs.python.org/3/library/datetime.html#timezone-objects
- National Hurricane Center and Central Pacific Hurricane Center. What is UTC or GMT Time? https://www.nhc.noaa.gov/aboututc.shtml
- National Institute of Standards and Technology. What is USNO time or UTC(USNO)? https://www.nist.gov/pml/time-and-frequency-division/nist-time-frequently-asked-questions-faq
- International Organization for Standardization. ISO 8601 Date and Time Format. https://www.iso.org/iso-8601-date-and-time-format.html
- UNIX Time. https://unixtime.org/
- Matt Harrison. Effective Pandas. https://store.metasnake.com/effective-pandas-book