Effective coding with dates and times in Python

Making use of datetime, zoneinfo, dateutil and pandas

Alicia Horsch
Towards Data Science


Photo by Jordan Benton from Pexels

I have been working extensively with time series data lately and have dealt with a lot of date and time objects in Python. Along the way, I learned some useful tricks for working with datetime objects that removed complexity from my code. In this article, I would like to share and summarise the most valuable ones.

For demonstration, I will use two Kaggle datasets, which I will link when I use them. If you want to follow along, you can import the following libraries.
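A plausible set of imports for following along is sketched below. It assumes that numpy, pandas and python-dateutil are installed and that you run Python 3.9 or later (needed for zoneinfo); pytz is left out because none of the examples here rely on it.

```python
# Standard library
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo  # part of the standard library since Python 3.9

# Third-party packages
import numpy as np
import pandas as pd
from dateutil import parser  # installed as "python-dateutil"
```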

Datetime, zoneinfo, dateutil and pytz

Commonly used packages that deal with dates and times in Python are datetime, dateutil, pytz and the recent zoneinfo. Datetime is the built-in module for working with times and dates in Python and lets you do most of the basics. Dateutil and pytz are third-party modules and powerful extensions to datetime when dealing with more complex manipulations like relative time deltas, time zones, and parsing of strings.

However, since Python 3.9, zoneinfo is part of the Python Standard Library and is therefore often considered “more convenient” for time zone support than third-party modules like dateutil or pytz.

So, depending on the Python version you are working with, the built-in modules might already be sufficient, and no third-party modules (dateutil and pytz) are needed when dealing with different time zones!

In the remainder of the article, I will focus mostly on handling dates and times with datetime but will also mention potential options for zoneinfo or dateutil. The article will first focus on single datetime objects, followed by dealing with dates in arrays and data frames using numpy and pandas:

  1. Creating a date or datetime from scratch
  2. Converting to and parsing strings: strftime(), strptime() and dateutil
  3. Dates and times in numpy - NumPy's datetime64
  4. Dates and times in pandas
  5. Creating features from dates and times

1. Creating a date or datetime from scratch

The datetime package lets you create date and datetime objects easily from scratch that can be used, for example, as thresholds for filtering (try printing the created objects below and their types to better understand their format).
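As a minimal sketch, a date and a datetime could be created like this. The variable names and the chosen values are purely illustrative.

```python
from datetime import date, datetime

# A date holds only year, month and day.
threshold_date = date(2021, 3, 15)

# A datetime additionally holds hour, minute and second (and microsecond).
threshold_datetime = datetime(2021, 3, 15, 14, 30, 0)

print(threshold_date, type(threshold_date))
print(threshold_datetime, type(threshold_datetime))
```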

Also, datetime lets you create date and time objects that refer to today or now.
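For example, today's date and the current timestamp could be obtained as follows. This is a small sketch, and the printed values depend on when and where you run it.

```python
from datetime import date, datetime

today = date.today()   # current local date
now = datetime.now()   # current local date and time

print(today)
print(now)
print(now.tzinfo)      # None: no time zone is attached to this object
```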

Be careful here, as datetime objects are usually “timezone naive” and do not refer to a specific time zone, which may get you into trouble when working with international colleagues!

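With the help of the zoneinfo module (built-in since Python version 3.9), you can attach a time zone when creating a datetime, or convert a datetime into another time zone with the tz parameter of astimezone(). Note that astimezone() assumes a naive datetime to be in the system's local time.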
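The following sketch shows one way to work with time zones using zoneinfo. The two zone names are only examples, and the code assumes that time zone data is available on your system (on some platforms, for instance Windows, this may require installing the tzdata package).

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Attach a time zone directly when creating the object ...
now_amsterdam = datetime.now(tz=ZoneInfo("Europe/Amsterdam"))

# ... and convert it into another time zone with astimezone().
now_new_york = now_amsterdam.astimezone(tz=ZoneInfo("America/New_York"))

print(now_amsterdam)
print(now_new_york)

# Both objects describe the same instant; only the wall-clock time differs.
print(now_amsterdam == now_new_york)
```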

2. Converting to and parsing strings: strftime(), strptime() and dateutil

You might find yourself in a situation where you want to display your datetime object as a string or convert a string into a datetime object. Here, the functions strftime() and strptime() are helpful.

Converting a datetime object (or parts of it) to a string

Commonly used format codes for describing datetime objects are listed in the Python documentation of the datetime module.
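A short sketch of strftime() with a few common format codes is shown below. The example datetime is arbitrary, and month names such as the one produced by %B depend on the active locale.

```python
from datetime import datetime

dt = datetime(2021, 3, 15, 14, 30, 0)

# Full date in ISO-like notation
as_date = dt.strftime("%Y-%m-%d")
# Only a part of the datetime, here the year
only_year = dt.strftime("%Y")
# A more readable variant; %B is the full month name in the current locale
as_text = dt.strftime("%d %B %Y, %H:%M")

print(as_date)    # 2021-03-15
print(only_year)  # 2021
print(as_text)
```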

Converting a string into a datetime object
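The reverse direction could look like the sketch below: strptime() takes the string and a format that describes how the string is laid out. The input string is made up for illustration.

```python
from datetime import datetime

date_string = "15/03/2021 14:30"

# The format must match the layout of the string exactly.
parsed = datetime.strptime(date_string, "%d/%m/%Y %H:%M")

print(parsed)        # 2021-03-15 14:30:00
print(type(parsed))
```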

Parsing complex strings using dateutil
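When the layout of the string is not known in advance, dateutil's parser can often infer it. The sketch below assumes python-dateutil is installed; the input strings are invented examples.

```python
from dateutil import parser

# No format string is needed; the parser guesses the layout.
parsed_iso = parser.parse("2021-03-15T14:30:00")
parsed_text = parser.parse("March 15, 2021 2:30 PM")

# Ambiguous strings are read month-first by default;
# dayfirst=True switches to day-first.
parsed_dayfirst = parser.parse("03/04/2021", dayfirst=True)

print(parsed_iso)
print(parsed_text)
print(parsed_dayfirst)  # 2021-04-03 00:00:00
```

Because the parser guesses, it is worth checking the result for ambiguous inputs instead of trusting it blindly.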

3. Dates and times in numpy - NumPy's datetime64

If you are handling large datasets, numpy’s datetime64 may come in handy as, due to its design, it can be much faster than working with datetime and dateutil objects. The datetime64 data type in numpy encodes dates and times as 64-bit integers.

This stores dates and times compactly and allows vectorized operations (repeated operations applied to each element of a numpy array).
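A minimal sketch of such a vectorized operation is shown below, assuming numpy is installed. For contrast, it also tries the same kind of operation on a plain Python list of datetime objects, which is only one of several ways the non-vectorized case can fail.

```python
from datetime import datetime, timedelta

import numpy as np

# A single datetime64 value with day precision
start = np.datetime64("2021-03-15")

# Adding an integer array creates a whole range of dates in one step.
days = start + np.arange(3)
print(days)  # ['2021-03-15' '2021-03-16' '2021-03-17']

# A plain list of datetime objects does not support this.
py_dates = [datetime(2021, 3, 15), datetime(2021, 3, 16)]
try:
    py_dates + timedelta(days=1)
    list_operation_failed = False
except TypeError as exc:
    list_operation_failed = True
    print("TypeError:", exc)
```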

With plain datetime or dateutil objects, in contrast, such vectorized operations will give you an error.

4. Dates and times in pandas

Pandas can be a good choice when working on a time series data project.

The famous data-wrangling library pandas combines the convenience of datetime and dateutil with the efficient storage and manipulation offered by numpy.

Create a pandas dataframe (from CSV) parsing a date column

Now, we have a basic understanding of handling dates and times in Python using numpy and pandas. However, often, we do not create dates and times ourselves, but they are already part of the dataset we are dealing with. Let’s create a pandas data frame with a date column (Kaggle dataset NFL).

As you can see, when loading from a CSV, the column that holds a date is read as strings unless you specify otherwise. To get a proper datetime type, you could create an extra column called “gameDate_dateformat” with pd.to_datetime() or directly pass the date column through the parameter parse_dates in pd.read_csv().
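Both options can be sketched as follows. The CSV content below is a small hypothetical stand-in for the NFL dataset (the real file has more columns and may use a different date layout); only the column name gameDate is taken from the text above.

```python
import io

import pandas as pd

# Hypothetical miniature CSV standing in for the NFL dataset
csv_text = """gameId,gameDate,homeTeam
1,2018-09-06,PHI
2,2018-09-09,CLE
3,2018-10-14,DAL
"""

# Without further instructions the date column is read as strings.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["gameDate"].dtype)

# Option 1: convert afterwards into an extra column
raw["gameDate_dateformat"] = pd.to_datetime(raw["gameDate"])

# Option 2: parse the column while reading the file
games = pd.read_csv(io.StringIO(csv_text), parse_dates=["gameDate"])
print(games["gameDate"].dtype)
```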

Another handy manipulation when working with time series data is filtering or subsetting a data frame by date or time. There are two ways to do this: filtering with a condition or indexing by time.

Filtering pandas data frames by time

Make sure that the threshold date you use for subsetting has the same type as the column!

If the column you want to filter by holds datetime values (like in the example), the comparison value cannot be a date but needs to be a datetime as well!
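A small sketch of filtering with a datetime threshold is shown below. The data frame is a hypothetical miniature version of the NFL data, built inline so that the example runs on its own.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for the NFL data frame
games = pd.DataFrame(
    {
        "gameDate": pd.to_datetime(["2018-09-06", "2018-09-09", "2018-10-14"]),
        "homeTeam": ["PHI", "CLE", "DAL"],
    }
)

# The threshold is a datetime, matching the type of the column.
threshold = datetime(2018, 9, 8)
later_games = games[games["gameDate"] > threshold]

# Two conditions combined select a date range, here September 2018.
september = games[
    (games["gameDate"] >= datetime(2018, 9, 1))
    & (games["gameDate"] < datetime(2018, 10, 1))
]

print(later_games)
print(september)
```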

Indexing pandas data frames by time

Even more powerful is indexing a pandas data frame by date or time.
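As a sketch, the date column can be turned into the index, after which rows can be selected with date strings. The data frame is again a hypothetical miniature version of the NFL data.

```python
import pandas as pd

# Hypothetical stand-in for the NFL data frame
games = pd.DataFrame(
    {
        "gameDate": pd.to_datetime(["2018-09-06", "2018-09-09", "2018-10-14"]),
        "homeTeam": ["PHI", "CLE", "DAL"],
    }
)

# Use the date column as a (sorted) index.
games_by_date = games.set_index("gameDate").sort_index()

# Select a whole month with a partial date string ...
september = games_by_date.loc["2018-09"]

# ... or a range of dates with a slice (both ends are included).
date_range = games_by_date.loc["2018-09-08":"2018-10-31"]

print(september)
print(date_range)
```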

Indexing can be especially useful when working with time series, as there are methods like rolling windows and time-shifting.
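Rolling windows and time-shifting on a time-indexed Series could look like the sketch below. The values and dates are invented for illustration.

```python
import pandas as pd

# Hypothetical daily series indexed by date
values = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# Rolling sum over a time-based window of two days
rolling_sum = values.rolling("2D").sum()

# Shift the values by one row, keeping the index unchanged
shifted = values.shift(1)

print(rolling_sum)
print(shifted)
```

The window "2D" is defined in terms of time rather than a number of rows, which is only possible because the index consists of datetimes.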

5. Creating features from dates and times

Often, we are not interested in the date itself but maybe the duration, the weekday, or just a part of the datetime, e.g. the year. For this, datetime but also pandas provide some useful manipulations.

Timedelta

With pandas, you can calculate, for example, the difference between two datetimes. For this, we will look at a different dataset of Uber trips (Kaggle dataset Uber) with a start and an end timestamp. Some preprocessing is needed (delete the Total Row) to start looking into timedelta.
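The idea can be sketched with a tiny hypothetical trips table. The column names start and end and all values are assumptions for illustration and do not reproduce the actual Uber dataset or its preprocessing.

```python
import pandas as pd

# Hypothetical stand-in for the Uber trips data
trips = pd.DataFrame(
    {
        "start": pd.to_datetime(["2016-01-01 21:11", "2016-01-02 01:25"]),
        "end": pd.to_datetime(["2016-01-01 21:17", "2016-01-02 01:37"]),
    }
)

# Subtracting two datetime columns yields a timedelta column.
trips["duration"] = trips["end"] - trips["start"]

# A timedelta can be converted into a plain number, here minutes.
trips["duration_minutes"] = trips["duration"].dt.total_seconds() / 60

print(trips)
```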

Extract the weekday or the month

This works slightly differently for the single datetime versus the pandas Series. While the weekday or the month of the single datetime object can be directly accessed by adding an attribute (e.g., .month) or method (e.g., weekday()), the pandas Series always needs the .dt accessor.

The .dt accessor allows you to access datetime-specific attributes and methods from a datetime Series.
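The difference between the two cases is sketched below with arbitrary example dates. Note that weekday is a method on a single datetime but an attribute behind the .dt accessor.

```python
from datetime import datetime

import pandas as pd

# Single datetime object: attribute and method are called directly.
single = datetime(2021, 3, 15)
single_month = single.month        # 3
single_weekday = single.weekday()  # 0, Monday is 0 and Sunday is 6

# pandas Series: everything goes through the .dt accessor.
dates = pd.Series(pd.to_datetime(["2021-03-15", "2021-12-25"]))
months = dates.dt.month
weekdays = dates.dt.weekday
day_names = dates.dt.day_name()

print(single_month, single_weekday)
print(months.tolist(), weekdays.tolist(), day_names.tolist())
```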

Create a date/time lag

Another helpful manipulation for time series data could be to add an extra column that adds a lag of a date or datetime.
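Two possible readings of a "lag" are sketched below: taking the date of the previous row with shift(), or subtracting a fixed offset with a Timedelta. Which one is appropriate depends on the use case; the column names and dates are hypothetical.

```python
import pandas as pd

# Hypothetical data frame with a date column
df = pd.DataFrame(
    {"date": pd.to_datetime(["2021-03-15", "2021-03-16", "2021-03-18"])}
)

# Lag by one row: each row gets the date of the previous row.
df["date_lag_1"] = df["date"].shift(1)

# Lag by a fixed amount of time: each date moved back by seven days.
df["date_minus_7d"] = df["date"] - pd.Timedelta(days=7)

# With the row-wise lag, the gap between consecutive dates follows directly.
df["gap"] = df["date"] - df["date_lag_1"]

print(df)
```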

Summary

To work with date or time objects in Python, knowing the basics of the built-in module datetime (e.g. date(), strftime() and strptime()) is beneficial. Zoneinfo is a newer built-in module (since version 3.9) that is more convenient than third-party modules when working with different time zones. Dateutil is a valuable library for more advanced date and time manipulations on single date objects, e.g., parsing complex strings. When working with dates and times in data frames, Series, or arrays, pandas combines the benefits of datetime, dateutil, and numpy and serves as a convenient library.
