
A Collection of Must-Know Techniques for Working with Time Series Data in Python

How to manipulate and visualize time series data in datetime format with ease

Illustration by xsIJciJN from IllustAC

Working with time series data can be intimidating at first. The time series values are not the only information you have to consider. The timestamps also contain information, especially about the relationship between the values.

In contrast to common data types, timestamps have a few unique characteristics. While they look like a string at first glance, they also have numerical aspects.

This article will give you the following list of must-know techniques for handling time series data:

  • How to Deal with Datetime Format
      • Reading Datetime Format
      • Converting a String to Datetime
      • Converting Unix Time to Datetime Format
      • Creating a Range of Dates
      • Changing the Datetime Format
  • How to Compose and Decompose a Datetime
      • Decomposing a Datetime
      • Assembling Multiple Columns to a Datetime
  • How to Fill Missing Values
      • Filling Missing Values with a Constant Value
      • Filling Missing Values with the Last Value
      • Filling Missing Values with Linearly Interpolated Values
  • How to Perform Operations on a Time Series
      • Getting the Min and Max
      • Differencing
      • Cumulating
      • Getting the Rolling Mean
      • Calculating the Time Difference between Two Timestamps
  • How to Filter Time Series
      • Filtering Time Series on Specific Timestamps
      • Filtering Time Series on Time Ranges
  • How to Resample Time Series
      • Downsampling
      • Upsampling
  • How to Plot Time Series
      • Plotting Numerical Data over Time
      • Plotting Categorical Data over Time
      • Plotting a Timeline
      • Setting the X-Axis Limits of a Time Series
      • Setting the X-Ticks of a Time Series


For this article, we will be using a minimal fictional dataset. It has three columns: date, cat_feature, and num_feature.

Fictional minimal time series dataset loaded as pandas DataFrame (Image by the author)

How to Deal with Datetime Format

The essential part of time series data is the timestamps. If these timestamps are in datetime format, you can apply the various manipulations we will discuss in this section.

Reading Datetime Format

By default, pandas reads timestamp columns from a CSV file into a DataFrame as strings. To read the timestamp column directly as datetime objects (data type datetime64[ns]), you can use the parse_dates parameter, as shown below.

import pandas as pd
df = pd.read_csv("example.csv", 
                 parse_dates = ["date"]) 
Fictional minimal time series dataset loaded as pandas DataFrame (Image by the author)
Data type of time stamps as datetime64[ns] (Image by the author)
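You can try parse_dates without a file on disk. The minimal sketch below reads an in-memory CSV through io.StringIO. The CSV content is made up and only mimics the three columns of the example dataset. Without parse_dates, the same column is not parsed as datetime.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "example.csv"
csv_text = """date,cat_feature,num_feature
2020-03-30 08:00:00.000,A,1.5
2020-03-31 09:30:00.000,B,2.0
2020-04-01 10:15:00.000,A,
"""

# With parse_dates, the "date" column becomes datetime64
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Without it, the same column is read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)
```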

Converting a String to Datetime

To convert a string to the datetime64[ns] format, you can use the [.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function. This is handy if you can’t use the parse_dates parameter during the import. Pass the layout of your strings to the format parameter. You can look up the relevant format codes in the [strftime](https://strftime.org/) reference.

# By default the date column is imported as string
df = pd.read_csv("example.csv")
# Convert to datetime data type
df["date"] = pd.to_datetime(df["date"], 
                            format = "%Y-%m-%d %H:%M:%S.%f")
Fictional minimal time series dataset loaded as pandas DataFrame (Image by the author)
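Here is a minimal sketch of format-based parsing on a few hypothetical strings. It also uses errors="coerce", a standard pandas.to_datetime argument. Strings that do not match the format become NaT instead of raising an error, which helps with messy raw data.

```python
import pandas as pd

# Hypothetical string timestamps in the "%Y-%m-%d %H:%M:%S.%f" layout
s = pd.Series(["2020-03-30 08:00:00.000", "2020-03-31 09:30:00.250"])
parsed = pd.to_datetime(s, format="%Y-%m-%d %H:%M:%S.%f")

# errors="coerce" turns strings that do not match the format into NaT
messy = pd.Series(["2020-03-30 08:00:00.000", "not a date"])
coerced = pd.to_datetime(messy, format="%Y-%m-%d %H:%M:%S.%f", errors="coerce")
```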

Converting Unix Time to Datetime Format

If your timestamp column is in Unix time (time elapsed since 1970-01-01), you can convert it to a human-readable format with the [.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function. Set the unit parameter to match your data, e.g., unit="s" for seconds or unit="ms" for milliseconds.

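# Convert from Unix time (seconds since 1970-01-01) to datetime
df["date"] = pd.to_datetime(df["date_unix"], unit="s")
# Convert back to Unix time in seconds
# (Series.view("int64") would return nanoseconds, not seconds, and is deprecated)
df["date_unix"] = (df["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)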
Timestamps converted to unix (Image by the author)
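The round trip can be sketched on a few made-up Unix timestamps in seconds. The sketch does not reinterpret the underlying integers. Instead, it subtracts the Unix epoch and floor-divides by one second, which yields seconds regardless of the column's internal resolution.

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds since 1970-01-01 (UTC)
df = pd.DataFrame({"date_unix": [0, 86_400, 1_585_526_400]})

# Unix seconds -> datetime
df["date"] = pd.to_datetime(df["date_unix"], unit="s")

# datetime -> Unix seconds: elapsed time since the epoch, counted in whole seconds
df["date_unix_back"] = (df["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
```

For timestamps in milliseconds, the counterparts would be unit="ms" and pd.Timedelta(milliseconds=1).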

Creating a Range of Dates

If you want to create a range of dates, you have two options:

  • define the range of dates with a start and an end date
  • define the range of dates with a start date, a frequency (e.g., daily, monthly, etc.), and the number of periods.
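# Option 1: start and end date (the default frequency is daily)
dates = pd.date_range(start="2022-01-01", end="2022-12-31")
# Option 2: start date, number of periods, and frequency
dates = pd.date_range(start="2022-01-01", periods=365, freq="D")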
Pandas series of a range of dates (Image by the author)
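This quick sketch checks that both options describe the same year. 2022 is not a leap year, so it has 365 days. It also shows a monthly variant using the "MS" (month start) frequency alias.

```python
import pandas as pd

by_end = pd.date_range(start="2022-01-01", end="2022-12-31")
by_periods = pd.date_range(start="2022-01-01", periods=365, freq="D")

# Month starts ("MS") instead of days
month_starts = pd.date_range(start="2022-01-01", periods=12, freq="MS")
```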

Changing the Datetime Format

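To change the timestamp format, you can use the [.strftime()](https://docs.Python.org/3/library/datetime.html#strftime-strptime-behavior) method via the .dt accessor. Note that the result is a column of strings, so datetime methods no longer work on it.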

# Example: change "2022-01-01" to "Jan 01, 2022"
df["date"] = df["date"].dt.strftime("%b %d, %Y")
Changed datetime format with strftime (Image by the author)
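This small sketch uses two hypothetical dates to contrast the abbreviated (%b) and full (%B) month names. The expected strings assume the default C/English locale, because strftime month names depend on the locale.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2022-01-01", "2022-07-04"]))

short = dates.dt.strftime("%b %d, %Y")  # abbreviated month name, zero-padded day
long = dates.dt.strftime("%B %d, %Y")   # full month name
```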

How to Compose and Decompose a Datetime

A timestamp is made up of several components, such as the date and the time, or, at a finer level, the hour or the minute. This section discusses how to decompose a datetime data type into its components and how to compose a datetime data type from different columns containing timestamp components.

Decomposing a Datetime

When a timestamp contains both a date and a time, you can split it into these two components, as shown below.

# Splitting date and time
df["dates"] = df["date"].dt.date
df["times"] = df["date"].dt.time
Timestamp decomposed to dates and times (Image by the author)
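Here is a sketch on made-up timestamps. .dt.date and .dt.time return plain Python date and time objects (object dtype), so the pandas .dt accessor is no longer available on the result. If you only want to drop the time of day but keep a datetime column, .dt.normalize() is an alternative.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2020-03-30 08:15:00", "2020-03-31 17:45:30"]))

dates = ts.dt.date            # Python datetime.date objects
times = ts.dt.time            # Python datetime.time objects
midnight = ts.dt.normalize()  # still datetime64, time set to 00:00:00
```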

You can also decompose it into smaller components, as shown below. You can find more possible components in the pandas DatetimeIndex documentation.

# Creating datetimeindex features
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
# etc.
Timestamp decomposed to year, month, and day (Image by the author)
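This sketch extracts a few more components from two hypothetical timestamps. Features like the day of the week or the quarter are often useful for capturing seasonality.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2020-03-30 08:15:00", "2020-12-31 23:59:00"])})

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["hour"] = df["date"].dt.hour
df["day_of_week"] = df["date"].dt.dayofweek  # Monday=0, ..., Sunday=6
df["quarter"] = df["date"].dt.quarter
```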

Assembling Multiple Columns to a Datetime

If you want to assemble a date column from its components, such as the year, month, and day, you can also use the [.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function. The columns have to be named after the components they contain (e.g., year, month, day).

df["date"] = pd.to_datetime(df[["year", "month", "day"]])
Assembling Multiple Columns to a Datetime (Image by the author)
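Here is a minimal sketch with hypothetical component columns. Besides year, month, and day, pandas also accepts optional columns such as hour, minute, and second. The column names have to match these component names, otherwise pandas raises an error.

```python
import pandas as pd

parts = pd.DataFrame({
    "year": [2020, 2020],
    "month": [3, 4],
    "day": [30, 1],
    "hour": [8, 17],  # optional; minute, second, etc. are accepted as well
})

parts["date"] = pd.to_datetime(parts[["year", "month", "day", "hour"]])
```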

How to Fill Missing Values

Filling missing values is challenging whether you are working with numerical, categorical, or time series data. This section will explore three methods to fill in missing values in time series data.

Filling Missing Values with a Constant Value

One approach is to fill missing values with a constant value using the .fillna() method. Common choices are 0, the mean of the time series, or a value that clearly stands out from the data, like -1 or 999. However, a constant value ignores the neighboring observations, so it is often not sufficient for time series data.

df["num_feature"] = df["num_feature"].fillna(0)
Filling Missing Values with a Constant Value (Image by the author via Kaggle)
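This sketch on a tiny made-up series shows two constant fills: a fixed value and the mean of the observed values. The NaNs are skipped when computing the mean.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-04-01", periods=5, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0], index=idx)

filled_zero = s.fillna(0)         # fixed constant
filled_mean = s.fillna(s.mean())  # mean of the observed values (NaNs skipped)
```

Keep in mind that the mean is computed over the whole series. In a forecasting setup, it therefore includes values from after each gap.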

Filling Missing Values with the Last Value

Another approach is to fill the missing value with the last available value with the .ffill() method.

df["num_feature"] = df["num_feature"].ffill()
Filling Missing Values with the Last Value (Image by the author via Kaggle)
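This sketch on hypothetical data shows two details worth knowing. A leading NaN cannot be forward-filled because there is no earlier value. The limit parameter caps how many consecutive NaNs get filled. Chaining .bfill() afterwards is one common way to handle the leading gap.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-04-01", periods=6, freq="D")
s = pd.Series([np.nan, 2.0, np.nan, np.nan, np.nan, 6.0], index=idx)

ffilled = s.ffill()         # leading NaN stays NaN: there is no previous value
limited = s.ffill(limit=1)  # fill at most one consecutive NaN per gap
both = s.ffill().bfill()    # back-fill what forward-fill could not reach
```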

Filling Missing Values with Linearly Interpolated Values

Often a good solution to handle missing values is to linearly interpolate them with the .interpolate() method. By default, it treats the values as equally spaced, regardless of the actual timestamps.

df["num_feature"] = df["num_feature"].interpolate()
Filling Missing Values with Linearly Interpolated Values (Image by the author via Kaggle)
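When the timestamps are not equally spaced, the default linear interpolation can be misleading because it ignores the index. This sketch on an irregular, made-up series compares it with method="time". That method weights the interpolation by the actual time distance and requires a DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Irregularly spaced timestamps: the gap before the last value is longer
idx = pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-07"])
s = pd.Series([0.0, np.nan, np.nan, 6.0], index=idx)

by_position = s.interpolate()           # default method="linear": ignores the index
by_time = s.interpolate(method="time")  # weights by the actual time distance
```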

How to Perform Operations on a Time Series

You can perform various operations on time series data, which we will discuss in this section.

Getting the Min and Max

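Knowing the start and end date of a time series can be helpful in many cases, e.g., to check which period the data covers. You can get them with the .min() and .max() methods.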

df["date"].min()
df["date"].max()
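This sketch uses a small hypothetical DataFrame. .min() and .max() also work on unsorted timestamps, and their difference is a Timedelta. .idxmax() on the value column gives the row label you need to look up when the series peaks.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-03", "2020-03-30", "2020-04-10"]),
    "num_feature": [5.0, 2.0, 7.5],
})

start, end = df["date"].min(), df["date"].max()
span = end - start  # Timedelta between the first and the last timestamp

# Timestamp at which num_feature peaks: idxmax() returns the row label
peak_date = df.loc[df["num_feature"].idxmax(), "date"]
```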

Differencing

Differencing means taking the difference between two consecutive values in a time series. For this, you can use the .diff() method.

df["num_feature_diff"] = df["num_feature"].diff()
Differencing of time series data (Image by the author)
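This sketch on a made-up series shows the periods parameter and second-order differencing. Differencing is commonly used to remove a trend, e.g., before fitting models that assume stationarity.

```python
import pandas as pd

s = pd.Series([1.0, 3.0, 6.0, 10.0])

first_diff = s.diff()           # x[t] - x[t-1]; the first entry is NaN
lag2_diff = s.diff(periods=2)   # x[t] - x[t-2]
second_order = s.diff().diff()  # difference of the differences
```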

Cumulating

The opposite of differencing is cumulating, i.e., summing up the values of the time series step by step with the .cumsum() method.

df["num_feature_cumsum"] = df["num_feature"].cumsum()
Cumulating of time series data (Image by the author)
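This sketch on a made-up series illustrates that cumulating undoes differencing. Differencing loses the first value, so you have to add it back.

```python
import pandas as pd

s = pd.Series([4.0, 6.0, 5.0, 9.0])
d = s.diff()  # [NaN, 2.0, -1.0, 4.0]

# fillna(0) replaces the leading NaN so cumsum() starts at 0;
# adding the first original value back restores the series
restored = d.fillna(0).cumsum() + s.iloc[0]

running_total = s.cumsum()
```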

Getting the Rolling Mean

Sometimes you need the rolling mean of a time series. You can use the .rolling() method, whose first parameter sets the number of values in the rolling window. In the example below, we take the mean of three values. Therefore, the first two rows are empty (NaN), and the third row is the mean value of the first three rows.

df["num_feature_mean"] = df["num_feature"].rolling(3).mean()
Rolling mean of time series data (Image by the author)
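This sketch uses a hypothetical series with a gap in the dates to show two variations. First, min_periods allows shorter windows at the start. Second, with a DatetimeIndex the window can be a time offset such as "3D" instead of a row count, which respects gaps in the timestamps.

```python
import pandas as pd

idx = pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-06"])
s = pd.Series([1.0, 2.0, 3.0, 10.0], index=idx)

by_count = s.rolling(3).mean()                # last 3 rows; the first 2 results are NaN
partial = s.rolling(3, min_periods=1).mean()  # allow shorter windows at the start
by_time = s.rolling("3D").mean()              # rows within the last 3 days (right-closed)
```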

Calculating the Time Difference between Two Timestamps

Sometimes you need to calculate the time difference between two timestamps, e.g., the time elapsed since a specific date such as the first timestamp.

df["time_since_start"] = df["date"] - df["date"].min()
Time difference of timestamp and first timestamp (Image by the author)
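The result is a Timedelta column. This sketch on made-up timestamps converts it into numbers with the .dt accessor, e.g., whole days or fractional hours. The same subtraction works with any fixed reference date, here a hypothetical 2020-03-01.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(
    ["2020-03-30 00:00", "2020-03-31 12:00", "2020-04-02 06:00"])})

df["time_since_start"] = df["date"] - df["date"].min()

# Timedelta -> numbers, e.g. for modelling or plotting
df["days_since_start"] = df["time_since_start"].dt.days  # whole days only
df["hours_since_start"] = df["time_since_start"].dt.total_seconds() / 3600

# Time since a fixed (hypothetical) reference date instead of the first timestamp
df["days_since_march"] = (df["date"] - pd.Timestamp("2020-03-01")).dt.days
```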

You might also want to find out whether the timestamps are equally spaced (equidistant).

df["timestamp_difference"] = df["date"].diff()
Time difference between timestamps (Image by the author)
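To turn this into a yes/no answer, this sketch uses a hypothetical helper, is_equidistant(), which checks whether all gaps are identical. pd.infer_freq() is a pandas alternative. It returns a frequency string such as "D", or None if it cannot infer a regular frequency.

```python
import pandas as pd

regular = pd.Series(pd.date_range("2020-04-01", periods=5, freq="D"))
irregular = pd.Series(pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-04", "2020-04-05"]))


def is_equidistant(dates: pd.Series) -> bool:
    """Hypothetical helper: True if all consecutive gaps are identical.

    diff() yields NaT for the first row, which dropna() removes.
    """
    return dates.diff().dropna().nunique() == 1


# pandas can also try to infer a frequency string; None means no regular frequency
freq = pd.infer_freq(regular)
```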

How to Filter Time Series

When working with time series data, you might need to filter it for specific times. To filter the time series data with date strings, set the date column as the index. Once you have the timestamp index, you can filter it on a specific date or even on a specific time range.

df = df.set_index(["date"])
DataFrame of time series data with the timestamps as index (Image by the author)

Filtering Time Series on Specific Timestamps

When you have the timestamps set as the index of the pandas DataFrame, you can easily filter for specific timestamps with loc.

df.loc["2020-03-30"]
Filtered time series on a date (Image by the author)
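This sketch uses hypothetical 6-hourly data. Because the index is finer than a day, the date string "2020-03-30" selects all rows of that day. If you prefer not to set an index, a boolean mask on a regular datetime column achieves the same result.

```python
import pandas as pd

# Hypothetical data: one reading every 6 hours over two days
idx = pd.date_range("2020-03-30", periods=8, freq=pd.Timedelta(hours=6))
df = pd.DataFrame({"num_feature": range(8)}, index=idx)

# Partial string: every row on 2020-03-30, regardless of the time of day
one_day = df.loc["2020-03-30"]

# Without a DatetimeIndex: boolean mask on a regular datetime column
flat = df.rename_axis("date").reset_index()
mask = flat["date"].dt.normalize() == pd.Timestamp("2020-03-30")
one_day_flat = flat[mask]
```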

Filtering Time Series on Time Ranges

Similar to the above example of filtering on specific timestamps, you can also use loc to filter on time ranges when the timestamps are set as the index of the pandas DataFrame. Note that, unlike positional slicing, both the start and the end label are included.

df.loc["2020-04-10":"2020-04-15"]
Filtered time series on a date range (Image by the author)
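This sketch on hypothetical daily data for April 2020 shows three things: the inclusive slice, a partial string that selects an entire month, and an index-free alternative with Series.between().

```python
import pandas as pd

idx = pd.date_range("2020-04-01", "2020-04-30", freq="D")
df = pd.DataFrame({"num_feature": range(len(idx))}, index=idx)

# Label slices on a DatetimeIndex include BOTH endpoints
window = df.loc["2020-04-10":"2020-04-15"]

# A partial string selects a whole period, here all of April 2020
april = df.loc["2020-04"]

# Without a DatetimeIndex: Series.between() is inclusive on both ends by default
flat = df.rename_axis("date").reset_index()
window_flat = flat[flat["date"].between(pd.Timestamp("2020-04-10"), pd.Timestamp("2020-04-15"))]
```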

How to Resample Time Series

Resampling changes the frequency of a time series, e.g., to aggregate it into coarser periods or to spread it over finer ones. There are two types of resampling:

Downsampling

Downsampling is when the frequency of samples is decreased (e.g., from seconds to months). You can use the .resample() method followed by an aggregation such as .mean().

# "ME" = month end (use "M" in pandas versions before 2.2)
downsampled = df.resample("ME")["num_feature"].mean()
Series of monthly resampled (downsampled) values (Image by the author)

Upsampling

Upsampling is when the frequency of samples is increased (e.g., from months to days). Again, you can use the .resample() method. You then have to fill the new, empty time slots, e.g., by interpolation.

upsampled = downsampled.resample("D").interpolate(method="linear")
Series of daily resampled (upsampled) values (Image by the author)
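This self-contained sketch shows both directions on a made-up daily series covering two full weeks. The weekly aggregation uses .sum(). Whether sum, mean, or last value is appropriate depends on what the values represent. When upsampling, you can forward-fill or interpolate the new slots.

```python
import pandas as pd

# Hypothetical daily series: Monday 2022-01-03 to Sunday 2022-01-16
idx = pd.date_range("2022-01-03", periods=14, freq="D")
daily = pd.Series(range(14), index=idx, dtype="float64")

# Downsampling: aggregate each calendar week ("W" = weeks ending on Sunday)
weekly = daily.resample("W").sum()

# Upsampling: create the finer daily grid, then decide how to fill the new slots
weekly_to_daily_ffill = weekly.resample("D").ffill()
weekly_to_daily_interp = weekly.resample("D").interpolate()
```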

How to Plot Time Series

This section will discuss how to visualize numerical and categorical time series data with Matplotlib and Seaborn. In addition to the pyplot module, we will explore different visualization techniques with Matplotlib's dates module.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import seaborn as sns

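To visualize the chronological order of a time series, the x-axis of a plot usually represents the time, and the y-axis represents the value. The plotting examples assume that date is a regular column of df again (e.g., after df = df.reset_index()).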

Plotting Numerical Data over Time

Most time series data is numerical, e.g., temperature or stock price data. To visualize numerical time series data, you can use line plots.

sns.lineplot(data = df, 
             x = "date", 
             y = "num_feature")
Line plot of numerical time series data (Image by the author)
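If you prefer plain Matplotlib over Seaborn, here is a sketch of the same kind of line plot with made-up data. The sketch sets the Agg backend only so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, e.g. for scripts or CI
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range("2020-03-30", periods=30, freq="D")
values = [i % 7 for i in range(30)]  # hypothetical weekly pattern

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, values)
ax.set_xlabel("date")
ax.set_ylabel("num_feature")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```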

Plotting Categorical Data over Time

Sometimes time series data can be categorical, e.g., tracking occurrences of different events.

Before plotting the data, you can label encode the categorical column, e.g., with scikit-learn's LabelEncoder or with a simple dictionary, as shown below.

# Label encode the categorical column: map each category to an integer
enum_dict = {cat: i for i, cat in enumerate(df["cat_feature"].unique())}
df["cat_feature_enum"] = df["cat_feature"].map(enum_dict)
Label encoded feature "cat_feature" as "cat_feature_enum" (Image by the author)
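As an alternative to the dictionary, pandas.factorize() returns integer codes in order of first appearance, which matches the dictionary approach above. Here is a sketch on hypothetical event categories:

```python
import pandas as pd

df = pd.DataFrame({"cat_feature": ["open", "close", "open", "error", "close"]})

# codes follow the order of first appearance; uniques holds the categories
codes, uniques = pd.factorize(df["cat_feature"])
df["cat_feature_enum"] = codes
```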

To visualize categorical time series data, you can use scatter plots.

fig, ax = plt.subplots(figsize=(8, 4))
sns.scatterplot(data=df,
                x="date",
                y="cat_feature_enum",
                hue="cat_feature",
                marker=".",
                linewidth=0,
                ax=ax)
# One y-tick per category, labeled in the same order as enum_dict
categories = df["cat_feature"].unique()
ax.set_yticks(range(len(categories)))
ax.set_yticklabels(categories)
ax.get_legend().remove()  # the y-tick labels already name the categories
plt.show()
Event plot of categorical time series data with scatter plot (Image by the author)

You can also try out Matplotlib’s eventplot demo.
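This sketch applies Matplotlib's eventplot() to such data, with one row of event markers per category. The data is hypothetical. The dates are converted with mdates.date2num() so that eventplot receives plain numbers, and ax.xaxis_date() then formats the x-axis as dates.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-03", "2020-04-04", "2020-04-08", "2020-04-09"]),
    "cat_feature": ["A", "B", "A", "C", "B"],
})

categories = list(df["cat_feature"].unique())
# One array of event positions (Matplotlib date numbers) per category
positions = [mdates.date2num(df.loc[df["cat_feature"] == cat, "date"].to_numpy())
             for cat in categories]

fig, ax = plt.subplots(figsize=(8, 4))
collections = ax.eventplot(positions,
                           lineoffsets=list(range(len(categories))),
                           linelengths=0.8)
ax.set_yticks(range(len(categories)))
ax.set_yticklabels(categories)
ax.xaxis_date()  # interpret the x values as dates
```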

Plotting a Timeline

To plot a timeline, we will use the label-encoded categorical values from the previous section and Matplotlib's vlines() method.

fig, ax = plt.subplots(figsize=(8, 4))
ax.vlines(df["date"], 0, df["cat_feature_enum"])
plt.show()
Timeline plot of categorical time series data with vlines (Image by the author)
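This sketch shows a variation of the timeline on made-up events. Markers on top of the stems and a baseline at zero often make individual events easier to spot.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.to_datetime(["2020-04-01", "2020-04-03", "2020-04-08"])
levels = [1, 2, 1]  # hypothetical stem heights

fig, ax = plt.subplots(figsize=(8, 4))
ax.vlines(dates, 0, levels, color="tab:gray")  # stems
ax.plot(dates, levels, "o")                    # markers on top of the stems
ax.axhline(0, color="black", linewidth=1)      # baseline
```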

Setting the X-Axis Limits of a Time Series

When you want to set the x-axis limits of a time series plot, pass the limits as datetime values (e.g., of the datetime64[ns] data type or as pandas Timestamps).

E.g., you can use the minimum and maximum timestamps of your time series:

ax.set_xlim([df.date.min(), df.date.max()])

Or you can specify a custom range, as shown below:

ax.set_xlim(np.array(["2020-04-01", "2020-04-30"],
                      dtype="datetime64"))
Adjusted x-axis ranges (Image by the author)
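This self-contained sketch with hypothetical data shows that you can also pass pandas Timestamps to set_xlim(). get_xlim() returns Matplotlib's internal float date numbers, which mdates.num2date() converts back to datetimes.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range("2020-03-01", "2020-05-31", freq="D")
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, range(len(dates)))

# pandas Timestamps work as limits, just like NumPy datetime64 values
ax.set_xlim(pd.Timestamp("2020-04-01"), pd.Timestamp("2020-04-30"))

# get_xlim() returns float date numbers; num2date converts them back (UTC)
left, right = (mdates.num2date(x) for x in ax.get_xlim())
```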

Setting the X-Ticks of a Time Series

To improve the readability of your data visualization, you can add major and minor x-ticks at specific intervals (e.g., weekly, monthly, yearly, etc.).

# Major ticks at the start of each month, labeled like "Apr 01"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
# Minor ticks on every day
ax.xaxis.set_minor_locator(mdates.DayLocator())
Custom x-axis ticks (Image by the author)
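This self-contained sketch with hypothetical data uses weekly major ticks instead. mdates.WeekdayLocator(byweekday=mdates.MO) places a major tick on every Monday, while minor ticks mark each day. fig.autofmt_xdate() rotates the labels to avoid overlap.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range("2020-03-01", "2020-05-31", freq="D")
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, range(len(dates)))

# Major ticks on every Monday, labeled like "Mar 02"
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
# Minor ticks on every day
ax.xaxis.set_minor_locator(mdates.DayLocator())
fig.autofmt_xdate()
```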

Conclusion

Getting started with handling time series data can be challenging when you are unfamiliar with the [datetime](https://docs.python.org/3/library/datetime.html#datetime.datetime) data type. As you saw, the datetime data type has many practical built-in methods for manipulating time series data. This article covered everything from manipulating the timestamps and performing useful operations on the time series values to visualizing time series data.


Enjoyed This Story?

Here is a collection of my other Time Series Analysis and Forecasting articles:

Time Series Analysis and Forecasting

Subscribe for free to get notified when I publish a new story.

Get an email whenever Leonie Monigatti publishes.

Find me on LinkedIn, Twitter, and Kaggle!

