
Working with time series data can be intimidating at first. The time series values are not the only information you have to consider. The timestamps also contain information, especially about the relationship between the values.
In contrast to common data types, timestamps have a few unique characteristics. While they look like a string at first glance, they also have numerical aspects.
This article will give you the following list of must-know techniques for handling time series data:
- How to Deal with Datetime Format
  - Reading Datetime Format
  - Converting a String to Datetime
  - Converting Unix Time to Datetime Format
  - Creating a Range of Dates
  - Changing the Datetime Format
- How to Compose and Decompose a Datetime
  - Decomposing a Datetime
  - Assembling Multiple Columns to a Datetime
- How to Fill Missing Values
  - Filling Missing Values with a Constant Value
  - Filling Missing Values with the Last Value
  - Filling Missing Values with Linearly Interpolated Values
- How to Perform Operations on a Time Series
  - Getting the Min and Max
  - Differencing
  - Cumulating
  - Getting the Rolling Mean
  - Calculating the Time Difference between Two Timestamps
- How to Filter Time Series
  - Filtering Time Series on Specific Timestamps
  - Filtering Time Series on Time Ranges
- How to Resample Time Series
  - Downsampling
  - Upsampling
- How to Plot Time Series
  - Plotting Numerical Data over Time
  - Plotting Categorical Data over Time
  - Plotting a Timeline
  - Setting the X-Axis Limits of a Time Series
  - Setting the X-Ticks of a Time Series
For this article, we will be using a minimal fictional dataset. It has three columns: date, cat_feature, and num_feature.

How to Deal with Datetime Format
The essential part of time series data is the timestamps. If these timestamps are in Datetime format, you can apply various manipulations, which we will discuss in this section.
Reading Datetime Format
By default, pandas reads timestamp columns as strings into a DataFrame when reading from a CSV file. To read the timestamp column as datetime objects (with data type datetime64[ns]) directly, you can use the parse_dates parameter, as shown below.
import pandas as pd
df = pd.read_csv("example.csv",
parse_dates = ["date"])

![Data type of time stamps as datetime64[ns] (Image by the author)](https://towardsdatascience.com/wp-content/uploads/2022/10/15K0KJepZ-h0nBDQap7cZLg.png)
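To see the effect of parse_dates without needing a file on disk, the following minimal sketch reads the same CSV text twice. The CSV content and the names csv_text, df_raw, and df_parsed are made up for illustration and only mimic the fictional dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for example.csv
csv_text = """date,cat_feature,num_feature
2020-03-30 00:00:00,A,1.0
2020-03-31 00:00:00,B,2.5
"""

# Without parse_dates the date column stays a column of strings
df_raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the date column is parsed into datetime values
df_parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df_raw["date"].dtype)
print(df_parsed["date"].dtype)
```

Checking the dtype right after import is a cheap way to catch a column that silently stayed a string.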
Converting a String to Datetime
To convert a string to the datetime64[ns] format, you can use the [.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) method. This is handy if you can’t use the parse_dates parameter during the import. You can look up the relevant [strftime](https://strftime.org/) format codes.
# By default the date column is imported as string
df = pd.read_csv("example.csv")
# Convert to datetime data type
df["date"] = pd.to_datetime(df["date"],
format = "%Y-%m-%d %H:%M:%S.%f")
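Real data often contains a few values that do not match the expected format. As a small sketch (the sample strings and variable names are invented for illustration), passing errors="coerce" turns unparseable entries into NaT instead of raising an error:

```python
import pandas as pd

# Hypothetical raw strings; the last one is deliberately malformed
raw = pd.Series([
    "2022-01-01 08:30:00.000",
    "2022-01-02 09:45:10.500",
    "not a date",
])

# Entries that do not match the format become NaT instead of raising
parsed = pd.to_datetime(raw,
                        format="%Y-%m-%d %H:%M:%S.%f",
                        errors="coerce")

print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Whether coercing is appropriate depends on your data. Counting the resulting NaT values helps you notice when the format string is simply wrong.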

Converting Unix Time to Datetime Format
If your timestamp column is in Unix time, you can convert it to a human-readable format with the [.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) method by using the unit parameter.
# Convert from unix
df["date"] = pd.to_datetime(df["date_unix"],
unit = "s")
# Convert to unix (whole seconds since 1970-01-01)
df["date_unix"] = (df["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
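A round trip is a quick sanity check that the unit is right in both directions. The sketch below uses made-up Unix timestamps in seconds and converts back by dividing the elapsed time since the epoch by one second. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds
unix_seconds = pd.Series([0, 1640995200, 1641081600])

# Seconds since the epoch -> datetime
dates = pd.to_datetime(unix_seconds, unit="s")

# Datetime -> whole seconds since the epoch
epoch = pd.Timestamp("1970-01-01")
round_trip = (dates - epoch) // pd.Timedelta("1s")

print(dates)
print(round_trip)
```

If your source stores milliseconds instead, the same pattern should work with unit="ms" and pd.Timedelta("1ms").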

Creating a Range of Dates
If you want to create a range of dates, you have two options:
- define the range of dates with a start and an end date
- define the range of dates with a start date, a frequency (e.g., daily, monthly, etc.), and the number of periods.
df["date"] = pd.date_range(start = "2022-01-01",
end = "2022-12-31")
df["date"] = pd.date_range(start = "2022-01-01",
periods = 365,
freq = "D")
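For 2022, both options describe the same 365 days, which the following sketch verifies. It also shows a different frequency, weekly on Mondays, as one example of what freq can express. The variable names are illustrative.

```python
import pandas as pd

# Option 1: start and end date (daily frequency by default)
by_end = pd.date_range(start="2022-01-01", end="2022-12-31")

# Option 2: start date, number of periods, and frequency
by_periods = pd.date_range(start="2022-01-01", periods=365, freq="D")

print(by_end.equals(by_periods))

# A different frequency: four consecutive Mondays
weekly = pd.date_range(start="2022-01-03", periods=4, freq="W-MON")
print(weekly)
```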

Changing the Datetime Format
To change the timestamp format you can use the [.strftime()](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior) method.
# Example: Change "2022-01-01" to "Jan 01, 2022" (the result is a string column)
df["date"] = df["date"].dt.strftime("%b %d, %Y")
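Because .strftime() returns strings, overwriting the original column means losing the datetime functionality. One option, sketched below with a hypothetical date_label column, is to keep the datetime column and store the formatted text separately. The month names assume the default English locale.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2022-01-01", "2022-12-24"])})

# Keep the datetime column and add a formatted string column next to it
df["date_label"] = df["date"].dt.strftime("%B %d, %Y")

print(df)
print(df.dtypes)
```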

How to Compose and Decompose a Datetime
A timestamp is made up of several components, such as the date and the time, or at a finer granularity, the hour and the minute. This section will discuss how to decompose a datetime data type into its components, and also how to compose a datetime data type from different columns containing timestamp components.
Decomposing a Datetime
When a timestamp contains both a date and a time, you can split it into these two parts, as shown below.
# Splitting date and time
df["dates"] = df["date"].dt.date
df["times"] = df["date"].dt.time

You can also decompose it into smaller components, as shown below. You can find more possible components in the pandas DatetimeIndex documentation.
# Creating datetimeindex features
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
# etc.
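Beyond year, month, and day, the .dt accessor exposes further components that are often useful as features. The following sketch uses two invented timestamps. The is_weekend column is a derived example, not something from the dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01 08:30:00", "2022-01-03 17:05:00"])
})

df["hour"] = df["date"].dt.hour
df["dayofweek"] = df["date"].dt.dayofweek   # Monday = 0, Sunday = 6
df["day_name"] = df["date"].dt.day_name()
df["is_weekend"] = df["dayofweek"] >= 5     # Saturday or Sunday

print(df)
```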

Assembling Multiple Columns to a Datetime
If you want to assemble a date column from its components like the year, month, and day, you can also use the [.to_datetime()](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) method.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])
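The assembly relies on the column names: pandas looks for columns named year, month, and day, and optionally finer ones such as hour. Here is a minimal sketch with made-up component values:

```python
import pandas as pd

# Hypothetical component columns; the names must match what pandas expects
parts = pd.DataFrame({
    "year": [2022, 2022],
    "month": [1, 2],
    "day": [15, 28],
    "hour": [8, 23],
})

assembled = pd.to_datetime(parts[["year", "month", "day", "hour"]])
print(assembled)
```

If your columns have different names, renaming them first (for example with .rename()) is one way to make this work.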

How to Fill Missing Values
Filling missing values is challenging whether you are working with numerical, categorical, or time series data. This section will explore three methods to fill in missing values in time series data.
Filling Missing Values with a Constant Value
One approach is to fill missing values with a constant value with the .fillna() method. A common choice is the mean of the time series or a sentinel value such as -1 or 999. However, filling missing values with a constant value is often not sufficient.
df["num_feature"] = df["num_feature"].fillna(0)

Filling Missing Values with the Last Value
Another approach is to fill the missing value with the last available value with the .ffill() method.
df["num_feature"] = df["num_feature"].ffill()

Filling Missing Values with Linearly Interpolated Values
Often a good solution to handle missing values is to linearly interpolate the missing values with the .interpolate() method.
df["num_feature"] = df["num_feature"].interpolate()
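To compare the three approaches side by side, the sketch below applies each of them to the same small series with a two-value gap. The series and the column names of the comparison table are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap of two missing values
s = pd.Series([1.0, np.nan, np.nan, 4.0],
              index=pd.date_range("2022-01-01", periods=4, freq="D"))

comparison = pd.DataFrame({
    "original": s,
    "constant": s.fillna(0),     # constant value
    "ffill": s.ffill(),          # last available value
    "linear": s.interpolate(),   # linear interpolation
})

print(comparison)
```

On this toy series, the constant fill produces an artificial drop, forward filling produces a plateau, and interpolation produces a straight line between the known values.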

How to Perform Operations on a Time Series
You can perform various operations on time series data, which we will discuss in this section.
Getting the Min and Max
Knowing the time series’ start or end date can be helpful in many cases.
df["date"].min()
df["date"].max()
Differencing
Differencing means taking the difference between two consecutive values in a time series. For this, you can use the .diff() method.
df["num_feature_diff"] = df["num_feature"].diff()

Cumulating
The opposite of differencing is accumulating values of the time series with the .cumsum() method.
df["num_feature_cumsum"] = df["num_feature"].cumsum()
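The relationship between the two operations can be checked on a toy series: differencing and then cumulating recovers the original values, provided the first value is restored, since .diff() leaves it empty. The numbers below are made up.

```python
import pandas as pd

s = pd.Series([3, 5, 4, 8])

# The first difference is NaN because there is no previous value
diffs = s.diff()

# Put the first value back, then accumulate to undo the differencing
reconstructed = diffs.fillna(s.iloc[0]).cumsum()

print(diffs.tolist())
print(reconstructed.tolist())
```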

Getting the Rolling Mean
Sometimes you need the rolling mean of a time series. You can use the .rolling() method, which takes the window size, i.e., the number of values to consider in the rolling window. In the example below, we take the mean of three values. Therefore, the first two rows are empty, and the third row is the mean value of the first three rows.
df["num_feature_mean"] = df["num_feature"].rolling(3).mean()
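If the empty rows at the start are a problem, min_periods lets the window produce a value as soon as enough observations are available. With a datetime index, the window can also be given as a time span. The sketch below illustrates both on invented daily data.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2022-01-01", periods=5, freq="D"))

# Fixed window of 3 values: the first two results are NaN
strict = s.rolling(3).mean()

# Same window, but emit a value once at least one observation exists
partial = s.rolling(3, min_periods=1).mean()

# Time-based window covering the last 3 days (requires a datetime index)
by_time = s.rolling("3D").mean()

print(pd.DataFrame({"strict": strict, "partial": partial, "by_time": by_time}))
```

For evenly spaced daily data the last two variants agree. They can differ when timestamps are irregular, because the time-based window counts days rather than rows.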

Calculating the Time Difference between Two Timestamps
Sometimes you need to calculate the time difference between two timestamps. For example, you might need to calculate the time elapsed since a specific date.
df["time_since_start"] = df["date"] - df["date"].min()

Or you might want to find out whether the timestamps are equally spaced.
df["timestamp_difference"] = df["date"].diff()
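One way to turn these differences into an actual check is to count the distinct gaps between consecutive timestamps. The sketch below uses four made-up dates with one day missing. The names gaps, is_equidistant, and elapsed_days are illustrative.

```python
import pandas as pd

# Hypothetical timestamps: 2022-01-04 is missing
dates = pd.Series(pd.to_datetime(
    ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-05"]
))

# Gaps between consecutive timestamps (the first one is NaT, so drop it)
gaps = dates.diff().dropna()

# Equally spaced timestamps have exactly one distinct gap
is_equidistant = gaps.nunique() == 1

# Elapsed time since the first timestamp, as whole days
elapsed_days = (dates - dates.min()).dt.days

print(is_equidistant)
print(gaps.max())
print(elapsed_days.tolist())
```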

How to Filter Time Series
When working with time series data, you might need to filter it at specific times. For the label-based filtering shown here, first set the date column as the index. Once you have a timestamp index, you can filter on a specific date or even on a specific time range.
df = df.set_index(["date"])

Filtering Time Series on Specific Timestamps
When you have the timestamps set as the index of the pandas DataFrame, you can easily filter for specific timestamps with loc.
df.loc["2020-03-30"]

Filtering Time Series on Time Ranges
Similarly to the above example of filtering on specific timestamps, you can also use loc for filtering on time ranges when the timestamps are set as the index of the pandas DataFrame.
df.loc["2020-04-10":"2020-04-15"]
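Two details are easy to miss: a date string selects every row on that day, not only the midnight timestamp, and the end of a range is included. The sketch below demonstrates both on invented data with two observations per day.

```python
import pandas as pd

# Hypothetical data: two observations per day over four days
df = pd.DataFrame({
    "date": pd.date_range("2020-03-28", periods=8, freq=pd.Timedelta(hours=12)),
    "num_feature": range(8),
}).set_index("date")

# A date string matches all timestamps on that day
one_day = df.loc["2020-03-30"]

# Both ends of the range are included, up to the end of the last day
two_days = df.loc["2020-03-29":"2020-03-30"]

print(one_day)
print(two_days)
```

Range slicing like this works most predictably when the index is sorted, so calling df.sort_index() beforehand can be a reasonable precaution.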

How to Resample Time Series
Resampling can provide additional information on the data. There are two types of resampling:
Downsampling
Downsampling is when the frequency of samples is decreased (e.g., seconds to months). You can use the .resample() method.
# "M" bins by month end; pandas 2.2 and later prefer the alias "ME"
downsampled = df.resample("M")["num_feature"].mean()

Upsampling
Upsampling is when the frequency of samples is increased (e.g., months to days). Again, you can use the .resample() method.
upsampled = downsampled.resample("D").interpolate(method = "linear")
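To make the two directions concrete, here is a small self-contained sketch. It uses two-day bins instead of months only to keep the example tiny. The data and variable names are invented.

```python
import pandas as pd

# Hypothetical daily series
s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range("2022-01-01", periods=6, freq="D"))

# Downsampling: daily -> 2-day bins, aggregated with the mean
down = s.resample("2D").mean()

# Upsampling: 2-day -> daily, new rows filled by linear interpolation
up = down.resample("D").interpolate(method="linear")

print(down)
print(up)
```

Note that upsampling does not restore the original values. It only fills the gaps between the aggregated points, so information lost during downsampling stays lost.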

How to Plot Time Series
This section will discuss how to visualize numerical and categorical time series data with Matplotlib and Seaborn. In addition to the pyplot module, we will explore different visualization techniques with the dates module.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
To visualize the temporal order of a time series, the x-axis of a plot usually represents the time, and the y-axis represents the value.
Plotting Numerical Data over Time
Most time series data is numerical, e.g., temperature or stock price data. To visualize numerical time series data, you can use line plots.
sns.lineplot(data = df,
x = "date",
y = "num_feature")

Plotting Categorical Data over Time
Sometimes time series data can be categorical, e.g., tracking occurrences of different events.
Before plotting the data, you can label encode the categorical columns, e.g., by using the LabelEncoder or with a simple dictionary, as shown below.
# Label encode the categorical column
enum_dict = {}
for i, cat in enumerate(df.cat_feature.unique()):
    enum_dict[cat] = i
df["cat_feature_enum"] = df["cat_feature"].map(enum_dict)
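As an alternative to building the dictionary by hand, pandas offers pd.factorize(), which returns integer codes in order of first appearance together with the unique categories. This is a sketch on invented category values, not the approach used in the rest of this article.

```python
import pandas as pd

df = pd.DataFrame({"cat_feature": ["A", "B", "A", "C"]})

# codes: one integer per row; categories: unique values in order of appearance
codes, categories = pd.factorize(df["cat_feature"])
df["cat_feature_enum"] = codes

print(df)
print(list(categories))
```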

To visualize categorical time series data, you can use scatter plots.
fig, ax = plt.subplots(figsize=(8, 4))
sns.scatterplot(data = df,
x = "date",
y = "cat_feature_enum",
hue = "cat_feature",
marker = '.',
linewidth = 0,
ax = ax,
)
# One tick per category, so the number of ticks matches the number of labels
ax.set_yticks(np.arange(len(df.cat_feature.unique())))
ax.set_yticklabels(df.cat_feature.unique())
ax.get_legend().remove() # remove legend - it's not necessary here
plt.show()

You can also try out Matplotlib’s eventplot demo.
Plotting a Timeline
For plotting a timeline, we will use the label encoded categorical values from the previous section and vlines.
fig, ax = plt.subplots(figsize=(8, 4))
ax.vlines(df["date"], 0, df["cat_feature_enum"])
plt.show()

Setting the X-Axis Limits of a Time Series
When you want to set the x-axis limits of a time series plot, the range has to be of the datetime64[ns] data type.
E.g., you can use the minimum and maximum timestamps of your time series:
ax.set_xlim([df.date.min(), df.date.max()])
Or you can specify a custom range, as shown below:
ax.set_xlim(np.array(["2020-04-01", "2020-04-30"],
dtype="datetime64"))

Setting the X-Ticks of a Time Series
To improve the readability of your data visualization, you can add major and minor x-ticks at specific intervals (e.g., weekly, monthly, yearly, etc.).
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax.xaxis.set_minor_locator(mdates.DayLocator())
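The axis limits and tick settings can be combined into one complete figure. The sketch below is a minimal, self-contained version on invented data. It uses plain Matplotlib and the non-interactive Agg backend so that it runs without a display, and it places minor ticks on Mondays rather than every day, which is purely an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily data covering two months
df = pd.DataFrame({
    "date": pd.date_range("2020-04-01", periods=60, freq="D"),
    "num_feature": range(60),
})

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df["date"], df["num_feature"])

# X-axis limits given as timestamps
ax.set_xlim([pd.Timestamp("2020-04-01"), pd.Timestamp("2020-05-30")])

# Major ticks at the start of each month, minor ticks on Mondays
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.MO))

fig.canvas.draw()  # render once; in a notebook you would call plt.show() instead
```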

Conclusion
Getting started with handling time series data can be challenging when you are unfamiliar with the [datetime](https://docs.python.org/3/library/datetime.html#datetime.datetime) data type. As you saw, the datetime data type has many practical built-in methods for easily manipulating time series data. This article covered manipulating timestamps, performing useful operations on time series values, and visualizing time series data.