
Time Series Analysis with Pandas

How to handle and manipulate time series data

Figure source

There are many definitions of time series data, all of which express the same idea in different ways. A straightforward definition is that time series data consists of data points attached to sequential timestamps. Time series data comes from periodic measurements or observations, and we see it in many industries. Just to give a few examples:

  • Stock prices over time
  • Daily, weekly, monthly sales
  • Periodic measurements in a process
  • Power or gas consumption rates over time

Advancements in machine learning have increased the value of time series data. Companies apply machine learning to time series data to make informed business decisions, produce forecasts, and compare seasonal or cyclic trends. The Large Hadron Collider (LHC) at CERN produces a great amount of time series data with measurements on sub-atomic particles. In short, time series data is everywhere, and handling it well is crucial for data analysis in such fields. Pandas was created by Wes McKinney to provide an efficient and flexible tool for working with financial data, which makes it a very good choice for time series data.

Pandas for time series data

Time series data can be in the form of a specific date, a time duration, or a fixed, defined interval.

A timestamp can be the date of a day or a nanosecond in a given day, depending on the precision. For example, ‘2020-01-01 14:59:30’ is a second-based timestamp.

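Pandas provides flexible and efficient data structures to work with all kinds of time series data. The basic time series data structures and their corresponding index representations are:

  • Timestamp (a specific date and time): DatetimeIndex
  • Timedelta (a time duration): TimedeltaIndex
  • Period (a fixed, defined interval): PeriodIndex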
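As a minimal sketch of how these structures relate, the snippet below builds one scalar of each kind and wraps it in the matching index type. The variable names and the example values are only illustrative.

```python
import pandas as pd

# A single point in time and its index counterpart
ts = pd.Timestamp("2020-01-01 14:59:30")
ts_index = pd.DatetimeIndex([ts])

# A time span (here, the whole month of January 2020)
period = pd.Period("2020-01", freq="M")
period_index = pd.PeriodIndex([period])

# A duration
delta = pd.Timedelta(days=1, hours=2)
delta_index = pd.TimedeltaIndex([delta])

for scalar, index in [(ts, ts_index), (period, period_index), (delta, delta_index)]:
    print(type(scalar).__name__, "->", type(index).__name__)
```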

For any topic, it is fundamental to learn the basics. The rest can be built up with practice. Let’s explore the time series functionality of Pandas. As usual, we import the libraries first:
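The imports would typically look like the following. The np and pd aliases are the common convention, assumed here rather than taken from the original code.

```python
import numpy as np
import pandas as pd

# Printing the version helps when comparing results, since some
# time series behavior differs between pandas releases.
print(pd.__version__)
```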

The most basic time series data structure is the Timestamp, which can be created with the to_datetime function or the Timestamp constructor:
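A small sketch of both approaches, using the example timestamp from earlier in the article:

```python
import pandas as pd

# Parse a string into a Timestamp
ts_from_string = pd.to_datetime("2020-01-01 14:59:30")

# Build the same Timestamp from its components
ts_direct = pd.Timestamp(year=2020, month=1, day=1, hour=14, minute=59, second=30)

print(ts_from_string)               # 2020-01-01 14:59:30
print(ts_from_string == ts_direct)  # True
```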

In real-life cases, we almost always work with sequential time series data rather than individual dates. Pandas makes it very simple to work with sequential time series data as well. For example, if we pass multiple dates to the to_datetime function, it creates a DatetimeIndex, which is basically an array of dates.
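For instance, passing a list of date strings (the dates here are arbitrary examples) gives back a DatetimeIndex:

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])

print(type(dates).__name__)  # DatetimeIndex
print(dates)
```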

We don’t have to follow a specific format to input a date. to_datetime converts dates in different formats to a standard format. It may not seem convenient to create a time index by passing a list of individual dates. There are, of course, other ways to create an index of time.
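The sketch below writes the same date in three different ways. It parses each string separately because, as far as I know, recent pandas versions expect one consistent format within a single list unless told otherwise. The example strings are my own.

```python
import pandas as pd

raw_dates = ["2020-01-10", "Jan 10, 2020", "10 January 2020"]

# Parse each string on its own, then collect the results into one index
parsed = pd.DatetimeIndex([pd.to_datetime(d) for d in raw_dates])

print(parsed)
```

All three entries end up as the same standard timestamp for 10 January 2020.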

We can create an index of dates using a start date and the to_timedelta function:

‘2020-02-01’ serves as the starting point, and to_timedelta creates a sequence with the specified time delta. In the above example, ‘D’ is used for ‘day’, but there are many other options available. The full list is in the Pandas documentation.
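A minimal version of this pattern, assuming a run of five consecutive days (the length is my choice), could look like this:

```python
import numpy as np
import pandas as pd

# Start date plus offsets of 0, 1, 2, 3, 4 days
dates = pd.to_datetime("2020-02-01") + pd.to_timedelta(np.arange(5), unit="D")

print(dates)
```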

We can also use the date_range function to create a time index from scratch:

  • Using start and end dates
  • Using start or end date and number of periods (default is ‘start’)

The default frequency is ‘day’ (‘D’), but there are many other options available.
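The following sketch builds the same six-day index in each of these ways. The dates are arbitrary examples.

```python
import pandas as pd

# Start and end dates
by_bounds = pd.date_range(start="2020-01-10", end="2020-01-15")

# Start date and number of periods
from_start = pd.date_range(start="2020-01-10", periods=6)

# End date and number of periods
to_end = pd.date_range(end="2020-01-15", periods=6)

print(by_bounds)
print(by_bounds.equals(from_start) and by_bounds.equals(to_end))  # True
```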

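Note: ‘M’ indicates the last day of the month, while ‘MS’ stands for ‘month start’. From pandas 2.2 onward, the month-end alias is spelled ‘ME’, and ‘M’ is deprecated for this use.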
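To compare the two without depending on how a particular pandas version spells the month-end alias, this sketch passes the pd.offsets.MonthEnd() offset object instead of a string. That substitution is my choice, not something from the original code.

```python
import pandas as pd

# Month end: each date is the last day of a month
month_ends = pd.date_range("2020-01-01", periods=3, freq=pd.offsets.MonthEnd())

# Month start: each date is the first day of a month
month_starts = pd.date_range("2020-01-01", periods=3, freq="MS")

print(month_ends)    # 2020-01-31, 2020-02-29, 2020-03-31
print(month_starts)  # 2020-01-01, 2020-02-01, 2020-03-01
```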

We can even derive frequencies from default ones:
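A derived frequency is a base alias with a multiplier in front of it. As an illustration, with frequencies I picked myself:

```python
import pandas as pd

# Every 10 days
every_ten_days = pd.date_range("2020-01-01", periods=4, freq="10D")

# Every 90 minutes
every_90_min = pd.date_range("2020-01-01", periods=3, freq="90min")

print(every_ten_days)  # 2020-01-01, 2020-01-11, 2020-01-21, 2020-01-31
print(every_90_min)    # 00:00, 01:30, 03:00 on 2020-01-01
```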

Pandas also provides period_range and timedelta_range functions to create PeriodIndex and TimedeltaIndex, respectively:
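A short sketch of both functions, with illustrative start values and frequencies:

```python
import pandas as pd

# Four monthly periods starting from January 2020
periods = pd.period_range("2020-01", periods=4, freq="M")

# Four durations, 30 minutes apart, starting from zero
deltas = pd.timedelta_range(start="0 days", periods=4, freq="30min")

print(periods)
print(deltas)
```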


We’ve learned how to create time series data, but Pandas supports many other operations on it. Next, I will cover shifting, resampling, and rolling time series data.

Shifting Time Series Data

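Time series analysis may require shifting data points to make a comparison. The shift and tshift functions shift data in time. Note that tshift was deprecated in pandas 1.1 and removed in pandas 2.0; calling shift with the freq argument does the same job.

  • shift: shifts the data
  • tshift: shifts the time index (in current pandas, use shift with freq)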

The difference between shift and tshift is better explained with visualizations. Let’s first create a sample DataFrame with time series data:

Then we can plot original data and shifted data on the same figure to see the difference:

Figure: original data, shift, and tshift (in that order).
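The original sample data and plots are not reproduced here, so the following is a sketch on a small made-up DataFrame. Because tshift is not available in pandas 2.0 and later, it uses shift with the freq argument in its place, and it prints the results instead of plotting them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ten daily values
dates = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": np.arange(10)}, index=dates)

# shift: the index stays put, the values move down three rows (NaN fills the gap)
shifted_values = df.shift(3)

# shift with freq (formerly tshift): the values stay put, the index moves three days
shifted_index = df.shift(3, freq="D")

print(shifted_values.head())
print(shifted_index.head())
```

In the first result the index still starts on 2020-01-01, but the first three values are missing. In the second, every value is kept and the index starts on 2020-01-04.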

Resampling

Another common operation with time series data is resampling. Depending on the task, we may need to resample data at a higher or lower frequency. Pandas handles both operations very well. Resampling can be done with the resample or asfreq methods.

  • asfreq returns the value at each timestamp of the new frequency (for example, the value on the last day of each month) without aggregating
  • resample creates groups (or bins) of the specified interval and lets you do aggregations on the groups

It will be clearer with examples. Let’s first create time series data for one year.

asfreq(‘M’) returns the value on the last day of each month. We can confirm by checking the value at the end of January:
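A sketch of this check on made-up data: one value per day for 2020, where the value is simply the day's position in the year. It passes pd.offsets.MonthEnd() rather than an alias string so that it does not depend on how the installed pandas version spells the month-end alias.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for 2020 with values 0, 1, 2, ...
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
daily = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# Keep only the value on the last day of each month
month_end_values = daily.asfreq(pd.offsets.MonthEnd())

print(month_end_values.head(3))
print(daily.loc["2020-01-31"])  # 30.0, same as the first month-end value
```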

resample(‘M’) creates bins of months but we need to apply an aggregate function to get values. Let’s calculate the average monthly values. We can also confirm the result by comparing the average value of January:
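The same kind of check for resample, again on a made-up daily series and using the pd.offsets.MonthEnd() offset object in place of an alias string:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for 2020 with values 0, 1, 2, ...
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
daily = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# Group the days into monthly bins and average each bin
monthly_mean = daily.resample(pd.offsets.MonthEnd()).mean()

print(monthly_mean.head(3))
print(daily.loc["2020-01"].mean())  # 15.0, same as the January bin
```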

Rolling

Rolling is a very useful operation for time series data. Rolling means creating a window of a specified size and performing calculations on the data in this window, which rolls through the data. The figure below explains the concept of rolling.

It is worth noting that the calculation starts when the whole window is in the data. In other words, if the size of the window is three, the first aggregation is done at the third row, and the rows before it are filled with NaN. Let’s apply rolling with size 3 to the DataFrame we created:
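Since the original DataFrame is not reproduced here, this sketch applies a window of three to a small made-up one:

```python
import pandas as pd

# Hypothetical sample data: five daily values
dates = pd.date_range("2020-01-01", periods=5, freq="D")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

# Mean over a window of three rows; the first two rows have no full window
rolling_mean = df.rolling(window=3).mean()

print(rolling_mean)
```

The first two rows are NaN, and the third row holds the mean of the first three values, 2.0.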



Conclusion

Predictive analytics is highly valuable in the data science field, and time series data is at the core of many problems that predictive analytics aims to solve. Hence, if you plan to work in the field of predictive analytics, you should definitely learn how to handle time series data.


Thank you for reading. Please let me know if you have any feedback.
