
Resample function of Pandas

Use of resample function of pandas in time series data

Photo by Nathan Dumlao on Unsplash

Resampling is a convenience method for converting the frequency of time series data. It works only on objects that have a datetime-like index, for example a DatetimeIndex, PeriodIndex, or TimedeltaIndex. In simpler words, if one wants to arrange time series data at a monthly, weekly, daily, or other frequency, this function is very useful. It is available in the Pandas library as resample(). For demonstration purposes, the Parking Birmingham dataset from the UCI repository is used: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham.

Reading Data

When we read time series data from a .csv file, date variables are read with the object data type by default. To read the date column in datetime format, we use the parse_dates argument. In the study data, LastUpdated is the date variable: with parse_dates=["LastUpdated"] the column is read as datetime, whereas without the parse_dates argument the type of "LastUpdated" is object.

Image by Author
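
As a minimal sketch of this step, the snippet below reads a tiny in-memory CSV twice, once without and once with parse_dates. The two rows and the columns other than LastUpdated are made up for illustration and only assume a layout similar to the parking file.

```python
import io
import pandas as pd

# Two made-up rows; only the LastUpdated column name comes from the article.
csv_text = (
    "SystemCodeNumber,Capacity,Occupancy,LastUpdated\n"
    "BHMBCCMKT01,577,61,2016-10-04 07:59:42\n"
    "BHMBCCMKT01,577,64,2016-10-04 08:25:42\n"
)

# Without parse_dates the date column is read as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["LastUpdated"].dtype)

# With parse_dates the column is converted to a datetime type.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["LastUpdated"])
print(df["LastUpdated"].dtype)
```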

DateTimeIndex

Since the resample function requires a DatetimeIndex, PeriodIndex, or TimedeltaIndex, we now need to set the "LastUpdated" variable as the index, which turns it into a DatetimeIndex, as follows:

Image by Author
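
One way to do this, sketched on a small hypothetical frame standing in for the parking data, is set_index:

```python
import pandas as pd

# Hypothetical stand-in for the data after reading it with parse_dates.
df = pd.DataFrame({
    "LastUpdated": pd.to_datetime([
        "2016-10-04 07:59:42",
        "2016-10-04 08:25:42",
        "2016-10-05 07:59:42",
    ]),
    "Occupancy": [61, 64, 80],
})

# Move the datetime column into the index; the index becomes a DatetimeIndex.
df = df.set_index("LastUpdated")
print(type(df.index).__name__)
```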

Resampling

Resampling converts the frequency of a time series, for example from daily data to monthly or weekly data, or vice versa. For this, we have the resample option in the pandas library [2]. If we do not want to set the date column as a DatetimeIndex first, the resample function also has an "on" parameter that takes the column name, but the column must be datetime-like.

Image by Author
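
The two routes can be compared with a short sketch. The frame below is hypothetical; the point is only that passing on="LastUpdated" and setting the index first lead to the same daily means.

```python
import pandas as pd

# Hypothetical readings with the date kept as an ordinary column.
df = pd.DataFrame({
    "LastUpdated": pd.to_datetime([
        "2016-10-04 07:59:42",
        "2016-10-04 08:25:42",
        "2016-10-05 07:59:42",
    ]),
    "Occupancy": [61, 64, 80],
})

# Route 1: resample on the datetime column, leaving the index untouched.
daily_on = df.resample("D", on="LastUpdated")["Occupancy"].mean()

# Route 2: set the column as the index first, then resample.
daily_idx = df.set_index("LastUpdated")["Occupancy"].resample("D").mean()

print(daily_on)
print(daily_idx)
```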

Below, after resampling with the rule "D", the data is converted to daily data, i.e., every date in the range is taken into account. 375717 records are downsampled to 77 records.

Image by Author
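
A small sketch of what "every date is taken into account" means: in the made-up series below there are no readings on 5 October, yet the daily result still contains a bin for that day, filled with NaN when the mean is taken.

```python
import pandas as pd

# Five made-up readings on two days, with 2016-10-05 missing entirely.
s = pd.Series(
    [61, 64, 70, 80, 90],
    index=pd.to_datetime([
        "2016-10-04 07:59:42",
        "2016-10-04 12:25:42",
        "2016-10-04 16:30:42",
        "2016-10-06 07:59:42",
        "2016-10-06 16:30:42",
    ]),
    name="Occupancy",
)

# One row per calendar day between the first and last reading.
daily = s.resample("D").mean()
print(len(s), "records ->", len(daily), "daily records")
print(daily)
```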

Other Rule Options

The most used options for rule (representing the target frequency) are shown below; other options can be found in reference [1]:

Image by Author
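
To get a feel for a few rule strings, the sketch below applies four of them to a synthetic half-hourly series and counts the resulting bins. The data is invented, and the alias spellings are an assumption about the installed pandas: some aliases have been renamed across releases, so "60min" is used for the hourly rule here to stay version-neutral.

```python
import pandas as pd

# Synthetic readings every 30 minutes for 60 days (not the parking data).
idx = pd.date_range("2016-10-01", periods=60 * 48, freq="30min")
s = pd.Series(range(len(idx)), index=idx, name="Occupancy")

# A few rule strings; check reference [1] for the aliases of your version.
rules = {"hourly": "60min", "daily": "D", "weekly": "W", "month start": "MS"}

# Number of bins produced by each rule.
counts = {label: len(s.resample(rule).mean()) for label, rule in rules.items()}
for label, n in counts.items():
    print(f"{label:12s} rule={rules[label]:6s} bins={n}")
```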

The resample function is used in two ways, i.e., upsampling and downsampling.

Upsampling: Here we resample to a shorter time frame, for example monthly data to weekly/biweekly/daily etc. This creates many bins with NaN values, and different methods can be used to fill them, such as the pad method and the bfill method. For example, changing weekly data to daily data with the bfill method gives the following results; bfill fills the new missing values backward in the resampled data, i.e., each gap takes the next known value:

Image by Author
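
A minimal sketch of backward filling, using three invented weekly values rather than the parking data:

```python
import pandas as pd

# Hypothetical weekly series on three consecutive Sundays.
weekly = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2016-10-02", periods=3, freq="W"),
)

# Upsample to daily; each new day takes the value of the NEXT known Sunday.
daily_bfill = weekly.resample("D").bfill()
print(daily_bfill.head(9))
```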

The other method is pad, which forward fills the values, i.e., each gap takes the previous known value, as below:

Image by Author
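
The same idea can be sketched for forward filling. Note that this example calls ffill(): in recent pandas releases that is the spelling of the forward fill, and pad() may not be available, so treat the method name as dependent on your version. The weekly values are again invented.

```python
import pandas as pd

# Hypothetical weekly series on three consecutive Sundays.
weekly = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2016-10-02", periods=3, freq="W"),
)

# Upsample to daily; each new day takes the value of the PREVIOUS known Sunday.
daily_ffill = weekly.resample("D").ffill()
print(daily_ffill.head(9))
```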

We can also use the asfreq() or fillna() methods in upsampling.
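
As a rough illustration on invented weekly values, asfreq() only inserts the new daily rows and leaves them as NaN; the gaps can then be filled explicitly, here with a constant via fillna() on the resulting Series (the constant 0 is an arbitrary choice for the example).

```python
import pandas as pd

# Hypothetical weekly series on three consecutive Sundays.
weekly = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2016-10-02", periods=3, freq="W"),
)

# asfreq() creates the daily rows but does not fill them.
daily_raw = weekly.resample("D").asfreq()
print(daily_raw.isna().sum(), "missing values after asfreq()")

# Fill the gaps afterwards with a constant.
daily_zero = daily_raw.fillna(0)
print(daily_zero.head(9))
```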

Downsampling: Here we resample to a wider time frame, for example daily data to weekly/biweekly/monthly etc. For this, we have aggregation options like sum(), mean(), max() etc. For example, daily data is resampled to month-start frequency using the mean function as below:

Image by Author
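
A minimal sketch of this downsampling step, assuming the month-start rule is spelled "MS" and using a synthetic daily series in place of the parking data:

```python
import pandas as pd

# Synthetic daily values for October and November 2016 (0, 1, 2, ...).
idx = pd.date_range("2016-10-01", "2016-11-30", freq="D")
daily = pd.Series(range(len(idx)), index=idx, dtype="float64")

# Downsample to one row per month, labelled by the first day of the month.
monthly_mean = daily.resample("MS").mean()
print(monthly_mean)
```

Swapping mean() for sum() or max() changes only the aggregation, not the bins.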

Graphical representation of Resampling

After resampling the data by four different rules, i.e., hourly, daily, weekly, and monthly, the following graphs are obtained. We can clearly see the difference between shorter and wider time frames. The hourly plot contains the most noise, and the noise decreases from daily to weekly to monthly. Depending on the study objective, we can decide which rule is best.

Image by Author
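
The smoothing effect can also be checked numerically. The sketch below uses a synthetic noisy series (random values around 500, not the parking data) and assumes the rule spellings shown; it resamples with the four rules and prints how many points remain and how much they vary. Each resulting series could be drawn with its plot() method if matplotlib is installed.

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly readings for 77 days with random noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2016-10-01", periods=77 * 48, freq="30min")
noisy = pd.Series(500 + rng.normal(0, 50, len(idx)), index=idx, name="Occupancy")

# Resample with four rules, from the shortest to the widest time frame.
rules = {"hourly": "60min", "daily": "D", "weekly": "W", "monthly": "MS"}
resampled = {label: noisy.resample(rule).mean() for label, rule in rules.items()}

for label, series in resampled.items():
    # Fewer points and a smaller spread indicate a smoother curve.
    print(f"{label:8s} points={len(series):5d} std={series.std():.2f}")
```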

Thanks!

References:

  1. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
  2. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
