The world’s leading publication for data science, AI, and ML professionals.

How to Merge “Not Matching” Time Series with Pandas

How to use merge_asof function

Photo by Djim Loic on Unsplash
Photo by Djim Loic on Unsplash

There are many definitions of time series data, all of which indicate the same meaning in a different way. A straightforward definition is that time series data includes data points attached to sequential time stamps.

Pandas was created by Wes Mckinney to provide an efficient and flexible tool to work with financial data for which time is crucially important. Therefore, Pandas is a very good choice to work on time series data.

Financial data usually includes measurements taken at very short time periods (e.g. at the level of seconds). Therefore, when we merge two dataframes consist of time series data, we may encounter measurements off by a second or two. For such cases, Pandas provide a "smart" way of merging done by merge_asof.

Assume we are merging dataframes A and B. If a row in the left dataframe (A) does not have a matching row in the right dataframe (B), merge_asof allows to take a row whose value is close to the value in left dataframe (A).

Left and right is defined based on the order of parameters in the function. pd.merge_asof (A, B, … ) means A is the left one.

There are some limitations on being "close". The comparison is done based on the columns specified by "on" parameter and the value in right dataframe (B) must be less than the value in left dataframe (A). To make a simpler explanation, merge_asof means:

If there is no match, take the previous one.

It will make more sense as we go through some examples.

Sample A and B dataframes are as below:

We want to merge these dataframes on "time" by "name" column but some time values are not matching. For example, in the second row, time in A is one second behind the time in B. Assume we can tolerate 2 seconds latency.

In this case, merge_asof can be used:

  • 1st row: Time matches so the values are taken with respect to row index.
  • 2nd row: Time on left dataframe (A) is 00:00:03 but right dataframe (B) does not have a value at this second. So merge_asof looks at the previous steps and if there is a value within the specified time interval which is 2 seconds, it takes the last one and put in the merged dataframe. There is only one previous value at time 00:00:00 and it is 3 seconds behind so not in the specified time inverval. Therefore, values_b column is filled with NaN for this time.
  • 3rd row: Time on left dataframe (A) is 00:00:06. Right dataframe (B) does not have a value at this second but it has a value at 00:00:04 which is in the specified time internal so merge_asof takes the value and put in the merged dataframe.
  • 4th row: This row is important so please pay close attention. Time on left dataframe (A) is 00:00:09. Right dataframe (B) does not have a value on this time but it has a value on 00:00:08 which is in the specified time interval. However, the "name" column must also match because we merged by "name" column. Since the name is different, merge_asof does not take this value so values_b column is filled with NaN for this row.
  • 5th row: Both dataframes have a value for time 00:00:12 and name FFF.

If we do not specify the time interval, merge_asof looks for the previous one regardless of time difference.

When time interval is 2 seconds, values_b is NaN. If there is no specified time interval, it is filled with the previous value which is 5.

Merge_asof also provides an option to exclude exact matches. If allow_exact_matches is set as False, merge_asof only looks for previous values at right dataframe and the values in matching rows will not be put in the merged dataframe:

The number of ways to measure the time is limitless because we can be as specific as nanoseconds. Depending on the task, we can select an optimal frequency for our measurements. Merge_asof allows us to have some room when matching time series or ordered data.


Thanks for reading. Please let me know if you have any feedback.


Related Articles