Adjusting Prices for Inflation in Python with Pandas Merge

Jake Huneycutt
Towards Data Science
6 min read · May 15, 2018


For several years, I managed a small, start-up investment fund. While Wall Street analysts typically look back only a year or two when analyzing companies, I’d often dig back 20 or 30 years. I’d even occasionally scour through economic and price data going back to the 1950s, the 1920s, or in rare cases the 19th Century, to get a better understanding of market cycles. When you do this type of deep long-term historical analysis, however, you will run into one issue: nominal prices can be very deceiving.

Take, for instance, the price of grains. We have Producer Price Index values for grains running back to January 1926, when the index stood at 37.9 during the 1920s boom. In April 2018, that same index was at 151.5. Hence, grain prices increased 4-fold during that 92-year time frame. If we were to measure from the bottom of the Great Depression in 1933 (when the index was at 11.0), prices have increased nearly 14-fold!
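Those multiples follow directly from ratios of the index values; a quick check in Python:

```python
# Price multiples are just ratios of index values
start_1926, bottom_1933, end_2018 = 37.9, 11.0, 151.5

multiple_since_1926 = end_2018 / start_1926
multiple_since_1933 = end_2018 / bottom_1933
print(round(multiple_since_1926, 1))  # 4.0
print(round(multiple_since_1933, 1))  # 13.8
```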

This is a very misleading picture and it’s difficult to know this simply from looking at the chart above. Once we adjust for inflation, we get the radically different looking chart below.

Not only have grain prices not been rising over time, they’ve actually been falling in real terms. Adjusted for CPI inflation, grain prices today are 71% lower than they were in 1926. Today’s prices are even 24% below the Great Depression lows.

Grains have become more affordable over time and falling real agriculture prices have alleviated poverty in much of the world. You may have never gained that insight by simply looking at the nominal prices in the top chart. You have to adjust for inflation to gain that perspective.

Inflation in the US

Inflation has cycled dramatically in the US over the past century. In World War II, inflation shot up to 10.5%. By 1953, inflation had fallen all the way to 0.8%. In 1980, inflation was sky-high at 13.5% before the Federal Reserve cracked down on it. In 2005, official CPI inflation was at 3.4%, but housing prices in the 20 largest US metro areas rose 14%! In 2017, inflation was relatively tame at 2.0% (but it’s starting to push back up again as labor and input costs continue to rise).

Inflation adjustment gives you a much better understanding of real prices, and this analysis can be very insightful. Indeed, one of my most successful investment theses revolved around examining inflation-adjusted copper prices during the Japanese Asset Bubble and making comparisons to the late ’00s / early ’10s copper price bubble.

Nevertheless, inflation adjustment might seem like a tricky issue to handle in Python if you’re working in a Pandas DataFrame. You may have thousands of rows of data points with dates scattered all across the spectrum in your data frame, whereas inflation data tends to come as a sequential monthly series. How do you bridge the two?

Finding Your Sources

The first thing to talk about is sources. What data do you use to adjust for inflation? Where do you find inflation data?

There’s not always one “right” answer, but if you live in the US or are dealing with American pricing data, the Federal Reserve Economic Data (commonly known by the acronym “FRED”) should be one of your go-to sources. There are dozens of inflation measures out there, but the one I most frequently use is called “CPIAUCNS”. This inflation data set goes all the way back to 1913, which is why it tends to be my favorite for long-term analysis.

There are many other data sets, as well. If you’re looking at housing price data in Minneapolis, for example, you might prefer to look at a housing price index, such as the All-Transactions House Price Index or the Case-Shiller Minneapolis Home Price Index. Normally, however, some variant of CPI is what you’ll use as your index.

Set Your Index

CPI inflation and housing price indices are already indexed in a certain manner, but you’ll likely need to re-index the data to suit your own purposes. For instance, if your inflation data is indexed to 100 in January 1982, while your data set runs from 2004 to 2017, you’ll likely want to re-index based on either 2004 prices (beginning) or 2017 prices (end).

We can do this in Excel or Python. Admittedly, I find it’s normally easier to adjust the data in Excel, but if we had a particularly large data set, Python might be a superior option.

In Python, we could do this by loading our inflation data and creating a new column for the index multiplier. Then we’d simply multiply the nominal prices by that multiplier.

import pandas as pd

# load inflation data
inflation = pd.read_excel('inflation_data.xls', sheet_name='Data')
# create index multiplier (last CPI value / each row's CPI)
inflation['CPI_Multiplier'] = inflation['CPI'].iloc[-1] / inflation['CPI']

Note that in this example I’m using the last value to build the index, so prices will be indexed to today’s levels. To index to the beginning of the series instead, we’d select the first value rather than the last.
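A minimal sketch of that beginning-indexed variant, using a toy CPI series (hypothetical values, not real CPI data):

```python
import pandas as pd

# Toy CPI series (hypothetical values for illustration)
inflation = pd.DataFrame({'CPI': [100.0, 102.0, 104.0, 110.0]})

# Index to the first observation: prices get expressed in period-0 dollars
inflation['CPI_Multiplier'] = inflation['CPI'].iloc[0] / inflation['CPI']
print(inflation['CPI_Multiplier'].round(3).tolist())  # [1.0, 0.98, 0.962, 0.909]
```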

Now we have an index to adjust our prices. We need to merge the two data sets to perform the operation.

Matching Up Dates

Before we can merge, however, we have to deal with the trickiest aspect of this exercise: dates. Dates can be a major pain in programming. There are several ways to handle them.

In a recent data set, I found months and years in separate columns, such as in the example below.

Month Year
01 2008
04 2012
09 2016

Meanwhile, the dates in my inflation data were in yyyy/mm/dd format and looked like this:

Date          CPI_Multiplier
2008/01/01 1.000
2008/02/01 1.003
2008/03/01 1.005
2008/04/01 1.011

How to align the two?

There are numerous answers, but here I created a ‘day’ column and assigned it a value of 1 (to match the first-of-month dates in the inflation data), then used pd.to_datetime to create a ‘Date’ column in the original data set.

# load data set
df = pd.read_excel('training_data.xlsx')
# add a 'day' column set to 1 to match the first-of-month CPI dates
df['day'] = 1
# collect the date components into their own frame
new_df = df[['year', 'month', 'day']]
# assemble a 'Date' column in the original data via datetime conversion
df['Date'] = pd.to_datetime(new_df)

In Pandas, pd.to_datetime() is a powerful tool, but you do have to get your data just right to make it work.
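To see the conversion work end to end, here is a self-contained toy frame matching the month/year example earlier (values are illustrative):

```python
import pandas as pd

# pd.to_datetime recognizes component columns named 'year', 'month', 'day'
df = pd.DataFrame({'year': [2008, 2012, 2016], 'month': [1, 4, 9]})
df['day'] = 1  # pin each row to the 1st to match monthly CPI dates
df['Date'] = pd.to_datetime(df[['year', 'month', 'day']])
print(df['Date'].tolist())
```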

Merging the Data

Finally, we load the inflation data and use pd.merge to merge the two data frames. The ‘how’ argument specifies the type of join. In this instance, I set ‘df’ (my main data set) as the first argument and ‘inflation’ (the inflation data) as the second, and used a ‘left’ join to merge on ‘Date’.

# merge dataframe
df = pd.merge(df, inflation, how='left', on='Date')

Now, with the merged dataframe, it’s easy to compute inflation-adjusted prices.

df['CPIAdjPrice'] = df['SalePrice'] * df['CPI_Multiplier']
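Putting the merge and the multiplication together on a toy data set (all prices and multipliers are hypothetical):

```python
import pandas as pd

# Toy sales data and monthly CPI multipliers (hypothetical values)
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-01', '2008-04-01']),
    'SalePrice': [200000.0, 210000.0],
})
inflation = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-01', '2008-02-01',
                            '2008-03-01', '2008-04-01']),
    'CPI_Multiplier': [1.000, 1.003, 1.005, 1.011],
})

# Left join keeps every sale and pulls in that month's multiplier
merged = pd.merge(sales, inflation, how='left', on='Date')
merged['CPIAdjPrice'] = merged['SalePrice'] * merged['CPI_Multiplier']
print(merged['CPIAdjPrice'].round(2).tolist())  # [200000.0, 212310.0]
```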

Voilà! We can now see prices in real terms!

Conclusions

On long-term data series, inflation adjustments can be vital to understanding your data. Relying on nominal prices can also make prediction trickier.

Inflation adjustment is useful for looking at things such as commodity prices or retail prices, but it can also be paired with housing price indices to adjust housing prices in volatile markets. This way, we can strip out market volatility and inflation from our prediction models.
