Data Science in the Real World

How to Model Gold Price

Using domain knowledge and a supervised learning model to understand and predict the gold price

Alex Kim
Towards Data Science
8 min read · Apr 27, 2020

In this article, I present a simple but powerful gold price model, covering the following topics in order:

  • historical and current uses of gold;
  • factors that theoretically impact gold prices;
  • a regression model used to predict gold price;
  • an application of the regression model to make investment decisions.

For busy readers, the appendix provides a TL;DR, along with a disclaimer, a condensed Jupyter notebook, and a source list.

Photo by Holger Link on Unsplash

1. Why do we need gold?

Around 3600 BC, gold was first smelted in ancient Egypt. Thirty centuries later, the first gold coin was struck in an ancient kingdom in what is now western Turkey, and gold has since been adopted as a form of physical money by powerful kingdoms.

In the modern era, gold evolved from physical money into a currency reserve. In 1819, the UK formally adopted the first-ever gold standard by pegging its currency to gold. By 1900, most countries aside from China had adopted the gold standard. However, after World War I began, the gold standard gradually disappeared as many countries needed to print money to pay for the war.

Today, gold is used for a variety of purposes, such as coinage, jewellery, electronics and dental treatments. However, monetary policy and financial demand can have a more substantial impact on the price of gold than these uses.

2. What factors can affect the gold price?

In theory, many fundamental, macro, and sentiment factors can impact the price of gold. On the fundamental side, there is demand for physical gold from central banks, jewellery buyers, and ETFs, whereas in the macro space, there are inflation, interest rates, money supply and dollar strength. Also, some traders track CFTC net position data to gauge the sentiment of speculators.

How can each factor theoretically affect the gold price?

The impact of the fundamental factors is the easiest to understand. The more gold governments and people purchase, the less gold there is in the market, and the price should climb.

In the macro space, the inflation (deflation) rate is a measure of the price increase (decrease) of a basket of goods and services. If the inflation rate rises, the price of our lunch and of gold will rise as well. Conversely, if interest rates rise, gold becomes less attractive as an investment vehicle relative to Treasury bonds, because gold itself pays no interest. Just like Buffett said, gold is a hen that does not lay any eggs.

“If you own one ounce of gold for an eternity, you still own one ounce of gold at its end.” — Warren Buffett

One of the most important factors is money supply, which measures the total money available in an economy. Let’s cover two examples to explain why an increase in the money supply can increase the gold price.

  1. Imagine a small economy of you and your friends, who agree to use your Monopoly money to buy and sell gold from each other. If you start to print Monopoly money in your garage and use it to buy all the gold from your friends, the price of gold relative to your Monopoly money will climb. In other words, if the Fed starts to print tons of US dollars, the price of gold relative to the US dollar can climb.
  2. Again imagine a small economy of you and your friends, who use the US dollar to trade. If you print money and give everybody free money, the price of goods and services will increase, as the free money will be used to buy extra goods and services. The increase in the money supply ultimately feeds back into a higher inflation rate, so the price of gold can go up.

Last but not least, the strength of the US dollar can impact the price of gold and any other commodity denominated in US dollars. If the dollar weakens, buyers in other countries can get more dollars for their own currency and therefore buy more gold, driving up the price of gold and of other commodities quoted in dollars.

3. How can we model the gold price?

To model the gold price, we first gather the input data and apply data transformations. With the transformed data, we use a linear regression model to explain the relationship between the predictors and the gold price. To validate the model, we run an out-of-sample backtest and calculate the R² value to measure the performance of the model.

3.1 Data Gathering

To prepare for analysis and model development, the following data since 1981 are collected and cleaned:

  • XAUUSD: the gold spot price, denominated in US dollars.
  • US CPI: an index that tracks changes in the prices paid by urban consumers for goods and services (i.e. it is used to measure inflation).
  • US M2: the money supply, which includes cash, checking deposits, and easily convertible money.
  • US GDP: a measure of the size of US economic output.
  • Dollar index: an index that tracks the value of the US dollar relative to a basket of foreign currencies.

Input data and sources

One notable challenge with the data is that these factors are observed at different frequencies. To align the frequencies, the values are grouped by quarter and then averaged.
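
A minimal pandas sketch of this alignment step is shown below. It rests on assumptions: the two series are synthetic placeholders rather than the actual data, and `to_quarterly_mean` is a hypothetical helper name, not code from the notebook.

```python
import numpy as np
import pandas as pd


def to_quarterly_mean(series: pd.Series) -> pd.Series:
    """Average all observations that fall in the same calendar quarter."""
    return series.groupby(series.index.to_period("Q")).mean()


# Synthetic placeholder data (NOT real prices): one daily and one monthly series.
daily_idx = pd.date_range("2019-01-01", "2019-12-31", freq="D")
monthly_idx = pd.date_range("2019-01-01", "2019-12-01", freq="MS")

gold_daily = pd.Series(np.linspace(1300.0, 1500.0, len(daily_idx)), index=daily_idx)
cpi_monthly = pd.Series(np.linspace(250.0, 256.0, len(monthly_idx)), index=monthly_idx)

# After averaging, both series share the same quarterly index and can be combined.
quarterly = pd.DataFrame(
    {
        "gold": to_quarterly_mean(gold_daily),
        "cpi": to_quarterly_mean(cpi_monthly),
    }
)
print(quarterly)
```

Averaging within each quarter puts daily, monthly, and quarterly inputs on one common index, at the cost of smoothing out movements inside the quarter.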

3.2. Data Transformation & Feature Engineering

3.2.1. Target Variable

The target variable in this model is the gold spot price adjusted for inflation. To adjust the gold spot price for inflation, we deflate the gold spot time series by US CPI, that is, we divide it by the CPI index. Going forward, this inflation-adjusted gold time series will be referred to as “gold spot price,” “gold price,” or “XAUUSD.”

3.2.2. Predictors

There are two predictors used for this regression model: the money supply to GDP (M2/GDP) ratio and the dollar index. While the dollar index can remain as it is, we need to derive the ratio by dividing US M2 by US GDP. The money supply to GDP ratio is preferred to the money supply because it is a measure of excess money supply in the economy.

3.2.3. Log-Transformation

After the two transformations, we apply log-transformation to the target variable and the predictors since all the values are positive and exhibit high positive skewness. This will help any linear model find patterns more easily.
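
The three transformations described in this section can be sketched as follows. This is an illustration under assumptions: the column names, the `build_features` helper, and the sample numbers are all made up for the example, and dividing by the raw CPI level (rather than a rebased index) is my choice. Rebasing would only shift the log target by a constant.

```python
import numpy as np
import pandas as pd

REQUIRED = ["gold", "cpi", "m2", "gdp", "dxy"]  # hypothetical column names


def build_features(quarterly: pd.DataFrame) -> pd.DataFrame:
    """Deflate gold by CPI, form the M2/GDP ratio, and log-transform everything."""
    if (quarterly[REQUIRED] <= 0).any().any():
        raise ValueError("log-transform requires strictly positive inputs")
    out = pd.DataFrame(index=quarterly.index)
    out["log_real_gold"] = np.log(quarterly["gold"] / quarterly["cpi"])  # target
    out["log_m2_gdp"] = np.log(quarterly["m2"] / quarterly["gdp"])       # predictor 1
    out["log_dxy"] = np.log(quarterly["dxy"])                            # predictor 2
    return out


# Made-up numbers purely for illustration, not actual observations.
quarterly = pd.DataFrame(
    {
        "gold": [1300.0, 1350.0, 1400.0],
        "cpi": [250.0, 252.0, 254.0],
        "m2": [10.0, 10.5, 11.0],
        "gdp": [20.0, 20.2, 20.4],
        "dxy": [97.0, 96.0, 98.0],
    },
    index=pd.period_range("2019Q1", periods=3, freq="Q"),
)
features = build_features(quarterly)
print(features)
```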

Correlation matrix suggests a strong relationship between the predictors and the target variable.

Using the transformed variables, we plot a correlation matrix to understand the linear relationship between the gold price and the predictors. As shown, the gold spot price is highly correlated with the money supply to GDP ratio and has a weaker, negative correlation with the dollar index.

Gold Price vs. Money Supply to GDP Ratio: Positive Relationship

More importantly, the two predictors are almost uncorrelated with each other, with a correlation of -0.08. This suggests that even though the dollar index is less correlated with the gold price than the money supply to GDP ratio is, the dollar index can still be useful, as it may add non-overlapping information.

Gold Price vs. Dollar Index: Negative Relationship
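
For readers who want to reproduce this kind of check, here is a rough sketch of computing the correlation matrix with pandas. The data are randomly generated stand-ins (the coefficients 1.5 and -0.8 are arbitrary), so the printed numbers will not match the figures quoted above, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: two independent predictors and a target built from them.
rng = np.random.default_rng(0)
n = 120
log_m2_gdp = rng.normal(size=n)
log_dxy = rng.normal(size=n)
log_real_gold = 1.5 * log_m2_gdp - 0.8 * log_dxy + rng.normal(scale=0.3, size=n)

features = pd.DataFrame(
    {
        "log_real_gold": log_real_gold,
        "log_m2_gdp": log_m2_gdp,
        "log_dxy": log_dxy,
    }
)

# DataFrame.corr() computes pairwise Pearson correlations by default.
corr = features.corr()
print(corr.round(2))
```

The off-diagonal entry between the two predictors is the one to watch: a value near zero means each predictor carries information the other does not.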

3.3. Model Training & Backtest Performance

We now fit the gold price to the two predictors with a linear regression equation.

Gold price linear regression model
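
The equation image is not reproduced in this text. Based on the target and predictors described in Section 3.2, the model presumably takes the form below. The symbols β₀, β₁, β₂, ε and the abbreviation DXY for the dollar index are my own notation, and no fitted coefficient values are implied.

```latex
\log\!\left(\frac{\text{XAUUSD}^{\text{nominal}}_t}{\text{CPI}_t}\right)
  = \beta_0
  + \beta_1 \log\!\left(\frac{\text{M2}_t}{\text{GDP}_t}\right)
  + \beta_2 \log\!\left(\text{DXY}_t\right)
  + \varepsilon_t
```

Here t indexes quarters. Because both sides are in logs, β₁ and β₂ can be read as elasticities: the approximate percentage change in the inflation-adjusted gold price for a one percent change in the corresponding predictor.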

Note that adding L1 or L2 regularization to reduce overfitting is unlikely to add value in this model, due to the small number of predictors and the small correlation between them. Regardless, other algorithms, including random forest and XGBoost, can also be trained.

Out-of-sample backtest results show there is no need for complicated models.

To evaluate this model, an out-of-sample backtest is conducted by repeatedly training the model on a sliding window of 100 quarters and predicting the average gold price in the next quarter. Finally, the predicted values are compared to the actual values to calculate the R², which comes to about 92%.

R² is the coefficient of determination, which represents the percentage of the total variation that can be explained by the model. Hence, this simple model can explain a whopping 92% of the total variation in gold prices.
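
The backtest procedure can be sketched roughly as below. This is not the notebook's code. It uses NumPy least squares to stay dependency-light, runs on synthetic data, and assumes the predictors for the target quarter are known at prediction time, which the article does not spell out. `rolling_backtest` and `r_squared` are hypothetical helper names.

```python
import numpy as np


def rolling_backtest(X, y, window=100):
    """Fit OLS on each `window`-row slice and predict the row right after it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if window >= len(y):
        raise ValueError("window must be shorter than the sample")
    preds, actuals = [], []
    for end in range(window, len(y)):
        # Training design matrix: intercept column plus the predictors.
        X_train = np.column_stack([np.ones(window), X[end - window:end]])
        coef, *_ = np.linalg.lstsq(X_train, y[end - window:end], rcond=None)
        # Assumption: predictors for the target quarter are available.
        x_next = np.concatenate([[1.0], X[end]])
        preds.append(x_next @ coef)
        actuals.append(y[end])
    return np.array(preds), np.array(actuals)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data for illustration only; the coefficients are arbitrary.
rng = np.random.default_rng(42)
n_quarters = 156
X = rng.normal(size=(n_quarters, 2))
y = 0.5 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=n_quarters)

preds, actuals = rolling_backtest(X, y, window=100)
print(f"out-of-sample R^2: {r_squared(actuals, preds):.3f}")
```

Because every prediction is made for a quarter the model has not seen during fitting, the resulting R² measures out-of-sample fit rather than in-sample fit.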

4. How can we use this model?

Using this model, we can predict the gold price by translating our views on the money supply to GDP, the dollar strength, and the inflation rate into the price impact on gold.
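
To make the translation concrete, here is a short derivation in my own notation (β₁ and β₂ for the regression coefficients on the two log predictors, DXY for the dollar index). Since the model explains the log of the inflation-adjusted price, and the nominal price is the inflation-adjusted price multiplied by CPI, holding the residual fixed gives approximately:

```latex
\Delta \log\!\left(\text{XAUUSD}^{\text{nominal}}\right)
  \approx \beta_1 \, \Delta \log\!\left(\frac{\text{M2}}{\text{GDP}}\right)
  + \beta_2 \, \Delta \log\!\left(\text{DXY}\right)
  + \Delta \log\!\left(\text{CPI}\right)
```

The correlations reported in Section 3.2 suggest β₁ > 0 and β₂ < 0. A view that the money supply to GDP ratio will rise therefore pushes the expected price up, a view that the dollar will strengthen pushes it down, and expected inflation adds to the nominal price directly.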

In my opinion, the money supply to GDP ratio will increase in both the short term and the long term, as governments continue to print money to offset the effects of the global lockdown and concerns in credit markets, respectively.

However, the dollar index might see some strength caused by safe-haven demand. When a sell-off happens in the market, the need for safe-haven assets such as the US dollar increases, strengthening the US dollar.

Combining these, I expect the price of gold to remain quite volatile in the short term, as these two predictors can offset each other. However, in the long term, I believe the long-awaited bull market will return.

Appendix

TL;DR

Amid a pandemic and a currency war, gold can be a perfect hedge against unexpected inflation, extreme currency devaluation, and sluggish economic growth. The data analysis suggests that the gold price is mainly driven by:

  • the inflation rate
  • the money supply relative to the size of the economy
  • the strength or weakness of the currency in which gold is denominated

Other fundamental and sentiment factors that can affect the gold price are:

  • interest rates
  • the central bank demand
  • the gold ETF demand
  • the jewellery demand
  • CFTC positions

Disclaimer

The sole purpose of this article is to educate readers by expressing the author’s personal views. The content only reflects the view of the author at the time of writing and does not constitute financial advice, nor does it reflect any views of the author’s affiliated organizations.

Source Code

Sources
