Reconstruct Google Trends Daily Data for Extended Period

Comparing methods to get beyond the 9-month query limit of daily trends data.

Qingzong TSENG
Towards Data Science


iPhone search trends, AAPL stock price and Apple key events
  • Search trends data can be obtained at daily resolution for any duration.
  • Scaling daily data by weekly trends can generate artifacts.
  • Scaling by an overlapping period works better, as long as there is enough search activity during the overlapping period.
  • Code can be found here.

As Google gained a monopoly over internet search, what we google has become another measure of public interest over time. This has been well explained by the Google News Lab, and several articles have demonstrated data analytics based on Google Trends, such as the cyclic pattern of ending a relationship with someone, predicting the U.S. presidential race, etc.

The Limitations

To get the most out of the search trends database, one can use the Python module pytrends or the R package gtrendsR. However, Google currently limits the time resolution based on the query’s time frame. For example, a query for the last 7 days returns hourly search trends (the so-called real-time data); daily data is only provided for query periods shorter than 9 months and up to 36 hours before your search (as explained by the Google Trends FAQ); weekly data is provided for queries between 9 months and 5 years; and any query longer than 5 years returns only monthly data.
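As a rough illustration of these limits, the sketch below guesses the resolution from the length of the query window. The function name and the exact thresholds are my own assumptions (for instance, "9 months" is taken as 270 days), and the hourly case really applies only to queries covering the most recent 7 days, which this simplification ignores.

```python
from datetime import date


def expected_resolution(start: date, end: date) -> str:
    """Guess the resolution Google Trends returns for a query window.

    Hypothetical helper: the thresholds approximate the limits described
    in the text ('9 months' ~ 270 days, '5 years' ~ 5 * 365 days).
    """
    days = (end - start).days
    if days <= 7:
        return "hourly"   # real-time data (only for the most recent week)
    if days < 270:
        return "daily"
    if days <= 5 * 365:
        return "weekly"
    return "monthly"


# A 14-month window is too long for daily data
print(expected_resolution(date(2018, 10, 1), date(2019, 11, 30)))
```

Any window this helper labels "weekly" or "monthly" is one where daily data has to be reconstructed rather than queried directly.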

My interest in this subject was first sparked by the Rossmann competition on Kaggle, where Google search trends were used to predict sales numbers. I found that it was not obvious how to obtain daily search trends, and people used the weekly trends as a surrogate. However, weekly data is not ideal for predictive models that require daily precision, nor for real-time applications (as weekly data only becomes available once the current week ends).

Comparing Methods for Reconstructing Daily Trends

Although people have already proposed methods to circumvent this limit (for example: here, and here), I was curious how those methods compare to each other and which one matches the original daily trends data best. Therefore, I used ‘iPhone’ as the keyword and reconstructed daily trends data over 35 months using the following three methods:

  1. Daily data concatenated from multiple 1-month queries and normalized by the corresponding weekly trends data (the dailydata function implemented in pytrends).
  2. Queries for multiple 9-month periods with significant overlap, using the overlapping periods to obtain consistent scaling (similar to what is proposed here).
  3. Daily data simply interpolated from the weekly data (for reference).

Comparing daily trends data obtained using the overlapping method (blue), the dailydata method from pytrends (orange), interpolation from weekly data (green), and the overlapped periods used for scaling (red)
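To make the second method more concrete, here is a minimal sketch of how the overlapping query windows could be generated. The function name, the 260-day window, and the 90-day overlap are illustrative assumptions, not the values used for the plots in this article. The printed strings follow the "YYYY-MM-DD YYYY-MM-DD" timeframe format that pytrends accepts.

```python
from datetime import date, timedelta


def overlapping_windows(start, end, window_days=260, overlap_days=90):
    """Split [start, end] into windows shorter than ~9 months that overlap.

    Hypothetical helper: window_days and overlap_days are assumed values.
    """
    if overlap_days >= window_days:
        raise ValueError("overlap must be shorter than the window")
    step = timedelta(days=window_days - overlap_days)
    windows = []
    win_start = start
    while True:
        # Clip the last window so it ends exactly at `end`
        win_end = min(win_start + timedelta(days=window_days), end)
        windows.append((win_start, win_end))
        if win_end >= end:
            break
        win_start += step
    return windows


# One timeframe string per query
for s, e in overlapping_windows(date(2017, 1, 1), date(2019, 11, 30)):
    print(f"{s} {e}")
```

A longer overlap gives the scaling step more shared days to work with, at the cost of more queries.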

At first glance, the methods give quite different results. The dailydata function from pytrends matches the weekly data quite well, and its baseline values (the periods between important peaks) are higher than those obtained from the overlapping method. At this point it is hard to tell which one is better; nonetheless, the dailydata result (orange line) has some significant dips around the major peaks, such as the lowest value around September 2017.

To verify which reconstructed daily trends match the original data better, daily data for a shorter period (<9 months) was fetched directly for comparison.

Comparing daily trends data normalized differently with the original daily trends (pink)

Now it is clear that the overlapping method (blue line) matches the original daily data (pink line) best, and that the dips in the orange line are artifacts resulting from the scaling by weekly data.
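The scaling step of the overlapping method can be sketched roughly as follows. This is a simplified illustration rather than the exact code behind the plots: the function name is hypothetical, and using the ratio of sums over the shared days as the scaling factor is my assumption (a ratio of maxima or medians would be other options). It also shows why the overlap needs enough search activity: if the overlapping days are all zero, no factor can be computed.

```python
import numpy as np
import pandas as pd


def stitch_by_overlap(chunks):
    """Join daily series that were each scaled 0-100 on their own.

    chunks: list of pd.Series with a DatetimeIndex, ordered in time,
    each overlapping the previous one. Hypothetical helper.
    """
    stitched = chunks[0].astype(float)
    for chunk in chunks[1:]:
        chunk = chunk.astype(float)
        overlap = stitched.index.intersection(chunk.index)
        if len(overlap) == 0:
            raise ValueError("consecutive chunks must overlap")
        denom = chunk.loc[overlap].sum()
        if denom == 0:
            raise ValueError("no search activity in the overlap; cannot scale")
        # Bring the new chunk onto the scale of what is already stitched
        factor = stitched.loc[overlap].sum() / denom
        new_days = chunk.index.difference(stitched.index)
        stitched = pd.concat([stitched, chunk.loc[new_days] * factor]).sort_index()
    # Renormalize so that the overall maximum is 100
    return stitched / stitched.max() * 100


# Synthetic check: two windows of one underlying series, each rescaled
# to its own maximum, the way separate queries would be
days = pd.date_range("2019-01-01", periods=60, freq="D")
true = pd.Series(np.linspace(10, 50, 60), index=days)
true.iloc[45] = 200.0  # a spike that only the second window sees
first, second = true.iloc[:40], true.iloc[25:]
first = first / first.max() * 100
second = second / second.max() * 100

result = stitch_by_overlap([first, second])
expected = true / true.max() * 100
print(np.allclose(result.values, expected.values))
```

On this synthetic example the stitched series recovers the underlying one exactly. With real Google Trends data the values are rounded to integers, so the recovered scale is only approximate, and more so when the overlap contains few searches.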

Brexit, Google Trends, and the British Pound

To further demonstrate the potential benefits and applications of daily search data over an extended period, ‘brexit’ was used as the search term for the period between October 2018 and November 2019. The reconstructed daily data was plotted together with the default weekly data (returned because the query period is longer than 9 months) for comparison.

Comparing the daily trend reconstructed by the overlapping method (blue) with the original weekly trend (red).

With the reconstructed daily trends, we can precisely match each surge in search interest with a major event over this 14-month period. It also lets us compare the relative search volume on each distinct event date, whereas the weekly trends lose this resolution and the relative search volumes of adjacent events are averaged out (or pooled together).

Out of curiosity, I overlaid the British pound exchange rate on the daily trends plot. It is not surprising to see that the pound very often plunged on the same day that ‘Brexit’ searches peaked.

GBP/USD exchange rate and ‘Brexit’ search trends

Perhaps Google will lift this limit and provide a more convenient API in the future, but until then, stitching and scaling with overlapping queries is probably the best way to obtain daily search trends over an extended period.

*The code for reconstructing the daily trends and generating the plots in this article can be found on GitHub.
