The world’s leading publication for data science, AI, and ML professionals.

Time Series Analysis From Scratch – Seeing The Big Picture

Part 1 of the Time Series from Scratch Series – Answering the big questions and explaining why time series skills are a must in data…

Time Series from Scratch

Photo by Alvaro Pinot on Unsplash

I remember my first real Data Science task at a 9-to-5 job. It was to develop a framework for automated training, optimization, and evaluation of thousands of previously unseen time series. The project was a success in the long run, but I definitely hit every possible bump along the way.

That’s why I’m writing this, so you hit as few bumps as possible, if any. Over the next 25+ articles, you’ll learn almost everything there is to know about practical time series analysis and forecasting with Python. No prior knowledge is assumed.

Today we’ll take a high-level look at the topic and answer the big questions:

  • What’s a Time Series?
  • Where is Time Series Analysis Used?
  • What Options Does Python Provide for Time Series Analysis?
  • Where to Learn Time Series Analysis and Forecasting?



So, What’s a Time Series?

Almost all companies measure something over time, such as sales, revenue, or anything else. Therefore, time series analysis skills are a must for any data analyst or data scientist, even juniors!

To make sense of time series data, it has to be collected over time at regular intervals. For example, measuring visits on your website every day at 3 PM makes sense, but doing the same multiple times one day and forgetting about it the next day doesn’t.
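In pandas, one quick way to check for regular spacing is to conform a series to a fixed frequency with `asfreq`: any timestamp that was never recorded shows up as `NaN`. The sketch below assumes hypothetical daily visit counts with one day missing; the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical daily website visits; 2023-01-03 was never recorded.
visits = pd.Series(
    [120, 135, 128, 140],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-04", "2023-01-05"]),
)

# pandas cannot infer a single frequency from irregularly spaced timestamps.
print(pd.infer_freq(visits.index))  # None

# Conform the series to a strict daily grid; the skipped day appears as NaN.
regular = visits.asfreq("D")
print(regular[regular.isna()])
```

From there, you can decide how to handle the gap, for example with `interpolate()` or by filling it with zeros, depending on what a missing day means for your data.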

Let’s take a look at an example. The following figure shows light weight vehicle sales in thousands of units:

Image 1 - Light weight vehicle sales in thousands of units (source: FRED)

This is an excellent example of a time series. The data has been collected at monthly intervals over the last 40-something years. The shaded areas represent US recessions.

But how exactly would you analyze this? To answer this question, you’ll have to understand two fundamental concepts in time series – trend and seasonality.

As the name suggests, trend represents the general movement over time, while seasonality represents patterns that repeat over a fixed period, such as a year. For example, most monthly sampled data have yearly seasonality, meaning some patterns repeat in the same months every year, regardless of the trend.
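To make the distinction concrete, here’s a minimal sketch with a synthetic monthly series built as a linear trend plus a repeating 12-month pattern (all numbers are made up). Subtracting each value from the value twelve months later cancels the seasonal part, because every calendar month keeps the same offset year after year, and leaves only the trend’s contribution.

```python
import numpy as np

months = np.arange(48)  # four years of monthly observations
trend = 200 + 1.5 * months  # steady upward movement

# Hypothetical month-of-year effects (January first); they sum to zero.
seasonal_pattern = np.array([-20, -15, 0, 5, 10, 15, 20, 15, 5, 0, -10, -25])
seasonality = seasonal_pattern[months % 12]

sales = trend + seasonality

# Month-to-month changes mix trend and seasonality, so they vary...
print(np.diff(sales)[:4])
# ...but the change versus the same month one year earlier is pure trend: 12 * 1.5.
yoy_change = sales[12:] - sales[:-12]
print(yoy_change.min(), yoy_change.max())  # 18.0 18.0
```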

Let’s load the dataset in Python to explore these components further. Just to verify, here’s what the default Matplotlib visualization of the same dataset looks like:

Image 2 - Light weight vehicle sales plot with Matplotlib (image by author)
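The loading code isn’t shown at this point, so here’s a minimal sketch of one way to do it with pandas and Matplotlib. It assumes you’ve downloaded the series from FRED as a CSV with one date column and one value column; the column names `DATE` and `SALES`, the helper names, and the three placeholder rows are hypothetical, so adjust them to match your file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def load_monthly_series(csv_source, date_col, value_col):
    """Read a CSV into a Series with a month-start DatetimeIndex."""
    df = pd.read_csv(csv_source, parse_dates=[date_col], index_col=date_col)
    # asfreq("MS") declares the monthly frequency and exposes any skipped month as NaN
    return df[value_col].asfreq("MS")


def plot_series(series, title):
    """Draw a default Matplotlib line plot of the series and return the figure."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(series.index, series.values)
    ax.set_title(title)
    ax.set_xlabel("Date")
    return fig


# Placeholder rows standing in for the real FRED download (not actual data).
csv_text = """DATE,SALES
2020-01-01,100.0
2020-02-01,110.0
2020-03-01,90.0
"""
sales = load_monthly_series(io.StringIO(csv_text), "DATE", "SALES")
fig = plot_series(sales, "Light weight vehicle sales")
```

With a display available, you’d typically skip the `Agg` backend and call `plt.show()`, or save the chart with `fig.savefig(...)`.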

You can decompose any time series into trend, seasonal, and residual components with Python. In the simplest terms, the residual component shows everything not captured by combining trend and seasonality. Here’s what the decomposition plot looks like for our dataset:

Image 3 - Light weight vehicle sales decomposition (image by author)

As you can see, this plot consists of four panels. The first one shows the original time series, and the other three show the trend, seasonal, and residual components. Don’t worry about the code behind this visualization yet; just focus on the big picture.

There’s a lot more to the topic of decomposition, but we’ll cover it some other time.
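If you’re curious anyway, statsmodels provides `seasonal_decompose` in `statsmodels.tsa.seasonal` for this job. The sketch below is not the code behind the figure; it’s a hand-rolled version of classical additive decomposition, assuming observed = trend + seasonal + residual, a known period, and at least two full periods of data. The trend comes from a centered moving average, the seasonal component from averaging the detrended values at each position in the cycle, and the residual is whatever remains. The helper name and the synthetic series are hypothetical.

```python
import numpy as np


def classical_decompose(y, period=12):
    """Classical additive decomposition into trend, seasonal, and residual arrays."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Centered moving average: a 2 x period average for even periods, plain average otherwise.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # the first and last `half` points have no centered window
    trend[half:n - half] = np.convolve(y, weights, mode="valid")

    # Average the detrended values for each position in the cycle, ignoring NaN edges.
    detrended = y - trend
    seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_means -= seasonal_means.mean()  # make the seasonal effects sum to zero
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]

    residual = y - trend - seasonal
    return trend, seasonal, residual


months = np.arange(120)  # ten years of synthetic monthly data
true_seasonal = 10 * np.sin(2 * np.pi * months / 12)
series = 100 + 0.5 * months + true_seasonal
trend, seasonal, residual = classical_decompose(series, period=12)
```

On this noise-free toy series, the recovered trend and seasonal components match the ones used to build it, and the residual is essentially zero; on real data, the residual picks up noise and anything else the two components miss.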

Let’s answer the next big question – where are time series used?


Where are Time Series Used?

We already mentioned that most companies collect some sort of data over time. But why? Let’s go over a couple of use cases.

Pattern Analysis

You can’t uncover many patterns and relationships in time series data by looking at raw numbers. Charts are easier to look at, but harder to analyze directly. That’s where pattern analysis comes into play.

The Stumpy library is a great tool for the job, and you’ll learn its ins and outs in this article series. For now, just take a look at the following figure:

Image 4 - Pattern finding with Python and Stumpy (source: Stumpy docs)

Pattern recognition and analysis can make future periods that much easier to forecast.
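Under the hood, Stumpy computes the matrix profile (its `stumpy.stump(T, m)` function does this efficiently). For every subsequence of length `m`, the profile stores the z-normalized Euclidean distance to its nearest neighbor outside a small exclusion zone, which skips trivial self-matches; the lowest values mark a repeated pattern, or motif. The brute-force sketch below illustrates the idea with NumPy only. It’s O(n²) in time and memory, so it only suits short series, and the function name, the exclusion zone of `m / 4`, and the planted pattern are assumptions for illustration.

```python
import numpy as np


def naive_matrix_profile(ts, m):
    """Brute-force matrix profile: distance from each window to its nearest match."""
    ts = np.asarray(ts, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)  # shape (n, m)
    n = windows.shape[0]

    # z-normalize every window; flat windows (std 0) are left as all zeros
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    z = (windows - mu) / np.where(sigma == 0, 1.0, sigma)

    # for z-normalized windows, squared Euclidean distance = 2m(1 - Pearson correlation)
    corr = (z @ z.T) / m
    dists = np.sqrt(np.clip(2 * m * (1 - corr), 0, None))

    # ignore trivial matches: windows that start within m/4 of each other
    excl = int(np.ceil(m / 4))
    idx = np.arange(n)
    dists[np.abs(idx[:, None] - idx[None, :]) <= excl] = np.inf

    return dists.min(axis=1), dists.argmin(axis=1)


rng = np.random.default_rng(0)
ts = rng.normal(size=300)
pattern = 3 * np.sin(np.linspace(0, 2 * np.pi, 25))
ts[50:75] = pattern  # plant the same pattern twice in random noise
ts[200:225] = pattern

profile, nearest = naive_matrix_profile(ts, m=25)
motif_start = int(profile.argmin())
print(motif_start, int(nearest[motif_start]))  # the two planted copies
```

The highest profile value, by contrast, marks the subsequence least like anything else in the series (a discord), which is how the matrix profile doubles as an anomaly detector.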

Anomaly Detection

The folks at Stumpy also provide a way to detect anomalies in time series. In a nutshell, an anomaly is a value that is severely different from anything you’d expect. Detecting anomalies in normally distributed values is as easy as isolating records located a couple of standard deviations from the mean, but it isn’t as easy with time series data, where trend and seasonality keep shifting what counts as expected.

Here’s what Stumpy can do for you:

Image 5 - Anomaly detection with Python and Stumpy (source: Stumpy docs)

You’ll learn how to detect anomalies later in the series. For now, just focus on the big picture.
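To see why the standard-deviation rule struggles with time series, here’s a small sketch on a synthetic trending series with one planted spike (made-up data). A global z-score compares each value against the whole series, so the trend inflates the standard deviation and hides the spike; comparing each value only against a trailing window of recent history catches it. This rolling z-score is a simple stand-in, not the matrix-profile approach Stumpy uses, and the function names, the window of 24, and the threshold of 3 are arbitrary choices.

```python
import numpy as np


def zscore_anomalies(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the global mean."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.flatnonzero(np.abs(z) > threshold)


def rolling_zscore_anomalies(values, window=24, threshold=3.0):
    """Flag points that deviate strongly from the preceding `window` observations."""
    values = np.asarray(values, dtype=float)
    flagged = []
    for t in range(window, len(values)):
        history = values[t - window:t]
        sigma = history.std()
        if sigma == 0:
            continue  # a perfectly flat history gives no scale to compare against
        if abs(values[t] - history.mean()) / sigma > threshold:
            flagged.append(t)
    return np.array(flagged, dtype=int)


# Synthetic data: upward trend plus small noise, with a spike planted at index 120.
rng = np.random.default_rng(0)
series = np.arange(200, dtype=float) + rng.normal(scale=0.5, size=200)
series[120] += 30

print(zscore_anomalies(series))          # the global rule misses the spike
print(rolling_zscore_anomalies(series))  # the local rule flags index 120
```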

General Forecasting

One of the most obvious reasons to collect time series data is to make future forecasts. Luckily, Python provides more forecasting techniques than you can remember. Acronyms like AR, MA, EXSM, ARMA, ARIMA, SARIMA, SARIMAX, VAR, VARMA, RNN, LSTM, GRU might sound like a foreign language now, but you’ll soon fully understand all of them.

Take a look at the following figure – it shows the last 20 years of historical data and 2 years of forecasts:

Image 6 – 2 years of light weight vehicle sales forecasts (image by author)

The algorithm used to generate these predictions is called Triple Exponential Smoothing (Holt-Winters), but don’t worry about it for now. It is one of the simplest algorithms available, yet it still produces impressive results.
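For a preview, here’s a minimal from-scratch sketch of additive Holt-Winters: it tracks a level, a trend, and one seasonal offset per position in the cycle, updating each with its own smoothing weight (`alpha`, `beta`, `gamma`). The fixed parameter values, the function name, and the simple initialization from the first two seasons are assumptions, and this isn’t necessarily the configuration behind the figure; libraries such as statsmodels (`ExponentialSmoothing` in `statsmodels.tsa.holtwinters`) can estimate the parameters for you.

```python
def holt_winters_additive(y, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters (triple exponential smoothing) with fixed parameters."""
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")

    # Heuristic initialization from the first two seasons.
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m:2 * m]) / m
    trend = (mean2 - mean1) / m  # average change per step
    center = (m - 1) / 2  # time index that mean1 describes
    seasonals = [y[i] - (mean1 + trend * (i - center)) for i in range(m)]
    level = mean1 - trend * (center + 1)  # level one step before y[0]

    # Smooth level, trend, and the current seasonal offset at every observation.
    for t, value in enumerate(y):
        season = seasonals[t % m]
        previous_level = level
        level = alpha * (value - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (value - level) + (1 - gamma) * season

    # Extrapolate the trend and reuse the latest offset for each seasonal position.
    n = len(y)
    return [level + h * trend + seasonals[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Noise-free toy example: +2 per step plus a pattern that repeats every 4 steps.
pattern = [5, -3, 8, -10]
history = [50 + 2 * t + pattern[t % 4] for t in range(40)]
forecast = holt_winters_additive(history, season_length=4, alpha=0.5, beta=0.3, gamma=0.3, horizon=8)
```

Because the toy history is a pure trend plus a repeating pattern, the forecasts continue it exactly; on real data like vehicle sales, you’d tune or estimate the smoothing weights instead of hard-coding them.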

There are many more time series analysis use cases, but these three should be enough to get you motivated.


What Options Does Python Provide for Time Series Analysis?

Python is an excellent language for time series analysis. Here’s a list of libraries we’ll use throughout the series, with a brief description and use cases:

  • Pandas – A fundamental library for data analysis. It lets you work efficiently with datetime indexes and date ranges, and perform transformations like shifting, lagging, aggregating, and much more.
  • Statsmodels – Python library for statistical modeling. It allows you to use statistical models ranging from simple moving-average (MA) models to seasonal and vector autoregression. You can also use it to test for stationarity, among other things.
  • Scikit-Learn – A general Machine Learning library for Python. It doesn’t come with any time-series-specific algorithms, but any regression algorithm can be used for time series forecasting if a time series is reframed as a supervised machine learning problem.
  • TensorFlow – The most popular deep learning library for Python. We’ll use it to explore how Recurrent Neural Networks (RNN) and their variations (LSTM, GRU) can be used to forecast time series. We’ll also go over the basics of using Convolutional Neural Networks for time series.
  • Prophet – A time series forecasting library from Facebook. It is based on additive models. We’ll also explore its bigger brother – Neural Prophet.
  • Stumpy – A Python library that efficiently computes the matrix profile, which can then be used for pattern and anomaly detection, among other things.
  • PyCaret – A fantastic Python library for automated machine learning, which supports time series as of the most recent release.
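As a taste of the Pandas and Scikit-Learn items above, here’s a minimal sketch of reframing a series as a supervised learning problem: each row uses the previous `n_lags` values (built with `shift`) as features and the current value as the target. The function name `make_supervised` and the toy numbers are hypothetical.

```python
import pandas as pd


def make_supervised(series, n_lags):
    """Turn a univariate series into lagged features X and a target y."""
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        frame[f"lag_{lag}"] = series.shift(lag)  # value observed `lag` steps earlier
    frame = frame.dropna()  # the first n_lags rows lack a full history
    X = frame[[f"lag_{lag}" for lag in range(1, n_lags + 1)]]
    return X, frame["y"]


monthly = pd.Series(
    [10, 20, 30, 40, 50, 60],
    index=pd.date_range("2023-01-01", periods=6, freq="MS"),
)
X, y = make_supervised(monthly, n_lags=2)
print(pd.concat([X, y], axis=1))
```

`X` and `y` can then go into any scikit-learn regressor, for example `LinearRegression().fit(X, y)`. Split training and test sets chronologically rather than randomly, so the model never trains on values from the future.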

It is a lot, but no stone will be left unturned.


Where to Learn Time Series Analysis and Forecasting?

Well, right here! There are numerous books and online courses on time series analysis and forecasting, but these generally fall short in at least one of the following areas:

  • Not written for programmers – Extensive math knowledge is assumed. Books and courses usually go into too much mathematics way too soon. Don’t get me wrong, this entire series will explain the math behind the algorithms, but there’s no need to derive everything from scratch.
  • Get outdated quickly – It’s difficult for books and courses to stay up to date with the most recent developments. That’s not the case for an article series, because additional articles can be written at any point in time.

With that being said, you should also know that this will be a long series. Expect at least 25 articles, each having at least 1500 words. These will cover everything useful you would learn in a college-level class, but also insights and tools used in practice.

My aim is to release 1–2 articles per week, depending on the complexity of the topic, so the entire series should be out in a couple of months. You’ll find links to every article below, as soon as they are published.

I hope you’re as excited as I am. Stay tuned, and you’ll learn how to solve any time series task without issues.

