
Estimation, Prediction and Forecasting

Estimation vs. prediction. Prediction vs. forecast. Read along for the distinctions.

Photo by Markus Winkler on Unsplash

Estimation means finding the best-fitting parameters from historical data, whereas prediction uses those data to compute the value of a random quantity that has not yet been observed.

The key terms in the above statement need some context before we proceed further:

We need a lot of historical data to learn dependencies for machine learning and modelling. The data typically involve multiple observations, where each observation consists of multiple variables. Such a multivariate observation x is a realization of a random variable X whose distribution belongs to a definite set of possible distributions, called ‘the states of nature’.

Estimation is the process of inferring the true state of nature. Loosely speaking, estimation is related to model building, i.e. finding the parameters that best describe the multivariate distribution of the historical data. For example, suppose we have five independent variables, X1, X2, …, X5, and Y as the target variable. Estimation then means finding the f(x) that most closely approximates a property of the true state of nature, denoted by g(θ).

Parameter estimation on training data

Prediction, in contrast, uses the already built model to compute out-of-sample values. It is the process of guessing the value of another random variable Z whose distribution is related to the true state of nature (this property plays a pivotal role in any machine learning algorithm). Predictions are considered good when, on average over all the possible values of Z, they agree with the values that are actually realized.

Prediction on unseen data

There are multiple ways to interpret the difference between the two. Let’s also explore the Bayesian intuition:

Loosely speaking, estimation happens after the event has occurred, which corresponds to a posterior probability. Prediction is a kind of estimation made before the event occurs, which corresponds to an a priori probability.

Let’s summarize our understanding of estimation and prediction: to make predictions on unseen data, we fit a model on a training dataset. The fit gives us an estimator f(x), which we then use to make predictions on new data.
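To make the two steps concrete, here is a minimal sketch in Python using NumPy. It is only an illustration under stated assumptions: the "true state of nature" is a made-up linear relationship (intercept 2.0, slope 3.0, Gaussian noise), the estimator is ordinary least squares, and the names `theta_hat` and `predict` are hypothetical ones chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (hypothetical) true state of nature: y = 2.0 + 3.0 * x + noise
true_intercept, true_slope = 2.0, 3.0

# Historical (training) data
x_train = rng.uniform(0, 10, size=200)
y_train = true_intercept + true_slope * x_train + rng.normal(0, 1, size=200)

# Estimation: find the parameters that best describe the training data
# (ordinary least squares on a design matrix with an intercept column)
design = np.column_stack([np.ones_like(x_train), x_train])
theta_hat, *_ = np.linalg.lstsq(design, y_train, rcond=None)


def predict(x_new):
    """Prediction: apply the already fitted model to unseen inputs."""
    x_new = np.asarray(x_new, dtype=float)
    return theta_hat[0] + theta_hat[1] * x_new


# Unseen data Z, generated by the same state of nature
x_new = rng.uniform(0, 10, size=1000)
z_new = true_intercept + true_slope * x_new + rng.normal(0, 1, size=1000)

# A good prediction agrees with the realized values on average
mean_error = float(np.mean(predict(x_new) - z_new))

print("estimated parameters:", theta_hat)
print("average prediction error:", mean_error)
```

The estimation step ends once `theta_hat` is computed. Everything after that only reuses the fitted parameters, which is the sense in which prediction relies on an already built model.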

Now that we understand what prediction is, let’s see how it differs from forecasting.

Forecasting problems are a subset of prediction problems: both use historical data and both talk about future events. The only difference between the two is that forecasting explicitly adds a temporal dimension.

A forecast is a time-based prediction, i.e. it is the more appropriate term when dealing with time series data. A prediction, on the other hand, need not be time based. It can rest on multiple causal factors that influence the target variable.
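As a rough illustration of the temporal dimension, the sketch below forecasts a time series from its own past. It assumes a simulated first-order autoregressive series with a made-up coefficient of 0.8, and estimates that coefficient by least squares on lagged values. The names `phi_hat` and `forecast` are hypothetical and chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed (hypothetical) process: y_t = 0.8 * y_{t-1} + noise
true_phi = 0.8
n = 500
series = np.zeros(n)
for t in range(1, n):
    series[t] = true_phi * series[t - 1] + rng.normal(0, 1)

# Estimation: regress each value on the previous one (no intercept)
lagged, current = series[:-1], series[1:]
phi_hat = float(lagged @ current / (lagged @ lagged))


def forecast(last_value, steps):
    """Forecast the next `steps` values by iterating the fitted relation."""
    return np.array([last_value * phi_hat ** h for h in range(1, steps + 1)])


# The only input is the ordered past of the series itself
future = forecast(series[-1], steps=5)

print("estimated coefficient:", phi_hat)
print("5-step forecast:", future)
```

The ordering of the observations matters here: shuffling `series` would destroy the relationship the forecast depends on, which is not the case for a non-temporal prediction problem.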

I stumbled across a fresh way of explaining the difference between prediction and forecast, using the origins of the words themselves as an analogy.

I will briefly summarize that illustration in this post, but you can read more about it in the original post.

Forecast is more process-oriented and follows a certain methodology of doing something. In a way, it assumes that the past behavior is a good enough indicator of what is going to happen in the future.

Prediction considers all historical processes, influencing variables and interactions to reveal the future.

In summary, all forecasts are predictions but not all predictions are forecasts.

I hope you now have clarity on the difference between estimation and prediction, as well as the distinction between prediction and forecast.

Happy Reading!!!

References: https://stats.stackexchange.com/questions/17773/what-is-the-difference-between-estimation-and-prediction/17789#17789

