
IID: Meaning and Interpretation for Beginners

Independent and Identically Distributed

Photo by Yu Kato on Unsplash

In statistics, data analysis, and machine learning, the concept of IID frequently appears as a fundamental assumption or condition. It stands for "independent and identically distributed". An IID random variable or sequence is an important building block of statistical and machine learning models, and it also plays a central role in time series analysis.

In this post, I explain the concept of IID in an intuitive way in three different contexts: sampling, modelling, and predictability. An application with R code is presented in the context of time series analysis and predictability.


IID in Sampling

The notation X ~ IID(μ, σ²) represents sampling of (X1, …, Xn) in a purely random way from a population with mean μ and variance σ². That is,

  • each successive realization of X is independent, showing no association with the previous one or with the one after; and
  • each successive realization of X is obtained from the same distribution with identical mean and variance.

Examples

Suppose a sample (X1, …, Xn) is collected from the distribution of annual incomes of individuals of a country.

  1. A researcher has selected the income of a male for X1, a female for X2, a male for X3, then a female for X4, and this pattern is maintained through to Xn. This is not IID sampling, because a predictable or systematic pattern in sampling is non-random, in violation of the condition of independence.
  2. A researcher has selected (X1, …, X500) from the poorest group of individuals and then (X501, …, X1000) from the richest group. This is not IID sampling, because the two groups have different income distributions with different means and variances, in violation of the condition of identicality. (The R sketch below illustrates this violation.)
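To make the identicality violation concrete, here is a minimal R sketch using a hypothetical log-normal income population; the population and its parameters are purely illustrative.

set.seed(42)
# Hypothetical population of incomes (log-normal, heavily skewed)
population <- rlnorm(100000, meanlog = 10, sdlog = 1)

# IID sampling: every draw comes independently from the same distribution
iid_sample <- sample(population, 1000, replace = TRUE)

# Non-IID sampling: first 500 from the poorest half, next 500 from the richest half
sorted_pop <- sort(population)
bad_sample <- c(sample(sorted_pop[1:50000], 500, replace = TRUE),
                sample(sorted_pop[50001:100000], 500, replace = TRUE))

# The two halves of bad_sample come from very different distributions
mean(bad_sample[1:500]); sd(bad_sample[1:500])
mean(bad_sample[501:1000]); sd(bad_sample[501:1000])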

IID in Modelling

Suppose Y is the variable of interest you want to model or explain. Then, it can be decomposed into two parts: namely,

Y = Systematic Component + Unsystematic Component.

The systematic component is the part of Y driven by the fundamental relationship with other factors. It is the component that can be explained or expected from theories, common sense, or stylized facts. It is the fundamental part of Y that is associated with substantive and practical importance.

The unsystematic component is the part of Y that is not driven by the fundamentals and cannot be explained or predicted by theories, reasoning, or stylized facts. It captures variations of Y that cannot be explained by its systematic component. It should be purely random and idiosyncratic, without any systematic or predictable pattern. In a statistical model it is referred to as the error term, which is often represented as an IID random variable.

For example, consider a linear regression model of the following form:

Y = α + βX + u.   (1)

Here, α + βX in (1) is the systematic component and the error term u in (1) is the unsystematic component.

If the value of β is close to 0 or practically negligible, then the variable X has low explanatory power (measured by R²) for Y, indicating that it cannot satisfactorily explain the fundamental variation of Y.

The error term u is assumed to be an IID random variable with zero mean and fixed variance, denoted u ~ IID(0, σ²). It is purely random, representing the unsystematic or unexpected variation in Y.

If u is not purely random and has a noticeable pattern, then the systematic component may not be correctly specified, because it is missing something substantive or fundamental.
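As a minimal illustration (with simulated data, so the model and all numbers are purely illustrative), omitting a genuine part of the systematic component leaves a visible pattern in the residuals:

set.seed(1)
X <- runif(200, 0, 3)
Y <- 1 + 2*X + 0.8*X^2 + rnorm(200)   # true systematic component is quadratic
fit_bad  <- lm(Y ~ X)                 # misspecified: quadratic term omitted
fit_good <- lm(Y ~ X + I(X^2))        # correctly specified

par(mfrow = c(1, 2))
plot(fitted(fit_bad),  resid(fit_bad),  main = "Misspecified")  # curved pattern
plot(fitted(fit_good), resid(fit_good), main = "Correct")       # no visible pattern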

Example: Autocorrelation

Suppose that the error term has the following pattern:

ut = ρut-1 + et,   (2)

where et is purely random. This is a linear dependence (or autocorrelation), which is a systematic pattern. This predictable pattern should be incorporated into the model, which will in turn better explain the systematic component of Y. One way of achieving this is to include a lagged term of Y in the regression. That is,

Yt = α + βXt + γYt-1 + et.   (3)

The lag of Yt included in (3) is able to capture the autocorrelation of the error term in (2), so that the error term e in (3) is IID.
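A minimal R sketch of this idea, with simulated data (all coefficient values are illustrative): the static regression leaves autocorrelation in the residuals, while adding the lagged Y largely removes it.

set.seed(123)
n <- 300
x <- rnorm(n)
u <- as.numeric(arima.sim(list(ar = 0.7), n))   # u_t = 0.7*u_{t-1} + e_t
y <- 1 + 2*x + u

fit1 <- lm(y ~ x)                    # static model: residuals inherit the pattern
fit3 <- lm(y[-1] ~ x[-1] + y[-n])    # dynamic model with lagged y, as in (3)

acf(resid(fit1))   # strong, slowly decaying autocorrelations
acf(resid(fit3))   # much closer to white noise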

Example: Heteroskedasticity

Suppose that the error term shows the following systematic pattern:

Var(ut) = σ²Xt².   (4)

This pattern of the error term is called heteroskedasticity, where the variability of the error term changes as a function of the X variable. For example, suppose Y is food expenditure and X is disposable income for individuals. Equation (4) means that high-income earners show a higher variability in food expenditure.

This is a predictable pattern, and an error term with the property of (4) violates the IID assumption, because the variance of the error term is not constant. To incorporate this pattern into the systematic component, the generalized or weighted least-squares estimation can be conducted in the following way:

Yt/Xt = β + α(1/Xt) + ut/Xt.   (5)

Equation (5) is a regression with transformed variables, which can be written as

Yt* = β + αXt* + ut*,   (6)

where

Yt* = Yt/Xt,  Xt* = 1/Xt,  ut* = ut/Xt.

The above transformations of Y and X provide the transformed error term (ut*) in (6), which is IID and no longer heteroskedastic. That is,

Var(ut*) = Var(ut)/Xt² = σ²Xt²/Xt² = σ².

This means that the systematic pattern in the error term is now effectively incorporated into the systematic component by the above transformation.
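The following minimal R sketch simulates data satisfying (4) (all parameter values are illustrative) and compares the residuals before and after the transformation:

set.seed(7)
n <- 200
x <- runif(n, 1, 10)
u <- x * rnorm(n)                # Var(u | x) = x^2, as in (4)
y <- 2 + 0.5*x + u

fit_ols <- lm(y ~ x)             # untransformed regression (1)
fit_wls <- lm(I(y/x) ~ I(1/x))   # transformed regression (5)

par(mfrow = c(1, 2))
plot(x, resid(fit_ols), main = "Before")   # spread fans out with x
plot(x, resid(fit_wls), main = "After")    # roughly constant spread

Equivalently, one can run lm(y ~ x, weights = 1/x^2), which performs the same weighted least-squares fit without transforming the variables by hand.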

Image created by the author

The above plots present the effect of the transformation in an intuitive way. Before the transformation (left plot), the variable Y shows increasing variability as a function of X, a reflection of the heteroskedasticity. The transformation effectively incorporates the heteroskedastic pattern into the systematic component of Y, and the transformed error term is now an IID random variable, as the right plot shows.

Many of the model diagnostic tests in regression or machine learning are designed to check whether the error term is an IID random variable, using the residuals from the estimated model. This is called residual analysis. Through residual analysis and diagnostic checks, the specification of the systematic component of the model can be improved.
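As a minimal sketch of residual analysis (using a simulated, correctly specified model for illustration), the checks below combine a time plot, the sample autocorrelations, and the Ljung-Box test for autocorrelation:

set.seed(99)
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)
fit <- lm(y ~ x)
res <- resid(fit)

plot(res, type = "l")                        # time plot: look for visible patterns
acf(res)                                     # autocorrelations should be near 0
Box.test(res, lag = 10, type = "Ljung-Box")  # H0: residuals are uncorrelated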


IID and Predictability

Being purely random, an IID sequence shows no predictable pattern at all. That is, its past history provides no information about the future course of the sequence.

Example: Autoregressive model

Consider an autoregressive model of order 1, denoted AR(1):

Yt = ρYt-1 + ut,   (7)

where ut ~ IID(0, σ²) and -1 < ρ < 1.

If ρ = 0, the time series Yt is IID and non-predictable, since it shows no dependence on its own past and is driven only by unpredictable shocks.

For simplicity, let us assume that Y0 = 0 and ρ ≠ 0, and conduct the following continual substitution:

Y1 = u1;

Y2 = ρY1 + u2 = ρu1 + u2;

Y3 = ρY2 + u3 = ρ²u1 + ρ u2 + u3;

Y4 = ρY3 + u4 = ρ³u1 + ρ²u2 + ρu3 + u4;

with the general expression being

Yt = ut + ρut-1 + ρ²ut-2 + … + ρᵗ⁻¹u1.   (8)

Equation (8) shows that a time series (such as an autoregression) can be expressed as a moving average of the past and current IID errors (or shocks), with exponentially declining weights.

Note that the distant shocks such as u1 and u2 in (8) have little impact on Yt, because their weights are negligible. For example, when ρ = 0.5 and t = 100, ρ⁹⁹ and ρ⁹⁸ are practically 0. Only the current and recent shocks such as u100, u99, and u98 matter practically.

Hence, if a researcher at time t has a good estimate of ρ (from data) and has observed the current and recent shocks such as ut, ut-1, ut-2, and ut-3, she or he may be able to predict the value of Yt+1 with reasonable accuracy, by projecting the moving average in (8) into the future.
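A minimal R sketch of this idea with simulated data: estimate ρ and form the one-step-ahead forecast (for an AR(1) with zero mean, this is simply the estimated ρ times the last observation):

set.seed(321)
y <- arima.sim(list(ar = 0.5), 100)              # AR(1) with rho = 0.5
fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
fit$coef["ar1"]                                  # estimate of rho
predict(fit, n.ahead = 1)$pred                   # forecast of Y_101
0.5^99                                           # weight on the distant shock u1: ~ 1.6e-30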

Example: Random walk

When ρ = 1, the time series in (7) becomes a random walk, where the current change of Y is an IID shock that is purely unpredictable: i.e.,

Yt = Yt-1 + ut,  or equivalently  Yt − Yt-1 = ut.

In this case, from (8) with ρ = 1, we have

Yt = u1 + u2 + … + ut.

In other words, a random walk is the sum of all past and current IID shocks with an equal weight of 1. As a result, distant shocks are as important as the recent and current shocks. For example, if t = 100, the shock u1 has the same impact on Y100 as u100.

As a sum of all past and current shocks, a random walk time series is purely unpredictable. It also shows a high degree of uncertainty and persistence (dependence on the past), with the analytical results (assuming Y0 = 0) that

Var(Yt) = tσ²  and  Corr(Yt, Yt-k) = √((t − k)/t).   (9)

The first result means that the variability of a random walk increases with time, indicative of a high degree of uncertainty and a low degree of predictability over time.

In addition, the correlation between Yt and Yt-k is close to 1 when k is small relative to t. For example, when t = 100, Y100 and Y99 are correlated with a correlation coefficient of √(99/100) ≈ 0.995.
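Both results in (9) can be checked by simulation; a minimal R sketch (the number of replications is arbitrary):

set.seed(555)
sims <- replicate(5000, cumsum(rnorm(100)))   # 5000 random walks, t = 1, ..., 100

var(sims[50, ]); var(sims[100, ])             # approx. 50 and 100, i.e. t*sigma^2
cor(sims[100, ], sims[99, ])                  # approx. sqrt(99/100) = 0.995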


Application

As an application, the basic descriptive properties of an IID process, AR(1) time series with ρ ∈ {0.3, 0.6, 0.9}, and a random walk are compared using time plots and autocorrelation functions.

Time plots

Time plots (image created by the author)
  • An IID series Y1, as an AR(1) time series with ρ = 0, shows no pattern at all, fluctuating randomly and frequently around its mean of 0. It has a strong tendency to revert to the mean.
  • For Y2 to Y4, as the value of ρ increases from 0.3 to 0.9, the time series becomes smoother and fluctuates less frequently, reflecting an increasing degree of dependence on its own past. The degree of mean reversion also declines as the value of ρ gets higher.
  • A random walk Y5 shows a trend that can change direction randomly (called a stochastic trend). It shows increasing variability over time, consistent with the first result in (9), with little tendency to revert to its mean (mean aversion).

Autocorrelation Functions

Autocorrelation functions (image created by the author)

The autocorrelation function of a time series plots Corr(Yt, Yt-k) against the lag value k. It provides a visual summary of the dependence structure of a time series. For example, Corr(Yt, Yt-1) measures how strongly the values of Y one period apart are correlated. The blue band is the 95% confidence band; an autocorrelation value inside this band is statistically no different from 0 at the 5% level of significance.

  • An IID time series Y1 has autocorrelation values that are all practically negligible and statistically 0.
  • As the value of ρ increases from 0.3 to 0.9, the degree of dependence of Y on its own past increases, and more autocorrelation values become practically large and statistically different from 0.
  • A random walk time series Y5 has all autocorrelation values extremely close to 1, indicative of an extremely high degree of dependence on its own past (persistence). This reflects the second property given in (9).

This application presents the basic statistical properties of an IID time series, in comparison with those of stationary AR(1) processes and a random walk. It illustrates how the degree of dependence on the past (or predictability) changes as the AR(1) coefficient moves from 0 to 1, i.e., from an IID time series to a random walk. As explained above, a time series is predictable when the degree of dependence is moderately strong, characterized by a value of ρ greater than 0 but less than 1.

R code

The time series and plots are generated using the following R code:

set.seed(1234)

n=500  # Sample size
# IID
Y1 = rnorm(n)    
# AR(1) with rho = 0.3, 0.6, and 0.9
Y2 = arima.sim(list(order=c(1,0,0), ar=0.3), n)
Y3 = arima.sim(list(order=c(1,0,0), ar=0.6), n)
Y4 = arima.sim(list(order=c(1,0,0), ar=0.9), n)
# Random Walk
Y5 = cumsum(rnorm(n))

par(mfrow=c(5,1))  # five panels per figure, one for each series
# Time plots
plot.ts(Y1,main="IID",lwd=2)
plot.ts(Y2,main="AR(1) with rho=0.3",lwd=2)
plot.ts(Y3,main="AR(1) with rho=0.6",lwd=2)
plot.ts(Y4,main="AR(1) with rho=0.9",lwd=2)
plot.ts(Y5,main="Random Walk",lwd=2)

# Autocorrelation functions
acf(Y1,main="IID"); 
acf(Y2,main="AR(1) with rho=0.3"); 
acf(Y3,main="AR(1) with rho=0.6"); 
acf(Y4,main="AR(1) with rho=0.9"); 
acf(Y5,main="Random Walk");

Conclusion

The concept of IID is fundamental in statistical analysis and machine learning models. This post has reviewed IID in three different contexts: sampling, modelling, and predictability in time series analysis. An application is presented that compares the basic descriptive statistical properties of an IID time series with those of stationary AR(1) processes and a random walk.

