
Bootstrap resampling is an indispensable tool for studying the properties of an estimator, in particular its distribution and variability.
One of its most common applications is linear regression, where the estimators of interest are the slope coefficients of the regression line: their distribution is estimated to verify their significance, especially when there is a heteroskedasticity problem or when the p-value is borderline.
I will show you two different bootstrap resampling techniques, applying them in R.
Bootstrap of statistical units
The bootstrap of statistical units consists of resampling the rows of our dataset as if they were equiprobable units of a finite population.
In this way, B new samples are generated, and on each of them we estimate our linear regression model and evaluate the properties of the parameters of interest.
Let's see right away how to use this technique in R, estimating our model on the Prestige dataset from the car package and applying the appropriate bootstrap algorithm.
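A minimal sketch of the setup, assuming the specification prestige ~ income + education (an assumption, but consistent with the coefficients discussed below):

```r
library(car)   # the car package makes the Prestige dataset available

# Assumed specification: prestige explained by income and education
mod <- lm(prestige ~ income + education, data = Prestige)
summary(mod)
```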

Looking at the summary output of our model, we immediately notice that the residuals may be left-skewed and that the income and education coefficients are significantly different from 0; the intercept will be checked later against the distribution generated by the bootstrap.

From the residual plot, it is clear that there is a model specification problem: most likely the relationship between the covariates and the target is not linear.
This problem should be addressed by transforming the covariates or the target variable, or by using a polynomial model.
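A quick way to produce such a diagnostic plot in base R, reusing the mod object from the sketch above:

```r
# Residuals versus fitted values: curvature signals a misspecified mean function
plot(fitted(mod), resid(mod),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```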
Now let’s run the bootstrap algorithm and evaluate the standard errors and distributions.
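A sketch of the case bootstrap with the boot package; the replication count R = 2000 and the seed are arbitrary choices for illustration, not values from the original analysis:

```r
library(boot)

# Statistic: coefficients of the model refitted on a resample of the rows
coef_fun <- function(data, idx) {
  coef(lm(prestige ~ income + education, data = data[idx, ]))
}

set.seed(123)                     # arbitrary seed, for reproducibility
boot_cases <- boot(Prestige, coef_fun, R = 2000)
boot_cases                        # bootstrap standard errors next to the original estimates
```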

The standard errors are very similar to those of the initial model, so we can conclude that the coefficients remain significant.
But let's look at the distributions of the bootstrap estimators.
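One way to inspect them, assuming the boot_cases object from the sketch above, is to plot a histogram of each coefficient's bootstrap replicates with a matching Gaussian density overlaid:

```r
# One histogram per coefficient, with a Gaussian curve for comparison
par(mfrow = c(1, 3))
for (j in seq_along(coef(mod))) {
  est <- boot_cases$t[, j]
  hist(est, breaks = 30, freq = FALSE,
       main = names(coef(mod))[j], xlab = "Bootstrap estimate")
  curve(dnorm(x, mean(est), sd(est)), add = TRUE, col = "red", lwd = 2)
}
```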

The distributions of the intercept and of the second slope coefficient almost coincide with the Gaussian; this reassures us about the validity of the asymptotic properties, and therefore about the estimates of the two parameters and their significance.
The first slope coefficient, on the other hand, shows a strong left skewness, so the bootstrap estimate of this parameter is much more reliable and preferable to the one from the initial model, although it is still confirmed to be significantly different from zero by the BCa confidence interval, which I will explore in more depth in another article.
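For reference, the boot package computes BCa intervals directly; in the sketch above, index = 2 picks out the first slope coefficient of the statistic:

```r
# 95% BCa confidence interval for the first slope coefficient
boot.ci(boot_cases, type = "bca", index = 2)
```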
Bootstrap of residuals
Another technique for bootstrapping a regression model is based on resampling the residuals.
It is used when, in certain studies or experiments, it is advisable to keep the covariates fixed.
Having initially estimated the model ŷ = f(x), the covariates and coefficients are kept fixed and the residuals are randomly resampled, as can be seen from the following formula.
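In its standard form, the resampling scheme is

$$y_i^{*} = x_i^{\top}\hat{\beta} + \varepsilon_i^{*},$$

where each $\varepsilon_i^{*}$ is drawn with replacement from the estimated residuals $\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_n$, so the fitted part stays fixed and only the error term is resampled.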

Resampling the residuals runs the risk of producing incorrect estimates if the model is not correctly specified. In fact, we will now see that it is not reliable for our initial model, since there is a linearity problem.
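A sketch of this scheme in R, reusing mod and the boot package from above; as before, the replication count and seed are arbitrary:

```r
# Residual bootstrap: the design matrix stays fixed, the residuals are resampled
fit <- fitted(mod)
res <- resid(mod)

resid_fun <- function(data, idx) {
  y_star <- fit + res[idx]        # new response: fixed fit + resampled residuals
  coef(lm(y_star ~ income + education, data = data))
}

set.seed(123)
boot_resid <- boot(Prestige, resid_fun, R = 2000)
boot_resid
```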

The bootstrap of the residuals, in this case, systematically underestimates the standard errors, thus overstating the significance of the coefficients and producing confidence intervals that are too narrow.
In conclusion
The bootstrap is a very powerful tool that any statistician and data scientist should have in their toolbox.
Just think of how it can be used to build robust models or, even better, to construct hypothesis tests that are nonparametric and free from initial assumptions.
Thanks so much for reading.